Running on many samples
Running the pipeline on a few samples is relatively straightforward, but when we need to run it on hundreds of samples it gets unwieldy to type every command out. For this purpose it is a good idea to write a wrapper script to handle the running.
As an example let's assume that you have the following files:
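The file listing itself is not shown here, so the sketch below sets up a purely hypothetical layout to work with. It assumes paired-end reads named `<prefix>_1.fastq.gz` and `<prefix>_2.fastq.gz`; the names `sample1` and `sample2` and the scratch directory are placeholders, not files from the original example.

```shell
# Hypothetical layout: paired-end reads named <prefix>_1.fastq.gz and <prefix>_2.fastq.gz.
# Empty placeholder files are created in a scratch directory purely for illustration.
demo_dir=$(mktemp -d)
cd "$demo_dir"
touch sample1_1.fastq.gz sample1_2.fastq.gz sample2_1.fastq.gz sample2_2.fastq.gz

# Show the resulting files.
ls
```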
First we need to create a list of the sample prefixes that we would like to run. To do this we can run the following command.
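One possible way to build that list is sketched here. It assumes the reads follow a `<prefix>_1.fastq.gz` / `<prefix>_2.fastq.gz` naming scheme, which is an assumption rather than something this page states, and it creates hypothetical placeholder files in a scratch directory so it is safe to try.

```shell
# Work in a scratch directory with hypothetical placeholder files so the sketch is safe to run.
work_dir=$(mktemp -d)
cd "$work_dir"
touch sample1_1.fastq.gz sample1_2.fastq.gz sample2_1.fastq.gz sample2_2.fastq.gz

# Strip the assumed _1/_2 read suffix from each FASTQ name and keep one copy of each prefix.
ls *.fastq.gz | sed -E 's/_[12]\.fastq\.gz$//' | sort -u > samples.txt

# Inspect the list of prefixes.
cat samples.txt
```

On real data only the `ls ... > samples.txt` line is needed, run from the directory that holds the reads. If your files use a different suffix, the `sed` pattern has to be adjusted to match.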
This will put all the file prefixes into a file called samples.txt. We can then use parallel to run our tb-profiler command for each sample in the file. Before we run tb-profiler, we should make the folders where it will store the bam, vcf and result files. We have to do this because otherwise the multiple instances of tb-profiler started by parallel will all try to create the same folders at the same time, and you will run into an error.
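Creating the folders up front could look like the following. The folder names `bam`, `vcf` and `results` are an assumption based on the output types mentioned above; check which folders your tb-profiler version actually writes to.

```shell
# Folder names are an assumption based on the bam, vcf and result outputs described above.
# -p makes the command safe to re-run if the folders already exist.
mkdir -p bam vcf results
```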
Now we are ready to run tb-profiler in parallel.
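A sketch of the parallel invocation is given below. It assumes GNU parallel is installed and that the reads follow the hypothetical `<prefix>_1.fastq.gz` / `<prefix>_2.fastq.gz` naming. The sample list is recreated in a scratch directory with placeholder names, and `--dry-run` is used so that the commands are only printed, not executed.

```shell
# Hypothetical sample list; in practice this is the samples.txt created earlier.
work_dir=$(mktemp -d)
cd "$work_dir"
printf 'sample1\nsample2\n' > samples.txt

# {} is replaced by each line of samples.txt; -j 2 runs two jobs at once.
# --dry-run only prints the commands; remove it to actually run tb-profiler.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    parallel --dry-run -j 2 tb-profiler profile -1 {}_1.fastq.gz -2 {}_2.fastq.gz -p {} < samples.txt
else
    echo "GNU parallel is not installed" >&2
fi
```

Each printed line is one tb-profiler command, with the prefix substituted into the read file names (`-1`, `-2`) and the output prefix (`-p`).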
You can adjust the -j parameter to allow more jobs to run in parallel. I have set this to 2, but if you have an HPC cluster or a powerful computer you can increase it.