The goal of Parabricks software is to get the highest performance for bioinformatics and genomic analysis. There are a few key system options that a user can tune to achieve maximum performance.
Parabricks software works with two kinds of files:
Input/output files specified by the user
Temporary files created during execution and deleted at the end
The best performance is achieved when both kinds of files are on a fast local SSD. The input/output files can also be placed on fast network storage, but for tools and pipelines that use temporary files, it is highly recommended to keep those temporary files on fast local storage such as an SSD.
Use the --tmp-dir option to specify where the temporary files are stored.
A DGX system comes with a local SSD, generally mounted at /raid. Use a directory on that disk as the --tmp-dir. For initial testing, you can also copy the input files to this disk to eliminate variability in performance.
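The setup described above might look like the following minimal sketch, which is not an official Parabricks script. It assumes the local SSD is mounted at /raid and falls back to /tmp when /raid is missing or not writable, so it also runs on non-DGX machines. The helper make_tmp_dir, the directory prefix parabricks_tmp, and the file names Ref.fa, S1_1.fastq.gz, S1_2.fastq.gz and S1.bam are hypothetical placeholders. The sketch only prints the pbrun command instead of running it.

```sh
#!/bin/sh
# Sketch: keep Parabricks temporary files on fast local storage.
# Assumption: the DGX local SSD is mounted at /raid; /tmp is used as a
# fallback when /raid does not exist or is not writable.

# Hypothetical helper: create a unique scratch directory under the given
# base directory and print its path.
make_tmp_dir() {
    base="$1"
    if [ ! -d "$base" ] || [ ! -w "$base" ]; then
        base="/tmp"
    fi
    mktemp -d "$base/parabricks_tmp.XXXXXX"
}

TMP_DIR="$(make_tmp_dir /raid)"

# Print (rather than run) the command; all file names are placeholders.
PBRUN_CMD="pbrun fq2bam --ref Ref.fa --in-fq S1_1.fastq.gz S1_2.fastq.gz --out-bam S1.bam --tmp-dir $TMP_DIR"
echo "$PBRUN_CMD"

# After the run finishes, the scratch directory can be removed with:
#   rm -rf "$TMP_DIR"
```

Creating a fresh directory per run with mktemp keeps concurrent runs from sharing the same scratch space.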
On Google Cloud, attach a 2048 GB persistent SSD to your instance for the best performance. The Google Cloud Marketplace solution page gives you the option to connect an SSD.
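If you prefer the command line to the Marketplace page, a persistent SSD can also be created and attached with the gcloud CLI. The sketch below is an assumption-laden alternative, not part of the Parabricks instructions: the disk name, instance name and zone are hypothetical, and by default (DRY_RUN=1) it only prints the gcloud commands instead of executing them.

```sh
#!/bin/sh
# Sketch: create and attach a 2048 GB persistent SSD (disk type pd-ssd)
# with the gcloud CLI. All names below are hypothetical placeholders.
DISK_NAME="parabricks-ssd"
INSTANCE_NAME="parabricks-vm"
ZONE="us-central1-a"

# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

run gcloud compute disks create "$DISK_NAME" --size=2048GB --type=pd-ssd --zone="$ZONE"
run gcloud compute instances attach-disk "$INSTANCE_NAME" --disk="$DISK_NAME" --zone="$ZONE"
```

A newly attached disk still has to be formatted and mounted inside the instance (Google Cloud's documentation describes the mkfs and mount steps) before a directory on it can be used as the --tmp-dir.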
For certain tools and pipelines, you can choose the number of GPUs to use with the --num-gpus command-line option. To select specific GPUs, also set the environment variable NVIDIA_VISIBLE_DEVICES, as in this example:
$ NVIDIA_VISIBLE_DEVICES="0,1" pbrun fq2bam --num-gpus 2 --ref Ref.fa --in-fq S1_1.fastq.gz S1_2.fastq.gz --out-bam S1.bam
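The value passed to --num-gpus should match the number of GPUs listed in NVIDIA_VISIBLE_DEVICES. The following minimal sketch derives one from the other so the two cannot drift apart. The helper count_gpus, the GPU_LIST variable and the file names are hypothetical, and the script only prints the resulting command.

```sh
#!/bin/sh
# Sketch: keep --num-gpus consistent with the GPUs listed in
# NVIDIA_VISIBLE_DEVICES. count_gpus and GPU_LIST are hypothetical names.

# Count the comma-separated entries in a device list such as "0,1".
count_gpus() {
    printf '%s\n' "$1" | tr ',' '\n' | grep -c .
}

GPU_LIST="${GPU_LIST:-0,1}"
NUM_GPUS="$(count_gpus "$GPU_LIST")"

# Print the command instead of running it; file names are placeholders.
echo "NVIDIA_VISIBLE_DEVICES=\"$GPU_LIST\" pbrun fq2bam --num-gpus $NUM_GPUS --ref Ref.fa --in-fq S1_1.fastq.gz S1_2.fastq.gz --out-bam S1.bam"
```

For example, running it with GPU_LIST="2,3" selects the third and fourth GPUs and sets --num-gpus to 2.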