deepvariant

Run the GPU-accelerated DeepVariant algorithm.

Parabricks has accelerated Google DeepVariant to make extensive use of GPUs, finishing a 30x WGS analysis in 25 minutes. The Parabricks flavor of DeepVariant works like other command-line tools that users are familiar with: it takes a BAM file and a reference as inputs and produces variants as output. Future versions will allow users to choose the exact model to use.

Quickstart

CLI
$ pbrun deepvariant --ref Homo_sapiens_assembly38.fasta \
--in-bam sample.bam \
--out-variants output.vcf

Compatible Google DeepVariant commands

The commands below are the Google DeepVariant counterpart of the Parabricks command above. They produce exactly the same results as the Parabricks command. See the Output Comparison page for how to compare the results.

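# Assumes N_SHARDS, LOGDIR, BIN_VERSION, REF, BAM, OUTPUT_DIR,
# CALL_VARIANTS_OUTPUT, MODEL and FINAL_OUTPUT_VCF are already set.
# Run make_examples in parallel, one task per shard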
seq 0 $((N_SHARDS-1)) | \
parallel --eta --halt 2 --joblog "${LOGDIR}/log" --res "${LOGDIR}" \
sudo docker run \
-v ${HOME}:${HOME} \
gcr.io/deepvariant-docker/deepvariant:"${BIN_VERSION}" \
/opt/deepvariant/bin/make_examples \
--mode calling \
--ref "${REF}" \
--reads "${BAM}" \
--examples "${OUTPUT_DIR}/examples.tfrecord@${N_SHARDS}.gz" \
--regions '"chr20:10,000,000-10,010,000"' \
--task {}
# Run call_variants (a single invocation over all shards)
sudo docker run \
-v ${HOME}:${HOME} \
gcr.io/deepvariant-docker/deepvariant:"${BIN_VERSION}" \
/opt/deepvariant/bin/call_variants \
--outfile "${CALL_VARIANTS_OUTPUT}" \
--examples "${OUTPUT_DIR}/examples.tfrecord@${N_SHARDS}.gz" \
--checkpoint "${MODEL}"
# Run postprocess_variants (a single invocation)
sudo docker run \
-v ${HOME}:${HOME} \
gcr.io/deepvariant-docker/deepvariant:"${BIN_VERSION}" \
/opt/deepvariant/bin/postprocess_variants \
--ref "${REF}" \
--infile "${CALL_VARIANTS_OUTPUT}" \
--outfile "${FINAL_OUTPUT_VCF}"
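
As a rough first check that two VCF files contain the same calls, one could compare their records while ignoring header lines, which typically differ between tools (command lines, dates, versions). The following is only a minimal sketch and not the procedure described on the Output Comparison page; the function name `vcf_records_match` and the file names in the usage comment are hypothetical, and both files are assumed to be uncompressed.

```shell
# Sketch: succeed (exit 0) if two uncompressed VCF files list the same
# variants, judged by CHROM, POS, REF and ALT (columns 1, 2, 4 and 5).
# Header lines (starting with '#') are ignored.
vcf_records_match() {
    tmp_a=$(mktemp) || return 2
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 2; }

    # Strip headers, keep the identifying columns, sort for a stable order.
    grep -v '^#' "$1" | cut -f1,2,4,5 | sort > "$tmp_a"
    grep -v '^#' "$2" | cut -f1,2,4,5 | sort > "$tmp_b"

    if cmp -s "$tmp_a" "$tmp_b"; then
        status=0
    else
        status=1
    fi
    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}

# Usage (hypothetical file names):
#   vcf_records_match output.vcf final_output.vcf && echo "same calls"
```

This check looks only at the variant sites and alleles; genotypes, qualities, and INFO fields are deliberately left out, so a match here does not by itself show that the files are identical.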

Options

Option

Description

--ref (required)

The reference genome in FASTA format.

--in-bam (required)

Path to the input BAM file.

--out-variants (required)

Name of the output VCF file.

--pb-model-file

Path to a non-default Parabricks model file for DeepVariant.

--interval (-L)

Interval within which to call variants from the BAM file. This option can be used multiple times. All intervals will have a padding of 100, and overlapping intervals will be combined. Intervals can be given in GATK-style format or in a file in BED format, for example "-L chr1 -L chr2:1000-3100" or "-L interval.bed".

--gvcf

Generate variant calls in gVCF format.

--tmp-dir

Defaults to . (the current directory).

Full path to the directory where temporary files will be stored.

--num-gpus

Defaults to the number of GPUs in the system.

The number of GPUs to be used for this analysis task.
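
To illustrate how the options above might be combined, here is a minimal sketch that assembles one command from them. It uses only flags listed in the table and assumes they can be given together in a single invocation. The file names, intervals, GPU count, and temporary directory are placeholders, and the wrapper function `run_deepvariant` is hypothetical. The function prints the assembled command and runs it only when `pbrun` is found on the PATH, so it acts as a dry run elsewhere.

```shell
# Placeholder inputs; adjust to your data.
REF="Homo_sapiens_assembly38.fasta"
BAM="sample.bam"
OUT="output.g.vcf"

# Hypothetical wrapper: prepend the required options to any optional ones.
run_deepvariant() {
    set -- pbrun deepvariant \
        --ref "$REF" \
        --in-bam "$BAM" \
        --out-variants "$OUT" \
        "$@"

    # Show the command that would be executed.
    echo "$*"

    # Execute only if pbrun is installed.
    if command -v pbrun > /dev/null 2>&1; then
        "$@"
    fi
}

# gVCF output, two intervals, an explicit temporary directory, and two GPUs.
run_deepvariant --gvcf -L chr1 -L chr2:1000-3100 --tmp-dir /tmp --num-gpus 2
```

Keeping the required options in one place and passing the optional ones as arguments makes it easy to try different intervals or GPU counts without retyping the full command.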