haplotypecaller

GPU-accelerated haplotypecaller

This tool runs a GPU-accelerated version of HaplotypeCaller. Users can optionally provide a BQSR report. In that case the tool recalibrates the base qualities of the input BAM, as ApplyBQSR would, and uses the updated base qualities for variant calling.

Quick Start

CLI
$ pbrun haplotypecaller --ref Ref/Homo_sapiens_assembly38.fasta \
--in-bam mark_dups_gpu.bam \
--in-recal-file recal_gpu.txt \
--out-variants result.vcf

Compatible GATK4 command

The commands below are the GATK4 counterpart of the Parabricks command above. Their output is exactly the same as the output of the Parabricks command. See the Output Comparison page for how to compare the results.

# Run ApplyBQSR Step
$ gatk ApplyBQSR --java-options -Xmx30g -R Ref/Homo_sapiens_assembly38.fasta \
-I=mark_dups_cpu.bam --bqsr-recal-file=recal_file.txt -O=cpu_nodups_BQSR.bam
# Run HaplotypeCaller Step
$ gatk HaplotypeCaller --java-options -Xmx30g --input cpu_nodups_BQSR.bam --output \
result_cpu.vcf --reference Ref/Homo_sapiens_assembly38.fasta \
--native-pair-hmm-threads 16
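
As a rough illustration of what comparing the two outputs can look like, the sketch below compares only the variant records of two VCF files and ignores the header lines, which contain tool-specific metadata. The helper name compare_vcf_records is hypothetical and is not part of Parabricks or GATK; the Output Comparison page may describe a different procedure.

```shell
# Compare two VCF files record by record, ignoring '#' header lines.
# compare_vcf_records is a hypothetical helper, not a Parabricks or GATK tool.
# Returns 0 if the records match, 1 if they differ, 2 on unreadable input.
compare_vcf_records() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    a_records=$(mktemp) || return 2
    b_records=$(mktemp) || return 2
    # grep exits 1 when a file holds only headers; that is not an error here.
    grep -v '^#' "$1" > "$a_records" || true
    grep -v '^#' "$2" > "$b_records" || true
    if cmp -s "$a_records" "$b_records"; then
        status=0
    else
        status=1
    fi
    rm -f "$a_records" "$b_records"
    return "$status"
}

# Usage, with the file names from the commands above:
#   compare_vcf_records result.vcf result_cpu.vcf && echo "records match"
```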

Options

Option

Description

--ref (required)

The reference genome in fasta format.

--in-bam (required)

Path to the input bam file.

--out-variants (required)

Path to the output .vcf, g.vcf, or gvcf file.

--in-recal-file

Path to the input BQSR report. Required only if the ApplyBQSR step is needed.

--haplotypecaller-options

Pass supported HaplotypeCaller options as one string. The original HaplotypeCaller options currently supported are: -min-pruning <int>, -standard-min-confidence-threshold-for-calling <int>, -max-reads-per-alignment-start <int>, -min-dangling-branch-length <int>, and -pcr-indel-model <NONE, HOSTILE, AGGRESSIVE, CONSERVATIVE>.
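
The "one string" requirement means the shell must hand all HaplotypeCaller options to pbrun as a single quoted argument. A minimal sketch follows; the values 4 and 30 are illustrative only, and run_haplotypecaller is a hypothetical wrapper introduced here so the command can be printed instead of executed.

```shell
# All HaplotypeCaller options go into ONE quoted string so that pbrun
# receives them as a single argument. The values 4 and 30 are examples only.
HC_OPTIONS='-min-pruning 4 -standard-min-confidence-threshold-for-calling 30'

# Hypothetical wrapper: any arguments are used as a command prefix,
# e.g. "echo" for a dry run; with no arguments pbrun itself is executed.
run_haplotypecaller() {
    "$@" pbrun haplotypecaller \
        --ref Ref/Homo_sapiens_assembly38.fasta \
        --in-bam mark_dups_gpu.bam \
        --out-variants result.vcf \
        --haplotypecaller-options "$HC_OPTIONS"
}

# Dry run: print the command line instead of executing it.
run_haplotypecaller echo
```

Quoting "$HC_OPTIONS" is what keeps the options together. Without the quotes the shell would split the string and pbrun would see -min-pruning as a separate argument.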

--static-quantized-quals

Quantize quality scores to a given number of static levels. Repeat this option to specify multiple bins.

--ploidy

Defaults to 2.

Ploidy assumed for the bam file. Currently only haploid (ploidy 1) and diploid (ploidy 2) are supported.

--interval (-L)

Interval within which to call variants from the BAM file. This option can be used multiple times. All intervals are padded by 100, and overlapping intervals are combined. Intervals can also be specified in a file, using either the BED file format or the GATK-style format. For example: "-L chr1 -L chr2:1000-3100" or "-L interval.bed".
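
To make the padding and merging behavior concrete, the sketch below pads each interval by 100 on both sides and then combines intervals that overlap. It only illustrates the idea: the helper pad_and_merge, the "chrom start end" input layout, padding on both sides, and the clamping at position 1 are assumptions made here, not a description of the tool's internal implementation.

```shell
# Illustration only: pad intervals by 100 on both sides, then merge overlaps.
# Input on stdin: one "chrom start end" triple per line (assumed layout).
# pad_and_merge is a hypothetical helper, not part of Parabricks.
pad_and_merge() {
    sort -k1,1 -k2,2n |
    awk -v pad=100 '
        {
            lo = $2 - pad
            if (lo < 1) lo = 1                  # clamp at the first position
            hi = $3 + pad
            if ($1 == chrom && lo <= cur_hi) {
                if (hi > cur_hi) cur_hi = hi    # extend the open interval
            } else {
                if (chrom != "") print chrom ":" cur_lo "-" cur_hi
                chrom = $1; cur_lo = lo; cur_hi = hi
            }
        }
        END { if (chrom != "") print chrom ":" cur_lo "-" cur_hi }
    '
}

# The first two intervals overlap once padded, so they are combined.
printf 'chr2 1000 3100\nchr2 3200 4000\nchr2 9000 9500\n' | pad_and_merge
# prints:
#   chr2:900-4100
#   chr2:8900-9600
```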

--gvcf

Defaults to False.

Generate variant calls in gvcf format. When using this option, the --out-variants file name should end with g.vcf or g.vcf.gz. If the --out-variants file name ends in gz, the tool generates a gvcf.gz file and an index for it.
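
A small pre-flight check can catch a mismatched output name before a long run starts. The sketch below is one way to do that; is_gvcf_name is a hypothetical helper, and it assumes the suffix is preceded by a dot (.g.vcf or .g.vcf.gz). The pbrun command is only printed, not executed.

```shell
# Hypothetical pre-flight check: with --gvcf, the --out-variants name should
# end in g.vcf or g.vcf.gz (a leading dot is assumed here).
is_gvcf_name() {
    case "$1" in
        *.g.vcf|*.g.vcf.gz) return 0 ;;
        *) return 1 ;;
    esac
}

OUT=result.g.vcf.gz
if is_gvcf_name "$OUT"; then
    # Dry run: drop the leading "echo" to actually launch the tool.
    echo pbrun haplotypecaller \
        --ref Ref/Homo_sapiens_assembly38.fasta \
        --in-bam mark_dups_gpu.bam \
        --out-variants "$OUT" \
        --gvcf
else
    echo "refusing to run: $OUT does not end in g.vcf or g.vcf.gz" >&2
fi
```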

--batch

Given an input list of BAMs, run variant calling on each BAM using one GPU, processing BAMs in parallel according to the number of GPUs in the system.

--disable-read-filter

Disable the read filters for bam entries. Currently supported read filters that can be disabled are: MappingQualityAvailableReadFilter, MappingQualityReadFilter, and NotSecondaryAlignmentReadFilter. This option can be repeated multiple times.
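
Because the option is repeated once per filter, a script that takes a list of filter names has to expand it into several flags. The following sketch does this and rejects names outside the three supported filters listed above; filter_flags is a hypothetical helper, and the pbrun command is only printed.

```shell
# Turn a list of filter names into repeated --disable-read-filter flags.
# filter_flags is a hypothetical helper; it accepts only the three filters
# this page lists as supported and fails on anything else.
filter_flags() {
    for name in "$@"; do
        case "$name" in
            MappingQualityAvailableReadFilter|MappingQualityReadFilter|NotSecondaryAlignmentReadFilter)
                printf -- '--disable-read-filter %s ' "$name" ;;
            *)
                echo "unsupported read filter: $name" >&2
                return 1 ;;
        esac
    done
}

FLAGS=$(filter_flags MappingQualityAvailableReadFilter NotSecondaryAlignmentReadFilter)

# Dry run. $FLAGS is left unquoted on purpose so it splits into separate words.
echo pbrun haplotypecaller \
    --ref Ref/Homo_sapiens_assembly38.fasta \
    --in-bam mark_dups_gpu.bam \
    --out-variants result.vcf \
    $FLAGS
```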

--tmp-dir

Defaults to . (the current directory).

Full path to the directory where temporary files will be stored.

--num-gpus

Defaults to number of GPUs in the system.

The number of GPUs to be used for this analysis task.