vqsr

Accelerated variant filtering using VQSR

Build a recalibration model to score variant quality and apply a score cutoff to filter variants.

Quick Start

CLI
$ pbrun vqsr --in-vcf sample.vcf \
--out-vcf output.vcf \
--out-recal output.recal \
--out-tranches output.tranches \
--resource omni,known=false,training=true,truth=true,prior=12.0:1000G_omni2.5.hg38.vcf \
--annotation QD --annotation MQ --annotation MQRankSum --annotation ReadPosRankSum

Compatible GATK4 command

gatk VariantRecalibrator -V sample.vcf \
-O output.recal \
--tranches-file output.tranches \
--resource omni,known=false,training=true,truth=true,prior=12.0:1000G_omni2.5.hg38.vcf \
-an QD -an MQ -an MQRankSum -an ReadPosRankSum \
--mode BOTH
gatk ApplyVQSR -V sample.vcf \
--recal-file output.recal \
--tranches-file output.tranches \
-O output.vcf \
--mode BOTH

Options

Option

Description

--in-vcf (required)

Path to the input VCF file.

--out-vcf (required)

Path to the output VCF file.

--out-recal (required)

Path to the output recal file.

--out-tranches (required)

Path to the output tranches file.

--resource (required)

Known, truth, and training sets. The format string is <set name>,known=<boolean>,training=<boolean>,truth=<boolean>,prior=<float>:<path to the VCF file>. At least one resource must be a training set and at least one must be a truth set; a single resource can be both. This option can be used multiple times.

--annotation (required)

Annotation to use in the recalibration calculations. This option can be used multiple times.

--mode

Defaults to BOTH.

Type of variants to include in the recalibration. Possible values are SNP, INDEL, or BOTH.

--max-gaussians

Defaults to 8.

Max number of Gaussians for the positive model.

--truth-sensitivity-level

The truth sensitivity level at which to start filtering.

--lod-score-cutoff

The VQSLOD score below which to start filtering.
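
Example

The --resource format string and the repeatable options described above can be combined as in the following shell sketch. Several pieces are illustrative assumptions rather than part of the documented interface: make_resource is a hypothetical helper, and the hapmap set name, its prior of 15.0, the file name hapmap.hg38.vcf, and the output name output.snp.vcf are placeholders. The sketch only prints the assembled command so it can be reviewed before running.

```shell
# Hypothetical helper: format one --resource value as
# <set name>,known=<boolean>,training=<boolean>,truth=<boolean>,prior=<float>:<path>
make_resource() {
  printf '%s,known=%s,training=%s,truth=%s,prior=%s:%s' "$1" "$2" "$3" "$4" "$5" "$6"
}

# The omni resource from the Quick Start, plus a second, assumed resource.
omni_resource=$(make_resource omni false true true 12.0 1000G_omni2.5.hg38.vcf)
hapmap_resource=$(make_resource hapmap false true true 15.0 hapmap.hg38.vcf)

# Assemble the arguments; --resource and --annotation are each repeated.
set -- pbrun vqsr \
  --in-vcf sample.vcf \
  --out-vcf output.snp.vcf \
  --out-recal output.recal \
  --out-tranches output.tranches \
  --resource "$omni_resource" \
  --resource "$hapmap_resource" \
  --annotation QD --annotation MQ \
  --mode SNP

# Print the command for review instead of executing it.
vqsr_cmd="$*"
echo "$vqsr_cmd"
```

To run the command rather than print it, replace the last two lines with "$@".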