Population Studies

Use the Parabricks Genomics Database tool to perform population studies. Create a genomic database for multiple samples and import data into it.

Overview

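The population studies pipeline runs in three steps: create a genomics database, import the per-sample GVCF files into it, and select variants from the database, as shown in the Quick Start below. The Germline step is optional: skip it if you already have all the g.vcf.gz files generated during variant calling.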

Quick Start

CLI
# Create a genomics database
$ pbrun creategenomicsdb --dir <genomics db address>
# Populate the database with data
$ pbrun importgvcftodb --dir <genomics db address> --in-gvcf <input GVCF> --in-gvcf <input GVCF> --in-gvcf <input GVCF>
# Select variants from the database
$ pbrun selectvariants --ref <Reference Genome> --dir <genomics db address> --out-gvcf <output GVCF>
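
The three Quick Start steps can be chained in one script. The following is a minimal sketch, not an official Parabricks wrapper: the paths, the sample file names, the `run` and `population_study` helpers, and the `DRY_RUN` switch are all hypothetical, and the option is assumed to be spelled `--dir` with two hyphens. By default the script only prints the commands, so it can be checked on a machine without `pbrun` installed.

```shell
#!/bin/sh
# Sketch of the three Quick Start steps with shared variables.
# All paths and sample names below are hypothetical placeholders.
set -eu

DB_DIR="${DB_DIR:-/data/genomics_db}"
REF="${REF:-/data/ref/genome.fasta}"
OUT_GVCF="${OUT_GVCF:-/data/out/merged.g.vcf.gz}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command; execute it only when DRY_RUN is 0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

population_study() {
    # Step 1: create the genomics database.
    run pbrun creategenomicsdb --dir "$DB_DIR"
    # Step 2: import one GVCF per sample (repeat --in-gvcf for each file).
    run pbrun importgvcftodb --dir "$DB_DIR" \
        --in-gvcf sample1.g.vcf.gz --in-gvcf sample2.g.vcf.gz
    # Step 3: select variants and write the merged GVCF.
    run pbrun selectvariants --ref "$REF" --dir "$DB_DIR" --out-gvcf "$OUT_GVCF"
}

population_study
```

Running the script with `DRY_RUN=0` executes the commands instead of only printing them. Because of `set -e`, a failing step stops the script before the next step starts.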

creategenomicsdb Options

Option              Description
--dir (required)    Path to the directory where the database will be stored.

importgvcftodb Options

Option                  Description
--dir (required)        Directory of the database into which the GVCF data will be imported.
--in-gvcf (required)    Input GVCF file in gvcf.gz format. The file should be either generated by the Parabricks germline pipeline or compressed with bzip. Repeat the option once per input file.
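
With many samples, typing one `--in-gvcf` per file by hand is error-prone. The sketch below assembles the import command from a list of files. `build_import_cmd` is a hypothetical helper, the paths are placeholders, and the `--dir` spelling with two hyphens is assumed. The function only prints the command, and it does not quote paths, so it is unsuitable for file names that contain spaces.

```shell
#!/bin/sh
# Sketch: build an importgvcftodb command with one --in-gvcf per input file.
# build_import_cmd is a hypothetical helper, not part of Parabricks.
set -eu

build_import_cmd() {
    db_dir=$1
    shift
    cmd="pbrun importgvcftodb --dir $db_dir"
    # Append one --in-gvcf option for each remaining argument.
    for gvcf in "$@"; do
        cmd="$cmd --in-gvcf $gvcf"
    done
    printf '%s\n' "$cmd"
}

# Example with placeholder file names; prints the command without running it.
build_import_cmd /data/genomics_db sample1.g.vcf.gz sample2.g.vcf.gz sample3.g.vcf.gz
```

To cover a whole directory, a shell glob can supply the file list, for example `build_import_cmd /data/genomics_db /data/gvcfs/*.g.vcf.gz`.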

selectvariants Options

Option              Description
--ref (required)    Path to the reference human genome in FASTA format. The indexing required to run bwa must already have been completed by the user.
--dir (required)    Location of the genomics database from which variants will be selected.
--out-gvcf          Path to the file where the merged GVCF result will be stored.
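
Because the reference is expected to be indexed for bwa already, a quick pre-flight check can catch a missing index before a long run starts. The sketch below assumes the index was built with `bwa index`, which writes files with the extensions `.amb`, `.ann`, `.bwt`, `.pac`, and `.sa` next to the FASTA file. `check_bwa_index` is a hypothetical helper and the default path is a placeholder. Whether selectvariants reads every one of these files is not stated here, so treat the check as a conservative guard.

```shell
#!/bin/sh
# Sketch: confirm that the bwa index files sit next to the reference FASTA.
# check_bwa_index is a hypothetical helper, not a Parabricks command.

check_bwa_index() {
    ref=$1
    missing=0
    # Extensions written by `bwa index <ref>`.
    for ext in amb ann bwt pac sa; do
        if [ ! -f "$ref.$ext" ]; then
            echo "missing index file: $ref.$ext" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example with a placeholder path; reports whether the index looks complete.
if check_bwa_index "${REF:-/data/ref/genome.fasta}"; then
    echo "bwa index complete"
else
    echo "bwa index incomplete; run 'bwa index' on the reference first"
fi
```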