Somatic Pipeline

Run a somatic variant pipeline workflow

The somatic pipeline processes tumor FASTQ files, and optionally normal FASTQ files and knownSites files, and generates a tumor-only or tumor/normal analysis. The output is in VCF format.

Quick Start

CLI
# The commandline below will run tumor-only analysis.
$ pbrun somatic --ref Ref/Homo_sapiens_assembly38.fasta \
--in-tumor-fq sample1-0.fq.gz sample1-1.fq.gz \
--out-vcf output.vcf \
--out-tumor-bam tumor.bam
# The commandline below will run tumor-normal analysis.
$ pbrun somatic --ref Ref/Homo_sapiens_assembly38.fasta \
--knownSites knownsites.vcf.gz \
--in-tumor-fq tumor0.fq.gz tumor1.fq.gz "@RG\tID:sm_tumor_rg1\tLB:lib1\tPL:bar\tSM:sm_tumor\tPU:sm_tumor_rg1" \
--out-vcf output.vcf \
--out-tumor-bam tumor.bam \
--out-tumor-recal-file recal.txt \
--in-normal-fq normal0.fq.gz normal1.fq.gz "@RG\tID:sm_normal_rg1\tLB:lib1\tPL:bar\tSM:sm_normal\tPU:sm_normal_rg1" \
--out-normal-bam normal.bam
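
The read group strings in the commands above are easy to mistype, because the \t separators must reach pbrun as literal backslash-t text. The following is a minimal sketch of one way to assemble them. make_rg is a hypothetical helper, not part of pbrun, and it assumes the PU field can reuse the read group ID, as the examples above do. The script only prints the resulting command so the quoting can be inspected; it does not run pbrun.

```shell
#!/bin/sh
# make_rg is a hypothetical helper (not part of pbrun). It prints an @RG
# string with literal "\t" separators, like the quoted strings above.
# Arguments: read group ID, library, platform, sample name.
# Assumption: PU reuses the read group ID, as in the examples above.
make_rg() {
    printf '@RG\\tID:%s\\tLB:%s\\tPL:%s\\tSM:%s\\tPU:%s\n' "$1" "$2" "$3" "$4" "$1"
}

tumor_rg=$(make_rg sm_tumor_rg1 lib1 bar sm_tumor)
normal_rg=$(make_rg sm_normal_rg1 lib1 bar sm_normal)

# Print the command instead of running it, so the quoting can be checked.
# printf is used rather than echo because some shells' echo expands "\t".
printf '%s\n' "pbrun somatic --ref Ref/Homo_sapiens_assembly38.fasta \\
  --in-tumor-fq tumor0.fq.gz tumor1.fq.gz \"$tumor_rg\" \\
  --in-normal-fq normal0.fq.gz normal1.fq.gz \"$normal_rg\" \\
  --out-vcf output.vcf \\
  --out-tumor-bam tumor.bam \\
  --out-normal-bam normal.bam"
```

Keeping the tumor and normal sample names (SM) distinct, as in the examples above, lets the two samples be told apart in the output.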

Options

CLI

Option

Description

--ref (required)

The reference genome in FASTA format. The user is expected to have already built the index files that bwa requires.

--in-tumor-fq (required)

Full paths to the paired-end tumor FASTQ files (plain or gzip-compressed), followed by a quoted read group string (example: "@RG\tID:foo\tLB:lib1\tPL:bar\tSM:20"). This option can be repeated multiple times.

--out-vcf (required)

Path of the VCF file produced by variant calling.

--out-tumor-bam (required)

Path of the BAM file for tumor reads.

--out-tumor-recal-file

Path of the report file produced by Base Quality Score Recalibration (BQSR) for the tumor sample.

--knownSites

Known-sites files in .vcf.gz format. These should be compressed VCF files of known SNPs and indels. You can use this option multiple times. If you provide this option, you must also provide --out-tumor-recal-file (described above).

--in-normal-fq

Full paths to the paired-end normal FASTQ files (plain or gzip-compressed), followed by a quoted read group string (example: "@RG\tID:foo\tLB:lib1\tPL:bar\tSM:20"). This option can be repeated multiple times.

--out-normal-bam

Path of the BAM file for normal reads.

--tmp-dir

Defaults to . (the current directory).

Full path to the directory where temporary files will be stored.

--num-gpus

Defaults to 8.

The number of GPUs to be used for this analysis task.

--no-markdups

Defaults to False.

Do not mark duplicates; generate the BAM file directly after coordinate sorting.

--ploidy

Defaults to 2.

Ploidy assumed for the BAM file. Currently only haploid (ploidy 1) and diploid (ploidy 2) are supported.

--bwa-options

Pass supported bwa mem options as a single string. The original bwa mem options currently supported are -M, -Y, and -T.
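
Two of the requirements listed above can be checked before a run is launched: that the bwa index exists next to the reference, and that the --bwa-options string contains only the listed options. The sketch below is one possible pre-flight check, not something pbrun provides. Both function names are hypothetical. It assumes the index consists of the five files that `bwa index` writes (.amb, .ann, .bwt, .pac, .sa) and that -T takes an integer argument, as it does in bwa mem; whether pbrun applies exactly the same rules is an assumption.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; neither function is part of pbrun.

# Succeeds if the five files written by `bwa index` sit next to the
# reference FASTA given as $1.
check_bwa_index() {
    for ext in amb ann bwt pac sa; do
        if [ ! -f "$1.$ext" ]; then
            echo "missing bwa index file: $1.$ext" >&2
            return 1
        fi
    done
    return 0
}

# Succeeds if the option string in $1 holds only -M, -Y, or -T followed
# by an integer (assumption: -T takes an integer, as in bwa mem).
check_bwa_options() {
    # Intentionally unquoted: split the single string into words.
    set -- $1
    while [ $# -gt 0 ]; do
        case "$1" in
            -M|-Y)
                shift ;;
            -T)
                case "$2" in
                    ''|*[!0-9]*)
                        echo "-T needs an integer argument" >&2
                        return 1 ;;
                esac
                shift 2 ;;
            *)
                echo "unsupported bwa mem option: $1" >&2
                return 1 ;;
        esac
    done
    return 0
}

# Example: validate an option string before passing it to --bwa-options.
check_bwa_options "-M -Y -T 30" && printf '%s\n' "bwa options look fine"

# Example (not run here): check_bwa_index Ref/Homo_sapiens_assembly38.fasta
```

Both helpers return a non-zero status on failure and write the reason to standard error, so they can guard a pbrun call with `&&` in a wrapper script.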