VCFtools

As of July 2015, the VCFtools project has moved to GitHub. The new website is at vcftools.github.io/perl_examples.html

Perl module examples

This page provides usage examples for the Perl modules. Every option is described in the full documentation.

Annotating

# Add custom annotations
cat in.vcf | vcf-annotate -a annotations.gz \
   -d key=INFO,ID=ANN,Number=1,Type=Integer,Description='My custom annotation' \
   -c CHROM,FROM,TO,INFO/ANN > out.vcf

# Apply SnpCluster filter
cat in.vcf | vcf-annotate --filter SnpCluster=3,10 > out.vcf

Comparing

vcf-compare A.vcf.gz B.vcf.gz C.vcf.gz
vcf check A.vcf.gz B.vcf.gz

Concatenating

vcf-concat A.vcf.gz B.vcf.gz C.vcf.gz | bgzip -c > out.vcf.gz

Converting

# Convert between VCF versions
zcat file.vcf.gz | vcf-convert -r reference.fa | bgzip -c > out.vcf.gz

# Convert from VCF format to tab-delimited text file
zcat file.vcf.gz | vcf-to-tab > out.tab

Filtering

# Filter by QUAL and minimum depth
cat in.vcf | vcf-annotate --filter Qual=10/MinDP=20 > out.vcf

Intersections, complements

# Include positions which appear in at least two files
vcf-isec -o -n +2 A.vcf.gz B.vcf.gz C.vcf.gz | bgzip -c > out.vcf.gz

# Exclude from A positions which appear in B and/or C
vcf-isec -c A.vcf.gz B.vcf.gz C.vcf.gz | bgzip -c > out.vcf.gz

# Fast htslib implementation
vcf isec -n =2 A.vcf.gz B.vcf.gz

Merging

vcf-merge A.vcf.gz B.vcf.gz | bgzip -c > C.vcf.gz
vcf merge A.vcf.gz B.vcf.gz

Querying

vcf-query file.vcf.gz 1:10327-10330 -c NA0001

Reordering columns

vcf-shuffle-cols -t template.vcf.gz file.vcf.gz > out.vcf

Stats

vcf-stats file.vcf.gz
vcf check file.vcf.gz > file.vchk && plot-vcfcheck file.vchk -p plot/

Stripping columns

vcf-subset -c NA0001,NA0002 file.vcf.gz | bgzip -c > out.vcf.gz

Useful shell one-liners

This section lists some useful one-line commands. Note that the dedicated convenience scripts vcf-sort and vcf-concat do the same and also perform some basic sanity checks. All examples are in Bash.

# Replace VCF header. The file must be compressed by bgzip.
tabix -r header.txt in.vcf.gz > out.vcf.gz

# Sort VCF file keeping the header. The head command is for performance;
# it assumes the whole header fits in the first 100 lines.
(zcat file.vcf.gz | head -100 | grep ^#;
zcat file.vcf.gz | grep -v ^# | sort -k1,1d -k2,2n;) \
| bgzip -c > out.vcf.gz
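
The sort one-liner above can be tried without any VCFtools or tabix installation. The following is a minimal sketch under stated assumptions: the three-record VCF is made up, plain gzip stands in for bgzip, and the helper names rec and sort_vcf are hypothetical. The head step is dropped because the file is tiny.

```shell
#!/usr/bin/env bash
# Sketch: header-preserving sort on a tiny, made-up VCF.
# Assumption: plain gzip stands in for bgzip, so the result is not tabix-indexable.

export LC_ALL=C                      # make sort order reproducible
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical helper: print its arguments as one tab-separated line.
rec() { local IFS=$'\t'; printf '%s\n' "$*"; }

# Records are deliberately out of order.
{
  echo '##fileformat=VCFv4.0'
  rec '#CHROM' POS ID REF ALT QUAL FILTER INFO
  rec 2 500  . A G 50 PASS .
  rec 1 2000 . C T 50 PASS .
  rec 1 300  . G A 50 PASS .
} | gzip -c > "$tmp/file.vcf.gz"

# Same shape as the one-liner: header lines first, then the records
# sorted by chromosome (column 1) and numerically by position (column 2).
sort_vcf() {
  gzip -dc "$1" | grep '^#'
  gzip -dc "$1" | grep -v '^#' | sort -k1,1d -k2,2n
}

sort_vcf "$tmp/file.vcf.gz"
```

The numeric key on column 2 is what places position 300 before position 2000; a plain lexical sort would order them the other way round.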

# Merge (that is, concatenate) two VCF files into one, keeping the header
# from first one only.
(zcat A.vcf.gz | head -100 | grep ^#; \
zcat A.vcf.gz | grep -v ^#; \
zcat B.vcf.gz | grep -v ^#; ) \
| bgzip -c > out.vcf.gz
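
The concatenation one-liner above performs no checks of its own. The following is a minimal sketch of one plausible sanity check, comparing the #CHROM column lines of both files before concatenating. Whether vcf-concat performs exactly this check is not stated here; the input files are made up, plain gzip stands in for bgzip, and the helper names rec, make_vcf and concat_vcf are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: concatenate two made-up VCFs, refusing when their column headers differ.
# Assumption: plain gzip stands in for bgzip.

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical helper: print its arguments as one tab-separated line.
rec() { local IFS=$'\t'; printf '%s\n' "$*"; }

# Hypothetical helper: write a one-record VCF. Usage: make_vcf OUT.vcf.gz CHROM POS
make_vcf() {
  {
    echo '##fileformat=VCFv4.0'
    rec '#CHROM' POS ID REF ALT QUAL FILTER INFO FORMAT NA0001
    rec "$2" "$3" . A G 50 PASS . GT 0/1
  } | gzip -c > "$1"
}
make_vcf "$tmp/A.vcf.gz" 1 100
make_vcf "$tmp/B.vcf.gz" 1 200

concat_vcf() {
  local a=$1 b=$2
  # Sanity check: both files must declare the same columns and samples.
  if [ "$(gzip -dc "$a" | grep -m1 '^#CHROM')" != "$(gzip -dc "$b" | grep -m1 '^#CHROM')" ]; then
    echo "column headers differ: $a vs $b" >&2
    return 1
  fi
  gzip -dc "$a" | grep '^#'        # header from the first file only
  gzip -dc "$a" | grep -v '^#'     # records of A
  gzip -dc "$b" | grep -v '^#'     # records of B
}

concat_vcf "$tmp/A.vcf.gz" "$tmp/B.vcf.gz" | gzip -c > "$tmp/out.vcf.gz"
gzip -dc "$tmp/out.vcf.gz"
```

The records are appended in file order and are not re-sorted, so the result is only position-sorted when the inputs already are.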

VCF validation

Both vcftools and Vcf.pm can be used for validation. The former validates VCFv4.0; the latter can also validate the older versions.

perl -MVcf -e validate example.vcf
perl -I/path/to/the/module/ -MVcf -e validate example.vcf
vcf-validator example.vcf
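
Where none of these tools is available, a coarse structural pre-check can still catch grossly malformed files. The following is a minimal sketch, not a substitute for vcf-validator: it only tests for a leading ##fileformat line, a #CHROM line starting with the eight fixed columns, and at least eight tab-separated fields per record. The helper name precheck_vcf and both test files are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: coarse structural pre-check of an uncompressed VCF.
# This is NOT a replacement for vcf-validator or Vcf.pm.

precheck_vcf() {  # hypothetical helper; usage: precheck_vcf FILE.vcf
  local file=$1 expected
  expected=$(printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO')

  # The first line must declare the format version.
  head -n 1 "$file" | grep -q '^##fileformat=' \
    || { echo "missing ##fileformat line" >&2; return 1; }

  # The column header must start with the eight fixed columns.
  case "$(grep -m1 '^#CHROM' "$file")" in
    "$expected"*) ;;
    *) echo "bad or missing #CHROM line" >&2; return 1 ;;
  esac

  # Every record needs at least eight tab-separated fields.
  awk -F'\t' '!/^#/ && NF < 8 { bad = 1 } END { exit bad + 0 }' "$file" \
    || { echo "record with fewer than 8 fields" >&2; return 1; }
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# One well-formed file and one without a ##fileformat line.
printf '##fileformat=VCFv4.0\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t100\t.\tA\tG\t50\tPASS\t.\n' > "$tmp/good.vcf"
printf '#CHROM\tPOS\n1\t100\n' > "$tmp/bad.vcf"

precheck_vcf "$tmp/good.vcf" && echo "good.vcf: ok"
precheck_vcf "$tmp/bad.vcf"  || echo "bad.vcf: rejected"
```

A file that passes this pre-check can still be invalid, for example through undeclared INFO tags or malformed genotype fields, so the dedicated validators remain the reference.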