Documentation

The VCFtools package is broadly split into two sections:

  • The vcftools binary program, generally used to analyse VCF files.
  • The Vcf.pm Perl module, a general Perl API that forms the core of the utilities vcf-convert, vcf-merge, vcf-compare, vcf-isec, and others.

Examples of usage by topic

Installation

The VCFtools package can be decompressed by the command

tar -xzf vcftools_version_number_source.tar.gz

To build the vcftools executable, type "make" in the vcftools folder.

The Perl scripts require VCF files to be compressed with bgzip and indexed with tabix (both tools are part of the tabix package, which is available as a separate download). Both tools must be in directories listed in the PATH environment variable. To run the Perl scripts, the PERL5LIB environment variable must also include the directory containing the Vcf.pm module:

export PERL5LIB=/path/to/your/vcftools-directory/perl
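
As a quick pre-flight check of the requirements above, the following is a minimal sketch, assuming a POSIX shell. The helper names missing_tools and prepare_vcf and the file name in.vcf are illustrative only and are not part of VCFtools; the PERL5LIB path is the same placeholder used above.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every listed tool that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        # command -v succeeds only when the tool can be found
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Hypothetical helper: compress a plain VCF with bgzip, then index it with tabix.
prepare_vcf() {
    bgzip -c "$1" > "$1.gz" && tabix -p vcf "$1.gz"
}

missing=$(missing_tools bgzip tabix)
if [ -n "$missing" ]; then
    echo "Not found in PATH: $(echo $missing)" >&2
elif [ -f in.vcf ]; then
    prepare_vcf in.vcf   # in.vcf is a placeholder file name
fi

# Placeholder path; any existing PERL5LIB value is kept after it.
export PERL5LIB="/path/to/your/vcftools-directory/perl${PERL5LIB:+:$PERL5LIB}"
```

The script only reports missing tools and does not stop, so it can be sourced from a shell profile without side effects beyond setting PERL5LIB.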

The tools can be tested by running the script

/path/to/your/vcftools-directory/perl/test.t

If the command complains about a missing Test::Most Perl module, do not worry: it is needed only for testing, not for running VCFtools.

Annotating

# Add custom annotations
cat in.vcf | vcf-annotate -a annotations.gz \
   -d key=INFO,ID=ANN,Number=1,Type=Integer,Description='My custom annotation' \
   -c CHROM,FROM,TO,INFO/ANN > out.vcf

# Apply SnpCluster filter
cat in.vcf | vcf-annotate --filter SnpCluster=3,10 > out.vcf

Comparing

vcf-compare A.vcf.gz B.vcf.gz C.vcf.gz
vcf check A.vcf.gz B.vcf.gz

Concatenating

vcf-concat A.vcf.gz B.vcf.gz C.vcf.gz | bgzip -c > out.vcf.gz

Converting

# Convert between VCF versions
zcat file.vcf.gz | vcf-convert -r reference.fa | bgzip -c > out.vcf.gz

# Convert from VCF format to tab-delimited text file
zcat file.vcf.gz | vcf-to-tab > out.tab

Filtering

# Filter by QUAL and minimum depth
cat in.vcf | vcf-annotate --filter Qual=10/MinDP=20 > out.vcf

Intersections, complements

# Include positions which appear in at least two files
vcf-isec -o -n +2 A.vcf.gz B.vcf.gz C.vcf.gz | bgzip -c > out.vcf.gz

# Exclude from A positions which appear in B and/or C
vcf-isec -c A.vcf.gz B.vcf.gz C.vcf.gz | bgzip -c > out.vcf.gz

# Fast htslib implementation
vcf isec -n =2 A.vcf.gz B.vcf.gz

Merging

vcf-merge A.vcf.gz B.vcf.gz | bgzip -c > C.vcf.gz
vcf merge A.vcf.gz B.vcf.gz

Querying

vcf-query file.vcf.gz 1:10327-10330 -c NA0001

Reordering columns

vcf-shuffle-cols -t template.vcf.gz file.vcf.gz > out.vcf

Stats

vcf-stats file.vcf.gz
vcf check file.vcf.gz > file.vchk && plot-vcfcheck file.vchk -p plot/

Stripping columns

vcf-subset -c NA0001,NA0002 file.vcf.gz | bgzip -c > out.vcf.gz

Useful shell one-liners

This section lists some useful one-line commands. Note that there are also dedicated convenience scripts, vcf-sort and vcf-concat, which do the same but also perform some basic sanity checks. All examples are in Bash.

# Replace VCF header. The file must be compressed by bgzip.
tabix -r header.txt in.vcf.gz > out.vcf.gz

# Sort VCF file keeping the header. The head command is for performance.
(zcat file.vcf.gz | head -100 | grep ^#;
zcat file.vcf.gz | grep -v ^# | sort -k1,1d -k2,2n;) \
| bgzip -c > out.vcf.gz
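
To see the header-preserving sort in isolation, the sketch below applies the same grep and sort pattern to a tiny uncompressed VCF created on the fly. The function name sort_vcf, the file name demo.vcf, and the three records are made up for illustration, and zcat and bgzip are left out so that the example runs with standard tools only.

```shell
#!/bin/sh
# Hypothetical sample data: a small VCF whose records are out of order.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
vcf="$tmpdir/demo.vcf"
{
    printf '##fileformat=VCFv4.0\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf '2\t300\t.\tA\tG\t50\tPASS\t.\n'
    printf '1\t200\t.\tC\tT\t50\tPASS\t.\n'
    printf '1\t50\t.\tG\tA\t50\tPASS\t.\n'
} > "$vcf"

# Hypothetical helper: print the header unchanged, then the sorted records.
sort_vcf() {
    # Header lines first, in their original order
    grep '^#' "$1"
    # Records by chromosome (dictionary order), then by position (numeric)
    grep -v '^#' "$1" | sort -k1,1d -k2,2n
}

sort_vcf "$vcf"
```

The two header lines come out first and unchanged, followed by the records at 1:50, 1:200, and 2:300. On real data the same function body can be fed from zcat and piped into bgzip as in the one-liner above.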

# Merge (that is, concatenate) two VCF files into one, keeping the header
# from first one only.
(zcat A.vcf.gz | head -100 | grep ^#; \
zcat A.vcf.gz | grep -v ^#; \
zcat B.vcf.gz | grep -v ^#; ) \
| bgzip -c > out.vcf.gz

VCF validation

Both vcftools and Vcf.pm can be used for validation. The former validates VCFv4.0; the latter can validate the older versions as well.

perl -MVcf -e validate example.vcf
perl -I/path/to/the/module/ -MVcf -e validate example.vcf
vcf-validator example.vcf

...and more

This page lists only the basic capabilities. For more, please see the vcftools options page and the Perl API and scripts page.