Documentation
The VCFtools package is broadly split into two sections:
- The vcftools binary program, generally used to analyse VCF files.
- The Vcf.pm Perl module, which is a general Perl API and contains the core of the utilities vcf-convert, vcf-merge, vcf-compare, vcf-isec, and others.
Examples of usage by topic
The VCFtools package can be decompressed with the command:
tar -xzf vcftools_version_number_source.tar.gz
To build the vcftools executable, type "make" in the vcftools folder.
The Perl scripts require that VCF files are compressed by bgzip and indexed by tabix (both tools are part of the tabix package, which is distributed separately). Both tools must be in directories that are listed in the PATH environment variable. To run the Perl scripts, the PERL5LIB environment variable must be set to include the directory containing the Vcf.pm module:
export PERL5LIB=/path/to/your/vcftools-directory/perl
The tools can be tested by running the script
/path/to/your/vcftools-directory/perl/test.t
If the command complains about a missing Test::Most Perl module, do not worry: it is needed only for testing, not for running VCFtools.
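The prerequisites described above can be checked before running the Perl scripts. The following is a minimal sketch and not part of VCFtools: the function name check_vcftools_env is hypothetical, and the only assumptions are the ones stated above, namely that bgzip and tabix are found through PATH and that one of the PERL5LIB directories contains Vcf.pm.

```shell
# Hypothetical helper (not shipped with VCFtools): report which of the
# prerequisites are missing. The body runs in a subshell, so the IFS
# change and the variables do not leak into the calling shell.
check_vcftools_env() (
    status=0

    # bgzip and tabix must be reachable through PATH.
    for tool in bgzip tabix; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing on PATH: $tool"
            status=1
        fi
    done

    # At least one directory listed in PERL5LIB must contain Vcf.pm.
    found=0
    IFS=:
    for dir in ${PERL5LIB:-}; do
        if [ -f "$dir/Vcf.pm" ]; then
            found=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "Vcf.pm not found in PERL5LIB"
        status=1
    fi

    # Exit status 0 means every prerequisite was found.
    exit "$status"
)
```

Calling check_vcftools_env prints one line per missing prerequisite and returns a nonzero status if anything is missing, so it can guard a pipeline, for example: check_vcftools_env && vcf-validator example.vcf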
# Add custom annotations
cat in.vcf | vcf-annotate -a annotations.gz \
-d key=INFO,ID=ANN,Number=1,Type=Integer,Description='My custom annotation' \
-c CHROM,FROM,TO,INFO/ANN > out.vcf
# Apply SnpCluster filter
cat in.vcf | vcf-annotate --filter SnpCluster=3,10 > out.vcf
vcf-compare A.vcf.gz B.vcf.gz C.vcf.gz
vcf check A.vcf.gz B.vcf.gz
vcf-concat A.vcf.gz B.vcf.gz C.vcf.gz | bgzip -c > out.vcf.gz
# Convert between VCF versions
zcat file.vcf.gz | vcf-convert -r reference.fa | bgzip -c > out.vcf.gz
# Convert from VCF format to tab-delimited text file
zcat file.vcf.gz | vcf-to-tab > out.tab
# Filter by QUAL and minimum depth
vcf-annotate --filter Qual=10/MinDP=20
# Include positions which appear in at least two files
vcf-isec -o -n +2 A.vcf.gz B.vcf.gz C.vcf.gz | bgzip -c > out.vcf.gz
# Exclude from A positions which appear in B and/or C
vcf-isec -c A.vcf.gz B.vcf.gz C.vcf.gz | bgzip -c > out.vcf.gz
# Fast htslib implementation
vcf isec -n =2 A.vcf.gz B.vcf.gz
vcf-shuffle-cols -t template.vcf.gz file.vcf.gz > out.vcf
vcf-subset -c NA0001,NA0002 file.vcf.gz | bgzip -c > out.vcf.gz
This section lists some useful one-line commands. Note that there are also dedicated convenience scripts, vcf-sort and vcf-concat, which do the same but also perform some basic sanity checks. All examples are in Bash.
# Replace VCF header. The file must be compressed by bgzip.
tabix -r header.txt in.vcf.gz > out.vcf.gz
# Sort VCF file keeping the header. The head command is for performance
# and assumes the header is no longer than 100 lines.
(zcat file.vcf.gz | head -100 | grep ^#;
zcat file.vcf.gz | grep -v ^# | sort -k1,1d -k2,2n;) \
| bgzip -c > out.vcf.gz
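The head -100 shortcut in the sort command assumes that the whole header fits in the first 100 lines. A minimal sketch that avoids this assumption, at the cost of a temporary file, could look as follows. The function name vcf_sort_stream is hypothetical; it reads uncompressed VCF on standard input, so decompression and bgzip stay outside the function. Sorting in the C locale with an explicit tab separator is a choice made for this sketch, not something the original command does.

```shell
# Hypothetical helper: sort a VCF stream by chromosome and position while
# keeping all header lines, however many there are, at the top.
vcf_sort_stream() (
    tmp=$(mktemp) || exit 1
    # Remove the temporary file when the subshell exits.
    trap 'rm -f "$tmp"' EXIT
    tab=$(printf '\t')

    # Standard input can be read only once, so buffer it in a file.
    cat > "$tmp"

    # Header lines first, in their original order.
    grep '^#' "$tmp"

    # Then the records: column 1 as text, column 2 numerically.
    grep -v '^#' "$tmp" | LC_ALL=C sort -t "$tab" -k1,1 -k2,2n
)
```

Usage mirrors the one-liner: zcat file.vcf.gz | vcf_sort_stream | bgzip -c > out.vcf.gz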
# Merge (that is, concatenate) two VCF files into one, keeping the header
# from the first one only.
(zcat A.vcf.gz | head -100 | grep ^#; \
zcat A.vcf.gz | grep -v ^#; \
zcat B.vcf.gz | grep -v ^#; ) \
| bgzip -c > out.vcf.gz
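The same concatenation pattern extends to any number of files. The sketch below is an illustration under stated assumptions rather than a replacement for vcf-concat: the function name vcf_cat_keep_first_header is hypothetical, gzip -dc stands in for zcat (bgzip output is gzip-compatible), and no check is made that the files share the same columns.

```shell
# Hypothetical helper: concatenate compressed VCF files, keeping the
# header of the first file only. Output is uncompressed VCF on stdout.
vcf_cat_keep_first_header() (
    if [ "$#" -lt 1 ]; then
        echo "usage: vcf_cat_keep_first_header A.vcf.gz [B.vcf.gz ...]" >&2
        exit 2
    fi

    # Header lines come from the first file only.
    gzip -dc "$1" | grep '^#'

    # Data lines come from every file, in the order given.
    for f in "$@"; do
        gzip -dc "$f" | grep -v '^#'
    done
)
```

For example: vcf_cat_keep_first_header A.vcf.gz B.vcf.gz C.vcf.gz | bgzip -c > out.vcf.gz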
Both vcftools and Vcf.pm can be used for validation. The former validates VCFv4.0; the latter can validate older versions as well.
perl -MVcf -e validate example.vcf
perl -I/path/to/the/module/ -MVcf -e validate example.vcf
vcf-validator example.vcf
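When several files need checking, the validator can be run in a loop. The following is a minimal sketch: the function name validate_each and the VALIDATOR override variable are hypothetical and exist only for this illustration. It makes no assumption about the validator's exit status and simply prints each file name followed by whatever the validator reports.

```shell
# Hypothetical helper: run a validator over several VCF files and label
# its messages with the file they belong to.
# VALIDATOR is an illustrative override; vcf-validator is used by default.
validate_each() (
    validator=${VALIDATOR:-vcf-validator}
    for f in "$@"; do
        echo "== $f"
        # Merge stderr into stdout so that all messages end up together.
        "$validator" "$f" 2>&1
    done
)
```

For example: validate_each A.vcf B.vcf > validation.log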
This page lists only the basic capabilities. For more, see the vcftools options page and the Perl API and scripts page.