The VCFtools package is broadly split into two sections:
- The vcftools binary program, generally used to analyse VCF files.
- The Vcf.pm Perl module, a general Perl API, together with a core set of utilities built on it, such as vcf-convert, vcf-merge, vcf-compare, vcf-isec, and others.
Examples of usage by topic
The VCFtools package can be decompressed by the command
tar -xzf vcftools_version_number_source.tar.gz
To build the vcftools executable, type "make" in the vcftools folder.
The Perl scripts require that VCF files are compressed by bgzip and indexed by tabix (both tools are part of the tabix package). Both tools must be in directories that are listed in the PATH environment variable. For running the Perl scripts, the PERL5LIB environment variable must be set to include the directory containing the Vcf.pm module:
export PERL5LIB=/path/to/your/vcftools-directory/perl
The tools can be tested by running the script
/path/to/your/vcftools-directory/perl/test.t
If the command complains about a missing Test::Most Perl module, do not worry: it is needed only for testing, not for running VCFtools.
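Before running the Perl scripts, it can help to confirm the two requirements above: bgzip and tabix on the PATH, and Vcf.pm reachable through PERL5LIB. The following is a minimal Bash sketch; check_tools and check_perl5lib are hypothetical helper names, not part of VCFtools.

```shell
# Hypothetical helper: report commands that are not found on PATH.
# Returns 0 if every named command is found, 1 otherwise.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing from PATH: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical helper: succeed if some PERL5LIB directory contains Vcf.pm.
check_perl5lib() {
  local dir
  local -a dirs
  IFS=: read -r -a dirs <<< "${PERL5LIB:-}"   # split PERL5LIB on colons
  for dir in "${dirs[@]}"; do
    if [ -f "$dir/Vcf.pm" ]; then
      return 0
    fi
  done
  echo "Vcf.pm not found in any PERL5LIB directory" >&2
  return 1
}
```

For example, check_tools bgzip tabix && check_perl5lib succeeds only when both tools are on the PATH and Vcf.pm can be found through PERL5LIB.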
# Add custom annotations
cat in.vcf | vcf-annotate -a annotations.gz \
-d key=INFO,ID=ANN,Number=1,Type=Integer,Description='My custom annotation' \
-c CHROM,FROM,TO,INFO/ANN > out.vcf
# Apply SnpCluster filter
cat in.vcf | vcf-annotate --filter SnpCluster=3,10 > out.vcf
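The custom-annotation example reads annotations.gz, which, as we understand vcf-annotate's usage, must be a bgzip-compressed, tabix-indexed, tab-delimited file whose columns match the -c list (here CHROM, FROM, TO, INFO/ANN). tabix expects the rows grouped by chromosome and sorted by start position. The sketch below prepares such a file from a plain-text source; the function name sort_annotations and the file annotations.txt are hypothetical.

```shell
# Hypothetical helper: sort a tab-delimited annotation file (CHROM FROM TO ANN)
# by chromosome, then numerically by start position, as tabix requires.
# Comment lines (starting with #) and empty lines are dropped.
sort_annotations() {
  awk -F '\t' 'NF && !/^#/' "$1" | LC_ALL=C sort -t $'\t' -k1,1 -k2,2n
}

# Example usage (needs bgzip and tabix on PATH; file names are placeholders):
#   sort_annotations annotations.txt | bgzip -c > annotations.gz
#   tabix -s 1 -b 2 -e 3 annotations.gz   # columns: sequence, begin, end
```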
# Include positions which appear in at least two files
vcf-isec -o -n +2 A.vcf.gz B.vcf.gz C.vcf.gz | bgzip -c > out.vcf.gz
# Exclude from A positions which appear in B and/or C
vcf-isec -c A.vcf.gz B.vcf.gz C.vcf.gz | bgzip -c > out.vcf.gz
# Fast htslib implementation
vcf isec -n =2 A.vcf.gz B.vcf.gz
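To make the meaning of -n +2 concrete, here is a rough plain-text illustration in Bash. It is not how vcf-isec works: the function sites_in_at_least is hypothetical, takes uncompressed files, compares only CHROM and POS (ignoring REF and ALT), counts a position at most once per file, and prints positions rather than full records.

```shell
# Hypothetical helper: print CHROM<TAB>POS of every site that occurs in at
# least MIN of the given uncompressed VCF files.
sites_in_at_least() {
  local min=$1
  shift
  awk -F '\t' -v min="$min" '
    FNR == 1 { split("", seen) }      # new file: forget positions seen so far
    /^#/     { next }                 # skip header lines
    {
      key = $1 FS $2                  # CHROM<TAB>POS
      if (!(key in seen)) { seen[key] = 1; count[key]++ }
    }
    END { for (k in count) if (count[k] >= min + 0) print k }
  ' "$@" | LC_ALL=C sort -t $'\t' -k1,1 -k2,2n
}
```

For example, sites_in_at_least 2 A.vcf B.vcf C.vcf lists the positions shared by at least two of the three files.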
This section lists some useful one-line commands. Note that there are also dedicated convenience scripts, vcf-sort and vcf-concat, which do the same but also perform some basic sanity checks. All examples are in Bash.
# Replace VCF header. The file must be compressed by bgzip.
tabix -r header.txt in.vcf.gz > out.vcf.gz
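For an uncompressed VCF, the same header replacement can be sketched with standard tools. The function name replace_header is hypothetical, and header.txt is assumed to hold the complete new header, including the #CHROM line.

```shell
# Hypothetical helper: print the new header, then every non-header line of the VCF.
replace_header() {
  local header=$1 vcf=$2
  cat "$header"          # complete replacement header (## lines and #CHROM line)
  awk '!/^#/' "$vcf"     # data lines only; the old header lines are dropped
}
```

Usage: replace_header header.txt in.vcf > out.vcf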
# Sort VCF file keeping the header. The head command is for performance:
# it reads only the first 100 lines, so a longer header would be truncated.
(zcat file.vcf.gz | head -n 100 | grep '^#';
zcat file.vcf.gz | grep -v '^#' | sort -k1,1d -k2,2n;) \
| bgzip -c > out.vcf.gz
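The sort one-liner relies on head to read the header from the first 100 lines only. For an uncompressed file, a variant that keeps every header line can be sketched as below, at the cost of reading the file twice. The function name sort_vcf is hypothetical; the sort keys mirror the one-liner, and LC_ALL=C is added so the ordering does not depend on the user's locale.

```shell
# Hypothetical helper: sort an uncompressed VCF, keeping the whole header.
sort_vcf() {
  awk '/^#/' "$1"                                            # header lines, original order
  awk '!/^#/' "$1" | LC_ALL=C sort -t $'\t' -k1,1d -k2,2n    # data: CHROM, then POS numerically
}
```

Usage: sort_vcf in.vcf > sorted.vcf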
# Merge (that is, concatenate) two VCF files into one, keeping the header
# from the first one only. The head command limits the header to its first 100 lines.
(zcat A.vcf.gz | head -n 100 | grep '^#'; \
zcat A.vcf.gz | grep -v '^#'; \
zcat B.vcf.gz | grep -v '^#'; ) \
| bgzip -c > out.vcf.gz
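The concatenation one-liner generalizes to any number of uncompressed files: print the first file in full, then only the data lines of the rest. In this sketch the function name concat_vcfs is hypothetical, and the inputs are assumed to share the same columns and samples; unlike vcf-concat, no sanity checks are done.

```shell
# Hypothetical helper: concatenate VCFs, keeping the header of the first file only.
concat_vcfs() {
  local first=$1 f
  shift
  cat "$first"           # header and data lines of the first file
  for f in "$@"; do
    awk '!/^#/' "$f"     # data lines only from every other file
  done
}
```

Usage: concat_vcfs A.vcf B.vcf C.vcf > out.vcf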
Both vcftools and Vcf.pm can be used for validation. The former validates VCFv4.0; the latter can also validate older versions.
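Since the choice of validator depends on the VCF version, it can be useful to read the version a file declares in its ##fileformat header line. The sketch below only reads that line; it does not validate anything, and the function name vcf_version is hypothetical.

```shell
# Hypothetical helper: print the version from the ##fileformat= line of an
# uncompressed VCF (e.g. VCFv4.0). Returns 1 if no such line is found
# before the first data line.
vcf_version() {
  awk -F= '
    /^##fileformat=/ { print $2; found = 1; exit }
    !/^#/            { exit }          # reached data lines: stop looking
    END              { exit !found }
  ' "$1"
}
```

Usage: vcf_version in.vcf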