Variant Effect Predictor Running VEP


VEP is run on the command line as follows (assuming you are in the ensembl-vep directory):

 ./vep [options] 

where [options] represents a set of flags and options. A basic set of flags can be listed using --help:

 ./vep --help 
VEP can be run in the following modes:
  • Using a cache (recommended): for optimum performance, download a cache file for your species of interest, using either the installer or by following the VEP Cache documentation, and run VEP with either the --cache or --offline option.

  • Using a database connection: connect to the public Ensembl database servers in place of a cache with --database. This can be adequate when annotating small files, but the database servers can become busy and slow.

  • Using your own annotation: to run VEP on your own species and assembly, supply a --fasta file and a --gff or --gtf annotation file.

To run VEP with default options, use the following command:

 ./vep --cache -i input.txt -o output.txt 

where input.txt contains data in one of the compatible input formats and output.txt is the output file to be created.

Options can be passed as the full string (e.g. --format), or as the shortest unique string among the options (e.g. --form for --format, since there is another option --force_overwrite).

You may use one or two hyphen ("-") characters before each option name, e.g. --cache or -cache.

VEP options can also be read from:

  • Configuration files using --config. Options set in configuration files are overridden if specified on the command line.
  • Environment variables that start with the prefix VEP_. For instance, you can set the cache flag with export VEP_CACHE=1 and the input flag with export VEP_INPUT="/path/to/input.txt" before running ./vep. Options set in environment variables are overridden if specified in configuration files or on the command line.
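
As a rough sketch of the precedence described above, the following sets two options through the environment and then overrides one of them on the command line. The paths and file names are placeholders, and the command is printed rather than executed so that the snippet can be tried without a VEP installation.

```shell
# Set defaults through the environment (lowest precedence).
export VEP_CACHE=1
export VEP_INPUT="/path/to/input.txt"

# An input file given on the command line takes precedence over VEP_INPUT,
# while the cache flag is still picked up from VEP_CACHE.
vep_cmd="./vep -i other_input.txt -o output.txt"

# Print the command instead of running it.
echo "$vep_cmd"
```

With these settings, the run would read other_input.txt and use the cache, since only the input option is overridden.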

Running VEP on non-vertebrate species

To use VEP on non-vertebrate species, you need to use extra options:

  • Database access: add the option --genomes to point to the correct database server.

     ./vep -i input.txt -o output.txt --species triticum_aestivum --database --genomes 

    Some of the non-vertebrate species databases (mainly bacteria and protists) are composed of a collection of species. For these databases the flag "--is_multispecies 1" is required.

     ./vep -i input.txt -o output.txt --species protists_euglenozoa1 --database --genomes --is_multispecies 1 
  • Cache access: use the option "--cache_version eg_version", where eg_version is the Ensembl Genomes version number, which differs from the Ensembl vertebrates/VEP version number (113).

     ./vep -i input.txt -o output.txt --species triticum_aestivum --cache --cache_version 42

    More information about the VEP cache is available in the VEP Cache documentation.



Basic options

Flag Alternate Description Incompatible with
--help
  Display help message and quit  
--quiet
-q
Suppress warning messages. Not used by default. Incompatible with: --verbose
--verbose
-v
Print out a bit more information while running. Not used by default. Incompatible with: --quiet
--config [filename]
 

Load configuration options from a config file. The config file should consist of whitespace-separated pairs of option names and settings e.g.:

output_file   my_output.txt
species       mus_musculus
format        vcf
host          useastdb.ensembl.org
A config file can also be read implicitly: save the file as $HOME/.vep/vep.ini (or the equivalent directory if using --dir). Any options in this file are overridden by those in a config file given with --config, which are in turn overridden by options given on the command line. A quick way to create a config file is to set the flags as normal and run VEP in verbose (-v) mode; this prints lines that can be copied to a config file and loaded on the next run using --config. Not used by default
 
--everything
-e
Shortcut flag to switch on all of the following:

--sift b, --polyphen b, --ccds, --hgvs, --symbol, --numbers, --domains, --regulatory, --canonical, --protein, --biotype, --af, --af_1kg, --af_esp, --af_gnomade, --af_gnomadg, --max_af, --pubmed, --uniprot, --mane, --tsl, --appris, --variant_class, --gene_phenotype, --mirna

 
--species [species]
  Species for your data. This can be the Latin name e.g. "homo_sapiens" or any Ensembl alias e.g. "mouse". Specifying the Latin name can speed up initial database connection as the registry does not have to load all available database aliases on the server. Default = "homo_sapiens"  
--assembly [name]
-a
Select the assembly version to use if more than one is available. If using the cache, you must have the appropriate assembly's cache file installed. If not specified and you have only one assembly version installed, this will be chosen by default. Default = use found assembly version  
--input_file [filename]
-i
Input file name. If not specified, VEP will attempt to read from STDIN. Gzipped input files are accepted.  
--input_data [string]
--id

Raw input data as a string. May be used, for example, to input a single rsID or HGVS notation quickly to vep:

--input_data rs699
 
--format [format]
 

Input file format: one of "ensembl", "vcf", "hgvs", "id", "region" or "spdi".
Use this option to state the format explicitly rather than relying on auto-detection. A gzipped version of any of these formats can be used. Auto-detects format by default

 
--output_file [filename]
-o
Output file name. Results can be written to STDOUT by specifying 'STDOUT' as the output file name; this will force quiet mode. Default = "variant_effect_output.txt"  
--force_overwrite
--force
By default, VEP will fail with an error if the output file already exists. You can force the overwrite of the existing file by using this flag. Not used by default  
--no_stats
  Don't generate a stats file. Provides marginal gains in run time.  
--stats_file [filename]
--sf
Summary stats file name. This file contains a summary of the VEP run. If stats are returned in an HTML file (default), the filename should end in .html or .htm. Default = "variant_effect_output.txt_summary.html"  
--stats_html
  Generate an HTML stats file (default).  
--stats_text
  Generate a plain text stats file. Can be combined with --stats_html to generate both plain text and HTML stats files.  
--warning_file [filename]
  File name to write warnings and errors to. Default = STDERR (standard error)  
--skipped_variants_file [filename]
  File name to write skipped variants to. Default = STDERR (standard error)  
--max_sv_size
  Extend the maximum Structural Variant size VEP can process. Default = 10000000  
--no_check_variants_order
  Permit the use of unsorted input files. However, running VEP on unsorted input files slows down the tool and requires more memory.  
--fork [num_forks]
  Enable forking, using the specified number of forks. Forking can dramatically improve runtime. Not used by default  
--safe
  By default, a VEP run is successful even when a plugin reports issues. Use this flag to ensure VEP fails if a plugin raises warnings or generates compilation errors. This is particularly useful to ensure plugins run successfully when using VEP in pipelines. Not used by default  
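
The config file format described under --config above can be sketched as follows. The file name my_vep.ini is a placeholder, the awk check is only an illustrative sanity test (not something VEP requires), and the final command is printed rather than run.

```shell
# Write a small config file of whitespace-separated option/value pairs,
# using option names taken from the --config example above.
cat > my_vep.ini <<'EOF'
output_file   my_output.txt
species       mus_musculus
format        vcf
EOF

# Count non-empty lines that are not an option/value pair.
bad_lines=$(awk 'NF > 0 && NF != 2 { n++ } END { print n + 0 }' my_vep.ini)
echo "malformed lines: $bad_lines"

# Options on the command line override the file, so this run would use
# homo_sapiens rather than the mus_musculus value stored in my_vep.ini.
vep_cmd="./vep --cache -i input.vcf --config my_vep.ini --species homo_sapiens"
echo "$vep_cmd"
```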


Cache options

Flag Alternate Description Output fields Incompatible with
--cache
  Enables use of the cache. Add --refseq or --merged to use the RefSeq or merged cache (if installed).  
--dir [directory]
  Specify the base cache/plugin directory to use. Default = "$HOME/.vep/"    
--dir_cache [directory]
  Specify the cache directory to use. Default = "$HOME/.vep/"    
--dir_plugins [directory]
  Specify the plugin directory to use. Default = "$HOME/.vep/"    
--offline
  Enable offline mode. No database connections will be made, and a cache file or GFF/GTF file is required for annotation. Add --refseq to use the refseq cache (if installed). Not used by default  
--fasta [file|dir]
--fa
Specify a FASTA file or a directory containing FASTA files to use to look up reference sequence. The first time you run VEP with this parameter an index will be built which can take a few minutes. This is required if fetching HGVS annotations (--hgvs) or checking reference sequences (--check_ref) in offline mode (--offline), and optional with some performance increase in cache mode (--cache). See documentation for more details. Not used by default    
--refseq
 

Specify this option if you have installed the RefSeq cache in order for VEP to pick up the alternate cache directory. This cache contains transcript objects corresponding to RefSeq transcripts. Consequence output will be given relative to these transcripts in place of the default Ensembl transcripts (see documentation)

REFSEQ_MATCH, BAM_EDIT
--merged
 

Use the merged Ensembl and RefSeq cache. Consequences are flagged with the SOURCE of each transcript used.

REFSEQ_MATCH, BAM_EDIT, SOURCE
--cache_version
  Use a different cache version than the assumed default (the VEP version). This should be used with Ensembl Genomes caches since their version numbers do not match Ensembl versions. For example, the VEP/Ensembl version may be 88 and the Ensembl Genomes version 35. Not used by default    
--show_cache_info
  Show source version information for selected cache and quit    
--buffer_size [number]
  Sets the internal buffer size, corresponding to the number of variants that are read into memory simultaneously. Set this lower to use less memory at the expense of longer run time, and higher to use more memory with a faster run time. Default = 5000    
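
One possible offline invocation combining several of the cache options above is sketched below. The cache directory and FASTA path are placeholders, the buffer size is an arbitrary illustrative value, and the command is printed rather than executed.

```shell
# Placeholder locations; adjust these to your installation.
cache_dir="$HOME/.vep"
fasta="/path/to/reference.fa"

# --offline makes no database connections, so --fasta is needed for --hgvs.
# A --buffer_size below the default of 5000 lowers memory use at the cost of run time.
vep_cmd="./vep --offline --dir_cache $cache_dir --fasta $fasta --hgvs --buffer_size 1000 -i input.vcf -o output.txt"
echo "$vep_cmd"
```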


Other annotation sources

Flag Alternate Description Output fields
--plugin [plugin name]
  Use named plugin. Plugin modules should be installed in the Plugins subdirectory of the VEP cache directory (defaults to $HOME/.vep/). Multiple plugins can be used by supplying the --plugin flag multiple times. See plugin documentation. Not used by default Plugin-dependent
--custom file=[filename]
  Add custom annotation to the output. Files must be tabix indexed or in the bigWig format. Multiple files can be specified by supplying the --custom flag multiple times. See here for full details. Not used by default SOURCE, Custom file dependent
--gff [filename]
  Use GFF transcript annotations in [filename] as an annotation source. Requires a FASTA file of genomic sequence. Not used by default SOURCE
--gtf [filename]
  Use GTF transcript annotations in [filename] as an annotation source. Requires a FASTA file of genomic sequence. Not used by default SOURCE
--bam [filename]
  ADVANCED Use BAM file of sequence alignments to correct transcript models not derived from reference genome sequence. Used to correct RefSeq transcript models. Enables --use_transcript_ref; add --use_given_ref to override this behaviour. Not used by default BAM_EDIT
--use_transcript_ref
  By default VEP uses the reference allele provided in the input file to calculate consequences for the provided alternate allele(s). Use this flag to force VEP to replace the provided reference allele with sequence derived from the overlapped transcript. This is especially relevant when using the RefSeq cache, see documentation for more details. The GIVEN_REF and USED_REF fields are set in the output to indicate any change. Not used by default GIVEN_REF, USED_REF
--use_given_ref
  Using --bam or a BAM-edited RefSeq cache by default enables --use_transcript_ref; add this flag to override this behaviour and use the provided reference allele from the input. Not used by default  
--custom_multi_allelic
  By default, comma-separated lists found within the INFO field of custom annotation VCFs are assumed to be allele-specific. For example, a variant with allele_string A/G/C with associated custom annotation 'single,double,triple' will associate triple with C, double with G and single with A. This flag instructs VEP to return all annotations for all alleles. Not used by default  
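
The default allele-specific pairing described under --custom_multi_allelic can be illustrated with the example values from that entry. This is only a sketch of the positional pairing, not VEP's own code.

```shell
# Example values from the --custom_multi_allelic description.
allele_string="A/G/C"
annotation="single,double,triple"

# Pair each allele with the annotation value at the same position.
pairs=$(awk -v a="$allele_string" -v v="$annotation" 'BEGIN {
    n = split(a, alleles, "/")
    split(v, values, ",")
    for (i = 1; i <= n; i++)
        printf "%s=%s%s", alleles[i], values[i], (i < n ? " " : "\n")
}')
echo "$pairs"   # prints: A=single G=double C=triple
```

With --custom_multi_allelic, every allele would instead be reported with the full list of values.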


Output format options

Flag Alternate Description Output fields Incompatible with
--vcf
 

Writes output in VCF format. Consequences are added in the INFO field of the VCF file, using the key "CSQ". Data fields are encoded separated by "|"; the order of fields is written in the VCF header. Output fields in the "CSQ" INFO field can be selected by using --fields.

If the input format was VCF, the file will remain unchanged save for the addition of the CSQ field (unless using any filtering).

Custom data added with --custom are added as separate fields, using the key specified for each data file.

Commas in fields are replaced with ampersands (&) to preserve VCF format.

Not used by default
 
--tab
  Writes output in tab-delimited format. Not used by default  
--json
  Writes output in JSON format. Not used by default  
--compress_output [gzip|bgzip]
  Writes output compressed using either gzip or bgzip. Not used by default    
--fields [list]
 

Configure the output format using a comma separated list of fields.
Can only be used with tab (--tab) or VCF format (--vcf) output.
For the tab format output, the selected fields may be those present in the default output columns, or any of those that appear in the Extra column (including those added by plugins or custom annotations) if the appropriate output is available (e.g. use --show_ref_allele to access 'REF_ALLELE'). Output remains tab-delimited.
For the VCF format output, the selected fields are those present within the "CSQ" INFO field.

Example of command for the tab output:

--tab --fields "Uploaded_variation,Location,Allele,Gene"

Example of command for the VCF format output:

--vcf --fields "Allele,Consequence,Feature_type,Feature"
Not used by default
   
--minimal
  Convert alleles to their most minimal representation before consequence calculation i.e. sequence that is identical between each pair of reference and alternate alleles is trimmed off from both ends, with coordinates adjusted accordingly.
Note this may lead to discrepancies between input coordinates and coordinates reported by VEP relative to transcript sequences; to avoid issues, use --allele_number and/or ensure that your input variants have unique identifiers. The MINIMISED flag is set in the VEP output where relevant. For an insertion/deletion, the allele is minimised by default. To access the input allele before minimisation, use --uploaded_allele.
Not used by default
MINIMISED
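
The trimming performed by --minimal can be sketched as below. This is an illustration of the idea, not VEP's implementation; in particular, trimming leading bases before trailing bases is an assumption, and showing an empty allele as "-" is a display choice made here.

```shell
# minimise POS REF ALT: trim bases shared by REF and ALT from both ends.
minimise() {
    awk -v pos="$1" -v ref="$2" -v alt="$3" 'BEGIN {
        # Trim shared leading bases, moving the start position to the right.
        while (length(ref) > 0 && length(alt) > 0 && substr(ref, 1, 1) == substr(alt, 1, 1)) {
            ref = substr(ref, 2); alt = substr(alt, 2); pos++
        }
        # Trim shared trailing bases; the start position is unaffected.
        while (length(ref) > 0 && length(alt) > 0 && substr(ref, length(ref), 1) == substr(alt, length(alt), 1)) {
            ref = substr(ref, 1, length(ref) - 1); alt = substr(alt, 1, length(alt) - 1)
        }
        # Show an empty allele as "-".
        printf "%d %s %s\n", pos, (ref == "" ? "-" : ref), (alt == "" ? "-" : alt)
    }'
}

minimise 32900399 AGT A     # prints: 32900400 GT -
minimise 100 CAGT CTGT      # prints: 101 A T
```

The shifted start position in the output is why the entry above recommends --allele_number or unique variant identifiers for matching results back to the input.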


Output options

Flag Alternate Description Output fields Incompatible with
--variant_class
  Output the Sequence Ontology variant class. Not used by default VARIANT_CLASS  
--sift [p|s|b]
  Species limited SIFT predicts whether an amino acid substitution affects protein function based on sequence homology and the physical properties of amino acids. VEP can output the prediction term, score or both. Not used by default SIFT
--polyphen [p|s|b]
  Human only PolyPhen is a tool which predicts possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations. VEP can output the prediction term, score or both. VEP uses the humVar score by default - use --humdiv to retrieve the humDiv score. Not used by default PolyPhen
--humdiv
  Human only Retrieve the humDiv PolyPhen prediction instead of the default humVar. Not used by default PolyPhen  
--nearest [transcript|gene|symbol]
 

Retrieve the transcript or gene with the nearest protein-coding transcription start site (TSS) to each input variant. Use "transcript" to retrieve the transcript stable ID, "gene" to retrieve the gene stable ID, or "symbol" to retrieve the gene symbol. Note that the nearest TSS may not belong to a transcript that overlaps the input variant, and more than one may be reported in the case where two are equidistant from the input coordinates.

Currently only available when using a cache annotation source, and requires the Set::IntervalTree Perl module.

Not used by default
NEAREST  
--distance [bp_distance(,downstream_distance)]
  Modify the distance up and/or downstream between a variant and a transcript for which VEP will assign the upstream_gene_variant or downstream_gene_variant consequences. Giving one distance will modify both up- and downstream distances; providing two separated by a comma will set the up- (5') and down- (3') stream distances respectively. Default: 5000    
--overlaps
  Report the proportion and length of a transcript overlapped by a structural variant in VCF format.    
--gene_phenotype
  Indicates if the overlapped gene is associated with a phenotype, disease or trait. See list of phenotype sources. Not used by default GENE_PHENO  
--regulatory
  Look for overlaps with regulatory regions. VEP can also report if a variant falls in a high information position within a transcription factor binding site. Output lines have a Feature type of RegulatoryFeature or MotifFeature. Not used by default MOTIF_NAME, MOTIF_POS, HIGH_INF_POS, MOTIF_SCORE_CHANGE  
--cell_type
  Report only regulatory regions that are found in the given cell type(s). Can be a single cell type or a comma-separated list. The functional type in each cell type is reported under CELL_TYPE in the output. To retrieve a list of cell types, use --cell_type list. Not used by default CELL_TYPE  
--individual [all|ind list]
  Consider only alternate alleles present in the genotypes of the specified individual(s). May be a single individual, a comma-separated list or "all" to assess all individuals separately. Individual variant combinations homozygous for the given reference allele will not be reported. Each individual and variant combination is given on a separate line of output. Only works with VCF files containing individual genotype data; individual IDs are taken from column headers. Not used by default IND, ZYG
--individual_zyg [all|ind list]
  Consider alternate and reference alleles present in the genotypes of the specified individual(s). May be a single individual, a comma-separated list or "all" to assess all individuals separately. Returns a list of individuals and their zygosity. Only works with VCF files containing individual genotype data; individual IDs are taken from column headers. Not used by default ZYG
--phased
  Force VCF genotypes to be interpreted as phased. For use with plugins that depend on phased data. Not used by default    
--allele_number
  Identify allele number from VCF input, where 1 = first ALT allele, 2 = second ALT allele etc. Useful when using --minimal. Not used by default ALLELE_NUM  
--show_ref_allele
  Adds the reference allele in the output (after minimisation). Mainly useful for the VEP "default" and tab-delimited output formats. Not used by default REF_ALLELE  
--uploaded_allele
  Adds the uploaded allele string in the output (before minimisation). UPLOADED_ALLELE  
--total_length
  Give cDNA, CDS and protein positions as Position/Length. Not used by default    
--numbers
  Adds affected exon and intron numbering to the output. Format is Number/Total. Not used by default EXON, INTRON
--mirna
  Reports where the variant lies in the miRNA secondary structure (only for Ensembl/GENCODE transcripts). Not used by default    
--no_escape
  Don't URI escape HGVS strings. Default = escape    
--keep_csq
  Don't overwrite existing CSQ entry in VCF INFO field. Overwrites by default    
--vcf_info_field [CSQ|ANN|(other)]
  Change the name of the INFO key that VEP writes the consequences to in its VCF output. Use "ANN" for compatibility with other tools such as snpEff. Default: CSQ    
--terms [SO|display|NCBI]
-t
The type of consequence terms to output. The Ensembl terms are described here. The Sequence Ontology is a joint effort by genome annotation centres to standardise descriptions of biological sequences. Default = "SO"    
--no_headers
  Don't write header lines in output files. Default = add headers    
--shift_3prime [0|1]
  Right aligns all variants relative to their associated transcripts prior to consequence calculation.
An example using this option can be found here.
Default = 0
--shift_genomic [0|1]
  Right aligns all variants, including intergenic variants, before consequence calculation and updates the Location field.
An example using this option can be found here.
Default = 0
--shift_length
  Reports the distance each variant has been shifted when used in conjunction with --shift_3prime    
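
When consequences are written to a VCF INFO key (CSQ by default, or the name set with --vcf_info_field), each value can be unpacked with standard tools. The sketch below uses a made-up CSQ value with placeholder transcript IDs, laid out as Allele|Consequence|Feature_type|Feature; in real output the field order should be read from the VCF header.

```shell
# Made-up CSQ value containing two transcript entries.
csq='T|missense_variant|Transcript|ENST_EXAMPLE_1,T|splice_region_variant&intron_variant|Transcript|ENST_EXAMPLE_2'

# Entries are comma-separated; fields within an entry are separated by "|";
# "&" stands in for commas inside a field.
parsed=$(printf '%s\n' "$csq" | tr ',' '\n' |
    awk -F'|' '{ gsub(/&/, ",", $2); print $4 "\t" $2 }')

# One line per entry: feature ID, then its consequence term(s).
printf '%s\n' "$parsed"
```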


Identifiers

Flag Alternate Description Output fields Incompatible with
--hgvs
  Add HGVS nomenclature based on Ensembl stable identifiers to the output. Both coding and protein sequence names are added where appropriate. To generate HGVS identifiers when using --cache or --offline you must use a FASTA file and --fasta. HGVS notations given on Ensembl identifiers are versioned. Not used by default HGVSc, HGVSp, HGVS_OFFSET  
--hgvsg
  Add genomic HGVS nomenclature based on the input chromosome name. To generate HGVS identifiers when using --cache or --offline you must use a FASTA file and --fasta. Not used by default HGVSg  
--hgvsg_use_accession
  Force --hgvsg to report the RefSeq accession of the reference sequence. For example, reports NC_000002.12 for human chromosome 2 (build GRCh38). HGVSg  
--hgvsp_use_prediction
  Force --hgvs to return the HGVSp notation in predicted format. For example, ENSP00000233741.4:p.Thr367AsnfsTer13 will be returned as ENSP00000233741.4:p.(Thr367AsnfsTer13). HGVSp  
--ambiguous_hgvs [0|1]
  Allow input HGVSp to resolve to all genomic locations. Otherwise, the most likely transcript will be selected. Default: 0 (most likely transcript selected)    
--spdi
  Add genomic SPDI notation. To generate SPDI when using --cache or --offline you must use a FASTA file and --fasta. Not used by default SPDI  
--ga4gh_vrs
  Add GA4GH Variation Representation Specification (VRS) notation. To generate GA4GH VRS when using --cache or --offline you must use a FASTA file and --fasta. Not used by default GA4GH_VRS
--shift_hgvs [0|1]
  Enable or disable 3' shifting of HGVS notations. HGVS nomenclature requires an ambiguous sequence change to be described at the most 3' possible location. When enabled, this causes "shifting" to the most 3' possible coordinates (relative to the transcript sequence and strand) before the HGVS notations are calculated; the flag HGVS_OFFSET is set to the number of bases by which the variant has shifted, relative to the input genomic coordinates. If HGVS_OFFSET is equal to 0, no value is added to the HGVS_OFFSET column. To disable the changing of location at transcript level, set --shift_hgvs to 0. Default: 1 (shift)  
--transcript_version
  Add version numbers to Ensembl transcript identifiers    
--gene_version
  Add version numbers to Ensembl gene identifiers    
--protein
  Add the Ensembl protein identifier to the output where appropriate. Not used by default ENSP
--symbol
  Adds the gene symbol (e.g. HGNC) (where available) to the output. Some gene symbols, e.g. HGNC, are only available in the merged and Ensembl caches and therefore should not be used with the --refseq cache option. Not used by default SYMBOL, SYMBOL_SOURCE, HGNC_ID
--ccds
  Adds the CCDS transcript identifier (where available) to the output. Not used by default CCDS
--uniprot
  Adds best match accessions for translated protein products from three UniProt-related databases (SWISSPROT, TREMBL and UniParc) to the output. Not used by default SWISSPROT, TREMBL, UNIPARC, UNIPROT_ISOFORM
--tsl
  Adds the transcript support level for this transcript to the output. Not used by default

Note

Only available for human on the GRCh38 assembly
TSL
--appris
  Adds the APPRIS isoform annotation for this transcript to the output. Not used by default

Note

Only available for human on the GRCh38 assembly
APPRIS
--canonical
  Adds a flag indicating if the transcript is the canonical transcript for the gene. Not used by default CANONICAL
--mane
  Adds a flag indicating if the transcript is the MANE Select or MANE Plus Clinical transcript for the gene. If --cache or --database annotation source is used, the alternative transcript stable ID is also added. Not used by default

Note

Only available for human on the GRCh38 assembly
MANE, MANE_SELECT, MANE_PLUS_CLINICAL
--mane_select
  Adds a flag indicating if the transcript is the MANE Select transcript for the gene. If --cache or --database annotation source is used, the alternative transcript stable ID is also added. Not used by default

Note

Only available for human on the GRCh38 assembly
MANE, MANE_SELECT
--biotype
  Adds the biotype of the transcript or regulatory feature. Not used by default BIOTYPE
--domains
  Adds names of overlapping protein domains to output. Not used by default DOMAINS
--xref_refseq
  Output aligned RefSeq mRNA identifier for transcript. Not used by default

Note

The RefSeq and Ensembl transcripts aligned in this way MAY NOT, AND FREQUENTLY WILL NOT, match exactly in sequence, exon structure and protein product
RefSeq
--synonyms [file]
  Load a file of chromosome synonyms. File should be tab-delimited with the primary identifier in column 1 and the synonym in column 2. Synonyms allow different chromosome identifiers to be used in the input file and any annotation source (cache, database, GFF, custom file, FASTA file). Not used by default    
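
A synonyms file in the layout described under --synonyms could be created as follows. The chromosome names are illustrative only, the file name is a placeholder, the awk check is an optional sanity test rather than a VEP requirement, and the command is printed rather than run.

```shell
# Tab-delimited: primary identifier in column 1, synonym in column 2.
printf '1\tchr1\n2\tchr2\nX\tchrX\n' > chr_synonyms.txt

# Count lines that do not have exactly two tab-separated columns.
bad_lines=$(awk -F'\t' 'NF != 2 { n++ } END { print n + 0 }' chr_synonyms.txt)
echo "malformed lines: $bad_lines"

vep_cmd="./vep --cache -i input.vcf -o output.txt --synonyms chr_synonyms.txt"
echo "$vep_cmd"
```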


Co-located variants

Flag Alternate Description Output fields Incompatible with
--check_existing
  Checks for the existence of known variants that are co-located with your input. By default, alleles are compared and variants are matched on an allele-specific basis; to compare only coordinates, use --no_check_alleles.

Some databases may contain variants with unknown (null) alleles and these are included by default; to exclude them use --exclude_null_alleles.

See this page for more details.

Not used by default
Existing_variation, CLIN_SIG, SOMATIC, PHENO  
--check_svs
  Checks for the existence of structural variants that overlap your input. Currently requires database access. Not used by default. Output field: SV. Incompatible with: --offline
--clin_sig_allele [1|0]
  Return allele specific clinical significance. Setting this option to 0 will provide all known clinical significance values at the given locus. Default: 1 (Provide allele-specific annotations) CLIN_SIG  
--exclude_null_alleles
  Do not include variants with unknown alleles when checking for co-located variants. Our human database contains variants from HGMD and COSMIC for which the alleles are not publicly available; by default these are included when using --check_existing, use this flag to exclude them. Not used by default    
--no_check_alleles
  When checking for existing variants, by default VEP only reports a co-located variant if none of the input alleles are novel. For example, if your input variant has alleles A/G, and an existing co-located variant has alleles A/C, the co-located variant will not be reported.

Strand is also taken into account: in the same example, if the input variant has alleles T/G but on the negative strand, then the co-located variant will be reported since its alleles match the reverse complement of the input variant.

Use this flag to disable this behaviour and compare using coordinates alone. Not used by default
   
--af
  Add the global allele frequency (AF) from 1000 Genomes Phase 3 data for any known co-located variant to the output. For this and all --af_* flags, the frequency reported is for the input allele only, not necessarily the non-reference or derived allele. Not used by default AF  
--max_af
  Report the highest allele frequency observed in any population from 1000 Genomes, ESP or gnomAD. Not used by default MAX_AF, MAX_AF_POPS
--af_1kg
  Add allele frequency from continental populations (AFR,AMR,EAS,EUR,SAS) of 1000 Genomes Phase 3 to the output. Must be used with --cache. Not used by default AFR_AF, AMR_AF, EAS_AF, EUR_AF, SAS_AF
--af_esp
  Include allele frequency from NHLBI-ESP populations. Must be used with --cache. Deprecated. AA_AF, EA_AF
--af_gnomade
--af_gnomad
Include allele frequency from Genome Aggregation Database (gnomAD) exome populations. Note only data from the gnomAD exomes are included; to retrieve data from the additional genomes data set, see this guide. Must be used with --cache. Not used by default gnomADe_AF, gnomADe_AFR_AF, gnomADe_AMR_AF, gnomADe_ASJ_AF, gnomADe_EAS_AF, gnomADe_FIN_AF, gnomADe_NFE_AF, gnomADe_OTH_AF, gnomADe_SAS_AF
--af_gnomadg
  Include allele frequency from Genome Aggregation Database (gnomAD) genome populations. Note only data from the gnomAD genomes are included; to retrieve data from the additional genomes data set, see this guide. Must be used with --cache. Not used by default gnomADg_AF, gnomADg_AFR_AF, gnomADg_AMI_AF, gnomADg_AMR_AF, gnomADg_ASJ_AF, gnomADg_EAS_AF, gnomADg_FIN_AF, gnomADg_MID_AF, gnomADg_NFE_AF, gnomADg_OTH_AF, gnomADg_SAS_AF
--af_exac
  Include allele frequency from ExAC project populations. Must be used with --cache. Deprecated. ExAC_AF, ExAC_Adj_AF, ExAC_AFR_AF, ExAC_AMR_AF, ExAC_EAS_AF, ExAC_FIN_AF, ExAC_NFE_AF, ExAC_OTH_AF, ExAC_SAS_AF
--pubmed
  Report PubMed IDs for publications that cite the existing variant. Must be used with --cache. Not used by default PUBMED
--var_synonyms
  Report known synonyms for co-located variants. Must be used with --cache. Not used by default VAR_SYNONYMS
--failed [0|1]
  When checking for co-located variants, by default VEP will exclude variants that have been flagged as failed. Set this flag to 1 to include such variants. Default: 0 (exclude)    
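
The strand handling described under --no_check_alleles can be illustrated by reverse-complementing an allele string before comparing it with a known variant. The helper name revcomp_alleles is hypothetical and the comparison is a simplified sketch, not VEP's matching logic.

```shell
# revcomp_alleles ALLELES: reverse-complement each "/"-separated allele.
# Characters other than A, C, G and T (for example "-") are left unchanged.
revcomp_alleles() {
    awk -v s="$1" 'BEGIN {
        n = split(s, a, "/")
        for (i = 1; i <= n; i++) {
            out = ""
            for (j = length(a[i]); j >= 1; j--) {
                b = substr(a[i], j, 1)
                out = out (b == "A" ? "T" : b == "T" ? "A" : b == "C" ? "G" : b == "G" ? "C" : b)
            }
            printf "%s%s", out, (i < n ? "/" : "\n")
        }
    }'
}

existing="A/C"                          # known co-located variant
input_fwd=$(revcomp_alleles "T/G")      # input given on the negative strand
echo "$input_fwd"                       # prints: A/C
[ "$input_fwd" = "$existing" ] && echo "alleles match after strand flip"
```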


Filtering and QC options

NOTE: The filtering options here filter your results before they are written to your output file. Using VEP's filtering script, it is possible to filter your results after VEP has run. This way you can retain all of the results and run multiple filter sets on the same results to find different data of interest.

Flag Alternate Description Output fields Incompatible with
--gencode_basic
  Limit your analysis to transcripts belonging to the GENCODE basic set. This set has fragmented or problematic transcripts removed. Not used by default  
--gencode_primary
  Limit your analysis to transcripts belonging to the GENCODE primary set. This set covers all human exons in a minimal set of transcripts. Not used by default

Note

Only available for human on the GRCh38 assembly
 
--exclude_predicted
  When using the RefSeq or merged cache, exclude predicted transcripts (i.e. those with identifiers beginning with "XM_" or "XR_").

Note

We do not support predicted RefSeq transcripts for GRCh37.
   
--transcript_filter
 

ADVANCED Filter transcripts according to any arbitrary set of rules. Uses similar notation to filter_vep.

You may filter on any key defined in the root of the transcript object; most commonly this will be "stable_id":

--transcript_filter "stable_id match N[MR]_"
   
--check_ref
  Force VEP to check the supplied reference allele against the sequence stored in the Ensembl Core database or supplied FASTA file. Lines that do not match are skipped. Checking is done on the minimised sequence. For example, for the input line "chr13 32900399 . AGT A .", the leading A is removed from both alleles and the reference sequence is checked from position 32900400 to see if it matches GT. Not used by default  
--lookup_ref
  Force overwrite the supplied reference allele with the sequence stored in the Ensembl Core database or supplied FASTA file. Not used by default  
--dont_skip
  Don't skip input variants that fail validation, e.g. those that fall on unrecognised sequences.
Combining --check_ref with --dont_skip will add a CHECK_REF output field when the given reference does not match the underlying reference sequence.
CHECK_REF  
--allow_non_variant
  When using VCF format as input and output, by default VEP will skip non-variant lines of input (where the ALT allele is null). With this option enabled, these lines are printed in the VCF output with no consequence data added.    
--chr [list]
  Select a subset of chromosomes to analyse from your file. Any data in the input that is not on the selected chromosomes will be skipped. The list can be comma-separated, with "-" characters representing an interval.
For example, to include chromosomes 1, 2, 3, 10 and X, use --chr 1-3,10,X
Not used by default
   
--coding_only
  Only return consequences that fall in the coding regions of transcripts. Not used by default  
--no_intergenic
  Do not include intergenic consequences in the output. Not used by default  
--pick
  Pick one line or block of consequence data per variant, including transcript-specific columns.
Consequences are chosen according to the criteria described here, and the order the criteria are applied may be customised with --pick_order. This is the best method to use if you are interested only in one consequence per variant. Not used by default
 
--pick_allele
  Like --pick, but chooses one line or block of consequence data per variant allele. Will only differ in behaviour from --pick when the input variant has multiple alternate alleles. Not used by default  
--per_gene
  Output only the most severe consequence per gene. The transcript selected is arbitrary if more than one has the same predicted consequence. Uses the same ranking system as --pick. Not used by default    
--pick_allele_gene
  Like --pick_allele, but chooses one line or block of consequence data per variant allele and gene combination. Not used by default    
--flag_pick
  As per --pick, but adds the PICK flag to the chosen block of consequence data and retains others. Not used by default. Output field: PICK
--flag_pick_allele
  As per --pick_allele, but adds the PICK flag to the chosen block of consequence data and retains others. Not used by default. Output field: PICK
--flag_pick_allele_gene
  As per --pick_allele_gene, but adds the PICK flag to the chosen block of consequence data and retains others. Not used by default. Output field: PICK  
--pick_order [c1,c2,...,cN]
 

Customise the order of criteria (and the list of criteria) applied when choosing a block of annotation data with one of the following options: --pick, --pick_allele, --per_gene, --pick_allele_gene, --flag_pick, --flag_pick_allele, --flag_pick_allele_gene. See this page for the default order.
Valid criteria are: mane_select, mane_plus_clinical, canonical, appris, tsl, biotype, ccds, rank, length, ensembl, refseq. e.g.:

--pick --pick_order tsl,appris,rank
   
--most_severe
  Output only the most severe consequence per variant. Transcript-specific columns will be left blank. Consequence ranks are given in this table.
To include regulatory consequences, use the --regulatory option in combination with this flag.
Not used by default
 
--summary
  Output only a comma-separated list of all observed consequences per variant. Transcript-specific columns will be left blank. Not used by default  
--flag_gencode_primary
  Flags transcripts as GENCODE primary using a boolean value. Not used by default

Note

Only available for human on the GRCh38 assembly
Output field: GENCODE_PRIMARY  
--filter_common
  Shortcut flag for the filters below: excludes variants that have a co-located existing variant with global AF > 0.01 (1%). May be modified using any of the --freq_* filters below. Not used by default. Output field: FREQS  
--check_frequency
  Turns on frequency filtering. Use this to include or exclude variants based on the frequency of co-located existing variants in the Ensembl Variation database. You must also specify all of the --freq_* flags below. Frequencies used in filtering are added to the output under the FREQS key in the Extra field. Not used by default. Output field: FREQS  
--freq_pop [pop]
  Name of the population to use in frequency filter. This must be one of the following:

Name          Description
1KG_ALL       1000 genomes combined population (global)
1KG_AFR       1000 genomes combined African population
1KG_AMR       1000 genomes combined American population
1KG_EAS       1000 genomes combined East Asian population
1KG_EUR       1000 genomes combined European population
1KG_SAS       1000 genomes combined South Asian population
gnomADe       gnomAD exomes combined population
gnomADe_AFR   gnomAD exomes African/African American population
gnomADe_AMR   gnomAD exomes Latino population
gnomADe_ASJ   gnomAD exomes Ashkenazi Jewish population
gnomADe_EAS   gnomAD exomes East Asian population
gnomADe_FIN   gnomAD exomes Finnish population
gnomADe_NFE   gnomAD exomes non-Finnish European population
gnomADe_OTH   gnomAD exomes other population
gnomADe_SAS   gnomAD exomes South Asian population
gnomADg       gnomAD genomes combined population
gnomADg_AFR   gnomAD genomes African/African American population
gnomADg_AMR   gnomAD genomes Latino population
gnomADg_AMI   gnomAD genomes Amish population
gnomADg_ASJ   gnomAD genomes Ashkenazi Jewish population
gnomADg_EAS   gnomAD genomes East Asian population
gnomADg_FIN   gnomAD genomes Finnish population
gnomADg_MID   gnomAD genomes Mid-eastern population
gnomADg_NFE   gnomAD genomes non-Finnish European population
gnomADg_OTH   gnomAD genomes other population
gnomADg_SAS   gnomAD genomes South Asian population

   
--freq_freq [freq]
  Allele frequency to use for filtering. Must be a float value between 0 and 1    
--freq_gt_lt [gt|lt]
  Specify whether the frequency of the co-located variant must be greater than (gt) or less than (lt) the value specified with --freq_freq    
--freq_filter [exclude|include]
  Specify whether to exclude or include only variants that pass the frequency filter    
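
Because --check_frequency needs all four --freq_* flags, a complete invocation may help. The sketch below excludes variants whose co-located existing variant has a frequency above 0.01 in the 1KG_ALL population. The thresholds follow the description of --filter_common; treating 1KG_ALL as the "global" population is an assumption of this sketch, and the file names are placeholders. If no ./vep script is found in the current directory, the command is only printed.

```shell
# Placeholder file names (assumptions); replace with your own.
INPUT=input.vcf
OUTPUT=output.txt

# All four --freq_* flags must accompany --check_frequency:
#   population, threshold, comparison direction, and include/exclude action.
FREQ_OPTS="--check_frequency --freq_pop 1KG_ALL --freq_freq 0.01 --freq_gt_lt gt --freq_filter exclude"

VEP_CMD="./vep --cache -i $INPUT -o $OUTPUT $FREQ_OPTS"

if [ -x ./vep ]; then
    $VEP_CMD              # run VEP when the script is available
else
    echo "$VEP_CMD"       # otherwise just show the command
fi
```

Swapping "exclude" for "include" would instead keep only the variants that pass the same frequency test.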


Database options

Flag Alternate Description Output fields Incompatible with
--database
  Enable VEP to use local or remote databases.  
--host [hostname]
  Manually define the database host to connect to. Users in the US may find connection and transfer speeds quicker using our East coast mirror, useastdb.ensembl.org. Default = "ensembldb.ensembl.org"    
--user [username]
Alternate: -u
Manually define the database username. Default = "anonymous"    
--password [password]
Alternate: --pass
Manually define the database password. Not used by default    
--port [number]
  Manually define the database port. Default = 5306    
--genomes
  Override the default connection settings with those for the Ensembl Genomes public MySQL server. Required when using any of the Ensembl Genomes species. Not used by default    
--is_multispecies [0|1]
  Some of the Ensembl Genomes databases (mainly bacteria and protists) are composed of a collection of closely related species. Setting this value to 1 updates the database connection settings (i.e. the database name) accordingly. Default: 0    
--lrg
  Map input variants to LRG coordinates (or to chromosome coordinates if given in LRG coordinates), and provide consequences on both LRG and chromosomal transcripts. Not used by default  
--db_version [number]
  Force VEP to connect to a specific version of the Ensembl databases. Not recommended as there may be conflicts between software and database versions. Not used by default    
--registry [filename]
  Defining a registry file overwrites other connection settings and uses those found in the specified registry file to connect. Not used by default
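
The connection options above might be combined as in the following sketch. The helper show_or_run is hypothetical (not part of VEP): it prints each command and runs it only if a ./vep script exists in the current directory. The file names are placeholders, and arabidopsis_thaliana is just an example Ensembl Genomes species passed through the --species option, which is not described in this section.

```shell
# Hypothetical helper: print the command, and run it only when ./vep exists.
show_or_run() {
    echo "$*"
    if [ -x ./vep ]; then "$@"; fi
}

# Use the US East coast mirror instead of the default host.
show_or_run ./vep --database --host useastdb.ensembl.org -i input.vcf -o output.txt

# Switch to the Ensembl Genomes server for a non-vertebrate species.
show_or_run ./vep --database --genomes --species arabidopsis_thaliana -i input.vcf -o output.txt
```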