Custom data sets

If you want to filter or customise your download, please try BioMart, a web-based querying tool.

FTP Download

Detailed information about the available data and file formats can be found here.

The data can also be downloaded directly from the Ensembl Metazoa FTP server.

Database dumps

Entire databases can be downloaded from our FTP site in a variety of formats. Please be aware that some of these files can run to many gigabytes of data.

Looking for MySQL dumps to install databases locally? Instructions for loading MySQL dumps onto a local MySQL server can be found on the Ensembl website.

Each directory on ftp.ensemblgenomes.org contains a README file, explaining the directory structure.

Programmatic data access

Data can be accessed programmatically in a number of ways, including the REST service and Perl API. For full details, see the Programmatic access documentation.

Multi-species data

Database | MySQL | TSV | EMF | MAF
Pan_compara Multi-species | MySQL | TSV | EMF | -
Metazoa Multi-species | MySQL | TSV | EMF | MAF
Ensembl Mart | MySQL | - | - | -

Single species data

Species | DNA | cDNA | CDS | ncRNA | Protein | EMBL | GenBank | MySQL | TSV | GTF | GFF3 | GVF | VCF | VEP
Acyrthosiphon pisum | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Aedes aegypti | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core), MySQL(otherfeatures), MySQL(funcgen), MySQL(variation) | TSV | GTF | GFF3 | GVF | VCF | VEP
Amphimedon queenslandica | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Anopheles darlingi | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Anopheles gambiae | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core), MySQL(otherfeatures), MySQL(funcgen), MySQL(variation) | TSV | GTF | GFF3 | GVF | VCF | VEP
Apis mellifera | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Atta cephalotes | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core), MySQL(otherfeatures) | TSV | GTF | GFF3 | - | - | VEP
Belgica antarctica | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Bombus impatiens | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Bombyx mori | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Brugia malayi | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Caenorhabditis brenneri | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Caenorhabditis briggsae | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Caenorhabditis elegans | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core), MySQL(funcgen) | TSV | GTF | GFF3 | - | - | VEP
Caenorhabditis japonica | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Caenorhabditis remanei | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Capitella teleta | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Crassostrea gigas | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Culex quinquefasciatus | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core), MySQL(otherfeatures) | TSV | GTF | GFF3 | - | - | VEP
Danaus plexippus | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Daphnia pulex | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Dendroctonus ponderosae | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Drosophila ananassae | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core), MySQL(otherfeatures) | TSV | GTF | GFF3 | - | - | VEP
Drosophila erecta | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core), MySQL(otherfeatures) | TSV | GTF | GFF3 | - | - | VEP
Drosophila grimshawi | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core), MySQL(otherfeatures) | TSV | GTF | GFF3 | - | - | VEP
Drosophila melanogaster | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core), MySQL(otherfeatures), MySQL(funcgen), MySQL(variation) | TSV | GTF | GFF3 | GVF | VCF | VEP
Drosophila mojavensis | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core), MySQL(otherfeatures) | TSV | GTF | GFF3 | - | - | VEP
Drosophila persimilis | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Drosophila pseudoobscura | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core), MySQL(otherfeatures) | TSV | GTF | GFF3 | - | - | VEP
Drosophila sechellia | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core), MySQL(otherfeatures) | TSV | GTF | GFF3 | - | - | VEP
Drosophila simulans | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core), MySQL(otherfeatures) | TSV | GTF | GFF3 | - | - | VEP
Drosophila virilis | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core), MySQL(otherfeatures) | TSV | GTF | GFF3 | - | - | VEP
Drosophila willistoni | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core), MySQL(otherfeatures) | TSV | GTF | GFF3 | - | - | VEP
Drosophila yakuba | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core), MySQL(otherfeatures) | TSV | GTF | GFF3 | - | - | VEP
Heliconius melpomene | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Helobdella robusta | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Ixodes scapularis | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core), MySQL(otherfeatures), MySQL(variation) | TSV | GTF | GFF3 | GVF | VCF | VEP
Lepeophtheirus salmonis | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Lingula anatina | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Loa loa | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Lottia gigantea | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Lucilia cuprina | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Megaselia scalaris | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Melitaea cinxia | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Mnemiopsis leidyi | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Nasonia vitripennis | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Nematostella vectensis | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Octopus bimaculoides | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Onchocerca volvulus | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Pediculus humanus | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Pristionchus pacificus | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Rhodnius prolixus | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Sarcoptes scabiei | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Schistosoma mansoni | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Solenopsis invicta | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core), MySQL(otherfeatures) | TSV | GTF | GFF3 | - | - | VEP
Stegodyphus mimosarum | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Strigamia maritima | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Strongylocentrotus purpuratus | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Strongyloides ratti | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Tetranychus urticae | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Thelohanellus kitauei | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Tribolium castaneum | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Trichinella spiralis | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Trichoplax adhaerens | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP
Zootermopsis nevadensis | FASTA (DNA) | FASTA (cDNA) | FASTA (CDS) | FASTA (ncRNA) | FASTA (protein) | EMBL | GenBank | MySQL(core) | TSV | GTF | GFF3 | - | - | VEP

To facilitate storage and download, all databases are compressed with GNU Zip (gzip, *.gz).
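
Because the dumps are gzip-compressed, text-based files can usually be read line by line without unpacking them to disk first. The following is a minimal sketch using Python's standard gzip module; the helper name iter_gz_lines and the UTF-8 encoding are assumptions made for illustration and are not part of the Ensembl tooling.

```python
import gzip


def iter_gz_lines(path):
    """Yield the lines of a gzip-compressed text file, without trailing newlines.

    Assumption: the file is plain text encoded as UTF-8 (or ASCII).
    """
    # mode="rt" makes gzip decompress and decode to str on the fly.
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            yield line.rstrip("\n")
```

A caller would loop over the generator, for example `for line in iter_gz_lines("downloaded_file.gz")`, where the file name is a placeholder for whichever dump was downloaded. Streaming in this way keeps memory use low for files that run to many gigabytes.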

Metadata

Detailed metadata on the genomes provided by Ensembl Genomes is available from the FTP site in TSV, JSON and XML formats (format details).

Ensembl Metazoa: TSV | JSON | XML

Ensembl Genomes (all divisions): TSV | JSON | XML

About the data

The following types of data dumps are available on the FTP site.

FASTA
FASTA sequence databases of Ensembl gene, transcript and protein model predictions. Since the FASTA format does not permit sequence annotation, these database files are mainly intended for use with local sequence similarity search algorithms. Each directory has a README file with a detailed description of the header line format and the file naming conventions.
DNA
Masked and unmasked genome sequences associated with the assembly (contigs, chromosomes etc.).
The header line in a FASTA dump file containing DNA sequence consists of the following attributes: coord_system:version:name:start:end:strand. This coordinate-system string is used in the Ensembl API to retrieve slices with the SliceAdaptor.
CDS
Coding sequences for Ensembl or ab initio predicted genes.
cDNA
cDNA sequences for Ensembl or ab initio predicted genes.
Peptides
Protein sequences for Ensembl or ab initio predicted genes.
RNA
Non-coding RNA gene predictions.
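
As an illustration of the DNA header attributes listed above (coord_system:version:name:start:end:strand), the sketch below splits such a string into named fields. The names Location and parse_location are hypothetical, and the sketch assumes that none of the six fields itself contains a colon; the README in each directory remains the authoritative description of the header line.

```python
from typing import NamedTuple


class Location(NamedTuple):
    """Fields of a coord_system:version:name:start:end:strand string."""
    coord_system: str
    version: str
    name: str
    start: int
    end: int
    strand: int


def parse_location(text: str) -> Location:
    """Split a colon-separated location string into its six attributes."""
    parts = text.strip().split(":")
    if len(parts) != 6:
        raise ValueError(
            f"expected 6 colon-separated fields, got {len(parts)}: {text!r}"
        )
    coord_system, version, name, start, end, strand = parts
    # start, end and strand are numeric; the other fields stay as text.
    return Location(coord_system, version, name, int(start), int(end), int(strand))
```

For a made-up value such as "chromosome:EXAMPLE1:2L:1:5000:1", parse_location returns a Location whose name is "2L", start is 1, end is 5000 and strand is 1.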
Annotated sequence
Flat files allow more extensive sequence annotation by means of feature tables, and thus contain the genome sequence as annotated by the automated Ensembl genome annotation pipeline. Each nucleotide sequence record in a flat file represents a 1Mb slice of the genome sequence. Flat files are broken into chunks of 1000 sequence records for easier downloading.
EMBL
Ensembl database dumps in EMBL nucleotide sequence database format.
GenBank
Ensembl database dumps in GenBank nucleotide sequence database format.
MySQL
All Ensembl MySQL databases are available in text format as are the SQL table definition files. These can be imported into any SQL database for a local installation of a mirror site. Generally, the FTP directory tree contains one directory per database. For more information about these databases and their Application Programming Interfaces (or APIs) see the API section.
GTF
Gene sets for each species. These files include annotations of both coding and non-coding genes. This file format is described here.
EMF flatfile dumps (variation and comparative data)

Alignments of resequencing data are available for several species as Ensembl Multi Format (EMF) flatfile dumps. The accompanying README file describes the file format.

The same format is also used to dump whole-genome multiple alignments, gene-based multiple alignments, and the phylogenetic trees used to infer Ensembl orthologues and paralogues. These files are available in the ensembl_compara database, which can be found in the mysql directory.

MAF (comparative data)

MAF files are provided for all pairwise alignments. The MAF file format is described here.

GVF (variation data)
GVF (Genome Variation Format) is a simple tab-delimited format derived from GFF3 for variation positions across the genome. There are GVF files for different types of variation data (e.g. somatic variants, structural variants etc). For more information see the "README" files in the GVF directory.
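
Since GVF is derived from GFF3, a data line can be treated as the nine tab-separated GFF3 columns (seqid, source, type, start, end, score, strand, phase, attributes). The sketch below is one possible way to read such a line; the function name parse_gvf_line is hypothetical, the sample line used with it is invented, and the attribute handling is deliberately simplistic (it does not decode escaped characters).

```python
GFF3_COLUMNS = (
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
)


def parse_gvf_line(line):
    """Parse one GVF/GFF3 data line into a dict, or return None for
    blank lines and lines starting with '#' (comments and pragmas)."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GFF3_COLUMNS):
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # The attributes column is a ';'-separated list of key=value pairs.
    record["attributes"] = dict(
        pair.split("=", 1)
        for pair in record["attributes"].split(";")
        if "=" in pair
    )
    return record
```

Given an invented line such as "2L", "example_source", "SNV", "100", "100", ".", "+", ".", "ID=1;Variant_seq=A;Reference_seq=G" joined by tabs, the result has type "SNV", integer start and end of 100, and an attributes dictionary with the keys ID, Variant_seq and Reference_seq.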
BED format files (comparative data)

Constrained elements calculated using GERP are available in BED format. For more information see the accompanying README file.

BED format is a simple line-based format. The first 3 mandatory columns are:

  • chromosome name (may start with 'chr' for compliance with UCSC)
  • start position. This is a 0-based position
  • end position.
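
The three mandatory columns above can be read with a few lines of Python. This is a sketch under two assumptions that are not stated in this document: that columns are separated by tabs, and that the end position is exclusive, as in the usual UCSC BED convention. Under that convention, converting to 1-based inclusive coordinates only requires adding 1 to the start. The function names are hypothetical.

```python
def parse_bed_line(line):
    """Return (chromosome, start, end) from the first three BED columns.

    start is 0-based; end is assumed to be exclusive (UCSC convention).
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 columns, got {len(fields)}")
    return fields[0], int(fields[1]), int(fields[2])


def to_one_based(start, end):
    """Convert a 0-based, end-exclusive interval to 1-based, inclusive."""
    return start + 1, end
```

For example, a line containing "chr2L", "99" and "200" describes an interval of 101 bases, which to_one_based expresses as positions 100 to 200 inclusive.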

More information on the BED file format