Drosophila melanogaster Assembly and Gene Annotation

About Drosophila melanogaster

Drosophila melanogaster is a cosmopolitan species of fruitfly that has been used as a model organism for over a hundred years, particularly in genetics and developmental biology. It was the second metazoan (the first being Caenorhabditis elegans) to have its genome sequenced [1], and was one of 12 fruitfly genomes included in a large comparative study [2]. Ensembl Genomes imports data from FlyBase, which also provides much more information about the biology of Drosophila melanogaster, as well as a phylogeny of the 12 sequenced fruitfly species.

Picture credit (Creative Commons BY-NC-SA 2.0 FR): Nicolas Gompel 2008. Image shows a female fly.


Assembly

Ensembl Metazoa uses the Berkeley Drosophila Genome Project (BDGP) assembly release 6 (July 2014). In contrast to release 5, regions without a chromosome assignment are stored as distinct scaffolds, rather than on a single pseudo-chromosome, and heterochromatin regions are incorporated in the chromosomal arms.


Annotation

Protein-coding and RNA genes were imported from FlyBase, release dmel_r6.17 (FB2017_04).

Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are: 222,247 Low complexity (Dust) features, covering 8 Mb (5.4% of the genome); 44,547 RepeatMasker features (with the RepBase library), covering 27 Mb (18.5% of the genome); 72,600 Tandem repeats (TRF) features, covering 6 Mb (4.1% of the genome).
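As a rough consistency check on the repeat figures above, the coverage percentages can be converted back to megabases. The sketch below assumes the percentages are taken relative to the golden path length of 143,725,995 bp reported in the statistics for this assembly. That choice of denominator, and all names used, are assumptions for illustration and do not describe how the repeat feature pipeline itself computes coverage.

```python
# Assumption: percentages are relative to the golden path length (in bp).
GOLDEN_PATH_LENGTH = 143_725_995

# Figures quoted in the text: (feature count, reported Mb, reported % of genome).
REPEAT_STATS = {
    "Dust": (222_247, 8, 5.4),
    "RepeatMasker": (44_547, 27, 18.5),
    "TRF": (72_600, 6, 4.1),
}


def covered_mb(percent, genome_length=GOLDEN_PATH_LENGTH):
    """Convert a percentage of the genome into megabases covered."""
    return genome_length * percent / 100 / 1_000_000


def check_repeat_stats(stats=REPEAT_STATS):
    """Return {name: (estimated Mb, True if it rounds to the reported Mb)}."""
    result = {}
    for name, (_count, reported_mb, percent) in stats.items():
        estimate = covered_mb(percent)
        result[name] = (estimate, round(estimate) == reported_mb)
    return result


if __name__ == "__main__":
    for name, (estimate, ok) in check_repeat_stats().items():
        print(f"{name}: ~{estimate:.2f} Mb, consistent with text: {ok}")
```

Under this assumption, each percentage rounds to the megabase value quoted in the text, which suggests the two sets of figures are reported against the same genome length.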

Protein domains were annotated with the Ensembl Genomes protein feature pipeline.


Variation

The Ensembl Metazoa Drosophila melanogaster variation database was produced using data from release 1.0 of the "50 genomes" data set from the Drosophila Population Genomics Project. The data set contains over 6.7 million SNPs from two populations, one comprising 37 lines from North Carolina and the other comprising 15 lines from Malawi. The SNPs were originally called against assembly version BDGP5 and were then mapped to BDGP6 using the FlyBase converter tool; only 308 SNPs could not be mapped.
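To put the 308 unmapped SNPs in proportion, the sketch below computes the fraction that failed to map. It assumes that the 6,794,559 short variants listed in the statistics for this database are the SNPs that mapped successfully to BDGP6, so that the original total is that count plus 308. This reading is an assumption, not something the text states.

```python
MAPPED_SNPS = 6_794_559  # assumption: short variants remaining after mapping
UNMAPPED_SNPS = 308      # from the text


def unmapped_fraction(mapped=MAPPED_SNPS, unmapped=UNMAPPED_SNPS):
    """Fraction of the original SNP set that could not be mapped."""
    total = mapped + unmapped
    return unmapped / total


if __name__ == "__main__":
    print(f"Unmapped: {unmapped_fraction():.6%} of all SNPs")
```

On these figures the loss from the assembly conversion is well under 0.01% of the data set.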

EST Alignments

Drosophila melanogaster ESTs from dbEST were mapped onto the genome using Exonerate. Drosophila melanogaster cDNA sequences from the BDGP were also aligned to the genome using Exonerate, and the BDGP "Gold Collection" is displayed as a separate track in the genome browser.
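An EST-to-genome alignment of this kind could be launched along the following lines. The sketch only assembles an Exonerate command line and optionally runs it. The file names, the est2genome model, and the thresholds are illustrative assumptions, since the text does not give the parameters that were actually used.

```python
import shlex
import subprocess


def build_exonerate_command(query_fasta, target_fasta, percent=50, bestn=1):
    """Assemble an Exonerate command for spliced EST-to-genome alignment.

    All parameter values here are hypothetical defaults, not the settings
    used to produce the Ensembl Metazoa tracks.
    """
    return [
        "exonerate",
        "--model", "est2genome",      # spliced alignment of ESTs to genomic DNA
        "--query", query_fasta,       # e.g. ESTs or cDNAs in FASTA format
        "--target", target_fasta,     # e.g. the genome assembly in FASTA format
        "--percent", str(percent),    # minimum percentage of the maximal score
        "--bestn", str(bestn),        # report only the best N hits per query
        "--showalignment", "no",
        "--showvulgar", "no",
        "--showtargetgff", "yes",     # emit GFF features on the target sequence
    ]


def run_exonerate(command):
    """Run Exonerate (must be installed and on PATH) and return its stdout."""
    completed = subprocess.run(command, check=True, capture_output=True, text=True)
    return completed.stdout


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_exonerate_command("dmel_ests.fasta", "dmel_bdgp6.fasta")
    print(shlex.join(cmd))
```

Keeping command construction separate from execution makes it possible to inspect or log the exact command before any alignment is run.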


References

  1. The genome sequence of Drosophila melanogaster.
    Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, Scherer SE, Li PW, Hoskins RA, Galle RF et al. 2000. Science. 287:2185-2195.
  2. Evolution of genes and genomes on the Drosophila phylogeny.
    Drosophila 12 Genomes Consortium, Clark AG, Eisen MB, Smith DR, Bergman CM, Oliver B, Markow TA, Kaufman TC, Kellis M, Gelbart W et al. 2007. Nature. 450:203-218.

More information

General information about this species can be found in Wikipedia.



Assembly: BDGP6, INSDC Assembly GCA_000001215.4, Jul 2014
Database version: 91.6
Base Pairs: 142,573,017
Golden Path Length: 143,725,995
Genebuild by: FlyBase
Genebuild method: Import
Data source: FlyBase

Gene counts

Coding genes: 13,931
Non coding genes: 3,497
Small non coding genes: 3,497
Gene transcripts: 34,776


Short Variants: 6,794,559
