Drosophila melanogaster (Fruit fly) (BDGP6.46)

Drosophila melanogaster (Fruit fly) Assembly and Gene Annotation

About Drosophila melanogaster

Drosophila melanogaster is a cosmopolitan species of fruitfly that has been used as a model organism for over a hundred years, particularly in genetics and developmental biology. It was the second metazoan (the first being Caenorhabditis elegans) to have its genome sequenced [1], and was one of 12 fruitfly genomes included in a large comparative study [2]. Ensembl Genomes imports data from FlyBase, which also provides much more information about the biology of Drosophila melanogaster and a phylogeny of the 12 sequenced fruitfly species.

Picture credit (Creative Commons BY-NC-SA 2.0 FR): Nicolas Gompel 2008. Image shows a female fly.

Assembly

Ensembl Metazoa uses the Berkeley Drosophila Genome Project (BDGP) assembly release 6 (July 2014). In contrast to release 5, regions without a chromosome assignment are stored as distinct scaffolds, rather than on a single pseudo-chromosome, and heterochromatin regions are incorporated in the chromosomal arms.

Annotation

Protein-coding and RNA genes were imported from FlyBase, release dmel_r6.46 (FB2022_03).

Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are: 222,250 Low complexity (Dust) features, covering 8 Mb (5.5% of the genome); 48,677 RepeatMasker features (with a custom library), covering 28 Mb (19.5% of the genome); 1,426 RepeatMasker features (with the RepBase library), covering 0 Mb (0.1% of the genome); 72,600 Tandem repeats (TRF) features, covering 6 Mb (4.1% of the genome).
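The coverage figures above can be cross-checked against the Golden Path Length reported in the Statistics section of this page. The following is a minimal sketch, not part of the Ensembl pipeline: the names GOLDEN_PATH_LENGTH_BP, REPEAT_TRACKS, percent_to_mb and check_tracks are hypothetical, and it assumes that each percentage is taken relative to the Golden Path Length and that 1 Mb means 1,000,000 bp.

```python
# Golden Path Length (bp) as reported in the Statistics section of this page.
GOLDEN_PATH_LENGTH_BP = 143_726_002

# Repeat tracks quoted above: (feature count, stated Mb, stated % of genome).
REPEAT_TRACKS = {
    "Low complexity (Dust)": (222_250, 8, 5.5),
    "RepeatMasker (custom library)": (48_677, 28, 19.5),
    "RepeatMasker (RepBase library)": (1_426, 0, 0.1),
    "Tandem repeats (TRF)": (72_600, 6, 4.1),
}


def percent_to_mb(percent, genome_length_bp=GOLDEN_PATH_LENGTH_BP):
    """Convert a percentage of the genome into megabases (assumes 1 Mb = 1,000,000 bp)."""
    return percent / 100 * genome_length_bp / 1_000_000


def check_tracks(tracks=REPEAT_TRACKS):
    """Return {track name: (recomputed Mb, True if it rounds to the stated Mb)}."""
    results = {}
    for name, (_count, stated_mb, percent) in tracks.items():
        mb = percent_to_mb(percent)
        results[name] = (mb, round(mb) == stated_mb)
    return results


if __name__ == "__main__":
    for name, (mb, ok) in check_tracks().items():
        print(f"{name}: {mb:.2f} Mb, consistent with stated value: {ok}")
```

Under these assumptions, the "0 Mb" quoted for the RepBase library track corresponds to roughly 0.14 Mb before rounding. Features from different tracks may overlap, so the percentages should probably not be summed to estimate the total repeat content.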

Protein domains were annotated with the Ensembl Genomes protein feature pipeline.

References

  1. The genome sequence of Drosophila melanogaster.
    Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, Scherer SE, Li PW, Hoskins RA, Galle RF et al. 2000. Science. 287:2185-2195.
  2. Evolution of genes and genomes on the Drosophila phylogeny.
    Drosophila 12 Genomes Consortium, Clark AG, Eisen MB, Smith DR, Bergman CM, Oliver B, Markow TA, Kaufman TC, Kellis M, Gelbart W et al. 2007. Nature. 450:203-218.

Statistics

Summary

Assembly: BDGP6.46, INSDC Assembly GCA_000001215.4
Database version: 112.10
Golden Path Length: 143,726,002
Genebuild by: FlyBase
Genebuild method: Import
Data source: FlyBase

Gene counts

Coding genes: 13,986
Non coding genes: 4,054
Small non coding genes: 4,054
Pseudogenes: 340
Gene transcripts: 41,620