Drosophila simulans Assembly and Gene Annotation

About Drosophila simulans

Drosophila simulans is a close evolutionary relative of D. melanogaster, and was one of 12 fruitfly genomes sequenced for a large comparative study [1]. In addition to comparison with the exceptionally well-studied D. melanogaster, D. simulans is useful for research into speciation due to its close relationship with D. sechellia and D. mauritiana [2]. Ensembl Genomes imports data from FlyBase, which also provides much more information about the biology of Drosophila simulans, and a phylogeny of the 12 sequenced fruitfly species.

Picture credit (Creative Commons BY-NC-SA 2.0 FR): Nicolas Gompel 2008. Image shows a female fly.

Assembly

The ASM75419v3 assembly of Drosophila simulans is a chromosome-level assembly with a genome size of ~125 Mb [3].

Annotation

Protein-coding and RNA genes, which were annotated with the NCBI eukaryotic genome annotation pipeline, were imported from FlyBase, release dsim_r2.02 (FB2017_04).

Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are: 196,738 Low complexity (Dust) features, covering 16 Mb (11.8% of the genome); 40,696 RepeatMasker features (with the RepBase library), covering 16 Mb (11.5% of the genome); 61,523 Tandem repeats (TRF) features, covering 5 Mb (3.6% of the genome).

Protein domains were annotated with the Ensembl Genomes protein feature pipeline.

References

  1. Evolution of genes and genomes on the Drosophila phylogeny.
    Drosophila 12 Genomes Consortium, Clark AG, Eisen MB, Smith DR, Bergman CM, Oliver B, Markow TA, Kaufman TC, Kellis M, Gelbart W et al. 2007. Nature. 450:203-218.
  2. The reproductive relationships of Drosophila sechellia with D. mauritiana, D. simulans, and D. melanogaster from the afrotropical region.
    Lachaise D, David JR, Lemeunier F, Tsacas L, Ashburner M. 1986. Evolution. 40(2):262-271.
  3. A second-generation assembly of the Drosophila simulans genome provides new insights into patterns of lineage-specific divergence.
    Hu TT, Eisen MB, Thornton KR, Andolfatto P. 2012. Genome Research. 23:89-98.

Statistics

Summary

Assembly: ASM75419v3, INSDC Assembly GCA_000754195.3
Database version: 93.2
Base Pairs: 124,963,774
Golden Path Length: 124,963,774
Genebuild by: FlyBase
Genebuild method: Import
Data source: FlyBase

Gene counts

Coding genes: 14,179
Non coding genes: 1,019
Small non coding genes: 1,019
Pseudogenes: 187
Gene transcripts: 26,100
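
The gene counts above can be combined into a couple of summary figures. The following is a minimal Python sketch, not part of the Ensembl Genomes pipeline. It rests on two assumptions: that the small non-coding genes are a subcategory of the non-coding genes (the two counts are identical here) and so should not be added twice, and that the transcript count covers coding genes, non-coding genes and pseudogenes together. The function and constant names are hypothetical.

```python
# Gene counts copied from the statistics above (database version 93.2).
CODING_GENES = 14_179
NON_CODING_GENES = 1_019
SMALL_NON_CODING_GENES = 1_019  # assumption: a subset of NON_CODING_GENES
PSEUDOGENES = 187
GENE_TRANSCRIPTS = 26_100


def total_genes(coding: int, non_coding: int, pseudogenes: int) -> int:
    """Sum the top-level gene categories, leaving out the small
    non-coding subcategory so that it is not counted twice."""
    return coding + non_coding + pseudogenes


def transcripts_per_gene(transcripts: int, genes: int) -> float:
    """Mean number of transcripts per annotated gene."""
    return transcripts / genes


total = total_genes(CODING_GENES, NON_CODING_GENES, PSEUDOGENES)
ratio = transcripts_per_gene(GENE_TRANSCRIPTS, total)

print(f"Total annotated genes: {total:,}")        # Total annotated genes: 15,385
print(f"Mean transcripts per gene: {ratio:.2f}")  # Mean transcripts per gene: 1.70
```

If the transcript count in fact excludes pseudogenes, the denominator would need to be adjusted accordingly.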
