Drosophila simulans (Fruit fly, w501) Assembly and Gene Annotation
About Drosophila simulans
Drosophila simulans is closely related to D. melanogaster in evolutionary terms, and was one of 12 fruit fly genomes sequenced for a large comparative study [1]. Besides its value for comparison with the exceptionally well-studied D. melanogaster, D. simulans is useful for research into speciation because of its close relationship with D. sechellia and D. mauritiana [2]. Ensembl Genomes imports data from FlyBase, which also provides much more information about the biology of Drosophila simulans and a phylogeny of the 12 sequenced fruit fly species.
Picture credit (Creative Commons BY-NC-SA 2.0 FR): Nicolas Gompel 2008. Image shows a female fly.
Assembly
The ASM75419v3 assembly of Drosophila simulans is a chromosome-level assembly with a genome size of ~125 Mb [3].
Annotation
Protein-coding and RNA genes, which were annotated with the NCBI eukaryotic genome annotation pipeline, were imported from FlyBase, release dsim_r2.02 (FB2017_04).
Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are 180,460 low-complexity (Dust) features, covering 6 Mb (4.6% of the genome); 31,441 RepeatMasker features (using the RepBase library), covering 9 Mb (7.2% of the genome); and 60,487 tandem repeat (TRF) features, covering 3 Mb (2.5% of the genome).
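The page does not say exactly how these coverage figures are computed. As a minimal sketch, assuming that "coverage" means the number of bases in the union of a feature set (so overlapping features are counted only once) divided by the golden path length of 124,963,774 bases, the calculation could look like this. The 1-based inclusive coordinates, the `Feature` tuple layout and all function names are illustrative assumptions, not part of the Ensembl Genomes pipeline.

```python
from typing import Iterable, List, Tuple

# Golden path length reported in the Statistics table on this page.
GOLDEN_PATH_LENGTH = 124_963_774

# Hypothetical feature layout: (seq_region, start, end), 1-based and inclusive.
Feature = Tuple[str, int, int]


def merge_features(features: Iterable[Feature]) -> List[Feature]:
    """Merge overlapping or abutting features that lie on the same seq_region."""
    merged: List[Feature] = []
    for region, start, end in sorted(features):
        if merged and merged[-1][0] == region and start <= merged[-1][2] + 1:
            # Same region and overlapping/adjacent: extend the previous interval.
            prev_region, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_region, prev_start, max(prev_end, end))
        else:
            # New region, or a gap after the previous interval: start a new one.
            merged.append((region, start, end))
    return merged


def covered_bases(features: Iterable[Feature]) -> int:
    """Count bases covered by at least one feature, without double counting."""
    return sum(end - start + 1 for _, start, end in merge_features(features))


def percent_of_genome(bases: int, genome_length: int = GOLDEN_PATH_LENGTH) -> float:
    """Express a base count as a percentage of the genome (golden path) length."""
    return 100.0 * bases / genome_length


if __name__ == "__main__":
    # Hypothetical toy features, not real D. simulans annotation.
    toy = [("2L", 100, 199), ("2L", 150, 249), ("3R", 100, 199)]
    bases = covered_bases(toy)
    print(bases, f"{percent_of_genome(bases):.6f}%")
```

Plugging in the rounded 9 Mb figure for RepeatMasker features, `percent_of_genome(9_000_000)` gives roughly 7.2%, in line with the value stated above. Because the Mb values on this page are rounded, the other percentages cannot be reproduced exactly this way.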
Protein domains were annotated with the Ensembl Genomes protein feature pipeline.
References
- Evolution of genes and genomes on the Drosophila phylogeny.
Drosophila 12 Genomes Consortium, Clark AG, Eisen MB, Smith DR, Bergman CM, Oliver B, Markow TA, Kaufman TC, Kellis M, Gelbart W et al. 2007. Nature. 450:203-218.
- The reproductive relationships of Drosophila sechellia with D. mauritiana, D. simulans, and D. melanogaster from the afrotropical region.
Lachaise D, David JR, Lemeunier F, Tsacas L, Ashburner M. 1986. Evolution. 40(2):262-271.
- A second-generation assembly of the Drosophila simulans genome provides new insights into patterns of lineage-specific divergence.
Hu TT, Eisen MB, Thornton KR, Andolfatto P. 2012. Genome Research. 23:89-98.
Statistics
Summary
Assembly | ASM75419v3, INSDC Assembly GCA_000754195.3 |
Database version | 113.3 |
Golden Path Length | 124,963,774 |
Genebuild by | FlyBase |
Genebuild method | Import |
Data source | FlyBase |
Gene counts
Coding genes | 14,179 |
Non coding genes | 1,180 |
Small non coding genes | 1,180 |
Pseudogenes | 187 |
Gene transcripts | 26,261 |