Drosophila pseudoobscura pseudoobscura (Fruit fly, MV2-25) Assembly and Gene Annotation
About Drosophila pseudoobscura
Drosophila pseudoobscura is a North American fruitfly that has been used extensively in genetic studies, particularly of speciation. It was the second fruitfly species to have its genome sequenced [1], and it was one of 12 fruitfly genomes included in a large comparative study [2]. Ensembl Genomes imports data from FlyBase, which also provides much more information about the biology of Drosophila pseudoobscura, as well as a phylogeny of the 12 sequenced fruitfly species.
Picture credit: MA Hanson, Creative Commons Attribution 4.0, via Wikimedia Commons
Assembly
Ensembl Metazoa uses the v3.0 genome assembly of Drosophila pseudoobscura [3]. The genome was sequenced and assembled by the Human Genome Sequencing Center at Baylor College of Medicine (BCM-HGSC); further details are provided by FlyBase.
Annotation
Protein-coding and RNA genes, which were annotated with the NCBI eukaryotic genome annotation pipeline, were imported from FlyBase, release dpse_r3.04 (FB2017_04).
Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are:
- 286,881 Low complexity (Dust) features, covering 13 Mb (8.8% of the genome);
- 25,800 RepeatMasker features (with the RepBase library), covering 10 Mb (6.7% of the genome);
- 181,856 Tandem repeat (TRF) features, covering 12 Mb (7.7% of the genome).
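The page does not say how overlapping repeat features are counted when coverage is reported. The sketch below makes one assumption explicit: "coverage" means the number of distinct bases covered, so overlapping features are counted only once. Under that assumption, it computes coverage and expresses it as a percentage of the golden path length of 152,696,384 bp given in the Statistics section. The feature tuples `(seq_region, start, end)` and the helper names are hypothetical and illustrative, not part of any Ensembl API. Because the Mb figures above are rounded, dividing them by the golden path length agrees only approximately with the stated percentages.

```python
from typing import Iterable, Tuple

# Golden path length in bp, taken from the Statistics table on this page.
GOLDEN_PATH_LENGTH = 152_696_384

# Hypothetical feature representation: (seq_region, start, end), half-open [start, end).
Feature = Tuple[str, int, int]


def merged_coverage(features: Iterable[Feature]) -> int:
    """Count distinct bases covered by features, merging overlaps within each seq_region."""
    total = 0
    current = None  # (seq_region, start, end) of the merged block being extended
    for seq, start, end in sorted(features):
        if current is not None and seq == current[0] and start <= current[2]:
            # Overlapping or adjacent on the same sequence: extend the block.
            current = (seq, current[1], max(current[2], end))
        else:
            # New block: close off the previous one first.
            if current is not None:
                total += current[2] - current[1]
            current = (seq, start, end)
    if current is not None:
        total += current[2] - current[1]
    return total


def percent_of_genome(covered_bp: int, genome_bp: int = GOLDEN_PATH_LENGTH) -> float:
    """Express covered bases as a percentage of the genome length."""
    return 100.0 * covered_bp / genome_bp


if __name__ == "__main__":
    toy = [("chr2", 0, 100), ("chr2", 50, 150), ("chr2", 200, 210), ("chr3", 0, 100)]
    covered = merged_coverage(toy)
    print(covered, f"{percent_of_genome(covered):.6f}%")
```

In the toy example, the two overlapping `chr2` features contribute 150 bp rather than 200 bp. Summing feature lengths naively would overstate coverage.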
Protein domains were annotated with the Ensembl Genomes protein feature pipeline.
References
- Comparative genome sequencing of Drosophila pseudoobscura: chromosomal, gene, and cis-element evolution.
Richards S, Liu Y, Bettencourt BR, Hradecky P, Letovsky S, Nielsen R, Thornton K, Hubisz MJ, Chen R, Meisel RP et al. 2005. Genome Research. 15:1-18.
- Evolution of genes and genomes on the Drosophila phylogeny.
Drosophila 12 Genomes Consortium, Clark AG, Eisen MB, Smith DR, Bergman CM, Oliver B, Markow TA, Kaufman TC, Kellis M, Gelbart W et al. 2007. Nature. 450:203-218.
- Mind the gap: upgrading genomes with Pacific Biosciences RS long-read sequencing technology.
English AC, Richards S, Han Y, Wang M, Vee V, Qu J, Qin X, Muzny DM, Reid JG, Worley KC et al. 2012. PLoS ONE. 7:e47768.
Statistics
Summary
Assembly | Dpse_3.0, INSDC Assembly GCA_000001765.2 |
Database version | 113.4 |
Golden Path Length | 152,696,384 bp |
Genebuild by | FlyBase |
Genebuild method | Import |
Data source | FlyBase |
Gene counts
Coding genes | 14,574 |
Non-coding genes | 2,449 |
Small non-coding genes | 2,449 |
Pseudogenes | 273 |
Gene transcripts | 27,888 |
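The gene counts can be combined into a total gene count and an average number of transcripts per gene. This rests on assumptions the page does not state, so treat it as a sketch:

- Coding genes, non-coding genes and pseudogenes are mutually exclusive categories.
- "Small non-coding genes" is a subset of "Non-coding genes", since the two counts are identical, and so is not added a second time.
- The transcript total spans all three gene categories.

The dictionary keys are illustrative names only.

```python
# Counts copied from the Gene counts table above.
# Assumption: "Small non-coding genes" (2,449) is a subset of "Non-coding genes" (2,449),
# so only the non-coding total is included here.
GENE_COUNTS = {
    "coding": 14_574,
    "non_coding": 2_449,
    "pseudogene": 273,
}
GENE_TRANSCRIPTS = 27_888

# Assumption: the three categories do not overlap, so they can simply be summed.
total_genes = sum(GENE_COUNTS.values())

# Assumption: the transcript count covers genes of all three categories.
transcripts_per_gene = GENE_TRANSCRIPTS / total_genes

print(total_genes)                    # 17296
print(f"{transcripts_per_gene:.2f}")  # 1.61
```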