Drosophila yakuba (Fruit fly, Tai18E2) (dyak_caf1)

Drosophila yakuba (Fruit fly, Tai18E2) Assembly and Gene Annotation

About Drosophila yakuba

Drosophila yakuba is an African fruit fly found predominantly in open savanna. It was one of 12 fruit fly species whose genomes were sequenced for a large comparative study [1]. Ensembl Genomes imports data from FlyBase, which also provides more information about the biology of Drosophila yakuba and a phylogeny of the 12 sequenced species.

Picture credit (Creative Commons BY-NC-SA 2.0 FR): Nicolas Gompel 2008. Image shows a female fly.

Assembly

This is genome assembly 2.0 of Drosophila yakuba, released in November 2005. The Genome Sequencing Center, Washington University School of Medicine in St. Louis (WUGSC) performed the whole-genome shotgun sequencing, to 9.1x coverage, and provided the assembly.
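
Average shotgun depth is the total number of sequenced bases divided by the genome size, so a 9.1x depth implies roughly 9.1 times the genome's length in raw sequence. The sketch below illustrates that arithmetic under one stated assumption: the golden path length from the Statistics section (165,693,946 bp) stands in for the true genome size, which the sequencing centre may have estimated differently. The function name is hypothetical.

```python
# Sketch: raw sequence implied by an average shotgun depth.
# Assumption: the assembly's golden path length approximates genome size.

def raw_bases_for_depth(depth: float, genome_size_bp: int) -> float:
    """Depth c = total sequenced bases / genome size, so bases = c * G."""
    return depth * genome_size_bp

GOLDEN_PATH_BP = 165_693_946  # from the Statistics summary
raw = raw_bases_for_depth(9.1, GOLDEN_PATH_BP)
print(f"~{raw / 1e9:.2f} Gb of raw sequence")
```
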

The data were assembled with PCAP [2], followed by two rounds of primer-directed sequence improvement that targeted regions of low-quality sequence and closed gaps. The data were then reassembled, and supercontigs were assigned to chromosomes based on their alignment to the D. melanogaster genome. Inversions inferred from the D. yakuba assembly were then introduced and checked against polytene chromosome banding data.
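
One simple way to picture the chromosome-assignment step is to give each supercontig the D. melanogaster chromosome arm to which most of its bases align. The sketch below does exactly that and nothing more: it is a hypothetical illustration, not the procedure WUGSC used, the supercontig names and alignment lengths are invented, and it ignores orientation and the inversions mentioned above.

```python
from collections import defaultdict

# Hypothetical alignment record: (supercontig, melanogaster_arm, aligned_bp)
Alignment = tuple[str, str, int]

def assign_supercontigs(alignments: list[Alignment]) -> dict[str, str]:
    """Assign each supercontig to the reference arm with the most aligned bases.

    Ties go to the arm encountered first. Orientation and inversions are ignored.
    """
    totals: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for contig, arm, aligned_bp in alignments:
        totals[contig][arm] += aligned_bp
    return {contig: max(arms, key=arms.get) for contig, arms in totals.items()}

# Invented example data (supercontig names are hypothetical).
example = [
    ("sc_1", "2L", 120_000),
    ("sc_1", "3R", 8_000),
    ("sc_2", "X", 95_000),
    ("sc_2", "X", 40_000),
    ("sc_2", "2R", 60_000),
]
print(assign_supercontigs(example))  # {'sc_1': '2L', 'sc_2': 'X'}
```

Summing aligned bases per arm, rather than taking the single longest hit, keeps a supercontig with several moderate hits to one arm from being pulled elsewhere by one isolated alignment.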

Annotation

Protein-coding and RNA genes, annotated with the NCBI eukaryotic genome annotation pipeline, were imported from FlyBase release dyak_r1.05 (FB2017_04).

Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are 254,353 low-complexity (Dust) features, covering 11 Mb (6.6% of the genome); 70,827 RepeatMasker features (using the RepBase library), covering 31 Mb (19.0% of the genome); and 122,416 tandem repeat (TRF) features, covering 9 Mb (5.5% of the genome).
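
The percentages appear to be each feature type's covered bases divided by the golden path length (165,693,946 bp). A minimal sketch of that calculation follows, assuming coverage is measured on the union of features so that overlapping features are not double-counted; the source does not say how the pipeline handles overlaps, and the function names and toy intervals are hypothetical.

```python
# Hypothetical input: repeat features as half-open [start, end) intervals,
# grouped by sequence region. This does not read real pipeline output.
Intervals = dict[str, list[tuple[int, int]]]

def merged_length(intervals: list[tuple[int, int]]) -> int:
    """Bases covered by the union of intervals on one sequence region."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is not None and start <= cur_end:
            # Overlapping or touching interval: extend the current run.
            cur_end = max(cur_end, end)
        else:
            # Gap: close the previous run (if any) and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
    if cur_end is not None:
        total += cur_end - cur_start
    return total

def covered_bp(features: Intervals) -> int:
    """Sum per-region union lengths, so each base is counted at most once."""
    return sum(merged_length(ivs) for ivs in features.values())

def percent_of_genome(covered: int, golden_path_bp: int) -> float:
    return 100.0 * covered / golden_path_bp

toy = {"region_a": [(0, 100), (50, 150), (300, 400)], "region_b": [(10, 20)]}
print(covered_bp(toy))  # 260
print(f"{percent_of_genome(11_000_000, 165_693_946):.1f}%")  # 6.6%
```

Because the Mb figures above are rounded, recomputing percentages from them only approximates the stated values; for Dust, 11 Mb gives about 6.6%.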

Protein domains were annotated with the Ensembl Genomes protein feature pipeline.

References

  1. Evolution of genes and genomes on the Drosophila phylogeny.
    Drosophila 12 Genomes Consortium, Clark AG, Eisen MB, Smith DR, Bergman CM, Oliver B, Markow TA, Kaufman TC, Kellis M, Gelbart W et al. 2007. Nature. 450:203-218.
  2. PCAP: a whole-genome assembly program.
    Huang X, Wang J, Aluru S, Yang SP, Hillier L. 2003. Genome Research. 13:2164-2170.

Statistics

Summary

Assembly: dyak_caf1 (INSDC Assembly GCA_000005975.1)
Database version: 112.2
Golden Path Length: 165,693,946 bp
Genebuild by: FlyBase
Genebuild method: Import
Data source: FlyBase

Gene counts

Coding genes: 14,824
Non-coding genes: 1,187
Small non-coding genes: 1,187
Pseudogenes: 338
Gene transcripts: 25,303
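
The counts above can be combined into a total gene count and an average number of transcripts per gene. The sketch assumes, as is usual on Ensembl statistics pages, that small non-coding genes are a subset of the non-coding total (here they are equal) and that the coding, non-coding and pseudogene categories do not overlap; under those assumptions the derived figures are illustrative, not official statistics.

```python
# Counts from the Gene counts table.
CODING = 14_824
NON_CODING = 1_187   # assumed to already include the 1,187 small non-coding genes
PSEUDOGENES = 338
TRANSCRIPTS = 25_303

# Assumption: the three categories are disjoint, so they can simply be summed.
total_genes = CODING + NON_CODING + PSEUDOGENES
print(total_genes)                                              # 16349
print(f"{TRANSCRIPTS / total_genes:.2f} transcripts per gene")  # 1.55 transcripts per gene
```
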