
Anopheles gambiae Assembly and Gene Annotation
About Anopheles gambiae
Anopheles gambiae sensu stricto is the primary mosquito vector responsible for the transmission of malaria in most of sub-Saharan Africa. It is a member of a species complex that includes at least seven morphologically indistinguishable species in the Series Pyretophorus in the Anopheles subgenus Cellia. An. gambiae feeds preferentially on humans and is one of the most efficient malaria vectors known.
Picture credit (public domain): James Gathany (CDC) 1994
Assembly
The genome assembly presented here (AgamP4, April 2014) is a revised assembly based on the whole genome shotgun assembly of the PEST strain of Anopheles gambiae [1, 2], plus the addition of a mitochondrial chromosome. The assembly is described in detail in VectorBase.
Annotation
Annotation of the AgamP4 assembly was carried out by VectorBase. The set of gene models presented (genebuild AgamP4.7, released August 2017) combines manual annotation, data provided by the research community, and gene prediction using the Ensembl system. Prediction utilised alignments of dipteran and other protein sets to the genome and generation of GeneWise models, alignment and gene prediction based on Anopheles ESTs, and selected ab initio predictions. More details can be found in VectorBase.
Variation
Variation data for Anopheles gambiae was imported from NCBI dbSNP, and from other studies involving the M and S molecular forms [3, 4, 5].
Variation data is also available for the Anopheles gambiae MR4 reference colonies 4ARR, Kisumu, Akron, L3-5 and G3. These samples were sequenced by the Kwiatkowski group at the Wellcome Trust Sanger Institute, as part of the Malaria Programme's Anopheles gambiae Genome Variation Project. These variants should be considered preliminary, pending further analysis and quality control filtering.
References
- The genome sequence of the malaria mosquito Anopheles gambiae.
Holt RA, Subramanian GM, Halpern A, Sutton GG, Charlab R, Nusskern DR, Wincker P, Clark AG, Ribeiro JM, Wides R et al. 2002. Science. 298:129-149.
- Update of the Anopheles gambiae PEST genome assembly.
Sharakhova MV, Hammond MP, Lobo NF, Krzywinski J, Unger MF, Hillenmeyer ME, Bruggner RV, Birney E, Collins FH. 2007. Genome Biology. 8:R5.
- SNP genotyping defines complex gene-flow boundaries among African malaria vector mosquitoes.
Neafsey DE, Lawniczak MK, Park DJ, Redmond SN, Coulibaly MB, Traoré SF, Sagnon N, Costantini C, Johnson C, Wiegand RC et al. 2010. Science. 330:514-517.
- Association mapping of insecticide resistance in wild Anopheles gambiae populations: major variants identified in a low-linkage disequilbrium genome.
Weetman D, Wilding CS, Steen K, Morgan JC, Simard F, Donnelly MJ. 2010. PLoS ONE. 5:e13140.
- Gene flow-dependent genomic divergence between Anopheles gambiae M and S forms.
Weetman D, Wilding CS, Steen K, Pinto J, Donnelly MJ. 2012. Molecular Biology and Evolution. 29:279-291.
Statistics
Summary
Assembly | AgamP4, INSDC Assembly GCA_000005575.1, Feb 2006 |
Database version | 92.4 |
Base Pairs | 278,268,413 |
Golden Path Length | 273,109,044 |
Genebuild by | VectorBase |
Genebuild method | Full genebuild |
Data source | VectorBase |
Gene counts
Coding genes | 13,024 |
Non coding genes | 729 |
Small non coding genes | 727 |
Long non coding genes | 2 |
Pseudogenes | 10 |
Gene transcripts | 15,655 |
Other
Snap gene prediction | 24,679 |
Short Variants | 9,089,141 |
Structural variants | 8 |
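
The figures in the tables above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the VectorBase or Ensembl tooling: the dictionary keys and helper function names are hypothetical, and the numbers are copied from the tables as shown. It checks that the small and long non-coding gene counts add up to the non-coding total, and derives two simple ratios without interpreting how the source database defines each count.

```python
# Figures copied from the Summary, Gene counts, and Other tables.
# Key names are illustrative, not official field names.
stats = {
    "base_pairs": 278_268_413,
    "golden_path_length": 273_109_044,
    "coding_genes": 13_024,
    "non_coding_genes": 729,
    "small_non_coding_genes": 727,
    "long_non_coding_genes": 2,
    "pseudogenes": 10,
    "gene_transcripts": 15_655,
    "short_variants": 9_089_141,
}


def non_coding_total(s):
    """Sum the small and long non-coding gene counts."""
    return s["small_non_coding_genes"] + s["long_non_coding_genes"]


def length_gap(s):
    """Difference between total base pairs and golden path length."""
    return s["base_pairs"] - s["golden_path_length"]


def bases_per_short_variant(s):
    """Average golden-path bases per listed short variant."""
    return s["golden_path_length"] / s["short_variants"]


# Consistency check: the two non-coding subcategories should sum to the total.
print("non-coding subtotals match:",
      non_coding_total(stats) == stats["non_coding_genes"])
print("base pairs minus golden path length:", length_gap(stats))
print("bases per short variant:", round(bases_per_short_variant(stats), 2))
```

The per-variant ratio is only a rough density: it divides by the golden path length and assumes every listed short variant maps to that sequence.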