Anopheles gambiae (African malaria mosquito, PEST) (AgamP4)

Anopheles gambiae (African malaria mosquito, PEST) Assembly and Gene Annotation

The Anopheles gambiae data and its display on Ensembl Genomes are made possible through a joint effort by the Ensembl Genomes group and VectorBase, a component of VEuPathDB.

The assembly name may not match that from INSDC due to additional community contributions applied by VEuPathDB to the initial INSDC assembly (recorded by the assembly accession).

About Anopheles gambiae


Anopheles gambiae sensu stricto is the primary mosquito vector responsible for the transmission of malaria in most of sub-Saharan Africa. It is a member of a species complex that includes at least seven morphologically indistinguishable species in the Series Pyretophorus in the Anopheles subgenus Cellia. Anopheles gambiae feeds preferentially on humans and is one of the most efficient malaria vectors known. Anopheles gambiae sensu stricto is now known to consist of two genetically distinct forms or incipient species, known formally as the A. gambiae M and A. gambiae S forms. Colonies of these two forms have also been sequenced and assembled, and are provided on VectorBase as the A. gambiae Mali-NIH (M) and A. gambiae Pimperena (S) genomes.


An. gambiae larvae are generally considered to inhabit sunlit, shallow, temporary bodies of fresh water such as ground depressions, puddles, pools and hoof prints. This characteristic may allow predator avoidance, as the larvae are able to develop very quickly (~six days from egg to adult under optimal conditions), possibly in response to the ephemeral nature of such larval habitats. An. gambiae larval habitats are therefore often described as containing no (or very sparse) vegetation due to their temporary nature, but the great diversity of habitats utilised by An. gambiae includes vegetated sites (e.g. rice fields). An. gambiae larvae have been reported from habitats containing floating and submerged algae, emergent grass, rice, or 'short plants', and from sites devoid of any vegetation. The variability of larval habitats can be related to the known forms of An. gambiae (e.g. M and S, or Forest, Bamako, Savanna, Mopti and Bissau). For example, the Mopti and M forms are associated with semi-permanent, often man-made, larval habitats such as rice fields or flooded areas, whereas the Savanna/Bamako and S forms are seen more commonly in temporary, rain-dependent sites such as ground puddles.

Resting and feeding preferences

An. gambiae is highly anthropophilic. However, there are indications that it can be less discriminating and more opportunistic in its host selection, and that host choice is strongly influenced by location, host availability and the genetic make-up of the mosquito population. Females of An. gambiae typically feed late at night and are often described as both endophagic and endophilic. Yet there is evidence that both indoor and outdoor biting are common, and both indoor and outdoor resting are regularly reported. For example, in southern Sierra Leone strong exophily has been demonstrated, linked to the Forest form. Conversely, endophilic behaviour has been linked to Savanna forms. As with host preference, this species appears to exhibit phenotypic plasticity and opportunism in resting locations.

Vectorial capacity

An. gambiae is considered to be one of the most efficient vectors of malaria in the world.

This text was modified from Sinka ME et al. (2010) The dominant Anopheles vectors of human malaria in Africa, Europe and the Middle East: occurrence data, distribution maps and bionomic précis Parasites & Vectors 3:117.

PEST strain

The Anopheles gambiae PEST strain was chosen for genome sequencing because it had both a fixed, standard chromosomal arrangement and a sex-linked pink eye mutation that could readily be used as an indicator of cross-colony contamination (Holt et al. 2002: PMID 12364791). The pink eye mutation originated in a colony called A. gambiae LPE established in 1951 at the London School of Hygiene and Tropical Medicine from mosquitoes collected in Lagos, Nigeria. In 1986, this mutation was introduced into a colony of A. gambiae from Asembo Bay, western Kenya, by crossing males of the LPE strain with female offspring of wild-caught Kenyan A. gambiae (the Savanna form), selecting males from the F2 of this cross and then crossing them again with additional female offspring of wild-caught Kenyan A. gambiae. From the F2 offspring of this second outcross to Kenyan mosquitoes, a strain was selected that was fixed for pink eye. This outcrossing scheme was repeated one more time in 1987, producing a pink eye strain with a genetic composition largely constituted of the western Kenya Savanna cytogenetic form. In each of these crosses, several hundred female offspring of at least 20 wild-caught mosquitoes were used. This strain, designated A. gambiae PE (Pink Eye), was polymorphic for the inversions 2La (32%) and 2Rbc (19%). The 2Rbc inversion is characteristic of the Mopti chromosomal form, indicating that the original LPE strain from Nigeria was the Mopti form, which is the M molecular form. This inversion was apparently balanced by the uninverted form, because no 2Rbc/bc individuals were detected in the colony. Mukabayire and Besansky selected from this PE strain a set of nine families whose female parent and at least 20 female offspring were fixed for the standard chromosome karyotype. The progeny of these nine families were pooled to form the A. gambiae PEST strain (Pink Eye STandard).
This strain clearly had some Mopti-derived DNA, as the standard karyotype is shared by Mopti and Savanna and the original PE strain did have the 2Rbc inversion rather than 2Rb that is typical of Savanna. Clones from two different PEST strain BAC (Bacterial Artificial Chromosome) libraries had already been end sequenced and physically mapped. When tested, this colony was fully susceptible to P. falciparum from western Kenya. DNA preparation and library construction methods were conducted following standard protocols, and the sequencing method was whole-genome Sanger sequencing. Subsequent to sequencing, the PEST strain was found to be polymorphic for molecular markers diagnostic of the A. gambiae M and S molecular forms. The last known isolate of the A. gambiae PEST colony was lost in about 2005.

Source: VectorBase

Picture credit (public domain): James Gathany (CDC) 1994


As described in Holt et al. (2002), plasmid and BAC DNA libraries were constructed with stringently size-selected PEST strain DNA. Two BAC libraries were constructed, one (ND-TAM) using DNA from whole adult male and female mosquitoes and the other (ND-1) using DNA from ovaries of PEST females collected about 24 hours after the blood meal. Plasmid libraries containing inserts of 2.5, 10 and 50 kb were constructed with DNA derived from either 330 male or 430 female mosquitoes. For each sex, several libraries of each insert size class were made, and these were sequenced such that there was approximately equal coverage from male and female mosquitoes in the final data set. Celera Genomics, Genoscope and TIGR contributed sequence data that collectively provided 10.2-fold coverage, assuming a genome size of 278 Mb. The whole-genome data set was assembled with the Celera assembler (MOZ1 assembly), which constituted the basis of the primary genome publication (Holt et al. 2002).
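The coverage figure above can be turned into an approximate amount of raw sequence. The following is a minimal Python sketch, assuming the usual definition of fold coverage (total sequenced bases divided by genome size); the function name and constants are illustrative only and do not come from the original assembly pipeline.

```python
# Figures quoted in the text above (Holt et al. 2002).
FOLD_COVERAGE = 10.2     # reported sequence coverage
GENOME_SIZE_MB = 278.0   # genome size assumed for that estimate, in Mb


def total_sequence_mb(fold_coverage: float, genome_size_mb: float) -> float:
    """Return the total sequenced bases (in Mb) implied by a coverage depth.

    Assumes coverage = total sequenced bases / genome size.
    """
    return fold_coverage * genome_size_mb


total_mb = total_sequence_mb(FOLD_COVERAGE, GENOME_SIZE_MB)
print(f"Implied raw sequence: about {total_mb:,.0f} Mb ({total_mb / 1000:.2f} Gb)")
```

Under these assumptions, the combined data set amounts to roughly 2.8 Gb of sequence. The estimate scales directly with the assumed genome size, so a different genome size changes the implied coverage for the same data.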

The first update to this assembly (MOZ2) resulted from a concerted effort to correct some of the ambiguities in scaffold map locations and orientations by manual analysis of the archived BAC chromosome hybridization photographs and by the hybridization of a small number of new BAC clones selected to resolve questions of scaffold orientation. The new AGP file, an early draft of which was first displayed on the An. gambiae genome poster published in the 4 October 2002 issue of Science, formed the basis of a new annotation and gene build displayed on 1 October 2003 (MOZ2) (Mongin et al. 2004). This assembly is also 278 Mb.

In 2006, the major scaffolds were re-ordered into a new golden path file by use of additional physically mapped BAC clones combined with scaffold-to-scaffold sequence comparisons that identified some sequence overlaps. The AgamP3 assembly has a total of 80 scaffolds assigned to and ordered on the chromosome arms X, 2R, 2L, 3R and 3L, 28 of which are newly mapped or oriented. The most significant improvement in this new assembly is the placement of 24 scaffolds (8.64 Mbp) in pericentromeric regions. However, this does not complete the centromeric region of any of the chromosomes. The new GenBank entries, CM000356-CM000360, reflect the revised 2L, 2R, 3L, 3R and X chromosome assemblies. At ~264 Mb of non-redundant sequence, this new assembly (AgamP3) is probably still an overestimate of the true genome size (Sharakhova et al. 2007).

The AgamP4 assembly differs from the previous version, AgamP3, by the addition of the mitochondrial genome (L20934, 16,655 bp), which includes 13 protein-coding genes and 24 ncRNA genes (22 tRNA and 2 rRNA genes).

Source: VectorBase


Annotation of the AgamP4 assembly was carried out by VectorBase. The set of gene models presented (genebuild AgamP4.13, released July 2019) combines manual annotation, data provided by the research community, and gene prediction using the Ensembl system. Prediction utilised alignments of dipteran and other protein sets to the genome and generation of GeneWise models, alignment and gene prediction based on Anopheles ESTs, and selected ab initio predictions. More details can be found in VectorBase.


  1. An expression map for Anopheles gambiae.
    Maccallum RM, Redmond SN, Christophides GK. 2011. BMC Genomics. 12:620.
  2. Daily rhythms in antennal protein and olfactory sensitivity in the malaria mosquito Anopheles gambiae.
    Rund SS, Bonar NA, Champion MM, Ghazi JP, Houk CM, Leming MT, Syed Z, Duffield GE. 2013. Scientific Reports. 3:2494.
  3. Genetic dissection of Anopheles gambiae gut epithelial responses to Serratia marcescens.
    Stathopoulos S, Neafsey DE, Lawniczak MK, Muskavitch MA, Christophides GK. 2014. PLoS Pathogens. 10(3)
  4. Radical remodeling of the Y chromosome in a recent radiation of malaria mosquitoes.
    Hall AB, Papathanos PA, Sharma A, Cheng C, Akbari OS, Assour L, Bergman NH, Cagnetti A, Crisanti A, Dottorini T et al. 2016. Proceedings of the National Academy of Sciences. 113(15)
  5. Reduced-representation sequencing identifies small effective population sizes of Anopheles gambiae in the north-western Lake Victoria basin, Uganda.
    Wiltshire RM, Bergey CM, Kayondo JK, Birungi J, Mukwaya LG, Emrich SJ, Besansky NJ, Collins FH. 2018. Malaria Journal. 17(1):285.
  6. The dominant Anopheles vectors of human malaria in Africa, Europe and the Middle East: occurrence data, distribution maps and bionomic précis.
    Sinka ME, Bangs MJ, Manguin S, Coetzee M, Mbogo CM, Hemingway J, Patil AP, Temperley WH, Gething PW, Kabaria CW et al. 2010. Parasites & Vectors. 3:117.
  7. The genome sequence of the malaria mosquito Anopheles gambiae.
    Holt RA, Subramanian GM, Halpern A, Sutton GG, Charlab R, Nusskern DR, Wincker P, Clark AG, Ribeiro JM, Wides R et al. 2002. Science. 298:129-149.
  8. Update of the Anopheles gambiae PEST genome assembly.
    Sharakhova MV, Hammond MP, Lobo NF, Krzywinski J, Unger MF, Hillenmeyer ME, Bruggner RV, Birney E, Collins FH. 2007. Genome Biology. 8:R5.
  9. SNP genotyping defines complex gene-flow boundaries among African malaria vector mosquitoes.
    Neafsey DE, Lawniczak MK, Park DJ, Redmond SN, Coulibaly MB, Traoré SF, Sagnon N, Costantini C, Johnson C, Wiegand RC et al. 2010. Science. 330:514-517.
  10. Association mapping of insecticide resistance in wild Anopheles gambiae populations: major variants identified in a low-linkage disequilbrium genome.
    Weetman D, Wilding CS, Steen K, Morgan JC, Simard F, Donnelly MJ. 2010. PLoS ONE. 5:e13140.
  11. Gene flow-dependent genomic divergence between Anopheles gambiae M and S forms.
    Weetman D, Wilding CS, Steen K, Pinto J, Donnelly MJ. 2012. Molecular Biology and Evolution. 29:279-291.



Assembly: AgamP4, INSDC Assembly GCA_000005575.1, Feb 2006
Database version: 111.4
Golden Path Length: 281,378,673
Genebuild by: VEuPathDB
Genebuild method: Import
Data source: VectorBase

Gene counts

Coding genes: 13,094
Non coding genes: 729
Small non coding genes: 727
Long non coding genes: 2
Gene transcripts: 15,863


Snap gene prediction: 24,679
Short Variants: 61,404,961
Structural variants: 8
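
The gene counts above can be cross-checked with a little arithmetic. This is a small Python sketch, assuming that the transcript count covers both coding and non-coding genes (the page does not state this explicitly); the variable names are illustrative.

```python
# Counts copied from the statistics above.
coding_genes = 13_094
non_coding_genes = 729
small_non_coding_genes = 727
long_non_coding_genes = 2
gene_transcripts = 15_863

# The two non-coding subclasses should add up to the non-coding total.
subclasses_consistent = (
    small_non_coding_genes + long_non_coding_genes == non_coding_genes
)

# Assumption: transcripts are counted across coding and non-coding genes.
total_genes = coding_genes + non_coding_genes
transcripts_per_gene = gene_transcripts / total_genes

print(f"Non-coding subclasses consistent: {subclasses_consistent}")
print(f"Total genes: {total_genes:,}")
print(f"Mean transcripts per gene: {transcripts_per_gene:.2f}")
```

On these figures the annotation holds 13,823 genes, with slightly more than one transcript per gene on average.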