Culicoides sonorensis (Biting Midge) (Cson1)

Culicoides sonorensis (Biting Midge) Assembly and Gene Annotation

About Culicoides sonorensis

The biting midge Culicoides sonorensis, a member of the dipteran family Ceratopogonidae, is an important vector of pathogenic arboviruses (arthropod-borne viruses). The main viral pathogen transmitted by biting midges is bluetongue virus (BTV), which is prevalent across the USA but also occurs internationally. Outbreaks of BTV infection have increasingly been reported in many parts of Europe, and the range of midge-associated BTV infection has also widened. The C. sonorensis genome provides a first step towards understanding the transmission of arboviruses outside well-studied arthropod viral vectors such as Aedes mosquitoes, which are separated from midges by ~220 million years. C. sonorensis represents the first genome resource for understanding the transmission of non-zoonotic arboviruses and the drivers of susceptibility to infection in ruminants (cows, sheep, etc.) and wildlife (deer and equines). Culicoides midges have other interesting biological characteristics, including the capacity to reach huge population densities and the ability to disperse over long distances by semi-passive flight under suitable conditions.

Image Credit: The Pirbright Institute


The genome of C. sonorensis is derived from pooled samples of both male (n=375) and female (n=150) individuals. DNA samples were obtained from the 'AA' colony, originally established in the Kerrville laboratory (Texas, USA) and then maintained at The Pirbright Institute (Woking, UK) from 1969 without any additional inbreeding. Library construction and sequencing were performed at the Earlham Institute (Norwich, UK). The genome was sequenced from 200 bp paired-end (PE) and 4.4 kb average-insert mate-pair (MP) libraries, using a combination of Illumina HiSeq 2000 and 2500 systems, at a coverage of ~27X. The present assembly of the C. sonorensis genome^1^, which has four chromosomes, was generated at EMBL-EBI (Hinxton, UK). The assembly is 189 Mb long and is housed in 7,974 scaffolds, with a contig N50 of 30,774 bp and a scaffold N50 of 89,077 bp. Assembly gaps (Ns) make up 2.63% of the assembly. Repeats (14%) and low-complexity regions (15.7%) are located in 793 repeat regions, together covering 29.7% of the overall genome length.
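The contig and scaffold N50 values and the gap percentage quoted above are standard assembly summary statistics. As a rough illustration of how such figures are usually defined, the sketch below computes an N50 from a list of sequence lengths and the fraction of N characters in a set of sequences. It is a minimal sketch, not the pipeline used for Cson1: the function names `n50` and `gap_fraction` and the toy inputs are assumptions made for this example only.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def gap_fraction(sequences):
    """Return the fraction of bases that are N (assembly gaps)."""
    total = sum(len(seq) for seq in sequences)
    if total == 0:
        raise ValueError("sequences contain no bases")
    n_count = sum(seq.upper().count("N") for seq in sequences)
    return n_count / total


if __name__ == "__main__":
    # Toy scaffold lengths (hypothetical, not taken from Cson1).
    toy_lengths = [80, 70, 50, 40, 30, 20]
    print("N50:", n50(toy_lengths))

    # Toy sequences (hypothetical) with two gap bases out of twenty.
    toy_sequences = ["ACGTNNACGT", "ACGTACGTAC"]
    print("Gap fraction:", gap_fraction(toy_sequences))
```

Applied to scaffold lengths, `n50` gives a scaffold N50; applied to the lengths of contigs obtained by splitting scaffolds at runs of N, it gives a contig N50, which is why the contig value reported for an assembly is typically the smaller of the two.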


Gene models were constructed de novo at EMBL-EBI (Hinxton, UK) using MAKER v2.31.6, resulting in 15,810 predicted protein-coding genes. Gene predictions were curated and refined using WebApollo v1.0. RNA-Seq data were obtained from the associated C. sonorensis genome publication^1^ and from previously reported transcriptome data^2,3^. The RNA-Seq data from the genome publication were generated from BTV-competent, BTV-refractory, blood-fed and sucrose-fed midges and sequenced at Edinburgh Genomics (The University of Edinburgh, Scotland).


  1. The genome of the biting midge Culicoides sonorensis and gene expression analyses of vector competence for bluetongue virus.
    Hinsley M, Armean IM, Silk R, Harrup LE, Gonzalez-Uriarte A, Veronesi E, Campbell L, Nayduch D, Saski C, Tabachnick WJ et al. 2018. BMC Genomics. 19(1):624.
  2. The Reference Transcriptome of the Adult Female Biting Midge (Culicoides sonorensis) and Differential Gene Expression Profiling during Teneral, Blood, and Sucrose Feeding Conditions.
    Lee MB, Saski CA. 2014. PLoS One. 9(5):e98123.
  3. Gene discovery and differential expression analysis of humoral immune response elements in female Culicoides sonorensis (Diptera: Ceratopogonidae).
    Lee MB, Saski CA. 2014. Parasit Vectors. 7:388.



Assembly: Cson1, INSDC Assembly GCA_900258525.1
Database version: 112.1
Golden Path Length: 194,177,243
Genebuild by: Ensembl Metazoa
Genebuild method: Full genebuild
Data source: Ensembl Metazoa

Gene counts

Coding genes: 15,612
Gene transcripts: 21,241