Lutzomyia longipalpis Assembly and Gene Annotation
About Lutzomyia longipalpis
The sand fly Lutzomyia longipalpis is distributed from Mexico to Argentina, including all the countries of Central America (except Belize) and most of tropical South America east of the Andes (except Guyana, Surinam and French Guiana). Across its distribution range it is the major vector of American visceral leishmaniasis. Studies suggest that L. longipalpis may be a single heterogeneous species or a species complex.
Lutzomyia longipalpis (Lutz & Neiva, 1912) (Diptera: Psychodidae) is a New World sand fly. It was the first Lutzomyia species to be recognised as a vector of Leishmania in South and Central America. The female L. longipalpis transmits the protozoan parasite Leishmania infantum (also known as L. chagasi).
The strain of sand flies used for the sequencing was originally collected by Prof Richard Ward in 1988 from Jacobina, Bahia State, Brazil. The flies were kept at the Liverpool School of Tropical Medicine before being transferred to their present site at Lancaster University in NW England.
Lutzomyia longipalpis is thought to be a species complex, but the number of species within the complex and their relationships are unclear. The sibling species are thought to differ in their vectorial capacity, and genome sequencing will help to identify the most important vector species. This knowledge is vital for understanding the epidemiology and control of this neglected disease in South and Central America.
Three types of WGS libraries were used to produce these Lutzomyia longipalpis sequencing data: a 454 Titanium fragment library and paired end libraries with 3 kb and 8 kb inserts. The 454 data (11.5 million reads; ~24.4x coverage) were derived from the same individual, while the mate pair reads (7.4 million 3 kb reads, 9.6x; 3.7 million 8 kb reads, 4.9x) were derived from a pool of individuals. In total, about 22.6 million reads were generated, representing 38.9x coverage of this sand fly genome.
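As a quick consistency check, the per-library figures quoted above can be summed to reproduce the stated totals. The following is a minimal sketch only; the dictionary keys are illustrative labels rather than official library names, and the values are simply copied from the text.

```python
# Read counts (millions) and coverage (x) per library, as quoted in the text.
# The keys are illustrative labels, not official library identifiers.
libraries = {
    "454 fragment": (11.5, 24.4),
    "3 kb insert": (7.4, 9.6),
    "8 kb insert": (3.7, 4.9),
}

# Round to one decimal place to match the precision used in the text.
total_reads = round(sum(reads for reads, _ in libraries.values()), 1)
total_coverage = round(sum(cov for _, cov in libraries.values()), 1)

print(f"Total reads: {total_reads} million")   # Total reads: 22.6 million
print(f"Total coverage: {total_coverage}x")    # Total coverage: 38.9x
```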
The Llon_1.0 assembled draft genome sequence was built from the data described above using the Celera CABOG assembler (version 6.1, 2010/03/22). Next, these initial results were used as a backbone for longer superscaffolds using Baylor's ATLAS-link. Finally, discernible gaps were filled with ATLAS-gapfill and ATLAS-gapmerge. The final assembly includes these superscaffolds, which can be ordered and oriented with respect to each other, and isolated sequences that could not be linked (single contig scaffolds or singletons from the original CABOG assembly).
The total length of all contigs is 142.7 Mb; however, the total span of the assembly is 154.2 Mb after gaps are included. The N50 of the contigs is 7.5 kb and the N50 of the scaffolds is 85.1 kb. Both the assembly and the description of the positions and orientations of contigs (AGP) are available from the Sand fly section of the BCM-HGSC web site at: www.hgsc.bcm.tmc.edu.
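For readers unfamiliar with these statistics, the sketch below shows one common way to compute an N50 and how the quoted contig total and assembly span relate to the amount of gap sequence. It is not the pipeline used to produce the assembly; the function name `n50` and the toy contig lengths are hypothetical, and only the 142.7 Mb and 154.2 Mb figures come from the text.

```python
def n50(lengths):
    """Return the N50: the largest length L such that sequences of
    length >= L together cover at least half of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be a non-empty list of positive numbers")


# Hypothetical toy contig lengths, for illustration only.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))  # 8  (10 + 9 + 8 = 27, half of the total 54)

# Figures quoted in the text (Mb).
contig_total_mb = 142.7
span_mb = 154.2

# The difference between span and contig total is the gap sequence.
gap_mb = round(span_mb - contig_total_mb, 1)
gap_fraction = gap_mb / span_mb
print(f"Gaps: {gap_mb} Mb ({gap_fraction:.1%} of the span)")  # Gaps: 11.5 Mb (7.5% of the span)
```

Under this definition, the contig N50 of 7.5 kb means that contigs of at least 7.5 kb account for half or more of the 142.7 Mb of contig sequence.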
LlonJ1.6 gene set
Community annotation patch build for July 2019.
|Assembly||LlonJ1, INSDC Assembly GCA_000265325.1, Jun 2012|
|Golden Path Length (bp)||154,229,266|
|Non coding genes||338|
|Small non coding genes||334|
|Long non coding genes||4|
|Snap gene prediction||37,229|