Phlebotomus papatasi (Sand fly, Israel) (PpapI1)

Phlebotomus papatasi (Sand fly, Israel) Assembly and Gene Annotation

The Phlebotomus papatasi data and its display on Ensembl Genomes are made possible through a joint effort by the Ensembl Genomes group and VectorBase, a component of VEuPathDB.

The assembly name may not match that from INSDC due to additional community contributions applied by VEuPathDB to the initial INSDC assembly (recorded by the assembly accession).

About Phlebotomus papatasi

The sand fly Phlebotomus papatasi is the main vector of Old World cutaneous leishmaniasis. It is distributed from Morocco to the Indian subcontinent and from southern Europe to central and eastern Africa.

Israel strain

The Israeli strain of Ph. papatasi was originally given to the Walter Reed Army Institute of Research (WRAIR) in 1983 by the Hebrew University, Jerusalem. The colony's population size fluctuated over time and went through several bottlenecks. The colony was expanded from a small number of flies at the University of Notre Dame.

Source: VectorBase

Picture credit: James Gathany, public domain, via Wikimedia Commons

PpapI1 assembly

Sequencing and assembly were performed by The Genome Institute, Washington University School of Medicine (St. Louis). DNA used for sequencing was derived from Phlebotomus papatasi females from the laboratory of Dr. Mary Ann McDowell, Eck Institute for Global Health, University of Notre Dame (Notre Dame, IN). These sand flies have been through several bottlenecks and were presumed highly inbred. All sequences were generated on the Roche 454 Titanium instrument with the exception of the BAC-ends, which were generated on an ABI3730.

The Ppap1 assembly was built with the Newbler assembler (test release 2.6RC02) using the --het option, from an input of ~22.5X total sequence coverage: 15.1X of whole-genome shotgun reads, 4.4X of 3 kb clone inserts, 3.0X of 8 kb inserts and 0.01X of BAC-end read pairs. The fragment and 3 kb data were generated from a single fly after whole genome amplification (WGA), while the 8 kb and BAC-end sequence (BES) data were derived from multiple flies. All scaffolds larger than 200 bases (n = 123,558) total 347,840,937 bases, with an N50 scaffold size of 23,692 bases and an N50 scaffold number of 2,311. The longest scaffold was 2.04 Mb in length.
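For readers unfamiliar with the N50 statistics quoted above, the following is a minimal sketch of how an N50 scaffold size and N50 scaffold number are conventionally derived from a list of scaffold lengths. It also checks that the coverage components listed above add up to the stated ~22.5X. The function name `n50_l50`, the `coverage` dictionary and the toy lengths are illustrative assumptions, not part of the original assembly pipeline, and the real scaffold lengths are not reproduced here.

```python
def n50_l50(lengths):
    """Return (N50 size, N50 number) for a list of scaffold lengths.

    Conventional definition (assumed here): sort scaffolds from longest
    to shortest and accumulate lengths until at least half of the total
    assembly length is covered. The length of the scaffold reached at
    that point is the N50 size; the number of scaffolds used is the
    N50 number (often called L50).
    """
    if not lengths:
        raise ValueError("no scaffold lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Toy example (hypothetical lengths, not the PpapI1 scaffolds).
toy_lengths = [10, 8, 6, 4, 2]
n50, l50 = n50_l50(toy_lengths)
print(n50, l50)  # 8 2

# Coverage components as listed in the text (fold coverage).
coverage = {
    "whole-genome shotgun": 15.1,
    "3 kb inserts": 4.4,
    "8 kb inserts": 3.0,
    "BAC-end pairs": 0.01,
}
print(round(sum(coverage.values()), 2))  # 22.51
```

Applied to the full set of 123,558 scaffold lengths, a calculation of this kind would be expected to yield the reported values of 23,692 bases and 2,311 scaffolds, assuming the assembly statistics were computed under the same definition.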

Prior to submission to NCBI, this assembly was screened for contamination and 247 contigs were removed. Additional contigs were removed or merged by the in-house program PGA (PolyGraph Assembler), which collapses heterozygous contigs; this reduced the assembled genome size from 410 Mb to 343 Mb. A total of 5,661 gaps were closed and nearly 6.8 Mb of sequence was added by another in-house post-assembly program, PyGap. This program detects and merges overlaps of adjoining contigs, and attempts to close gaps between non-overlapping adjoining contigs with Illumina data. The same Illumina data used in gap closure were aligned to the assembly to correct 89,378 presumed 454 insertion/deletion errors.

PpapI1.6 gene set

Community annotation patch build for July 2019.

Statistics

Summary

Assembly: PpapI1, INSDC Assembly GCA_000262795.1, May 2012
Database version: 111.1
Golden Path Length: 363,767,980
Genebuild by: VEuPathDB
Genebuild method: Import
Data source: VectorBase

Gene counts

Coding genes: 11,391
Non coding genes: 444
Small non coding genes: 442
Long non coding genes: 2
Gene transcripts: 11,849

Other

Snap gene prediction: 41,737
Short Variants: 4,317,604