Phlebotomus perniciosus (Murcia) (pperniciosus_asm_v2.0)

Phlebotomus perniciosus (Murcia) Assembly and Gene Annotation

About Phlebotomus perniciosus

Phlebotomus is a genus of "sand flies" in the dipteran family Psychodidae. In the past, they have sometimes been considered to belong to a separate family, Phlebotomidae, but this alternative classification is not yet widely accepted.

In the Old World, Phlebotomus sand flies are primarily responsible for the transmission of leishmaniasis,[1] an important parasitic disease, whereas in the New World transmission is generally via sand flies of the genus Lutzomyia.[2] The protozoan parasite itself, Leishmania infantum, is a species of the genus Leishmania. Leishmaniasis is a potentially deadly disease that normally finds its mammalian reservoir in rodents and other small animals such as canids (canine leishmaniasis). The female sand fly picks up the Leishmania protozoa when feeding on infected animals and thereby transmits the disease, whereas the male feeds on plant nectar.

Murcia strain

Sequencing material for this species was obtained from whole insects and dissected tissues or extracts collected in Spain before 1994.

Picture credit: Public domain via Wikimedia Commons

Taxonomy ID 46731

More information: General information about this species can be found in Wikipedia.


Assembly

The genome assembly presented here is linked to the assembly accession GCA_918844115.2. It was produced as part of the Infravec2 project "De novo genome for Infravec. A hematophagous fly Phlebotomus perniciosus, Murcia strain assembly." to study P. perniciosus. The assembly was produced using MEGAHIT v1.2.9 [3]. The material was sampled from P. perniciosus strain 'Murcia', which is made available as a biological resource for distribution via the Infravec product catalog (https://infravec2.eu/). The assembly was produced from a single Illumina paired-end (PE) gDNA library, "SAMEA10651172" (ERX6791865).

The total length of the assembly is 166,001,358 bp, contained within 77,886 scaffolds. The scaffold N50 is 14,455 bp and the scaffold L50 is 2,233 scaffolds. Assembly gaps span 374,089 bp. The GC content of the assembly is 37.0%.
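To make the summary statistics above concrete, the sketch below computes total length, scaffold N50 and L50, gap length and GC content from a list of scaffold sequences. It is a minimal illustration under stated assumptions, not the procedure that produced the figures above: gap positions are assumed to be encoded as `N`/`n`, GC% is assumed to be taken over non-gap bases only, and the helper `assembly_stats` and its `AssemblyStats` result type are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class AssemblyStats:
    total_length: int
    num_scaffolds: int
    n50: int
    l50: int
    gap_bases: int
    gc_percent: float


def assembly_stats(scaffolds: list[str]) -> AssemblyStats:
    """Summarise an assembly given as scaffold sequences (hypothetical helper)."""
    lengths = sorted((len(s) for s in scaffolds), reverse=True)
    total = sum(lengths)

    # N50: length of the scaffold at which the running sum of lengths
    # (longest first) first reaches half of the total assembly length.
    # L50: the number of scaffolds needed to reach that point.
    running = 0
    n50 = l50 = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if 2 * running >= total:
            n50, l50 = length, count
            break

    # Assumption: gaps are stored as runs of N/n in the scaffold sequences.
    seq = "".join(scaffolds).upper()
    gap_bases = seq.count("N")
    gc = seq.count("G") + seq.count("C")

    # Assumption: GC% is computed over non-gap bases only.
    non_gap = total - gap_bases
    gc_percent = 100.0 * gc / non_gap if non_gap else 0.0

    return AssemblyStats(total, len(lengths), n50, l50, gap_bases, gc_percent)
```

For example, `assembly_stats(["ACGTNNGC", "AATTGG", "CCAT", "AT"])` describes toy scaffolds of 8, 6, 4 and 2 bp (20 bp in total). The two longest scaffolds together cover 14 bp, which is at least half of 20, so N50 is 6 bp and L50 is 2; the 2 `N` bases count as gap, and GC% is 8 of the remaining 18 bases.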

Draft quality and performance were assessed by comparison with the assembly of another sand fly, the Phlebotomus papatasi reference assembly GCA_000262795.1.

Annotation

RNA-Seq data used for genome annotation were obtained from publicly available datasets. A combination of RNA-Seq data from the allied species Phlebotomus papatasi (PRJEB35592, "RNA-seq of Phlebotomus papatasi after feeding with blood, and blood containing Leishmania major, Leishmania donovani and Herpetomonas muscarum") and Phlebotomus chinensis (PRJNA471571, "Transcriptome of Phlebotomus chinensis") was used for genome annotation. Owing to data restrictions and the read-length requirements for input RNA-Seq data, no publicly available P. perniciosus RNA-Seq data could be used in the Ensembl annotation pipeline. For an in-depth overview, see the detailed documentation of the Ensembl gene annotation pipeline.

Small ncRNAs were identified using a combination of BLAST and Infernal/RNAfold [4]. Pseudogenes were identified by examining genes with a large percentage of non-biological introns (introns shorter than 10 bp), genes covered by repeats, and single-exon genes for which evidence of a functional multi-exon paralog was found elsewhere in the genome.
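The three pseudogene criteria described above can be expressed as a simple rule-based filter. The sketch below is a hypothetical illustration, not the Ensembl implementation: the `GeneModel` fields and the `pseudogene_reason` function are invented for the example, the 10 bp intron cut-off is taken from the text, and the thresholds for a "large percentage" of short introns (here more than 50%) and for being covered by repeats (here at least 80% repeat overlap) are assumptions, since the source gives no exact values.

```python
from dataclasses import dataclass, field
from typing import Optional

# Introns shorter than this are treated as non-biological (from the text: <10 bp).
MIN_BIOLOGICAL_INTRON_BP = 10
# Hypothetical thresholds; the source does not state exact values.
SHORT_INTRON_FRACTION_CUTOFF = 0.5
REPEAT_COVERAGE_CUTOFF = 0.8


@dataclass
class GeneModel:
    """Hypothetical summary of a gene model, used only for this illustration."""
    gene_id: str
    intron_lengths: list[int] = field(default_factory=list)
    repeat_coverage: float = 0.0          # fraction of the gene overlapped by repeats
    has_multi_exon_paralog: bool = False  # functional multi-exon paralog found elsewhere

    @property
    def is_single_exon(self) -> bool:
        # A gene with no introns consists of a single exon.
        return not self.intron_lengths


def pseudogene_reason(gene: GeneModel) -> Optional[str]:
    """Return the first criterion that flags the gene as a pseudogene, or None."""
    if gene.intron_lengths:
        short = sum(1 for n in gene.intron_lengths if n < MIN_BIOLOGICAL_INTRON_BP)
        if short / len(gene.intron_lengths) > SHORT_INTRON_FRACTION_CUTOFF:
            return "non-biological introns"
    if gene.repeat_coverage >= REPEAT_COVERAGE_CUTOFF:
        return "covered by repeats"
    if gene.is_single_exon and gene.has_multi_exon_paralog:
        return "single-exon copy of a multi-exon paralog"
    return None
```

For instance, a model with introns of 5, 3 and 200 bp has two of three introns below 10 bp and would be flagged by the first rule, whereas an intron-less gene is only flagged when a multi-exon paralog is known.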

lncRNAs were identified from RNA-Seq-derived transcripts in which no evidence of protein homology or protein domains could be found.

References

  1. Aoun, Karim, and Aïda Bouratbine. “Cutaneous leishmaniasis in North Africa: a review.” Parasite (Paris, France) vol. 21 (2014): 14.
  2. Abadías-Granado, I et al. “Cutaneous and Mucocutaneous Leishmaniasis.” Actas dermo-sifiliograficas, S0001-7310(21)00108-3. 27 (2021)
  3. Li, Dinghua et al. “MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph.” Bioinformatics (Oxford, England) vol. 31,10 (2015): 1674-6.
  4. Eddy, Sean R. “A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure.” BMC Bioinformatics vol. 3 (2002): 18.

Statistics

Summary

Assembly: pperniciosus_asm_v2.0, INSDC Assembly GCA_918844115.2
Database version: 113.1
Golden Path Length: 166,001,358
Genebuild by: Ensembl
Genebuild method: Full genebuild
Data source: Ensembl Metazoa

Gene counts

Coding genes: 7,198
Non coding genes: 106
Small non coding genes: 93
Long non coding genes: 11
Misc non coding genes: 2
Pseudogenes: 936
Gene transcripts: 9,855
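As a quick consistency check on the gene counts above, the non-coding total should equal the sum of its small, long and miscellaneous components. The sketch below also derives a total gene count and an average number of transcripts per gene; it assumes that the coding, non-coding and pseudogene categories are disjoint and together cover all annotated genes, which the summary itself does not state.

```python
# Gene counts copied from the statistics summary above.
coding = 7_198
small_ncrna, long_ncrna, misc_ncrna = 93, 11, 2
non_coding = 106
pseudogenes = 936
transcripts = 9_855

# The non-coding total is the sum of its three sub-categories.
assert small_ncrna + long_ncrna + misc_ncrna == non_coding

# Assumption: the three top-level categories are disjoint and exhaustive.
total_genes = coding + non_coding + pseudogenes
transcripts_per_gene = transcripts / total_genes

print(total_genes)                     # 8240
print(round(transcripts_per_gene, 2))  # 1.2
```

Under that assumption, the annotation contains 8,240 genes with roughly 1.2 transcripts per gene.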