Phlebotomus perniciosus (Murcia) Assembly and Gene Annotation
About Phlebotomus perniciosus
Phlebotomus is a genus of "sand flies" in the Diptera family Psychodidae. In the past, they have sometimes been considered to belong in a separate family, Phlebotomidae, but this alternative classification is not yet widely accepted.
In the Old World, Phlebotomus sand flies are primarily responsible for the transmission of leishmaniasis,[1] an important parasitic disease, while transmission in the New World is generally via sand flies of the genus Lutzomyia.[2] The protozoan parasite itself (Leishmania infantum) is a species of the genus Leishmania. Leishmaniasis is a deadly disease that normally finds a mammalian reservoir in rodents and other small animals such as canids (canine leishmaniasis). The female sand fly picks up Leishmania protozoa when feeding on infected animals and thus transmits the disease, while the male feeds on plant nectar.
Murcia strain
Sequencing material for this species was obtained from whole insects and dissected tissues or extracts, collected in Spain before 1994.
Picture credit: Public domain via Wikimedia Commons (Image source)
Taxonomy ID 46731
More information
General information about this species can be found in Wikipedia.
Phlebotomus perniciosus preserved or extracts
Assembly
The genome assembly presented here corresponds to the assembly accession [GCA_918844115.2]. It was produced as part of the Infravec2 project "De novo genome for Infravec. A hematophagous fly Phlebotomus perniciosus, Murcia strain assembly." to study P. perniciosus. The assembly was generated using MegaHit v1.2.9 [3]. The material was sampled from P. perniciosus strain 'Murcia', which is made available as a biological resource for distribution via the Infravec product catalog (https://infravec2.eu/). The assembly was produced from a single genomic DNA (gDNA) Illumina paired-end (PE) library, "SAMEA10651172" (ERX6791865).
The total length of the assembly is 166001358 bp contained within 77886 scaffolds. The scaffold N50 value is 14455, the scaffold L50 value is 2233. Assembly gaps span 374089 bp. The GC% content of the assembly is 37.0%.
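The scaffold N50 and L50 values quoted above can be recomputed from a list of scaffold lengths. The following is a minimal sketch using the common definition (N50 is the length of the scaffold at which the cumulative sum of lengths, sorted longest first, reaches half the total; L50 is the number of scaffolds needed to get there). The function name `n50_l50` and the toy lengths are illustrative assumptions, not part of the assembly pipeline; only the total length and gap span are taken from the figures above.

```python
def n50_l50(lengths):
    """Return (N50, L50) for an iterable of scaffold lengths in bp."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one scaffold length")
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first scaffold that pushes the running sum to half the
        # assembly length defines N50; its rank is L50.
        if running >= half_total:
            return length, count


# Toy scaffold lengths (made up for illustration, not real data).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
toy_n50, toy_l50 = n50_l50(toy_lengths)

# Figures reported for this assembly in the text above.
TOTAL_LENGTH_BP = 166_001_358
GAP_SPAN_BP = 374_089
gap_percent = 100 * GAP_SPAN_BP / TOTAL_LENGTH_BP

print(f"toy N50={toy_n50}, toy L50={toy_l50}")
print(f"gaps cover {gap_percent:.2f}% of the assembly")
```

In practice the lengths would be read from the assembly FASTA or its index file; that step is omitted here to keep the sketch self-contained.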
Draft quality and performance were assessed by comparison with another sand fly (Phlebotomus) assembly, the P. papatasi reference assembly GCA_000262795.1.
Annotation
The RNA-Seq data used for genome annotation were obtained from publicly available sources. A combination of RNA-Seq from the allied species Phlebotomus papatasi (PRJEB35592 - "RNA-seq of Phlebotomus papatasi after feeding with blood, and blood containing Leishmania major, Leishmania donovani and Herpetomonas muscarum") and Phlebotomus chinensis (PRJNA471571 - "Transcriptome of Phlebotomus chinensis") was used for genome annotation. Due to data restrictions and the read-length specifications required for input RNA-Seq, no publicly available RNA-Seq from P. perniciosus could be used within the Ensembl Annotation pipeline. For an in-depth overview of the Gene Annotation pipeline see detailed information here.
Small ncRNAs were obtained using a combination of BLAST and Infernal/RNAfold [4]. Pseudogenes were identified by examining genes with a large percentage of non-biological introns (introns of <10 bp), genes that were covered in repeats, or single-exon genes for which evidence of a functional multi-exon paralog was found elsewhere in the genome.
lncRNAs were annotated from RNA-seq data for transcripts in which no evidence of protein homology or protein domains could be found.
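The three pseudogene criteria described above can be expressed as a simple rule-based filter. The sketch below is only an illustration of that logic and is not the Ensembl implementation: the `GeneModel` class, its fields, and the function name are hypothetical, and the two fractional thresholds are assumptions, because the text does not quantify "large percentage" or "covered in repeats". Only the 10 bp intron cutoff comes from the description.

```python
from dataclasses import dataclass
from typing import List

MIN_BIOLOGICAL_INTRON_BP = 10  # from the text: introns of <10 bp are non-biological
SHORT_INTRON_FRACTION = 0.5    # assumption: "large percentage" is not quantified
REPEAT_FRACTION = 0.9          # assumption: "covered in repeats" is not quantified


@dataclass
class GeneModel:
    """Hypothetical summary of one gene model (not an Ensembl data structure)."""
    intron_lengths: List[int]     # one entry per intron, in bp
    repeat_fraction: float        # share of the gene overlapped by repeats, 0..1
    has_multi_exon_paralog: bool  # functional multi-exon paralog found elsewhere


def looks_like_pseudogene(gene: GeneModel) -> bool:
    """Apply the three criteria in turn; any one of them is sufficient."""
    # 1. A large share of introns are too short to be biological.
    if gene.intron_lengths:
        short = sum(1 for n in gene.intron_lengths if n < MIN_BIOLOGICAL_INTRON_BP)
        if short / len(gene.intron_lengths) >= SHORT_INTRON_FRACTION:
            return True
    # 2. The gene is covered in repeats.
    if gene.repeat_fraction >= REPEAT_FRACTION:
        return True
    # 3. Single-exon gene (no introns) with a multi-exon paralog elsewhere.
    return not gene.intron_lengths and gene.has_multi_exon_paralog


examples = {
    "tiny_introns": GeneModel([4, 6, 250], 0.1, False),
    "repeat_covered": GeneModel([300, 1200], 0.95, False),
    "retrocopy_like": GeneModel([], 0.0, True),
    "ordinary": GeneModel([300, 1200], 0.1, False),
}
calls = {name: looks_like_pseudogene(g) for name, g in examples.items()}
print(calls)
```

Changing the two assumed thresholds changes which borderline models are flagged, so they should be read as placeholders rather than as values used in the annotation.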
References
- Aoun, Karim, and Aïda Bouratbine. “Cutaneous leishmaniasis in North Africa: a review.” Parasite (Paris, France) vol. 21 (2014): 14.
- Abadías-Granado, I et al. “Cutaneous and Mucocutaneous Leishmaniasis.” “Leishmaniasis cutánea y mucocutánea.” Actas dermo-sifiliograficas, S0001-7310(21)00108-3. 27 (2021)
- Li, Dinghua et al. “MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph.” Bioinformatics (Oxford, England) vol. 31,10 (2015): 1674-6.
- Eddy, Sean R. “A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure.” BMC Bioinformatics vol. 3 (2002): 18.
Statistics
Summary
Assembly | pperniciosus_asm_v2.0, INSDC Assembly GCA_918844115.2 |
Database version | 113.1 |
Golden Path Length | 166,001,358 |
Genebuild by | Ensembl |
Genebuild method | Full genebuild |
Data source | Ensembl Metazoa |
Gene counts
Coding genes | 7,198 |
Non coding genes | 106 |
Small non coding genes | 93 |
Long non coding genes | 11 |
Misc non coding genes | 2 |
Pseudogenes | 936 |
Gene transcripts | 9,855 |