Crassostrea gigas (Pacific oyster) - GCA_902806645.1 (GCA902806645v1)

Crassostrea gigas (Pacific oyster) - GCA_902806645.1 Assembly and Gene Annotation

About Crassostrea gigas

The Pacific oyster, Crassostrea gigas, also referred to as Magallana gigas, is a bivalve mollusc with vital roles in coastal ecosystems and aquaculture globally. Pacific oysters are sessile organisms that thrive in a wide range of environmental conditions. Although native to Asia, they have been introduced to most continents for farming purposes. In some areas, self-sustaining populations have been established, leading to C. gigas being recognised as an invasive species. The expansion of specific gene families appears to be an important evolutionary adaptation strategy of C. gigas. The Pacific oyster has many characteristics that are representative of molluscs and, more generally, lophotrochozoans. Its genome provides insight into metazoan evolution and is a valuable resource for aquaculture research [1, 2].

Source: Carolina Peñaloza[1], The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, 2022.

Picture credit: Chantelle Hooper, Genomics and Systematics, Cefas Weymouth Laboratory, 2022.

Assembly

The genome of a single female Pacific oyster (Crassostrea gigas) from a commercial oyster hatchery (Guernsey Sea Farms, United Kingdom) was sequenced and assembled[1].

DNA was isolated from gill tissue. Genome sequencing was performed on (i) a PacBio Sequel system at ~70x coverage and (ii) an Illumina HiSeq X platform to a mean coverage of ~50x. The oyster genome was assembled from PacBio reads using Canu v1.8 and then error-corrected using Arrow v2.3.2 and Pilon v1.2.3.

The initial Canu assembly was substantially larger than expected (~1.2 Gb), likely due to the high levels of heterozygosity present in the genome of C. gigas. Highly divergent haplotypes were identified among the contigs and reassigned with a combination of the Purge Haplotigs pipeline and an all-versus-all contig mapping approach.

The haploid version of the assembly was scaffolded using Hi-C sequence reads and integrated with a previously published linkage map (~20K SNPs), resulting in a chromosome-level assembly comprising ten large scaffolds (2n = 20 in C. gigas). Scaffolds in which a high fraction of regions (>30%) showed abnormal coverage (i.e. more than 2 SD above or below the mean) were removed from the assembly.
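The coverage filter described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the window-based input, the function names, and the choice to compute the mean and SD over all windows genome-wide are assumptions made here for illustration. Only the 2 SD and 30% thresholds come from the text.

```python
from statistics import mean, pstdev


def abnormal_fraction(window_cov, mu, sd, k=2.0):
    """Fraction of windows whose coverage lies more than k*sd from mu."""
    if not window_cov:
        return 0.0
    abnormal = sum(1 for c in window_cov if abs(c - mu) > k * sd)
    return abnormal / len(window_cov)


def filter_scaffolds(coverage, max_fraction=0.30, k=2.0):
    """Keep scaffolds with at most max_fraction abnormal-coverage windows.

    coverage: dict mapping scaffold name -> list of per-window mean coverage
    (a hypothetical input format). The mean and SD are computed over all
    windows of all scaffolds, which is an assumption of this sketch.
    """
    all_windows = [c for windows in coverage.values() for c in windows]
    mu = mean(all_windows)
    sd = pstdev(all_windows)
    return {
        name: windows
        for name, windows in coverage.items()
        if abnormal_fraction(windows, mu, sd, k) <= max_fraction
    }


# Toy example: "scaf_a" has 4 of 10 windows at very high coverage.
toy = {
    "chr1": [50] * 20,
    "scaf_a": [50, 50, 50, 50, 50, 50, 200, 200, 200, 200],
}
kept = filter_scaffolds(toy)
```

In this toy input, 40% of the windows in "scaf_a" fall outside the 2 SD band, so only "chr1" is retained.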

The final cgigas_uk_roslin_v1 genome assembly (647.9 Mb) contains the ten expected chromosomes and 226 unplaced scaffolds, with a scaffold N50 of 58.5 Mb and a contig N50 of 1.6 Mb.
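For readers unfamiliar with the N50 statistic, the following sketch shows how it is commonly computed from a list of sequence lengths: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name and toy lengths are illustrative and are not taken from the assembly itself.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the cumulative sum reaches half
        # of the total is the N50.
        if running >= half_total:
            return length
    raise ValueError("n50() requires at least one sequence length")


# Toy example: the total is 32, and the two longest sequences (8 + 8)
# already reach half of it.
toy_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])
```

Applying the same function to scaffold lengths and to contig lengths separately gives the two N50 values quoted for an assembly.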

Annotation

Genome annotation was carried out using long-read PacBio Iso-Seq data from six distinct male and female tissue samples (gill, mantle, digestive gland, heart, adductor muscle and gonads)[1] in addition to the Illumina paired-end RNA-seq data available from Zhang et al. 2012 [2].

Gene models were created with BRAKER v.2.1.5, GeneMark v.4.61_lic and Augustus v3.3.3 using only the paired-end RNA-seq datasets. The Iso-Seq transcript models were generated by mapping the PacBio full-length non-chimeric reads to the oyster assembly with minimap2 v.2.16.

The short-read and long-read transcript models were merged using tama_merge from the TAMA package (https://github.com/GenomeRIK/tama/). Protein-coding transcripts and translation start and end positions were predicted by mapping known protein sequences from UniRef90 to the oyster transcripts with Diamond v.0.9.31.

References

  1. A chromosome-level genome assembly for the Pacific oyster Crassostrea gigas.
    Peñaloza C, Gutierrez AP, Eöry L, Wang S, Guo X, Archibald AL, Bean TP, Houston RD. Gigascience. 2021 Mar;10(3). DOI: 10.1093/gigascience/giab020. PMID: 33764468.

  2. The oyster genome reveals stress adaptation and complexity of shell formation.
    Zhang G, Fang X, Guo X, Li L, Luo R, Xu F, Yang P, Zhang L, Wang X, Qi H et al. 2012. Nature. 490:49-54.

Statistics

Summary

Assembly: GCA902806645v1, INSDC Assembly GCA_902806645.1
Database version: 107.2
Golden Path Length: 647,905,321
Genebuild by: The Roslin Institute, The University of Edinburgh
Genebuild method: Import
Data source: The Roslin Institute, The University of Edinburgh

Gene counts

Coding genes: 30,418
Non coding genes: 5,594
Small non coding genes: 5,576
Long non coding genes: 18
Gene transcripts: 110,973