Variant Simulator
Variant Simulator is a tool for generating all possible single base substitutions (SNPs) in protein coding genes. It can run for a specific species, specific chromosome or specific gene.
SNP generation can be restricted to introns, exons or coding exons only, optionally extended by a specified number of bases on either side of each feature.
For each generated variant, Variant Simulator reports the gene symbol (or the gene stable_id if no symbol exists) and the Ensembl ID of the feature.
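The idea of "all possible single base substitutions" can be illustrated with a minimal Python sketch. This is not the tool's own code: the function name, the hypothetical input sequence and the A/C/G/T ordering of alternate alleles are assumptions made for illustration, and the order in which Variant Simulator emits alternate alleles may differ.

```python
from typing import Iterator, Tuple

BASES = "ACGT"


def single_base_substitutions(
    sequence: str, start: int = 1
) -> Iterator[Tuple[int, str, str]]:
    """Yield (position, ref, alt) for every possible substitution.

    `start` is the 1-based coordinate of the first base in `sequence`.
    """
    for offset, ref in enumerate(sequence.upper()):
        if ref not in BASES:
            # Assumption of this sketch: skip ambiguous bases such as N.
            continue
        for alt in BASES:
            if alt != ref:
                yield start + offset, ref, alt


# Hypothetical three-base sequence starting at coordinate 100.
variants = list(single_base_substitutions("GAT", start=100))
for position, ref, alt in variants:
    print(position, ref, alt)
```

A sequence of n unambiguous bases yields 3n variants, which matches the three output lines per position described in the Output section.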
Download and install
Variant Simulator is part of variation tools.
Usage
Note: Variant Simulator depends on database access for identifier lookup, so, unlike VEP, it cannot be run in offline mode.
The output format is VCF and the INFO field will contain the GENE symbol and FEATURE id.
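To make the INFO field concrete, here is a small Python sketch that splits one output record into its columns and turns INFO into a dictionary. The helper name `parse_record` is hypothetical, and the example line is one of the BRCA2 records from the examples in this document. The sketch splits on any whitespace for convenience; the VCF format itself is tab-separated.

```python
from typing import Dict

VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_record(line: str) -> Dict[str, object]:
    """Parse one simulated VCF data line into a dictionary."""
    fields = line.split()  # tolerant of tabs or spaces
    if len(fields) != len(VCF_COLUMNS):
        raise ValueError(f"expected {len(VCF_COLUMNS)} columns, got {len(fields)}")
    record: Dict[str, object] = dict(zip(VCF_COLUMNS, fields))
    record["POS"] = int(fields[1])
    # INFO is a semicolon-separated list of KEY=VALUE pairs.
    record["INFO"] = dict(item.split("=", 1) for item in fields[7].split(";"))
    return record


example = "13 32315474 13-32315474-G-A G A . . GENE=BRCA2;FEATURE=ENSG00000139618"
record = parse_record(example)
print(record["INFO"]["GENE"], record["INFO"]["FEATURE"])
```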
Generate SNPs for a chromosome
# Running on one chromosome, default species is Homo sapiens:
./simulate_variation -chrom 2
./simulate_variation -species pig -chrom 2
Output
# First rows of the output:
#CHROM POS ID REF ALT QUAL FILTER INFO
2 38814 2-38814-T-A T A . . GENE=FAM110C;FEATURE=ENSG00000184731
2 38814 2-38814-T-C T C . . GENE=FAM110C;FEATURE=ENSG00000184731
2 38814 2-38814-T-G T G . . GENE=FAM110C;FEATURE=ENSG00000184731
Generate SNPs for a gene
# Running on one gene, default species is Homo sapiens:
./simulate_variation -gene ENSG00000139618
./simulate_variation -gene BRCA2
Output
# First rows of the output:
#CHROM POS ID REF ALT QUAL FILTER INFO
13 32315474 13-32315474-G-A G A . . GENE=BRCA2;FEATURE=ENSG00000139618
13 32315474 13-32315474-G-C G C . . GENE=BRCA2;FEATURE=ENSG00000139618
13 32315474 13-32315474-G-T G T . . GENE=BRCA2;FEATURE=ENSG00000139618
Generate SNPs for a gene using exonsOnly
# Running on one gene using only the exons, default species is Homo sapiens:
./simulate_variation -gene BRCA2 -exonsOnly
Output
# First rows of the output:
#CHROM POS ID REF ALT QUAL FILTER INFO
13 32357742 13-32357742-C-A C A . . GENE=BRCA2;FEATURE=ENSE00003719469
13 32357742 13-32357742-C-T C T . . GENE=BRCA2;FEATURE=ENSE00003719469
13 32357742 13-32357742-C-G C G . . GENE=BRCA2;FEATURE=ENSE00003719469
Generate SNPs for a gene using codingOnly exons
# Running on one gene using only the coding exons, default species is Homo sapiens:
./simulate_variation -gene BRCA2 -codingOnly
Output
# First rows of the output:
#CHROM POS ID REF ALT QUAL FILTER INFO
13 32325076 13-32325076-G-A G A . . GENE=BRCA2;FEATURE=ENSE00003659301
13 32325076 13-32325076-G-C G C . . GENE=BRCA2;FEATURE=ENSE00003659301
13 32325076 13-32325076-G-T G T . . GENE=BRCA2;FEATURE=ENSE00003659301
Generate SNPs for a gene using codingOnly exons with 5bp upstream/downstream of each exon
# Running on one gene using only the coding exons with 5bp flanks, default species is Homo sapiens:
./simulate_variation -gene BRCA2 -codingOnly -edge 5
Output
# First rows of the output:
#CHROM POS ID REF ALT QUAL FILTER INFO
13 32325071 13-32325071-T-A T A . . GENE=BRCA2;FEATURE=ENSE00003659301
13 32325071 13-32325071-T-C T C . . GENE=BRCA2;FEATURE=ENSE00003659301
13 32325071 13-32325071-T-G T G . . GENE=BRCA2;FEATURE=ENSE00003659301
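The effect of the edge option can be sketched as simple interval arithmetic: the feature is widened by the given number of bases on both sides. This is an illustrative Python sketch, not the tool's implementation; the function name `extend_interval` is hypothetical, and the exon end coordinate used here is a made-up placeholder, since the examples only show the first positions of the output.

```python
from typing import Tuple


def extend_interval(start: int, end: int, edge: int = 0) -> Tuple[int, int]:
    """Widen a 1-based inclusive interval by `edge` bases on each side."""
    if edge < 0:
        raise ValueError("edge must be non-negative")
    if start > end:
        raise ValueError("start must not exceed end")
    # Clamp at 1 so the interval never runs off the start of the sequence.
    return max(1, start - edge), end + edge


exon_start = 32325076            # first coding position in the codingOnly example
hypothetical_exon_end = 32325100  # placeholder for illustration, NOT the real exon end
region = extend_interval(exon_start, hypothetical_exon_end, edge=5)
print(region)
```

With an edge of 5, the first position moves from 32325076 to 32325071, consistent with the first record in the output of the edge example.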
Output
Output is in VCF format. For each position, three lines are created, one per possible alternate allele. The columns are:
- CHROM: chromosome name
- POS: variant position
- ID: string concatenation of chrom-pos-ref-alt
- REF: reference allele
- ALT: alternate allele
- QUAL: empty (.)
- FILTER: empty (.)
- INFO: GENE= holds the gene symbol if one exists, otherwise the Ensembl gene stable_id; FEATURE= holds the stable_id of the gene or exon, or the display_id of the intron
Options
Flag | Alternate | Description |
---|---|---|
--chrom | -chr | Chromosome name to restrict script to. |
--gene | -g | Gene symbol or Ensembl gene stable_id to restrict script to. |
--species | -s | Species to use. Default value: homo_sapiens |
--assembly | -a | Assembly to use if species is homo_sapiens. Default value: grch38 |
--refseq | | Use RefSeq genes/transcripts if species is human. |
--registry | | File containing database connections in Ensembl registry format (see Ensembl Registry). Default value: connect to latest public Ensembl database |
--exonsOnly | | Generate all possible SNPs for exons only. |
--intronsOnly | | Generate all possible SNPs for introns only. |
--codingOnly | | Generate all possible SNPs for coding exons only. |
--edge | | Number of bases to include upstream and downstream of each feature. Default value: 0 |
--output_file | -o | Output file. Default value: simulated.vcf |
--help | | Print the usage message. |
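When the script is driven from another program, the options above can be assembled into an argument list. The following Python sketch is a hypothetical wrapper, not part of the tool: the function name `build_command` and its parameter names are assumptions, and only flags listed in the table are used. It builds the command without running it.

```python
from typing import List, Optional


def build_command(
    gene: Optional[str] = None,
    chrom: Optional[str] = None,
    species: Optional[str] = None,
    exons_only: bool = False,
    introns_only: bool = False,
    coding_only: bool = False,
    edge: int = 0,
    output_file: Optional[str] = None,
    script: str = "./simulate_variation",
) -> List[str]:
    """Assemble an argument list from the documented options."""
    cmd = [script]
    # Options that take a value are added only when one was supplied.
    for flag, value in (("--species", species), ("--chrom", chrom), ("--gene", gene)):
        if value is not None:
            cmd += [flag, str(value)]
    # Boolean switches are added only when enabled.
    for flag, enabled in (
        ("--exonsOnly", exons_only),
        ("--intronsOnly", introns_only),
        ("--codingOnly", coding_only),
    ):
        if enabled:
            cmd.append(flag)
    if edge:
        cmd += ["--edge", str(edge)]
    if output_file is not None:
        cmd += ["--output_file", output_file]
    return cmd


command = build_command(gene="BRCA2", coding_only=True, edge=5)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from a directory that contains the script. Since the script needs database access, the call requires a working network connection or a registry file.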