FinnGen exome query tool
The FinnGen exome query tool is a command-line interface that lets users query carriers of variants in the FinnGen exome data.
Two types of queries are available:
search for carriers of a single variant
search for coding variants in a gene
To run a query, open a terminal and execute: /finngen/shared_nfs/finngen/exome_query/run_query
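See the command-line instructions with exome -h, and the query-type-specific help for a single variant query (exome var -h) or a gene query (exome gene -h). The short exome command requires the alias described at the end of this page; without it, use the full run_query path instead.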
The var command allows you to query a single variant. Note that the variant (`variant_id`) must be in the format chr:pos:ref:alt (e.g. 2:1503826:G:T) and in the GRCh38 reference build. For help on this query type and all its query modes, type exome var -h.
Basic command: var `variant_id`
Example query for a simple variant search:
/finngen/shared_nfs/finngen/exome_query/run_query var 'variant_id'
The tool prints a summary of the variant carriers and their genotype qualities to the screen.
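Because the var command expects the variant in chr:pos:ref:alt form, a quick local format check can catch typos (such as dashes instead of colons) before the query is sent. The sketch below is a hypothetical helper, not part of the tool. It assumes chromosomes 1-22, X, Y or MT written without a "chr" prefix (as in the example 2:1503826:G:T), a positive integer position, and ref/alt alleles made of A, C, G and T. It cannot check that the position is in GRCh38.

```bash
# is_variant_id: hypothetical pre-check, returns 0 if $1 looks like chr:pos:ref:alt.
# ASSUMPTION: chromosome 1-22, X, Y or MT without a "chr" prefix; ref/alt are A/C/G/T strings.
# This only checks the format; it cannot verify the GRCh38 build.
is_variant_id() {
  # Keep the regex in a variable so [[ =~ ]] treats it as an unquoted ERE.
  local re='^([1-9]|1[0-9]|2[0-2]|X|Y|MT):[1-9][0-9]*:[ACGT]+:[ACGT]+$'
  [[ $1 =~ $re ]]
}
```

Usage: v='2:1503826:G:T'; is_variant_id "$v" && /finngen/shared_nfs/finngen/exome_query/run_query var "$v"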
The gene command allows you to query variants within a specific gene. For help on this query type and all its query modes, type exome gene -h.
Basic command: gene `gene_name`
`gene_name` is the name of the gene in HGNC format, for example BRCA1.
The tool outputs a list of the variants within the specified gene, along with their details.
Example query for a simple gene search:
/finngen/shared_nfs/finngen/exome_query/run_query gene 'gene_name'
You can request variants with certain functional consequences by giving a comma-separated list of consequences, e.g. --consequences missense_variant,stop_gained.
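When the consequence list is built in a script, the comma-separated value for --consequences can be produced from a list of terms. This is a minimal sketch: join_consequences and gene_query are hypothetical helper names, and placing --consequences after the gene name is an assumption based on the general COMMAND OPTION syntax described on this page. Only the run_query path comes from this guide, and gene_query only works inside the Sandbox where that path exists.

```bash
# Path from this guide; everything else in this block is a hypothetical wrapper.
RUN_QUERY=/finngen/shared_nfs/finngen/exome_query/run_query

# Join consequence terms with commas, the form --consequences takes.
join_consequences() {
  local IFS=,            # "$*" joins arguments with the first character of IFS
  printf '%s\n' "$*"
}

# gene_query GENE CONSEQUENCE... : gene query restricted to the given consequences.
# ASSUMPTION: options follow the gene name (COMMAND OPTION order).
gene_query() {
  local gene="$1"
  shift
  "$RUN_QUERY" gene "$gene" --consequences "$(join_consequences "$@")"
}
```

For example, gene_query BRCA1 missense_variant stop_gained would run run_query gene BRCA1 --consequences missense_variant,stop_gained.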
You can use shorthands for selecting all coding variants (-coding_variants or -coding) or protein truncating variants (--PTV or -P). The coding and PTV consequences used are:
The general syntax for commands and options is /finngen/shared_nfs/finngen/exome_query/run_query COMMAND OPTION, where COMMAND and OPTION are replaced with the appropriate arguments. NOTE, however, that output options must be given before the command.
Genotypes can be filtered with a Python-compatible expression:
--gt_filt 'python statement resolving to boolean'
Genotypes that do not pass the filter are set to missing. The statement can use all fields that are declared in the VCF header (GT, DP, GQ) or computed on the fly (AB, pAB).
Example: filter genotypes so that every genotype must have more than 10 reads (DP>10). In addition, heterozygous genotypes must have GQ>20 and an allelic-balance p-value (pAB) above 0.05, and homozygous-reference genotypes must have GQ>30. With this expression, any other genotype (e.g. 1/1) does not pass and is set to missing:
--gt_filt 'DP>10 and ( (GT=="0/1" and GQ>20 and pAB>0.05) or (GT=="0/0" and GQ>30) )'
AB and pAB are computed on the fly: AB is the allelic balance, and pAB is the p-value for the allelic balance deviating from 50/50.
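To make the AB and pAB thresholds in the filter above more concrete, the sketch below computes both values from a pair of allele read counts. The exact definitions used by the exome query tool are not documented on this page, so this is an assumption: AB is taken as alt / (ref + alt), and pAB as an exact two-sided binomial test against 0.5. The input is assumed to be a "ref,alt" string like a VCF AD field, and the function name ab_pab is hypothetical.

```bash
# ab_pab "REF,ALT": print AB and a two-sided binomial p-value against 50/50.
# ASSUMPTION: AB = alt / (ref + alt); pAB = exact two-sided binomial test, p = 0.5.
# The tool's own AB/pAB definitions may differ.
ab_pab() {
  awk -v ad="$1" 'BEGIN {
    split(ad, c, ",")
    ref = c[1] + 0; alt = c[2] + 0; n = ref + alt
    if (n == 0) { print "NA NA"; exit }
    # Binomial(n, 0.5) pmf via p(i+1) = p(i) * (n - i) / (i + 1).
    # 0.5^n underflows above roughly n = 1000 reads; fine for typical depths.
    p = 0.5 ^ n; lower = 0; upper = 0
    for (i = 0; i <= n; i++) {
      if (i <= alt) lower += p     # P(X <= alt)
      if (i >= alt) upper += p     # P(X >= alt)
      p = p * (n - i) / (i + 1)
    }
    # Symmetric null: two-sided p = 2 * smaller tail, capped at 1.
    pval = 2 * (lower < upper ? lower : upper)
    if (pval > 1) pval = 1
    printf "%.4f %.4f\n", alt / n, pval
  }'
}
```

For example, 12 reference and 8 alternative reads give AB 0.4000 and pAB 0.5034, which would pass the pAB>0.05 condition for a heterozygote.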
--export_carriers /out/carriers.tsv
Exports carrier IDs with genotype information to the specified file.
NOTE: output options must be given before the command, and the path must start with /out/. After successful execution, your output will be in your /home/ivm directory.
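Since export paths must start with /out/ and the result then shows up in /home/ivm, a small check before running a long query can save a failed export. This is a hypothetical helper, not part of the tool, and it assumes that /out/NAME ends up as /home/ivm/NAME; the page only states that the output lands in /home/ivm.

```bash
# export_location PATH: reject paths not starting with /out/, otherwise print
# where the exported file is expected to appear.
# ASSUMPTION: /out/<name> is delivered as /home/ivm/<name>.
export_location() {
  local path="$1"
  case "$path" in
    /out/?*) printf '/home/ivm/%s\n' "${path#/out/}" ;;
    *) printf 'error: export path must start with /out/: %s\n' "$path" >&2
       return 1 ;;
  esac
}
```

Usage, with the output option placed before the command: export_location /out/carriers.tsv && /finngen/shared_nfs/finngen/exome_query/run_query --export_carriers /out/carriers.tsv var '2:1503826:G:T'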
--export_case_control /out/case_control.tsv
Exports a cohort of carriers and non-carriers of the non-reference alleles of the queried variants. The COHORT column contains either CARRIERS or NON_CARRIERS, for easy import into other Sandbox tools (e.g. Cohort Operations).
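Before importing a case/control export into another Sandbox tool, tallying the COHORT column is a quick sanity check. This sketch assumes the file is tab-separated (as the .tsv name suggests) with a header row containing a column named COHORT; nothing else about the layout is assumed, and count_cohorts is a hypothetical name.

```bash
# count_cohorts FILE: print each COHORT value and how many rows have it.
# ASSUMPTION: tab-separated file whose first row is a header with a COHORT column.
count_cohorts() {
  awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "COHORT") col = i; next }
    col     { n[$col]++ }
    END {
      if (!col) { print "no COHORT column in header" > "/dev/stderr"; exit 1 }
      for (k in n) print k "\t" n[k]   # order is unspecified; pipe to sort if needed
    }' "$1"
}
```

For example, count_cohorts /home/ivm/case_control.tsv | sort, assuming the file was exported to /out/case_control.tsv.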
You can also create an alias for the tool in your /home/ivm/.bashrc. Open the file with a text editor and add the following line:
alias exome='/finngen/shared_nfs/finngen/exome_query/run_query'
Then type source /home/ivm/.bashrc to activate the alias. Now you can invoke the tool with just the command exome.
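If you prefer not to open a text editor, the alias line can also be appended from the terminal. The sketch below only adds the line when it is not already present, so running it twice does not duplicate it. The function name add_exome_alias and the BASHRC override variable are hypothetical; the default file and the alias target are the ones named in this guide.

```bash
# add_exome_alias: append the exome alias unless that exact line already exists.
# BASHRC is a hypothetical override; it defaults to /home/ivm/.bashrc.
add_exome_alias() {
  local rc="${BASHRC:-/home/ivm/.bashrc}"
  local line="alias exome='/finngen/shared_nfs/finngen/exome_query/run_query'"
  # -q quiet, -x whole-line match, -F fixed string; a missing file counts as "not present".
  if ! grep -qxF -- "$line" "$rc" 2>/dev/null; then
    printf '%s\n' "$line" >> "$rc"
  fi
}
```

Usage: add_exome_alias && source /home/ivm/.bashrc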
For information on sequencing genotype quality metrics, see e.g. the section "Interpreting genotype and other sample-level information".
The query tool currently does not include association analyses. Instead, you are encouraged to import the generated carrier files into other Sandbox tools, such as Cohort Operations.