FinnGen exome query tool

Introduction

The FinnGen exome query tool is a command-line interface that lets users query carriers of variants in FinnGen exome data (link).

Two types of queries are available:

  • search for carriers of a single variant

  • search for coding variants in a gene

A query can be invoked by opening a terminal and executing: /finngen/shared_nfs/finngen/exome_query/run_query

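See the command line instructions with exome -h, and the query-type-specific help for a single variant query (exome var -h) or a gene query (exome gene -h). Here exome is the shell alias described under Creating a Bash Alias; without it, use the full run_query path.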

Query types

Search for a single variant

The var command queries a single variant. The variant (`variant_id`) must be in the format chr:pos:ref:alt (e.g. 2:1503826:G:T) and in the GRCh38 reference build. For help on this query type and all its query modes, type exome var -h.

Basic command: var `variant_id`

Example query for a simple variant search:

/finngen/shared_nfs/finngen/exome_query/run_query var 'variant_id'

The tool prints a summary of the variant carriers and their genotype qualities to the screen.
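Because the query expects the chr:pos:ref:alt format, it can help to check an identifier before passing it to the tool. The following is a minimal sketch, not part of the query tool: the function name `parse_variant_id` and the accepted chromosome labels (one or two digits, X, Y, MT) are assumptions made for illustration.

```python
import re

# Assumed pattern (not taken from the tool): chromosome as one or two digits,
# X, Y or MT; a positive position; REF and ALT alleles made of A/C/G/T.
VARIANT_RE = re.compile(r"([0-9]{1,2}|X|Y|MT):([1-9][0-9]*):([ACGT]+):([ACGT]+)")


def parse_variant_id(variant_id):
    """Split 'chr:pos:ref:alt' into (chrom, pos, ref, alt) or raise ValueError."""
    match = VARIANT_RE.fullmatch(variant_id)
    if match is None:
        raise ValueError(f"not in chr:pos:ref:alt format: {variant_id!r}")
    chrom, pos, ref, alt = match.groups()
    return chrom, int(pos), ref, alt


print(parse_variant_id("2:1503826:G:T"))  # ('2', 1503826, 'G', 'T')
```

A check like this only validates the shape of the identifier; it cannot tell whether the position is given in GRCh38.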

Search for variants in a gene

The gene command queries variants within a specific gene. For help on this query type and all its query modes, type exome gene -h.

Basic command: gene `gene_name`

`gene_name` is the name of the gene in HGNC format. For example, BRCA1.

The tool outputs a list of the variants within the specified gene, along with their details.

Example query for a simple gene search:

/finngen/shared_nfs/finngen/exome_query/run_query gene 'gene_name'

You can restrict the results to variants with certain functional consequences by giving a comma-separated list of consequences, e.g. --consequences missense_variant,stop_gained. Shorthands are available for selecting all coding variants (-coding_variants or -coding) or protein-truncating variants (--PTV or -P). The consequences covered by the coding and PTV shorthands are:

PTV_CONSEQUENCES = [
    "stop_gained",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
    "start_lost",
    "stop_lost",
]

CODING_CONSEQUENCES = [
    "missense_variant",
    "stop_gained",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
    "start_lost",
    "stop_lost",
    "inframe_insertion",
    "inframe_deletion",
]
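The two lists above overlap: every PTV consequence is also a coding consequence. The sketch below illustrates that relationship and how a comma-separated consequence list could be applied to variant records. The helper names and the three variant records are made up for illustration and do not come from the tool or from FinnGen data.

```python
PTV_CONSEQUENCES = [
    "stop_gained", "frameshift_variant", "splice_acceptor_variant",
    "splice_donor_variant", "start_lost", "stop_lost",
]
CODING_CONSEQUENCES = ["missense_variant"] + PTV_CONSEQUENCES + [
    "inframe_insertion", "inframe_deletion",
]


def parse_consequences(argument):
    """Split a comma-separated consequence list into individual terms."""
    return [item.strip() for item in argument.split(",") if item.strip()]


def select_by_consequence(variants, wanted):
    """Keep the variant records whose consequence is in the wanted list."""
    wanted = set(wanted)
    return [v for v in variants if v["consequence"] in wanted]


# Coding consequences that are not protein truncating.
coding_only = [c for c in CODING_CONSEQUENCES if c not in PTV_CONSEQUENCES]
print(coding_only)  # ['missense_variant', 'inframe_insertion', 'inframe_deletion']

# Hypothetical records, invented purely to exercise the helpers.
variants = [
    {"id": "1:1000:A:G", "consequence": "missense_variant"},
    {"id": "1:2000:C:T", "consequence": "stop_gained"},
    {"id": "1:3000:G:A", "consequence": "synonymous_variant"},
]
ptv_ids = [v["id"] for v in select_by_consequence(variants, PTV_CONSEQUENCES)]
print(ptv_ids)  # ['1:2000:C:T']
```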

Common options

The general syntax for using commands and options is: /finngen/shared_nfs/finngen/exome_query/run_query COMMAND OPTION. Replace COMMAND and OPTION with the appropriate arguments. NOTE, however, that the output options need to be added before the command.
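The ordering rule (output options before the command, other options after it) is easy to get wrong when building a command in a script. The sketch below assembles an argument list in that order. The function `build_command` is hypothetical, and placing the query target directly after the command is an assumption based on the basic commands shown earlier.

```python
import shlex

RUN_QUERY = "/finngen/shared_nfs/finngen/exome_query/run_query"


def build_command(command, target, exports=None, options=()):
    """Return argv with export options first, then the command and its options."""
    if command not in ("var", "gene"):
        raise ValueError(f"unknown command: {command!r}")
    argv = [RUN_QUERY]
    for flag, path in (exports or {}).items():
        # The documentation requires export paths to start with /out/.
        if not path.startswith("/out/"):
            raise ValueError(f"export path must start with /out/: {path!r}")
        argv += [flag, path]
    argv += [command, target, *options]
    return argv


argv = build_command(
    "gene",
    "BRCA1",
    exports={"--export_carriers": "/out/carriers.tsv"},
    options=["--PTV"],
)
print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run(argv)` inside the Sandbox; it is only printed here so the sketch runs anywhere.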

Genotype filtering

Genotypes can be filtered with an expression in Python-compatible syntax:

  • --gt_filt 'Python expression resolving to a boolean' -> genotypes that do not pass are set to missing

  • The expression can use all fields that are declared in the VCF header (GT, DP, GQ) or computed on the fly (AB, pAB)

  • Example: require every genotype to have more than 10 reads (DP>10). If the genotype is heterozygous, also require GQ>20 and a p-value above 0.05 for the allelic balance deviating from 50/50. If the genotype is homozygous reference, also require genotype quality above 30: --gt_filt 'DP>10 and ( (GT=="0/1" and GQ>20 and pAB>0.05) or (GT=="0/0" and GQ>30) )'

  • AB and pAB are fields computed on the fly: AB is the allelic balance and pAB is the p-value for the balance deviating from 50/50

  • For information on sequencing genotype quality metrics, see e.g. the "Interpreting genotype and other sample-level information" section of https://gatk.broadinstitute.org/hc/en-us/articles/360035531692-VCF-Variant-Call-Format
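To make the filter semantics concrete, the sketch below evaluates the example expression against two hypothetical heterozygous genotypes. It is not the tool's implementation. It assumes that AB is the fraction of reads supporting the alternate allele, that pAB is a two-sided exact binomial p-value against a 50/50 split, and that per-allele read counts are available as an AD pair (ref, alt); none of these details are confirmed by the documentation.

```python
from math import comb


def allelic_balance(ref_reads, alt_reads):
    """Fraction of reads supporting the alternate allele (assumed meaning of AB)."""
    total = ref_reads + alt_reads
    return alt_reads / total if total else float("nan")


def p_allelic_balance(ref_reads, alt_reads):
    """Two-sided exact binomial p-value against 50/50 (assumed meaning of pAB)."""
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0
    observed = comb(n, alt_reads)
    # With p = 0.5 each outcome k has probability comb(n, k) / 2**n, so the
    # p-value sums all outcomes that are no more likely than the observed one.
    extreme = sum(comb(n, k) for k in range(n + 1) if comb(n, k) <= observed)
    return extreme / 2 ** n


def passes(expression, genotype):
    """Evaluate a filter expression against the fields of one genotype."""
    fields = dict(genotype)
    ref_reads, alt_reads = fields.pop("AD")
    fields["AB"] = allelic_balance(ref_reads, alt_reads)
    fields["pAB"] = p_allelic_balance(ref_reads, alt_reads)
    # Builtins are hidden so the expression sees only the genotype fields;
    # this is a convenience for the sketch, not a security sandbox.
    return bool(eval(expression, {"__builtins__": {}}, fields))


FILTER = 'DP>10 and ((GT=="0/1" and GQ>20 and pAB>0.05) or (GT=="0/0" and GQ>30))'

# Hypothetical genotypes: same depth and quality, different read balance.
balanced_het = {"GT": "0/1", "DP": 24, "GQ": 60, "AD": (13, 11)}
skewed_het = {"GT": "0/1", "DP": 24, "GQ": 60, "AD": (22, 2)}

print(passes(FILTER, balanced_het))  # True
print(passes(FILTER, skewed_het))    # False
```

The skewed genotype fails only because of the pAB condition, which shows how the allelic-balance test removes heterozygous calls that depth and GQ alone would keep.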

Output options

--export_carriers /out/carriers.tsv

Exports carrier IDs with genotype information to a specified file.

NOTE: the output options need to be added before the command, and the path needs to start with /out/. After successful execution, your output will be in /home/ivm.

--export_case_control /out/case_control.tsv

Exports a cohort of carriers and non-carriers of the non-reference alleles of the variants. The column "COHORT" contains either CARRIERS or NON_CARRIERS for easy import into other Sandbox tools (e.g. Cohort Operations).
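Before importing an exported case-control file into another tool, a quick count of the two cohorts can serve as a sanity check. The sketch below assumes the file is tab-separated with a header row. Only the COHORT column is documented, so the ID column name and the sample IDs in the inline example are hypothetical.

```python
import csv
import io
from collections import Counter


def cohort_counts(handle):
    """Count the rows per COHORT value in a tab-separated file with a header."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row["COHORT"] for row in reader)


# Made-up content standing in for an exported file; a real export would be
# opened with open(path, newline="") instead of io.StringIO.
example = "ID\tCOHORT\nS1\tCARRIERS\nS2\tNON_CARRIERS\nS3\tNON_CARRIERS\n"
counts = cohort_counts(io.StringIO(example))
print(counts["CARRIERS"], counts["NON_CARRIERS"])  # 1 2
```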

Phenotypic consequences of variant carriers

The query tool does not currently include association analyses. Instead, you are encouraged to import the generated carrier files into Sandbox tools such as Cohort Operations or LifeTrack.

Creating a Bash Alias

You can also create an alias for the tool in your /home/ivm/.bashrc.

Open the file with a text editor and add the following:

exome () {
    /finngen/shared_nfs/finngen/exome_query/run_query "$@"
}

Type source /home/ivm/.bashrc to activate the alias. You can then invoke the tool with just the command exome.
