# FinnGen exome query tool

### Table of Contents

* [Introduction](#introduction)
* [Query types](#query-types)
* [Common options](#common-options)
* [Phenotypic consequences of variant carriers](#phenotypic-consequences-of-variant-carriers)
* [Creating a Bash Alias](#creating-a-bash-alias)

## Introduction

The FinnGen exome query tool is a command-line interface that lets users query carriers of variants in the FinnGen exome data (link).

Two types of queries are available:

* search for carriers of a single variant
* search for coding variants in a gene

To run a query, open a terminal and execute `/finngen/shared_nfs/finngen/exome_query/run_query`.

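To see the command-line help, run `exome -h`. For query-type-specific help, run `exome var -h` (single variant query) or `exome gene -h` (gene query). The `exome` shorthand requires the alias described in [Creating a Bash Alias](#creating-a-bash-alias); without it, use the full path to `run_query`.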

## Query types

### Search for a single variant

The `var` command queries a single variant. The variant (`variant_id`) must be in the format `chr:pos:ref:alt` (e.g. `2:1503826:G:T`) and in the GRCh38 reference build. For help on this query type and all its query modes, type `exome var -h`.

Basic command: `var variant_id`

**Example query for a simple variant search:**

`/finngen/shared_nfs/finngen/exome_query/run_query var 2:1503826:G:T`

The tool prints a summary of the variant carriers and their genotype qualities to the screen.
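The `chr:pos:ref:alt` format can be checked before a query is submitted. The sketch below is a hypothetical helper (`parse_variant_id` is not part of `run_query`). It assumes chromosomes are written as `1` to `22`, `X` or `Y` without a `chr` prefix, as in the example above, and it checks only the shape of the ID, not whether the alleles actually match GRCh38.

```python
import re
from typing import NamedTuple

# Assumption: chromosomes 1-22, X, Y with no "chr" prefix (as in 2:1503826:G:T).
_VARIANT_RE = re.compile(r"^([1-9]|1[0-9]|2[0-2]|X|Y):([1-9][0-9]*):([ACGT]+):([ACGT]+)$")


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_variant_id(variant_id: str) -> Variant:
    """Split a chr:pos:ref:alt string, raising ValueError if the shape is wrong."""
    match = _VARIANT_RE.match(variant_id.strip())
    if match is None:
        raise ValueError(f"expected chr:pos:ref:alt, got {variant_id!r}")
    chrom, pos, ref, alt = match.groups()
    if ref == alt:
        raise ValueError("ref and alt alleles must differ")
    return Variant(chrom, int(pos), ref, alt)


print(parse_variant_id("2:1503826:G:T"))
# Variant(chrom='2', pos=1503826, ref='G', alt='T')
```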

### Search for variants in a gene

The `gene` command queries variants within a specific gene. For help on this query type and all its query modes, type `exome gene -h`.

Basic command: `gene gene_name`

`gene_name` is the gene's HGNC symbol, for example **BRCA1**.

The tool outputs a list of the variants within the specified gene, along with their details.

**Example query for a simple gene search:**

`/finngen/shared_nfs/finngen/exome_query/run_query gene BRCA1`

To request variants with specific functional consequences, pass a comma-separated list of consequences, e.g. `--consequences missense_variant,stop_gained`. Shorthands are available for selecting all coding variants (`-coding_variants` or `-coding`) or protein-truncating variants (`--PTV` or `-P`). The coding and PTV consequences used are:

```python
PTV_CONSEQUENCES = [
    "stop_gained",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
    "start_lost",
    "stop_lost",
]

CODING_CONSEQUENCES = [
    "missense_variant",
    "stop_gained",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
    "start_lost",
    "stop_lost",
    "inframe_insertion",
    "inframe_deletion",
]
```
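How the consequence options could combine is sketched below. This is a hypothetical illustration, not the tool's code: `selected_consequences` and `matches` are invented names, combining several options as a union is an assumption, and the `&`-separated annotation strings assume Ensembl VEP-style consequence fields. The sketch also confirms that every PTV consequence is included in the coding set.

```python
# Hypothetical sketch of consequence selection; not the tool's actual code.
PTV_CONSEQUENCES = [
    "stop_gained", "frameshift_variant", "splice_acceptor_variant",
    "splice_donor_variant", "start_lost", "stop_lost",
]
# Same terms and order as the CODING_CONSEQUENCES list above.
CODING_CONSEQUENCES = [
    "missense_variant", *PTV_CONSEQUENCES, "inframe_insertion", "inframe_deletion",
]


def selected_consequences(consequences: str = "", ptv: bool = False,
                          coding: bool = False) -> set[str]:
    """Combine a comma-separated list and the shorthands into one set (union assumed)."""
    selected = {term.strip() for term in consequences.split(",") if term.strip()}
    if ptv:
        selected.update(PTV_CONSEQUENCES)
    if coding:
        selected.update(CODING_CONSEQUENCES)
    return selected


def matches(annotation: str, selected: set[str]) -> bool:
    """True if any '&'-separated term of a VEP-style annotation is selected."""
    return any(term in selected for term in annotation.split("&"))


# Every PTV consequence is also a coding consequence.
assert set(PTV_CONSEQUENCES) <= set(CODING_CONSEQUENCES)

wanted = selected_consequences("missense_variant,stop_gained")
print(matches("missense_variant", wanted))                     # True
print(matches("splice_donor_variant&intron_variant", wanted))  # False
print(matches("splice_donor_variant&intron_variant",
              selected_consequences(ptv=True)))                # True
```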

## Common options

The general syntax is `/finngen/shared_nfs/finngen/exome_query/run_query COMMAND OPTION`, where `COMMAND` is the query type (`var` or `gene`) and `OPTION` is any of its options. **NOTE, however, that output options must be placed before the command**, e.g. `/finngen/shared_nfs/finngen/exome_query/run_query --export_carriers /out/carriers.tsv var 2:1503826:G:T`.

### Genotype filtering

Genotypes can be filtered with a Python-compatible boolean expression:

* `--gt_filt 'python expression resolving to a boolean'`: genotypes that do not pass the filter are set to missing.
* The expression can use any field declared in the VCF header (GT, DP, GQ) or computed on the fly (AB, pAB).
* AB and pAB are computed on the fly: AB is the allelic balance, and pAB is the p-value for the allelic balance deviating from a 50/50 split.
* Example: require a read depth (DP) greater than 10 for all genotypes. In addition, for heterozygous genotypes require GQ > 20 and pAB > 0.05; for homozygous reference genotypes require GQ > 30.\
  `--gt_filt 'DP>10 and ((GT=="0/1" and GQ>20 and pAB>0.05) or (GT=="0/0" and GQ>30))'`\
  With this expression, any other genotype (e.g. `1/1`) fails the filter and is set to missing.
* For more on genotype quality metrics, see the "Interpreting genotype and other sample-level information" section of the [GATK VCF documentation](https://gatk.broadinstitute.org/hc/en-us/articles/360035531692-VCF-Variant-Call-Format).
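To make the filter semantics concrete, the sketch below evaluates a `--gt_filt`-style expression against a single genotype. It is not the tool's implementation: it assumes per-allele read depths (ref and alt, as in a VCF `AD` field) are available, that AB is the fraction of alt-supporting reads, and that pAB is a two-sided exact binomial test against a 50/50 split. The helper names are hypothetical.

```python
import math

FILTER = 'DP>10 and ((GT=="0/1" and GQ>20 and pAB>0.05) or (GT=="0/0" and GQ>30))'


def allelic_balance(ref_reads: int, alt_reads: int) -> float:
    """Assumed definition: fraction of reads supporting the alt allele."""
    total = ref_reads + alt_reads
    return alt_reads / total if total else float("nan")


def p_allelic_balance(ref_reads: int, alt_reads: int) -> float:
    """Assumed test: two-sided exact binomial p-value against a 50/50 split."""
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0
    k = min(ref_reads, alt_reads)
    # P(X <= k) for X ~ Binomial(n, 0.5); by symmetry the two-sided
    # p-value is twice this, capped at 1.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


def passes(expression: str, gt: str, dp: int, gq: int, ad: tuple[int, int]) -> bool:
    """Evaluate a filter expression for one genotype; ad is (ref reads, alt reads)."""
    fields = {
        "GT": gt, "DP": dp, "GQ": gq,
        "AB": allelic_balance(*ad), "pAB": p_allelic_balance(*ad),
    }
    # Empty builtins: the expression sees only the genotype fields.
    return bool(eval(expression, {"__builtins__": {}}, fields))


print(passes(FILTER, "0/1", dp=20, gq=99, ad=(10, 10)))  # True
print(passes(FILTER, "0/1", dp=20, gq=99, ad=(18, 2)))   # False: pAB is about 0.0004
print(passes(FILTER, "1/1", dp=20, gq=99, ad=(0, 20)))   # False: no branch allows 1/1
```

Using `eval` with empty builtins is tolerable here only because the expression comes from your own command line; it is not a sandbox for untrusted input.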

### Output options

`--export_carriers /out/carriers.tsv`

Exports carrier IDs with their genotype information to the specified file.

**NOTE: output options must be placed before the command. The output path must start with `/out/`; after successful execution, your output will be in `/home/ivm`.**
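The `/out/` rule can be checked with a small helper. This is a hypothetical sketch that assumes `/out/` is mapped directly onto `/home/ivm`, so that `/out/carriers.tsv` ends up as `/home/ivm/carriers.tsv`; the documentation states only that the output lands in `/home/ivm`.

```python
from pathlib import PurePosixPath

OUT_DIR = PurePosixPath("/out")
# Assumption: /out/ inside the tool corresponds to /home/ivm in your sandbox.
HOME_DIR = PurePosixPath("/home/ivm")


def expected_location(out_path: str) -> PurePosixPath:
    """Reject paths that are not files under /out/ and return where the file should appear."""
    path = PurePosixPath(out_path)
    if path.parts[:2] != ("/", "out") or len(path.parts) < 3 or ".." in path.parts:
        raise ValueError(f"output path must be a file under /out/, got {out_path!r}")
    return HOME_DIR / path.relative_to(OUT_DIR)


print(expected_location("/out/carriers.tsv"))  # /home/ivm/carriers.tsv
```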

`--export_case_control /out/case_control.tsv`

Exports a cohort of carriers and non-carriers of the non-reference alleles of the queried variants. The `COHORT` column contains either `CARRIERS` or `NON_CARRIERS`, for easy import into other Sandbox tools (e.g. Cohort Operations).
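Before importing the file into another tool, it can be useful to sanity-check the split. The sketch relies only on the documented `COHORT` column and its two values; the tab delimiter is assumed from the `.tsv` extension, and the `SAMPLE_ID` column and IDs in the demo input are invented for illustration.

```python
import csv
import io
from typing import TextIO

COHORT_VALUES = ("CARRIERS", "NON_CARRIERS")


def split_by_cohort(handle: TextIO) -> dict[str, list[dict[str, str]]]:
    """Group the rows of a tab-separated case/control file by their COHORT value."""
    reader = csv.DictReader(handle, delimiter="\t")
    if not reader.fieldnames or "COHORT" not in reader.fieldnames:
        raise ValueError("input has no COHORT column")
    groups: dict[str, list[dict[str, str]]] = {value: [] for value in COHORT_VALUES}
    for row in reader:
        cohort = row["COHORT"]
        if cohort not in groups:
            raise ValueError(f"unexpected COHORT value: {cohort!r}")
        groups[cohort].append(row)
    return groups


# Demo input with an invented SAMPLE_ID column. A real export written to
# /out/case_control.tsv would presumably be opened as
# open("/home/ivm/case_control.tsv", newline="") instead.
demo = io.StringIO("SAMPLE_ID\tCOHORT\nS1\tCARRIERS\nS2\tNON_CARRIERS\nS3\tNON_CARRIERS\n")
groups = split_by_cohort(demo)
print(len(groups["CARRIERS"]), len(groups["NON_CARRIERS"]))  # 1 2
```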

## Phenotypic consequences of variant carriers

The query tool currently does not perform association analyses. Instead, import the generated carrier files into Sandbox tools such as [Cohort Operations](https://docs.finngen.fi/working-in-the-sandbox/which-tools-are-available/cohort-operations-tool-co) or [LifeTrack](https://docs.finngen.fi/working-in-the-sandbox/which-tools-are-available/lifetrack).

## Creating a Bash Alias

You can also create an alias for the tool in `/home/ivm/.bashrc`. The snippet below defines it as a shell function that passes all its arguments on to `run_query`.

Open the file in a text editor and add the following:

```bash
exome () {
    /finngen/shared_nfs/finngen/exome_query/run_query "$@"
}
```

Run `source /home/ivm/.bashrc` to activate the alias. You can then invoke the tool with just `exome`.

