# FinnGen exome query tool

### Table of Contents

* [Introduction](#introduction-1)
* [Query types](#query-types)
* [Common options](#common-options)
* [Phenotypic consequences of variant carriers](#phenotypic-consequences-of-variant-carriers)
* [Creating a Bash Alias](#creating-a-bash-alias)

## Introduction

The FinnGen exome query tool is a command-line interface for querying carriers of variants in FinnGen exome data (link).

Two types of queries are available:

* search for carriers of a single variant
* search for coding variants in a gene

To run a query, open a terminal and execute: `/finngen/shared_nfs/finngen/exome_query/run_query`

See the command-line instructions with `exome -h`, and query-type-specific help for a single variant query (`exome var -h`) or a gene query (`exome gene -h`). The `exome` shorthand is the alias described in [Creating a Bash Alias](#creating-a-bash-alias).

## Query types

### Search for a single variant

The `var` command allows you to query a single variant. Note that the variant (`variant_id`) needs to be in the format `chr:pos:ref:alt` (e.g. `2:1503826:G:T`) and in the GRCh38 reference build. To get help on this query type and all its query modes, type `exome var -h`.

Basic command: `var variant_id`

**Example query for a simple variant search:**

`/finngen/shared_nfs/finngen/exome_query/run_query var 'variant_id'`

The tool will print a summary of the variant carriers and their genotype qualities to the screen.
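A malformed identifier is an easy mistake, so it can help to check the `chr:pos:ref:alt` shape before running a query. The sketch below is only an illustration: `parse_variant_id` is a hypothetical helper and not part of the tool, and the accepted chromosome names (1 to 22, X, Y, MT, without a `chr` prefix) and the allele alphabet (A, C, G, T) are assumptions based on the example identifier above.

```python
import re
from typing import NamedTuple

# Assumed shape: chromosome without "chr" prefix, positive position, ref and alt alleles.
VARIANT_RE = re.compile(
    r"^(?P<chrom>[1-9]|1[0-9]|2[0-2]|X|Y|MT)"
    r":(?P<pos>[1-9][0-9]*)"
    r":(?P<ref>[ACGT]+)"
    r":(?P<alt>[ACGT]+)$"
)


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_variant_id(variant_id: str) -> Variant:
    """Split a chr:pos:ref:alt identifier, raising ValueError if it is malformed."""
    match = VARIANT_RE.match(variant_id)
    if match is None:
        raise ValueError(f"not in chr:pos:ref:alt format: {variant_id!r}")
    return Variant(match["chrom"], int(match["pos"]), match["ref"], match["alt"])


print(parse_variant_id("2:1503826:G:T"))
```

The check only covers the shape of the identifier. It cannot tell whether the coordinates are in GRCh38 or whether the variant exists in the exome data.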

### Search for Variants in a Gene

The `gene` command allows you to query variants within a specific gene. To get help on this query type and all its query modes, type `exome gene -h`.

Basic command: `gene gene_name`

`gene_name` is the name of the gene in HGNC format, for example **BRCA1**.

The tool will output a list of variants along with their details within the specified gene.

**Example query for a simple gene search:**

`/finngen/shared_nfs/finngen/exome_query/run_query gene 'gene_name'`

You can request variants with specific functional consequences by giving a comma-separated list of consequences, e.g. `--consequences missense_variant,stop_gained`. Shorthands select all coding variants (`-coding_variants` or `-coding`) or protein-truncating variants (`--PTV` or `-P`). The coding and PTV consequences used are:

```python
PTV_CONSEQUENCES = [
    "stop_gained",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
    "start_lost",
    "stop_lost",
]

CODING_CONSEQUENCES = [
    "missense_variant",
    "stop_gained",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
    "start_lost",
    "stop_lost",
    "inframe_insertion",
    "inframe_deletion",
]
```
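Every PTV consequence in the lists above is also a coding consequence. As a small illustration of the comma-separated format that `--consequences` expects, the sketch below builds the value for the coding consequences that are not protein truncating. The names `non_ptv_coding` and `consequences_arg` are illustrative only.

```python
PTV_CONSEQUENCES = [
    "stop_gained",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
    "start_lost",
    "stop_lost",
]

CODING_CONSEQUENCES = [
    "missense_variant",
    "stop_gained",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
    "start_lost",
    "stop_lost",
    "inframe_insertion",
    "inframe_deletion",
]

# Coding consequences that are not protein truncating, in their original order.
non_ptv_coding = [c for c in CODING_CONSEQUENCES if c not in PTV_CONSEQUENCES]

# Join without spaces, matching the comma-separated format of --consequences.
consequences_arg = ",".join(non_ptv_coding)
print(consequences_arg)  # missense_variant,inframe_insertion,inframe_deletion
```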

## Common options

The general syntax for commands and options is: `/finngen/shared_nfs/finngen/exome_query/run_query COMMAND OPTION`. Replace `COMMAND` and `OPTION` with the appropriate arguments. **NOTE, however, that the output options need to be added before the command.**
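The ordering rule is easy to get wrong when commands are assembled in a script. The following sketch shows one way to build the argument list so that output options (described under Output options) always precede the command. `build_query_args` is a hypothetical helper, not part of the tool, and the check that output paths start with `/out/` mirrors the note in the Output options section.

```python
import shlex
from typing import Dict, List, Optional

RUN_QUERY = "/finngen/shared_nfs/finngen/exome_query/run_query"


def build_query_args(
    command: str,
    target: str,
    output_options: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Return an argument list with output options placed before the command."""
    if command not in ("var", "gene"):
        raise ValueError(f"unknown command: {command!r}")
    args = [RUN_QUERY]
    for flag, path in (output_options or {}).items():
        # The documentation states that output paths must start with /out/.
        if not path.startswith("/out/"):
            raise ValueError(f"output path must start with /out/: {path!r}")
        args += [flag, path]
    # The command and its argument come after all output options.
    args += [command, target]
    return args


args = build_query_args(
    "var", "2:1503826:G:T", {"--export_carriers": "/out/carriers.tsv"}
)
print(shlex.join(args))
```

The resulting list could be passed to `subprocess.run(args, check=True)` inside the Sandbox. That step is left out here because the tool is only available there.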

### Genotype filtering

Genotypes can be filtered with Python-compatible syntax:

* `--gt_filt 'python statement resolving to boolean'`: genotypes that do not pass the filter are set to missing.
* The statement can use all fields that are declared in the VCF header (GT, DP, GQ) or computed on the fly (AB, pAB).
* Example: require that every genotype has more than 10 reads supporting it. If the genotype is heterozygous, also require GQ>20 and a p-value above 0.05 for allelic balance deviating from 50/50. If the genotype is homozygous reference, require genotype quality above 30.\
  `--gt_filt 'DP>10 and ( (GT=="0/1" and GQ>20 and pAB>0.05) or (GT=="0/0" and GQ>30) )'`
* AB and pAB are fields computed on the fly: allelic balance (AB) and the probability of deviating from a 50/50 balance (pAB).
* For information on sequencing genotype quality metrics, see e.g. <https://gatk.broadinstitute.org/hc/en-us/articles/360035531692-VCF-Variant-Call-Format> (section "Interpreting genotype and other sample-level information").
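To see how such a filter statement resolves, the sketch below evaluates the example expression against a few made-up genotype records. It illustrates the boolean logic only: how the tool itself evaluates the statement is not documented here, `passes_filter` is a hypothetical helper, and all field values are invented.

```python
# The example filter from above, written with plain double quotes.
FILTER = 'DP>10 and ((GT=="0/1" and GQ>20 and pAB>0.05) or (GT=="0/0" and GQ>30))'


def passes_filter(genotype: dict, expression: str = FILTER) -> bool:
    """Evaluate a filter expression with the genotype fields as variables."""
    # Empty builtins restrict the expression to the supplied fields.
    return bool(eval(expression, {"__builtins__": {}}, dict(genotype)))


# Invented genotype records for illustration only.
genotypes = [
    {"GT": "0/1", "DP": 25, "GQ": 40, "AB": 0.48, "pAB": 0.60},  # good heterozygote
    {"GT": "0/1", "DP": 8, "GQ": 40, "AB": 0.50, "pAB": 0.90},   # too few reads
    {"GT": "0/0", "DP": 15, "GQ": 25, "AB": 0.0, "pAB": 1.0},    # GQ too low for 0/0
    {"GT": "1/1", "DP": 30, "GQ": 99, "AB": 1.0, "pAB": 1.0},    # no branch for 1/1
]

for gt in genotypes:
    status = "kept" if passes_filter(gt) else "set to missing"
    print(gt["GT"], gt["DP"], gt["GQ"], status)
```

As written, the example expression has branches only for `0/1` and `0/0`. In this sketch any other genotype, such as `1/1`, evaluates to false, so an extra `or` clause would be needed to keep such genotypes.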

### Output options

`--export_carriers /out/carriers.tsv`

Exports carrier IDs with genotype information to a specified file.

**NOTE: the output options need to be added before the command, and the path must start with `/out/`. After successful execution, your output will be in `/home/ivm`.**

`--export_case_control /out/case_control.tsv`

Exports a cohort of carriers and non-carriers of non-reference alleles of the variants. The `COHORT` column contains either `CARRIERS` or `NON_CARRIERS` for easy import into other Sandbox tools (e.g. Cohort Operations).
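Before importing the case-control file elsewhere, a quick count per cohort can serve as a sanity check. The sketch below assumes the file is tab-separated with a header row. Only the `COHORT` column and its two values come from the description above; the `ID` column name and the sample rows are invented for illustration.

```python
import csv
import io
from collections import Counter

# Invented sample content; a real export would be read from /home/ivm instead.
SAMPLE_TSV = (
    "ID\tCOHORT\n"
    "sample_1\tCARRIERS\n"
    "sample_2\tNON_CARRIERS\n"
    "sample_3\tNON_CARRIERS\n"
)


def count_cohorts(handle) -> Counter:
    """Count rows per value of the COHORT column in a tab-separated file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row["COHORT"] for row in reader)


counts = count_cohorts(io.StringIO(SAMPLE_TSV))
print(counts["CARRIERS"], counts["NON_CARRIERS"])  # 1 2
```

For a real export, the `io.StringIO` object would be replaced with an opened file, for example `open(path, newline="")`.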

## Phenotypic consequences of variant carriers

Currently, the query tool does not include association analyses. Instead, you are encouraged to import the generated carrier files into Sandbox tools like [Cohort Operations](/working-in-the-sandbox/which-tools-are-available/cohort-operations-tool-co.md) or [LifeTrack](/working-in-the-sandbox/which-tools-are-available/lifetrack.md).

## Creating a Bash Alias

You can also create an alias for the tool in your `/home/ivm/.bashrc`.

Open the file with a text editor and add the following shell function:

```
exome () {
    /finngen/shared_nfs/finngen/exome_query/run_query "$@"
}
```

Type `source /home/ivm/.bashrc` to activate the alias. You can then invoke the tool with just the `exome` command.


