# Variant-wise QC metrics file

The variant-wise QC metrics files are available to Sandbox users.

**Sandbox directory**

The variant-wise metrics files (and their index files) are available in the following Sandbox directory:

```
gs://finngen-production-library-green/imputation_panel/v4.2/variant_qc/sisu4.2_panel_var_wise_QC_metrics/
```

To generate the SISu v4.2 reference panel, sample-, genotype- and variant-wise quality control (QC) filtering procedures were applied iteratively to the high-coverage WGS (hcWGS) data. An allele count (AC) > 2 cutoff was then applied symmetrically. Variant-wise metrics were exported after each major QC step to monitor data quality.

We provide variant-wise QC metrics TSV files for the SISu v4.2 reference panel after the major QC steps (raw data; after sample-wise QC; after sample-, genotype- and variant-wise QC), separately for the autosomal chromosomes and chromosome X.

**Variant-wise QC metrics file**

| QC Steps                                   | Autosomal chromosome                                                                                      | Chromosome X                                                                                                                                                                                                |
| ------------------------------------------ | --------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Raw VCF data from WashU (w/ 10500 samples) | sisu4.2\_panel\_autosomal\_raw\_variant\_wise\_qc\_metrics.tsv.gz / .tbi                                  | sisu4.2\_panel\_chrX\_raw\_variant\_wise\_qc\_metrics.tsv.gz                                                                                                                                                |
| After sample-wise QC (w/ 8554 samples)     | sisu4.2\_panel\_autosomal\_after\_sample\_qc\_variant\_wise\_qc\_metrics.tsv.gz / .tbi                    | sisu4.2\_panel\_chrX\_after\_sample\_qc\_variant\_wise\_qc\_metrics.tsv.gz                                                                                                                                  |
| After genotype- and variant-wise QC        | sisu4.2\_panel\_autosomal\_after\_sample\_genotype\_variant\_qc\_variant\_wise\_qc\_metrics.tsv.gz / .tbi | sisu4.2\_panel\_chrX\_XPAR\_after\_sample\_genotype\_variant\_qc\_variant\_wise\_qc\_metrics.tsv.gz; sisu4.2\_panel\_chrX\_nonXPAR\_after\_sample\_genotype\_variant\_qc\_variant\_wise\_qc\_metrics.tsv.gz |
| After AC>2 filtering, symmetrically        |                                                                                                           |                                                                                                                                                                                                             |
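
As a convenience, the directory and file names above can be combined into full paths. The following is a minimal sketch, not an official helper: the step keys (`raw`, `after_sample_qc`, ...) and the function name `metrics_uris` are hypothetical, and it assumes that the files sit directly in the Sandbox directory listed above and that each index is named by appending `.tbi` to the data file name.

```python
# Sandbox directory as given on this page.
BASE_DIR = (
    "gs://finngen-production-library-green/imputation_panel/v4.2/"
    "variant_qc/sisu4.2_panel_var_wise_QC_metrics/"
)

# Autosomal file names from the table above; the dictionary keys are
# hypothetical labels chosen for this example.
AUTOSOMAL_FILES = {
    "raw": "sisu4.2_panel_autosomal_raw_variant_wise_qc_metrics.tsv.gz",
    "after_sample_qc": (
        "sisu4.2_panel_autosomal_after_sample_qc_variant_wise_qc_metrics.tsv.gz"
    ),
    "after_sample_genotype_variant_qc": (
        "sisu4.2_panel_autosomal_after_sample_genotype_variant_qc"
        "_variant_wise_qc_metrics.tsv.gz"
    ),
}


def metrics_uris(step):
    """Return (data_uri, index_uri) for one autosomal QC step.

    Assumption: the index file name is the data file name plus ".tbi".
    """
    data_uri = BASE_DIR + AUTOSOMAL_FILES[step]
    return data_uri, data_uri + ".tbi"


data_uri, index_uri = metrics_uris("after_sample_qc")
print(data_uri)
print(index_uri)
```

Listing the directory in the Sandbox before relying on these paths is advisable, since the exact layout is an assumption here.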

**General description**

The autosomal variant-wise metrics file contains 24 columns:

| Column Number | Name                  | Type            | Description                                                                                |
| ------------- | --------------------- | --------------- | ------------------------------------------------------------------------------------------ |
| 1             | #chr                  | int             | Chromosome number (1-22) where the variant is located                                      |
| 2             | pos                   | int             | Genomic position of the variant on the specified chromosome                                |
| 3             | Variant               | string          | Variant identifier in the format "chromosome:position:reference\_allele:alternate\_allele" |
| 4             | callRate              | double          | Fraction of samples with called genotypes                                                  |
| 5             | AC                    | int             | Count of alternate alleles                                                                 |
| 6             | AF                    | double          | Calculated alternate allele frequency (q)                                                  |
| 7             | nCalled               | int             | Sum of nHomRef, nHet, and nHomVar                                                          |
| 8             | nNotCalled            | int             | Number of uncalled samples                                                                 |
| 9             | nHomRef               | int             | Number of homozygous reference samples                                                     |
| 10            | nHet                  | int             | Number of heterozygous samples                                                             |
| 11            | nHomVar               | int             | Number of homozygous alternate samples                                                     |
| 12            | dpMean                | double          | Depth mean across all samples                                                              |
| 13            | dpStDev               | double          | Depth standard deviation across all samples                                                |
| 14            | gqMean                | double          | The average genotype quality across all samples                                            |
| 15            | gqStDev               | double          | Genotype quality standard deviation across all samples                                     |
| 16            | nNonRef               | int             | Sum of nHet and nHomVar                                                                    |
| 17            | rHeterozygosity       | double          | Proportion of heterozygotes                                                                |
| 18            | rHetHomVar            | double          | Ratio of heterozygotes to homozygous alternates                                            |
| 19            | rExpectedHetFrequency | double          | Expected rHeterozygosity based on HWE                                                      |
| 20            | pHWE                  | double          | p-value from Hardy Weinberg Equilibrium null model                                         |
| 21            | FILTERS               | list of strings | FILTER entry in the VCF, \[] means PASS                                                    |
| 22            | QD                    | double          | Quality by Depth (QD) of INFO field in the VCF                                             |
| 23            | IS\_INDEL             | boolean         | Insertion-deletion variant                                                                 |
| 24            | IS\_SNP               | boolean         | Single nucleotide variant                                                                  |
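
Several of the count columns are defined as sums of other columns (`nCalled` and `nNonRef`), which makes a quick consistency check possible. The sketch below illustrates this with Python's standard library. The two sample rows are invented for illustration and include only a subset of the 24 columns, and the function name `check_counts` is hypothetical.

```python
import csv
import io

# Invented rows for illustration only; real files carry all 24 columns.
SAMPLE_TSV = (
    "#chr\tpos\tVariant\tnCalled\tnNotCalled\tnHomRef\tnHet\tnHomVar\tnNonRef\n"
    "1\t1000\t1:1000:A:G\t100\t2\t90\t8\t2\t10\n"
    "1\t2000\t1:2000:C:T\t100\t0\t95\t4\t2\t6\n"
)


def check_counts(handle):
    """Return the Variant IDs whose counts violate the documented sums."""
    inconsistent = []
    for row in csv.DictReader(handle, delimiter="\t"):
        n_hom_ref = int(row["nHomRef"])
        n_het = int(row["nHet"])
        n_hom_var = int(row["nHomVar"])
        # nCalled is documented as nHomRef + nHet + nHomVar.
        called_ok = int(row["nCalled"]) == n_hom_ref + n_het + n_hom_var
        # nNonRef is documented as nHet + nHomVar.
        non_ref_ok = int(row["nNonRef"]) == n_het + n_hom_var
        if not (called_ok and non_ref_ok):
            inconsistent.append(row["Variant"])
    return inconsistent


inconsistent = check_counts(io.StringIO(SAMPLE_TSV))
print(inconsistent)
```

For a real file, the same function could be given a handle from `gzip.open(path, "rt")` on a local copy of the `.tsv.gz` file.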

The chromosome X variant-wise metrics file contains 25 columns:

| Column Number | Name                              | Type            | Description                                                                                                                             |
| ------------- | --------------------------------- | --------------- | --------------------------------------------------------------------------------------------------------------------------------------- |
| 1             | locus                             | tlocus          | Hail type for a genomic coordinate with a contig and a position, e.g., chrX:10009                                                       |
| 2             | alleles                           | tarray of tstr  | Hail type for variable-length arrays of text strings, e.g., \["A","G"]                                                                  |
| 3             | filters                           | list of strings | FILTER entry in the VCF, \[] means PASS                                                                                                 |
| 4             | variant\_qc.dp\_stats.mean        | float64         | Mean depth of coverage (DP) across samples.                                                                                             |
| 5             | variant\_qc.dp\_stats.stdev       | float64         | Standard deviation of depth of coverage (DP) across samples.                                                                            |
| 6             | variant\_qc.dp\_stats.min         | int32           | Minimum depth of coverage (DP) across samples.                                                                                          |
| 7             | variant\_qc.dp\_stats.max         | int32           | Maximum depth of coverage (DP) across samples.                                                                                          |
| 8             | variant\_qc.gq\_stats.mean        | float64         | Mean genotype quality (GQ) across samples.                                                                                              |
| 9             | variant\_qc.gq\_stats.stdev       | float64         | Standard deviation of genotype quality (GQ) across samples.                                                                             |
| 10            | variant\_qc.gq\_stats.min         | int32           | Minimum genotype quality (GQ) across samples.                                                                                           |
| 11            | variant\_qc.gq\_stats.max         | int32           | Maximum genotype quality (GQ) across samples.                                                                                           |
| 12            | variant\_qc.AC                    | array\<int32>   | Calculated allele count, one element per allele, including the reference. Sums to AN.                                                   |
| 13            | variant\_qc.AF                    | array\<float64> | Calculated allele frequency, one element per allele, including the reference. Sums to one. Equivalent to AC / AN.                       |
| 14            | variant\_qc.AN                    | int32           | Total number of called alleles.                                                                                                         |
| 15            | variant\_qc.homozygote\_count     | array\<int32>   | Number of homozygotes per allele. One element per allele, including the reference.                                                      |
| 16            | variant\_qc.call\_rate            | float64         | Fraction of calls neither missing nor filtered. Equivalent to n\_called / count\_cols().                                                |
| 17            | variant\_qc.n\_called             | int64           | Number of samples with a defined GT.                                                                                                    |
| 18            | variant\_qc.n\_not\_called        | int64           | Number of samples with a missing GT.                                                                                                    |
| 19            | variant\_qc.n\_filtered           | int64           | Number of filtered entries.                                                                                                             |
| 20            | variant\_qc.n\_het                | int64           | Number of heterozygous samples.                                                                                                         |
| 21            | variant\_qc.n\_non\_ref           | int64           | Number of samples with at least one called non-reference allele.                                                                        |
| 22            | variant\_qc.het\_freq\_hwe        | float64         | Expected frequency of heterozygous samples under Hardy-Weinberg equilibrium. See functions.hardy\_weinberg\_test() for details.         |
| 23            | variant\_qc.p\_value\_hwe         | float64         | p-value from two-sided test of Hardy-Weinberg equilibrium. See functions.hardy\_weinberg\_test() for details.                           |
| 24            | variant\_qc.p\_value\_excess\_het | float64         | p-value from one-sided test of Hardy-Weinberg equilibrium for excess heterozygosity. See functions.hardy\_weinberg\_test() for details. |
| 25            | info.QD                           | double          | Quality by Depth (QD) of INFO field in the VCF.                                                                                         |
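
Because the chromosome X file keeps Hail's structured types, the `locus`, `alleles`, and per-allele array columns need a little parsing before use. The sketch below shows one way to do this. It assumes that array columns are serialized as JSON-style text (as in the `["A","G"]` example above) and that the variant is biallelic. The row values and the function names `parse_locus` and `alt_allele_summary` are invented for illustration.

```python
import json


def parse_locus(locus):
    """Split a locus string such as 'chrX:10009' into (contig, position)."""
    contig, pos = locus.rsplit(":", 1)
    return contig, int(pos)


def alt_allele_summary(row):
    """Summarise the alternate allele of one biallelic chromosome X row.

    Assumption: array columns are JSON-style text, reference allele first.
    """
    contig, pos = parse_locus(row["locus"])
    ref, alt = json.loads(row["alleles"])
    allele_counts = json.loads(row["variant_qc.AC"])
    an = int(row["variant_qc.AN"])
    alt_ac = allele_counts[1]  # element 0 is the reference allele
    return {
        "variant": f"{contig}:{pos}:{ref}:{alt}",
        "alt_ac": alt_ac,
        # AF is documented as equivalent to AC / AN.
        "alt_af": alt_ac / an,
    }


# Invented example row containing only the columns used here.
example_row = {
    "locus": "chrX:10009",
    "alleles": '["A","G"]',
    "variant_qc.AC": "[190,10]",
    "variant_qc.AN": "200",
}
summary = alt_allele_summary(example_row)
print(summary)
```

Reading the alternate allele count from element 1 of `variant_qc.AC` makes the chromosome X values comparable to the single `AC` column in the autosomal file.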

