# Variant-wise QC metrics file

The variant-wise QC metrics files are available to Sandbox users.

**Sandbox directory**

The variant-wise metrics files (and their index files) are available in the following Sandbox directory:

```
gs://finngen-production-library-green/imputation_panel/v4.2/variant_qc/sisu4.2_panel_var_wise_QC_metrics/
```

To generate the SISu v4.2 reference panel, sample-, genotype- and variant-wise quality control (QC) filtering procedures were applied iteratively to the high-coverage WGS (hcWGS) data. Then an allele count (AC) > 2 cutoff was applied symmetrically. Variant-wise metrics were exported after each major QC step to monitor data quality.

We provide variant-wise QC metrics TSV files for the SISu v4.2 reference panel after the major QC steps: the raw data, the data after sample-wise QC, and the data after sample-, genotype- and variant-wise QC. Files are provided separately for the autosomal chromosomes and chromosome X.

**Variant-wise QC metrics file**

| QC Steps                                   | Autosomal chromosome                                                                                      | Chromosome X                                                                                                                                                                                                |
| ------------------------------------------ | --------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Raw VCF data from WashU (w/ 10500 samples) | sisu4.2\_panel\_autosomal\_raw\_variant\_wise\_qc\_metrics.tsv.gz / .tbi                                  | sisu4.2\_panel\_chrX\_raw\_variant\_wise\_qc\_metrics.tsv.gz                                                                                                                                                |
| After sample-wise QC (w/ 8554 samples)     | sisu4.2\_panel\_autosomal\_after\_sample\_qc\_variant\_wise\_qc\_metrics.tsv.gz / .tbi                    | sisu4.2\_panel\_chrX\_after\_sample\_qc\_variant\_wise\_qc\_metrics.tsv.gz                                                                                                                                  |
| After genotype- and variant-wise QC        | sisu4.2\_panel\_autosomal\_after\_sample\_genotype\_variant\_qc\_variant\_wise\_qc\_metrics.tsv.gz / .tbi | sisu4.2\_panel\_chrX\_XPAR\_after\_sample\_genotype\_variant\_qc\_variant\_wise\_qc\_metrics.tsv.gz; sisu4.2\_panel\_chrX\_nonXPAR\_after\_sample\_genotype\_variant\_qc\_variant\_wise\_qc\_metrics.tsv.gz |
| After AC>2 filtering, symmetrically        |                                                                                                           |                                                                                                                                                                                                             |
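
The file names in the table are shown with Markdown-escaped underscores; the real names use plain underscores. As a minimal sketch, the full path of a file can be built by joining the Sandbox directory and the file name. The helper name `metrics_uri` is hypothetical, and the sketch assumes that the files sit directly in the directory above and that each index follows the usual tabix convention of appending `.tbi` to the data file name.

```python
# Sandbox directory quoted earlier on this page.
BASE_DIR = (
    "gs://finngen-production-library-green/imputation_panel/v4.2/"
    "variant_qc/sisu4.2_panel_var_wise_QC_metrics/"
)


def metrics_uri(file_name: str, index: bool = False) -> str:
    """Return the full path of a metrics file (hypothetical helper).

    Assumption: the index is named <file_name>.tbi (tabix convention).
    """
    uri = BASE_DIR + file_name
    return uri + ".tbi" if index else uri


# File name taken from the table, with the escape backslashes removed.
autosomal_raw = "sisu4.2_panel_autosomal_raw_variant_wise_qc_metrics.tsv.gz"

print(metrics_uri(autosomal_raw))
print(metrics_uri(autosomal_raw, index=True))
```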

**General description**

The autosomal variant-wise metrics files contain 24 columns:

| Column Number | Name                  | Type            | Description                                                                                |
| ------------- | --------------------- | --------------- | ------------------------------------------------------------------------------------------ |
| 1             | #chr                  | int             | Chromosome number (1-22) where the variant is located                                      |
| 2             | pos                   | int             | Genomic position of the variant on the specified chromosome                                |
| 3             | Variant               | string          | Variant identifier in the format "chromosome:position:reference\_allele:alternate\_allele" |
| 4             | callRate              | double          | Fraction of samples with called genotypes                                                  |
| 5             | AC                    | int             | Count of alternate alleles                                                                 |
| 6             | AF                    | double          | Calculated alternate allele frequency (q)                                                  |
| 7             | nCalled               | int             | Sum of nHomRef, nHet, and nHomVar                                                          |
| 8             | nNotCalled            | int             | Number of uncalled samples                                                                 |
| 9             | nHomRef               | int             | Number of homozygous reference samples                                                     |
| 10            | nHet                  | int             | Number of heterozygous samples                                                             |
| 11            | nHomVar               | int             | Number of homozygous alternate samples                                                     |
| 12            | dpMean                | double          | Depth mean across all samples                                                              |
| 13            | dpStDev               | double          | Depth standard deviation across all samples                                                |
| 14            | gqMean                | double          | The average genotype quality across all samples                                            |
| 15            | gqStDev               | double          | Genotype quality standard deviation across all samples                                     |
| 16            | nNonRef               | int             | Sum of nHet and nHomVar                                                                    |
| 17            | rHeterozygosity       | double          | Proportion of heterozygotes                                                                |
| 18            | rHetHomVar            | double          | Ratio of heterozygotes to homozygous alternates                                            |
| 19            | rExpectedHetFrequency | double          | Expected rHeterozygosity based on HWE                                                      |
| 20            | pHWE                  | double          | p-value from Hardy Weinberg Equilibrium null model                                         |
| 21            | FILTERS               | list of strings | FILTER entry in the VCF, \[] means PASS                                                    |
| 22            | QD                    | double          | Quality by Depth (QD) of INFO field in the VCF                                             |
| 23            | IS\_INDEL             | boolean         | Insertion-deletion variant                                                                 |
| 24            | IS\_SNP               | boolean         | Single nucleotide variant                                                                  |
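
Several of the autosomal columns are related by simple arithmetic, which gives a quick sanity check after loading a file. The sketch below reads a gzip-compressed TSV with the Python standard library and verifies those relations for one row. It assumes biallelic, diploid variants, so that AC = nHet + 2 × nHomVar and AF = AC / (2 × nCalled). The function names are hypothetical, and the sample row is synthetic (not FinnGen data) and uses only a subset of the 24 columns.

```python
import csv
import gzip
import io
import math

# Synthetic file with a subset of the documented column names.
# The values are made up for illustration and are not FinnGen data.
SAMPLE_TSV = (
    "#chr\tpos\tVariant\tcallRate\tAC\tAF\tnCalled\tnNotCalled\tnHomRef\tnHet\tnHomVar\n"
    "1\t12345\t1:12345:A:G\t0.99\t10\t0.0505050505\t99\t1\t90\t8\t1\n"
)


def read_metrics(binary_handle):
    """Yield one dict per variant from a gzip-compressed TSV stream."""
    with gzip.open(binary_handle, mode="rt", newline="") as text:
        yield from csv.DictReader(text, delimiter="\t")


def counts_are_consistent(row, tol=1e-6):
    """Check the documented relations between the count columns.

    Assumption: biallelic, diploid variant.
    """
    n_hom_ref = int(row["nHomRef"])
    n_het = int(row["nHet"])
    n_hom_var = int(row["nHomVar"])
    n_called = int(row["nCalled"])
    n_total = n_called + int(row["nNotCalled"])
    if n_called == 0:
        return False  # AF is undefined when no genotype was called
    ac = int(row["AC"])
    return (
        n_called == n_hom_ref + n_het + n_hom_var
        and ac == n_het + 2 * n_hom_var
        and math.isclose(float(row["AF"]), ac / (2 * n_called), abs_tol=tol)
        and math.isclose(float(row["callRate"]), n_called / n_total, abs_tol=tol)
    )


# Compress the synthetic text in memory so the reader sees a .gz stream.
compressed = io.BytesIO(gzip.compress(SAMPLE_TSV.encode("utf-8")))
rows = list(read_metrics(compressed))
print(rows[0]["Variant"], counts_are_consistent(rows[0]))
```

To read a real file, pass a binary handle opened on a local copy of the `.tsv.gz` file instead of the in-memory buffer.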

The chromosome X variant-wise metrics files contain 25 columns:

| Column Number | Name                              | Type            | Description                                                                                                                             |
| ------------- | --------------------------------- | --------------- | --------------------------------------------------------------------------------------------------------------------------------------- |
| 1             | locus                             | tlocus          | Hail type for a genomic coordinate with a contig and a position, e.g., chrX:10009                                                       |
| 2             | alleles                           | tarray of tstr  | Hail type for variable-length arrays of text strings, e.g., \["A","G"]                                                                  |
| 3             | filters                           | list of strings | FILTER entry in the VCF, \[] means PASS                                                                                                 |
| 4             | variant\_qc.dp\_stats.mean        | float64         | Mean depth of coverage (DP) across samples.                                                                                             |
| 5             | variant\_qc.dp\_stats.stdev       | float64         | Standard deviation of depth of coverage (DP) across samples.                                                                            |
| 6             | variant\_qc.dp\_stats.min         | int32           | Minimum depth of coverage (DP) across samples.                                                                                          |
| 7             | variant\_qc.dp\_stats.max         | int32           | Maximum depth of coverage (DP) across samples.                                                                                          |
| 8             | variant\_qc.gq\_stats.mean        | float64         | Mean genotype quality (GQ) across samples.                                                                                              |
| 9             | variant\_qc.gq\_stats.stdev       | float64         | Standard deviation of genotype quality (GQ) across samples.                                                                             |
| 10            | variant\_qc.gq\_stats.min         | int32           | Minimum genotype quality (GQ) across samples.                                                                                           |
| 11            | variant\_qc.gq\_stats.max         | int32           | Maximum genotype quality (GQ) across samples.                                                                                           |
| 12            | variant\_qc.AC                    | array\<int32>   | Calculated allele count, one element per allele, including the reference. Sums to AN.                                                   |
| 13            | variant\_qc.AF                    | array\<float64> | Calculated allele frequency, one element per allele, including the reference. Sums to one. Equivalent to AC / AN.                       |
| 14            | variant\_qc.AN                    | int32           | Total number of called alleles.                                                                                                         |
| 15            | variant\_qc.homozygote\_count     | array\<int32>   | Number of homozygotes per allele. One element per allele, including the reference.                                                      |
| 16            | variant\_qc.call\_rate            | float64         | Fraction of calls neither missing nor filtered. Equivalent to n\_called / count\_cols().                                                |
| 17            | variant\_qc.n\_called             | int64           | Number of samples with a defined GT.                                                                                                    |
| 18            | variant\_qc.n\_not\_called        | int64           | Number of samples with a missing GT.                                                                                                    |
| 19            | variant\_qc.n\_filtered           | int64           | Number of filtered entries.                                                                                                             |
| 20            | variant\_qc.n\_het                | int64           | Number of heterozygous samples.                                                                                                         |
| 21            | variant\_qc.n\_non\_ref           | int64           | Number of samples with at least one called non-reference allele.                                                                        |
| 22            | variant\_qc.het\_freq\_hwe        | float64         | Expected frequency of heterozygous samples under Hardy-Weinberg equilibrium. See functions.hardy\_weinberg\_test() for details.         |
| 23            | variant\_qc.p\_value\_hwe         | float64         | p-value from two-sided test of Hardy-Weinberg equilibrium. See functions.hardy\_weinberg\_test() for details.                           |
| 24            | variant\_qc.p\_value\_excess\_het | float64         | p-value from one-sided test of Hardy-Weinberg equilibrium for excess heterozygosity. See functions.hardy\_weinberg\_test() for details. |
| 25            | info.QD                           | double          | Quality by Depth (QD) of INFO field in the VCF.                                                                                         |
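
The chromosome X files keep the Hail field layout, so the locus, alleles and per-allele arrays need a little parsing before use. The sketch below splits a locus into contig and position and pulls out the alternate-allele entries of `variant_qc.AC` and `variant_qc.AF`. It assumes that array fields are written as JSON-style text (as in the `["A","G"]` example above) and that element 0 of each per-allele array corresponds to the reference allele. The function name is hypothetical, and the AC, AF and AN values are synthetic.

```python
import json


def parse_chrx_fields(locus, alleles, ac, af):
    """Parse Hail-style text fields from one chromosome X row.

    Assumptions: arrays are JSON-formatted, and index 0 of each
    per-allele array refers to the reference allele.
    """
    contig, pos = locus.rsplit(":", 1)
    allele_list = json.loads(alleles)
    ac_list = json.loads(ac)
    af_list = json.loads(af)
    return {
        "contig": contig,
        "pos": int(pos),
        "ref": allele_list[0],
        "alt": allele_list[1:],
        "alt_ac": ac_list[1:],
        "alt_af": af_list[1:],
        "total_ac": sum(ac_list),  # documented to equal variant_qc.AN
    }


# Locus and alleles reuse the examples from the table; counts are synthetic.
parsed = parse_chrx_fields(
    locus="chrX:10009",
    alleles='["A","G"]',
    ac="[198,2]",
    af="[0.99,0.01]",
)
synthetic_an = 200
print(parsed)
print(parsed["total_ac"] == synthetic_an)
```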
