Variant-wise QC metrics file

The variant-wise QC metrics files are available to Sandbox users.

Sandbox directory

The variant-wise metrics files (and their index files) are available in the following Sandbox directory:

gs://finngen-production-library-green/imputation_panel/v4.2/variant_qc/sisu4.2_panel_var_wise_QC_metrics/

To generate the SISu v4.2 reference panel, sample-, genotype- and variant-wise quality control (QC) filters were applied iteratively to the high-coverage WGS (hcWGS) data. An allele count (AC) > 2 cutoff was then applied symmetrically. Variant-wise metrics were exported after each major QC step to monitor data quality.
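
The text does not define what "symmetrically" means for the AC > 2 cutoff. The minimal sketch below assumes one possible reading: a variant is kept only if both the alternate allele count and the reference allele count (AN minus AC) exceed the threshold. The function name `passes_symmetric_ac` and the example values are hypothetical and are not taken from the panel pipeline.

```python
def passes_symmetric_ac(ac, an, threshold=2):
    """Return True when both allele counts exceed `threshold`.

    ac: alternate allele count; an: total number of called alleles.
    Treating "symmetrically" as "applies to both alleles" is an assumption.
    """
    ref_count = an - ac
    return ac > threshold and ref_count > threshold


# Illustrative values only (not FinnGen data): 200 called alleles.
examples = {ac: passes_symmetric_ac(ac, an=200) for ac in (2, 3, 197, 198)}
print(examples)
```

Under this reading, variants that are very rare and variants that are nearly fixed for the alternate allele are both removed.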

We provide variant-wise QC metrics TSV files for the SISu v4.2 reference panel after each major QC step (raw data, after sample-wise QC, and after sample-, genotype- and variant-wise QC), separately for the autosomal chromosomes and chromosome X.

Variant-wise QC metrics file

| QC step | Autosomal chromosomes | Chromosome X |
| --- | --- | --- |
| Raw VCF data from WashU (with 10500 samples) | sisu4.2_panel_autosomal_raw_variant_wise_qc_metrics.tsv.gz / .tbi | sisu4.2_panel_chrX_raw_variant_wise_qc_metrics.tsv.gz |
| After sample-wise QC (with 8554 samples) | sisu4.2_panel_autosomal_after_sample_qc_variant_wise_qc_metrics.tsv.gz / .tbi | sisu4.2_panel_chrX_after_sample_qc_variant_wise_qc_metrics.tsv.gz |
| After genotype- and variant-wise QC | sisu4.2_panel_autosomal_after_sample_genotype_variant_qc_variant_wise_qc_metrics.tsv.gz / .tbi | sisu4.2_panel_chrX_XPAR_after_sample_genotype_variant_qc_variant_wise_qc_metrics.tsv.gz; sisu4.2_panel_chrX_nonXPAR_after_sample_genotype_variant_qc_variant_wise_qc_metrics.tsv.gz |
| After AC>2 filtering, symmetrically | | |
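
As a rough illustration of working with these files, the sketch below streams a gzipped, tab-separated metrics file row by row using only the Python standard library. It assumes the file has already been copied to a local path; the helper name `iter_metrics` and the tiny demo file are hypothetical, and the demo values are made up rather than taken from the panel.

```python
import csv
import gzip
import os
import tempfile

# Directory and file name as given in the text above.
BASE = (
    "gs://finngen-production-library-green/imputation_panel/v4.2/"
    "variant_qc/sisu4.2_panel_var_wise_QC_metrics/"
)
RAW_AUTOSOMAL = BASE + "sisu4.2_panel_autosomal_raw_variant_wise_qc_metrics.tsv.gz"


def iter_metrics(path):
    """Yield one dict per variant from a gzipped, tab-separated metrics file."""
    with gzip.open(path, "rt", newline="") as handle:
        yield from csv.DictReader(handle, delimiter="\t")


# Demonstration on a tiny synthetic file (made-up values, not FinnGen data).
demo_path = os.path.join(tempfile.mkdtemp(), "demo_metrics.tsv.gz")
with gzip.open(demo_path, "wt") as handle:
    handle.write("#chr\tpos\tAC\n1\t100\t5\n1\t200\t1\n2\t300\t9\n")

rows = list(iter_metrics(demo_path))
print(len(rows), rows[0]["#chr"], rows[0]["pos"])
```

Every value is read as a string, so numeric columns need an explicit conversion (for example `int(row["AC"])`) before any arithmetic.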

General description

The autosomal variant-wise metrics files contain 24 columns:

| Column number | Name | Type | Description |
| --- | --- | --- | --- |
| 1 | #chr | int | Chromosome number (1-22) where the variant is located |
| 2 | pos | int | Genomic position of the variant on the specified chromosome |
| 3 | Variant | string | Variant identifier in the format "chromosome:position:reference_allele:alternate_allele" |
| 4 | callRate | double | Fraction of samples with called genotypes |
| 5 | AC | int | Count of alternate alleles |
| 6 | AF | double | Calculated alternate allele frequency (q) |
| 7 | nCalled | int | Sum of nHomRef, nHet, and nHomVar |
| 8 | nNotCalled | int | Number of uncalled samples |
| 9 | nHomRef | int | Number of homozygous reference samples |
| 10 | nHet | int | Number of heterozygous samples |
| 11 | nHomVar | int | Number of homozygous alternate samples |
| 12 | dpMean | double | Mean depth across all samples |
| 13 | dpStDev | double | Standard deviation of depth across all samples |
| 14 | gqMean | double | Mean genotype quality across all samples |
| 15 | gqStDev | double | Standard deviation of genotype quality across all samples |
| 16 | nNonRef | int | Sum of nHet and nHomVar |
| 17 | rHeterozygosity | double | Proportion of heterozygotes |
| 18 | rHetHomVar | double | Ratio of heterozygotes to homozygous alternates |
| 19 | rExpectedHetFrequency | double | Expected rHeterozygosity under Hardy-Weinberg equilibrium (HWE) |
| 20 | pHWE | double | p-value from the Hardy-Weinberg equilibrium null model |
| 21 | FILTERS | list of strings | FILTER entry in the VCF; [] means PASS |
| 22 | QD | double | Quality by Depth (QD) from the INFO field of the VCF |
| 23 | IS_INDEL | boolean | Insertion-deletion variant |
| 24 | IS_SNP | boolean | Single nucleotide variant |
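
Several of the columns above are defined in terms of the genotype counts. The sketch below recomputes them from nHomRef, nHet, nHomVar and nNotCalled as a sanity check. It assumes diploid, biallelic genotypes, so that AC = nHet + 2 * nHomVar and AF = AC / (2 * nCalled), and that callRate is nCalled divided by all samples. These formulas are assumptions about how the values were computed, and the function name `derived_metrics` and the example counts are hypothetical.

```python
def derived_metrics(n_hom_ref, n_het, n_hom_var, n_not_called):
    """Recompute count-based metrics for one diploid, biallelic variant."""
    n_called = n_hom_ref + n_het + n_hom_var      # nCalled
    n_non_ref = n_het + n_hom_var                 # nNonRef
    ac = n_het + 2 * n_hom_var                    # AC (assumed formula)
    n_samples = n_called + n_not_called
    return {
        "nCalled": n_called,
        "nNonRef": n_non_ref,
        "AC": ac,
        # Ratios are None when the denominator is zero.
        "AF": ac / (2 * n_called) if n_called else None,
        "callRate": n_called / n_samples if n_samples else None,
        "rHeterozygosity": n_het / n_called if n_called else None,
        "rHetHomVar": n_het / n_hom_var if n_hom_var else None,
    }


# Made-up counts for illustration: 90 hom-ref, 8 het, 2 hom-var, 25 missing.
metrics = derived_metrics(n_hom_ref=90, n_het=8, n_hom_var=2, n_not_called=25)
print(metrics)
```

Comparing such recomputed values against the file (allowing for floating-point rounding) is a quick way to confirm that a file was parsed with the right column order.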

The chromosome X variant-wise metrics files contain 25 columns:

| Column number | Name | Type | Description |
| --- | --- | --- | --- |
| 1 | locus | tlocus | Hail type for a genomic coordinate with a contig and a position, e.g., chrX:10009 |
| 2 | alleles | tarray of tstr | Hail type for variable-length arrays of text strings, e.g., ["A","G"] |
| 3 | filters | list of strings | FILTER entry in the VCF; [] means PASS |
| 4 | variant_qc.dp_stats.mean | float64 | Mean depth of coverage (DP) across samples. |
| 5 | variant_qc.dp_stats.stdev | float64 | Standard deviation of depth of coverage (DP) across samples. |
| 6 | variant_qc.dp_stats.min | int32 | Minimum depth of coverage (DP) across samples. |
| 7 | variant_qc.dp_stats.max | int32 | Maximum depth of coverage (DP) across samples. |
| 8 | variant_qc.gq_stats.mean | float64 | Mean genotype quality (GQ) across samples. |
| 9 | variant_qc.gq_stats.stdev | float64 | Standard deviation of genotype quality (GQ) across samples. |
| 10 | variant_qc.gq_stats.min | int32 | Minimum genotype quality (GQ) across samples. |
| 11 | variant_qc.gq_stats.max | int32 | Maximum genotype quality (GQ) across samples. |
| 12 | variant_qc.AC | array<int32> | Calculated allele count, one element per allele, including the reference. Sums to AN. |
| 13 | variant_qc.AF | array<float64> | Calculated allele frequency, one element per allele, including the reference. Sums to one. Equivalent to AC / AN. |
| 14 | variant_qc.AN | int32 | Total number of called alleles. |
| 15 | variant_qc.homozygote_count | array<int32> | Number of homozygotes per allele. One element per allele, including the reference. |
| 16 | variant_qc.call_rate | float64 | Fraction of calls neither missing nor filtered. Equivalent to n_called / count_cols(). |
| 17 | variant_qc.n_called | int64 | Number of samples with a defined GT. |
| 18 | variant_qc.n_not_called | int64 | Number of samples with a missing GT. |
| 19 | variant_qc.n_filtered | int64 | Number of filtered entries. |
| 20 | variant_qc.n_het | int64 | Number of heterozygous samples. |
| 21 | variant_qc.n_non_ref | int64 | Number of samples with at least one called non-reference allele. |
| 22 | variant_qc.het_freq_hwe | float64 | Expected frequency of heterozygous samples under Hardy-Weinberg equilibrium. See functions.hardy_weinberg_test() for details. |
| 23 | variant_qc.p_value_hwe | float64 | p-value from two-sided test of Hardy-Weinberg equilibrium. See functions.hardy_weinberg_test() for details. |
| 24 | variant_qc.p_value_excess_het | float64 | p-value from one-sided test of Hardy-Weinberg equilibrium for excess heterozygosity. See functions.hardy_weinberg_test() for details. |
| 25 | info.QD | double | Quality by Depth (QD) from the INFO field of the VCF. |
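
Unlike the autosomal files, the chromosome X files store the locus, alleles and per-allele statistics in Hail-style columns. The sketch below shows one way to unpack a row into the "chromosome:position:reference_allele:alternate_allele" identifier and the alternate-allele values. It assumes that array columns are serialized as JSON text (as in the ["A","G"] example above) and that the variant is biallelic; the helper names and the example row are hypothetical, with made-up values.

```python
import json


def parse_locus(locus):
    """Split a locus such as 'chrX:10009' into (contig, position)."""
    contig, pos = locus.rsplit(":", 1)
    return contig, int(pos)


def parse_chrx_row(row):
    """Extract variant ID and alternate-allele stats from one chrX row (dict of strings)."""
    contig, pos = parse_locus(row["locus"])
    alleles = json.loads(row["alleles"])        # e.g. ["A", "G"]
    ac = json.loads(row["variant_qc.AC"])       # one count per allele, reference first
    af = json.loads(row["variant_qc.AF"])       # one frequency per allele, reference first
    an = int(row["variant_qc.AN"])
    if len(alleles) != 2:
        raise ValueError("this sketch only handles biallelic variants")
    return {
        # The contig is kept as written in the file (including any 'chr' prefix).
        "variant": f"{contig}:{pos}:{alleles[0]}:{alleles[1]}",
        "alt_AC": ac[1],
        "alt_AF": af[1],
        "AN": an,
        "AC_sums_to_AN": sum(ac) == an,
    }


# Made-up example row for illustration only.
example_row = {
    "locus": "chrX:10009",
    "alleles": '["A","G"]',
    "variant_qc.AC": "[190,10]",
    "variant_qc.AF": "[0.95,0.05]",
    "variant_qc.AN": "200",
}
parsed = parse_chrx_row(example_row)
print(parsed)
```

Taking element 1 of the AC and AF arrays gives values comparable to the scalar AC and AF columns of the autosomal files, since element 0 refers to the reference allele.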
