Variant-wise QC metrics file
The variant-wise QC metrics file is available to Sandbox users.
Sandbox directory
The variant-wise metrics files (and their index files) are available in the following sandbox directory:
To generate the SISu v4.2 reference panel, sample-, genotype- and variant-wise quality control (QC) filtering procedures were applied iteratively to the high-coverage WGS (hcWGS) data. An allele count (AC) > 2 cutoff was then applied symmetrically. Variant-wise metrics were exported after each major QC step to monitor data quality.
We provide variant-wise QC metrics TSV files for the SISu v4.2 reference panel after the major QC steps (raw data, after sample-wise QC, and after sample-, genotype- and variant-wise QC), separately for the autosomal chromosomes and chromosome X.
Variant-wise QC metrics file
| QC stage | Autosomal files | Chromosome X files |
| --- | --- | --- |
| Raw VCF data from WashU (with 10500 samples) | sisu4.2_panel_autosomal_raw_variant_wise_qc_metrics.tsv.gz / .tbi | sisu4.2_panel_chrX_raw_variant_wise_qc_metrics.tsv.gz |
| After sample-wise QC (with 8554 samples) | sisu4.2_panel_autosomal_after_sample_qc_variant_wise_qc_metrics.tsv.gz / .tbi | sisu4.2_panel_chrX_after_sample_qc_variant_wise_qc_metrics.tsv.gz |
| After genotype- and variant-wise QC | sisu4.2_panel_autosomal_after_sample_genotype_variant_qc_variant_wise_qc_metrics.tsv.gz / .tbi | sisu4.2_panel_chrX_XPAR_after_sample_genotype_variant_qc_variant_wise_qc_metrics.tsv.gz; sisu4.2_panel_chrX_nonXPAR_after_sample_genotype_variant_qc_variant_wise_qc_metrics.tsv.gz |
| After AC > 2 filtering, symmetrically | | |
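As a minimal sketch of how one of these gzip-compressed TSV files might be read, the example below streams rows with the Python standard library and keeps variants above a call-rate threshold. The threshold of 0.95, the helper names `iter_variants` and `passing_variants`, and the tiny demo file are all hypothetical and only illustrate the pattern; the column name `callRate` is taken from the autosomal column list, and the demo values are synthetic, not real panel data.

```python
import csv
import gzip
import os
import tempfile


def iter_variants(path):
    """Yield one dict per variant row from a gzip-compressed TSV with a header line."""
    with gzip.open(path, "rt", newline="") as handle:
        yield from csv.DictReader(handle, delimiter="\t")


def passing_variants(path, min_call_rate=0.95):
    """Keep rows whose callRate meets the threshold (0.95 is an illustrative value)."""
    return [row for row in iter_variants(path)
            if float(row["callRate"]) >= min_call_rate]


# Synthetic two-variant file standing in for a real metrics file (made-up values).
demo_path = os.path.join(tempfile.mkdtemp(), "demo_variant_wise_qc_metrics.tsv.gz")
with gzip.open(demo_path, "wt", newline="") as out:
    out.write("#chr\tpos\tVariant\tcallRate\n")
    out.write("1\t10177\t1:10177:A:AC\t0.91\n")
    out.write("1\t10352\t1:10352:T:TA\t0.99\n")

kept = passing_variants(demo_path)
print([row["Variant"] for row in kept])
```

Streaming row by row keeps memory use flat, which matters because the real files cover every variant in the panel. Note that the header of the autosomal file starts with `#chr`, so that is the dictionary key for the chromosome column.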
General description
The autosomal variant-wise metrics file contains 24 columns:
| # | Column | Type | Description |
| --- | --- | --- | --- |
| 1 | #chr | int | Chromosome number (1-22) where the variant is located |
| 2 | pos | int | Genomic position of the variant on the specified chromosome |
| 3 | Variant | string | Variant identifier in the format "chromosome:position:reference_allele:alternate_allele" |
| 4 | callRate | double | Fraction of samples with called genotypes |
| 5 | AC | int | Count of alternate alleles |
| 6 | AF | double | Calculated alternate allele frequency (q) |
| 7 | nCalled | int | Sum of nHomRef, nHet, and nHomVar |
| 8 | nNotCalled | int | Number of uncalled samples |
| 9 | nHomRef | int | Number of homozygous reference samples |
| 10 | nHet | int | Number of heterozygous samples |
| 11 | nHomVar | int | Number of homozygous alternate samples |
| 12 | dpMean | double | Depth mean across all samples |
| 13 | dpStDev | double | Depth standard deviation across all samples |
| 14 | gqMean | double | Genotype quality mean across all samples |
| 15 | gqStDev | double | Genotype quality standard deviation across all samples |
| 16 | nNonRef | int | Sum of nHet and nHomVar |
| 17 | rHeterozygosity | double | Proportion of heterozygotes |
| 18 | rHetHomVar | double | Ratio of heterozygotes to homozygous alternates |
| 19 | rExpectedHetFrequency | double | Expected rHeterozygosity based on HWE |
| 20 | pHWE | double | p-value from Hardy-Weinberg equilibrium null model |
| 21 | FILTERS | list of strings | FILTER entry in the VCF, [] means PASS |
| 22 | QD | double | Quality by Depth (QD) of INFO field in the VCF |
| 23 | IS_INDEL | boolean | Insertion-deletion variant |
| 24 | IS_SNP | boolean | Single nucleotide variant |
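Several of the autosomal columns are simple functions of the genotype counts, and a short sketch can make those relationships explicit. The function below is illustrative only and is not the pipeline's own code: it assumes diploid, biallelic calls, the name `derive_metrics` is hypothetical, and the expected heterozygote frequency uses the textbook large-sample value 2pq, so the exported `rExpectedHetFrequency` may differ slightly if a finite-sample correction was used.

```python
def derive_metrics(n_hom_ref, n_het, n_hom_var, n_not_called):
    """Recompute count-based metrics for one diploid, biallelic variant."""
    n_called = n_hom_ref + n_het + n_hom_var       # nCalled
    n_samples = n_called + n_not_called
    ac = n_het + 2 * n_hom_var                     # AC: alternate allele count
    af = ac / (2 * n_called) if n_called else None  # AF (q)
    return {
        "nCalled": n_called,
        "callRate": n_called / n_samples if n_samples else None,
        "AC": ac,
        "AF": af,
        "nNonRef": n_het + n_hom_var,
        "rHeterozygosity": n_het / n_called if n_called else None,
        "rHetHomVar": n_het / n_hom_var if n_hom_var else None,
        # Textbook Hardy-Weinberg expectation 2pq (an approximation, see lead-in).
        "expectedHet2pq": 2 * af * (1 - af) if af is not None else None,
    }


# Made-up counts for a single variant across 100 samples.
metrics = derive_metrics(n_hom_ref=60, n_het=12, n_hom_var=3, n_not_called=25)
for name, value in metrics.items():
    print(name, value)
```

Recomputing these values from nHomRef, nHet, nHomVar, and nNotCalled is a quick consistency check on a downloaded file. Ratios with a zero denominator are returned as `None` here, which is a design choice of this sketch and not necessarily how the file encodes them.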
The chromosome X variant-wise metrics file contains 25 columns:
| # | Column | Type | Description |
| --- | --- | --- | --- |
| 1 | locus | tlocus | Hail type for a genomic coordinate with a contig and a position, e.g., chrX:10009 |
| 2 | alleles | tarray of tstr | Hail type for variable-length arrays of text strings, e.g., ["A","G"] |
| 3 | filters | list of strings | FILTER entry in the VCF, [] means PASS |
| 4 | variant_qc.dp_stats.mean | float64 | Mean depth of coverage (DP) across samples. |
| 5 | variant_qc.dp_stats.stdev | float64 | Standard deviation of depth of coverage (DP) across samples. |
| 6 | variant_qc.dp_stats.min | int32 | Minimum depth of coverage (DP) across samples. |
| 7 | variant_qc.dp_stats.max | int32 | Maximum depth of coverage (DP) across samples. |
| 8 | variant_qc.gq_stats.mean | float64 | Mean genotype quality (GQ) across samples. |
| 9 | variant_qc.gq_stats.stdev | float64 | Standard deviation of genotype quality (GQ) across samples. |
| 10 | variant_qc.gq_stats.min | int32 | Minimum genotype quality (GQ) across samples. |
| 11 | variant_qc.gq_stats.max | int32 | Maximum genotype quality (GQ) across samples. |
| 12 | variant_qc.AC | array<int32> | Calculated allele count, one element per allele, including the reference. Sums to AN. |
| 13 | variant_qc.AF | array<float64> | Calculated allele frequency, one element per allele, including the reference. Sums to one. Equivalent to AC / AN. |
| 14 | variant_qc.AN | int32 | Total number of called alleles. |
| 15 | variant_qc.homozygote_count | array<int32> | Number of homozygotes per allele. One element per allele, including the reference. |
| 16 | variant_qc.call_rate | float64 | Fraction of calls neither missing nor filtered. Equivalent to n_called / count_cols(). |
| 17 | variant_qc.n_called | int64 | Number of samples with a defined GT. |
| 18 | variant_qc.n_not_called | int64 | Number of samples with a missing GT. |
| 19 | variant_qc.n_filtered | int64 | Number of filtered entries. |
| 20 | variant_qc.n_het | int64 | Number of heterozygous samples. |
| 21 | variant_qc.n_non_ref | int64 | Number of samples with at least one called non-reference allele. |
| 22 | variant_qc.het_freq_hwe | float64 | Expected frequency of heterozygous samples under Hardy-Weinberg equilibrium. See functions.hardy_weinberg_test() for details. |
| 23 | variant_qc.p_value_hwe | float64 | p-value from two-sided test of Hardy-Weinberg equilibrium. See functions.hardy_weinberg_test() for details. |
| 24 | variant_qc.p_value_excess_het | float64 | p-value from one-sided test of Hardy-Weinberg equilibrium for excess heterozygosity. See functions.hardy_weinberg_test() for details. |
| 25 | info.QD | double | Quality by Depth (QD) of INFO field in the VCF. |
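Unlike the autosomal file, the chromosome X file stores the locus as a single `contig:position` string and holds per-allele values in arrays. The sketch below shows one way a row might be unpacked into scalar alternate-allele values. It assumes the array cells are written as JSON-style text (as in the `["A","G"]` example above) and that the variant is biallelic; the helper name `parse_chrx_row` and the sample values are hypothetical.

```python
import json


def parse_chrx_row(row):
    """Unpack locus, alleles and per-allele arrays from one chrX metrics row.

    Assumes JSON-style array text and exactly two alleles (reference, alternate).
    """
    contig, pos = row["locus"].rsplit(":", 1)
    ref, alt = json.loads(row["alleles"])
    ac = json.loads(row["variant_qc.AC"])   # one element per allele, reference first
    af = json.loads(row["variant_qc.AF"])
    return {
        "contig": contig,
        "pos": int(pos),
        "ref": ref,
        "alt": alt,
        "alt_AC": ac[1],                     # index 0 is the reference allele
        "alt_AF": af[1],
        "variant_id": f"{contig}:{pos}:{ref}:{alt}",
    }


# Made-up row for illustration; only the columns used above are included.
example_row = {
    "locus": "chrX:10009",
    "alleles": '["A","G"]',
    "variant_qc.AC": "[9990,10]",
    "variant_qc.AF": "[0.999,0.001]",
}
parsed = parse_chrx_row(example_row)
print(parsed["variant_id"], parsed["alt_AC"], parsed["alt_AF"])
```

Taking index 1 of `variant_qc.AC` and `variant_qc.AF` gives values comparable to the scalar `AC` and `AF` columns of the autosomal file. Whether the autosomal `Variant` identifier carries the same `chr` prefix as the locus string is not stated here, so the `variant_id` built above should be checked against a real row before joining the two files.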