Finemapping results format
Descriptions of the contents and formats of the finemapping pipeline output files
The finemapping results come from two different finemapping methods: FINEMAP and SuSiE.
The purpose of finemapping is to find the set of 1 or more variants most likely to be responsible for the association at that locus. This set of likely variants is referred to as a "credible set". You can read more about the motivations for finemapping in the main concepts: Finemapping.
The most severe transcript is chosen by first taking the most severe consequence among canonical protein-coding transcripts. If no canonical transcript exists, the first (arbitrary) other protein-coding transcript that carries the most severe annotation is chosen. The precedence of severity follows the Ensembl Variant Effect Predictor (VEP) default ordering.
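The selection rule above can be sketched as follows. This is only an illustration: the `SEVERITY` list is a short, hypothetical excerpt rather than the full VEP ordering, and the dictionary keys (`gene`, `consequence`, `canonical`, `biotype`) are assumed names, not the pipeline's actual data model.

```python
# Hypothetical, truncated severity order (most severe first).
# The real ordering is the VEP default consequence ranking.
SEVERITY = [
    "stop_gained",
    "frameshift_variant",
    "missense_variant",
    "synonymous_variant",
    "intron_variant",
]
RANK = {name: i for i, name in enumerate(SEVERITY)}


def pick_most_severe(transcripts):
    """Pick one transcript annotation for a variant.

    transcripts: list of dicts with the assumed keys
    'gene', 'consequence', 'canonical' (bool) and 'biotype'.
    """
    coding = [t for t in transcripts if t["biotype"] == "protein_coding"]
    canonical = [t for t in coding if t["canonical"]]
    # Prefer canonical transcripts; fall back to any protein-coding one.
    pool = canonical if canonical else coding
    if not pool:
        return None
    # min() keeps the first item among ties, so ties follow input order.
    return min(pool, key=lambda t: RANK.get(t["consequence"], len(SEVERITY)))
```

Because ties are resolved by input order, the fallback choice among non-canonical transcripts is effectively arbitrary, which matches the description above.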
Pipeline meta-data outputs
Region status file
The region status file was a tab-separated file that reported which regions were sent to finemapping and if there were any problems that prevented finemapping. This file is no longer output by the currently supported finemapping workflows, but the description has been retained for legacy results. The file had the following columns:
| Column | Description |
| --- | --- |
| region | The span of the region, specified in chromosomal coordinates as chromosome.start-end |
| status | Status of the region: "OK" if the region was passed on to finemapping, or "Failure" if the region was not successfully formed |
| windowsize | The window size used when determining the region. Region selection works by extending a window (in base pairs) around each genome-wide significant variant. Windows that overlap each other are merged, and the resulting (possibly merged) windows are the regions that are finemapped. If a region is larger than the maximum allowed region size (currently 6 megabases), that region is retried with a smaller window. The value shown here is the final window size that was tried. |
| failure | Empty if the region was successful. If the region was not successful, the reason is given here. Most likely the region was too long and could not be formed even when the window size was lowered to its minimum value. |
Regions were typically skipped if their merged size (after combining with proximal regions) was greater than the user-specified maximum allowed size (default 6Mb) and could not be successfully shrunk to individual regions >1Mb in size using this algorithm.
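As a rough illustration of the window-and-merge logic described above, here is a minimal sketch. Only the 6 Mb maximum comes from the text; the function names, the default window, the minimum window and the halving step are assumptions made for the example, and for simplicity the sketch retries all regions together instead of retrying a single oversized region.

```python
def merge_windows(positions, window):
    """Extend +/- window bp around each position and merge overlapping intervals."""
    intervals = sorted((max(0, p - window), p + window) for p in positions)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def form_regions(positions, window=1_500_000, max_size=6_000_000,
                 min_window=100_000, shrink=0.5):
    """Return (regions, final_window); regions is None if forming them failed."""
    while True:
        regions = merge_windows(positions, window)
        if all(end - start <= max_size for start, end in regions):
            return regions, window
        if window <= min_window:
            # Could not form regions even at the minimum window size.
            return None, window
        window = max(min_window, int(window * shrink))
```

The second return value plays the role of the windowsize column: it is the last window size that was tried, whether or not the regions could be formed.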
SuSiE outputs
Both 95% and 99% credible sets are provided. Files with _99 in their name contain the 99% credible sets, as described below. The SuSiE outputs have been annotated using the variant annotation file.
PHENONAME.SUSIE.cred.bgz and PHENONAME.SUSIE_99.cred.bgz
These files contain all of the credible sets for this phenotype, covering all genome-wide significant regions that were fine-mapped with SuSiE. The credible sets are the 95% (PHENONAME.SUSIE.cred.bgz) and 99% (PHENONAME.SUSIE_99.cred.bgz) credible sets, i.e. under the model they have a 95% or 99% probability of containing the causal variant. The files are bgzipped, tab-separated files with one credible set per line.
| Column | Description |
| --- | --- |
| region | Region for which the fine-mapping was run |
| cs | Running number for independent credible sets in a region |
| cs_log10bf | Log10 Bayes factor comparing the solution of this model (cs independent credible sets) to the solution with cs - 1 credible sets |
| cs_avg_r2 | Average correlation (R2) between variants in the credible set |
| cs_min_r2 | Minimum R2 between variants in the credible set |
| cs_size | Number of SNPs in the credible set |
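Since bgzip output is a series of concatenated gzip blocks, these files can usually be read with an ordinary gzip reader. The sketch below assumes the file has a header line with the column names listed above; `read_bgz_table` is a hypothetical helper, not part of the pipeline.

```python
import csv
import gzip


def read_bgz_table(path):
    """Read a bgzipped, tab-separated file into a list of dicts (one per line).

    Assumes the first line is a header with the column names.
    """
    with gzip.open(path, "rt", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

All values are returned as strings, so numeric columns such as cs_size or cs_log10bf need to be converted by the caller.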
PHENONAME.SUSIE.cred.summary.tsv, PHENONAME.SUSIE_99.cred.summary.tsv and PHENONAME.SUSIE_EXTEND.cred.summary.tsv
These files contain a summary of the credible sets for this phenotype. The credible sets are the 95% (PHENONAME.SUSIE.cred.summary.tsv) or 99% (PHENONAME.SUSIE_99.cred.summary.tsv) credible sets, i.e. under the model they have a 95% or 99% probability of containing the causal variant. The file PHENONAME.SUSIE_EXTEND.cred.summary.tsv contains the 95% credible sets, extended with the 99% credible set variants where possible. The files are tab-delimited with one credible set per line. The columns are described in the following table:
| Column | Description |
| --- | --- |
| trait | Phenotype |
| region | Region for which the fine-mapping was run |
| cs | Running number for independent credible sets in a region |
| cs_log10bf | Log10 Bayes factor comparing the solution of this model (cs independent credible sets) to the solution with cs - 1 credible sets |
| cs_avg_r2 | Average correlation (R2) between variants in the credible set |
| cs_min_r2 | Minimum R2 between variants in the credible set |
| low_purity | Boolean (TRUE, FALSE) indicating whether the credible set is low purity (low minimum R2) |
| cs_size | Number of SNPs in the credible set |
| good_cs | Boolean (TRUE, FALSE) indicating whether this credible set is considered reliable. If this is FALSE, the top variant reported for the credible set is chosen by minimum p-value in the credible set; otherwise the top variant is chosen by maximum PIP. |
| cs_id | Credible set ID |
| v | Top variant (chr:pos:ref:alt). The top variant is the maximum-PIP variant if the credible set has good_cs==TRUE; otherwise it is the minimum p-value variant. |
| p | Top variant p-value |
| beta | Top variant beta |
| sd | Top variant standard deviation |
| prob | Overall PIP of the variant in the region |
| cs_specific_prob | PIP of the variant in the current credible set (this and prob are typically almost identical) |
| 0..n | Configured annotation columns. The typical defaults are most_severe and gene_most_severe, giving the consequence and gene of the top variant. |
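The top-variant rule from the good_cs and v rows can be sketched as below. The function name and the input layout (a list of dicts with `v`, `p` and `prob` for the variants of one credible set) are assumptions for the example.

```python
def top_variant(variants, good_cs):
    """Return the ID of the top variant of one credible set.

    variants: list of dicts with keys 'v' (ID), 'p' (p-value), 'prob' (PIP).
    good_cs:  bool, or the string "TRUE"/"FALSE" as written in the file.
    """
    if isinstance(good_cs, str):
        good_cs = good_cs.strip().upper() == "TRUE"
    if good_cs:
        # Reliable credible set: highest posterior inclusion probability.
        return max(variants, key=lambda x: x["prob"])["v"]
    # Otherwise fall back to the smallest p-value.
    return min(variants, key=lambda x: x["p"])["v"]
```

For example, a credible set in which one variant has the highest PIP and another the smallest p-value reports the former when good_cs is TRUE and the latter when it is FALSE.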
PHENONAME.SUSIE.snp.bgz and PHENONAME.SUSIE_99.snp.bgz
These files contain SuSiE data for all of the variants in all of the regions. The files are tab-delimited and bgzipped, and have tabix indexes PHENONAME.SUSIE.snp.bgz.tbi and PHENONAME.SUSIE_99.snp.bgz.tbi. Each line contains one variant. The columns are described in the table below.
| Column | Description |
| --- | --- |
| trait | Phenotype |
| region | Region for which the fine-mapping was run |
| v, rsid | Variant IDs |
| chromosome | Chromosome no. |
| position | Position on the chromosome |
| allele1 | Major allele |
| allele2 | Minor allele |
| maf | Minor allele frequency |
| beta | Original marginal beta |
| se | Original standard error |
| p | Original p-value |
| mean | Posterior mean beta after fine-mapping |
| sd | Posterior standard deviation after fine-mapping |
| prob | Posterior inclusion probability |
| cs | Credible set index within region |
| lead_r2 | R2 between this variant and the lead variant (the one with maximum PIP) of its credible set |
| alphax | Posterior inclusion probability for the xth single effect (x := 1..L, where L is the number of single effects/causal variants specified; default: L = 10) |
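To make the relationship between the alphax columns, credible sets and PIPs more concrete, here is a minimal sketch under the standard SuSiE definitions: a credible set for one single effect is built by adding variants in decreasing order of alpha until the requested coverage is reached, and the overall PIP combines the single effects as 1 - prod(1 - alpha). The function names are hypothetical, and the pipeline may apply additional filtering (for example purity checks) that is not shown here.

```python
def credible_set(alpha, coverage=0.95):
    """Indices of the variants forming a credible set for one single effect.

    alpha: per-variant inclusion probabilities for that effect (sums to ~1).
    """
    order = sorted(range(len(alpha)), key=lambda i: alpha[i], reverse=True)
    chosen, total = [], 0.0
    for i in order:
        chosen.append(i)
        total += alpha[i]
        if total >= coverage:
            break
    return chosen


def combined_pip(alphas_for_variant):
    """Overall PIP of one variant from its alpha1..alphaL values."""
    not_included = 1.0
    for a in alphas_for_variant:
        not_included *= 1.0 - a
    return 1.0 - not_included
```

Raising the coverage from 0.95 to 0.99 can only add variants to a set, which is why the 99% credible sets are supersets of the 95% ones under this construction.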
PHENONAME.SUSIE.snp.filter.tsv, PHENONAME.SUSIE_99.snp.filter.tsv and PHENONAME.SUSIE_extend.snp.filter.tsv
These files contain the filtered SNPs for the 95% (PHENONAME.SUSIE.snp.filter.tsv) and 99% (PHENONAME.SUSIE_99.snp.filter.tsv) credible sets. Variants that are not in a 95% or 99% credible set are excluded from the respective file, as are variants that were part of low_purity credible sets. The file PHENONAME.SUSIE_extend.snp.filter.tsv contains the filtered SNPs for the 95% credible sets, extended with 99% credible set variants where applicable; variants outside these credible sets are likewise excluded. The files are tab-delimited with one variant per line. The columns are described in the table below:
| Column | Description |
| --- | --- |
| trait | Phenotype |
| region | Region for which the fine-mapping was run |
| v | Variant ID (chr:pos:ref:alt) |
| cs | Running credible set ID within region |
| cs_specific_prob | Posterior inclusion probability for this credible set |
| chromosome | Chromosome no. |
| position | Position on the chromosome |
| allele1 | Major allele |
| allele2 | Minor allele |
| maf | Minor allele frequency |
| beta | Original association beta |
| p | Original p-value |
| se | Original standard error |
| most_severe | Most severe consequence of the variant |
| gene_most_severe | Gene corresponding to the most severe consequence |
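One simple sanity check on a filtered file is to sum cs_specific_prob per credible set. The sketch below assumes the rows have already been read into dicts keyed by the column names above (for example with `csv.DictReader`); `cs_coverage` is a hypothetical helper. Under the usual construction one would expect each total to be at least the nominal coverage, but that expectation is an assumption and not something the file format guarantees.

```python
from collections import defaultdict


def cs_coverage(rows):
    """Sum cs_specific_prob over the variants of each (region, cs) pair."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["region"], row["cs"])] += float(row["cs_specific_prob"])
    return dict(totals)
```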
FINEMAP outputs
PHENONAME.FINEMAP.config.bgz
This file contains posterior summaries for all of the causal configurations, one per line. The columns are described in the following table. More information can be found at http://www.christianbenner.com/.
| Column | Description |
| --- | --- |
| trait | Phenotype |
| region | Region for which the fine-mapping was run |
| rank | Rank of this configuration within a region |
| config | Causal variants in this configuration |
| prob | Probability across all n independent signal configurations |
| log10bf | Log10 Bayes factor for this configuration |
| odds | Odds for this configuration |
| k | Number of independent signals in this configuration |
| prob_norm_k | Probability of this configuration within the k independent signals solution |
| h2 | SNP heritability of this solution |
| h2_0.95CI | 95% confidence interval limits of SNP heritability of this solution |
| mean | Marginalized shrinkage estimates of the posterior effect size mean |
| sd | Marginalized shrinkage estimates of the posterior effect standard deviation |
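One way to read the prob_norm_k description is that prob is renormalised among configurations with the same k. The sketch below implements that reading only; it is an interpretation of the column description, not a statement of how FINEMAP computes the value, and `normalise_within_k` is a hypothetical name.

```python
from collections import defaultdict


def normalise_within_k(configs):
    """Renormalise 'prob' among configurations that share the same 'k'.

    configs: list of dicts with numeric 'k' and 'prob'.
    Returns the normalised probabilities in input order.
    """
    totals = defaultdict(float)
    for cfg in configs:
        totals[cfg["k"]] += cfg["prob"]
    return [cfg["prob"] / totals[cfg["k"]] for cfg in configs]
```

Under this reading the values for a given k sum to 1, so they rank configurations within one assumed number of signals without saying how plausible that number is.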
PHENONAME.FINEMAP.region.bgz
This bgzipped, tab-delimited file contains all of the finemapped regions for the endpoint, one region per line. The columns are described in the following table. More information can be found at http://www.christianbenner.com/.
| Column | Description |
| --- | --- |
| trait | Phenotype |
| region | Region for which the fine-mapping was run |
| h2g_snp or h2g | SNP heritability of this region |
| h2g_sd | Standard deviation of SNP heritability of this region |
| h2g_lower95 | Lower limit of the 95% CI for SNP heritability |
| h2g_upper95 | Upper limit of the 95% CI for SNP heritability |
| log10bf | Log10 Bayes factor compared against the null (no signals in the region) |
| prob_xSNP | x columns giving the probabilities of different numbers of independent signals |
| expectedvalue | Expectation (average) of the number of signals |
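The expectedvalue column can be related to the prob_xSNP columns as a probability-weighted average of x. The sketch below assumes the columns are literally named prob_0SNP, prob_1SNP and so on, which is an assumption based on the prob_xSNP pattern above; `expected_signals` is a hypothetical helper.

```python
def expected_signals(row):
    """Expected number of signals from the prob_xSNP columns of one region row."""
    total = 0.0
    for name, value in row.items():
        if name.startswith("prob_") and name.endswith("SNP"):
            # Extract x from "prob_xSNP".
            x = int(name[len("prob_"):-len("SNP")])
            total += x * float(value)
    return total
```

For instance, probabilities of 0.7 for one signal and 0.3 for two signals give an expectation of 1.3 signals.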
PHENONAME.FINEMAP.snp.bgz
This tab-delimited, bgzipped file contains finemapping information for each of the SNPs that were finemapped, with one variant per line. The file also has a tabix index named PHENONAME.FINEMAP.snp.bgz.tbi. The columns of the file are described in the table below.
| Column | Description |
| --- | --- |
| trait | Phenotype |
| region | Region for which the fine-mapping was run |
| v | Variant |
| index | Running index |
| rsid | Variant ID |
| chromosome | Chromosome no. |
| position | Position on the chromosome |
| allele1 | Major allele |
| allele2 | Minor allele |
| maf | Minor allele frequency |
| beta | Original marginal beta (effect size) |
| se | Original standard error |
| z | Original z-score |
| prob | Posterior inclusion probability |
| log10bf | Log10 Bayes factor |
| mean | Marginalized shrinkage estimates of the posterior effect size mean |
| sd | Marginalized shrinkage estimates of the posterior effect standard deviation |
| mean_incl | Conditional estimates of the posterior effect size mean |
| sd_incl | Conditional estimates of the posterior effect size standard deviation |
| p | Original p-value |
| csx | Credible set index for given number of causal variants x |
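The beta, se, z and p columns are related in the usual way for a Wald-type test: z is beta divided by se, and a two-sided p-value follows from the standard normal distribution. The sketch below shows that relationship only. It assumes a normal approximation, so values recomputed this way may differ slightly from the original p-values if the association software used a different test.

```python
import math


def z_score(beta, se):
    """z-score from the marginal effect size and its standard error."""
    return beta / se


def two_sided_p(z):
    """Two-sided p-value for a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))
```

Using `math.erfc` instead of `1 - cdf` keeps precision for the very small p-values that are typical of genome-wide significant variants.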