Finemapping results format

Descriptions of the contents and formats of the finemapping pipeline output files

The finemapping results come from two different finemapping methods: FINEMAP and SuSiE.

The purpose of finemapping is to find the set of one or more variants most likely to be responsible for the association at a locus. This set of likely variants is referred to as a "credible set". You can read more about the motivations for finemapping in the main concepts: Finemapping.

The most severe transcript is chosen by first taking the most severe consequence among canonical protein-coding transcripts. If no canonical transcript exists, the first (effectively arbitrary) other protein-coding transcript carrying the most severe annotation is chosen. The precedence of severity follows the Ensembl Variant Effect Predictor (VEP) default.

Quick links to relevant formats

For easier navigation of this page, here are some quick links to the different file formats:

Pipeline meta-data outputs

Region status file

SuSiE outputs

FINEMAP outputs

Pipeline meta-data outputs

Region status file

The region status file was a tab-separated file that reported which regions were sent to finemapping and if there were any problems that prevented finemapping. This file is no longer output by the currently supported finemapping workflows, but the description has been retained for legacy results. The file had the following columns:

Column name

Description

region

The span of the region, specified in chromosomal coordinates chromosome.start-end

status

Status of the region, either "OK" if the region was passed on to finemapping, or "Failure" if the region was not successfully formed.

windowsize

The window size used when determining a region. Region selection works by extending a window (in base pairs) around each genome-wide significant variant. If windows overlap, they are merged, and the resulting (possibly merged) windows are the regions that are finemapped. If a region is larger than the maximum allowed region size (currently 6 megabases), it is retried with a smaller window. The value shown here is the final window size that was tried.

failure

Empty if the region was successful. If the region was not successful, the reason is given here. Most likely the region was too long and could not be formed even after lowering the window size to its minimum value.

Regions were typically skipped if their merged size (after combining with proximal regions) was greater than the user-specified maximum allowed size (default 6Mb) and could not be successfully shrunk to individual regions >1Mb in size using this algorithm.
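The window-and-merge procedure described for windowsize can be sketched as follows. This is a minimal illustration and not the pipeline's own code: the function names, the halving step (shrink=0.5) and the minimum window are assumptions, and for simplicity the window is shrunk for all hits at once rather than only for the oversized region.

```python
from typing import List, Tuple

MAX_REGION_SIZE = 6_000_000  # maximum allowed region size quoted above (6 Mb)


def merge_windows(positions: List[int], window: int) -> List[Tuple[int, int]]:
    """Extend +/- window around each position and merge overlapping windows."""
    regions: List[Tuple[int, int]] = []
    for pos in sorted(positions):
        start, end = max(0, pos - window), pos + window
        if regions and start <= regions[-1][1]:
            # Overlaps the previous window: extend it instead of starting a new one.
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


def form_regions(
    positions: List[int],
    window: int,
    min_window: int,
    shrink: float = 0.5,  # assumed shrink factor, not taken from the pipeline
    max_size: int = MAX_REGION_SIZE,
) -> Tuple[List[Tuple[int, int]], int, str]:
    """Retry with smaller windows until every region fits, or give up."""
    while True:
        regions = merge_windows(positions, window)
        if all(end - start <= max_size for start, end in regions):
            return regions, window, "OK"
        if window <= min_window:
            # Even the smallest window leaves an oversized region.
            return regions, window, "Failure"
        window = max(min_window, int(window * shrink))
```

The returned window corresponds to the windowsize column (the last window tried), and the status string mirrors the "OK" / "Failure" values of the status column.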

SuSiE outputs

Both 95% and 99% credible sets are provided. Files with _99 in their names contain the 99% credible sets, as described below. The SuSiE outputs have been annotated using the variant annotation file.

PHENONAME.SUSIE.cred.bgz and PHENONAME.SUSIE_99.cred.bgz

These files contain all of the credible sets for this phenotype. The credible sets are the 95% (PHENONAME.SUSIE.cred.bgz) and 99% (PHENONAME.SUSIE_99.cred.bgz) credible sets, i.e. under the model they have a 95% or 99% probability of containing the causal variant. The files are bgzipped tab-separated values files, with one credible set per line.

Contains credible set summaries from SuSiE fine-mapping for all genome-wide significant regions.

Column

Description

region

Region for which the fine-mapping was run

Running number for independent credible sets in a region

cs_log10bf

Log10 Bayes factor comparing the solution of this model (cs independent credible sets) to a solution with cs - 1 credible sets.

cs_avg_r2

Average correlation R2 between variants in the credible set

cs_min_r2

Minimum R2 between variants in the credible set

cs_size

How many SNPs the credible set contains
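As an illustration of working with these files, the sketch below reads a bgzipped credible set file with the Python standard library and keeps only small, well-correlated credible sets. It assumes the file starts with a header line naming the columns above and that cs_size and cs_min_r2 are always numeric. The function names and the thresholds (max_size, min_r2) are hypothetical choices for the example, not pipeline defaults.

```python
import csv
import gzip
from typing import Dict, Iterator, List


def read_bgz_tsv(path: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per data line of a bgzipped, tab-separated file."""
    # bgzip output is a series of gzip members, which gzip.open reads transparently.
    with gzip.open(path, "rt", newline="") as handle:
        yield from csv.DictReader(handle, delimiter="\t")


def tight_credible_sets(
    path: str, max_size: int = 10, min_r2: float = 0.5
) -> List[Dict[str, str]]:
    """Keep credible sets that are small and whose variants are well correlated."""
    return [
        row
        for row in read_bgz_tsv(path)
        if int(row["cs_size"]) <= max_size and float(row["cs_min_r2"]) >= min_r2
    ]
```

For example, tight_credible_sets("PHENONAME.SUSIE.cred.bgz") would return the rows of the 95% credible set file that pass both filters.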

PHENONAME.SUSIE.cred.summary.tsv, PHENONAME.SUSIE_99.cred.summary.tsv and PHENONAME.SUSIE_EXTEND.cred.summary.tsv

These files contain a summary of the credible sets for this phenotype. Under the model, the credible sets have a 95% (PHENONAME.SUSIE.cred.summary.tsv) or 99% (PHENONAME.SUSIE_99.cred.summary.tsv) probability of containing the causal variant. The file PHENONAME.SUSIE_EXTEND.cred.summary.tsv contains the 95% credible sets, extended with the 99% credible set variants where possible. The files are tab-delimited with one credible set per line. The columns are described in the following table:

Column

Description

trait

Phenotype

region

Region for which the fine-mapping was run

Running number for independent credible sets in a region

cs_log10bf

Log10 Bayes factor comparing the solution of this model (cs independent credible sets) to a solution with cs - 1 credible sets.

cs_avg_r2

Average correlation R2 between variants in the credible set

cs_min_r2

Minimum R2 between variants in the credible set

low_purity

Boolean (TRUE, FALSE) indicator of whether the CS has low purity (low minimum R2)

cs_size

How many SNPs the credible set contains

good_cs

Boolean (TRUE, FALSE) indicator of whether this CS is considered reliable. If this is FALSE, the top variant reported for the CS is chosen by minimum p-value in the credible set; otherwise the top variant is chosen by maximum PIP.

cs_id

Credible set ID

Top variant (chr:pos:ref:alt). The top variant is the max PIP variant if the credible set has good_cs==TRUE, otherwise it is the min p variant.

Top variant p-value

beta

Top variant beta

Top variant standard deviation

prob

Overall PIP of the variant in the region

cs_specific_prob

PIP of the variant in the current credible set (this and previous are typically almost identical)

0..n

Configured annotation columns. The typical defaults are most_severe and gene_most_severe, giving the consequence and gene of the top variant.
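The top-variant rule tied to good_cs can be written out as a short sketch. The dictionary keys "pip" and "pval" and the function name are hypothetical, chosen only for this illustration; they are not the column names of any output file.

```python
from typing import Dict, List


def pick_top_variant(variants: List[Dict[str, float]], good_cs: bool) -> Dict[str, float]:
    """Return the top variant of one credible set.

    Reliable credible sets (good_cs is True) report the variant with the
    highest PIP; otherwise the variant with the smallest p-value is reported.
    """
    if good_cs:
        return max(variants, key=lambda v: v["pip"])
    return min(variants, key=lambda v: v["pval"])


# Two variants where the max-PIP and min-p choices disagree.
example = [
    {"pip": 0.60, "pval": 1e-9},
    {"pip": 0.35, "pval": 1e-12},
]
top_if_good = pick_top_variant(example, good_cs=True)   # the first variant
top_if_bad = pick_top_variant(example, good_cs=False)   # the second variant
```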

PHENONAME.SUSIE.snp.bgz and PHENONAME.SUSIE_99.snp.bgz

These files contain SuSiE data for all of the variants in all of the regions. The files are tab-delimited and bgzipped, and have tabix indexes PHENONAME.SUSIE.snp.bgz.tbi and PHENONAME.SUSIE_99.snp.bgz.tbi. Each line contains one variant. The columns are described in the table below.

Column

Description

trait

Phenotype

region

Region for which the fine-mapping was run

v, rsid

Variant IDs

chromosome

Chromosome no.

position

Position on the chromosome

allele1

Major allele

allele2

Minor allele

maf

Minor allele frequency

beta

Original marginal beta

Original standard error

Original p-value

mean

Posterior mean beta after fine-mapping

Posterior standard deviation after fine-mapping

prob

Posterior inclusion probability

Credible set index within region

lead_r2

R2 value for a lead variant (the one with maximum PIP) in a credible set

alphax

Posterior inclusion probability for the xth single effect (x := 1..L where L is the number of single effects/causal variants specified; default: L = 10).
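To show how the per-effect alphax values relate to an overall posterior inclusion probability, the sketch below applies the usual SuSiE combination rule: a variant is included if at least one single effect includes it, so PIP = 1 - prod(1 - alpha_x). The function name is hypothetical, and the result is not guaranteed to reproduce the prob column exactly, since SuSiE may first drop single effects with negligible estimated prior variance.

```python
import math
from typing import Iterable


def pip_from_alphas(alphas: Iterable[float]) -> float:
    """Combine per-effect inclusion probabilities into one overall PIP."""
    # Probability that none of the single effects includes the variant...
    prob_excluded = math.prod(1.0 - a for a in alphas)
    # ...so the variant is included with the complementary probability.
    return 1.0 - prob_excluded


# A variant with alpha 0.5 in each of two single effects and 0 elsewhere.
example_pip = pip_from_alphas([0.5, 0.5, 0.0, 0.0])
```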

PHENONAME.SUSIE.snp.filter.tsv, PHENONAME.SUSIE_99.snp.filter.tsv and PHENONAME.SUSIE_extend.snp.filter.tsv

These files contain the filtered SNPs for the 95% (PHENONAME.SUSIE.snp.filter.tsv) and 99% (PHENONAME.SUSIE_99.snp.filter.tsv) credible sets. Variants not included in the 95% or 99% credible sets are excluded from the respective files, as are variants that were part of low_purity credible sets. The file PHENONAME.SUSIE_extend.snp.filter.tsv contains the filtered SNPs for the 95% credible sets, extended with 99% credible set variants where applicable; variants not included in the 95%/99% credible sets are not included in this file. Variants are listed one per line and the files are tab-delimited. The columns are described in the table below:

Column

Description

trait

Phenotype

region

Region for which the fine-mapping was run

Variant ID (chr:pos:ref:alt)

Running credible set ID within region

cs_specific_prob

Posterior inclusion probability for this CS

chromosome

Chromosome no.

position

Position on the chromosome

allele1

Major allele

allele2

Minor allele

maf

Minor allele frequency

beta

Original association beta

Original p-value

Original standard error

most_severe

Most severe consequence of the variant

gene_most_severe

Gene corresponding to most severe consequence

FINEMAP outputs

PHENONAME.FINEMAP.config.bgz

This file contains posterior summaries for all of the causal configurations, one per line. The columns are described in the following table. More information can be found at http://www.christianbenner.com/.

Column

Description

trait

Phenotype

region

Region for which the fine-mapping was run

rank

Rank of this configuration within a region

config

Causal variants in this configuration

prob

Probability across all n independent signal configurations

log10bf

Log10 Bayes factor for this configuration

odds

Odds for this configuration

How many independent signals are in this configuration

prob_norm_k

Probability of this configuration within k independent signals solution

SNP heritability of this solution

#NAME?

95% confidence interval limits of SNP heritability of this solution

mean

Marginalized shrinkage estimates of the posterior effect size mean

Marginalized shrinkage estimates of the posterior effect standard deviation
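One way to read prob_norm_k is as prob rescaled so that configurations with the same number of signals sum to one. The sketch below illustrates that interpretation only: the key "k" (number of signals in a configuration) and the function name are hypothetical, and the values in the file may differ if FINEMAP normalizes over configurations that are not listed.

```python
from collections import defaultdict
from typing import Dict, List


def normalise_within_k(configs: List[Dict[str, float]]) -> List[float]:
    """Rescale each configuration's prob by the total prob of its k group."""
    totals: Dict[int, float] = defaultdict(float)
    for cfg in configs:
        totals[int(cfg["k"])] += cfg["prob"]
    return [
        cfg["prob"] / totals[int(cfg["k"])] if totals[int(cfg["k"])] > 0 else 0.0
        for cfg in configs
    ]


# Two one-signal configurations and a single two-signal configuration.
example_configs = [
    {"k": 1, "prob": 0.2},
    {"k": 1, "prob": 0.2},
    {"k": 2, "prob": 0.6},
]
example_norm = normalise_within_k(example_configs)
```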

PHENONAME.FINEMAP.region.bgz

This bgzipped, tab-delimited file contains all of the finemapped regions for the endpoint, one region per line. The columns are described in the following table. More information can be found at http://www.christianbenner.com/.

Column

Description

trait

Phenotype

region

Region for which the fine-mapping was run

h2g_snp or h2g

SNP heritability of this region

h2g_sd

Standard deviation of SNP heritability of this region

h2g_lower95

Lower limit of 95% CI for SNP heritability

h2g_upper95

Upper limit of 95% CI for SNP heritability

log10bf

Log10 Bayes factor compared against null (no signals in the region)

prob_xSNP

x columns for probabilities of different numbers of independent signals

expectedvalue

Expectation (average) of the number of signals
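The relationship between the prob_xSNP columns and expectedvalue is a plain weighted average: each possible number of signals x is weighted by its probability. The sketch below recomputes it from one row, assuming the columns are literally named prob_0SNP, prob_1SNP, and so on; that naming pattern and the function name are assumptions made for the example.

```python
import re
from typing import Mapping

# Assumed column naming: "prob_" + number of signals + "SNP".
_PROB_COLUMN = re.compile(r"^prob_(\d+)SNP$")


def expected_signals(row: Mapping[str, str]) -> float:
    """Sum x * P(x signals) over all prob_xSNP columns in one region row."""
    total = 0.0
    for name, value in row.items():
        match = _PROB_COLUMN.match(name)
        if match:
            total += int(match.group(1)) * float(value)
    return total


# A region with a 70% chance of one signal and a 30% chance of two.
example_row = {"region": "chr1:100-200", "prob_1SNP": "0.7", "prob_2SNP": "0.3"}
example_expectation = expected_signals(example_row)
```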

PHENONAME.FINEMAP.snp.bgz

This tab-delimited, bgzipped file contains finemapping information for each of the SNPs that were finemapped, with one variant per line. This file also has a tabix index named PHENONAME.FINEMAP.snp.bgz.tbi. The columns of the file are described in the table below.

Column

Description

trait

Phenotype

region

Region for which the fine-mapping was run

Variant

index

Running index

rsid

Variant ID

chromosome

Chromosome no.

position

Position on the chromosome

allele1

Major allele

allele2

Minor allele

maf

Minor allele frequency

beta

Original marginal beta (effect size)

Original standard error

Original z-score

prob

Posterior inclusion probability

log10bf

Log10 Bayes factor

mean

Marginalized shrinkage estimates of the posterior effect size mean

Marginalized shrinkage estimates of the posterior effect standard deviation

mean_incl

Conditional estimates of the posterior effect size mean

sd_incl

Conditional estimates of the posterior effect size standard deviation

Original p-value

csx

Credible set index for given number of causal variants x
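The original beta, standard error, z-score and p-value listed above are linked by standard relationships, sketched here with the Python standard library. This assumes a two-sided test against a standard normal distribution. P-values produced by the upstream association software may come from a different test, so small discrepancies are possible. The function names are hypothetical.

```python
import math


def z_score(beta: float, se: float) -> float:
    """Z-score of a marginal effect estimate: beta divided by its standard error."""
    return beta / se


def two_sided_p(z: float) -> float:
    """Two-sided p-value of a z-score under a standard normal distribution."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) and stays accurate in the tails.
    return math.erfc(abs(z) / math.sqrt(2.0))


example_z = z_score(0.2, 0.1)
example_p = two_sided_p(example_z)
```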