Colocalization results format

The colocalization data contains the results of colocalization analyses between FinnGen data and other datasets.

The colocalization results are gzip-compressed, tab-separated files (.tsv.gz). They are split into separate files according to the type of data each one contains.

More information about data sources, phenotypes and loci can be found on the Colocalization in FinnGen page.

For the format of colocalization results before DF13, which were produced with the older colocalization pipeline, see Colocalization results in colocalization before DF13.

The susie-coloc pipeline creates the following files:

File
Description

colocQC.tsv.gz

Colocalization summaries between FinnGen endpoints and other resources

coloc.credsets.tsv.gz

Credible set variants that were involved in filtered colocalization results

coloc.H4_tables.tsv.gz

Per-variant H4 posterior probabilities for variants in filtered colocalization results

unfiltered_summaries/FinnGen-R12-GWAS-----{Other resource}.sum.unfiltered.tsv.gz

Unfiltered colocalization summaries for a single resource
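
Because every file listed above is a gzip-compressed, tab-separated table, it can be read with standard tools. The following is a minimal Python sketch that uses only the standard library. The helper name read_tsv_gz is hypothetical and not part of the pipeline, and the sketch assumes that the first line of each file holds the column names.

```python
import csv
import gzip


def read_tsv_gz(source):
    """Yield one dict per data row, keyed by the column names in the header.

    `source` is a path or a binary file object holding gzip-compressed,
    tab-separated text. Assumes the first line contains the column names.
    """
    with gzip.open(source, mode="rt", newline="") as handle:
        yield from csv.DictReader(handle, delimiter="\t")
```

Calling list(read_tsv_gz("colocQC.tsv.gz")) would load the whole file into memory; iterating over the generator instead processes one row at a time. All values are returned as strings, so numeric columns need an explicit conversion.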

Colocalization summaries

The colocalization summaries contain one colocalization per row. The colocalizations have been filtered, so the file does not include every possible colocalization pair. The file contains the following columns:

Field
Description

dataset1

name of dataset 1

dataset2

name of dataset 2

trait1

name of trait 1

trait2

name of trait 2

region1

finemapped region of 1st trait

region2

finemapped region of 2nd trait

cs1

credible set index of 1st trait

cs2

credible set index of 2nd trait

nsnps

Number of SNPs in the overlap of regions 1 and 2

hit1

variant that coloc predicted to be the most likely causal variant in trait 1

hit2

variant that coloc predicted to be the most likely causal variant in trait 2

PP.H0.abf

posterior probability of hypothesis 0: No genetic association in either trait

PP.H1.abf

posterior probability of hypothesis 1: Genetic association in trait 1 only

PP.H2.abf

posterior probability of hypothesis 2: Genetic association in trait 2 only

PP.H3.abf

posterior probability of hypothesis 3: Both traits associated, but with different causal variants

PP.H4.abf

posterior probability of hypothesis 4: Both traits associated and share single causal variant

low_purity1

Whether credible set 1 has low purity. In short, not all variants in the signal are in high LD. See the finemapping documentation for more information.

low_purity2

Whether credible set 2 has low purity. In short, not all variants in the signal are in high LD. See the finemapping documentation for more information.

nsnps1

Number of SNPs in region 1

nsnps2

Number of SNPs in region 2

cs1_log10bf

log10 bayes factor of credible set 1

cs2_log10bf

log10 bayes factor of credible set 2

clpp

CLPP between credible sets

clpa

CLPA between credible sets

cs1_size

Number of variants in credible set 1

cs2_size

Number of variants in credible set 2

cs_overlap

Overlapping credible set variants in both credible sets

topInOverlap

Whether the variant with the maximum PIP was in the overlap of the regions, for both traits

probmass_1

Amount of the PIP mass of credible set 1 that lies in the overlap of the two regions. A low value implies that the colocalization cannot capture the signal in trait 1.

probmass_2

Amount of the PIP mass of credible set 2 that lies in the overlap of the two regions. A low value implies that the colocalization cannot capture the signal in trait 2.

hit1_info

Information about the most likely causal variant in trait 1, as predicted by coloc. Values are beta and p-value, separated by a comma.

hit2_info

Information about the most likely causal variant in trait 2, as predicted by coloc. Values are beta and p-value, separated by a comma.

colocRes

Name of the intermediate colocalization file that contains this row.
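
As an illustration of how these columns might be used, the sketch below keeps the rows with a high posterior probability of a shared causal variant and splits the hit1_info and hit2_info values into numbers. It works on rows represented as dictionaries of strings keyed by column name. The function names and the 0.8 cutoff are assumptions made for this example, not values prescribed by the pipeline.

```python
def filter_by_h4(rows, min_h4=0.8):
    """Return the rows whose PP.H4.abf is at least `min_h4`.

    The default cutoff of 0.8 is an illustrative assumption.
    """
    return [row for row in rows if float(row["PP.H4.abf"]) >= min_h4]


def parse_hit_info(value):
    """Split a 'beta,p-value' string into a (beta, p_value) tuple of floats.

    Assumes both parts are numeric; missing values would need extra handling.
    """
    beta, p_value = value.split(",")
    return float(beta), float(p_value)
```

Converting at the point of use keeps the rows themselves unchanged, so the same rows can be filtered again with a different cutoff.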

Credible set variants in colocalizations

The credible set variants file contains the variants of every credible set that appears in the filtered colocalizations. Each variant is listed only once per credible set. The file contains the following columns:

Field
Description

trait

Trait the credible set variant is from

region

Finemapping region

rsid

variant identifier, in format chromosome_position_reference allele_alternate allele

cs

credible set index

low_purity

Whether the credible set has low purity, i.e. not all of its variants are in high LD.

p

p-value of variant association

beta

effect size of variant association

se

standard error of variant association effect

cs_specific_prob

posterior inclusion probability of this variant in credible set

dataset

Dataset identifier
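
The variant identifier and the credible set columns can be unpacked as sketched below. The sketch rests on two assumptions: that alleles never contain an underscore, so the identifier splits into exactly four parts, and that a credible set is uniquely identified by the combination of dataset, trait, region and cs. The names Variant, parse_variant and group_by_credible_set are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_variant(identifier):
    """Parse a chromosome_position_ref_alt identifier into a Variant."""
    parts = identifier.split("_")
    if len(parts) != 4:
        raise ValueError(f"unexpected variant identifier: {identifier!r}")
    chrom, pos, ref, alt = parts
    return Variant(chrom, int(pos), ref, alt)


def group_by_credible_set(rows):
    """Group rows by (dataset, trait, region, cs); this key is an assumption."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["dataset"], row["trait"], row["region"], row["cs"])
        groups[key].append(row)
    return dict(groups)
```

The chromosome is kept as a string so that any prefix or non-numeric chromosome name passes through unchanged.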

Posterior probability of colocalization table

The H4 table contains the posterior probability of colocalization for every variant in the overlap of the two regions of a colocalization. These variants are available only for the filtered colocalizations in the colocalization summary. The file contains the following columns:

Field
Description

dataset1

dataset of trait1

dataset2

dataset of trait2

trait1

trait of first credible set

trait2

trait of second credible set

region1

finemapped region of first trait

region2

finemapped region of second trait

cs1

credible set index for credible set 1

cs2

credible set index for credible set 2

snp

snp identifier, in format chromosome_position_reference allele_alternate allele

SNP.PP.H4

Posterior probability of this variant being the causal variant in this colocalization, given that hypothesis 4 is true
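
For example, the variant with the highest SNP.PP.H4 in each colocalization can be picked out as sketched below. The sketch assumes that the eight columns from dataset1 through cs2 together identify one colocalization, and that rows are dictionaries of strings keyed by column name. COLOC_KEY and top_h4_variants are hypothetical names.

```python
# Columns assumed to identify a single colocalization.
COLOC_KEY = ("dataset1", "dataset2", "trait1", "trait2",
             "region1", "region2", "cs1", "cs2")


def top_h4_variants(rows):
    """Map each colocalization key to the (snp, SNP.PP.H4) pair with the highest probability."""
    best = {}
    for row in rows:
        key = tuple(row[column] for column in COLOC_KEY)
        prob = float(row["SNP.PP.H4"])
        # Keep the first variant seen on ties.
        if key not in best or prob > best[key][1]:
            best[key] = (row["snp"], prob)
    return best
```

The same key can be used to look up the matching row in the colocalization summaries, which share these eight columns.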

Unfiltered colocalization summaries

These files contain the colocalization summaries before filtering. The columns are largely the same as in the filtered colocalization summaries, with two exceptions: the dataset columns are not present, as the datasets can be taken from the filename, and the colocRes column is missing, since it is the filename itself. The files contain the following columns:

Field
Description

trait1

name of trait 1

trait2

name of trait 2

region1

finemapped region of 1st trait

region2

finemapped region of 2nd trait

cs1

credible set index of 1st trait

cs2

credible set index of 2nd trait

nsnps

Number of SNPs in the overlap of regions 1 and 2

hit1

variant that coloc predicted to be the most likely causal variant in trait 1

hit2

variant that coloc predicted to be the most likely causal variant in trait 2

PP.H0.abf

posterior probability of hypothesis 0: No genetic association in either trait

PP.H1.abf

posterior probability of hypothesis 1: Genetic association in trait 1 only

PP.H2.abf

posterior probability of hypothesis 2: Genetic association in trait 2 only

PP.H3.abf

posterior probability of hypothesis 3: Both traits associated, but with different causal variants

PP.H4.abf

posterior probability of hypothesis 4: Both traits associated and share single causal variant

low_purity1

Whether credible set 1 has low purity. In short, not all variants in the signal are in high LD. See the finemapping documentation for more information.

low_purity2

Whether credible set 2 has low purity. In short, not all variants in the signal are in high LD. See the finemapping documentation for more information.

nsnps1

Number of SNPs in region 1

nsnps2

Number of SNPs in region 2

cs1_log10bf

log10 bayes factor of credible set 1

cs2_log10bf

log10 bayes factor of credible set 2

clpp

CLPP between credible sets

clpa

CLPA between credible sets

cs1_size

Number of variants in credible set 1

cs2_size

Number of variants in credible set 2

cs_overlap

Overlapping credible set variants in both credible sets

topInOverlap

Whether the variant with the maximum PIP was in the overlap of the regions, for both traits

probmass_1

Amount of the PIP mass of credible set 1 that lies in the overlap of the two regions. A low value implies that the colocalization cannot capture the signal in trait 1.

probmass_2

Amount of the PIP mass of credible set 2 that lies in the overlap of the two regions. A low value implies that the colocalization cannot capture the signal in trait 2.

hit1_info

Information about the most likely causal variant in trait 1, as predicted by coloc. Values are beta and p-value, separated by a comma.

hit2_info

Information about the most likely causal variant in trait 2, as predicted by coloc. Values are beta and p-value, separated by a comma.
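
Since the dataset columns are omitted from the unfiltered files, the dataset names have to be recovered from the filename. The following is a minimal sketch, assuming that every unfiltered file follows the pattern given in the file list at the top of this page: two dataset names joined by five hyphens, followed by .sum.unfiltered.tsv.gz. The function name datasets_from_filename is hypothetical.

```python
import os

SEPARATOR = "-----"                 # five hyphens between the two dataset names
SUFFIX = ".sum.unfiltered.tsv.gz"   # suffix taken from the file list


def datasets_from_filename(path):
    """Return (dataset1, dataset2) parsed from an unfiltered summary filename."""
    name = os.path.basename(path)
    if not name.endswith(SUFFIX):
        raise ValueError(f"not an unfiltered summary file: {name!r}")
    dataset1, separator, dataset2 = name[: -len(SUFFIX)].partition(SEPARATOR)
    if not separator or not dataset1 or not dataset2:
        raise ValueError(f"cannot find two dataset names in: {name!r}")
    return dataset1, dataset2
```

The two returned names can then be added to each row as dataset1 and dataset2, which gives the unfiltered rows the same identifying columns as the filtered summaries.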

More information about the methods can be found on the Colocalization in FinnGen page, as well as in the methods document:

/finngen/library-green/finngen_R12_analysis_data/colocalization/methods.pdf

More documentation about the data, including the columns and their descriptions, can be found in the release notes.

Read more about Colocalization and Colocalization in FinnGen
