Colocalization results format
The colocalization data contains the results of colocalization analyses between FinnGen data and other datasets.
The colocalization results are tab-separated files. The results are split into separate files, depending on the type of data produced.
More information about data sources, phenotypes and loci can be found in the page.
For information about the results format for colocalization results before DF13, using the older colocalization pipeline, see .
The files that are created from the susie-coloc pipeline are:
| File | Description |
| --- | --- |
| colocQC.tsv.gz | Colocalization summaries between FinnGen endpoints and other resources |
| coloc.credsets.tsv.gz | Credible set variants that were involved in filtered colocalization results |
| coloc.H4_tables.tsv.gz | Per-variant H4 posterior probabilities for variants in filtered colocalization results |
| unfiltered_summaries/FinnGen-R12-GWAS-----{Other resource}.sum.unfiltered.tsv.gz | Unfiltered colocalization summaries for a single resource |
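Since all of these files are gzip-compressed, tab-separated text, they can be read with standard tooling. The following is a minimal sketch using only the Python standard library. The helper name `read_tsv_gz` and the in-memory demo data are illustrative assumptions, not part of the pipeline, and the values are made up.

```python
import csv
import gzip
import io


def read_tsv_gz(binary_file):
    """Return the rows of a gzip-compressed TSV as a list of dicts keyed by column name."""
    # gzip.open accepts a path or an open binary file object.
    with gzip.open(binary_file, mode="rt", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


# Tiny in-memory stand-in for a results file (made-up values, not real FinnGen data).
demo_text = "trait1\ttrait2\tPP.H4.abf\nTRAIT_A\tGENE_X\t0.93\n"
demo_file = io.BytesIO(gzip.compress(demo_text.encode("utf-8")))

rows = read_tsv_gz(demo_file)
print(rows[0]["trait1"], rows[0]["PP.H4.abf"])
```

All values are returned as strings, so numeric columns need to be converted before comparison. For a real file, pass the file path instead of the in-memory object.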
The colocalization summaries contain one colocalization per row. The colocalizations have been filtered, so the file does not include every possible colocalization pair. The files contain the following columns:
| Column | Description |
| --- | --- |
| dataset1 | Name of dataset 1 |
| dataset2 | Name of dataset 2 |
| trait1 | Name of trait 1 |
| trait2 | Name of trait 2 |
| region1 | Finemapped region of trait 1 |
| region2 | Finemapped region of trait 2 |
| cs1 | Credible set index of trait 1 |
| cs2 | Credible set index of trait 2 |
| nsnps | Number of SNPs in the overlap of regions 1 and 2 |
| hit1 | Variant that coloc predicted to be the most likely causal variant in trait 1 |
| hit2 | Variant that coloc predicted to be the most likely causal variant in trait 2 |
| PP.H0.abf | Posterior probability of hypothesis 0: no genetic association in either trait |
| PP.H1.abf | Posterior probability of hypothesis 1: genetic association in trait 1 only |
| PP.H2.abf | Posterior probability of hypothesis 2: genetic association in trait 2 only |
| PP.H3.abf | Posterior probability of hypothesis 3: both traits associated, but with different causal variants |
| PP.H4.abf | Posterior probability of hypothesis 4: both traits associated and sharing a single causal variant |
| low_purity1 | Whether credible set 1 has low purity, i.e. not all variants in the signal are in high LD. See the finemapping documentation for more information |
| low_purity2 | Whether credible set 2 has low purity, i.e. not all variants in the signal are in high LD. See the finemapping documentation for more information |
| nsnps1 | Number of SNPs in region 1 |
| nsnps2 | Number of SNPs in region 2 |
| cs1_log10bf | log10 Bayes factor of credible set 1 |
| cs2_log10bf | log10 Bayes factor of credible set 2 |
| clpp | CLPP between credible sets |
| clpa | CLPA between credible sets |
| cs1_size | Number of variants in credible set 1 |
| cs2_size | Number of variants in credible set 2 |
| cs_overlap | Overlapping variants between the two credible sets |
| topInOverlap | Whether the maximum PIP variant was in the overlap of the regions, for both traits |
| probmass_1 | Amount of PIP mass of credible set 1 that was in the overlap of the two regions. A low value implies that the colocalization cannot capture the signal in trait 1 |
| probmass_2 | Amount of PIP mass of credible set 2 that was in the overlap of the two regions. A low value implies that the colocalization cannot capture the signal in trait 2 |
| hit1_info | Information about hit1, the most likely causal variant in trait 1 as predicted by coloc. Values are beta and p-value, separated by a comma |
| hit2_info | Information about hit2, the most likely causal variant in trait 2 as predicted by coloc. Values are beta and p-value, separated by a comma |
| colocRes | Name of the intermediate colocalization file that contains this row |
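As a hedged illustration of how these columns might be used together, the sketch below keeps rows with a high PP.H4.abf whose credible sets lie mostly inside the overlapping region, and splits the comma-separated hit1_info value. The function names and the cutoffs (0.8 and 0.5) are assumptions chosen for the example, not thresholds used or recommended by the pipeline, and the rows are made-up data.

```python
def parse_hit_info(value):
    """Split a 'beta,p-value' string (the hit1_info / hit2_info format) into two floats."""
    beta, pval = value.split(",")
    return float(beta), float(pval)


def strong_colocalizations(rows, min_h4=0.8, min_probmass=0.5):
    """Keep rows whose PP.H4.abf and both probmass values reach the cutoffs.

    The default cutoffs are illustrative assumptions only.
    """
    kept = []
    for row in rows:
        if (float(row["PP.H4.abf"]) >= min_h4
                and float(row["probmass_1"]) >= min_probmass
                and float(row["probmass_2"]) >= min_probmass):
            kept.append(row)
    return kept


# Made-up rows shaped like colocQC.tsv.gz records (only the columns used here).
rows = [
    {"trait1": "TRAIT_A", "trait2": "GENE_X", "PP.H4.abf": "0.95",
     "probmass_1": "0.99", "probmass_2": "0.90", "hit1_info": "0.12,3e-10"},
    {"trait1": "TRAIT_A", "trait2": "GENE_Y", "PP.H4.abf": "0.97",
     "probmass_1": "0.20", "probmass_2": "0.95", "hit1_info": "-0.05,1e-8"},
    {"trait1": "TRAIT_B", "trait2": "GENE_X", "PP.H4.abf": "0.10",
     "probmass_1": "0.99", "probmass_2": "0.99", "hit1_info": "0.30,2e-12"},
]

kept = strong_colocalizations(rows)
beta, pval = parse_hit_info(kept[0]["hit1_info"])
print([row["trait2"] for row in kept], beta, pval)
```

The second row is dropped despite its high PP.H4.abf because little of the PIP mass of credible set 1 falls inside the overlap, which is the situation the probmass columns are meant to flag.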
The credible set variants file contains the variants of every credible set that appears in the filtered colocalizations. The variants are listed only once per credible set. The file contains the following columns:
| Column | Description |
| --- | --- |
| trait | Trait the credible set variant is from |
| region | Finemapping region |
| rsid | Variant identifier, in the format chromosome_position_reference allele_alternate allele |
| cs | Credible set index |
| low_purity | Whether the credible set has low purity, i.e. its variants were not all in high LD |
| p | p-value of the variant association |
| beta | Effect size of the variant association |
| se | Standard error of the variant association effect |
| cs_specific_prob | Posterior inclusion probability of this variant in the credible set |
| dataset | Dataset identifier |
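To work with whole credible sets rather than single variants, the rows can be grouped back together. The sketch below assumes that the combination of dataset, trait, region and cs identifies one credible set. That key is an assumption made for this example, and the function names and rows are hypothetical.

```python
from collections import defaultdict


def group_credible_sets(rows):
    """Group variant rows by (dataset, trait, region, cs).

    Assumption: this key identifies a single credible set.
    """
    groups = defaultdict(list)
    for row in rows:
        key = (row["dataset"], row["trait"], row["region"], row["cs"])
        groups[key].append(row)
    return dict(groups)


def lead_variant(variants):
    """Return the identifier of the variant with the highest cs_specific_prob."""
    return max(variants, key=lambda v: float(v["cs_specific_prob"]))["rsid"]


# Made-up rows shaped like coloc.credsets.tsv.gz records (only the columns used here).
rows = [
    {"dataset": "DATASET_1", "trait": "TRAIT_A", "region": "REGION_1", "cs": "1",
     "rsid": "1_1000_A_G", "cs_specific_prob": "0.70"},
    {"dataset": "DATASET_1", "trait": "TRAIT_A", "region": "REGION_1", "cs": "1",
     "rsid": "1_2000_C_T", "cs_specific_prob": "0.25"},
    {"dataset": "DATASET_1", "trait": "TRAIT_A", "region": "REGION_1", "cs": "2",
     "rsid": "1_3000_G_A", "cs_specific_prob": "0.99"},
]

groups = group_credible_sets(rows)
leads = {key: lead_variant(variants) for key, variants in groups.items()}
print(leads)
```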
The H4 table contains the posterior probability of colocalization for every variant in the overlap of the two regions of a colocalization. These variants are only available for the filtered colocalizations in the colocalization summary. The file contains the following columns:
| Column | Description |
| --- | --- |
| dataset1 | Dataset of trait1 |
| dataset2 | Dataset of trait2 |
| trait1 | Trait of first credible set |
| trait2 | Trait of second credible set |
| region1 | Finemapped region of first trait |
| region2 | Finemapped region of second trait |
| cs1 | Credible set index for credible set 1 |
| cs2 | Credible set index for credible set 2 |
| snp | SNP identifier, in the format chromosome_position_reference allele_alternate allele |
| SNP.PP.H4 | Posterior probability of this variant being the causal variant in this colocalization, given that hypothesis 4 is true |
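A common use of this table is finding the variant with the highest SNP.PP.H4 within one colocalization and splitting its identifier into fields. The sketch below assumes the rows passed in already belong to a single colocalization and that the identifier has exactly four underscore-separated fields, as described above. The names `Variant`, `parse_variant_id` and `top_h4_variant` are hypothetical, and the rows are made-up data.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chromosome: str
    position: int
    ref: str
    alt: str


def parse_variant_id(variant_id):
    """Split 'chromosome_position_ref_alt' into its four fields.

    Raises ValueError if the identifier does not have exactly four fields.
    """
    chromosome, position, ref, alt = variant_id.split("_")
    return Variant(chromosome, int(position), ref, alt)


def top_h4_variant(rows):
    """Return the row with the highest SNP.PP.H4 among the rows of one colocalization."""
    return max(rows, key=lambda row: float(row["SNP.PP.H4"]))


# Made-up rows for a single colocalization (only the columns used here).
rows = [
    {"snp": "1_1000_A_G", "SNP.PP.H4": "0.05"},
    {"snp": "1_2000_C_T", "SNP.PP.H4": "0.90"},
    {"snp": "1_3000_G_A", "SNP.PP.H4": "0.05"},
]

top = top_h4_variant(rows)
variant = parse_variant_id(top["snp"])
print(variant)
```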
These files contain the colocalization summaries that are not filtered. The columns are largely the same as in the filtered colocalization summaries, with two exceptions: the dataset columns are not present, as the datasets can be taken from the filename, and the colocRes column is missing, since its value is the filename itself. The results contain the following columns:
| Column | Description |
| --- | --- |
| trait1 | Name of trait 1 |
| trait2 | Name of trait 2 |
| region1 | Finemapped region of trait 1 |
| region2 | Finemapped region of trait 2 |
| cs1 | Credible set index of trait 1 |
| cs2 | Credible set index of trait 2 |
| nsnps | Number of SNPs in the overlap of regions 1 and 2 |
| hit1 | Variant that coloc predicted to be the most likely causal variant in trait 1 |
| hit2 | Variant that coloc predicted to be the most likely causal variant in trait 2 |
| PP.H0.abf | Posterior probability of hypothesis 0: no genetic association in either trait |
| PP.H1.abf | Posterior probability of hypothesis 1: genetic association in trait 1 only |
| PP.H2.abf | Posterior probability of hypothesis 2: genetic association in trait 2 only |
| PP.H3.abf | Posterior probability of hypothesis 3: both traits associated, but with different causal variants |
| PP.H4.abf | Posterior probability of hypothesis 4: both traits associated and sharing a single causal variant |
| low_purity1 | Whether credible set 1 has low purity, i.e. not all variants in the signal are in high LD. See the finemapping documentation for more information |
| low_purity2 | Whether credible set 2 has low purity, i.e. not all variants in the signal are in high LD. See the finemapping documentation for more information |
| nsnps1 | Number of SNPs in region 1 |
| nsnps2 | Number of SNPs in region 2 |
| cs1_log10bf | log10 Bayes factor of credible set 1 |
| cs2_log10bf | log10 Bayes factor of credible set 2 |
| clpp | CLPP between credible sets |
| clpa | CLPA between credible sets |
| cs1_size | Number of variants in credible set 1 |
| cs2_size | Number of variants in credible set 2 |
| cs_overlap | Overlapping variants between the two credible sets |
| topInOverlap | Whether the maximum PIP variant was in the overlap of the regions, for both traits |
| probmass_1 | Amount of PIP mass of credible set 1 that was in the overlap of the two regions. A low value implies that the colocalization cannot capture the signal in trait 1 |
| probmass_2 | Amount of PIP mass of credible set 2 that was in the overlap of the two regions. A low value implies that the colocalization cannot capture the signal in trait 2 |
| hit1_info | Information about hit1, the most likely causal variant in trait 1 as predicted by coloc. Values are beta and p-value, separated by a comma |
| hit2_info | Information about hit2, the most likely causal variant in trait 2 as predicted by coloc. Values are beta and p-value, separated by a comma |
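Because the unfiltered files carry the dataset names in the filename rather than in columns, the names have to be recovered from the path. The sketch below is based purely on the file name pattern listed on this page. It assumes that the five-dash run separates the two dataset names and that `.sum.unfiltered.tsv.gz` is a fixed suffix. The function name and the resource name `ExampleResource` are hypothetical.

```python
import os

SUFFIX = ".sum.unfiltered.tsv.gz"
SEPARATOR = "-----"  # assumption: inferred from the file name pattern on this page


def datasets_from_filename(path):
    """Return (dataset1, dataset2) parsed from an unfiltered summary file name."""
    name = os.path.basename(path)
    if not name.endswith(SUFFIX):
        raise ValueError(f"unexpected file name: {name}")
    stem = name[: -len(SUFFIX)]
    # Split at the first occurrence of the separator only.
    dataset1, found, dataset2 = stem.partition(SEPARATOR)
    if not found or not dataset2:
        raise ValueError(f"no dataset separator in: {name}")
    return dataset1, dataset2


# Hypothetical path following the documented pattern.
example = "unfiltered_summaries/FinnGen-R12-GWAS-----ExampleResource.sum.unfiltered.tsv.gz"
dataset1, dataset2 = datasets_from_filename(example)
print(dataset1, dataset2)
```

The two values can then be added to each row as dataset1 and dataset2, so that unfiltered rows line up with the columns of the filtered summaries.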
More documentation about the data, including the columns and their descriptions, can be found in the release notes.
More information about the methods can be found in the page, as well as in the methods document:
/finngen/library-green/finngen_R12_analysis_data/colocalization/methods.pdf
Read more about and