Colocalizations in FinnGen
The current primary colocalization method in FinnGen is coloc applied to SuSiE finemapped results (Wallace et al. 2021). We also report the metrics of the previous colocalization approach (CLPP and CLPA, see below). The CLPP metric uses the probabilistic model for integrating GWAS and eQTL data presented in eCAVIAR (Hormozdiari et al. 2016). Unlike eCAVIAR, we use SuSiE (Wang et al. 2019) to finemap our inputs, and we provide an additional colocalization metric (CLPA). We use coloc version 5 to colocalize the whole finemapped regions returned by SuSiE.
Our goal is to extract a list of genomic regions that show colocalization between two phenotypes p1 and p2. Further, we assume that the summary statistics of p1 and p2 have been finemapped. The finemapping output for each phenotype contains three columns: the variant identifier (VAR), posterior inclusion probability (PIP), and the credible set (CS) identifier.
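As an illustration of the assumed input, the sketch below parses a small tab-separated finemapping table with the three columns named above and groups variants by credible set. The variant identifiers, PIP values, and the `load_credible_sets` helper are hypothetical and only serve as an example; the actual FinnGen file layout may differ.

```python
import csv
import io
from collections import defaultdict

# Hypothetical finemapping output with the three columns named in the text.
EXAMPLE = (
    "VAR\tPIP\tCS\n"
    "chr1_1000_A_G\t0.70\t1\n"
    "chr1_1050_C_T\t0.25\t1\n"
    "chr1_9000_G_A\t0.95\t2\n"
)


def load_credible_sets(handle):
    """Group variants by credible set as {cs_id: {variant: pip}}."""
    credible_sets = defaultdict(dict)
    for row in csv.DictReader(handle, delimiter="\t"):
        credible_sets[row["CS"]][row["VAR"]] = float(row["PIP"])
    return dict(credible_sets)


credible_sets = load_credible_sets(io.StringIO(EXAMPLE))
```

Keeping each credible set as a variant-to-PIP mapping makes it easy to intersect two credible sets by variant identifier later on.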
Methods
SuSiE-Coloc
In the coloc framework, any pair of finemapped regions between two traits can be given one of the following five hypotheses:
H0: No association between either trait and the genomic region
H1: Association between trait 1 and genomic region
H2: Association between trait 2 and genomic region
H3: Associations to traits 1 and 2, but different causal variants
H4: Associations to traits 1 and 2 for the same causal variant, i.e. colocalization.
The colocalization results include the posterior probability that coloc outputs for each of these hypotheses.
Since the move to the coloc package, the colocalization pipeline uses the per-variant log Bayes factors of the finemapped regions, instead of only the credible set variants, to identify colocalization. This makes it possible to report the posterior probability estimates that coloc produces for the colocalization hypotheses. Not relying on credible set variants is also more sensitive, for example when a FinnGen phenotype association is finemapped to a single credible set variant and that variant happens to be missing from other resources (e.g. UKBB pQTL data).
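To make the role of the per-variant log Bayes factors concrete, here is a minimal sketch of how posterior probabilities for H0 to H4 can be derived from two aligned vectors of log Bayes factors for a single pair of signals. It follows the standard coloc combination of Bayes factors with prior probabilities. The function names and the prior values `p1`, `p2`, and `p12` are illustrative assumptions, not the settings of the FinnGen pipeline, and this is not the coloc package code itself.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def logdiffexp(a, b):
    """log(exp(a) - exp(b)); returns -inf when the difference is not positive."""
    if b >= a:
        return -math.inf
    return a + math.log1p(-math.exp(b - a))


def coloc_posteriors(lbf1, lbf2, p1=1e-4, p2=1e-4, p12=5e-6):
    """Posterior probabilities of H0..H4 from per-variant log Bayes factors.

    lbf1 and lbf2 must list the same variants in the same order.
    The prior values are assumptions chosen for illustration.
    """
    if len(lbf1) != len(lbf2):
        raise ValueError("log Bayes factor vectors must be aligned")
    sum1 = logsumexp(lbf1)
    sum2 = logsumexp(lbf2)
    sum12 = logsumexp([a + b for a, b in zip(lbf1, lbf2)])
    log_h = [
        0.0,                                   # H0: no association
        math.log(p1) + sum1,                   # H1: trait 1 only
        math.log(p2) + sum2,                   # H2: trait 2 only
        math.log(p1) + math.log(p2)
        + logdiffexp(sum1 + sum2, sum12),      # H3: different causal variants
        math.log(p12) + sum12,                 # H4: shared causal variant
    ]
    norm = logsumexp(log_h)
    return {f"H{i}": math.exp(v - norm) for i, v in enumerate(log_h)}


# Both traits peak at the same (second) variant.
shared = coloc_posteriors([0.0, 12.0, 0.5], [0.2, 11.0, 0.0])
# The traits peak at different variants.
distinct = coloc_posteriors([12.0, 0.0, 0.0], [0.0, 11.0, 0.0])
```

Because the calculation uses every variant in the region, a signal can still contribute evidence even when its lead credible set variant is absent from the other dataset, as long as the remaining variants are aligned.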
Quality control
Because the move to coloc means we no longer require the signals' credible set variants to overlap, we apply the following quality control filters to the colocalization results:
We require the signals to have at least 90% of their PIP probability mass in the shared region of the colocalization. This measure lets us discard the colocalizations where the signals are not actually in the same region.
We require the posterior probability of colocalization, i.e. probability of hypothesis 4 to be at least 0.8.
We require the log Bayes factors of the credible sets to be at least 0.9.
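The three filters above can be sketched as a single predicate over one colocalization result. The field names (`pip_mass_shared_1`, `pp_h4`, `cs_lbf_1`, and so on) are hypothetical placeholders and do not correspond to actual column names in the release files; only the thresholds are taken from the text.

```python
def passes_qc(result, min_pip_mass=0.9, min_h4=0.8, min_cs_lbf=0.9):
    """Return True if a colocalization result passes all three filters.

    `result` is a dict with hypothetical keys:
      pip_mass_shared_1/2: share of each signal's PIP mass in the shared region
      pp_h4:               posterior probability of hypothesis H4
      cs_lbf_1/2:          log Bayes factor of each credible set
    """
    return (
        result["pip_mass_shared_1"] >= min_pip_mass
        and result["pip_mass_shared_2"] >= min_pip_mass
        and result["pp_h4"] >= min_h4
        and result["cs_lbf_1"] >= min_cs_lbf
        and result["cs_lbf_2"] >= min_cs_lbf
    )


results = [
    {"pip_mass_shared_1": 0.99, "pip_mass_shared_2": 0.95,
     "pp_h4": 0.93, "cs_lbf_1": 14.2, "cs_lbf_2": 8.7},
    # Fails: the second signal lies mostly outside the shared region.
    {"pip_mass_shared_1": 0.99, "pip_mass_shared_2": 0.40,
     "pp_h4": 0.91, "cs_lbf_1": 14.2, "cs_lbf_2": 8.7},
    # Fails: weak evidence for a shared causal variant.
    {"pip_mass_shared_1": 0.97, "pip_mass_shared_2": 0.96,
     "pp_h4": 0.55, "cs_lbf_1": 6.1, "cs_lbf_2": 5.3},
]
filtered = [r for r in results if passes_qc(r)]
```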
CLPP
The Causal Posterior Probability (CLPP) is computed between two credible sets cs1 and cs2, where cs1 comes from phenotype p1 and cs2 from phenotype p2. For vectors x and y containing the PIPs of the variants in cs1 and cs2, respectively, CLPP is calculated as

$$\mathrm{CLPP} = \sum_{i} x_i \, y_i$$

where i runs over the variants present in both credible sets.
This CLPP calculation is similar to equation 8 in Hormozdiari et al. 2016.
CLPP is dependent on the credible set size. By definition, any credible set size > 1 will yield a CLPP < 1.
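The dependence on credible set size can be seen in a minimal sketch. It assumes that CLPP is the sum, over variants shared by the two credible sets, of the product of their PIPs (the eCAVIAR-style form referenced above). The `clpp` helper and the variant names are hypothetical.

```python
def clpp(cs1, cs2):
    """Sum of PIP products over variants shared by two credible sets.

    cs1 and cs2 map variant identifiers to PIPs.
    """
    shared = cs1.keys() & cs2.keys()
    return sum(cs1[v] * cs2[v] for v in shared)


# Identical single-variant credible sets.
size_one = clpp({"v1": 1.0}, {"v1": 1.0})

# Identical two-variant credible sets with the PIP split evenly.
size_two = clpp({"v1": 0.5, "v2": 0.5}, {"v1": 0.5, "v2": 0.5})

# Identical four-variant credible sets with the PIP split evenly.
four = {f"v{i}": 0.25 for i in range(1, 5)}
size_four = clpp(four, four)
```

Even though the two credible sets agree perfectly in all three cases, the value drops as the PIP is spread over more variants.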
CLPA
We derived another colocalization metric called causal posterior agreement (CLPA) that is independent of credible set size.
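As a hedged illustration, the sketch below assumes that CLPA is the sum, over variants shared by the two credible sets, of the smaller of the two PIPs. This form is an assumption made for the example; the `clpa` helper and the variant names are hypothetical.

```python
def clpa(cs1, cs2):
    """Sum of min(PIP1, PIP2) over variants shared by two credible sets.

    cs1 and cs2 map variant identifiers to PIPs.
    """
    shared = cs1.keys() & cs2.keys()
    return sum(min(cs1[v], cs2[v]) for v in shared)


def uniform_cs(size):
    """Credible set of `size` variants with the PIP split evenly."""
    return {f"v{i}": 1.0 / size for i in range(size)}


# Perfectly agreeing credible sets of different sizes.
agreement = {size: clpa(uniform_cs(size), uniform_cs(size)) for size in (1, 2, 4)}

# Partial overlap: only one of two variants is shared.
partial = clpa({"v1": 0.5, "v2": 0.5}, {"v1": 0.5, "v3": 0.5})
```

Under this assumption, two perfectly agreeing credible sets score the same regardless of how many variants they contain, while partial overlap lowers the score.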

The picture below shows how colocalizations are defined.

Example Comparison
This rough example shows why we mostly use CLPA: it is independent of credible set size.

Data
The colocalization is performed between FinnGen endpoints as well as between FinnGen endpoints and various QTL resources, as shown in the image below.

These resources are listed below:
FinnGen resources
We use the following FinnGen data sources:
FinnGen GWAS: SuSiE finemapping results for the release GWAS. These are the FinnGen endpoints that are colocalized against every other resource.
FinnGen Somascan proteomics: EA5 proteomics, pQTLs using Somascan V4.1 assay, 865 unrelated samples from R12 imputed genotypes.
FinnGen Olink proteomics: EA5 proteomics, Olink proteomics pQTLs, released on 11 October 2023. Olink Explore 3072 library, 1732 unrelated samples from R12 imputed genotypes.
FinnGen Metabolomics data: mQTL data from NMR measurements for 34,218 samples.
FinnGen single-cell transcriptomics: Finemapped results of the EA5 single-cell transcriptomics data.
Kanta lab associations: 234 lab value GWASs, finemapped.
Expression QTL datasets
GTEx v8: SuSiE fine-mapping, 49 tissues (only tissues with a sample size of n >= 50 are included), donors of mixed ancestry, Aguet et al. (2019, bioRxiv). Fine-mapping was performed by Hilary Finucane, Jacob Ulirsch, and Masahiro Kanai from the Finucane Lab. Effect size interpretation: change in normalised gene expression (sd units) per alternate allele. Normalization: inverse normal transformation.
EMBL-EBI (European Bioinformatics Institute) eQTL Catalogue datasets: eQTL data from 24 tissues/cell types, 16 RNAseq sources, 6 microarray, SuSiE fine-mapping, donors of 88% European ancestry, Kerimov et al. (2021, Nature Genetics, doi: 10.1038/s41588-021-00924-w). For RNAseq data, four quantification methods are included (gene expression, exon expression, transcript usage, txrevise event usage). Fine-mapping was performed by Kaur Alasoo and Nurlan Kerimov. Effect size interpretation: change in normalised gene expression (sd units) per alternate allele. Normalization: inverse normal transformation.
FUSION study (RNAseq), muscle and adipose tissue.
Kolberg: mega-analysis of immune cells from the microarray datasets.
FinnLiver
Metabolome QTL datasets
- GeneRISK: 186 lipid species QTLs, SuSiE fine-mapping of Widen et al. (2020), 7632 Finnish samples. Effect size interpretation: change in standard deviation of the lipid species per alternate allele.
Biomarkers
- UK Biobank: 36 continuous endpoints, 57 biomarkers from UKBB prepared by the Finucane lab, 361,194 White British samples, SuSiE fine-mapping. Effect size interpretation for quantitative traits: change in standard deviation of the normalized outcome per alternate allele. Effect size interpretation for binary traits: increase in log(odds ratio) per alternate allele.
Release outputs
The following files are published with each release (the release number in the file names changes between releases):
- colocQC.tsv.gz: Colocalization summaries between FinnGen endpoints and other resources
- coloc.credsets.tsv.gz: Credible set variants that were involved in filtered colocalization results
- coloc.H4_tables.tsv.gz: Per-variant H4 posterior probabilities for variants in filtered colocalization results
- unfiltered_summaries/FinnGen-R12-GWAS-----{Other resource}.sum.unfiltered.tsv.gz: Unfiltered colocalization summaries for a single resource
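The release files are gzip-compressed, tab-separated tables, so they can be read with the Python standard library. The sketch below is a generic reader; the `read_tsv_gz` helper is hypothetical, and because the column names are documented in the data dictionary rather than here, the code makes no assumption about them.

```python
import csv
import gzip


def read_tsv_gz(path):
    """Yield each row of a gzip-compressed TSV file as a dict keyed by header."""
    with gzip.open(path, mode="rt", newline="") as handle:
        yield from csv.DictReader(handle, delimiter="\t")


# Usage sketch (the path is a placeholder for a downloaded release file):
# for row in read_tsv_gz("colocQC.tsv.gz"):
#     print(row)
```

Reading rows lazily keeps memory use low, which helps with the larger per-variant tables.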
In addition to the data files, the following documentation is included in each release:
- methods.pdf: Description of colocalization method and data
- data_dictionary.txt: Column description of all data files
- readme.md: Colocalization release notes
Acknowledgements
We thank the following people for helping us assemble the QTL resources:
- Kaur Alasoo and Nurlan Kerimov provided us with the fine-mapped EMBL-EBI eQTL Catalogue datasets.
- Hilary Finucane, Jacob Ulirsch, Masahiro Kanai gave us access to their fine-mapped GTEx data.
Read more about colocalization and the file format of the FinnGen colocalization results.