Colocalizations in FinnGen

The current primary colocalization method in FinnGen is coloc run on SuSiE finemapped results (Wallace et al. 2021). We also report the metrics of our previous colocalization approach (CLPP and CLPA, see below). The CLPP metric uses the probabilistic model for integrating GWAS and eQTL data presented in eCAVIAR (Hormozdiari et al. 2016). Unlike eCAVIAR, we use SuSiE (Wang et al. 2019) to finemap our inputs, and we provide an additional colocalization metric (CLPA). We use coloc version 5 to colocalize the whole finemapped regions returned by SuSiE.

Our goal is to extract a list of genomic regions that show colocalization between two phenotypes p1 and p2. Further, we assume that the summary statistics of p1 and p2 have been finemapped. The finemapping output for each phenotype contains three columns: the variant identifier (VAR), posterior inclusion probability (PIP), and the credible set (CS) identifier.

Methods

SuSiE-Coloc

In the coloc framework, any pair of finemapped regions between two traits is evaluated against the following five hypotheses:

H0: No association between either trait and the genomic region

H1: Association between trait 1 and genomic region

H2: Association between trait 2 and genomic region

H3: Associations to traits 1 and 2, but different causal variants

H4: Associations to traits 1 and 2 for the same causal variant, i.e. colocalization.

The colocalization results include the posterior probability that coloc outputs for each of these hypotheses.

Since moving to the coloc package, the colocalization pipeline uses the per-variant log Bayes factors of the finemapped regions instead of only the credible set variants to identify colocalization. This makes it possible to report the posterior probability estimates that coloc produces for each colocalization hypothesis. Not relying on credible set variants is also more sensitive, for example when a FinnGen phenotype association is finemapped to a single credible set variant and that variant happens to be missing from another resource (e.g. UKBB pQTL data).
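To make the link between per-variant log Bayes factors and the five hypothesis posteriors concrete, here is a minimal Python sketch of the standard approximate-Bayes-factor combination used in the coloc framework. It is an illustration only: the pipeline itself uses the coloc R package. The function names are hypothetical, and the prior values `p1`, `p2` and `p12` are illustrative assumptions rather than the settings used in FinnGen.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-scale values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def logdiffexp(a, b):
    """log(exp(a) - exp(b)); returns -inf when the difference is not positive."""
    if b >= a:
        return float("-inf")
    return a + math.log1p(-math.exp(b - a))

def coloc_posteriors(lbf1, lbf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [H0, H1, H2, H3, H4] for one pair of signals.

    lbf1 and lbf2 hold per-variant log Bayes factors for the same variants,
    in the same order. The priors are illustrative assumptions.
    """
    if len(lbf1) != len(lbf2):
        raise ValueError("both traits must cover the same variants")
    sum1 = logsumexp(lbf1)
    sum2 = logsumexp(lbf2)
    sum12 = logsumexp([a + b for a, b in zip(lbf1, lbf2)])
    log_h = [
        0.0,                                              # H0: no association
        math.log(p1) + sum1,                              # H1: trait 1 only
        math.log(p2) + sum2,                              # H2: trait 2 only
        math.log(p1) + math.log(p2)
        + logdiffexp(sum1 + sum2, sum12),                 # H3: different variants
        math.log(p12) + sum12,                            # H4: shared variant
    ]
    total = logsumexp(log_h)
    return [math.exp(h - total) for h in log_h]
```

When both traits have their strongest evidence at the same variant, most of the posterior mass goes to H4; when the peaks sit on different variants, H3 receives more mass than H4.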

Quality control

Because coloc no longer requires the credible set variants of the two signals to overlap, we apply the following quality control filters to the colocalization results:

  • We require each signal to have at least 90% of its PIP mass in the shared region of the colocalization. This lets us discard colocalizations where the signals are not actually in the same region.

  • We require the posterior probability of colocalization, i.e. the probability of hypothesis H4, to be at least 0.8.

  • We require the log Bayes factors of the credible sets to be at least 0.9.
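The three filters above can be sketched in Python as follows. This is a hedged illustration, not the pipeline code: the helper names and the row keys (`pip_mass_1`, `pip_mass_2`, `pp_h4`, `cs_lbf_1`, `cs_lbf_2`) are hypothetical and do not correspond to the column names of the released files, which are described in the data dictionary.

```python
def pip_mass_in_region(pips_by_pos, start, end):
    """Fraction of a signal's total PIP that lies within [start, end].

    pips_by_pos maps a variant position to its PIP (hypothetical layout).
    """
    total = sum(pips_by_pos.values())
    if total == 0:
        return 0.0
    inside = sum(p for pos, p in pips_by_pos.items() if start <= pos <= end)
    return inside / total

def passes_qc(row, min_pip_mass=0.9, min_pp_h4=0.8, min_lbf=0.9):
    """Apply the three thresholds from the text to one colocalization row."""
    return (
        row["pip_mass_1"] >= min_pip_mass      # signal 1 lies in the shared region
        and row["pip_mass_2"] >= min_pip_mass  # signal 2 lies in the shared region
        and row["pp_h4"] >= min_pp_h4          # posterior probability of H4
        and row["cs_lbf_1"] >= min_lbf         # credible set log Bayes factor, trait 1
        and row["cs_lbf_2"] >= min_lbf         # credible set log Bayes factor, trait 2
    )
```

Keeping the thresholds as keyword arguments makes it easy to re-run the filter on the unfiltered summaries with different cut-offs.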

CLPP

The Causal Posterior Probability (CLPP) is computed between two credible sets cs1 and cs2, with cs1 coming from a given phenotype p1 and cs2 coming from phenotype p2. CLPP is defined as follows: For vectors x and y, containing the PIP for variants in cs1 and cs2, respectively, CLPP is calculated by

This CLPP calculation is similar to equation 8 in Hormozdiari et al. 2016.

CLPP is dependent on the credible set size. By definition, any credible set size > 1 will yield a CLPP < 1.
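The dependence on credible set size can be illustrated with a short Python sketch. It assumes the eCAVIAR-style form of the metric, in which the product of the two PIPs is summed over the variants shared by both credible sets, $\mathrm{CLPP} = \sum_i x_i y_i$. This form is an assumption based on the reference to equation 8 above, and the function name and dict layout are hypothetical.

```python
def clpp(cs1, cs2):
    """Sum of PIP products over variants present in both credible sets.

    cs1 and cs2 map a variant identifier to its PIP (assumed layout).
    """
    shared = cs1.keys() & cs2.keys()
    return sum(cs1[v] * cs2[v] for v in shared)

# One variant carrying all of the PIP in both credible sets.
single = clpp({"v1": 1.0}, {"v1": 1.0})

# Identical credible sets, but the PIP is split evenly over two variants.
split = clpp({"v1": 0.5, "v2": 0.5}, {"v1": 0.5, "v2": 0.5})
```

Under this assumed form, `single` is 1.0 while `split` is 0.5, even though the two credible sets agree perfectly in both cases. This is the size dependence described above.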

CLPA

We derived another colocalization metric called causal posterior agreement (CLPA) that is independent of credible set size.

The picture below shows how colocalizations are defined.

Example Comparison

This rough example shows why we mostly use CLPA: it is independent of credible set size.

Data

The colocalization is performed between FinnGen endpoints as well as between FinnGen endpoints and various QTL resources, as shown in the image below.

These resources are listed below:

FinnGen resources

We use the following FinnGen data sources:

  • FinnGen endpoints: the SuSiE finemapping results for the release GWAS. These endpoints are colocalized against every other resource.

  • FinnGen Somascan proteomics: EA5 proteomics, pQTLs using Somascan V4.1 assay, 865 unrelated samples from R12 imputed genotypes.

  • FinnGen Olink proteomics: EA5 proteomics, Olink proteomics pQTLs, released on 11 October 2023. Olink Explore 3072 library, 1732 unrelated samples from R12 imputed genotypes.

  • FinnGen Metabolomics data: mQTL data from NMR measurements for 34,218 samples.

  • FinnGen single cell transcriptomics: finemapped results of the EA5 single cell transcriptomics data.

  • Kanta lab associations: 234 Lab value GWASes, finemapped.

Expression QTL datasets

  • GTEx v8: SuSiE fine-mapping, 49 tissues (only tissues with a sample size of n >= 50 are included), donors of mixed ancestry, Aguet et al. (2019, bioRxiv). Fine-mapping was performed by Hilary Finucane, Jacob Ulirsch and Masahiro Kanai from the Finucane Lab. Effect size interpretation: change in normalised gene expression (sd units) per alternate allele. Normalization = inverse normal transformation.

  • EMBL-EBI (European Bioinformatics Institute) eQTL catalogue datasets: eQTL data from 24 tissues/cell types, 16 RNAseq sources, 6 microarray sources, SuSiE fine-mapping, donors of 88% European ancestry, Kerimov et al. (2021, Nature Genetics, doi: 10.1038/s41588-021-00924-w). For RNAseq data, four quantification methods are included (gene expression, exon expression, transcript usage, txrevise event usage). Fine-mapping was performed by Kaur Alasoo and Nurlan Kerimov. Effect size interpretation: change in normalised gene expression (sd units) per alternate allele. Normalization = inverse normal transformation.

  • FUSION study (RNAseq), muscle and adipose tissue.

  • Kolberg: mega-analysis of immune cells from the microarray datasets.

  • FinnLiver

Metabolome QTL datasets

- GeneRISK: 186 lipid species QTLs, SuSiE fine-mapping of Widen et al. (2020), 7632 Finnish samples. Effect size interpretation: change in standard deviation of the lipid species per alternate allele.

Biomarkers

- UK Biobank: 36 continuous endpoints, 57 biomarkers from UKBB prepared by Finucane lab, 361,194 White British samples, SuSiE fine-mapping. Effect size interpretation for quantitative traits: change in standard deviation of the normalized outcome per alternate allele. Effect size interpretation for binary traits: increase in log(odds ratio) per alternate allele.

Release outputs

The following files are published with each release (the release number in the file names changes between releases):

| File | Description |
| --- | --- |
| colocQC.tsv.gz | Colocalization summaries between FinnGen endpoints and other resources |
| coloc.credsets.tsv.gz | Credible set variants that were involved in filtered colocalization results |
| coloc.H4_tables.tsv.gz | Per-variant H4 posterior probabilities for variants in filtered colocalization results |
| unfiltered_summaries/FinnGen-R12-GWAS-----{Other resource}.sum.unfiltered.tsv.gz | Unfiltered colocalization summaries for a single resource |
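The release files are gzipped, tab-separated text, so they can be read with the Python standard library alone. The sketch below is a minimal example under that assumption. The helper name is hypothetical, the path in the usage comment is a placeholder, and the actual column names should be taken from the data dictionary rather than from this example.

```python
import csv
import gzip

def read_gzipped_tsv(path):
    """Yield each row of a gzipped, tab-separated file as a dict keyed by header."""
    with gzip.open(path, "rt", newline="") as handle:
        yield from csv.DictReader(handle, delimiter="\t")

# Usage (placeholder path; adjust to where the release files are stored):
# for row in read_gzipped_tsv("colocQC.tsv.gz"):
#     print(row)
```

Because the helper is a generator, large files are streamed row by row instead of being loaded into memory at once.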

In addition to the data files, the following documentation is included in each release:

| Documentation | Description |
| --- | --- |
| methods.pdf | Description of colocalization method and data |
| data_dictionary.txt | Column description of all data files |
| readme.md | Colocalization release notes |

Acknowledgements

We thank the following people for helping us assemble the QTL resources:

- Kaur Alasoo and Nurlan Kerimov provided us with the fine-mapped EMBL-EBI eQTL catalogue datasets.

- Hilary Finucane, Jacob Ulirsch and Masahiro Kanai gave us access to their fine-mapped GTEx data.

Read more about colocalization and file format of the FinnGen colocalization results.
