# Colocalization results format

**The colocalization data contains the results of colocalization analyses between FinnGen data and other datasets.**

The colocalization results are stored as tab-separated files, with a separate file for each type of data produced.

More information about data sources, phenotypes and loci can be found in the [Colocalization in FinnGen](https://docs.finngen.fi/finngen-data-specifics/green-library-data-aggregate-data/other-analyses-available/colocalizations) page.

For the results format used before DF13, which was produced by the older colocalization pipeline, see [Colocalization results in colocalization before DF13](https://docs.finngen.fi/finngen-data-specifics/green-library-data-aggregate-data/core-analysis-results-files/colocalization-results-format/results-format-in-colocalization-before-df13).

The susie-coloc pipeline creates the following files:

| File                                                                              | Description                                                                            |
| --------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------- |
| colocQC.tsv.gz                                                                    | Colocalization summaries between FinnGen endpoints and other resources                 |
| coloc.credsets.tsv.gz                                                             | Credible set variants that were involved in filtered colocalization results            |
| coloc.H4\_tables.tsv.gz                                                           | Per-variant H4 posterior probabilities for variants in filtered colocalization results |
| unfiltered\_summaries/FinnGen-R12-GWAS-----{Other resource}.sum.unfiltered.tsv.gz | Unfiltered colocalization summaries for a single resource                              |

#### Colocalization summaries

The colocalization summaries contain one colocalization per row. The colocalizations have been filtered, so not every possible colocalization pair is included. The files contain the following columns:

| Field        | Description                                                                                                                                                                        |
| ------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| dataset1     | name of dataset 1                                                                                                                                                                  |
| dataset2     | name of dataset 2                                                                                                                                                                  |
| trait1       | name of trait 1                                                                                                                                                                    |
| trait2       | name of trait 2                                                                                                                                                                    |
| region1      | finemapped region of 1st trait                                                                                                                                                     |
| region2      | finemapped region of 2nd trait                                                                                                                                                     |
| cs1          | credible set index of 1st trait                                                                                                                                                    |
| cs2          | credible set index of 2nd trait                                                                                                                                                    |
| nsnps        | Number of SNPs in the overlap of regions 1 and 2                                                                                                                                   |
| hit1         | variant that coloc predicted to be the most likely causal variant in trait 1                                                                                                       |
| hit2         | variant that coloc predicted to be the most likely causal variant in trait 2                                                                                                       |
| PP.H0.abf    | posterior probability of hypothesis 0: No genetic association in either trait                                                                                                      |
| PP.H1.abf    | posterior probability of hypothesis 1: Genetic association in trait 1 only                                                                                                         |
| PP.H2.abf    | posterior probability of hypothesis 2: Genetic association in trait 2 only                                                                                                         |
| PP.H3.abf    | posterior probability of hypothesis 3: Both traits associated, but with different causal variants                                                                                  |
| PP.H4.abf    | posterior probability of hypothesis 4: Both traits associated and share single causal variant                                                                                      |
| low\_purity1 | If credible set 1 has low purity. In short, not all variants in signal have high LD. See finemapping documentation for more information                                            |
| low\_purity2 | If credible set 2 has low purity. In short, not all variants in signal have high LD. See finemapping documentation for more information                                            |
| nsnps1       | Number of SNPs in region 1                                                                                                                                                         |
| nsnps2       | Number of SNPs in region 2                                                                                                                                                         |
| cs1\_log10bf | log10 bayes factor of credible set 1                                                                                                                                               |
| cs2\_log10bf | log10 bayes factor of credible set 2                                                                                                                                               |
| clpp         | CLPP between credible sets                                                                                                                                                         |
| clpa         | CLPA between credible sets                                                                                                                                                         |
| cs1\_size    | Number of variants in credible set 1                                                                                                                                               |
| cs2\_size    | Number of variants in credible set 2                                                                                                                                               |
| cs\_overlap  | Overlapping credible set variants in both credible sets                                                                                                                            |
| topInOverlap | Whether the maximum PIP variant was in overlap of regions or not, for both traits                                                                                                  |
| probmass\_1  | Amount of PIP mass of credible set 1 that lies in the overlap of the two regions. A low value implies that the colocalization cannot capture the signal in trait 1.                |
| probmass\_2  | Amount of PIP mass of credible set 2 that lies in the overlap of the two regions. A low value implies that the colocalization cannot capture the signal in trait 2.                |
| hit1\_info   | Information about the most likely causal variant in trait 1, as predicted by coloc. Values are beta and p-value, separated by a comma                                              |
| hit2\_info   | Information about the most likely causal variant in trait 2, as predicted by coloc. Values are beta and p-value, separated by a comma                                              |
| colocRes     | Name of the colocalization intermediate file that this row is in.                                                                                                                  |

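As a minimal sketch of how these columns might be used, the Python snippet below reads a summary file with the standard library and keeps the rows whose `PP.H4.abf` reaches a chosen threshold. The 0.8 cutoff, the helper names and the toy rows are illustrative assumptions, not part of the FinnGen pipeline or data; only the column names and the `colocQC.tsv.gz` file name come from the tables above.

```python
import csv
import gzip
import io


def read_tsv(handle):
    """Return the rows of a tab-separated text handle as a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def read_gzipped_tsv(path):
    """Read a gzip-compressed TSV file such as colocQC.tsv.gz."""
    with gzip.open(path, "rt", newline="") as handle:
        return read_tsv(handle)


def parse_hit_info(value):
    """Split a 'beta,p-value' string (hit1_info / hit2_info) into two floats."""
    beta, p_value = value.split(",")
    return float(beta), float(p_value)


def strong_colocalizations(rows, min_h4=0.8):
    """Keep rows whose PP.H4.abf is at least min_h4 (0.8 is an assumed cutoff)."""
    return [row for row in rows if float(row["PP.H4.abf"]) >= min_h4]


# Toy rows with made-up values; a real file has all the columns listed above.
toy = (
    "trait1\ttrait2\tPP.H4.abf\thit1_info\n"
    "TRAIT_A\tGENE_X\t0.95\t0.12,1e-10\n"
    "TRAIT_A\tGENE_Y\t0.10\t-0.05,3e-6\n"
)
rows = read_tsv(io.StringIO(toy))
strong = strong_colocalizations(rows)
for row in strong:
    beta, p_value = parse_hit_info(row["hit1_info"])
    print(row["trait1"], row["trait2"], row["PP.H4.abf"], beta, p_value)
```

For a real file, `read_gzipped_tsv("colocQC.tsv.gz")` would replace the toy input. The threshold is a parameter so that it can be adjusted to whatever evidence level an analysis requires.
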
#### Credible set variants in colocalizations

The credible set variants file contains the variants of every credible set that appears in the filtered colocalizations. The variants of each credible set are listed only once. The file contains the following columns:

| Field              | Description                                                                            |
| ------------------ | -------------------------------------------------------------------------------------- |
| trait              | Trait the credible set variant is from                                                 |
| region             | Finemapping region                                                                     |
| rsid               | variant identifier, in format chromosome\_position\_reference allele\_alternate allele |
| cs                 | credible set index                                                                     |
| low\_purity        | Whether variants in credible set were not in high LD.                                  |
| p                  | p-value of variant association                                                         |
| beta               | effect size of variant association                                                     |
| se                 | standard error of variant association effect                                           |
| cs\_specific\_prob | posterior inclusion probability of this variant in credible set                        |
| dataset            | Dataset identifier                                                                     |

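The sketch below illustrates one way to work with this file: it splits a variant identifier into its four parts and groups rows into credible sets. It assumes that the combination of `dataset`, `trait`, `region` and `cs` identifies a credible set, and the function names, region labels and toy values are hypothetical.

```python
import csv
import io
from collections import defaultdict


def parse_variant_id(variant_id):
    """Split 'chromosome_position_ref_alt' into (chrom, pos, ref, alt)."""
    parts = variant_id.split("_")
    if len(parts) != 4:
        raise ValueError(f"unexpected variant identifier: {variant_id!r}")
    chrom, pos, ref, alt = parts
    return chrom, int(pos), ref, alt


def group_credible_sets(rows):
    """Group variant rows by (dataset, trait, region, cs); the key is an assumption."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["dataset"], row["trait"], row["region"], row["cs"])
        groups[key].append(row)
    return dict(groups)


def top_variant(variant_rows):
    """Return the row with the highest cs_specific_prob in one credible set."""
    return max(variant_rows, key=lambda row: float(row["cs_specific_prob"]))


# Toy rows with made-up values and placeholder region labels.
toy = (
    "trait\tregion\trsid\tcs\tcs_specific_prob\tdataset\n"
    "TRAIT_A\tregionA\t1_1000_A_G\t1\t0.7\tDATASET_1\n"
    "TRAIT_A\tregionA\t1_1200_C_T\t1\t0.3\tDATASET_1\n"
    "TRAIT_A\tregionA\t1_5000_G_A\t2\t1.0\tDATASET_1\n"
)
rows = list(csv.DictReader(io.StringIO(toy), delimiter="\t"))
credible_sets = group_credible_sets(rows)
for key, variants in credible_sets.items():
    best = top_variant(variants)
    print(key, len(variants), parse_variant_id(best["rsid"]))
```
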
#### Posterior probability of colocalization table

The H4 table contains the posterior probability of colocalization for every variant in the overlap of the two regions of a colocalization. These variants are only available for the filtered colocalizations in the colocalization summary. The file contains the following columns:

| Field     | Description                                                                                                            |
| --------- | ---------------------------------------------------------------------------------------------------------------------- |
| dataset1  | dataset of trait1                                                                                                      |
| dataset2  | dataset of trait2                                                                                                      |
| trait1    | trait of first credible set                                                                                            |
| trait2    | trait of second credible set                                                                                           |
| region1   | finemapped region of first trait                                                                                       |
| region2   | finemapped region of second trait                                                                                      |
| cs1       | credible set index for credible set 1                                                                                  |
| cs2       | credible set index for credible set 2                                                                                  |
| snp       | snp identifier, in format chromosome\_position\_reference allele\_alternate allele                                     |
| SNP.PP.H4 | Posterior probability of this variant being the causal variant in this colocalization, given that hypothesis 4 is true |

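As an illustration, the sketch below picks the variant with the highest `SNP.PP.H4` for each colocalization. It assumes that one colocalization is identified by the eight dataset, trait, region and credible set columns; the function name and the toy values are hypothetical.

```python
import csv
import io

# Columns assumed to identify one colocalization in the H4 table.
KEY_COLUMNS = (
    "dataset1", "dataset2", "trait1", "trait2",
    "region1", "region2", "cs1", "cs2",
)


def top_h4_variants(rows):
    """Map each colocalization key to its (snp, SNP.PP.H4) with the highest value."""
    best = {}
    for row in rows:
        key = tuple(row[column] for column in KEY_COLUMNS)
        prob = float(row["SNP.PP.H4"])
        if key not in best or prob > best[key][1]:
            best[key] = (row["snp"], prob)
    return best


# Toy rows with made-up values and placeholder region labels.
header = "\t".join(KEY_COLUMNS + ("snp", "SNP.PP.H4"))
toy = "\n".join([
    header,
    "D1\tD2\tTRAIT_A\tGENE_X\tregionA\tregionB\t1\t1\t1_1000_A_G\t0.2",
    "D1\tD2\tTRAIT_A\tGENE_X\tregionA\tregionB\t1\t1\t1_1200_C_T\t0.8",
    "D1\tD2\tTRAIT_A\tGENE_Y\tregionA\tregionC\t1\t2\t1_5000_G_A\t1.0",
]) + "\n"
rows = list(csv.DictReader(io.StringIO(toy), delimiter="\t"))
best = top_h4_variants(rows)
for key, (snp, prob) in best.items():
    print(key, snp, prob)
```
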
#### Unfiltered colocalization summaries

These files contain the colocalization summaries that are not filtered. The columns are largely the same as in the filtered colocalization summaries, with two exceptions: the dataset columns are absent because the datasets can be read from the filename, and the colocRes column is absent because it is the filename itself. The files contain the following columns:

| Field        | Description                                                                                                                                                                        |
| ------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| trait1       | name of trait 1                                                                                                                                                                    |
| trait2       | name of trait 2                                                                                                                                                                    |
| region1      | finemapped region of 1st trait                                                                                                                                                     |
| region2      | finemapped region of 2nd trait                                                                                                                                                     |
| cs1          | credible set index of 1st trait                                                                                                                                                    |
| cs2          | credible set index of 2nd trait                                                                                                                                                    |
| nsnps        | Number of SNPs in the overlap of regions 1 and 2                                                                                                                                   |
| hit1         | variant that coloc predicted to be the most likely causal variant in trait 1                                                                                                       |
| hit2         | variant that coloc predicted to be the most likely causal variant in trait 2                                                                                                       |
| PP.H0.abf    | posterior probability of hypothesis 0: No genetic association in either trait                                                                                                      |
| PP.H1.abf    | posterior probability of hypothesis 1: Genetic association in trait 1 only                                                                                                         |
| PP.H2.abf    | posterior probability of hypothesis 2: Genetic association in trait 2 only                                                                                                         |
| PP.H3.abf    | posterior probability of hypothesis 3: Both traits associated, but with different causal variants                                                                                  |
| PP.H4.abf    | posterior probability of hypothesis 4: Both traits associated and share single causal variant                                                                                      |
| low\_purity1 | If credible set 1 has low purity. In short, not all variants in signal have high LD. See finemapping documentation for more information                                            |
| low\_purity2 | If credible set 2 has low purity. In short, not all variants in signal have high LD. See finemapping documentation for more information                                            |
| nsnps1       | Number of SNPs in region 1                                                                                                                                                         |
| nsnps2       | Number of SNPs in region 2                                                                                                                                                         |
| cs1\_log10bf | log10 bayes factor of credible set 1                                                                                                                                               |
| cs2\_log10bf | log10 bayes factor of credible set 2                                                                                                                                               |
| clpp         | CLPP between credible sets                                                                                                                                                         |
| clpa         | CLPA between credible sets                                                                                                                                                         |
| cs1\_size    | Number of variants in credible set 1                                                                                                                                               |
| cs2\_size    | Number of variants in credible set 2                                                                                                                                               |
| cs\_overlap  | Overlapping credible set variants in both credible sets                                                                                                                            |
| topInOverlap | Whether the maximum PIP variant was in overlap of regions or not, for both traits                                                                                                  |
| probmass\_1  | Amount of PIP mass of credible set 1 that lies in the overlap of the two regions. A low value implies that the colocalization cannot capture the signal in trait 1.                |
| probmass\_2  | Amount of PIP mass of credible set 2 that lies in the overlap of the two regions. A low value implies that the colocalization cannot capture the signal in trait 2.                |
| hit1\_info   | Information about the most likely causal variant in trait 1, as predicted by coloc. Values are beta and p-value, separated by a comma                                              |
| hit2\_info   | Information about the most likely causal variant in trait 2, as predicted by coloc. Values are beta and p-value, separated by a comma                                              |

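Because the dataset names are carried by the filename rather than by columns, a small helper can recover them. The sketch below assumes that the run of five hyphens in the filename pattern separates the two dataset names and that the suffix is always `.sum.unfiltered.tsv.gz`; the function name and the `ExampleResource` name are hypothetical.

```python
import os

SUFFIX = ".sum.unfiltered.tsv.gz"
SEPARATOR = "-----"  # assumed delimiter between the two dataset names


def datasets_from_filename(path):
    """Return (dataset1, dataset2) parsed from an unfiltered summary filename."""
    name = os.path.basename(path)
    if not name.endswith(SUFFIX):
        raise ValueError(f"unexpected suffix in {name!r}")
    stem = name[: -len(SUFFIX)]
    dataset1, separator, dataset2 = stem.partition(SEPARATOR)
    if not separator or not dataset1 or not dataset2:
        raise ValueError(f"cannot find two dataset names in {name!r}")
    return dataset1, dataset2


# 'ExampleResource' is a placeholder, not a real resource name.
example = "unfiltered_summaries/FinnGen-R12-GWAS-----ExampleResource.sum.unfiltered.tsv.gz"
dataset1, dataset2 = datasets_from_filename(example)
print(dataset1, dataset2)
```
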
More information about the methods can be found on the [Colocalization in FinnGen](https://docs.finngen.fi/finngen-data-specifics/green-library-data-aggregate-data/other-analyses-available/colocalizations) page, as well as in the methods document:

`/finngen/library-green/finngen_R12_analysis_data/colocalization/methods.pdf`

More documentation about the data, including the columns and their descriptions, can be found in the release notes.

Read more about [Colocalization](https://docs.finngen.fi/background-reading/colocalization) and [Colocalization in FinnGen](https://docs.finngen.fi/finngen-data-specifics/green-library-data-aggregate-data/other-analyses-available/colocalizations).
