Finemapping of Custom GWAS analyses

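This page explains what finemapping is, how the finemapping process works, which variants are included, how to finemap your custom GWAS results, and how to access and interpret the results.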

Background

Finemapping is a method used to narrow down the region of interest to identify the most likely causal variants in a given genomic locus. It attempts to find the most likely causal variant and the "credible set", a set of variants which has a high probability of containing the causal variant. See Finemapping for more details.

Finemapping process

The finemapping process consists of two steps: Region selection and actual fine-mapping of the selected regions.

Region selection algorithm

In short, region selection selects the regions that have genome-wide significant variants for finemapping. Sometimes regions can be too large to finemap, in which case those regions will be marked as not possible to finemap. By default, the HLA region (chr6:25,000,000-34,000,000 inclusive) is skipped, due to the high variability and complexity of this region.

In more detail, the region selection algorithm works as follows. Taking the summary statistics as input, it expands a window around each genome-wide significant variant (P-value threshold of 5x10^-8), with a window size of 3Mb (lead variant position ±1.5Mb). If windows from different regions overlap, the regions are combined into a single region. If a window overlaps the HLA region (25Mb to 34Mb, inclusive, on chromosome 6), the overlapping section is removed prior to finemapping, due to the difficulty of finemapping such a variant-rich region.
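The window expansion, merging and HLA removal described above can be sketched roughly as follows. This is a minimal illustration and not the pipeline's actual code: the function names (`build_windows`, `merge_overlapping`, `remove_hla`) are hypothetical, positions are assumed to lie on a single chromosome, and the handling of a variant exactly at the P-value threshold and of a region that straddles the whole HLA region (split into two pieces here) is an assumption.

```python
from typing import Iterable, List, Tuple

Region = Tuple[int, int]  # (start, end), inclusive, on a single chromosome

HLA_CHROM = "6"                               # chromosome naming is an assumption
HLA_START, HLA_END = 25_000_000, 34_000_000   # inclusive bounds from the text
PVAL_THRESHOLD = 5e-8
WINDOW = 3_000_000                            # lead variant position +/- 1.5 Mb


def build_windows(variants: Iterable[Tuple[int, float]],
                  window: int = WINDOW,
                  pval_threshold: float = PVAL_THRESHOLD) -> List[Region]:
    """Make one window per genome-wide significant (position, p-value) pair."""
    half = window // 2
    # Assumption: "significant" means strictly below the threshold.
    return [(max(1, pos - half), pos + half)
            for pos, pval in variants if pval < pval_threshold]


def merge_overlapping(regions: Iterable[Region]) -> List[Region]:
    """Combine windows that overlap into single regions."""
    merged: List[Region] = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def remove_hla(chrom: str, region: Region) -> List[Region]:
    """Return the parts of a region that lie outside the HLA region."""
    start, end = region
    if chrom != HLA_CHROM or end < HLA_START or start > HLA_END:
        return [region]
    pieces: List[Region] = []
    if start < HLA_START:
        pieces.append((start, HLA_START - 1))
    if end > HLA_END:
        pieces.append((HLA_END + 1, end))
    return pieces
```

For example, two significant variants 1Mb apart produce overlapping windows and therefore end up in one merged region, while a region lying entirely inside the HLA region is dropped.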

Regions defined in this way can become very large due to regions of long-range LD or multiple loci close together. Therefore, the finemapping pipelines impose a maximum region width, set by the user, with a default width of 6Mb. When the maximum width is exceeded, the original 3Mb (±1.5Mb) windows are shrunk by 10% and nearby regions are then merged again into a single region. This process repeats, with the windows being iteratively shrunk by 10% and then merged if overlapping, until no merged region exceeds the maximum region width. We recommend that the user does not set the maximum region width to more than 10Mb, as large regions are likely to cause workflow failure due to memory limitations.
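The iterative shrinking can be illustrated with a short sketch. It is written under assumptions and is not the pipeline's implementation: `select_regions` and `merge_overlapping` are hypothetical names, all windows are assumed to be shrunk together in each round, region width is taken as end minus start, and HLA handling is left out.

```python
from typing import Iterable, List, Tuple

Region = Tuple[int, int]  # (start, end) on a single chromosome


def merge_overlapping(windows: Iterable[Region]) -> List[Region]:
    """Combine windows that overlap into single regions."""
    merged: List[Region] = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def select_regions(lead_positions: List[int],
                   window: int = 3_000_000,
                   max_width: int = 6_000_000,
                   shrink: float = 0.9) -> Tuple[List[Region], float]:
    """Shrink windows by 10% per round until no merged region is too wide.

    Returns the merged regions and the window width that was finally used.
    """
    width = float(window)
    while True:
        half = width / 2
        windows = [(int(pos - half), int(pos + half)) for pos in lead_positions]
        regions = merge_overlapping(windows)
        if all(end - start <= max_width for start, end in regions):
            return regions, width
        width *= shrink  # shrink every window by 10% and try again
```

With lead variants at 10Mb, 12Mb, 14Mb and 16Mb, the initial 3Mb windows merge into one 9Mb region, which exceeds the 6Mb default. In this sketch the windows keep shrinking until they drop below 2Mb, at which point they no longer overlap and four separate regions remain.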

Note: In older versions of the finemapping pipelines, a lower limit of 1Mb was imposed on the window size. Any regions reduced to <1Mb in size were skipped, and users were notified in the results region status file. This is not implemented in the current pipelines.

Fine-mapping of regions

These regions are then finemapped using both FINEMAP and SuSiE. More information about the methods can be found in the release finemapping documentation in the release data bucket, e.g. at /finngen/library-green/finngen_R12/finngen_R12_analysis_documentation/finngen_R12_finemap.md for R12, as well as in the finemapping pipeline repository here.

What variants are included in the finemapping process?

Finemapping is performed on variants inside a region that meet the following prerequisites:

They are included in the GWAS summary statistic for that endpoint
Their INFO score for the data release was greater than 0.6
They are not in the HLA region (chr6:25,000,000-34,000,000 inclusive)
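The three prerequisites above amount to a simple per-variant filter. The sketch below is only an illustration: the `Variant` class, its field names and the chromosome naming ("6" rather than "chr6") are assumptions, not the pipeline's data model.

```python
from dataclasses import dataclass

HLA_CHROM = "6"                               # chromosome naming is an assumption
HLA_START, HLA_END = 25_000_000, 34_000_000   # inclusive bounds from the text
MIN_INFO = 0.6


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record used only for this illustration."""
    chrom: str
    pos: int
    info: float          # INFO score for the data release
    in_sumstats: bool    # present in the GWAS summary statistics


def is_included(variant: Variant) -> bool:
    """Return True if the variant meets all three prerequisites."""
    in_hla = (variant.chrom == HLA_CHROM
              and HLA_START <= variant.pos <= HLA_END)
    return variant.in_sumstats and variant.info > MIN_INFO and not in_hla
```

Note that the INFO comparison is strict: a variant with an INFO score of exactly 0.6 is excluded, since the text requires a score greater than 0.6.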

How to finemap your custom GWAS results?

The preferred way to get your custom GWAS endpoint finemapped is with the unmodifiable finemapping pipeline. After a successful execution, the results will appear in the userresults browser and in your endpoint's green library folder as before.

While we recommend using the unmodifiable finemapping pipeline, there is also the older modifiable finemapping pipeline, which can be used, for example, for finemapping custom regions that you explicitly define. In contrast to the unmodifiable pipeline, the results from the modifiable pipeline won't be automatically uploaded to the green library folder, and accessing the results outside of the Sandbox requires a download request.

How to access the results?

Unmodifiable pipeline

The finemapping results from the unmodifiable pipeline are available in two places: In the userresults browser, as well as in the green library. The files can be found under /finngen/library-green/finngen_RX/sandbox_custom_gwas/YOUR_ENDPOINT_NAME/finemap/ where X is the FinnGen release (e.g. 12).

Finemapped endpoints are automatically loaded to the userresults PheWeb browser. You can find the finemap data when examining a single genome-wide significant region or in the Credible Sets tab right below the Manhattan plot. Unfortunately, finemapped results are not yet shown in the phenotype view.

You can get to individual regions by first going to your endpoint in the userresults browser, and then clicking either on a GWAS peak in the Manhattan plot or on the 'locus' link in the table, as in the image below.

In the region view, the credible set data should show both as a listing of how many signals were found by SuSiE and by FINEMAP, and as a LocusZoom plot. These have been highlighted in red in the image below.

You can find the finemapping data in the green library in the folder /finngen/library-green/finngen_R12/sandbox_custom_gwas/PHENOTYPE/finemap, given release 12 and phenotype PHENOTYPE. Note that if this phenotype has not been finemapped, the finemap subfolder will not exist.
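As a small convenience, the folder path can be built and checked programmatically. This is a sketch under assumptions: the helper names `finemap_dir` and `has_finemap_results` are hypothetical, and the code only makes sense where the green library is mounted at the path given above.

```python
from pathlib import Path

GREEN_LIBRARY = "/finngen/library-green"  # root path as given in the text


def finemap_dir(release: int, phenotype: str, root: str = GREEN_LIBRARY) -> Path:
    """Build the expected finemap folder path for a custom GWAS phenotype."""
    return (Path(root) / f"finngen_R{release}" / "sandbox_custom_gwas"
            / phenotype / "finemap")


def has_finemap_results(release: int, phenotype: str,
                        root: str = GREEN_LIBRARY) -> bool:
    """Return True if the finemap subfolder exists for this phenotype."""
    return finemap_dir(release, phenotype, root).is_dir()


if __name__ == "__main__":
    # PHENOTYPE is a placeholder; replace it with your own phenotype name.
    print(finemap_dir(12, "PHENOTYPE"))
    print(has_finemap_results(12, "PHENOTYPE"))
```

A `False` result here corresponds to the case described above, where the phenotype has not been finemapped and the subfolder does not exist.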

Modifiable pipeline

If you used the modifiable finemapping pipeline to generate your results, you need to make a download request to get the results out of the Sandbox. Unfortunately, there is no automatic upload of the results to the green library or the userresults browser.

How are the results structured?

The finemapping results follow the structure of the finemapping results of the core analysis results. If you have used the unmodifiable finemapping pipeline, the results can be found in /finngen/library-green/finngen_RX/sandbox_custom_gwas/PHENOTYPE/finemap where X is the FinnGen release (e.g. 12) and PHENOTYPE is the name of your phenotype. Some of the files are in this top-level directory, while others are in nested directories. The folder contains region selection outputs as well as FINEMAP and SuSiE outputs.

Here is a table describing each of those files or directories:

| Filename | Description |
| --- | --- |
| had_results | This file tells if there were any regions to finemap in your endpoint. It will contain the text "True" if there were regions that were sent to finemapping, and "False" if there were no regions to finemap. Having regions to finemap in this context means the endpoint had genome-wide significant (GWS) variants. |
| PHENOTYPE.region_status | This tab-separated file (TSV) shows a brief summary of the regions identified in region selection. See here for format of this file. |
| too_many_regions | This file contains the word "True" if your endpoint contained too many regions to finemap (currently the limit is set to 300 regions). |
| finemap/ | This folder contains the finemapped results of FINEMAP |
| susie/ | This folder contains the finemapped results of SuSiE |
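The two flag files can be checked before looking for FINEMAP or SuSiE outputs. The following is a hedged sketch: `read_flag` and `summarise_status` are hypothetical helpers, and treating a missing flag file as "unknown" is an assumption, since the text only states what the files contain when present.

```python
from pathlib import Path
from typing import Optional


def read_flag(path: Path) -> Optional[bool]:
    """Return True/False from a flag file, or None if the file is missing."""
    if not path.is_file():
        return None
    return path.read_text().strip() == "True"


def summarise_status(results_dir: Path) -> str:
    """Describe the finemapping status of a top-level results folder."""
    too_many = read_flag(results_dir / "too_many_regions")
    had_results = read_flag(results_dir / "had_results")
    if too_many:
        return "too many regions to finemap"
    if had_results is None:
        return "status unknown"  # assumption: no had_results file found
    if had_results:
        return "regions were sent to finemapping"
    return "no regions to finemap"
```

Calling `summarise_status` on your endpoint's top-level finemap folder gives a quick indication of whether the finemap and susie folders are expected to contain results.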

Next, the contents of the region status file, as well as finemap and susie folders are described.

finemap folder

The finemap folder contains the files and folders listed in the table below. The file contents and formats are described in the Finemapping results format page.

| Filename | Description |
| --- | --- |
| PHENOTYPE.FINEMAP.config.bgz | A bgzipped, tab-separated file containing the posterior summaries for each causal configuration, one per line |
| PHENOTYPE.FINEMAP.region.bgz | A bgzipped, tab-separated file containing each region and the probabilities of the predicted causal variant configurations |
| PHENOTYPE.FINEMAP.snp.bgz | A bgzipped, tab-separated file containing the credible set status for each of the SNPs in the finemapped regions |
| PHENOTYPE.FINEMAP.snp.bgz.tbi | A tabix index file for the snp file |
| cred_regions/ | A folder containing the individual credible set predictions, with one file per model assuming k causal SNPs. For example, a file ending with .cred3 has the predictions for the scenario in which there are 3 independent causal variants in the region, and therefore 3 credible sets in the region |
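Because bgzip output is gzip-compatible, the .bgz files can be inspected with Python's standard library alone. The sketch below peeks at the column names and the first few rows; `peek_bgz_tsv` is a hypothetical helper, and it assumes the file starts with a header line. No column names are assumed here, see the Finemapping results format page for the actual columns.

```python
import csv
import gzip
from itertools import islice
from typing import Dict, List, Optional, Sequence, Tuple


def peek_bgz_tsv(path: str, n: int = 5
                 ) -> Tuple[Optional[Sequence[str]], List[Dict[str, str]]]:
    """Return the header fields and the first n rows of a bgzipped TSV."""
    # bgzip writes a series of gzip blocks, which gzip.open reads transparently.
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        fields = reader.fieldnames      # reads the header line
        rows = list(islice(reader, n))  # reads at most n data rows
    return fields, rows


if __name__ == "__main__":
    # PHENOTYPE is a placeholder; replace it with your own phenotype name.
    fields, rows = peek_bgz_tsv("PHENOTYPE.FINEMAP.snp.bgz")
    print(fields)
    for row in rows:
        print(row)
```

For region-based queries on the snp file, the accompanying .tbi index is meant to be used with tabix instead of reading the whole file.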

susie folder

The susie folder contains the files listed in the table below. The file contents and formats are described in the Finemapping results format page.

| Filename | Description |
| --- | --- |
| PHENOTYPE.SUSIE.cred.bgz | A bgzipped TSV containing SuSiE per-credible set output |
| PHENOTYPE.SUSIE.cred.summary.tsv | A summary of the SuSiE credible set output, in TSV form |
| PHENOTYPE.SUSIE.cred_99.bgz | A bgzipped TSV containing SuSiE per-credible set output for 99% credible sets |
| PHENOTYPE.SUSIE.snp.bgz | SuSiE output for every variant in the regions inspected |
| PHENOTYPE.SUSIE.snp.bgz.tbi | A tabix index file for PHENOTYPE.SUSIE.snp.bgz |
| PHENOTYPE.SUSIE.snp.filter.tsv | Filtered SuSiE SNP output for 95% credible sets |
| PHENOTYPE.SUSIE_99.cred.summary.tsv | A summary of the SuSiE 99% credible set output, in TSV form |
| PHENOTYPE.SUSIE_99.snp.filter.tsv | Filtered SuSiE SNP output for 99% credible sets, in TSV form |
| PHENOTYPE.SUSIE_EXTEND.cred.summary.tsv | A summary of the SuSiE 95% credible sets extended with 99% variants, in TSV form |
| PHENOTYPE.SUSIE_EXTEND.snp.filter.tsv | Filtered SuSiE SNP output for 95% credible sets, extended with 99% CS variants, in TSV form |
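To check that a susie folder is complete, the file names in the table can be generated from the phenotype name and compared against the folder contents. This is an illustrative sketch only: `expected_susie_files` and `missing_susie_files` are hypothetical helpers, and the suffix list is copied from the table above rather than from the pipeline itself.

```python
from pathlib import Path
from typing import List

# File name suffixes taken from the table above.
SUSIE_SUFFIXES = [
    "SUSIE.cred.bgz",
    "SUSIE.cred.summary.tsv",
    "SUSIE.cred_99.bgz",
    "SUSIE.snp.bgz",
    "SUSIE.snp.bgz.tbi",
    "SUSIE.snp.filter.tsv",
    "SUSIE_99.cred.summary.tsv",
    "SUSIE_99.snp.filter.tsv",
    "SUSIE_EXTEND.cred.summary.tsv",
    "SUSIE_EXTEND.snp.filter.tsv",
]


def expected_susie_files(phenotype: str) -> List[str]:
    """List the file names expected in the susie folder for a phenotype."""
    return [f"{phenotype}.{suffix}" for suffix in SUSIE_SUFFIXES]


def missing_susie_files(susie_dir: Path, phenotype: str) -> List[str]:
    """Return the expected file names that are not present in susie_dir."""
    return [name for name in expected_susie_files(phenotype)
            if not (susie_dir / name).is_file()]
```

An empty list from `missing_susie_files` means every file in the table was found in the folder.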


Last updated 5 months ago
