Finemapping of Custom GWAS analyses
Background
Finemapping is a method used to narrow down the region of interest to identify the most likely causal variants in a given genomic locus. It attempts to find the most likely causal variant and the "credible set", a set of variants which has a high probability of containing the causal variant. See Finemapping for more details.
Finemapping process
The finemapping process consists of two steps: region selection and the actual finemapping of the selected regions.
Region selection algorithm
In short, region selection selects the regions that have genome-wide significant variants for finemapping. Sometimes regions can be too large to finemap, in which case those regions will be marked as not possible to finemap. By default, the HLA region (chr6:25,000,000-34,000,000 inclusive) is skipped, due to the high variability and complexity of this region.
In more detail, the region selection algorithm works as follows. Taking the summary statistics as input, it expands a window around each genome-wide significant variant, using a window size of 3Mb (lead variant position ±1.5Mb) and a P-value threshold of 5×10⁻⁸. If any of these windows overlap, they are combined into a single region. If a window overlaps the HLA region (25Mb to 34Mb, inclusive, on chromosome 6), the overlapping section is removed prior to finemapping, due to the difficulty of finemapping such a variant-rich region.
Regions defined in this way can become very large due to regions of long-range LD or multiple loci close together. Therefore, the finemapping pipelines impose a maximum region width, set by the user, with a default width of 6Mb. When the maximum width is exceeded, the original 3Mb (±1.5Mb) windows are shrunk by 10% and nearby regions are then merged again into a single region. This process repeats, with the region windows being iteratively shrunk by 10% and then merged if overlapping, until no merged region exceeds the maximum region width. We recommend not setting the maximum region width to more than 10Mb, as large regions are likely to cause workflow failure due to memory limitations.
Note: In older versions of the finemapping pipelines, a lower limit of 1Mb was imposed on the window size. Any regions reduced to less than 1Mb in size were skipped, and users were notified in the results region status file. This is not implemented in the current pipelines.
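The region selection steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's actual code: the function names, the chromosome label "6", the strict `<` comparison against the P-value threshold, treating touching windows as overlapping, and measuring width as `end - start` are all assumptions made for the example. The order in which the real pipeline applies HLA removal relative to merging is not stated here, so HLA removal is shown as a separate helper.

```python
GWS_P = 5e-8                         # P-value threshold from the text
HLA = ("6", 25_000_000, 34_000_000)  # inclusive bounds; the label "6" is an assumption


def merge_windows(positions, half_width):
    """Place a +/- half_width window on each position and merge overlapping windows."""
    regions = []
    for pos in sorted(positions):
        start, end = max(1, pos - half_width), pos + half_width
        if regions and start <= regions[-1][1]:  # touching counts as overlap (assumption)
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([start, end])
    return [tuple(region) for region in regions]


def select_regions(variants, window=3_000_000, max_width=6_000_000):
    """variants: iterable of (position, p_value) pairs on a single chromosome."""
    if max_width < 0:
        raise ValueError("max_width must be non-negative")
    hits = [pos for pos, p_value in variants if p_value < GWS_P]
    half_width = window // 2
    while True:
        regions = merge_windows(hits, half_width)
        if all(end - start <= max_width for start, end in regions):
            return regions
        half_width = half_width * 9 // 10  # shrink the windows by 10%, then merge again


def remove_hla(chrom, start, end):
    """Return the parts of an inclusive region that lie outside the HLA region."""
    hla_chrom, hla_start, hla_end = HLA
    if chrom != hla_chrom or end < hla_start or start > hla_end:
        return [(start, end)]
    kept = []
    if start < hla_start:
        kept.append((start, hla_start - 1))
    if end > hla_end:
        kept.append((hla_end + 1, end))
    return kept
```

For example, three significant variants spaced 2Mb apart first merge into a single 7Mb region, which exceeds the default 6Mb maximum. In this sketch the windows are shrunk four times before the merged region splits into three separate regions that each fit within the limit.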
Fine-mapping of regions
These regions are then finemapped using both FINEMAP and SuSiE. More information about the methods can be found in the release finemapping documentation in the release data bucket, e.g. at /finngen/library-green/finngen_R12/finngen_R12_analysis_documentation/finngen_R12_finemap.md for R12, as well as in the finemapping pipeline repository here.
What variants are included in the finemapping process?
Finemapping is performed on variants inside a region that fulfill the following prerequisites:
They are included in the GWAS summary statistics for that endpoint
Their INFO score for the data release was greater than 0.6
They are not in the HLA region (chr6:25,000,000-34,000,000 inclusive)
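The three prerequisites can be expressed as a single predicate. The sketch below is only an illustration: the function name, its arguments, and the chromosome label "6" are hypothetical and are not taken from the pipeline.

```python
HLA_CHROM, HLA_START, HLA_END = "6", 25_000_000, 34_000_000  # inclusive bounds from the text


def is_finemappable(chrom, pos, info, in_sumstats=True, min_info=0.6):
    """Return True if a variant meets all three prerequisites listed above."""
    in_hla = chrom == HLA_CHROM and HLA_START <= pos <= HLA_END
    return in_sumstats and info > min_info and not in_hla
```

Note that the INFO comparison is strict, matching "greater than 0.6" in the list above.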
How to finemap your custom GWAS results?
The preferred way to get your custom GWAS endpoint finemapped is with the unmodifiable finemapping pipeline. After a successful execution, the results will appear in the userresults browser and in your endpoint's green library folder as before.
While we recommend using the unmodifiable finemapping pipeline, there is also the older modifiable finemapping pipeline, which can be used, for example, for finemapping custom regions that you explicitly define. In contrast to the unmodifiable pipeline, the results from the modifiable pipeline are not automatically uploaded to the green library folder, and accessing the results outside of the sandbox requires a download request.
How to access the results?
Unmodifiable pipeline
The finemapping results from the unmodifiable pipeline are available in two places: in the userresults browser and in the green library. The files can be found under /finngen/library-green/finngen_RX/sandbox_custom_gwas/YOUR_ENDPOINT_NAME/finemap/ where X is the FinnGen release (e.g. 12).
Finemapped endpoints are automatically loaded to the userresults PheWeb browser. You can find the finemap data when examining a single genome-wide significant region or in the Credible Sets tab right below the Manhattan plot. Unfortunately, finemapped results are not yet available in the phenotype view.
You can get to individual regions by first going to your endpoint in the userresults browser, and then clicking either on a GWAS peak in the Manhattan plot or on the 'locus' link in the table, as shown in the image below.

In the region view, the credible set data is shown both as a listing of how many signals were found by SuSiE and by FINEMAP, and as a LocusZoom plot. These have been highlighted in red in the image below.

You can find the finemapping data in the green library in the folder /finngen/library-green/finngen_R12/sandbox_custom_gwas/PHENOTYPE/finemap, given release 12 and phenotype PHENOTYPE. Note that if this phenotype has not been finemapped, the finemap subfolder will not exist.
Modifiable pipeline
If you used the modifiable finemapping pipeline to generate your results, you need to submit a download request to get the results out of the Sandbox. Unfortunately, there is no automatic upload of the results to the green library or the userresults browser.
How are the results structured?
The finemapping results follow the structure of the finemapping results of the core analysis results. If you have used the unmodifiable finemapping pipeline, the results can be found in /finngen/library-green/finngen_RX/sandbox_custom_gwas/PHENOTYPE/finemap where X is the FinnGen release (e.g. 12) and PHENOTYPE is the name of your phenotype. Some of the files are in this top-level directory, while others are in nested directories. The folder contains the region selection outputs as well as the FINEMAP and SuSiE outputs.
Here is a table describing each of those files or directories:

| File or directory | Description |
| --- | --- |
| had_results | This file tells if there were any regions to finemap in your endpoint. It will contain the text "True" if there were regions that were sent to finemapping, and "False" if there were no regions to finemap. Having regions to finemap in this context means the endpoint had genome-wide significant (GWS) variants. |
| PHENOTYPE.region_status | This tab-separated file (TSV) shows a brief summary of the regions identified in region selection. See here for the format of this file. |
| too_many_regions | This file contains the word "True" if your endpoint contained too many regions to finemap (currently the limit is set to 300 regions). |
| finemap/ | This folder contains the finemapped results of FINEMAP. |
| susie/ | This folder contains the finemapped results of SuSiE. |
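As a quick check before opening individual result files, the two flag files can be read programmatically. The sketch below is only an illustration: the helper names are hypothetical, and it assumes the flag files contain just the word "True" or "False" (possibly followed by a newline). A flag file that is absent is reported as `None`, since this page does not state whether `too_many_regions` is always written.

```python
from pathlib import Path


def finemap_dir(release, phenotype, root="/finngen/library-green"):
    """Build the results folder path for a given release number and phenotype."""
    return Path(root) / f"finngen_R{release}" / "sandbox_custom_gwas" / phenotype / "finemap"


def read_flag(path):
    """Return True/False for a flag file, or None if the file does not exist."""
    path = Path(path)
    if not path.is_file():
        return None
    return path.read_text().strip() == "True"


def summarize_status(release, phenotype, root="/finngen/library-green"):
    """Return the flag values, or None if the phenotype has no finemap folder."""
    folder = finemap_dir(release, phenotype, root)
    if not folder.is_dir():
        return None  # the folder is missing when the phenotype has not been finemapped
    return {
        "had_results": read_flag(folder / "had_results"),
        "too_many_regions": read_flag(folder / "too_many_regions"),
    }
```

Calling `summarize_status(12, "PHENOTYPE")` inside the Sandbox would then return either `None` or a small dictionary with the two flags.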
Next, the contents of the region status file, as well as finemap and susie folders are described.
finemap folder
The finemap folder contains the files and folders listed in the table below. The file contents and formats are described in the Finemapping results format page.

| File or directory | Description |
| --- | --- |
| PHENOTYPE.FINEMAP.config.bgz | A bgzipped, tab-separated file containing the posterior summaries for each causal configuration, one per line. |
| PHENOTYPE.FINEMAP.region.bgz | A bgzipped, tab-separated file containing each region and the probabilities of the predicted causal variant configurations. |
| PHENOTYPE.FINEMAP.snp.bgz | A bgzipped, tab-separated file containing the credible set status for each of the SNPs in the finemapped regions. |
| PHENOTYPE.FINEMAP.snp.bgz.tbi | A tabix index file for the snp file. |
| cred_regions/ | A folder containing the individual credible set predictions, with one file per model assuming k causal SNPs. For example, a file ending with .cred3 has the predictions for the scenario that there are 3 independent causal variants in the region, and therefore 3 credible sets in the region. |
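The number of causal variants assumed by a file in cred_regions can be read from its suffix. The following is a small sketch of that idea; the helper name and the example file names are hypothetical, and only the `.credK` suffix convention comes from the description above.

```python
import re

_CRED_SUFFIX = re.compile(r"\.cred(\d+)$")


def causal_count(filename):
    """Return k for a file name ending in .credK, or None if there is no such suffix."""
    match = _CRED_SUFFIX.search(filename)
    return int(match.group(1)) if match else None
```

This makes it straightforward to group the files in cred_regions by the assumed number of causal variants.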
susie folder
The susie folder contains the files listed in the table below. The file contents and formats are described in the Finemapping results format page.

| File | Description |
| --- | --- |
| PHENOTYPE.SUSIE.cred.bgz | A bgzipped TSV containing SuSiE per-credible set output. |
| PHENOTYPE.SUSIE.cred.summary.tsv | A summary of the SuSiE credible set output, in TSV form. |
| PHENOTYPE.SUSIE.cred_99.bgz | A bgzipped TSV containing SuSiE per-credible set output for 99% credible sets. |
| PHENOTYPE.SUSIE.snp.bgz | SuSiE output for every variant in the regions inspected. |
| PHENOTYPE.SUSIE.snp.bgz.tbi | A tabix index file for PHENOTYPE.SUSIE.snp.bgz. |
| PHENOTYPE.SUSIE.snp.filter.tsv | Filtered SuSiE SNP output for 95% credible sets. |
| PHENOTYPE.SUSIE_99.cred.summary.tsv | A summary of the SuSiE 99% credible set output, in TSV form. |
| PHENOTYPE.SUSIE_99.snp.filter.tsv | Filtered SuSiE SNP output for 99% credible sets, in TSV form. |
| PHENOTYPE.SUSIE_EXTEND.cred.summary.tsv | A summary of the SuSiE 95% credible sets extended with 99% credible set variants, in TSV form. |
| PHENOTYPE.SUSIE_EXTEND.snp.filter.tsv | Filtered SuSiE SNP output for 95% credible sets, extended with 99% credible set variants, in TSV form. |
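The bgzipped TSV files in the finemap and susie folders can be read with standard gzip tools, because bgzip output is a series of concatenated gzip blocks. The sketch below uses only the Python standard library. The helper name is hypothetical, and it assumes that the first line of each file is a header row; check this against the Finemapping results format page before relying on it.

```python
import csv
import gzip


def read_bgz_tsv(path, limit=None):
    """Yield rows of a bgzipped TSV as dicts keyed by the header line."""
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for index, row in enumerate(reader):
            if limit is not None and index >= limit:
                break
            yield row
```

For example, `list(read_bgz_tsv("PHENOTYPE.SUSIE.snp.bgz", limit=5))` would return the first five rows without loading the whole file into memory. For queries by genomic region, the accompanying .tbi index can be used with tabix instead.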