FinnGen Handbook
Finemapping of Custom GWAS analyses


Last updated 1 month ago


This page explains the following:

  1. How is finemapping performed for Custom GWAS analyses?

  2. How to finemap your custom GWAS results?

  3. How to access the data?

  4. What data is available and how it is structured?

Finemapping process

The finemapping process consists of two steps: region selection, followed by the actual fine-mapping of the selected regions.

Region selection algorithm

In short, region selection picks the regions that contain genome-wide significant variants for finemapping. Some regions are too large to finemap, in which case they are marked as not possible to finemap.

In more detail, the region selection algorithm works as follows. Taking the summary statistics as input, it expands a window around each genome-wide significant variant, using a window size of 3 Mbp and a significance threshold of 5e-8. Any overlapping windows are then merged. Ideally, that would be the end of region selection. For practical reasons, however, we cannot finemap arbitrarily large regions, so region width is capped at 6 Mbp, which the merged regions sometimes exceed. Each region that is too large is re-formed using a window 10% smaller than in the previous attempt, down to a minimum window size of 1 Mbp. In most cases the region splits into smaller, manageable regions. If the 1 Mbp minimum is reached without producing finemappable regions, we give up on that region and mark it as not possible to finemap in the outputs.
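The procedure described above can be sketched roughly as follows. This is an illustration only, not the pipeline's actual code: the function names are hypothetical, positions are assumed to lie on a single chromosome, and the window is assumed to be a total width centred on each variant (the pipeline may instead extend the full window on each side).

```python
from typing import List, Tuple

Region = Tuple[int, int]


def merge_windows(positions: List[int], window: int) -> List[Region]:
    """Centre a window of total width `window` on each position and merge overlaps."""
    half = window // 2  # assumption: window is the total width, half on each side
    regions: List[Region] = []
    for pos in sorted(positions):
        start, end = max(0, pos - half), pos + half
        if regions and start <= regions[-1][1]:
            # Overlaps the previous window: extend it.
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


def select_regions(positions, window=3_000_000, max_width=6_000_000,
                   min_window=1_000_000):
    """Return (ok, failed) lists of (start, end, window_used) tuples."""
    ok, failed = [], []
    pending = [(sorted(positions), window)]  # (variants, window size to try)
    while pending:
        pts, w = pending.pop()
        for start, end in merge_windows(pts, w):
            if end - start <= max_width:
                ok.append((start, end, w))
            elif w <= min_window:
                # Still too wide at the smallest window: give up on this region.
                failed.append((start, end, w))
            else:
                # Retry only this region with a 10% smaller window.
                members = [p for p in pts if start <= p <= end]
                pending.append((members, max(min_window, w * 9 // 10)))
    return sorted(ok), sorted(failed)
```

For instance, `select_regions([10_000_000, 12_000_000, 14_000_000])` first produces one merged region that is wider than 6 Mbp, and under the assumptions above it only splits into three separate regions after the window has been shrunk several times.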

Fine-mapping of regions

These regions are then finemapped using both FINEMAP and SuSiE. More information about the methods is available in the release finemapping documentation, located in the release data bucket at green_library/finngen_R12/finngen_analysis_documentation/finngen_R12_finemap.md, and in the finemapping pipeline repository.

What variants are included in the finemapping process?

Finemapping is performed on variants inside a region that meet both of the following criteria:

  1. They are included in the GWAS summary statistics for that endpoint.

  2. Their INFO score for the data release is greater than 0.6.
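As a small illustration of these two criteria, the filter could be expressed as below. The function name and the data structures (a collection of variant identifiers and a dictionary of INFO scores) are hypothetical, and treating a variant with no INFO score as excluded is an assumption of this sketch.

```python
def eligible_for_finemapping(sumstat_variants, info_scores, min_info=0.6):
    """Keep variants present in the summary statistics with INFO > min_info."""
    return {
        variant
        for variant in sumstat_variants
        # Assumption: a variant without an INFO score is not eligible.
        if info_scores.get(variant, 0.0) > min_info
    }
```

Note that the comparison is strict, so a variant with an INFO score of exactly 0.6 is dropped.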

How to finemap your custom GWAS results?

Data availability

The finemapping results are available in two places: in the userresults PheWeb browser and in the green library.

Finemapped endpoints are automatically loaded into the PheWeb browser, where you can find the finemapping data when examining a single genome-wide significant region. Unfortunately, fine-mapped results are not yet listed in the phenotype view.

To get to an individual region, first go to your endpoint in userresults PheWeb, and then click either on a GWAS peak in the Manhattan plot or on the 'locus' link in the table, as shown in the image below.

In the region view, the credible set data is shown both as a listing of how many signals were found by SuSiE and by FINEMAP, and as a LocusZoom plot. These are highlighted in red in the image below.

You can find the finemapping data in the green library under green_library/finngen_R12/sandbox_custom_gwas/PHENOTYPE/finemap, where R12 is the data release and PHENOTYPE is the name of your phenotype. Note that if the phenotype has not been finemapped, the finemap subfolder does not exist.

Available files

All of the finemapping results are in the bucket folder /green_library/finngen_R12/sandbox_custom_gwas/PHENOTYPE/finemap. Some of the files are in this top-level directory, while others are in nested directories. The folder contains the region selection outputs as well as the FINEMAP and SuSiE outputs.

Here is a table describing each of those files or directories:

Filename
Description

had_results

This file tells if there were any regions to finemap in your endpoint. It will contain the text "True" if there were regions that were sent to finemapping, and "False" if there were no regions to finemap. Having regions to finemap in this context means the endpoint had genome-wide significant (GWS) variants.

PHENOTYPE.region_status

This tab-separated file (TSV) shows a brief summary of the regions identified in region selection.

too_many_regions

This file contains the word "True" if your endpoint contained too many regions to finemap (currently the limit is set to 300 regions).

finemap/

This folder contains the FINEMAP results.

susie/

This folder contains the SuSiE results.
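The flag files in the table above can be checked programmatically before looking for results. The following is a minimal sketch, assuming the finemap folder is reachable as an ordinary directory from where the code runs; the function name, the returned strings, and the order of the checks are illustrative choices, not part of the pipeline.

```python
from pathlib import Path


def finemap_status(finemap_dir: Path) -> str:
    """Summarise the state of a custom GWAS finemap folder from its flag files."""
    if not finemap_dir.is_dir():
        # The finemap subfolder does not exist if the phenotype was not finemapped.
        return "not finemapped"

    def flag(name: str) -> bool:
        path = finemap_dir / name
        return path.is_file() and path.read_text().strip() == "True"

    if flag("too_many_regions"):
        return "too many regions"
    if not flag("had_results"):
        return "no genome-wide significant regions"
    return "finemapped"
```

Here `finemap_dir` would point at the PHENOTYPE/finemap folder described above, however that bucket path is exposed in your environment.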

Next, the contents of the region status file, as well as finemap and susie folders are described.

Region status file

The region status file is a tab-separated file that tells which regions were sent to finemapping and if there were any problems that prevented finemapping. It has the following columns:

Column name
Description

region

The span of the region, specified in chromosomal coordinates chromosome.start-end

status

Status of the region, either "OK" if the region was passed on to finemapping, or "Failure" if the region was not successfully formed.

windowsize

The window size used when determining the region. Region selection works by extending a window (in base pairs) around each genome-wide significant variant. If windows overlap each other, they are merged. These possibly merged windows are the resulting regions that are finemapped. If a region is larger than the maximum allowed region size (currently 6 megabases), that region is retried with a smaller window. The window size shown here is the final one that was tried.

failure

Empty if the region was successful. If the region was not successful, the reason is given here. Most likely the region was too long and could not be formed even when the window size was lowered to its minimum value.

For example, it might be that there is a genome-wide significant region that is 10 Mbp or even 20 Mbp long. In those cases, it is likely that the region selection algorithm will not be able to narrow down the region into one that can be finemapped.
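To list the regions that could not be finemapped, the region status file can be read with Python's standard csv module. This sketch assumes the file starts with a header line containing the column names listed above; the function name is hypothetical.

```python
import csv
import io


def failed_regions(handle):
    """Return (region, windowsize, failure) for every region whose status is not OK."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        (row["region"], int(row["windowsize"]), row["failure"])
        for row in reader
        if row["status"] != "OK"
    ]


# Invented example content, for illustration only (region names are placeholders).
example = io.StringIO(
    "region\tstatus\twindowsize\tfailure\n"
    "REGION_A\tOK\t3000000\t\n"
    "REGION_B\tFailure\t1000000\tregion too long\n"
)
print(failed_regions(example))
```

With a real file, pass an open file handle for PHENOTYPE.region_status instead of the in-memory example.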

finemap folder

The finemap folder contains the following files and folders:

Filename
Description

PHENOTYPE.FINEMAP.config.bgz

A bgzipped, tab-separated file containing the posterior summaries for each causal configuration, one per line

PHENOTYPE.FINEMAP.region.bgz

A bgzipped, tab-separated file containing each region and the probabilities of the predicted causal variant configurations

PHENOTYPE.FINEMAP.snp.bgz

A bgzipped, tab-separated file containing the credible set status for each of the SNPs in the finemapped regions.

PHENOTYPE.FINEMAP.snp.bgz.tbi

A tabix index file for the snp file

cred_regions/

A folder containing the individual credible set predictions, with one file per model, where each model assumes k causal SNPs. For example, a file ending with .cred3 has the predictions for the scenario in which there are 3 independent causal variants in the region, and therefore 3 credible sets in the region.

The files are described in more detail below.

PHENOTYPE.FINEMAP.config.bgz

This file contains posterior summaries for all of the causal configurations, one per line. The columns are described in the following table:

Column name
Description

rank

ranking of this configuration

config

the SNP identifiers

prob

posterior probability of the configuration being the causal configuration

log10bf

log10 Bayes factor of the configuration. The Bayes factor quantifies the evidence for the causal configuration over the null configuration (no causal variants)

odds

Odds of the causal configuration

k

number of SNPs in the causal configuration

prob_norm_k

posterior probability of this configuration being the causal configuration, normalized over the set of configurations with the same number of causal variants

h2

heritability contribution of SNPs

h2_0.95CI

95% credible interval of heritability contribution of SNPs

mean

mean of joint posterior effect size

sd

standard deviation of joint posterior effect size

More information can be found at http://www.christianbenner.com/
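One way to read the description of prob_norm_k is that each configuration's posterior probability is divided by the total probability of all configurations with the same k. The sketch below illustrates that reading only. It is not taken from FINEMAP itself, and if the file lists only a subset of configurations, the values recomputed this way need not match the prob_norm_k column.

```python
from collections import defaultdict


def normalise_within_k(configs):
    """configs: list of (k, prob) pairs. Returns prob normalised within each k."""
    totals = defaultdict(float)
    for k, prob in configs:
        totals[k] += prob
    return [prob / totals[k] for k, prob in configs]


# Two single-SNP configurations and one two-SNP configuration (invented numbers).
print(normalise_within_k([(1, 0.5), (1, 0.25), (2, 0.25)]))
```

In this invented example the two single-SNP configurations share the k = 1 probability mass in a 2:1 ratio, while the lone two-SNP configuration gets a normalised probability of 1.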

PHENOTYPE.FINEMAP.region.bgz

This bgzipped, tab-separated value file contains all of the finemapped regions for the endpoint, one region per line.

Column name
Description

trait

phenotype in question

region

finemapped region

h2g

Model-averaged heritability

h2g_sd

Model-averaged heritability, standard deviation

h2g_lower95

lower bound of the heritability 95% credible interval

h2g_upper95

upper bound of the heritability 95% credible interval

log10bf

log10 Bayes factor for the region

prob_1..LSNP

Posterior probability for the number of causal SNPs (= number of credible sets) from 1 to L, where L is the maximum number of causal SNPs considered

expectedvalue

Expected number of causal SNPs in the genomic region

More information can be found at http://www.christianbenner.com/
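The expectedvalue column can be understood as the probability-weighted average of the number of causal SNPs. A minimal sketch of that calculation, assuming the prob_1..LSNP values of one region have been collected into a list in order (the function name is hypothetical):

```python
def expected_causal(probs):
    """probs[i] is the posterior probability of i + 1 causal SNPs in the region."""
    return sum(k * p for k, p in enumerate(probs, start=1))


# Invented probabilities for 1, 2 and 3 causal SNPs.
print(expected_causal([0.5, 0.25, 0.25]))
```

Any probability assigned to zero causal SNPs contributes nothing to this sum, so it does not need to be included in the list.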

PHENOTYPE.FINEMAP.snp.bgz

This tabix-indexed, bgzipped file contains finemapping information for each of the SNPs that were finemapped. The file is a tab-separated value (TSV) file with one variant per line. The columns of the file are described in the table below:

| Column name | Description |
| --- | --- |
| trait | phenotype in question |
| region | finemapped region |
| v | variant identifier, in the form chromosome:position:ref:alt |
| index | index |
| rsid | variant identifier in the form 'chr'chromosome_position_ref_alt |
| chromosome | chromosome of the variant, prefixed with 'chr' |
| position | chromosomal position of the variant |
| allele1 | reference allele of the variant |
| allele2 | alternate allele of the variant |
| maf | minor allele frequency of the variant |
| beta | effect size of the variant in the GWAS summary statistic |
| se | standard error for the variant in the GWAS summary statistic |
| z | z-score for the variant |
| prob | Posterior Inclusion Probability for this variant, i.e. the probability that this variant is causal |
| log10bf | log10 of the Bayes factor. The Bayes factor quantifies the evidence that the variant is causal. |
| mean | This column contains the marginalized shrinkage estimates of the posterior effect size mean for the alternate allele. The marginalized shrinkage estimate for a SNP is computed by averaging the posterior effect size means of this SNP from all causal configurations in the PHENOTYPE.FINEMAP.config.bgz file, assuming that the effect size of the SNP is zero if the SNP is not in the causal configuration. |
| sd | This column contains the marginalized shrinkage estimates of the posterior effect size standard deviation. The estimates are computed in the same way as the marginalized shrinkage estimates of the posterior effect size mean. |
| mean_incl | This column contains the conditional estimates of the posterior effect size mean for the alternate allele. The conditional estimate for a SNP is computed by averaging the posterior effect size means of this SNP from the causal configurations in the PHENOTYPE.FINEMAP.config.bgz file in which it is included. |
| sd_incl | This column contains the conditional estimates of the posterior effect size standard deviation. The estimates are computed in the same way as the conditional estimates of the posterior effect size mean. |
| p | p-value of the association in the summary statistic |
| csx | credible set index for given number of causal variants x |
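The difference between the marginalized shrinkage estimate (mean) and the conditional estimate (mean_incl) can be sketched as follows. This is only an illustration: it assumes the averaging is weighted by each configuration's posterior probability, and the data layout (`snps` and `means` tuples), the function name, and the toy values are hypothetical rather than the actual file layout or FINEMAP's implementation.

```python
def effect_size_summaries(configs, snp):
    """Return (marginalized mean, conditional mean) for one SNP.

    configs: list of dicts with keys
      'prob'  - posterior probability of the configuration,
      'snps'  - tuple of SNP identifiers in the configuration,
      'means' - tuple of posterior effect size means, aligned with 'snps'.
    """
    weighted_sum = 0.0  # sum of prob * mean over configurations containing snp
    incl_prob = 0.0     # total probability of configurations containing snp
    total_prob = 0.0    # total probability of all configurations
    for c in configs:
        total_prob += c["prob"]
        if snp in c["snps"]:
            effect = c["means"][c["snps"].index(snp)]
            weighted_sum += c["prob"] * effect
            incl_prob += c["prob"]
    # Marginalized: configurations without the SNP contribute an effect of zero.
    mean = weighted_sum / total_prob if total_prob > 0 else 0.0
    # Conditional: average only over configurations that include the SNP.
    mean_incl = weighted_sum / incl_prob if incl_prob > 0 else float("nan")
    return mean, mean_incl


# Hypothetical toy configurations.
toy_configs = [
    {"prob": 0.6, "snps": ("rs1",), "means": (0.2,)},
    {"prob": 0.3, "snps": ("rs2",), "means": (0.1,)},
    {"prob": 0.1, "snps": ("rs1", "rs2"), "means": (0.3, 0.05)},
]
mean_rs1, mean_incl_rs1 = effect_size_summaries(toy_configs, "rs1")
```

In this toy case the marginalized estimate for rs1 (about 0.15) is pulled toward zero relative to the conditional estimate (about 0.214), because 30% of the posterior mass sits on a configuration that excludes rs1.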

PHENOTYPE.FINEMAP.snp.bgz.tbi

The tabix index file for PHENOTYPE.FINEMAP.snp.bgz. It is not used directly, but it needs to be in the same folder as the snp file if tabix is used to search the file.
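Because bgzip output is gzip-compatible, a bgzipped TSV can also be read sequentially with Python's standard library, without tabix. The sketch below is a minimal example under two assumptions: the file starts with a header row carrying the column names from the table above, and the helper name `read_bgz_tsv` is hypothetical.

```python
import csv
import gzip


def read_bgz_tsv(path, min_prob=0.0):
    """Yield rows (as dicts) whose 'prob' value is at least min_prob.

    Reads the whole file sequentially; the .tbi index is not used here.
    """
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            if float(row["prob"]) >= min_prob:
                yield row
```

For example, `list(read_bgz_tsv("PHENOTYPE.FINEMAP.snp.bgz", min_prob=0.5))` would collect the variants with a posterior inclusion probability of at least 0.5. For region queries on large files, tabix with the index file remains the more efficient choice.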

susie folder

The susie folder contains the following files:

| Filename | Description |
| --- | --- |
| PHENOTYPE.SUSIE.cred.bgz | A bgzipped TSV containing SUSIE per-credible set output. |
| PHENOTYPE.SUSIE.cred.summary.tsv | A summary of the SUSIE credible set output, in TSV form. |
| PHENOTYPE.SUSIE.cred_99.bgz | A bgzipped TSV containing SUSIE per-credible set output for 99% credible sets. |
| PHENOTYPE.SUSIE.snp.bgz | SUSIE output for every variant in the regions inspected. |
| PHENOTYPE.SUSIE.snp.bgz.tbi | A tabix index file for PHENOTYPE.SUSIE.snp.bgz. |
| PHENOTYPE.SUSIE.snp.filter.tsv | Filtered SUSIE SNP output for 95% credible sets. |
| PHENOTYPE.SUSIE_99.cred.summary.tsv | A summary of the SUSIE 99% credible set output, in TSV form. |
| PHENOTYPE.SUSIE_99.snp.filter.tsv | Filtered SUSIE SNP output for 99% credible sets, in TSV form. |
| PHENOTYPE.SUSIE_EXTEND.cred.summary.tsv | A summary of the SUSIE 95% credible sets extended with 99% variants, in TSV form. |
| PHENOTYPE.SUSIE_EXTEND.snp.filter.tsv | Filtered SUSIE SNP output for 95% credible sets, extended with 99% CS variants, in TSV form. |

The files are described in more detail below:

PHENOTYPE.SUSIE.cred.bgz

This file contains all of the credible sets for this phenotype. The credible sets are the 95% credible sets, i.e. under the model they have a 95% probability of containing the causal variant. The file is a bgzipped tab-separated values file, with one credible set per line. The columns are described in the following table:

| Column name | Description |
| --- | --- |
| trait | phenotype in question |
| region | region which was finemapped, formatted as chrCHROMOSOME:START-END |
| cs | credible set index. The credible set index can be used to match credible sets with their variants. |
| cs_log10bf | Log10 of the credible set's Bayes factor. This quantifies the evidence for the model. For example, a value of 3 means that the model with this specific credible set had a 10^3 = 1000 times larger likelihood than the null model without that credible set. |
| cs_avg_r2 | Average r2 between the credible set's variants |
| cs_min_r2 | Minimum r2 between the credible set's variants |
| low_purity | This will be 'TRUE' if the minimum r2 between all variants in the credible set was less than a given threshold, currently 0.25, and 'FALSE' otherwise. |
| cs_size | Size of the credible set, as the number of variants included. |
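The purity columns can be illustrated with a small sketch that derives an average r2, a minimum r2, and a low_purity flag from a correlation matrix of the variants in one credible set. This is not the pipeline's code: the function name is hypothetical, and averaging over off-diagonal pairs only and treating a single-variant set as pure are assumptions.

```python
from itertools import combinations


def purity_summary(r, threshold=0.25):
    """Summarize pairwise r2 for the variants of one credible set.

    r: symmetric correlation matrix as a list of lists.
    threshold: minimum r2 below which the set is flagged as low purity.
    """
    # Squared correlation for every distinct pair of variants.
    r2 = [r[i][j] ** 2 for i, j in combinations(range(len(r)), 2)]
    if not r2:
        # Assumption: a single-variant credible set counts as pure.
        return {"cs_avg_r2": 1.0, "cs_min_r2": 1.0, "low_purity": False}
    cs_min_r2 = min(r2)
    return {
        "cs_avg_r2": sum(r2) / len(r2),
        "cs_min_r2": cs_min_r2,
        "low_purity": cs_min_r2 < threshold,
    }


# Hypothetical correlations between three variants.
toy_r = [
    [1.0, 0.9, 0.4],
    [0.9, 1.0, 0.6],
    [0.4, 0.6, 1.0],
]
summary = purity_summary(toy_r)
```

In the toy matrix the weakest pair has r = 0.4, so the minimum r2 is about 0.16 and the set would be flagged as low purity under the 0.25 threshold.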

PHENOTYPE.SUSIE.cred_99.bgz

This file contains the 99% credible sets for this phenotype. A 99% credible set is one that, under the model, has a 99% probability of containing the causal variant. The columns are otherwise the same as in PHENOTYPE.SUSIE.cred.bgz.

PHENOTYPE.SUSIE.cred.summary.tsv

This file contains a summary of the credible sets for this phenotype. The credible sets are the 95% credible sets, i.e. under the model they have a 95% probability of containing the causal variant. The file is a tab-separated values file, with one credible set per line. The columns are described in the following table:

| Column name | Description |
| --- | --- |
| trait | phenotype in question |
| region | region which was finemapped, formatted as chrCHROMOSOME:START-END |
| cs | credible set index. The credible set index can be used to match credible sets with their variants. |
| cs_log10bf | Log10 of the credible set's Bayes factor. This quantifies the evidence for the model. For example, a value of 3 means that the model with this specific credible set had a 10^3 = 1000 times larger likelihood than the null model without that credible set. |
| cs_avg_r2 | Average r2 between the credible set's variants |
| cs_min_r2 | Minimum r2 between the credible set's variants |
| low_purity | This will be 'True' if the minimum r2 between all variants in the credible set was less than a given threshold, currently 0.25, and 'False' otherwise. |
| cs_size | Size of the credible set, as the number of variants included. |
| good_cs | This column is currently the inverse of the low_purity column, and indicates whether the credible set consists of variants that are in reasonably strong LD with each other. |
| cs_id | Unique identifier of the credible set, consisting of the credible set region and credible set index in the following format: REGION_CS_INDEX |
| v | Credible set lead variant (largest PIP in the credible set), in format CHROMOSOME:POSITION:REF:ALT |
| rsid | Credible set lead variant in format chrCHROMOSOME_POSITION_REF_ALT |
| p | lead variant p-value |
| beta | lead variant effect size |
| sd | lead variant effect standard error |
| prob | lead variant PIP in the region |
| cs_specific_prob | lead variant PIP in this specific credible set. This and the prob column are almost always equal or very close to each other. |
| most_severe | most severe predicted effect of this variant |
| gene_most_severe | gene in which the most severe predicted effect of this variant is located |

PHENOTYPE.SUSIE_99.cred.summary.tsv

This file contains a summary of the credible sets for this phenotype. The credible sets are the 99% credible sets, i.e. under the model they have a 99% probability of containing the causal variant. The columns are otherwise the same as in PHENOTYPE.SUSIE.cred.summary.tsv.

PHENOTYPE.SUSIE_EXTEND.cred.summary.tsv

This file contains a summary of the credible sets for this phenotype. The credible sets are the 95% credible sets, i.e. under the model they have a 95% probability of containing the causal variant, but they have been extended with the 99% credible set variants where possible. The columns are otherwise the same as in PHENOTYPE.SUSIE.cred.summary.tsv.

PHENOTYPE.SUSIE.snp.bgz

This file contains SUSIE data for all of the variants in all of the regions. The file is in bgzipped, tabixed tab-separated value form. Each line contains one variant. The columns are described in the table below:

| Column name | Description |
| --- | --- |
| trait | phenotype in question |
| region | region which was finemapped, formatted as chrCHROMOSOME:START-END |
| v | variant identifier in format CHROMOSOME:POSITION:REF:ALT |
| rsid | variant identifier in format chrCHROMOSOME_POSITION_REF_ALT |
| chromosome | chromosome of variant, in format chrCHROMOSOME |
| position | chromosomal position of the variant |
| allele1 | variant reference allele |
| allele2 | variant alternate allele |
| maf | variant alternate allele frequency |
| beta | variant effect size in summary statistic |
| se | variant effect standard error in summary statistic |
| p | variant p-value in summary statistic |
| mean | posterior mean beta after fine-mapping |
| sd | posterior standard deviation after fine-mapping |
| prob | posterior inclusion probability (PIP) in this region |
| cs | credible set index, can be used to reference the credible set in this region |
| cs_specific_prob | posterior inclusion probability (PIP) for this variant in its credible set. Almost always nearly equal to the prob column. |
| low_purity | Whether the minimum r2 between variants in this credible set was lower than 0.25. |
| lead_r2 | Pearson correlation with the credible set lead variant |
| mean_99 | posterior mean beta after fine-mapping, for 99% credible set |
| sd_99 | posterior standard deviation after fine-mapping, for 99% credible set |
| prob_99 | posterior inclusion probability (PIP) in this region, for 99% credible set |
| cs_99 | 99% credible set index |
| cs_specific_prob_99 | posterior inclusion probability (PIP) for this variant in its 99% credible set. Almost always nearly equal to the prob_99 column. |
| low_purity_99 | Whether the minimum r2 between variants in this 99% credible set was lower than 0.25. |
| lead_r2_99 | Pearson correlation with the 99% credible set lead variant |
| alpha1..L | posterior inclusion probability for the x-th single effect (x := 1..L, where L is the number of single effects (causal variants) specified; default: L = 10) |
| mean1..L | posterior mean beta for the x-th single effect (x := 1..L, where L is the number of single effects (causal variants) specified; default: L = 10) |
| sd1..L | posterior standard deviation for the x-th single effect (x := 1..L, where L is the number of single effects (causal variants) specified; default: L = 10) |
| lbf_variable1..L | Log-Bayes factor for each variant and effect, conditional on all other signals |
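To show how the per-effect alpha columns relate to a variant-level PIP, the sketch below applies the usual SuSiE combination rule, PIP = 1 - prod(1 - alpha_x) over the single effects. Whether the prob column is produced exactly this way in this pipeline (for example, whether some effects are dropped first) is an assumption; the function name and numbers are hypothetical.

```python
import math


def pip_from_alphas(alphas):
    """Combine per-effect inclusion probabilities into one PIP.

    alphas: iterable of alpha values (one per single effect) for a variant.
    The variant is excluded overall only if it is excluded from every effect.
    """
    return 1.0 - math.prod(1.0 - a for a in alphas)


# Hypothetical alpha1..alpha3 values for one variant.
toy_pip = pip_from_alphas([0.5, 0.2, 0.0])  # 1 - 0.5 * 0.8 * 1.0
```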

PHENOTYPE.SUSIE.snp.bgz.tbi

Tabix index file for the SUSIE snp file.

PHENOTYPE.SUSIE.snp.filter.tsv

This file contains the filtered SNPs for the 95% credible sets. Variants outside the 95% credible sets are excluded, as are variants that were part of low_purity credible sets. The file is in tab-separated value form, with one variant per line. The columns are described in the table below:

| Column name | Description |
| --- | --- |
| trait | phenotype in question |
| region | region which was finemapped, formatted as chrCHROMOSOME:START-END |
| v | variant identifier in format CHROMOSOME:POSITION:REF:ALT |
| cs | credible set index, can be used to reference credible set in this region. |
| cs_specific_prob | posterior inclusion probability (PIP) for this variant in its credible set. |
| chromosome | chromosome of the variant |
| position | chromosomal position of the variant |
| allele1 | reference allele of the variant |
| allele2 | alternate allele of the variant |
| maf | alternate allele frequency for the variant |
| beta | variant effect size in summary statistic |
| p | variant p-value in summary statistic |
| se | variant standard error in summary statistic |
| most_severe | most severe predicted consequence for this variant |
| gene_most_severe | gene in which this consequence is shown |
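Since the cs index is only meaningful within a region, variants from the filtered SNP file are naturally grouped by the (region, cs) pair. The following sketch shows one way to do that with the standard library; the function name is hypothetical, and it assumes the file has a header row with the column names listed above.

```python
import csv
from collections import defaultdict


def variants_by_credible_set(tsv_path):
    """Group filtered variants by (region, cs).

    Returns a dict mapping (region, cs) to a list of row dicts, sorted by
    cs_specific_prob in descending order so the lead variant comes first.
    """
    groups = defaultdict(list)
    with open(tsv_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            groups[(row["region"], row["cs"])].append(row)
    for rows in groups.values():
        rows.sort(key=lambda r: float(r["cs_specific_prob"]), reverse=True)
    return dict(groups)
```

Calling `variants_by_credible_set("PHENOTYPE.SUSIE.snp.filter.tsv")` would return one entry per credible set, which can then be matched against the rows of the credible set summary file using the same region and cs values.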

PHENOTYPE.SUSIE_99.snp.filter.tsv

This file contains the filtered SNPs for the 99% credible sets. Variants outside the 99% credible sets are excluded, as are variants that were part of low_purity credible sets. The file is in tab-separated value form, with one variant per line, and contains the same columns as the PHENOTYPE.SUSIE.snp.filter.tsv file.

PHENOTYPE.SUSIE_EXTEND.snp.filter.tsv

This file contains the filtered SNPs for the 95% credible sets, extended with 99% credible set variants where applicable. Variants outside the 95%/99% credible sets are excluded, as are variants that were part of low_purity credible sets. The file is in tab-separated value form, with one variant per line, and contains the same columns as the PHENOTYPE.SUSIE.snp.filter.tsv file.

Now that the unmodifiable finemapping pipeline is available, it is the preferred way to get your custom GWAS endpoint finemapped. The only change that you will need to make applies when you do not have a phenotype file (for example, if you created your endpoint with case-control lists or through Atlas). In that case, you will need to find the phenotype definition file path in your custom GWAS endpoint metadata, which is located in the green library custom GWAS folder. For R12 this would be: /finngen/library-green/finngen_R12/sandbox_custom_gwas/YOUR_ENDPOINT_NAME/metadata.json

In this JSON file, the phenotype definition file path is stored under the key "phenotype_file". You can then enter that file path in your workflow inputs, as described in the unmodifiable finemapping pipeline instructions. After a successful execution, the results will appear in the userresults browser and in your endpoint's green library folder as before.
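Looking up the phenotype definition path can be scripted. The sketch below is a minimal example that assumes "phenotype_file" is a top-level key of metadata.json; the function name is hypothetical, and YOUR_ENDPOINT_NAME in the usage note is a placeholder to replace with your own endpoint name.

```python
import json


def phenotype_file_from_metadata(metadata_path):
    """Return the value stored under 'phenotype_file' in a metadata.json file.

    Raises KeyError if the key is missing, so that a wrong or outdated
    metadata file is noticed instead of silently returning nothing.
    """
    with open(metadata_path) as handle:
        metadata = json.load(handle)
    return metadata["phenotype_file"]
```

For R12, the call would look like `phenotype_file_from_metadata("/finngen/library-green/finngen_R12/sandbox_custom_gwas/YOUR_ENDPOINT_NAME/metadata.json")`, and the returned path is what goes into the workflow inputs.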

How to access region view in the custom GWAS browser
Region view in custom GWAS browser. Location of finemapping data highlighted in red