Data Releases 2026
March 24th
FinnGen DF14 reproductive history data released to the Sandbox.
This data is further described in the Handbook: https://docs.finngen.fi/finngen-data-specifics/red-library-data-individual-level-data/what-phenotype-files-are-available-in-sandbox-1/other-registers/reproductive-history-data
Data is available in red library:
/finngen/library-red/finngen_R14/birth_and_dvv_register_1.0/
March 23rd
FinnGen DF14 analysis covariate, GRM, PCA, and kinship data released to Sandbox.
These data and documentation are available in Sandbox red library:
/finngen/library-red/finngen_R14/analysis_covariates/
/finngen/library-red/finngen_R14/grm_1.0/
/finngen/library-red/finngen_R14/pca_1.0/
/finngen/library-red/finngen_R14/kinship_1.0/
See the related readme-files in the folders, or the FinnGen Handbook pages for more information about the released files: https://docs.finngen.fi/finngen-data-specifics/red-library-data-individual-level-data/genotype-data/types-of-genotype-files-available
March 18th
We have released FinnGen DF14 vaccination register data (version 1.0) to the Sandbox.
The data contains vaccination data of 472 090 FinnGen participants. The data is further described in the Handbook: https://docs.finngen.fi/finngen-data-specifics/red-library-data-individual-level-data/what-phenotype-files-are-available-in-sandbox-1/other-registers/finnish-national-vaccination-registry
Data is available in red library:
/finngen/library-red/finngen_R14/vaccination_register_1.0/
March 17th
QC'd Olink proteomics data combining batches 1 and 2 (n=4,415 samples, 5.4k proteins) has been released.
Data & detailed documentation available: /library-red/omics/proteomics/olink_genewiz_batch1_and_2_harmonized_QCd_2026_16_03
March 13th
Kanta lab data V3 (DF14) has been released to the Sandbox. The data can be found here:
/finngen/library-red/finngen_R14/kanta_lab_1.0/
The new data contains ~20% more entries than v2 because 12 more months of lab results have been added to the data set (late 2024 to autumn 2025), going from 211M rows to 257M. Our focus for V3 has been to improve the existing mappings as much as possible, both by moving tests to more appropriate concept IDs and, occasionally, by removing ambiguous tests (e.g. tests with no unit and values indicating a mix of units in the source data) when we felt it would be impossible to assign an OMOP ID without introducing bias. A quick summary of major updates:
Updated OMOP mappings. You can visually navigate the concept space here, along with a summary of how many test/unit combinations are mapped.
We changed or injected the unit for ~100 new tests, which now allows us to harmonize 99% of the data with source values.
`OUTCOME_POS_EXTRACTED` now also contains information from free-text strings that include the `+` symbol. For these cases the text is also kept in the `TEST_OUTCOME_TEXT_EXTRACTED` column, so that the level of abnormality can be seen (e.g. `+` vs `3+`).
QC results are included in the data in the columns `QC_PASS` and `QC_NOTES`.
For more information about the pipeline changes in general, see the GitHub release page.
The Handbook entry will soon be updated to V3, but the information currently there remains relevant in the meantime.
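As a rough illustration of how the new QC columns might be used, the sketch below keeps only the rows that pass QC. Only `QC_PASS` and `QC_NOTES` are column names taken from this note; the other columns, the toy values, and the `1`/`0` encoding of `QC_PASS` are assumptions, so please check the readme in the data folder for the actual format.

```python
import csv
import io

# Toy rows standing in for the Kanta lab file. FINNGENID, MEASUREMENT_VALUE,
# the values, and the "1"/"0" encoding of QC_PASS are assumptions made for
# illustration; only QC_PASS and QC_NOTES come from the release note.
SAMPLE_TSV = (
    "FINNGENID\tMEASUREMENT_VALUE\tQC_PASS\tQC_NOTES\n"
    "FG0001\t5.4\t1\t\n"
    "FG0002\t999.0\t0\timplausible value\n"
    "FG0003\t4.9\t1\t\n"
)


def rows_passing_qc(tsv_text, pass_value="1"):
    """Return the rows whose QC_PASS field equals pass_value."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if row["QC_PASS"] == pass_value]


passed = rows_passing_qc(SAMPLE_TSV)
print([row["FINNGENID"] for row in passed])
```

For the real file, which has hundreds of millions of rows, a streaming or columnar reader would likely be a better fit than loading everything into a list.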
March 13th
We have released FinnGen DF14 service sector data to the Sandbox.
Detailed description of the service sector longitudinal data can be found in the Handbook: https://docs.finngen.fi/finngen-data-specifics/red-library-data-individual-level-data/what-phenotype-files-are-available-in-sandbox-1/service-sector-data
Data is available in red library:
/finngen/library-red/finngen_R14/service_sector_data_1.0/
March 12th
We have released to Sandbox data of weight, height, and BMI measurements harmonized from FinnGen registers.
The file contains anthropometric measurements (weight, height, BMI) from FinnGen registers in longitudinal format, with one row per measurement event per person. Multiple data sources have been integrated, deduplicated by prioritizing higher-quality sources, and quality controlled using robust outlier detection methods.
The data is further described in the README file in the Sandbox and in Handbook: https://docs.finngen.fi/finngen-data-specifics/red-library-data-individual-level-data/what-phenotype-files-are-available-in-sandbox-1/harmonized-data/height-weight-bmi-data
Data is available in red library: /finngen/library-red/finngen_R13/harmonized_data/weight_height/
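To give an idea of what "deduplicated by prioritizing higher-quality sources" can mean in practice, here is a minimal sketch that keeps one record per person, date, and measurement type, preferring the more trusted source. The source names, their priority order, and the record fields are all hypothetical; the actual harmonization procedure is described in the README.

```python
# Hypothetical source names; a lower number means a more trusted source.
SOURCE_PRIORITY = {"register_a": 0, "register_b": 1}


def deduplicate(records):
    """Keep one record per (person, date, measure), preferring the best source."""
    best = {}
    for rec in records:
        key = (rec["person"], rec["date"], rec["measure"])
        kept = best.get(key)
        if kept is None or SOURCE_PRIORITY[rec["source"]] < SOURCE_PRIORITY[kept["source"]]:
            best[key] = rec
    return list(best.values())


# Toy longitudinal data: one row per measurement event per person.
records = [
    {"person": "FG0001", "date": "2020-01-15", "measure": "weight", "value": 80.5, "source": "register_b"},
    {"person": "FG0001", "date": "2020-01-15", "measure": "weight", "value": 80.0, "source": "register_a"},
    {"person": "FG0001", "date": "2021-03-02", "measure": "weight", "value": 82.0, "source": "register_b"},
]
deduplicated = deduplicate(records)
print(len(deduplicated))
```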
March 10th
We have released FinnGen DF14 phenotype data (version 2.0) to the Sandbox.
This version includes a new extract from the Cancer Registry, which also contains non-officially reportable cancers that were not included in the first version (1.0). In addition, version 2.0 includes a new extract from the Hilmo registry for the years 1969–1986 (ICD8-era).
Detailed description of the endpoint and endpoint longitudinal data in Handbook is here: https://finngen.gitbook.io/finngen-handbook/finngen-data-specifics/red-library-data-individual-level-data/what-phenotype-files-are-available-in-sandbox-1/endpoint-and-endpoint-longitudinal-data
Detailed description of the detailed longitudinal data in Handbook is here: https://finngen.gitbook.io/finngen-handbook/finngen-data-specifics/red-library-data-individual-level-data/what-phenotype-files-are-available-in-sandbox-1/detailed-longitudinal-data
Data is available in red library:
/finngen/library-red/finngen_R14/phenotype_2.0/
March 6th
Dear all, We have released FinnGen R14 visual impairment register data to Sandbox.
The data and its documentation are available in red library: gs://finngen-production-library-red/finngen_R14/visual_impairment_register_1.0/
March 5th
Dear all,
We have released the Kidney disease Task Force data to the Sandbox. The released files include data on defined eGFR decline slopes. Decline has been calculated from Kanta creatinine lab measurements and decline has been defined as 25% and 40% decline from baseline. The data is further described in the README files provided in the Sandbox.
The data is available in red library: /finngen/library-red/task_force_data/Kidney/Refined_phenotypes/
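The 25% and 40% thresholds can be read as a relative drop from the baseline eGFR. The sketch below shows that arithmetic only; it does not reproduce the Task Force's slope calculation, and the way the baseline is chosen as well as the inclusive comparison (`>=`) are assumptions.

```python
def percent_decline(baseline, current):
    """Relative eGFR decline from baseline, in percent."""
    if baseline <= 0:
        raise ValueError("baseline eGFR must be positive")
    return (baseline - current) / baseline * 100.0


def decline_flags(baseline, current, thresholds=(25.0, 40.0)):
    """Flag which decline thresholds are reached (inclusive, by assumption)."""
    decline = percent_decline(baseline, current)
    return {threshold: decline >= threshold for threshold in thresholds}


# Example: baseline eGFR 90, later measurement 63, i.e. roughly a 30% decline.
flags = decline_flags(baseline=90.0, current=63.0)
print(flags)
```

With these example values the 25% threshold is reached and the 40% threshold is not.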
March 5th
Dear all,
Biologic medications used in Task Force diseases have been harmonized from all FinnGen register sources and released to Sandbox. The file contains 36 unique drugs and 16370 unique FGIDs.
The dataset combines biologic medications used for Task Force–specific diseases into a single file. Data sources include hospital records; medication purchase data from Kela and Kanta; the Finnish Quality Registry for Rheumatic Diseases; Kanta laboratory measurements (including measured drug concentrations and autoantibodies); and NOMESCO procedure codes related to hospital infusion administrations.
The data are structured in longitudinal format, with one row per event per individual. Exact duplicate records have been removed. However, partial or apparent duplicates may still remain because the data originate from multiple sources and lack unique identifiers linking related events across systems. For example, a prescription recorded in hospital data may later appear as a purchase in Kanta purchase data and as a drug concentration measurement in Kanta lab measurement data.
For this reason, we do not recommend using the dataset to calculate the exact number of individual treatment events. Instead, it is better suited for analyses such as estimating drug exposure or treatment periods.
We have identified certain NOMESCO procedure codes that likely correspond to specific ATC drug codes in order to account for hospital-administered infusion medications. However, this linkage has not been formally validated. Users are therefore advised to assess the suitability and validity of these data for their specific research purposes.
The data is further described in the README files provided in the Sandbox. The data is available in red library: /finngen/library-red/task_force_data/harmonized_biologics/
March 3rd
Dear all, The smoking data harmonized from all FinnGen register sources with smoking information has been released to Sandbox by the Pulmonary Task Force. The data contains smoking measurements from FinnGen registers in longitudinal format, with one row per measurement event per person. It also includes an aggregated person-level ever-smoker/never-smoker variable. The data contains 519972 persons (402760 persons with at least one smoking measurement, 117212 without any smoking information) and a total of 2877207 measurement events. The data is further described in the README files provided in the Sandbox. Detailed description of the data in Handbook will be added later.
The data is available in red library:
/finngen/library-red/finngen_R13/harmonized_data/
March 3rd
We have released the Pulmonary disease Task Force data to the Sandbox. The files contain curated phenotypes for asthma severity and asthma/COPD overlap (ACO). Asthma severity has been estimated from drug purchases (specific drugs and doses are shown at the end of the document) for each month of the follow-up period (ERS/ATS 2014 criteria):
* gina_classification_max: You can determine each patient’s highest GINA class and the duration they have continuously remained in that class.
* gina_classification_max_GINA5_GINA4_CONT_TRANS: You can identify each patient’s highest GINA class in which they have remained continuously for at least one year.
* gina_classification_longitudinal_eras: Here you can find GINA eras.
* gina_classification_longitudinal_monthly: Here you can find ‘raw data’, GINA class for each month of the follow-up.
* ACO phenotype contains individuals with the overlap of asthma and chronic obstructive pulmonary disease.
The data is further described in the README files provided in the Sandbox. Detailed description of the data in Handbook will be added later.
The data is available in red library:
/finngen/library-red/task_force_data/Pulmonary/Refined_phenotypes/
March 2nd
Dear all,
We have released the Rheumatic diseases Task Force data to the Sandbox.
This data includes curated phenotypes with adjusted disease onset for the Rheumatic diseases Task Force diseases of interest: Rheumatoid arthritis (RA, seropositive and seronegative), Ankylosing spondylitis, Psoriatic arthritis (PsA), Systemic lupus erythematosus (SLE), Systemic scleroderma (SSc), Mixed connective tissue disease (MCTD), and Sjögren’s syndrome. The data also includes severity classes defined for Rheumatoid arthritis. Severity has been defined by medication usage: mild RA = controlled with conventional DMARD, moderate RA = controlled with 1-2 biologics/JAK inhibitors, severe RA = three or more biologics/JAK inhibitors needed.
Detailed description of the data in Handbook will be added later.
Data is available in red library:
/finngen/library-red/task_force_data/Rheumatic_diseases/Refined_phenotypes/
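The RA severity definition above can be sketched as a simple rule on the number of biologics/JAK inhibitors a person has needed. This is an illustration only: the function name is hypothetical, and counting distinct drugs and treating a count of zero as "controlled with conventional DMARD" are assumptions, not the Task Force's actual implementation.

```python
def ra_severity(n_biologics_or_jak):
    """Map the number of biologics/JAK inhibitors needed to an RA severity class.

    Assumes the count is of distinct drugs and that zero means the disease
    is controlled with conventional DMARDs alone.
    """
    if n_biologics_or_jak < 0:
        raise ValueError("count cannot be negative")
    if n_biologics_or_jak == 0:
        return "mild"
    if n_biologics_or_jak <= 2:
        return "moderate"
    return "severe"


for count in (0, 1, 2, 3):
    print(count, ra_severity(count))
```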
Dear all,
We have released the Parkinson's disease Task Force data to the Sandbox.
The released files include data on:
- the refined endpoint, date/age at diagnosis and date/age at disease onset (symptoms);
- motor progression defined by the times drug is taken/day;
- subgroups selected for proteomics analyses;
- stages of disease progression.
The data is further described in the README files provided in the Sandbox.
Detailed description of the data in Handbook will be added later.
Data is available in red library:
/finngen/library-red/task_force_data/PD/Refined_phenotypes/
February 26th
Dear FinnGen Community,
We are delighted to announce that FinnGen Data Freeze (DF) 14 genotype and phenotype data are released to the Sandbox.
The release data statistics are:
Number of individuals with genotypes = 519,870
Number of imputed variants = 21,299,087
Number of individuals with endpoints = 519,329
Number of endpoints = 4,867
Number of individuals in the detailed longitudinal data file (with hospital, primary care, drug reimbursement, and cancer registry data) = 519,634
Number of individuals with minimum extended data = 519,870
Different data sets and their specifications are described in the FinnGen Handbook: https://docs.finngen.fi/finngen-data-specifics.
The data and readmes are available in the Sandbox red library at /finngen/library-red/finngen_R14/
Genotypes in vcf, bgen and plink format:
/finngen/library-red/finngen_R14/genotype_1.0/
/finngen/library-red/finngen_R14/bgen_1.0/
/finngen/library-red/finngen_R14/plink_1.0/
Phenotypes:
/finngen/library-red/finngen_R14/phenotype_1.0/
NOTE. We will update all other register data, phenotype files and Sandbox tools during the upcoming 2-3 months, which we will announce via email and log at: https://docs.finngen.fi/release-notes/data-releases-2026.
Core analysis results will be released to the green library by the end of May 2026.
We wish you happy times exploring the new data!
February 23rd
The coding variant association results for FinnGen Release 13 are now available in the green library:
gs://finngen-production-library-green/finngen_R13/finngen_R13_analysis_data/coding/
Documentation:
gs://finngen-production-library-green/finngen_R13/finngen_R13_analysis_data/coding/finngen_R13_coding.md
Results are also available in the pheweb browser:
https://results.finngen.fi/coding
February 18th
We have released covariate data to EA5 multiome batch1_5 folder in Sandbox.
These files are cell-type specific covariate files used in the eQTL/caQTL analyses. These include age at donation, sex, and top four genotype PCs, as well as PEER factors for eQTLs or PCs for caQTLs.
The data is available in red library:
/finngen/library-red/EA5/multiome/batch1_5/release/covariates
February 17th
We are delighted to announce the release of a new set of 60 models and scores based on data from The Polygenic Index (PGI) Repository: https://www.thessgac.org/pgi-repository.
Scores, models and, additionally, PGS-based Phenome-Wide Association Studies for incidence of 2,583 FinnGen endpoints are available at:
/finngen/library-red/finngen_R13/pgs_browser_db_2.0/.
For more details about PGS models and analysis, please see meta-information table at: /finngen/library-red/finngen_R13/pgs_browser_db_2.0/data/meta/.
For more details about model construction see Alemu et al (2025): https://doi.org/10.1101/2025.05.14.653986.
Please note that these results were generated by the Artomov lab ahead of publication, so we would appreciate it if you followed the data usage disclaimer placed in the same folder.
The PGS Browser tool is further described in Handbook: https://docs.finngen.fi/working-in-the-sandbox/which-tools-are-available/pgs-browser
January 30th
We have released another version of Blood Cell Painting EA5 data to the Sandbox.
Blood Cell Painting (BCP) is a project where single cells from 390 healthy blood donors were imaged, single-cell features were extracted from the images, and genetic analyses were done for these features. The release contains the features in raw, scaled-to-controls, and batch-corrected formats.
The data is further described in the readme provided and also in Handbook: https://docs.finngen.fi/finngen-data-specifics/red-library-data-individual-level-data/omics-data/high-content-cell-imaging
The data and documentation are available here:
/finngen/library-red/EA5/cell_painting/BCP_EA5_2.0/
January 22nd
An additional data file has been released to Sandbox for the FinnGen 2 Expansion Area 5 (EA5) pilot. Specifically, this is an updated version of a compiled file that makes analysis of the already released metabolomics data easier.
A ready-to-use file connecting the raw data to FINNGENIDs (66/1000 samples do not have a FINNGENID) and sample timestamps is available here:
/finngen/library-red/EA5/metabolomics/final/EA5_Metabolomics_data.tsv
/finngen/library-red/EA5/metabolomics/final/README_EA5_Metabolomics_data.tsv
and the metabolites annotation to chemical names and pathways can be found in:
/finngen/library-red/EA5/metabolomics/raw_data/HELS-01-21ML+DATA_TABLES.XLSX under the sheet "Chemical Annotation".
January 20th
Released a version of Kanta lab value regenie nulls in:
/finngen/library-red/finngen_R13/kanta_lab_2.0/core_regenie_analysis_null_files/