# New to FinnGen

### What is FinnGen?

FinnGen is a public-private collaborative research project launched in 2017. The study combines genomics, omics and health data from 520,210 individuals to understand human diseases and traits. All research in FinnGen must follow the [FinnGen Scientific Plan](https://www.finngen.fi/en/members/document/218) and its amendments (the [FinnGen 2 Scientific Plan](https://www.finngen.fi/en/members/document/217) and the [FinnGen 3 Scientific Plan](https://www.finngen.fi/en/members/document/1354)). Data use is limited to research outlined in the plan, and analyses require a clear scientific purpose and potential for publishable results. To read more about FinnGen, please see the [FinnGen website](https://www.finngen.fi/en) and the [FinnGen Handbook](https://finngen.gitbook.io/finngen-handbook).

### FinnGen Data

**Genetic data:**

All 520,210 FinnGen study subjects have undergone genome-wide genotyping. About 450,000 were genotyped using the [FinnGen ThermoFisher Axiom custom array](https://www.finngen.fi/en/genetic_data), while \~70,000 "[legacy genotypes](https://docs.finngen.fi/finngen-data-specifics/red-library-data-individual-level-data/genotype-data/affymetrix-chip-and-its-design/legacy-cohorts-and-chips)" originate primarily from Finnish Institute for Health and Welfare (THL) biobank samples that were genotyped before FinnGen using various Illumina GWAS arrays.

To enhance utility, all samples were imputed using a [Finnish whole-genome reference](https://docs.finngen.fi/finngen-data-specifics/red-library-data-individual-level-data/genotype-data/imputation-panel/sisu-v4-reference-panel) (≈8,700 individuals), yielding inferred genomes with \~21 million variants per individual. All genotype data is in human genome build GRCh38/hg38.

Additionally, FinnGen includes "legacy next-generation sequencing (NGS) variant" data from [\~25,000 whole-exome sequenced (WES)](https://docs.finngen.fi/finngen-data-specifics/red-library-data-individual-level-data/whole-exome-sequencing-wes-data) and \~2,000 whole-genome sequenced (WGS) study subjects, primarily from the THL biobank.

**Health register data:**

[FinnGen’s health register data](https://docs.finngen.fi/finngen-data-specifics/red-library-data-individual-level-data/what-phenotype-files-are-available-in-sandbox-1/registers-in-the-detailed-longitudinal-data) comprises detailed, harmonized longitudinal records from multiple Finnish registries, capturing health events, drug purchases, and hospitalizations for all participants. The majority of the health register data is available for all 520,210 individuals. [Supplementary registries](https://docs.finngen.fi/finngen-data-specifics/red-library-data-individual-level-data/what-phenotype-files-are-available-in-sandbox-1/other-registers) provide additional information. [The Kanta Lab data](https://docs.finngen.fi/finngen-data-specifics/red-library-data-individual-level-data/what-phenotype-files-are-available-in-sandbox-1/kanta-lab-values), obtained from Finland’s national Kanta register, includes lab test results from public and private healthcare providers.

**Other phenotype and biological data:**

The bulk of the data in FinnGen consists of the genotypes and the health register data, which are available for all FinnGen study subjects. In addition, over the course of the project the study is generating other data types for subsets of its participants, including [additional phenotype data](https://www.finngen.fi/en/node/2001) on study subjects with particular diseases and [other biological data](https://www.finngen.fi/en/node/1997), such as [proteomics](https://docs.finngen.fi/finngen-data-specifics/red-library-data-individual-level-data/omics-data/proteomics), [metabolomics](https://docs.finngen.fi/finngen-data-specifics/red-library-data-individual-level-data/omics-data/metabolomics) and [single-nuclei RNA & ATAC sequencing](https://docs.finngen.fi/finngen-data-specifics/red-library-data-individual-level-data/omics-data/single-cell-transcriptomics-and-immune-profiling) data.

FinnGen data is categorized into so-called “[<mark style="color:red;background-color:red;">**red**</mark>” and “<mark style="color:green;background-color:green;">**green**</mark>](https://docs.finngen.fi/faq/about-finngen-access-and-accounts/do-i-need-red-or-green-data-access)” data, both accessible to researchers from [FinnGen partner organizations](https://www.finngen.fi/en/partners) who have requested access.

<figure><img src="https://3072695768-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MhYL0UTLjqsuIdK0SSO%2Fuploads%2FLfpfTy9bjaezdKwBUGWQ%2Fimage.png?alt=media&#x26;token=3fa6da4b-a181-4c9d-a848-6ecbdf313507" alt=""><figcaption></figcaption></figure>

<mark style="color:red;background-color:red;">**Red data**</mark> is individual-level genotype or phenotype data located in the Sandbox cloud computing environment. Researchers can use it to run their own analyses when individual-level genotypes and/or phenotypes are required as input. We call this data "<mark style="color:red;background-color:red;">**red**</mark>" to remind users that working with it always requires extra care and security. Each partner or research group that has a Sandbox must cover its own computing and storage costs. Information on costs can be found under [Billing information and where to find more details](https://docs.finngen.fi/working-in-the-sandbox/billing-information-and-where-to-find-more-details).

To learn more about the <mark style="color:green;background-color:green;">**green**</mark> and <mark style="color:red;background-color:red;">**red**</mark> data and what you can do with them, please see the quick guides for [Green data](https://docs.finngen.fi/where-to-begin.../quick-guides/green-data-users) and [Red data](https://docs.finngen.fi/where-to-begin.../quick-guides/red-data-users) users.

### Data access

FinnGen holds sensitive health and genetic information about its study subjects. Keeping this data safe is essential both for maintaining trust and for complying with the GDPR. Everyone at FinnGen is responsible for making sure it stays private and secure.

To get access to the <mark style="color:green;background-color:green;">**green**</mark> or <mark style="color:red;background-color:red;">**red**</mark> FinnGen data, see [FinnGen access and accounts](https://docs.finngen.fi/faq/about-finngen-access-and-accounts). Approval to access the <mark style="color:green;background-color:green;">**green data**</mark> takes up to 7 working days, and approval for the <mark style="color:red;background-color:red;">**red**</mark> data takes 1 to 2 months. <mark style="color:green;background-color:green;">**Green data**</mark> is accessible to anyone with a @finngen.fi account, and the data can be downloaded directly to the user's local machine. For <mark style="color:red;background-color:red;">**red**</mark> data access, you need access to the Sandbox in addition to a @finngen.fi account. You are also required to take a data security exam once a year, to make sure the data related to the study subjects is not mishandled. Please read more in the FinnGen Handbook, section [Data Protection & Security](https://docs.finngen.fi/data-protection-and-security).

<figure><img src="https://3072695768-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MhYL0UTLjqsuIdK0SSO%2Fuploads%2FE0DPqH20n663GJYjtrzW%2FAccess%20process.png?alt=media&#x26;token=3fa5a392-c1b0-498f-9c90-7330a317296f" alt=""><figcaption><p>Green and red data access process</p></figcaption></figure>

Downloading results from the Sandbox and proceeding to publication requires submitting an [analysis proposal](https://docs.finngen.fi/finngen-data-specifics/about-analysis-proposals). This ensures that ongoing studies do not overlap and that they follow the FinnGen Scientific Plan. See also the requirements for [citing FinnGen](https://docs.finngen.fi/faq/about-public-releases) in your manuscript.

FinnGen has Task Forces and Interest Groups that concentrate on studying the progression of [selected phenotypes](https://www.finngen.fi/en/node/1977). All FinnGen Partner researchers are welcome to join them. If you would like to join a Task Force or an Interest Group, please email <finngen-servicedesk@helsinki.fi>.

### Where to get help?

In addition to the extensive descriptions in the [FinnGen Handbook](https://finngen.gitbook.io/finngen-handbook), all new users are automatically added to the **FinnGen Slack workspace**, which has multiple channels where members of the FinnGen community can help each other.

**FinnGen Science & Users' meetings** are held once a month (usually the third week of the month) on Tuesdays at 9:05 AM - 10:30 AM (EST) / 4:05 PM - 5:30 PM (HEL) via Zoom.

**The FinnGen office hours (Q\&A)** are held on Zoom the day after the monthly FinnGen Science & Users' meeting. The European session is at 1–2 PM Helsinki time, and the US session is at 1–2 PM Boston time.

Finnish academics can also contact their [**Local SupPers**](https://www.finngen.fi/en/node/1974) (local support persons): Jaakko Tyrmi (University of Oulu; <jaakko.tyrmi@oulu.fi>), Tero Sievänen (University of Eastern Finland; <tero.sievanen@uef.fi>), Timo Pohjonen (University of Jyväskylä; <timo.pohjonen@hyvaks.fi>), Vidal Fey (University of Tampere; <vidal.fey@tuni.fi>), and Rui Jian Chu (University of Turku; <finngen-support@utu.fi> or <ruijian.chu@utu.fi>).

If you would like to receive calendar invitations for the above-mentioned meetings or if you have any questions regarding FinnGen, please email <finngen-servicedesk@helsinki.fi>.
