Red data users

What is red data?

The FinnGen data is constructed from the Finnish health and laboratory registries combined with individual genotype information. In FinnGen, we refer to all sensitive individual-level data as "red" to emphasize the need for extra care and security when handling it. The red data is accessible within the Sandbox environment, where all the individual-level genotype, phenotype and other omics data can be found. The red data is pseudonymized, which means that it contains no information that directly identifies the study subjects.

How to access red data?

Red data is located in the Sandbox cloud computing environment, which researchers can use to run their own analyses. To get access to the red FinnGen data, see the FinnGen access and accounts. Only affiliates of FinnGen partner organizations can get access to the red data in the Sandbox. Approval takes 1-2 months.

In FinnGen, red data is protected against unauthorized access, loss, and damage. Access to the Sandbox is granted only after you complete the required paperwork and pass a security exam. Your FinnGen account must have two-factor authentication (2FA) enabled to log in. For questions or concerns about data protection, contact the FinnGen Data Protection Officer at dpo-finngen@helsinki.fi or by phone: +358 2941 24317 (mobile: +358 50 4793618).

Once you are granted access to FinnGen red data, an interactive virtual machine (IVM) will be created for you within your organization's Sandbox. This IVM, accessible via a web browser, is a Unix machine with a graphical interface hosted in Google Cloud.

Using red data

To conduct research in the FinnGen Sandbox, it is essential to adhere to the FinnGen Scientific Plan and its amendments (FinnGen 2 Scientific Plan and FinnGen 3 Scientific Plan). In summary, the main goal of FinnGen is to better understand how health and disease change over time, interpret genetic signals, and develop personalized medicine and new analytical methods. The FinnGen scientific director and scientific committee oversee research using red data through the FinnGen analysis proposal. An analysis proposal is not mandatory for operating within the Sandbox, but it is required to download results. Please also familiarize yourself with the FinnGen guidelines on the 1-year exclusivity period policy and on citing.

FinnGen aims to group similar research under a single analysis proposal, granting only one analysis right for similar studies. You can check the active analysis proposals in the FinnGen appsheet before applying for an analysis proposal.

If you suspect a data breach, report it immediately using the online reporting form available in the members’ area or by contacting the DPO directly. Examples of data breaches include unauthorized access to red data, red data being taken outside the FinnGen Sandbox, and red data being shared in presentations or manuscripts. If you lose your @finngen.fi credentials or suspect they have been compromised, contact finngen-servicedesk@helsinki.fi immediately. Also contact the service desk when you no longer need your account.

By following these guidelines, you keep your research compliant with FinnGen's standards and keep your access to the necessary resources and support.

Red data tools

The FinnGen Sandbox is a secure, scalable environment for accessing individual-level data. It operates in a web browser or via an application, ensuring data security and compliance with privacy regulations. Each FinnGen partner has its own Sandbox, where members can use individual virtual machines (IVMs) for research.

The Sandbox remains open for 24 hours by default and supports the R and Python programming languages. Analyses can be run in IVMs or, for large tasks, using FinnGen Pipelines. Costs vary: a GWAS run costs 3-10 euros, and storage costs 0.03 euros per gigabyte per month. For more on costs, see Billing information and where to find more details.
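The figures above lend themselves to a quick back-of-the-envelope estimate. The following is a minimal sketch in Python, not an official FinnGen billing tool: the function and constant names are hypothetical, and it assumes the quoted rates (0.03 euros per gigabyte per month for storage, 3-10 euros per GWAS run) apply uniformly, which may not hold for every workload.

```python
from typing import Tuple

# Rates quoted in the text above. Actual costs vary, so treat these as
# illustrative assumptions rather than a price list.
STORAGE_EUR_PER_GB_MONTH = 0.03
GWAS_EUR_PER_RUN_RANGE = (3.0, 10.0)


def storage_cost_eur(gigabytes: float, months: float) -> float:
    """Estimated cost in euros of storing `gigabytes` for `months` months."""
    return round(gigabytes * months * STORAGE_EUR_PER_GB_MONTH, 2)


def gwas_cost_range_eur(n_runs: int) -> Tuple[float, float]:
    """Lowest and highest estimated cost in euros of `n_runs` GWAS runs."""
    low, high = GWAS_EUR_PER_RUN_RANGE
    return (n_runs * low, n_runs * high)


if __name__ == "__main__":
    # Example inputs are made up for illustration.
    print(storage_cost_eur(500, 12))
    print(gwas_cost_range_eur(4))
```

Under these assumptions, keeping 500 GB for 12 months comes to 500 × 12 × 0.03 = 180 euros, and four GWAS runs fall between 12 and 40 euros.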

Sensitive individual-level data must not be screenshotted or transferred outside the Sandbox. To protect the data, text can be copied into the Sandbox but not out of it. Data can be shared within an organization via the "red" bucket.

Files can be uploaded via Google Cloud and downloaded after verification. Only aggregate-level green data can be exported.

FinnGen data includes phenotype, laboratory, genomic and other omic data stored in specific directories in the Sandbox. Users with access to red data can find these files in the red and green libraries. Check out this short video about FinnGen Sandbox architecture: libraries, buckets and data.

Many of the red data types are also available in the FinnGen BigQuery database, which is integrated with the Sandbox. This serverless data warehouse supports, for example, efficient SQL queries. A list of additional tools available in the Sandbox can be found here.
