# General workflows for the most common analyses

This section describes example workflows for the analyses that researchers most commonly run on FinnGen data in the FinnGen Sandbox: endpoint analysis, survival time analysis, and genotype variant analysis.

See also a [presentation of New FinnGen tools and their application to example diseases in the recording of the users' meeting of 28th March 2023](https://www.finngen.fi/en/members/recordings/finngen-data-users-meeting-28th-march-2023) (at 25 min 23 sec).

### **Step 1. Create cohorts**

**Create a cohort based on medical codes and/or medications:** Make [case and control cohorts](/working-in-the-sandbox/which-tools-are-available/atlas/detailed-guide/how-to-define-a-cohort-in-atlas.md) for endpoints in Atlas. Following the instructions carefully, set the [cohort entry events and inclusion criteria](/working-in-the-sandbox/which-tools-are-available/atlas/detailed-guide/how-to-define-a-cohort-in-atlas.md), [exclusion criteria](/working-in-the-sandbox/which-tools-are-available/atlas/detailed-guide/how-to-define-a-cohort-in-atlas/cohort-definitions/defining-exit-rules-for-a-cohort-in-atlas.md), [filtering](/working-in-the-sandbox/which-tools-are-available/atlas/detailed-guide/how-to-define-a-cohort-in-atlas/cohort-definitions/filtering-by-demographic-criteria-in-atlas.md), and [registries](/working-in-the-sandbox/which-tools-are-available/atlas/detailed-guide/how-to-define-a-cohort-in-atlas/cohort-definitions/filter-by-clinical-registries-in-atlas.md). Make sure that [the exclusion criteria are set correctly](/working-in-the-sandbox/which-tools-are-available/atlas/detailed-guide/how-to-define-a-cohort-in-atlas/cohort-definitions/defining-exit-rules-for-a-cohort-in-atlas.md). See also the [Atlas Quick Guide](/working-in-the-sandbox/which-tools-are-available/atlas/quick-guide.md) for tips and examples.

**Create a cohort based on genotype:** Use the [Genotype Browser to extract carriers and non-carriers](/working-in-the-sandbox/working-with-genotype-data/genotype-browser.md). Design the genotype cohorts you will use in further analyses. For example, you may want to export minor homozygotes (1|1), WT homozygotes (0|0), and heterozygotes (0|1 and 1|0). To combine, for example, the two heterozygote cohorts (0|1 and 1|0) into one cohort, use the [Operate Cohorts feature in the Cohort Operations tool](/working-in-the-sandbox/which-tools-are-available/cohort-operations-tool-co/combine-cohorts-with-co.md). If the variant you are looking for is not in the Genotype Browser, see [Genotypes from VCF files](/working-in-the-sandbox/working-with-genotype-data/genotypes-from-vcf-files.md).
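
The genotype grouping described above can be sketched in Python. This is only an illustration of the logic, not how the Genotype Browser or the Cohort Operations tool works internally. The function name `group_by_genotype`, the cohort labels, the input mapping, and the sample IDs are all hypothetical, and the real export format may differ.

```python
from collections import defaultdict


def group_by_genotype(genotypes):
    """Group sample IDs into cohorts by phased genotype call.

    `genotypes` maps a (hypothetical) sample ID to a phased call such as
    "0|0", "0|1", "1|0", or "1|1". Both heterozygote phasings are merged
    into a single "het" cohort.
    """
    labels = {"0|0": "hom_00", "0|1": "het", "1|0": "het", "1|1": "hom_11"}
    cohorts = defaultdict(set)
    for sample_id, call in genotypes.items():
        # Missing or unexpected calls (e.g. ".|.") are kept apart, not guessed.
        label = labels.get(call, "other")
        cohorts[label].add(sample_id)
    return dict(cohorts)


# Made-up example data; these are not real FinnGen IDs.
example = {"ID1": "0|0", "ID2": "0|1", "ID3": "1|0", "ID4": "1|1", "ID5": ".|."}
cohorts = group_by_genotype(example)
```

Keeping unexpected calls in a separate group, rather than dropping them silently, makes it easier to notice when a cohort is smaller than expected.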

### **Step 2. Explore your cohorts with tools designed for the purpose.**

For a fast inspection of cohorts built in Atlas, use the [Cohort Characterizations tool in Atlas](/working-in-the-sandbox/which-tools-are-available/atlas/detailed-guide/cohort-characterizations-in-atlas.md). Make [first improvements to the cohorts based on Cohort Characterizations](/working-in-the-sandbox/which-tools-are-available/atlas/detailed-guide/cohort-characterizations-in-atlas/improving-cohorts-using-cohort-characterizations-tool.md) if needed.

**Inspect your cohorts in detail using the** [**Trajectory Visualization tool (TVT)**](/working-in-the-sandbox/which-tools-are-available/trajectory-visualization-tool-tvt.md)**.** Pay attention to the entry and exit events of the patients. Are the patients entering and exiting the cohort as intended? Do the conditions appear in the right temporal intervals according to the inclusion and exclusion rules set in Atlas? Atlas is a powerful tool that can create very complex cohorts, so it is easy to select a wrong setting by accident.

Output files from the [Genotype Browser can be uploaded and visualized using TVT](/working-in-the-sandbox/which-tools-are-available/trajectory-visualization-tool-tvt/trajectory-visualization-tool-tvt.md).

**Tip!** If an individual appears interesting or, for example, is an outlier, you can use the [LifeTrack tool](/working-in-the-sandbox/which-tools-are-available/lifetrack.md) to explore that person closely by viewing their whole medical history in a single view.

**Explore the cohorts with the** [**Cohort Operations tool (CO)**](/working-in-the-sandbox/which-tools-are-available/cohort-operations-tool-co.md)**.** Compare [the cohorts to the FinnGen endpoints](/working-in-the-sandbox/which-tools-are-available/cohort-operations-tool-co/compare-custom-endpoint-to-finngen-endpoint-with-co.md). Do similar endpoints already exist? Explore which conditions and medicines are enriched in the case cohort compared to the control cohort by [running CodeWAS analyses using the Cohort Operations tool](/working-in-the-sandbox/which-tools-are-available/cohort-operations-tool-co/explore-code-and-endpoint-enrichments-with-co.md). For genotype data, CodeWAS can be run, for example, for the rarer homozygotes compared to heterozygotes and WT homozygotes using [the same instructions](/working-in-the-sandbox/which-tools-are-available/cohort-operations-tool-co/explore-code-and-endpoint-enrichments-with-co.md). See also the instructions on [how to conduct PheWAS for rare variants in R](/working-in-the-sandbox/working-with-genotype-data/variant-phewas.md). Consider whether the results make sense. Are the right conditions and medicines enriched in the cases compared to the controls? Are there conditions or medicines that should be included in or excluded from the cohorts? A clinician's help may be needed to interpret the CodeWAS results and to build the cohorts.
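
To make the idea of a code being "enriched in cases compared to controls" concrete, the sketch below computes an odds ratio and a one-sided Fisher's exact p-value from a 2x2 table of counts. It is a generic statistical illustration using only the Python standard library. It is not the method CodeWAS in the Cohort Operations tool uses, and the function names and counts are made up.

```python
from math import comb


def odds_ratio(cases_with, cases_total, controls_with, controls_total):
    """Odds ratio of having a code in cases versus controls."""
    a, b = cases_with, cases_total - cases_with
    c, d = controls_with, controls_total - controls_with
    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction avoids division by zero.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)


def fisher_enrichment_p(cases_with, cases_total, controls_with, controls_total):
    """One-sided Fisher's exact p-value for enrichment in cases.

    Probability, under the hypergeometric null, of seeing at least
    `cases_with` code carriers among the cases.
    """
    n = cases_total + controls_total
    carriers = cases_with + controls_with
    upper = min(cases_total, carriers)
    tail = sum(
        comb(cases_total, k) * comb(controls_total, carriers - k)
        for k in range(cases_with, upper + 1)
    )
    return tail / comb(n, carriers)


# Made-up counts: 30 of 100 cases and 10 of 100 controls carry the code.
example_or = odds_ratio(30, 100, 10, 100)
example_p = fisher_enrichment_p(30, 100, 10, 100)
```

A large odds ratio with a small p-value suggests the code is more common in cases. Whether that enrichment is clinically meaningful still needs the kind of review described above.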

**For genotype variant analysis:** Consider the results from CodeWAS. Are the cohorts of carriers and non-carriers sufficient for your study, or should the cohorts be modified using phenotypic information? Do carriers and non-carriers differ by unexpected diagnoses or unexpected medicine use? If so, you may [build phenotypic cohorts for the diseases and medicines arising from the CO results with Atlas](#step-1.-create-cohorts). You can then filter these phenotypes in or out of the genotype cohorts using the [Operate Cohorts feature in the Cohort Operations tool](/working-in-the-sandbox/which-tools-are-available/cohort-operations-tool-co/combine-cohorts-with-co.md).
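
Filtering a phenotype in or out of a genotype cohort amounts to a set intersection or a set difference on the member IDs. The sketch below shows only that logic. It is not the Cohort Operations tool's implementation, and the function name `filter_cohort` and the IDs are hypothetical.

```python
def filter_cohort(genotype_ids, phenotype_ids, keep=True):
    """Filter a genotype cohort by membership in a phenotype cohort.

    keep=True  -> keep only members who also have the phenotype (intersection)
    keep=False -> remove members who have the phenotype (difference)
    """
    genotype_ids = set(genotype_ids)
    phenotype_ids = set(phenotype_ids)
    if keep:
        return genotype_ids & phenotype_ids
    return genotype_ids - phenotype_ids


# Made-up IDs for illustration only.
carriers = {"ID1", "ID2", "ID3", "ID4"}
on_medication = {"ID2", "ID4", "ID9"}

carriers_on_medication = filter_cohort(carriers, on_medication, keep=True)
carriers_without_medication = filter_cohort(carriers, on_medication, keep=False)
```

The two results partition the original carrier cohort, so their sizes should add up to the size of the carrier cohort. That makes a quick sanity check after any filtering step.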

### **Step 3. Improve your cohorts until they are detailed.**

Consider the results from TVT and CO. If needed, go back to Atlas and improve the cohorts based on these results, then inspect the cohorts again using TVT and CO. Repeat [step 1](#step-1.-create-cohorts) and [step 2](#step-2.-explore-your-cohorts-with-tools-designed-for-the-purpose.) until you are satisfied with the cohorts. Help from a clinician may be needed to interpret the CodeWAS results and to build clinically meaningful cohorts.

**Tip!** If you need more complex filtering than is possible in Atlas, consider [joining two or more cohorts with the CO](/working-in-the-sandbox/which-tools-are-available/cohort-operations-tool-co/combine-cohorts-with-co.md). You can [create cohorts in Atlas](/working-in-the-sandbox/which-tools-are-available/atlas/detailed-guide/how-to-define-a-cohort-in-atlas.md), [import them to CO](/working-in-the-sandbox/which-tools-are-available/cohort-operations-tool-co/upload-cohorts-to-co.md), and [combine cohorts in CO](/working-in-the-sandbox/which-tools-are-available/cohort-operations-tool-co/combine-cohorts-with-co.md) with the rules you select.

### **Step 4. Proceed to the downstream analysis**

When the cohorts are ready and have been checked with TVT and CO, you can proceed to the downstream analyses. To select suitable software and a suitable model for your study, see [How to run genome-wide association studies (GWAS)](/working-in-the-sandbox/running-analyses-in-sandbox/how-to-run-genome-wide-association-studies-gwas.md). The easiest way to conduct a GWAS is to use [the Custom GWAS tool](/working-in-the-sandbox/which-tools-are-available/untitled.md), launched directly from [Cohort Operations](/working-in-the-sandbox/which-tools-are-available/cohort-operations-tool-co.md). You can monitor your jobs in the [Pipelines](/working-in-the-sandbox/which-tools-are-available/pipelines.md) tool. In the [Pipelines](/working-in-the-sandbox/which-tools-are-available/pipelines.md) tool you can also submit several analyses, using either ready-made, unmodifiable pipelines or more flexible, modifiable pipelines. To use a pipeline, you need to prepare some of the input files in the correct format and then run the pipeline.

**For Binary Phenotype analyses (yes/no for cases and controls):** The easiest way to conduct a custom GWAS is to launch Custom GWAS directly from [the Cohort Operations tool](/working-in-the-sandbox/which-tools-are-available/cohort-operations-tool-co/launch-custom-gwas-with-co.md). Pipelines to run GWAS in binary mode with [REGENIE](/working-in-the-sandbox/running-analyses-in-sandbox/how-to-run-genome-wide-association-studies-gwas/how-to-run-gwas-using-regenie.md) or [SAIGE](/working-in-the-sandbox/running-analyses-in-sandbox/how-to-run-genome-wide-association-studies-gwas/how-to-run-gwas-using-saige.md) are also available.

**For Quantitative Phenotype analysis (continuous variables for cases and controls):** The easiest way to conduct a quantitative GWAS is to use the [Custom GWAS CLI tool](/working-in-the-sandbox/which-tools-are-available/untitled/custom-gwas-command-line-cli-tool/custom-gwas-cli-quantitative-mode.md) in quantitative mode. The input ID list can easily be prepared as a text file with the [exporting cohorts function in the Cohort Operations tool](/working-in-the-sandbox/which-tools-are-available/cohort-operations-tool-co/export-finngen-ids-using-co.md). Pipelines for running the same analysis as a [GWAS in quantitative mode with REGENIE](/working-in-the-sandbox/running-analyses-in-sandbox/how-to-run-genome-wide-association-studies-gwas/running-quantitative-gwas-with-regenie.md) are also available.

**For Survival analyses:** Survival analyses require you to prepare input files. You can run a survival analysis using a [Cox model](/working-in-the-sandbox/running-analyses-in-sandbox/how-to-run-survival-analyses.md) or by running a [GWAS using survival models (GATE)](/working-in-the-sandbox/running-analyses-in-sandbox/how-to-run-genome-wide-association-studies-gwas/how-to-run-gwas-using-gate-survival-models.md). See the instructions for preparing the files and [running survival models with GATE](/working-in-the-sandbox/running-analyses-in-sandbox/how-to-run-genome-wide-association-studies-gwas/how-to-run-gwas-using-gate-survival-models.md). **Tip!** The ID list needed to build a phenotype-covariate file for GATE [can be exported as a text file using the Cohort Operations tool](/working-in-the-sandbox/which-tools-are-available/cohort-operations-tool-co/export-finngen-ids-using-co.md). Pipelines to conduct GATE analysis are also available.
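
Survival models generally need, for each individual, a follow-up time and an event indicator. The sketch below shows one common way to derive these from dates. It is a generic illustration under stated assumptions: the function name `survival_record`, the use of years as the time unit, and the example dates are hypothetical, and the output is not the input format required by GATE or the Cox model instructions linked above.

```python
from datetime import date

DAYS_PER_YEAR = 365.25


def survival_record(entry, event, end_of_followup):
    """Return (time_in_years, event_indicator) for one individual.

    entry           -- date follow-up starts
    event           -- date of the event, or None if it was never observed
    end_of_followup -- date follow-up ends (censoring date)
    """
    if end_of_followup < entry:
        raise ValueError("follow-up ends before entry")
    if event is not None and event < entry:
        # Events before entry (prevalent cases) need a deliberate decision.
        raise ValueError("event precedes entry")
    if event is not None and event <= end_of_followup:
        return ((event - entry).days / DAYS_PER_YEAR, 1)
    # No event within follow-up: the individual is censored.
    return ((end_of_followup - entry).days / DAYS_PER_YEAR, 0)


# Made-up dates for illustration only.
with_event = survival_record(date(2000, 1, 1), date(2010, 1, 1), date(2020, 1, 1))
censored = survival_record(date(2000, 1, 1), None, date(2020, 1, 1))
```

Raising an error for events that precede entry forces an explicit choice about prevalent cases rather than silently treating them as events or as censored.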

For more instructions about other analyses and Pipelines, see [Running analyses in Sandbox](/working-in-the-sandbox/running-analyses-in-sandbox.md).


