General workflows for the most common analyses
In this section, we describe example workflows for the most common analyses that researchers conduct with FinnGen data in the FinnGen Sandbox: endpoint analysis, survival time analysis, and genotype variant analysis.
See also a (at 25min 23sec).
Create cohort based on medical codes: Make for endpoints in Atlas. Set , , , and carefully following the instructions. Pay attention that.
Create cohort based on genotype: Use . Design the genotype cohorts you will use in further analyses. For example, you may want to export minor homozygotes (1|1), WT homozygotes (0|0), and heterozygotes (0|1 and 1|0). To combine, e.g., the two heterozygote cohorts (0|1 and 1|0) into one cohort, use . If the variant you are looking for is not in the Genotype Browser, see .
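The grouping described above can be sketched in a few lines of Python. This is only an illustration: the sample IDs and the `genotypes` mapping are made up, and the code does not reproduce any Sandbox tool or export format. It shows how the two heterozygote phases end up in a single cohort.

```python
from collections import defaultdict

# Hypothetical phased genotype calls keyed by made-up sample IDs.
genotypes = {
    "ID0001": "0|0",
    "ID0002": "0|1",
    "ID0003": "1|0",
    "ID0004": "1|1",
    "ID0005": "0|0",
}

# Map each phased call to a cohort label; both heterozygote phases share one label.
COHORT_OF = {
    "0|0": "wt_homozygote",
    "0|1": "heterozygote",
    "1|0": "heterozygote",
    "1|1": "minor_homozygote",
}


def group_by_genotype(calls):
    """Return {cohort label: set of IDs}; unrecognised calls are skipped."""
    cohorts = defaultdict(set)
    for sample_id, call in calls.items():
        label = COHORT_OF.get(call)
        if label is not None:
            cohorts[label].add(sample_id)
    return dict(cohorts)


cohorts = group_by_genotype(genotypes)
```

Here `cohorts["heterozygote"]` contains both the 0|1 and the 1|0 individuals, which is the same result as taking the union of two separately exported heterozygote cohorts.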
For fast inspection of cohorts built in Atlas, use the . Make if needed.
Inspect your cohorts in detail using the . Pay attention to the entry and exit events of the patients. Are the patients entering and exiting the cohort as intended? Are the conditions appearing in the right temporal intervals according to the inclusion and exclusion rules set in Atlas? Atlas is a powerful tool that can create very complex cohorts, and it is easy to select a wrong setting by accident.
Output files from the .
Tip! If one individual appears interesting or e.g. outlying, you may use the to explore that person closely by viewing all medical codes for that person in a single view.
Explore the cohorts with the . Compare . Do similar endpoints already exist? Explore which conditions and medicines are enriched in the cases compared to the control cohorts by . For genotype data, CodeWAS can be run e.g. for rarer homozygotes compared to hetero- & WT homozygotes using . See also instructions on . Consider if the results make sense. Are the right conditions and medicines enriched in the cases group compared to the controls? Are there conditions or medicines that should be included or excluded from the cohorts? Clinicians' help may be needed to interpret CodeWAS results and help to build the cohorts.
For genotype variant analysis: Consider the results from CodeWAS. Are the cohorts of carriers and non-carriers sufficient for your study, or should the cohorts be modified using phenotypic information? Do carriers and non-carriers differ by unexpected diagnoses or unexpected medication use? If so, you may . You can then filter these phenotypes in or out of genotype cohorts using .
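Filtering a phenotype in or out of a genotype cohort amounts to a set intersection or a set difference on individual IDs. The sketch below illustrates that logic only. The function name `filter_cohort` and all IDs are hypothetical, and in the Sandbox this step is done with the tools referred to above.

```python
def filter_cohort(genotype_ids, phenotype_ids, keep_with_phenotype):
    """Keep individuals with the phenotype (intersection) or without it (difference)."""
    genotype_ids = set(genotype_ids)
    phenotype_ids = set(phenotype_ids)
    if keep_with_phenotype:
        return genotype_ids & phenotype_ids
    return genotype_ids - phenotype_ids


# Made-up carrier cohort and made-up cohort with an unexpected diagnosis.
carriers = {"ID0002", "ID0003", "ID0004"}
with_unexpected_dx = {"ID0003", "ID0009"}

# Filter the phenotype out of the carrier cohort.
carriers_excluding_dx = filter_cohort(carriers, with_unexpected_dx, keep_with_phenotype=False)

# Filter the phenotype in: keep only carriers who also have it.
carriers_only_dx = filter_cohort(carriers, with_unexpected_dx, keep_with_phenotype=True)
```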
Consider the results from TVT and CO. If needed, go back to Atlas and improve the cohorts based on these results. Then inspect the cohorts again using TVT and CO. Repeat and until you are pleased with the cohorts. Help from a clinician may be needed to interpret CodeWAS results and to build clinically meaningful cohorts.
Tip! If you need more complex filtering than is possible to conduct in Atlas consider . You may , and with the rules you select.
When the cohorts are ready and checked with TVT and CO, you may proceed to the downstream analyses. To select a suitable software and model for your study see . The easiest way to conduct a GWAS is to use . For these and other analyses not in the Custom GWAS tools, ready pipelines are available in the Sandbox. Using pipelines requires some coding skills: users need to prepare some of the input files themselves and run the pipeline.
For Binary Phenotype analyses (yes/no for cases and controls): The easiest way to conduct a custom GWAS is to use the or launch Custom GWAS directly from . Pipelines to run GWAS in binary mode with or are also available.
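If you prepare a binary phenotype by hand for a pipeline, the core of it is a table with one row per individual and a 1/0 status. The sketch below is a minimal illustration under assumptions: the function name, the IDs, and the two-column layout are hypothetical and are not the input format of any specific pipeline. It also checks that no individual is both a case and a control.

```python
def binary_phenotype(cases, controls):
    """Return sorted (id, status) rows with status 1 for cases and 0 for controls."""
    cases, controls = set(cases), set(controls)
    overlap = cases & controls
    if overlap:
        # An individual cannot be both a case and a control.
        raise ValueError(f"IDs in both cohorts: {sorted(overlap)}")
    rows = [(sample_id, 1) for sample_id in cases]
    rows += [(sample_id, 0) for sample_id in controls]
    return sorted(rows)


# Made-up case and control cohorts.
rows = binary_phenotype(cases={"ID0004", "ID0002"}, controls={"ID0001", "ID0005"})
```

Failing loudly on overlapping IDs is a design choice for the sketch: it surfaces cohort definition mistakes before any analysis is run.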
For Quantitative Phenotype analysis (continuous variables for cases and controls): The easiest way to conduct quantitative GWAS is to use the in quantitative mode. Preparing an input ID list as a text file is easily done using by . Pipelines to conduct the same analyses of are also available.
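For orientation, an ID list stored as a text file can be as simple as one ID per line. The sketch below writes such a file with the Python standard library. The one-ID-per-line layout, the file name, and the IDs are assumptions for illustration, so check the tool's own instructions for the exact format it expects.

```python
from pathlib import Path
import tempfile


def write_id_list(ids, path):
    """Write unique IDs, sorted, one per line; return how many were written."""
    unique_ids = sorted(set(ids))
    Path(path).write_text("\n".join(unique_ids) + "\n", encoding="utf-8")
    return len(unique_ids)


# Hypothetical output location and made-up IDs (note the duplicate).
out_path = Path(tempfile.gettempdir()) / "example_id_list.txt"
n_written = write_id_list(["ID0003", "ID0001", "ID0003"], out_path)
```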
For Survival analyses: To run survival analyses one needs to prepare input files. You can run survival analysis using or by running . See instructions for the file preparation and . Tip! The ID list needed to build a phenotype-covariate file for GATE .
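Input files for survival analysis typically carry, for each individual, a follow-up time and an event indicator. The sketch below shows one common way to derive both from dates. It is not the file format of any particular tool. The function name, the dates, and the choice of years as the time unit are assumptions made for illustration.

```python
from datetime import date


def survival_row(entry, event, end_of_followup):
    """Return (time_in_years, event_indicator) for one individual.

    event is the event date, or None if no event was observed.
    Events after end_of_followup are treated as censored.
    """
    if event is not None and event <= end_of_followup:
        stop, indicator = event, 1
    else:
        stop, indicator = end_of_followup, 0
    if stop < entry:
        raise ValueError("follow-up ends before cohort entry")
    years = (stop - entry).days / 365.25
    return round(years, 2), indicator


# Made-up individuals: one with an observed event, one censored.
row_event = survival_row(date(2010, 1, 1), date(2015, 1, 1), date(2020, 12, 31))
row_censored = survival_row(date(2010, 1, 1), None, date(2020, 1, 1))
```

Individuals without an observed event are censored at the end of follow-up, so they contribute follow-up time with an indicator of 0.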
For more instructions about other analyses and Pipelines .