FinnGen Handbook
On this page
  • 1. How to build a cohort based on a diagnosis using local (non-standard) codes
  • 2. How to build a control cohort based on a diagnosis using local (non-standard) codes
  • 3. How to build a cohort based on a diagnosis using international (standard) codes
  • 4. How to build a sex-specific cohort
  • 5. How to build a cohort based on OHDSI PhenotypeLibrary
  • 6. How to build a cohort filtering events by FinnGen register, e.g only those with inpatient records
  • 7. How to build a cohort using only diagnoses from specialty clinics, i.e. filtering for visit type
  • 8. How to build a cohort filtering by medication use
  • 9. How to build a cohort filtering for the number of medications an individual has received
  • 10. How to build a cohort based on Drug Era
  • 11. How to build a cohort using KELA reimbursement codes
  • 12. How to build a cohort using birth/delivery as a variable
  • 13. How to build a cohort with multiple events per person
  • 14. How to build a cohort by filtering by main/side diagnosis
  • 15. How to build a cohort using Kanta lab values
  • 16. How to export a cohort built in Atlas into R

Examples on cohort building with Atlas

Last updated 21 days ago

1. How to build a cohort based on a diagnosis using local (non-standard) codes

The cohort is a group of persons followed from the first diagnosis of a local code X until the end of follow-up. In Atlas, local codes such as ICD codes are referred to as non-standard codes and are displayed in red, while OHDSI OMOP Common Data Model (CDM) codes are called standard codes and are displayed in blue. It is important not to mix standard and non-standard codes in a concept set. In this example we will create a cohort of people with a diagnosis of type 2 diabetes based on the local (non-standard) ICD-10 code E11.

Quick guide:

  1. ‘Concept Sets’: Create a concept set, t2d_icd10. Search for e11, limit the ‘Vocabulary’ to ICD10 and select the main code. Remember to tick ‘Descendants’ to include all the sub codes of E11.

  2. ‘Cohort Definitions’: 1) Create a new cohort and add the concept set t2d_icd10 to ‘Cohort Entry Events’ using ‘Add Initial Event’ and selecting ‘Add Condition Occurrence’. For non-standard codes, remember to import the concept set via ‘Add attribute’ and choose ‘Add Condition Source Concept’. 2) To limit the cohort to individuals who have at least three diagnosis codes, import the concept set into the ‘Inclusion Criteria’ as well and change the number to 3 occurrences. Remember to use the attributes and source concepts for concept sets with non-standard codes. Save the changes and generate the cohort.
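What ticking ‘Descendants’ does to a concept set can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PARENT` mapping is a hypothetical toy hierarchy (Atlas resolves descendants from its own vocabulary tables), and the function names are invented for this sketch.

```python
# Hypothetical toy hierarchy (child code -> parent code), for illustration only.
# Atlas resolves descendants from its vocabulary tables, not from a dict like this.
PARENT = {
    "E11.0": "E11",
    "E11.1": "E11",
    "E11.9": "E11",
    "E10.1": "E10",
}


def descendants(code, parent_of):
    """Return the code itself plus every code below it in the hierarchy."""
    result = {code}
    changed = True
    while changed:
        changed = False
        for child, parent in parent_of.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def resolve_concept_set(main_codes, parent_of, include_descendants):
    """Expand the selected main codes into the list of included concepts."""
    included = set()
    for code in main_codes:
        if include_descendants:
            included |= descendants(code, parent_of)
        else:
            included.add(code)
    return sorted(included)


# With 'Descendants' ticked, the sub codes of E11 come along automatically.
t2d_icd10 = resolve_concept_set(["E11"], PARENT, include_descendants=True)
print(t2d_icd10)
```

Without ‘Descendants’, the same call would return only the main code E11, so diagnoses recorded with a sub code would not match the concept set.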

Detailed instructions:

  1. We start by creating the necessary concept set. Go to ‘Concept Sets’ and click ‘New Concept Set’. A new window will open where we give our concept set a name. We click the ‘Add Concepts’ button at the bottom of the page. This takes us to a search where we can enter a string or a code. Typing e11 returns a list of concepts that include e11. We can limit the search to ICD-10 by clicking ICD10 under Vocabulary in the left-hand panel. After this, only non-standard codes in red font are displayed. Note that there is also a Vocabulary term ICD10fi, which includes different combination codes marked by an asterisk. Its person counts are much smaller than those for ICD10, so in this example we choose ICD10 to demonstrate the use of main codes and their sub codes. Let’s choose the main code E11. Then we scroll to the bottom of the page and tick the box for ‘Descendants’. This includes all the sub codes (e.g. E11.1 and so on) in our concept set without having to select them manually. Finally, we press the ‘Add to Concept Set’ button.

  2. Now we are back in ‘Concept Sets’. The ‘Included Concepts’ tab shows that 34 concepts are included. If we explore these, we’ll see the different sub codes of the main ICD-10 code. We can now save the concept set by clicking the save button and close it with the X icon next to it. If your cohort needs further concept sets (for example for drug purchases), create them in the same way.

  3. Once the concept set is done, we can move to ‘Cohort Definitions’. We’ll see a list of already defined cohorts. Click ‘New Cohort’ and a new window will open. Give the cohort a clear name, so that you can find it later, and enter a more detailed description. It is good practice to add your initials at the end of the name. Remember to save the changes.

  4. We start by defining the ‘Cohort Entry Events’. Click ‘Add Initial Event’ and select ‘Add Condition Occurrence’. Because the concept set in this example is based on non-standard codes, we need to import it using source concepts. In practice this means one additional step compared to concept sets based on standard codes: we click ‘Add attribute’ and choose ‘Add Condition Source Concept’. Another dropdown menu will appear, and here we import the concept set we made for the diagnosis. With this definition, diagnoses in the concept set are taken from any of the registers included in Atlas. If you’d like to filter by a certain register, please see the example How to build a cohort filtering events by FinnGen register (e.g. only those with inpatient records). If we had no criteria other than having a diagnosis, we could already generate our cohort. In this example, however, we want to require at least three occurrences of the diagnosis.

  5. Next, we define the inclusion criteria. Click ‘New Inclusion Criteria’ and give it a descriptive name. Click ‘Add criteria to group’ and select ‘Add Condition Occurrence’. We click ‘Add attribute’ and select ‘Add Condition Source Concept’. Here we again import our concept set for the diagnosis. Note that unless you have more specific inclusion criteria, it is not necessary to add the diagnosis again in the ‘Inclusion Criteria’ if you already added it to the ‘Cohort Entry Events’. Here we include it because we want to require a minimum number of occurrences. In this example we require at least three occurrences of the diagnosis. This can be adjusted at the top of the box.

  6. The final step is to check the ‘Cohort Exit’ criterion. Usually, it is fine to leave it at its default setting, i.e. the event persists until the end of continuous observation.

  7. Now we can save the changes and go to the ‘Generation’ tab to generate our cohort in our preferred release of the FinnGen data. Press the ‘Generate’ button and see the number of individuals included in the cohort. In our example there are 74,827 cases in the cohort. To view the effects of the inclusion criteria, click the ‘View Report’ button that appears after you have generated the cohort. The report shows, as percentages, how many of the cases passed the different criteria. You can switch between intersect and attrition views, as well as between Person and Event views, by clicking the appropriate links and tabs. Note that if nothing is defined in the ‘Inclusion Criteria’, generating the cohort will not produce a report, because the report shows the percentage of the cohort fulfilling the inclusion criteria. The cohort is nevertheless generated normally even though no report exists.
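The logic of this cohort definition (entry at the first matching diagnosis, then an inclusion criterion of at least three occurrences) can be sketched in plain Python. This is a simplified illustration, not how Atlas computes cohorts: the events and person IDs are hypothetical, and the sketch assumes that occurrences are counted over the whole observation time.

```python
from collections import defaultdict
from datetime import date

# Hypothetical diagnosis events: (person_id, source_code, event_date).
EVENTS = [
    ("FG1", "E11.9", date(2010, 3, 1)),
    ("FG1", "E11.9", date(2011, 5, 9)),
    ("FG1", "E11.1", date(2012, 1, 20)),
    ("FG2", "E11", date(2015, 7, 4)),
    ("FG3", "E10.1", date(2009, 2, 2)),
    ("FG4", "E11.0", date(2008, 8, 8)),
    ("FG4", "E11.9", date(2009, 9, 9)),
    ("FG4", "E11.9", date(2013, 4, 1)),
]

CONCEPT_SET = {"E11", "E11.0", "E11.1", "E11.9"}  # resolved t2d_icd10 (toy version)
MIN_OCCURRENCES = 3


def build_cohort(events, concept_set, min_occurrences):
    """Return (entry dates of everyone with an entry event, final cohort)."""
    dates_by_person = defaultdict(list)
    for person_id, code, event_date in events:
        if code in concept_set:
            dates_by_person[person_id].append(event_date)

    # Cohort entry event: the first matching diagnosis per person.
    entry = {person: min(dates) for person, dates in dates_by_person.items()}

    # Inclusion criterion: at least `min_occurrences` matching diagnoses.
    cohort = {
        person: entry[person]
        for person, dates in dates_by_person.items()
        if len(dates) >= min_occurrences
    }
    return entry, cohort


entry_events, cohort = build_cohort(EVENTS, CONCEPT_SET, MIN_OCCURRENCES)

# Analogous to the generation report: share of entrants passing the criterion.
passed_pct = 100 * len(cohort) / len(entry_events)
print(sorted(cohort), f"{passed_pct:.1f}% passed")
```

In this toy data FG2 has an entry event but only one diagnosis, so FG2 is dropped by the inclusion criterion. This is the kind of attrition the ‘View Report’ percentages describe.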

2. How to build a control cohort based on a diagnosis using local (non-standard) codes

The control cohort is a group of persons who do not have the conditions that define the cases. The easiest way to build it is to copy the case cohort and adjust the definitions accordingly. Entry to the control cohort is entry to any of the registers at any time, unlike for cases, where we usually define entry as the first occurrence of the diagnosis. This has implications for how the control cohort is adjusted, as explained below.

Quick guide:

  1. ‘Cohort Definitions’: 1) Open and copy the case cohort and give it a new name. In ‘Cohort Entry Events’ click ‘Delete Criteria’. Click ‘Add Initial Event’ and select ‘Add Visit occurrence’. Leave it like this. 2) In the ‘Inclusion Criteria’, delete any inclusion criteria made for the cases or adjust them to exactly 0 occurrences. Add any additional criteria relevant for the controls, e.g. they may need to be free of other diseases as well. Finally, at the top of the criteria, change having ‘any’ of the following criteria to having ‘all’ of the following criteria. Generate the cohort.

Detailed instructions:

  1. Start by opening the case cohort in ‘Cohort Definitions’ and press the ‘Create a copy of this cohort definition’ button on top right next to the cohort definition name. This will make the name of the cohort to be COPY OF case-cohort-name. Adjust the name and the definition accordingly. You can for example open the cohort from example 1 and copy it.

  2. Next, adjust the ‘Cohort Entry Events’ by clicking the ‘Delete Criteria’. This will delete the entry we had defined for the cases. For controls, we want to define the entry as any entry to the registers. To do this, click the ‘Add Initial Event’ and select ‘Add Visit occurrence’. We do not have to add any attributes but can leave the definition like this.

  3. In the ‘Inclusion Criteria’, click the blue box with the name of the inclusion criteria that was created for the cases and modify the name accordingly. Clicking the name makes all the criteria we added for cases visible. For any criteria on diagnoses and/or drug purchases, change the number of occurrences from ‘at least’ 3 to ‘exactly’ 0 occurrences.

  4. You may want to add some additional inclusion criteria. In our example of type 2 diabetes, we also want our controls to be free of type 1 diabetes. For this we have created a concept set using local non-standard codes. We click ‘Add criteria to group’ and select ‘Add Condition Occurrence’. We click ‘Add attribute’ and select ‘Add Condition Source Concept’. Here we import our concept set for type 1 diabetes and select exactly 0 occurrences at the top of the box.

  5. Once we have modified/added all the inclusion criteria, we select at the top of the boxes ‘having all of the following criteria’.

  6. We can now save the changes, generate the cohort and inspect the number of individuals in the cohort. In this example we generate our cohort in FinnGen CMD R12, i.e. the latest data release, and see that our control cohort includes 424,741 individuals.
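The control logic in the steps above can be pictured as a filter over register records. The following is a minimal Python sketch for intuition only, not how Atlas implements it. The function name `select_controls`, the input shapes and the example codes are illustrative assumptions.

```python
from collections import Counter


def select_controls(visit_person_ids, condition_events, excluded_concept_sets):
    """Return persons with at least one visit and exactly 0 occurrences
    of every excluded concept set (the 'all of the following criteria' rule).

    visit_person_ids: iterable of person ids, one per register visit (hypothetical input).
    condition_events: iterable of (person_id, code) pairs (hypothetical input).
    excluded_concept_sets: dict mapping a criterion name to a set of codes.
    """
    counts = {name: Counter() for name in excluded_concept_sets}
    for person_id, code in condition_events:
        for name, codes in excluded_concept_sets.items():
            if code in codes:
                counts[name][person_id] += 1

    # Entry event: any visit occurrence in any register.
    candidates = set(visit_person_ids)
    # Every criterion must hold: exactly 0 occurrences of each concept set.
    return {
        person_id
        for person_id in candidates
        if all(counts[name][person_id] == 0 for name in excluded_concept_sets)
    }


# Illustrative data only: E11 and E10 stand in for type 2 and type 1 diabetes codes.
visits = [1, 1, 2, 3, 4]
events = [(1, "E11"), (2, "E10"), (3, "I10")]
criteria = {"no type 2 diabetes": {"E11"}, "no type 1 diabetes": {"E10"}}
controls = select_controls(visits, events, criteria)
```

In this toy data, persons 3 and 4 remain as controls. If the criteria were combined with ‘any’ instead of ‘all’, person 1 (type 2 diabetes but no type 1 diabetes) would also pass, which is why the switch to ‘all’ matters.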

3. How to build a cohort based on a diagnosis using international (standard) codes

NB! This example also shows how to exclude specific sub-diagnoses from the cohort definition.

The cohort is a group of persons followed from their first diagnosis until the end of follow-up. The diagnosis is recorded with a national (non-standard) code that maps to multiple international (standard) codes. In Atlas, local codes are referred to as non-standard codes and are displayed in red, while OHDSI OMOP Common Data Model (CDM) codes are called standard codes and are displayed in blue. It is important not to mix standard and non-standard codes in a concept set. In the example below we will create a cohort for cases of type 2 diabetes using international (standard) codes.

Quick guide:

  1. ‘Concept Sets’: Search for the term ‘type 2 diabetes’ and limit the ‘Vocabulary’ to SNOMED. Standard codes in blue will remain and their hierarchy can be inspected in more detail by clicking the selected term, here ‘Type 2 diabetes mellitus’. Once you have added the term with its ‘Descendants’ into your concept set, you can use the ‘Included Concepts’ tab to inspect all the sub codes and if you want to exclude any, tick the box next to it and select ‘Exclude’ at the bottom of the page. Finally, click ‘Add to Concept Set’.

  2. ‘Cohort Definitions’: Create the cohort as usual. As the concept set is based on standard codes, there is no need to ‘Add Attribute’ and use source concept criteria but you can upload the concept set directly. Generate the cohort.

Detailed instructions:

  1. We start by creating the concept set. In ‘Concept Sets’, click on ‘New concept’, give it a name, save it and click ‘Add concepts’. This will open a new window with a Search. Type ‘type 2 diabetes’ and press Enter or click the magnifying glass icon to search. In the ‘Vocabulary’ panel, we can limit the results to SNOMED by clicking on it (see image below). Now all the remaining entries are in blue and are thus standard codes.

  2. Let’s click on ‘Type 2 diabetes mellitus’. This will open a new window. For standard codes we can inspect the ‘Hierarchy’ tab to see the parents and children of this code. In our example, we see that there is a parent code ‘Diabetes mellitus’ and 12 child codes.

  3. Next we click the ‘Current concept’ box under the ‘Hierarchy’ tab. As we are happy with our initial selection, we keep this concept by ticking the box next to it, also selecting ‘Descendants’, and clicking ‘Add To Concept Set’.

  4. Now when we go back to ‘Concept Sets’, we see that our concept set includes 18 concepts. We can see them in more detail in the ‘Included Concepts’ tab. We can order the concepts by record count by clicking ‘RC’. If we want to exclude some of them, e.g. ‘Pre-existing type 2 diabetes mellitus’, we tick the box next to it, choose ‘Exclude’ at the bottom of the page and click ‘Add To Concept Set’.

  5. Once we have done this, we see that the number of included concepts has been updated to 17. This exclusion is also reflected in the ‘Concept Set Expression’ tab. Save the changes and go to ‘Cohort Definitions’.

  6. In ‘Cohort Definitions’, create a ‘New Cohort’. Give it a name, click the save icon and start from the ‘Cohort Entry Events’. Click ‘Add Initial Event’ and select ‘Add Condition Occurrence’. Now we can import our concept set directly into the condition occurrence without having to add an attribute, unlike with local (non-standard) codes.

  7. If we don’t have any other specific inclusion criteria, we can proceed to generating the cohort in the ‘Generation’ tab in our preferred release of the FinnGen data.

  8. With the definitions above, our cohort generated on FinnGen R12 includes 85,917 individuals.
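The way ‘Descendants’ and ‘Exclude’ combine in the steps above can be sketched in a few lines of Python. This is a simplified illustration, not Atlas code: the function names and the toy hierarchy are made up, and the real SNOMED tree is much larger.

```python
def descendants(children_of, concept):
    """Collect all descendants of a concept from a parent -> children mapping."""
    found = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for child in children_of.get(current, ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def resolve_concept_set(children_of, items):
    """Resolve a concept set expression into its included concepts.

    items: list of (concept, include_descendants, is_excluded) tuples.
    """
    included, excluded = set(), set()
    for concept, include_descendants, is_excluded in items:
        concepts = {concept}
        if include_descendants:
            concepts |= descendants(children_of, concept)
        if is_excluded:
            excluded |= concepts
        else:
            included |= concepts
    # Exclusions always win over inclusions.
    return included - excluded


# Toy hierarchy with made-up concept names, not the real SNOMED tree.
children_of = {
    "T2D": ["T2D with complication", "Pre-existing T2D"],
    "T2D with complication": ["T2D with renal complication"],
}
before = resolve_concept_set(children_of, [("T2D", True, False)])
after = resolve_concept_set(
    children_of, [("T2D", True, False), ("Pre-existing T2D", False, True)]
)
```

Here `before` contains four concepts and `after` three, mirroring how the count of included concepts drops by one after a single exclusion.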

4. How to build a sex-specific cohort

A sex-specific cohort is created by adding an additional inclusion criterion for sex. That is, you first build your cohort as usual by creating the necessary concept sets and then, in ‘Cohort Definitions’, add the criterion for sex. In the example below, we use the existing cohort for type 2 diabetes that we created in Example 1. We modify this cohort to create a cohort for females only. A cohort for males can be built in the same way.

Quick guide:

  1. ‘Concept Sets’: Create as usual.

  2. ‘Cohort Definitions’: Add new ‘Inclusion Criteria’ and select ‘Add Demographic’. Click the ‘Add attribute’ button and select ‘Add Gender Criteria’. Search for ‘female’. A list of terms including ‘FEMALE’ will appear. Choose this standard code by clicking on it and next, click the ‘Add and close’ button. Generate the cohort.

Detailed instructions:

  1. Create your cohort from scratch or copy an existing cohort that you would like to stratify by sex. Here we copy a cohort by opening an existing cohort in ‘Cohort Definitions’, clicking the clone button in the top right corner and editing the name of the cohort. Save the cohort.

  2. In the ‘Inclusion Criteria’, click the green ‘New inclusion criteria’ button on the left and give your criterion a name, e.g. “Females only”. Click the ‘Add Criteria to group’ and select ‘Add Demographic’.

  3. Click the ‘Add attribute’ button and select ‘Add Gender Criteria’.

  4. A new window will open. Search for ‘female’. A list of terms including ‘FEMALE’ will appear. Choose this standard code by clicking on it and then click the ‘Add and close’ button.

  5. Your cohort now includes a criterion for females (see the figure below).

  6. Save your cohort and proceed to generating the cohort in the ‘Generation’ tab. Create the cohort for males similarly.

  7. Check in ‘Characterizations’ that the cohorts are as intended, i.e. include only males or only females. Start by creating a ‘New Characterization’ and give it a name. Stay on the ‘Design’ tab, import your cohorts in the ‘Cohort definition’ section and select features of interest in the ‘Feature analyses’ section. When you click the ‘Import’ button in ‘Feature analyses’, a list of possible features will appear in a new window. You can filter, e.g. on demographics only, by using ‘Domain’ in the left-hand panel. Once you have selected your features of interest, scroll down and click the ‘Import’ button.

  8. Save your characterization and go to the ‘Executions’ tab. Choose the FinnGen release in which you would like to run the analyses. In the figure below we have chosen the latest release, R12. After the analysis has been generated, you can click ‘View latest results’ to see the characterizations.

  9. A new window will open. By default, only one cohort is chosen, so to inspect your cohorts at the same time, tick the box for both cohorts in the ‘Filter panel’. In the figure below, the results show that we have generated our cohorts correctly, i.e. the female cohort has 0 males and the male cohort has 0 females. The figure also shows no overlap between our cohorts.

  10. Now that we are confident in our sex-specific case cohorts, we can proceed to making the sex-specific control cohorts by cloning the sex-specific case cohorts and adjusting the definitions accordingly.

5. How to build a cohort based on OHDSI PhenotypeLibrary

Quick guide:

  1. Go to data.ohdsi.org/PhenotypeLibrary/. Use the search function to find your desired cohort and open the JSON tab. Copy the code and use the Clipboard in Sandbox to bring the code into Sandbox.

  2. In Atlas ‘Cohort Definitions’: Create a new cohort and go to the ‘Export’ tab and select the ‘JSON’ button. Here, paste the JSON code from Clipboard – if needed, in small chunks. Finally, click the ‘Reload’ button at the bottom of the screen. Go to the ‘Definition’ tab and check that all the cohort entry, inclusion and exit criteria have appeared there. Generate the cohort.

Detailed instructions:

  1. Go to data.ohdsi.org/PhenotypeLibrary/. Here, one can see the list of all available cohorts in the library. By searching and then selecting the desired cohort, tabs appear under the table. The tabs include details about the cohort definition as well as text for JSON and SQL that can be copied and used accordingly, e.g. in Atlas.

  2. Copy the JSON text into Clipboard in the Sandbox. Note that due to the small size of the Clipboard, you may need to copy and paste to the Clipboard and Atlas in small chunks.

  3. In Atlas, you will not need to go to ‘Concept Sets’ but can go directly to ‘Cohort Definitions’. Click the ‘New Cohort’ button and give a name to the cohort. Then go to the ‘Export’ tab and select the ‘JSON’ button. Here, paste the JSON code from Clipboard, if needed in small chunks. Finally, click the ‘Reload’ button at the bottom of the screen.

  4. Go back to the ‘Definition’ tab and check that all the cohort entry, inclusion and exit criteria have appeared there.

  5. If everything looks ok, the cohort is ready to be generated in the ‘Generation’ tab.

  6. Finally, generate the control cohort as usual, i.e. by cloning the case cohort and adjusting the definitions accordingly.
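When the JSON has to be moved in small chunks, it is easy to lose or duplicate a piece. If you prepare the text on your own machine before pasting, a small Python helper like the sketch below can split the text and confirm that the pieces still join into valid JSON. The chunk size and the placeholder JSON are assumptions for illustration; the actual Clipboard limit is not stated here.

```python
import json


def split_into_chunks(text, chunk_size):
    """Split text into consecutive pieces of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Placeholder standing in for the JSON copied from the library; not a real definition.
cohort_json = json.dumps({"ConceptSets": [], "PrimaryCriteria": {}})
chunks = split_into_chunks(cohort_json, chunk_size=10)  # chunk size is an assumption
reassembled = "".join(chunks)
```

Checking `is_valid_json(reassembled)` before clicking ‘Reload’ catches a truncated or missing chunk early, since a definition with a missing closing brace will not parse.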

6. How to build a cohort filtering events by FinnGen register, e.g. only those with inpatient records

We can use appearance in specific registers to filter cases. Atlas already contains ready-made concept sets for the following registers: INPAT, OUTPAT, OPER_IN, OPER_OUT, PRIM_OUT, CANC, PURCH, and REIMB. They are named ‘REGISTER [FinnGen support concept set]’, where REGISTER is one of the registers listed above. Some of the registers have also been combined into one concept set, e.g. INPAT+OUTPAT [FinnGen support concept set]. These concept sets can be used in ‘Cohort Definitions’ for filtering, as described below.

NB! The below instructions are for data release 12 and before. In R13, it is even simpler to filter for registers. You can find INPAT, OUTPAT, OPER_IN, OPER_OUT and PRIM_OUT codes in the ‘Search’ window under the domain ‘Visit’, vocabulary ‘FGVisitType’, and create concept sets from them. Just tick the box for ‘Descendants’ to include all the specific subtypes of the register (e.g. INPAT emergency visit, non-emergency visit) and include/exclude the subtypes as usual.

Quick guide:

  1. ‘Concept Sets’: Create as usual.

  2. ‘Cohort Definitions’: In the ‘Cohort Entry Events’, add an attribute to the ‘Condition Occurrence’ and choose ‘Add Nested Criteria’. Next, click ‘Add criteria to group’ and choose ‘Add Visit Occurrence’. There, click ‘Add attribute’ and select ‘Add Visit Source Concept Criteria’. Now, import one of the ready-made register concept sets, here ‘INPAT [FinnGen support concept set]’, and tick the box ‘restrict to the same visit occurrence’. Generate the cohort.

Detailed instructions:

  1. Create a new cohort from scratch or clone an existing cohort you want to modify. In this example we copy an existing cohort for type 2 diabetes patients by first opening the cohort in ‘Cohort Definitions’, here T2D_cases[MK], and then clicking the copy button in the top right corner, after which we edit the name of the cohort, e.g. to T2D_cases_inpatients. In the ‘Cohort Entry Events’, we add an attribute to the ‘Condition Occurrence’ and choose ‘Add Nested Criteria’.

  2. Next, click ‘Add criteria to group’ and choose ‘Add Visit Occurrence’. There, click ‘Add attribute’ and select ‘Add Visit Source Concept Criteria’. Now, import one of the ready-made register concept sets, here ‘INPAT [FinnGen support concept set]’, and tick the box ‘restrict to the same visit occurrence’. Note that it doesn’t matter whether you first add the diagnosis and then the register as a nested criterion or the other way round. However, if you have several entry events, e.g. two different diagnoses that you want to limit to a specific register, it may be better to first add the register and then the diagnoses as nested criteria of the register.

  3. Save the changes and go to the ‘Generation’ tab. Generate the cohort in the data release you wish, e.g. the latest one. Restricting our cases to those with inpatient records gives us a sample size of 38,319. Our cohort without this restriction had 82,164 individuals.
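The effect of ‘restrict to the same visit occurrence’ can be illustrated with a short Python sketch: a diagnosis only counts if the visit on which it was recorded belongs to the chosen register. The function name, the input shapes and the records are made up for illustration and do not reflect how Atlas stores or queries the data.

```python
def restrict_to_register(diagnoses, visit_register, register):
    """Keep persons with at least one diagnosis recorded on a visit from `register`.

    diagnoses: iterable of (person_id, visit_id) pairs for the diagnosis concept set.
    visit_register: dict mapping visit_id to the register name of that visit.
    """
    return {
        person_id
        for person_id, visit_id in diagnoses
        # 'restrict to the same visit occurrence': the register is looked up
        # for the very visit on which the diagnosis was recorded.
        if visit_register.get(visit_id) == register
    }


# Made-up records: person 1 has an inpatient diagnosis, person 2 only an outpatient one.
# Visit v4 is an inpatient visit that carries no diagnosis of interest.
diagnoses = [(1, "v1"), (1, "v2"), (2, "v3")]
visit_register = {"v1": "OUTPAT", "v2": "INPAT", "v3": "OUTPAT", "v4": "INPAT"}
inpatient_cases = restrict_to_register(diagnoses, visit_register, "INPAT")
```

Only person 1 remains, because an inpatient visit without the diagnosis (such as v4) does not qualify anyone on its own.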

7. How to build a cohort using only diagnoses from specialty clinics, i.e. filtering for visit type

Quick guide:

  1. ‘Concept Sets’: Create as usual.

  2. ‘Cohort Definitions’: In the ‘Cohort Entry Events’, add an attribute to the ‘Condition Occurrence’ and choose the ‘Add Provider Specialty’. Click the ‘Add’ button and type in the search box your desired specialty clinic, in this example ‘endocrinology’. A list of terms including ‘endocrinology’ will appear, including both non-standard (N) and standard (S) codes. Choose the standard code with the ‘Vocabulary’ term ‘Medicare Specialty’. Click ‘Add And Close’. Generate the cohort.

Detailed instructions:

1. Create a new cohort from scratch or clone an existing cohort you want to modify. In this example we will clone an existing cohort for type 2 diabetes patients by first opening the cohort in ‘Cohort Definitions’ and then clicking the clone button in the top right corner after which we edit the name of the cohort, e.g. to T2D_cases_endocrinology. In the ‘Cohort Entry Events’, we add an attribute to the ‘Condition Occurrence’ and choose the ‘Add Provider Specialty’.

2. Next, click the ‘Add’ button and type in the search box your desired specialty clinic, in this example ‘endocrinology’. A list of terms including ‘endocrinology’ will appear, including both non-standard (N) and standard (S) codes. Choose the standard code with the ‘Vocabulary’ term ‘Medicare Specialty’. Click ‘Add And Close’.

3. Now, this new restriction has appeared in our cohort definition.

4. Save the changes and generate the cohort as usual by going to the ‘Generation’ tab and selecting the desired release of the data in which to generate the cohort. Inspect the number of individuals included in the cohort. By generating the cohort in R12 we notice that there are now 8,037 type 2 diabetes patients who had a visit to the endocrinology clinic as compared to our original cohort of 82,164 type 2 diabetes patients from all the registers.

8. How to build a cohort filtering by medication use

e.g. new users of blood glucose lowering medication with a prior diagnosis of type 2 diabetes

In this example we will define two concept sets, one for the diagnosis and one for medication, using SNOMED codes for the diagnosis and ATC codes for the medication.

Quick guide:

  1. ‘Concept Sets’: Create as usual.

  2. ‘Cohort Definitions’: 1) In ‘Cohort Entry Events’, click ‘Add Initial Event’ and select ‘Add Drug Exposure’. Import the concept set and restrict to first-time use by clicking ‘Add attribute’ and selecting ‘Add First Exposure Criteria’. 2) In the ‘Inclusion Criteria’, create a ‘New inclusion criteria’ for the prior diagnosis of type 2 diabetes as usual. Modify the boxes below to read ‘where the event starts all days before and 0 days before index start date’. This ensures that the diagnosis was given on, or at any time before, the drug use start date, which is the index start date. 3) In the ‘Cohort Exit’, define that ‘Event will persist until end of a continuous drug exposure’ and import the concept set for the drugs. Specify a persistence window, e.g. a maximum gap of 30 days between prescriptions. Save the changes and generate the cohort.

Detailed instructions:

  1. We start by creating the concept set. In ‘Concept Sets’, click on ‘New concept’, give it a name, save it and click ‘Add concepts’. This will open a new window with a Search. Type ‘type 2 diabetes’ and press Enter or click the magnifying glass icon to search. In the ‘Vocabulary’ panel, we can limit the results to SNOMED by clicking on it. Now all the remaining entries are in blue and are thus standard codes. We can select ‘Type 2 diabetes mellitus’ by ticking the checkbox on the left-hand side of the name and, at the bottom of the page, ticking the box for ‘Descendants’ and clicking ‘Add To New Concept Set’. Now we can go back to ‘Concept Sets’, save the changes and close the window.

  2. We create a concept set for the medication in the same way. In ‘Concept Sets’, click on ‘New concept’, give it a name, save it and click ‘Add concepts’. This will open a new window with a Search. Type ‘A10B’ and press Enter or click the magnifying glass icon to search. In the ‘Vocabulary’ panel, we can limit the results to ATC codes by clicking on it. Now all the remaining entries are in purple. We can select the appropriate term by ticking the checkbox on the left-hand side of its name and, at the bottom of the page, ticking the box for ‘Descendants’ and clicking ‘Add To New Concept Set’. Now we can go back to ‘Concept Sets’, save the changes and close the window.

  3. Next we create a new cohort by going to ‘Cohort Definitions’, clicking the ‘New Cohort’ button and giving a name to our cohort.

  4. In ‘Cohort Entry Events’, click ‘Add Initial Event’ and select ‘Add Drug Exposure’. Since we used ATC codes for our concept set definition, we can treat them like standard codes and import the concept set directly. We can add any other specific criteria, e.g. first time of use: click ‘Add attribute’ and select ‘Add First Exposure Criteria’.

  5. Next, we need to include only those with a prior diagnosis of type 2 diabetes. In the ‘Inclusion Criteria’, click ‘New inclusion criteria’, give it a name and click ‘Add criteria to group’. From the dropdown menu select ‘Add Condition Occurrence’. Since our concept set for type 2 diabetes was made using SNOMED (standard) codes, we can import it directly. Since we require a prior diagnosis, we need to modify the boxes below to read ‘where the event starts all days before and 0 days before index start date’. This ensures that the diagnosis was given on, or at any time before, the drug use start date, which is the index start date.

  6. Now we are ready to generate our cohort in the ‘Generation’ tab. In this example, we can see that there are 22,364 cases with this definition in FinnGen R12 data.
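The combination of a first-exposure index date and a prior-diagnosis window can be sketched in Python as follows. This is a simplified illustration under assumed input shapes (lists of person and date pairs) with made-up records. It ignores the cohort exit rule and is not Atlas code.

```python
from datetime import date


def new_users_with_prior_diagnosis(drug_purchases, diagnoses):
    """Return {person_id: index_date} for new users with a prior diagnosis.

    drug_purchases: iterable of (person_id, purchase_date).
    diagnoses: iterable of (person_id, diagnosis_date).
    """
    # Index date = first exposure ('Add First Exposure Criteria').
    index_dates = {}
    for person_id, purchase_date in drug_purchases:
        if person_id not in index_dates or purchase_date < index_dates[person_id]:
            index_dates[person_id] = purchase_date

    # Window 'all days before' to '0 days before': diagnosis on or before the index date.
    qualified = set()
    for person_id, diagnosis_date in diagnoses:
        index_date = index_dates.get(person_id)
        if index_date is not None and diagnosis_date <= index_date:
            qualified.add(person_id)

    return {p: d for p, d in index_dates.items() if p in qualified}


# Made-up records: person 1 is diagnosed before the first purchase, person 2 after it.
purchases = [(1, date(2015, 3, 1)), (1, date(2014, 6, 1)), (2, date(2016, 1, 10))]
diagnoses = [(1, date(2014, 5, 20)), (2, date(2016, 2, 1))]
cohort = new_users_with_prior_diagnosis(purchases, diagnoses)
```

Person 1 enters the cohort on the date of the earliest purchase, while person 2 is dropped because the diagnosis comes after the first purchase.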

9. How to build a cohort filtering for the number of medications an individual has received

In example 1 (How to build a cohort based on a diagnosis using local (non-standard) codes), we limited the cohort to individuals who have at least three occurrences of the diagnosis. Medications can be filtered by the number of purchases in the same way: follow that example for building the cohort, but use concept sets for medications instead of diagnoses. The filter is set in the ‘Cohort Definitions’ ‘Inclusion Criteria’ section, at the top of each criterion box, by selecting the appropriate option from ‘at most’, ‘exactly’ and ‘at least’, together with the number of occurrences.
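The three occurrence options behave like simple comparisons on a per-person count. A minimal Python sketch, with hypothetical function and variable names and made-up purchase records:

```python
import operator
from collections import Counter

# The three options offered at the top of each criterion box.
COMPARATORS = {
    "at most": operator.le,
    "exactly": operator.eq,
    "at least": operator.ge,
}


def filter_by_occurrences(person_ids_per_event, persons, option, number):
    """Keep persons whose number of matching events satisfies the chosen option.

    person_ids_per_event: iterable with one person id per purchase or diagnosis.
    persons: all persons under consideration (needed so that 0 occurrences can match).
    """
    counts = Counter(person_ids_per_event)
    compare = COMPARATORS[option]
    return {p for p in persons if compare(counts[p], number)}


# Made-up purchase records: person 1 has 3 purchases, person 2 has 1, person 3 none.
purchases = [1, 1, 1, 2]
persons = {1, 2, 3}
frequent_buyers = filter_by_occurrences(purchases, persons, "at least", 3)
```

Note that ‘exactly’ 0 and ‘at most’ N also match persons with no purchases at all, which is what makes these options useful for defining controls.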

10. How to build a cohort based on Drug Era

e.g. Parkinson’s Disease patients who have used levodopa medication continuously for at least five years after the diagnosis

In Atlas, you can build cohorts for the following ‘Eras’: ‘Condition’, ‘Dose’ and ‘Drug’. In this example we focus on ‘Drug Eras’, which refer to continuous periods of drug use based on the active ingredient of a drug. For this, we need to use the international RxNorm codes of drugs. The eras are calculated from the start date of the first purchase to the end date of the last purchase in a defined period. Note that in FinnGen data the end date is currently defined as start date + 1 day. Therefore, a single drug purchase is counted as having a length of 2 days. Also, in FinnGen data, gaps larger than 120 days between drug purchases result in distinct eras.
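To make the era rules concrete, here is a minimal Python sketch that groups the purchase dates of one person and one ingredient into eras. It uses the two rules stated above (end date = start date + 1 day, a new era after a gap larger than 120 days). Measuring the gap from the end of the previous purchase to the start of the next is an assumption, as are the function name and the example dates. The actual era computation in the FinnGen data may differ in detail.

```python
from datetime import date, timedelta


def build_drug_eras(purchase_dates, max_gap_days=120):
    """Group purchase dates for one person and ingredient into eras.

    Returns a list of (era_start, era_end) tuples.
    """
    eras = []
    for start in sorted(purchase_dates):
        end = start + timedelta(days=1)  # end date = start date + 1 day
        # Assumption: the gap runs from the previous end date to this start date.
        if eras and (start - eras[-1][1]).days <= max_gap_days:
            # Gap is within the limit: extend the current era.
            eras[-1] = (eras[-1][0], max(eras[-1][1], end))
        else:
            # First purchase, or gap larger than the limit: start a new era.
            eras.append((start, end))
    return eras


# Made-up purchases: two within 120 days of each other, one much later.
purchases = [date(2020, 1, 1), date(2020, 3, 1), date(2020, 12, 1)]
eras = build_drug_eras(purchases)
```

With these dates the first two purchases form one era and the December purchase starts a second one, so a long continuous era requires purchases that are never more than 120 days apart.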

In this example we will use Parkinson’s Disease based on international SNOMED coding and levodopa medication based on international RxNorm coding.

Quick guide:

  1. ‘Concept Sets’: Create as usual, one for the condition and one for the drugs.

  2. ‘Cohort Definitions’: 1) In the ‘Cohort Entry Events’, click ‘Add Initial Event’, select ‘Add Condition Occurrence’ and import the concept set for the condition. 2) In the ‘Inclusion Criteria’, create a new inclusion criteria. Click ‘Add criteria to group’ and select ‘Add Drug Era’. Import the concept set for the drug. To have the drug use after the diagnosis and to be continuous for at least five years, click the ‘Add attribute’ and select ‘Add Era Length Criteria’. Select the Era length to be Greater than Equal to 1825 days, i.e. five years. Modify the section ‘where event starts between 0 days Before and All days After index start date’ to consider only drug purchases after the diagnosis. Generate the cohort.

Detailed instructions:

  1. We start by creating two ‘Concept sets’, one for the disease and one for the drug. Go to ‘Concept Sets’ and click ‘New Concept Set’. Give a name to your concept sets and click ‘Add Concepts’ at the bottom of the page. A Search window will appear. Let’s write ‘Parkinson’s disease’. A list of standard (blue), and non-standard (red) codes will appear. We can limit our search to SNOMED codes by selecting it from the left hand panel ‘Vocabulary’. Now only standard codes will remain. We can click the SNOMED code ‘Parkinson’s Disease’ and inspect its hierarchy from the ‘Hierarchy’ tab. We can see all its Parents and Children. We can add it with its Children (‘Descendants’ tick box clicked) to the concept set. Now when we go back to the ‘Concept Sets’, we see that it includes 10 Included Concepts. We can save the changes and close the concept set by clicking on the appropriate icons next to its name.

  2. Next, we create a concept set for the drug. Similarly, we create a new concept set and in the search window, write ‘levodopa’. We can again limit our search to RxNorm codes only. By clicking the ‘levodopa’ and inspecting it, we can conclude that it covers the information we are looking for. We can add this concept with its Children (‘Descendants’) to the concept set. Now when we go back to the ‘Concept Sets’, we see that it includes 4160 Included Concepts. We can save the changes and close the concept set by clicking on the appropriate icons next to its name.

  3. Next we go to ‘Cohort Definitions’. Click ‘New Cohort’, give it a name and description. In the ‘Cohort Entry Events’, click ‘Add Initial Event’. Select ‘Add Condition Occurrence’. Click the arrow next to the ‘Any Condition’ and import the concept set of Parkinson’s Disease you created in the first step. Since this was based on standard codes, we don’t need to click the ‘Add attribute’ but we can import the concept set directly.

  4. Next, go to ‘Inclusion Criteria’. Click the ‘New inclusion criteria’ and give it a name. Click ‘Add criteria to group’ and select ‘Add Drug Era’. Import the concept set for levodopa. Now we want to have the drug use after the diagnosis and to be continuous for at least five years. For this, we’ll click the ‘Add attribute’ and select ‘Add Era Length Criteria’. Next we will select the Era length to be Greater than Equal to 1825 days, i.e. five years. We also need to modify the section ‘where event starts between 0 days Before and All days After index start date’ to consider only drug purchases after the diagnosis.

  5. Now we are ready to generate our cohort. Save the changes and go to the ‘Generation’ tab. Select the data release in which you’d like to generate the cohort and click ‘Generate’. We can see that in R12 there are 758 individuals in our cohort. When we click ‘View Report’, we see that in total there are 6,065 individuals with Parkinson’s Disease, but only 12.5% of them, i.e. 758, have used levodopa medication continuously for at least five years after the diagnosis.
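The inclusion rule used in this example (an era that starts on or after the diagnosis and lasts at least 1825 days) can be sketched in Python as below. The inclusive day count is an assumption chosen to match the statement that a single purchase has a length of 2 days; Atlas may count era length differently. Function names and dates are illustrative.

```python
from datetime import date


def era_length_days(era_start, era_end):
    """Era length counted inclusively, so start + 1 day gives a length of 2."""
    return (era_end - era_start).days + 1


def has_long_era_after_index(index_date, eras, min_length_days=1825):
    """True if any era starts on or after the index date and is long enough."""
    return any(
        era_start >= index_date
        and era_length_days(era_start, era_end) >= min_length_days
        for era_start, era_end in eras
    )


# Made-up data for one person: diagnosis date (index) and two drug eras.
diagnosis_date = date(2010, 5, 1)
eras = [
    (date(2009, 1, 1), date(2009, 6, 1)),  # starts before the diagnosis
    (date(2011, 1, 1), date(2016, 6, 1)),  # starts after, lasts over five years
]
qualifies = has_long_era_after_index(diagnosis_date, eras)
```

The threshold of 1825 days is simply 5 × 365, so leap days are ignored in the five-year definition.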

11. How to build a cohort using KELA reimbursement codes

Quick guide:

  1. ‘Concept Sets’: The best way to search for the KELA reimbursement codes is by string search, e.g. here ‘psych’. To limit the search to KELA reimbursement codes, select REIMB from the ‘Vocabulary’ panel. Select the appropriate code and add it to the concept set. Note that the KELA reimbursement codes do not have ‘Descendants’ so no need to select that tick box.

  2. ‘Cohort Definitions’: In the ‘Cohort Entry Events’, click ‘Add Initial Event’. Reimbursements are in the domain ‘Condition’ so select ‘Add Condition Occurrence’. Since reimbursement codes are based on non-standard codes, click the ‘Add attribute’ and select ‘Add Condition Source Concept’. Import the concept set, save changes and generate the cohort as usual.

Detailed instructions:

1. Go to ‘Concept Sets’ and click ‘New Concept Set’. Give a name to your concept set and click ‘Add Concepts’ at the bottom of the page. A Search window will appear. Type ‘psych’. It is better to search with part of the name rather than the code itself, here 112, because searching with the number will get Atlas stuck. After you press Enter, a list of codes will appear. We can limit our search to REIMB codes by selecting it from the left-hand ‘Vocabulary’ panel. Now only such codes will remain. We can select the appropriate code and add it to the concept set. Note that the KELA reimbursement codes do not have ‘Descendants’, so there is no need to tick that box. Now when we go back to ‘Concept Sets’, we see that it has 1 Included Concept. We can save the changes and close the concept set by clicking on the appropriate icons next to its name.

2. We can start building the cohort in the ‘Cohort Definitions’. Click ‘New Cohort’, give it a name and description. In the ‘Cohort Entry Events’, click ‘Add Initial Event’. Note that reimbursements are in the domain ‘Condition’ so you’ll need to select ‘Add Condition Occurrence’. Since reimbursement codes are based on non-standard codes, we need to click the ‘Add attribute’ and select ‘Add Condition Source Concept’. By clicking the arrow next to the ‘Condition Source Concept is Any Condition’, we can import the concept set we created in step 1.

3. In this example we don’t need to define any additional ‘Inclusion criteria’. We can save the changes and go to the ‘Generation’ tab. Select the data release in which you’d like to generate the cohort and click ‘Generate’. We can see that in R12 there are 19,243 individuals in our cohort which is the same number as for our concept set since we didn’t apply any other criteria.

12. How to build a cohort using birth/delivery as a variable

e.g. a cohort of women who develop autoimmune disease within a year of giving birth

This example requires two concept sets: one for autoimmune disease and one for giving birth. The restriction to females is added in ‘Cohort Definitions’, although the concept of delivery should already filter for females.

Quick guide:

  1. ‘Concept Sets’: Create as usual, one for the disease and one for giving birth (delivery).

  2. ‘Cohort Definitions’: 1) Create a new cohort. In the ‘Cohort Entry Events’, click ‘Add Initial Event’. Delivery is in the domain ‘Procedure’ so select ‘Add Procedure Occurrence’ and import the concept set of delivery. Delivery should be recorded for females only but to make sure that our cohort covers only females, click ‘Add attribute’, select ‘Add Gender Criteria’ and click ‘Add’. A new window will open where you can write ‘female’. Select the correct one from the list by clicking the checkmark on the left, and click ‘Add And Close’. 2) In the ‘Inclusion Criteria’, create new inclusion criteria, click ‘Add criteria to group’ and select ‘Add Condition Occurrence’. Import the concept set for autoimmune disease. To restrict the diagnosis to within one year of giving birth, modify the section ‘where event starts between 0 days before and 365 days after index start date’. Generate the cohort.

Detailed instructions:

1. Go to ‘Concept Sets’ and click ‘New Concept Set’. Give a name to your concept sets and click ‘Add Concepts’ at the bottom of the page. A Search window will appear. Let’s write ‘autoimmune’. A list of standard (blue), and non-standard (red) codes will appear. We can limit our search to SNOMED codes by selecting it from the left hand panel ‘Vocabulary’. Now only standard codes will remain. We can click the SNOMED code ‘autoimmune disease’ and inspect its hierarchy from the ‘Hierarchy’ tab. We can see all its Parents and Children. This code seems reasonable for us so we will add it with its Children (‘Descendants’ tick box clicked) to the concept set. Now when we go back to the ‘Concept Sets’, we see that it includes 593 Included Concepts. We can save the changes and close the concept set by clicking on the appropriate icons next to its name.

2. Next, we create a concept set for giving birth. Similarly, we create a new concept set and in the search window, write ‘delivery’. We can again limit our search to SNOMED codes only. By clicking the ‘Delivery procedure’ and inspecting it, we can conclude that it covers the information we are looking for, including vaginal delivery of fetus and cesarean section. We can add this concept with its Children (‘Descendants’) to the concept set.

3. We can start building the cohort in the ‘Cohort Definitions’. Click ‘New Cohort’, give it a name and description. In the ‘Cohort Entry Events’, click ‘Add Initial Event’. Note that delivery is in the domain ‘Procedure’ so you’ll need to select ‘Add Procedure Occurrence’. Click the arrow next to the ‘Any Procedure’ and import the concept set of delivery. Since this was based on standard codes, we don’t need to click the ‘Add attribute’ but we can import the concept set directly. Delivery should be recorded for females only but to make sure that our cohort covers only females, click ‘Add attribute’, select ‘Add Gender Criteria’ and click ‘Add’. A new window will open where you can write ‘female’. Select the correct one from the list by clicking the checkmark on the left, and click ‘Add And Close’.

4. Next, go to ‘Inclusion Criteria’. Click the ‘New inclusion criteria’ and give it a name. Click ‘Add criteria to group’ and select ‘Add Condition Occurrence’. Import the concept set for autoimmune disease. Now we want to restrict the diagnosis to within one year of giving birth. For this, we’ll need to modify the section ‘where event starts between 0 days before and 365 days after index start date’.

5. Now, we are ready to generate our cohort. Save the changes and go to the ‘Generation’ tab. Select the data release in which you’d like to generate the cohort and click ‘Generate’. We can see that in R12 there are 661 women in our cohort. When we click the ‘View Report’, we see that in total there are 121,078 events (deliveries) but only 0.55%, i.e. 661 women develop an autoimmune disease within one year of giving birth.
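The time window ‘between 0 days before and 365 days after index start date’ can be illustrated with a short Python sketch. It treats every delivery as a candidate index event, which is an assumption about the event limit settings. The function name, input shapes and records are made up.

```python
from datetime import date, timedelta


def events_with_diagnosis_in_window(deliveries, diagnoses, days_after=365):
    """Return deliveries followed by a diagnosis within the window.

    deliveries: iterable of (person_id, delivery_date); each delivery is an index event.
    diagnoses: iterable of (person_id, diagnosis_date).
    """
    dates_by_person = {}
    for person_id, diagnosis_date in diagnoses:
        dates_by_person.setdefault(person_id, []).append(diagnosis_date)

    window = timedelta(days=days_after)
    return [
        (person_id, delivery_date)
        for person_id, delivery_date in deliveries
        # 0 days before to `days_after` days after the index start date, inclusive.
        if any(
            delivery_date <= d <= delivery_date + window
            for d in dates_by_person.get(person_id, [])
        )
    ]


# Made-up records: person 1 is diagnosed within a year of her second delivery,
# person 2 is diagnosed more than a year after her delivery.
deliveries = [(1, date(2015, 4, 1)), (1, date(2018, 9, 15)), (2, date(2016, 7, 7))]
diagnoses = [(1, date(2019, 2, 1)), (2, date(2018, 1, 1))]
matched = events_with_diagnosis_in_window(deliveries, diagnoses)
women = {person_id for person_id, _ in matched}
```

The sketch also shows why the report distinguishes events from persons: three deliveries are evaluated, but only one delivery, and therefore one woman, satisfies the window.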

13. How to build a cohort with multiple events per person

e.g. individuals with repeated fractures

In this example we create a cohort of persons who have at least two fractures. We treat fractures that happen 120 days (4 months) apart as separate events.

Quick guide:

  1. ‘Concept Sets’: Create as usual.

  2. ‘Cohort Definitions’: Create a new cohort. 1) In the ‘Cohort Entry Events’, click ‘Add Initial Event’, select ‘Add Condition Occurrence’ and import the concept set. Now, the important part is to change ‘Limit initial events to: earliest event per person’ to ‘Limit initial events to: all events per person’. 2) In the ‘Inclusion Criteria’, make the same change as above, even if there are no other criteria. In this example we add one criterion of having at least 2 fractures. Press ‘New inclusion criteria’, give it a name, click ‘Add criteria to group’ and select ‘Add Condition Occurrence’. Import the concept set and, at the top of the box, change the number of occurrences to ‘at least 2’. Finally, change ‘Limit qualifying events to earliest event per person’ to ‘Limit qualifying events to all events per person’. 3) In the ‘Cohort Exit’, change the Event Persistence to ‘fixed duration relative to initial event’. Also change the Number of days offset from 0 to an appropriate period, e.g. to 120 days, meaning that fractures happening 120 days (4 months) apart are considered separate events. 4) Generate the cohort and inspect the numbers of people and records. There should now be more records than people.

Detailed instructions:

  1. We start by creating a ‘Concept Set’ for fractures. Go to ‘Search’ and type ‘fracture of bone’. We can limit our search to SNOMED terms in the ‘Vocabulary’ panel and to Clinical Finding in the ‘Class’ panel. It is often useful to sort the results by record count (RC). After sorting, we see that ‘Fracture of bone’ has the highest record count.

  2. By clicking the name we can inspect it further. In the ‘Hierarchy’ tab we see that this concept has one parent and 28 child concepts. We are happy with our selection, so we tick the checkbox on the left of the name, tick ‘Descendants’ and finally click ‘Add To New Concept Set’.

  3. Next we go to ‘Concept Sets’ and give a name to our new concept set. We see that 2920 concepts are included. We can save the changes and close the window by clicking the appropriate buttons next to the name of the concept set.

  4. We can start building the cohort in the ‘Cohort Definitions’. Click ‘New Cohort’ and give it a name and description. In the ‘Cohort Entry Events’, click ‘Add Initial Event’ and select ‘Add Condition Occurrence’. Since the concept set was based on standard codes, we don’t need to click ‘Add attribute’; we can import the concept set directly. Now, change ‘Limit initial events to: earliest event per person’ to ‘Limit initial events to: all events per person’.

  5. In the ‘Inclusion Criteria’, make the same change as above even if there are no other criteria. Here, however, we want to add a criterion for a minimum of two fractures per person. To do this, we press ‘New inclusion criteria’, give it a name, click ‘Add criteria to group’ and select ‘Add Condition Occurrence’. We import our concept set and, at the top of the box, change the occurrence count to ‘at least 2’. Finally, we change ‘Limit qualifying events to earliest event per person’ to ‘Limit qualifying events to all events per person’.

  6. In the ‘Cohort Exit’, change the Event Persistence to ‘fixed duration relative to initial event’. Also change the Number of days offset from 0 to an appropriate period. In this example we change it to 120 days, meaning that fractures happening 120 days (4 months) apart are considered separate events.

  7. Save the changes and generate the cohort in the ‘Generation’ tab in your selected release of data. By selecting R12, we note that there are 112,762 individuals in our cohort with 218,578 records. The record count is greater than the number of individuals, reflecting multiple fractures per person. Note that since we set the number of days offset to 120 days, for some individuals multiple fractures within this time frame are counted as one episode. Hence the number of records is less than twice the number of individuals (2 × 112,762 = 225,524), even though every person has at least two fractures. By varying the days offset, you will observe that the number of records changes accordingly.
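To see why the number of records can fall below twice the number of people, the effect of the days offset can be sketched in Python. This is a simplified illustration under stated assumptions, not the Atlas implementation: the function `count_episodes`, the toy events and the integer day numbers are hypothetical, and the sketch assumes that each fracture record opens a period of `persistence_days` and that records falling inside an open period are merged into the same episode.

```python
from collections import defaultdict

def count_episodes(events, persistence_days=120, min_records=2):
    """Count episodes per person from (person_id, day) pairs.

    Persons with fewer than min_records records are dropped, mimicking the
    'at least 2 occurrences' inclusion criterion. Records that fall inside
    the persistence period of an open episode are merged into it.
    """
    by_person = defaultdict(list)
    for person_id, day in events:
        by_person[person_id].append(day)

    episodes = {}
    for person_id, days in by_person.items():
        if len(days) < min_records:
            continue  # fails the inclusion criterion
        days.sort()
        count = 0
        current_end = None
        for day in days:
            if current_end is None or day > current_end:
                count += 1  # a new, separate episode starts
                current_end = day + persistence_days
            else:
                # still inside the open episode: extend it if needed
                current_end = max(current_end, day + persistence_days)
        episodes[person_id] = count
    return episodes

# Hypothetical toy data: day numbers counted from an arbitrary origin.
events = [("A", 0), ("A", 50), ("B", 0), ("B", 400), ("C", 10)]

print(count_episodes(events, persistence_days=120))  # {'A': 1, 'B': 2}
print(count_episodes(events, persistence_days=30))   # {'A': 2, 'B': 2}
```

With a 120-day offset, person A’s two fractures collapse into one episode, so two people yield three records. With a 30-day offset the same data yield four records, which matches the observation that the record count changes with the offset.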

14. How to build a cohort by filtering by main/side diagnosis

Sometimes we want to consider only main or side diagnoses. In this example we reuse the cohort from the previous example (How to build a cohort with multiple events per person, e.g. individuals with repeated fractures) and keep only main diagnoses.

  1. In ‘Cohort Definitions’, open and copy the previously generated cohort and give it a new name. In the ‘Cohort Entry Events’, click ‘Add attribute’ and select ‘Add Condition Status’. Press the ‘Add’ button and a new window will open. Type ‘primary diagnosis’ in the search bar and click the ‘Search’ button. This will give you ‘Primary diagnosis’. Select it by ticking the checkbox on the left of the name and press ‘Add And Close’. Note that if you wanted to select a side diagnosis, you would type ‘Secondary diagnosis’ in the search bar.

  2. Save the changes and generate the cohort in the ‘Generation’ tab in your selected release of data. Now we see that there are 107,141 individuals with 190,738 records.
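The effect of the condition status attribute can be sketched as a simple filter. The Python snippet below is illustrative only: the record layout, the field names and the helper `filter_by_status` are hypothetical and do not reflect how Atlas stores the data.

```python
# Hypothetical toy records; 'status' stands in for the condition status concept.
records = [
    {"person_id": "A", "day": 0, "status": "Primary diagnosis"},
    {"person_id": "A", "day": 200, "status": "Secondary diagnosis"},
    {"person_id": "B", "day": 10, "status": "Primary diagnosis"},
    {"person_id": "B", "day": 300, "status": "Primary diagnosis"},
    {"person_id": "C", "day": 5, "status": "Secondary diagnosis"},
]

def filter_by_status(records, status="Primary diagnosis"):
    """Keep only records whose condition status equals the requested one."""
    return [r for r in records if r["status"] == status]

main_only = filter_by_status(records)
people = {r["person_id"] for r in main_only}
print(len(people), len(main_only))  # 2 3
```

Both the number of people and the number of records can only stay the same or shrink after such a filter, which is consistent with the lower counts seen after adding the attribute.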

15. How to build a cohort using Kanta lab values

Kanta lab values can be found in Atlas as concepts: in their harmonized form under the ‘Vocabulary’ ‘LOINC’ and in their original form under ‘LABfi_ALL’. In this example we will build a cohort of individuals who have had a high fasting triglyceride value (>2.0 mmol/l) at least once.

Quick guide:

  1. ‘Cohort Definitions’: Click ‘New Cohort’ and give it a name and description.

     1) In the ‘Cohort Entry Events’, click ‘Add Initial Event’ and select ‘Add Measurement’.

     2) In the ‘Inclusion Criteria’, add a new criterion, click ‘Add criteria to group’, select ‘Add Measurement’ and import the concept set for the triglycerides. Click ‘Add attribute’ and select ‘Add Value as Number Criteria’. Edit the number to be ‘Greater than 2’.

     3) Save and generate the cohort as usual.

Detailed instructions:

  1. We can start building the cohort in the ‘Cohort Definitions’. Click ‘New Cohort’ and give it a name and description. In the ‘Cohort Entry Events’, click ‘Add Initial Event’ and select ‘Add Measurement’. Since the concept set was based on standard codes, we don’t need to click ‘Add attribute’; we can import the concept set directly.

  2. We will add an ‘Inclusion Criteria’ for triglyceride values higher than 2 mmol/l. Click ‘New inclusion criteria’ and give it a name. Next, click ‘Add criteria to group’, select ‘Add Measurement’ and import the concept set for the triglycerides. Now click ‘Add attribute’ and select ‘Add Value as Number Criteria’. A new criterion appears, and we can edit the number to be ‘Greater than 2’.

  3. Save the changes and generate the cohort in the ‘Generation’ tab in your selected release of data. With R12, we see that there are 85,737 individuals in our cohort.
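The ‘Greater than 2’ criterion is a strict threshold on the numeric value of the measurement. A minimal Python sketch of the same rule follows. The toy measurements and the helper `persons_above` are hypothetical, and values are assumed to be in mmol/l.

```python
# Hypothetical toy data: (person_id, fasting triglyceride value in mmol/l).
measurements = [
    ("A", 1.4), ("A", 2.3),  # one value above the threshold
    ("B", 2.0),              # exactly 2.0 is not 'greater than 2'
    ("C", 3.1), ("C", 2.6),  # several values above the threshold
]

def persons_above(measurements, threshold=2.0):
    """Return persons with at least one value strictly above the threshold."""
    return sorted({pid for pid, value in measurements if value > threshold})

print(persons_above(measurements))  # ['A', 'C']
```

A person is counted once however many high values they have, which corresponds to ‘at least once’ in the cohort description.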

16. How to export a cohort built in Atlas into R

Quick guide:

  1. In Sandbox, go to Applications > Sandbox > CohortOperations2.

  2. On the left side panel, click ‘Import Cohorts’. From the tabs, select ‘Atlas’. Type the name of the cohort in the search bar, select the correct one by ticking the checkbox on the left next to the cohort name, and click ‘Import Selected’.

  3. On the left side panel, click ‘Export’. Under ‘Select cohort:’, choose the one you want to export and click ‘Export’.

  4. A new window will open. Modify the name and the location accordingly and click the ‘Save’ button.

Detailed instructions:

  1. In Sandbox, go to Applications > Sandbox > CohortOperations2.

  2. On the left side panel, click ‘Import Cohorts’. You can now choose from the tabs where the cohort is imported from. Select ‘Atlas’. A list of cohorts created in Atlas will appear. We use the search bar to type the name of the cohort created in Atlas. In this example we look for the cohort repeated_fractures[MK], so we write ‘repeated fractures’ in the search bar. Cohorts that include this in their name will appear. Select the one you are interested in by ticking the checkbox on the left next to the cohort name and click ‘Import Selected’.

  3. Once the cohort has been imported into Cohort Operations, we can select ‘Export’ from the left side panel. Under ‘Select cohort:’, we can click the arrow to show a dropdown menu of imported cohorts. In this example we have imported only one cohort, so only that one is shown, and we select it. Next we click ‘Export’ at the bottom of the page.

  4. A new window will open. There we can modify the name of the cohort as well as the location where we want to save it. Here we keep the name as it is and change the location to ‘Downloads’. Once ready, we click the ‘Save’ button. The cohort is now saved as cohortname.tsv in the ‘Downloads’ folder and can be read into R.
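Before analysing the exported file, it can be useful to check its header and compare the number of records with the number of distinct persons. The following Python sketch shows one way to do this for a tab-separated file. The column names (`person_id`, `cohort_start_date`, `cohort_end_date`), the sample rows and the helper `summarise_cohort` are assumptions for illustration only; the actual header of the exported file should be inspected first. In R, a tab-separated file can be read with base R’s `read.delim`.

```python
import csv
import io

def summarise_cohort(handle, person_column="person_id"):
    """Summarise a tab-separated cohort file: columns, records, distinct persons.

    person_column is an assumed column name; adjust it to the real header.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    rows = list(reader)
    persons = {row[person_column] for row in rows}
    return {
        "columns": reader.fieldnames,
        "records": len(rows),
        "persons": len(persons),
    }

# Hypothetical stand-in for the exported file; the ids and dates are made up.
sample = io.StringIO(
    "person_id\tcohort_start_date\tcohort_end_date\n"
    "P1\t2001-02-03\t2001-06-03\n"
    "P1\t2005-08-10\t2005-12-08\n"
    "P2\t2010-01-15\t2010-05-15\n"
)

summary = summarise_cohort(sample)
print(summary["records"], summary["persons"])  # 3 2
```

For a real export, the file would be opened with `open(path, newline="", encoding="utf-8")` and the handle passed to `summarise_cohort`, where `path` points to the saved .tsv file.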

This is probably the easiest way of making a cohort in Atlas. The OHDSI PhenotypeLibrary includes cohorts created by the OHDSI community that anyone can make use of.

Similarly to filtering by registers, we can filter by visits to specialty clinics. A list of the clinics is given here. For privacy reasons, only clinics with visits from more than 50 persons have been brought to Atlas. In this example we will continue with our cohort of type 2 diabetes patients but will include only patients who have visited an endocrinology clinic.

In the ‘Cohort Exit’, we define that ‘Event will persist until end of a continuous drug exposure’, since we are interested in the cohort of people who use blood glucose lowering drugs. We need to import our concept set for drugs to define our drugs of interest. We can allow for a persistence window, e.g. a maximum gap of 30 days between prescriptions. Individuals with larger gaps will then not be considered continuous drug users and will not be included in our cohort.

The KELA reimbursement codes can be found in Atlas using the ‘Vocabulary’ REIMB. More detailed information on the mapping is provided here. In this example we will use the KELA reimbursement code 112 for ‘Severe psychotic and other severe mental disorders’ to define our cohort.

‘Concept Sets’: Go to ‘Search’ and type your search term, here ‘trigly’. Select the harmonized value, which can be identified by the ‘Vocabulary’ ‘LOINC’. You can check from Risteys that the id of your selected measurement matches the OMOP id of the correct measurement. Select the correct term and create the concept set as usual.

We start by creating a ‘Concept Set’ for triglycerides. Go to ‘Search’ and type ‘trigly’. After sorting by record count (RC), we see that the harmonized value with ‘Vocabulary’ ‘LOINC’ has the most records. We go to Risteys to check that this is the one we are after. In Risteys we also type ‘trigly’ and see that the OMOP id for fasting triglycerides matches the one in Atlas. We can tick the checkbox next to the id number and click ‘Add To Concept Set’. Lab values have no ‘Descendants’, so there is no need to select that option. We can go to ‘Concept Sets’ from the left side panel, give the concept set a name and save it.

Sometimes it is useful to work further with the cohort in other tools, such as R. There is no quick way to do the export in Atlas, but we can use other tools in Sandbox to do this. The Cohort Operations tool can read in all the cohorts made in Atlas and can also be used to export cohorts.