Using Atlas in Sandbox

The figure below shows the full workflow of cohort building and analyses within Sandbox. In this tutorial we will focus on the part containing Atlas.

Atlas offers many functions, from cohort definitions to incidence rate calculations. For cohort building, however, ‘Concept Sets’ and ‘Cohort Definitions’ are usually sufficient, together with ‘Characterizations’ to check that the built cohort is as intended.

If you use a ready-made phenotype definition from the OHDSI PhenotypeLibrary, you can proceed directly to ‘Cohort Definitions’.

To use Atlas in the Sandbox, start by opening your IVM. The IVM may be of any size, including the smallest one, because Atlas does all the data fetching using BigQuery.

In Sandbox, select Applications > Sandbox > Atlas.

Steps for building a cohort from FinnGen data in Atlas

1. Create ‘Concept Sets’

Concept sets can consist of, for instance, medical codes (ICD, SNOMED) or drug purchase codes (ATC, VNRfi, RxNorm).

  • Search for the concept (medical code, drug purchase etc.) using the ‘Search’ function: either as strings, as ICD codes, as SNOMED codes, etc.

  • Use the ‘Descendants’ tick box to include sub codes of a diagnosis/medication main code.

  • Use the ‘Exclude’ tick box to exclude specific sub codes from the concept set you are creating.

  • International standard codes, such as SNOMED codes, are shown in blue, and local non-standard codes, such as ICD codes, are shown in red.

  • For standard codes, explore the ‘Hierarchy’ to see the ‘Parents’ and ‘Children’ (i.e. ‘Descendants’) of the code and to help you decide which code to select. For non-standard codes this is not available.

  • Columns RC, DRC, PC, DPC refer to record count, descendant record count, person count, and descendant person count, respectively. A single concept with hierarchy, e.g. an ICD-10 code A10, will have descendant records and persons for the sub codes A10.0, A10.1 and A10.5, for instance. Record and person counts of all these subcodes are included in DRC and DPC. It is possible to have 0 records/persons for the main code, but descendant records/persons for the subcodes. It is good practice to sort by RC to see in which codes there are records in FinnGen data.
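
The relationship between the four count columns can be illustrated with a small sketch. The records below are made-up toy data, not FinnGen data, and the prefix-based hierarchy test is a simplifying assumption for illustration only. The sketch also assumes that DRC and DPC cover the code itself together with its sub codes; it is not how Atlas computes the counts internally.

```python
# Hypothetical toy records as (person_id, code) pairs. Not FinnGen data.
records = [
    ("p1", "A10.0"),
    ("p1", "A10.0"),
    ("p2", "A10.1"),
    ("p3", "A10.5"),
    ("p3", "A10.1"),
]


def is_descendant_or_self(code, ancestor):
    # Assumption: a sub code is recognised by its prefix, e.g. A10.1 under A10.
    return code == ancestor or code.startswith(ancestor + ".")


def counts(code, records):
    """Return RC, DRC, PC and DPC for one code in the toy records."""
    own = [(p, c) for p, c in records if c == code]
    with_descendants = [(p, c) for p, c in records if is_descendant_or_self(c, code)]
    return {
        "RC": len(own),                                # records on the code itself
        "DRC": len(with_descendants),                  # records on the code or its sub codes
        "PC": len({p for p, _ in own}),                # distinct persons on the code itself
        "DPC": len({p for p, _ in with_descendants}),  # distinct persons incl. sub codes
    }


print(counts("A10", records))
print(counts("A10.1", records))
```

In this toy data the main code A10 has no records of its own, yet its descendant counts are non-zero, which is the situation described above.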

2. Create 'Cohort Definitions'

Use the ‘Concept Sets’ in ‘Cohort Definitions’ to define the ‘Cohort Entry Events’, ‘Inclusion Criteria’ and ‘Cohort Exit’. You will need to build separate cohorts for cases and controls.

  • In the ‘Define’ tab give a name to your cohort and add definitions for the ‘Cohort Entry Events’, ‘Inclusion Criteria’ and ‘Cohort Exit’.

  • ‘Cohort Entry Events’ defines the starting point for the cohort

    • For a case cohort: can be anything from the Atlas dropdown menu ‘Add initial event’, e.g. first diagnosis (‘Add Condition Occurrence’), drug purchase (‘Add Drug Exposure’), etc. The entry to the cohort should be clearly defined, avoiding entries such as ‘Any Visit Occurrence’ without a specification.

    • For a control cohort: usually ‘Any Visit Occurrence’, meaning entry to any of the registers, because controls are a group of people without the conditions in question.

  • ‘Inclusion Criteria’ defines the inclusion to the cohort more specifically, e.g. by number of drug purchases, etc.

    • To create a cohort based on multiple concepts, e.g. conditions and drugs, use the dropdown menu in the ‘Inclusion Criteria’ box, above the criteria you have added, to choose whether inclusion requires all, any, at least, or at most of the criteria.

  • ‘Cohort Exit’ defines when a person exits a cohort

    • Usually the default given by Atlas is sufficient

    • Modify this if you want to create a cohort where a person can enter more than once, e.g. with multiple fractures.

  • Creating a control cohort:

    • Copy the case cohort

    • Adjust the ‘Cohort Entry Events’ to ‘Any Visit Occurrence’ if appropriate

    • In the ‘Inclusion Criteria’, adjust any condition/drug purchase to exactly 0 occurrences or delete completely

    • Add any new inclusion criteria, e.g. the controls may need to be free of some other conditions.

  • Creating a cohort by importing JSON code, e.g. from the OHDSI PhenotypeLibrary:

    • Use the ‘Export’ tab and select the ‘JSON’ button

    • Paste the JSON code from Sandbox Clipboard – if needed, in small chunks

    • Click the ‘Reload’ button at the bottom of the screen. The cohort definitions should have appeared in the ‘Define’ tab

  • Final step: go to the ‘Generate’ tab and choose the FinnGen data release in which you would like to generate the cohort.

View the report for the number of individuals included in the cohort. This is an essential step because without successful cohort generation the cohort cannot be found and applied to further analyses.

  • Using existing FinnGen endpoints: use the Cohort Operations tool in Sandbox, where endpoint cohorts can be imported directly from the ‘Endpoint’ tab
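
The logic of combining inclusion criteria with all, any, at least, or at most, and of defining controls by exactly 0 occurrences, can be sketched in a few lines. This is only an illustration of the logic described in the steps above. Atlas evaluates cohort definitions itself, and the function names, mode labels and person summaries below are hypothetical.

```python
def passes_inclusion(criteria_results, mode, n=None):
    """Combine per-criterion results (a list of booleans) for one person.

    mode is one of "all", "any", "at_least", "at_most" (labels chosen for
    this sketch); n is the threshold used by "at_least" and "at_most".
    """
    met = sum(criteria_results)
    if mode == "all":
        return met == len(criteria_results)
    if mode == "any":
        return met >= 1
    if mode == "at_least":
        return met >= n
    if mode == "at_most":
        return met <= n
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical per-person summaries. Not FinnGen data.
people = {
    "p1": {"diagnoses": 1, "drug_purchases": 3},
    "p2": {"diagnoses": 0, "drug_purchases": 0},
    "p3": {"diagnoses": 1, "drug_purchases": 0},
}


def is_case(person):
    # Case: at least one diagnosis AND at least two drug purchases.
    return passes_inclusion(
        [person["diagnoses"] >= 1, person["drug_purchases"] >= 2], "all"
    )


def is_control(person):
    # Control: exactly 0 occurrences of both the diagnosis and the drug purchase.
    return passes_inclusion(
        [person["diagnoses"] == 0, person["drug_purchases"] == 0], "all"
    )


cases = sorted(pid for pid, person in people.items() if is_case(person))
controls = sorted(pid for pid, person in people.items() if is_control(person))
print(cases, controls)
```

Note that p3 ends up in neither cohort here: the person has a diagnosis, so is not a control, but too few drug purchases to be a case. Switching the case rule from "all" to "any" would include p3 among the cases.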

3. Use 'Characterizations'

Inspect the cohorts by using the Atlas function ‘Characterizations’ and/or the separate Cohort Operations tool and/or other tools in Sandbox.

  • In the Atlas ‘Characterizations’, import the case and control cohorts using the ‘Design’ tab, then choose the features that you want to characterize in each cohort, for example age and gender

  • In the ‘Executions’ tab, generate the report in your preferred FinnGen data release and view the report directly there
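
As a rough picture of what a characterization of age and gender summarizes, the sketch below computes the cohort size, mean age and gender counts for two made-up cohorts. The data and field names are hypothetical, and the actual Atlas report contains its own set of statistics; this only illustrates the kind of comparison used to check that cases and controls look as intended.

```python
from collections import Counter
from statistics import mean

# Hypothetical cohort members. Not FinnGen data.
cohorts = {
    "cases": [
        {"age": 64, "gender": "female"},
        {"age": 58, "gender": "male"},
        {"age": 71, "gender": "female"},
    ],
    "controls": [
        {"age": 49, "gender": "male"},
        {"age": 55, "gender": "female"},
    ],
}


def characterize(members):
    """Summarize cohort size, mean age and gender distribution."""
    return {
        "n": len(members),
        "mean_age": round(mean(m["age"] for m in members), 1),
        "gender": dict(Counter(m["gender"] for m in members)),
    }


summary = {name: characterize(members) for name, members in cohorts.items()}
for name, stats in summary.items():
    print(name, stats)
```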

Summary of the Atlas terminology in terms of FinnGen data

| Atlas terminology | Used terms in FinnGen data |
| --- | --- |
| Standard concept: standard (international) | SNOMED, LOINC, RxNorm |
| Standard concept: non-standard (local) | ICD8fi, ICD9fi, ICD10fi, ICD10, ICPC, NCSPfi, VNRfi |
| Standard concept: classification | ATC |
| Concept Set | A set of codes based on a diagnosis, drug purchase, drug reimbursement, etc. Each set is based either on standard or non-standard codes but not on both. E.g. a concept set 1 on disease X based on ICD codes or a concept set 2 on disease X based on SNOMED codes. The concept sets will be used in the ‘Cohort Definitions’ to define ‘Cohort Entry Events’, ‘Inclusion Criteria’ and ‘Cohort Exit’. |
| Concept Set: Descendants | Descendants are the sub codes of ICD or ATC codes, e.g. A10.1. |
| Concept Set: RC, DRC, PC, DPC | Record count (RC) and person count (PC) refer to the counts for main codes, e.g. for ICD-10 code A10, whereas descendant record count (DRC) and descendant person count (DPC) refer to the counts for the sub codes, e.g. ICD-10 code A10.1. |
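
The vocabulary grouping in the summary above can be written down as a small lookup, together with a check for the rule that a concept set should not mix standard and non-standard codes. The function names are hypothetical, and the treatment of classification vocabularies such as ATC (ignored by the check) is an assumption of this sketch, since the summary does not specify it.

```python
# Vocabulary grouping taken from the terminology summary above.
VOCABULARY_CLASS = {
    "SNOMED": "standard",
    "LOINC": "standard",
    "RxNorm": "standard",
    "ICD8fi": "non-standard",
    "ICD9fi": "non-standard",
    "ICD10fi": "non-standard",
    "ICD10": "non-standard",
    "ICPC": "non-standard",
    "NCSPfi": "non-standard",
    "VNRfi": "non-standard",
    "ATC": "classification",
}


def concept_class(vocabulary):
    """Return the class of a vocabulary, or "unknown" if it is not listed."""
    return VOCABULARY_CLASS.get(vocabulary, "unknown")


def mixes_standard_and_non_standard(vocabularies):
    """True if a planned concept set would combine standard and non-standard codes.

    Assumption: classification and unknown vocabularies are ignored by this check.
    """
    classes = {concept_class(v) for v in vocabularies}
    return "standard" in classes and "non-standard" in classes


print(concept_class("SNOMED"))
print(mixes_standard_and_non_standard(["SNOMED", "ICD10fi"]))
print(mixes_standard_and_non_standard(["ICD10fi", "ICD9fi"]))
```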
