# From research question to concepts and cohort building

Before you start building cohorts in Atlas, it helps to familiarize yourself with its terminology, especially **‘Concept Sets’** and **‘Cohort Definitions’**.

## What is a ‘Concept Set’?

A concept set is a group of concepts (for example glucocorticoids) from a single data *‘domain’* (drugs) that you want to examine in your cohort. When building a concept set you determine which items (for example specific drugs) to include/exclude from your analysis.

## What is a ‘Cohort Definition’?

A cohort definition uses one or more **‘Concept Sets’**. In the simplest definition, the cohort consists of the participants who have at least one event belonging to a **‘Concept Set’**, e.g. everyone with a record of a drug that you defined in your **‘Concept Set’**.
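The simplest case can be sketched in a few lines of Python. This is only an illustration of the idea, not how Atlas works internally: the event list, the participant IDs, and the function name `simple_cohort` are all hypothetical.

```python
# Hypothetical drug events as (participant_id, concept) pairs; not real data.
drug_events = [
    ("P1", "prednisolone"),
    ("P2", "metformin"),
    ("P3", "dexamethasone"),
    ("P1", "metformin"),
]

# A hand-written concept set for the drug domain (illustrative names only).
glucocorticoid_concept_set = {"prednisolone", "dexamethasone", "hydrocortisone"}


def simple_cohort(events, concept_set):
    """Return participants with at least one event whose concept is in the set."""
    return {participant for participant, concept in events if concept in concept_set}


cohort = simple_cohort(drug_events, glucocorticoid_concept_set)
print(sorted(cohort))  # ['P1', 'P3']
```

Participant `P2` is left out because none of their events fall within the concept set.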

## How to find the best keyword for your ‘Concept Set’?

* Select the broadest term for your concept and exclude the *‘descendants’* you do not want, e.g. here the broadest term could be glucocorticoids and the excluded *‘descendant’* a specific type of glucocorticoid
* Note: the hierarchy in the Finnish nomenclature can differ from that in OMOP
* The hierarchy consists of *‘parents’* and *‘children’* (i.e. *‘descendants’*), e.g. the *‘parent’* of glucocorticosteroids is corticosteroids and the *‘children’* are the different types of glucocorticosteroids, such as dexamethasone
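The include-then-exclude logic described in this list can be sketched as follows. This is a minimal illustration under stated assumptions: the `CHILDREN` mapping is a hand-made toy hierarchy using plain names rather than real OMOP concept IDs, and `descendants` and `resolve_concept_set` are hypothetical helper names, not Atlas functions.

```python
# Toy hierarchy: parent -> direct children. Names are illustrative only,
# not real OMOP concept IDs.
CHILDREN = {
    "corticosteroids": ["glucocorticoids", "mineralocorticoids"],
    "glucocorticoids": ["dexamethasone", "prednisolone", "hydrocortisone"],
}


def descendants(concept, children=CHILDREN):
    """Return the concept itself plus everything below it in the hierarchy."""
    found = {concept}
    for child in children.get(concept, []):
        found |= descendants(child, children)
    return found


def resolve_concept_set(include, exclude=()):
    """Expand included terms to their descendants, then drop excluded branches."""
    included = set()
    for concept in include:
        included |= descendants(concept)
    excluded = set()
    for concept in exclude:
        excluded |= descendants(concept)
    return included - excluded


concept_set = resolve_concept_set(include=["glucocorticoids"], exclude=["dexamethasone"])
print(sorted(concept_set))  # ['glucocorticoids', 'hydrocortisone', 'prednisolone']
```

Starting from the broad term keeps the definition short, and any new child added under that term later would be picked up automatically.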

#### OMOP terminology used in Atlas:

<table><thead><tr><th width="287">Terminology</th><th>What it includes</th></tr></thead><tbody><tr><td>Domain</td><td>Condition, Drug, Procedure, Visit, etc.</td></tr><tr><td>Concept</td><td>Classification, Non-standard, Standard</td></tr><tr><td>Class</td><td>Clinical Finding, Diagnosis, ATC 5th, etc.</td></tr><tr><td>Vocabulary</td><td>ICD10, ICD9, ICD10fi, ICD9fi, SNOMED, ATC, RxNorm, REIMB, etc.</td></tr><tr><td>Validity</td><td>Invalid, Valid</td></tr></tbody></table>

{% hint style="warning" %}
When building a **‘Concept Set’** in Atlas, note that many of the domains, classes, nomenclatures, and vocabularies are not relevant to Finnish health data. Focus only on those that show record counts, i.e. those with events in your data.
{% endhint %}
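As a rough illustration of the advice in the hint, the sketch below keeps only the search results that have records. The rows, the record counts, and the function name `with_records` are invented for the example and do not reflect actual counts in any dataset.

```python
# Hypothetical search results as (concept name, vocabulary, record count).
# The counts are invented for illustration only.
search_results = [
    ("dexamethasone", "ATC", 1200),
    ("dexamethasone", "RxNorm", 0),
    ("Dexamethasone 4 MG Oral Tablet", "RxNorm", 0),
]


def with_records(results):
    """Keep only the rows whose record count is greater than zero."""
    return [row for row in results if row[2] > 0]


for name, vocabulary, count in with_records(search_results):
    print(f"{vocabulary}: {name} ({count} records)")
```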

## Websites that can help you when planning the cohort building

* [Finnish health registers and medical coding](https://finngen.gitbook.io/finngen-handbook/finngen-data-specifics/finnish-health-registers-and-medical-coding/international-and-finnish-health-code-sets)
* [ICD-10 in Terveysportti](https://www.terveysportti.fi/apps/icd/) (in Finnish, requires login with your institutional credentials)
* [List of ICD-9 codes - Wikipedia](https://en.wikipedia.org/wiki/List_of_ICD-9_codes)
* [List of ICD-8 codes](https://www.su.se/polopoly_fs/1.617004.1655132182!/menu/standard/file/ICD%208%20codes.pdf)
* [ICPC-2 – Finnish](https://www.kuntaliitto.fi/file/3512/download?token=UzRvCy4t) (ICPC codes in Finnish)
* [Koodistopalvelu (Kanta)](https://koodistopalvelu.kanta.fi/codeserver/pages/classification-view-page.xhtml) (SPAT codes in Finnish)
* [Toimenpideluokitus (NCSP)](https://www.terveysportti.fi/terveysportti/toimenpideluokitus.koti) (in Finnish, requires login with your institutional credentials)
* [Lääketietokanta](https://www.terveysportti.fi/apps/laake/) (in Finnish)
* [Diagnostiikkakeskus - Ammattilaisen sivusto](https://diagnostiikka.hus.fi/) (in Finnish)
* [ATC-luokitus - Fimea.fi - Fimea](https://fimea.fi/laakehaut_ja_luettelot/atc-luokitus) (in Finnish)
* [ATHENA tool](https://athena.ohdsi.org/search-terms/start) for exploring data mappings
* [OHDSI PhenotypeLibrary](https://data.ohdsi.org/PhenotypeLibrary/) for ready-made concept sets and cohort definitions from the OHDSI community
* [Risteys](https://risteys.finregistry.fi/) for exploring existing cohort definitions in FinnGen data
