# Detailed longitudinal data

This page was last updated for R13.

{% hint style="info" %}
[Service sector data](https://docs.finngen.fi/finngen-data-specifics/red-library-data-individual-level-data/what-phenotype-files-are-available-in-sandbox-1/service-sector-data) contains detailed longitudinal data with additional columns.
{% endhint %}

### Sandbox directory

Detailed longitudinal data is available in the following Sandbox directory:

`/finngen/library-red/finngen_R[RELEASE]/phenotype_1.0/`

**Note: this data is also available in Atlas and in the OMOP common data model.**

### Data files

The data is available in the following file:

`data/finngen_R[RELEASE]_detailed_longitudinal_1.0.txt.gz`
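As a minimal sketch of how the file could be located and read, the Python snippet below fills `[RELEASE]` into the directory and file templates given above and streams the rows one at a time. The helper names `detailed_longitudinal_path` and `iter_events` are hypothetical, and the sketch assumes that `[RELEASE]` is replaced by the release number, that the file is tab-separated with a header row, and that it is UTF-8 encoded; check the readme file for the actual format.

```python
import csv
import gzip
from pathlib import PurePosixPath
from typing import Dict, Iterator


def detailed_longitudinal_path(release: int) -> PurePosixPath:
    """Fill [RELEASE] into the documented directory and file templates."""
    tag = f"finngen_R{release}"
    directory = PurePosixPath("/finngen/library-red") / tag / "phenotype_1.0"
    return directory / "data" / f"{tag}_detailed_longitudinal_1.0.txt.gz"


def iter_events(path, delimiter: str = "\t") -> Iterator[Dict[str, str]]:
    """Yield one dict per row, keyed by the column names in the header line.

    Assumption: tab-separated, UTF-8 text with a header row.
    Rows are streamed so the whole file never has to fit in memory.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        yield from csv.DictReader(handle, delimiter=delimiter)


print(detailed_longitudinal_path(13))
```

Inside the Sandbox, `iter_events(detailed_longitudinal_path(13))` would then iterate over the R13 file, with all values returned as strings.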

The data file has eleven columns:

<table data-header-hidden><thead><tr><th width="284.0234541577825"></th><th></th></tr></thead><tbody><tr><td><strong>Column</strong></td><td><strong>Description</strong></td></tr><tr><td>FINNGENID</td><td>Sample ID</td></tr><tr><td>SOURCE</td><td>Register source</td></tr><tr><td>EVENT_AGE</td><td>Individual's age at the event, to two decimals</td></tr><tr><td>APPROX_EVENT_DAY</td><td>A randomized event date: +/- 1-15 days are added to the <a href="../../finnish-health-registers-and-medical-coding/data-masking-blurring-of-visit-dates">confidential</a> exact event date</td></tr><tr><td>CODE1 - CODE4</td><td>Register source specific codes and other information</td></tr><tr><td>ICDVER</td><td>ICD-code version: ICD8/9/10, ICD-O-3</td></tr><tr><td>CATEGORY</td><td>Register code sets (vocabularies)</td></tr><tr><td>INDEX</td><td>Register index number. The same INDEX value within a register means that the codes were recorded during the same hospital visit or, for example, come from the same drug purchase event.</td></tr></tbody></table>
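The INDEX column can be used to collect the codes that belong to the same visit or purchase event. The sketch below groups CODE1 values by individual, register source and INDEX. The helper name `group_by_visit` and the mock rows are invented for illustration, and the ID column header is assumed to be `FINNGENID`. The individual's ID is included in the grouping key as a precaution, since the table above only states that INDEX is meaningful within a register.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

VisitKey = Tuple[str, str, str]  # (FINNGENID, SOURCE, INDEX)


def group_by_visit(rows: Iterable[Dict[str, str]]) -> Dict[VisitKey, List[str]]:
    """Collect the CODE1 values that share an individual, a register and an INDEX."""
    visits: Dict[VisitKey, List[str]] = defaultdict(list)
    for row in rows:
        key = (row["FINNGENID"], row["SOURCE"], row["INDEX"])
        visits[key].append(row["CODE1"])
    return dict(visits)


# Mock rows: IDs, codes and index values are made up.
mock_rows = [
    {"FINNGENID": "FG0001", "SOURCE": "INPAT", "CODE1": "I10", "INDEX": "1"},
    {"FINNGENID": "FG0001", "SOURCE": "INPAT", "CODE1": "E11", "INDEX": "1"},
    {"FINNGENID": "FG0001", "SOURCE": "PURCH", "CODE1": "C09AA05", "INDEX": "1"},
]
visits = group_by_visit(mock_rows)
```

Here the two INPAT rows end up in one group, while the PURCH row forms its own group even though it has the same INDEX value, because it comes from a different register.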

Detailed information about the columns is available in the following file:

`finngen_R[RELEASE]_detailed_longitudinal_readme_1.0.txt`

This is the main register data in FinnGen and contains [health register codes](https://docs.finngen.fi/finngen-data-specifics/finnish-health-registers-and-medical-coding/international-and-finnish-health-code-sets) from different sources. The data is called longitudinal because it contains several events/entries of codes for the same individual recorded at different times.

{% hint style="info" %}
Detailed longitudinal data was presented in the [FinnGen data users meeting on 12th January 2021](https://www.finngen.fi/en/members/recordings/finngen-data-users-meeting-12-jan-2021) and can be explored using [Atlas](https://docs.finngen.fi/working-in-the-sandbox/which-tools-are-available/atlas/detailed-guide/atlas-data-sources) in the Sandbox.
{% endhint %}

The data is used as input for the Endpointter to determine which individuals are included in each of FinnGen's phenotypic endpoints:

![](https://3072695768-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MhYL0UTLjqsuIdK0SSO%2Fuploads%2Fgit-blob-2f49d16197fd35f4ef0c0b8a5c0dc70571fb7cb3%2Fimage%20\(524\).png?alt=media)

A mock example of the detailed longitudinal data file is shown below:

<figure><img src="https://3072695768-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MhYL0UTLjqsuIdK0SSO%2Fuploads%2F0YQ7ykdb7ToOwlj6NPCx%2Fdetailed_long.png?alt=media&#x26;token=beda6db2-a849-435d-be93-6fec4555b7b1" alt=""><figcaption></figcaption></figure>

### **Sources**

The detailed longitudinal data file contains codes from the following registries:

<table data-header-hidden><thead><tr><th width="142.33333333333331"></th><th width="302"></th><th></th></tr></thead><tbody><tr><td><strong>Source</strong></td><td><strong>Description</strong></td><td><strong>Types of codes</strong></td></tr><tr><td>PURCH</td><td>Kela drug purchase register</td><td>Medication codes</td></tr><tr><td>REIMB</td><td>Kela drug reimbursement register</td><td>Medication codes</td></tr><tr><td>INPAT</td><td>Inpatient Hilmo register</td><td>Diagnosis codes</td></tr><tr><td>OPER_IN</td><td>Inpatient Hilmo register - operations</td><td>Operation codes</td></tr><tr><td>OUTPAT</td><td>Specialist outpatient Hilmo register</td><td>Diagnosis codes</td></tr><tr><td>OPER_OUT</td><td>Specialist outpatient Hilmo register - operations</td><td>Operation codes</td></tr><tr><td>PRIM_OUT</td><td>Primary health care outpatient visits</td><td>Diagnosis and operation codes</td></tr><tr><td>CANC</td><td>Cancer register</td><td>Cancer codes</td></tr><tr><td>DEATH</td><td>Cause of death register</td><td>Cause of death codes</td></tr></tbody></table>

The image below shows how variables in the national health registers map to the columns in the detailed longitudinal data.

<figure><img src="https://3072695768-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MhYL0UTLjqsuIdK0SSO%2Fuploads%2F2ATSIfNhwnSoPG4wHHyn%2Fimage.png?alt=media&#x26;token=fac1fb14-8a6f-4d34-a146-4126b151ee9d" alt=""><figcaption><p>On the left: source registers, middle: variables in the source register data, on the right: column names in the detailed longitudinal data.</p></figcaption></figure>

### Codes

The information stored in the CODE1 - CODE4 columns depends on the register source:

<table data-header-hidden><thead><tr><th width="144.33333333333331"></th><th width="306.64559585492236"></th><th></th></tr></thead><tbody><tr><td><strong>Source</strong></td><td><strong>Description</strong></td><td><strong>Codes</strong></td></tr><tr><td>PURCH</td><td>Kela drug purchase register</td><td><p>CODE1: ATC code</p><p>CODE2: Kela reimbursement code</p><p>CODE3: Product number</p><p>CODE4: Number of packages</p></td></tr><tr><td>REIMB</td><td>Kela drug reimbursement register</td><td><p>CODE1: Kela reimbursement code</p><p>CODE2: ICD code</p></td></tr><tr><td><p>INPAT</p><p>OUTPAT</p></td><td><p>Inpatient Hilmo register</p><p>Specialist outpatient Hilmo register</p></td><td><p>CODE1: symptom code</p><p>CODE2: cause code (e.g. CODE1 could be <em>dementia associated with Alzheimer’s disease</em> with CODE2 as <em>Alzheimer's disease</em>)</p><p>CODE3: ATC code for drug's adverse effect</p><p>CODE4: duration of stay</p></td></tr><tr><td><p>OPER_IN</p><p>OPER_OUT</p></td><td><p>Inpatient Hilmo register - operations</p><p>Specialist outpatient Hilmo register - operations</p></td><td>CODE1: operation code</td></tr><tr><td>PRIM_OUT</td><td>Primary health care outpatient visits</td><td><p>CODE1: diagnosis or operation code</p><p>CODE2: symptom code</p><p>CODE3: ATC code for drug's adverse effect</p></td></tr><tr><td>CANC</td><td>Cancer register</td><td><p>CODE1: ICD-O-3 topography</p><p>CODE2: ICD-O-3 morphology</p><p>CODE3: ICD-O-3 behaviour</p></td></tr><tr><td>DEATH</td><td>Cause of death register</td><td>CODE1: cause of death</td></tr></tbody></table>
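Because the meaning of CODE1 - CODE4 changes with the SOURCE value, it can help to attach the meanings to a row before interpreting it. The sketch below transcribes the table above into a lookup; `CODE_MEANINGS` and `label_codes` are hypothetical names, the mock row is invented, and the code values are returned exactly as stored, since how missing values are encoded is not covered here.

```python
from typing import Dict, Tuple

# Meaning of CODE1..CODE4 per register source, transcribed from the table above.
_HILMO_DIAGNOSIS: Tuple[str, ...] = (
    "symptom code",
    "cause code",
    "ATC code for drug's adverse effect",
    "duration of stay",
)
CODE_MEANINGS: Dict[str, Tuple[str, ...]] = {
    "PURCH": ("ATC code", "Kela reimbursement code", "Product number", "Number of packages"),
    "REIMB": ("Kela reimbursement code", "ICD code"),
    "INPAT": _HILMO_DIAGNOSIS,
    "OUTPAT": _HILMO_DIAGNOSIS,
    "OPER_IN": ("operation code",),
    "OPER_OUT": ("operation code",),
    "PRIM_OUT": ("diagnosis or operation code", "symptom code", "ATC code for drug's adverse effect"),
    "CANC": ("ICD-O-3 topography", "ICD-O-3 morphology", "ICD-O-3 behaviour"),
    "DEATH": ("cause of death",),
}


def label_codes(row: Dict[str, str]) -> Dict[str, str]:
    """Map each documented meaning to the raw value of the matching CODEn column."""
    meanings = CODE_MEANINGS[row["SOURCE"]]  # KeyError for an undocumented source
    return {
        meaning: row.get(f"CODE{position}", "")
        for position, meaning in enumerate(meanings, start=1)
    }


# Mock purchase row with made-up values.
labelled = label_codes(
    {"SOURCE": "PURCH", "CODE1": "C09AA05", "CODE2": "205", "CODE3": "123456", "CODE4": "1"}
)
```

Columns without a documented meaning for a given source, such as CODE3 and CODE4 for REIMB, are left out of the result.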

### **Categories**

The register code sets (vocabularies) are stored in the CATEGORY column:

<figure><img src="https://3072695768-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MhYL0UTLjqsuIdK0SSO%2Fuploads%2Fgit-blob-cdfc91973ff178cf4225022029d0cb4a2bd2fda6%2Fcategory.png?alt=media" alt=""><figcaption></figcaption></figure>

Detailed information about register code sets is available from:

* [International and Finnish Health Code Sets](https://docs.finngen.fi/finngen-data-specifics/finnish-health-registers-and-medical-coding/international-and-finnish-health-code-sets)
* [More information on health code sets](https://docs.finngen.fi/finngen-data-specifics/finnish-health-registers-and-medical-coding/more-information-on-health-code-sets)

For diagnosis codes:

* 0 at the end of the CATEGORY variable means the main diagnosis code (e.g. 0 in ICD0 and NOM0)
* 1:N at the end of the CATEGORY variable refers to side diagnoses (e.g. 1:N in ICD1:N, NOM1:N)
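The main versus side diagnosis rule above can be sketched as a small parser. The function name `parse_category` is hypothetical, and the sketch only recognises values made of a letter prefix followed by a number, such as ICD0 or NOM2; any other CATEGORY value yields `None` rather than a guess.

```python
import re
from typing import Optional, Tuple

# A letter prefix (e.g. ICD, NOM) followed by a trailing number.
_CATEGORY_PATTERN = re.compile(r"([A-Za-z_]+)(\d+)")


def parse_category(category: str) -> Optional[Tuple[str, int, bool]]:
    """Return (prefix, position, is_main_diagnosis), or None if the value
    does not follow the prefix-plus-number pattern."""
    match = _CATEGORY_PATTERN.fullmatch(category)
    if match is None:
        return None
    prefix, position = match.group(1), int(match.group(2))
    return prefix, position, position == 0


print(parse_category("ICD0"))  # ('ICD', 0, True)
print(parse_category("NOM2"))  # ('NOM', 2, False)
```

Filtering on the third element of the result keeps only rows that carry a main diagnosis code.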

### Register data availability dates

Data is available from each register's start date until its register-specific follow-up end date. The follow-up dates are available [here](https://docs.finngen.fi/finngen-data-specifics/endpoints/complete-follow-up-time-of-the-finngen-registries-primary-endpoint-data) and in the following file:

`finngen_R[RELEASE]_detailed_longitudinal_readme_1.0.txt`

The start and follow-up dates differ between registries and FinnGen data releases, and you should take this into account in your analyses. Register start and follow-up dates are shown below for Data Freeze 6:

<figure><img src="https://3072695768-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MhYL0UTLjqsuIdK0SSO%2Fuploads%2Fgit-blob-7fa6539bc9bb835b29febf4c015fa740c2fcaaf9%2Fkuva%20(75).png?alt=media" alt=""><figcaption></figcaption></figure>
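One way to take the register-specific dates into account is to keep only events that fall inside each register's availability window. The sketch below is illustrative only: `within_coverage` is a hypothetical helper, the dates in `placeholder_coverage` are made-up placeholders that must be replaced with the values from the readme file for your release, and APPROX_EVENT_DAY is assumed to be an ISO date (`YYYY-MM-DD`).

```python
from datetime import date
from typing import Dict, Iterable, Iterator, Tuple

Coverage = Dict[str, Tuple[date, date]]  # SOURCE -> (start date, follow-up end date)


def within_coverage(events: Iterable[Dict[str, str]], coverage: Coverage) -> Iterator[Dict[str, str]]:
    """Yield events whose APPROX_EVENT_DAY lies inside their register's window.

    Events from a source with no window in `coverage` are dropped.
    """
    for event in events:
        window = coverage.get(event["SOURCE"])
        if window is None:
            continue
        start, end = window
        # Assumption: dates are stored as YYYY-MM-DD.
        event_day = date.fromisoformat(event["APPROX_EVENT_DAY"])
        if start <= event_day <= end:
            yield event


# Placeholder window, NOT the real register dates.
placeholder_coverage = {"INPAT": (date(2000, 1, 1), date(2010, 12, 31))}
mock_events = [
    {"SOURCE": "INPAT", "APPROX_EVENT_DAY": "2005-06-15"},
    {"SOURCE": "INPAT", "APPROX_EVENT_DAY": "2015-02-01"},
    {"SOURCE": "PURCH", "APPROX_EVENT_DAY": "2005-06-15"},
]
kept = list(within_coverage(mock_events, placeholder_coverage))
```

With the placeholder window, only the first mock event is kept: the second falls after the window and the third comes from a source with no window defined.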

### Further information

* [Splitting combination codes in detailed longitudinal data](https://docs.finngen.fi/finngen-data-specifics/red-library-data-individual-level-data/what-phenotype-files-are-available-in-sandbox-1/detailed-longitudinal-data/what-are-combination-codes-and-how-they-are-separated-in-detailed-longitudinal-data)
* [Registers in detailed longitudinal data](https://docs.finngen.fi/finngen-data-specifics/red-library-data-individual-level-data/what-phenotype-files-are-available-in-sandbox-1/registers-in-the-detailed-longitudinal-data)
