# Register data pre-processing

The FinnGen register team receives raw register data from the [registries](/finngen-data-specifics/finnish-health-registers-and-medical-coding/finnish-health-registers.md) and pre-processes it before creating phenotype files and releasing data files to the Sandbox.

Raw register data include a PIC (personal identification number) for each individual. The register team has created a FINNGENID for each PIC, and these FINNGENIDs are used for both genotype and phenotype data.

### **Pre-processing actions of the register data**

* Replace PIC with FINNGENID
* Create EVENT AGE using the birth date from the PIC and the event date (e.g., arrival date at the hospital, or the date when the drug was purchased)
* Create SEX using the PIC (if the 10th character of the PIC is an even digit, the individual is female; if it is odd, male)
* Harmonize variables across the different years of the registry (variable names have changed over the years)
* Combine different register data years to the same data file
* Convert date variables to yyyy-mm-dd format
* Create [ICDVER](/finngen-data-specifics/red-library-data-individual-level-data/what-phenotype-files-are-available-in-sandbox-1/detailed-longitudinal-data.md) based on the year of the diagnosis (ICD8: 1967-1986; ICD9: 1987-1995; ICD10: since 1996; ICD-O-3: cancer registry)
* Separate inpatient and outpatient data based on PALA (service type) variable (HILMO)
* Create other register-specific variables, e.g., PARITY, NRO CHILD, NRO FETUSES in the [reproductive history register](/finngen-data-specifics/red-library-data-individual-level-data/what-phenotype-files-are-available-in-sandbox-1/other-registers.md), or kidney variables in the [kidney register](/finngen-data-specifics/red-library-data-individual-level-data/what-phenotype-files-are-available-in-sandbox-1/other-registers.md)
* \*Create [HOSPDAYS](/finngen-data-specifics/red-library-data-individual-level-data/what-phenotype-files-are-available-in-sandbox-1/detailed-longitudinal-data.md) variable (hospital arrival date subtracted from hospital departure date; HILMO)
* \*Create [APPROX EVENT DATE](/finngen-data-specifics/finnish-health-registers-and-medical-coding/data-masking-blurring-of-visit-dates.md) by blurring/masking the exact event date (see the link in this line for more information about this process)
* \*Remove denials (individuals who have asked to have their data removed from FinnGen)

\*done later in data processing
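
The PIC-based steps above (birth date, SEX, EVENT AGE) can be illustrated with a minimal Python sketch. It is not the register team's actual code. It assumes the classic PIC layout `DDMMYYCZZZQ` with century signs `+`, `-` and `A`, and it assumes EVENT AGE is expressed in decimal years (days divided by 365.25). The function names and the example PIC string are hypothetical.

```python
from datetime import date

# Century signs of the classic PIC layout DDMMYYCZZZQ (assumption: only
# these three signs are handled; any other sign raises a KeyError).
CENTURY = {"+": 1800, "-": 1900, "A": 2000}


def birth_date_from_pic(pic: str) -> date:
    """Read the birth date from the first seven characters of the PIC."""
    day, month, yy = int(pic[0:2]), int(pic[2:4]), int(pic[4:6])
    return date(CENTURY[pic[6]] + yy, month, day)


def sex_from_pic(pic: str) -> str:
    """Even 10th character means female, odd means male."""
    return "female" if int(pic[9]) % 2 == 0 else "male"


def event_age(pic: str, event_date: date) -> float:
    """Age at the event in decimal years (assumed definition)."""
    days = (event_date - birth_date_from_pic(pic)).days
    return round(days / 365.25, 2)


# Hypothetical example PIC, used only for illustration.
example_pic = "131052-308T"
print(birth_date_from_pic(example_pic).isoformat())  # yyyy-mm-dd format
print(sex_from_pic(example_pic))
print(event_age(example_pic, date(2000, 10, 13)))
```

In the released data the PIC itself is replaced by the FINNGENID, so a derivation like this can only happen before that replacement.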

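The ICDVER and HOSPDAYS rules listed above can be sketched in the same way. This is an illustrative sketch only: the function names and the `from_cancer_registry` flag are hypothetical, and the returned labels follow the wording of this page rather than the exact values stored in the data files.

```python
from datetime import date


def icd_version(year: int, from_cancer_registry: bool = False) -> str:
    """Map a diagnosis year to an ICD version using the ranges listed above."""
    if from_cancer_registry:
        return "ICD-O-3"
    if 1967 <= year <= 1986:
        return "ICD8"
    if 1987 <= year <= 1995:
        return "ICD9"
    if year >= 1996:
        return "ICD10"
    raise ValueError(f"no ICD version defined for year {year}")


def hospdays(arrival: date, departure: date) -> int:
    """Days in hospital: departure date minus arrival date."""
    return (departure - arrival).days


print(icd_version(1990))
print(hospdays(date(2010, 3, 1), date(2010, 3, 5)))
```

Years before 1967 raise an error in this sketch because the page gives no ICD version for them.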
