Register data pre-processing
The FinnGen register team receives the raw register data and pre-processes it before creating phenotype files and releasing data files to the Sandbox.
Raw register data includes a PIC (personal identification number) for each individual. The register team has created a FINNGENID for each PIC, and these FINNGENIDs are used for both genotype and phenotype data. Pre-processing consists of the following steps:
Replace PIC with FINNGENID
Create EVENT AGE using the birth date from the PIC and the event date (e.g. the date of arrival at the hospital, or the date when the drug was purchased)
Create SEX using the PIC (if the 10th character of the PIC is an even digit, the individual is female; if it is odd, male)
Harmonize variables across the different years of the registry (variable names have changed over the years)
Combine the data from different register years into a single data file
Convert date variables to yyyy-mm-dd format
Create an ICD version variable based on the year of the diagnosis (ICD8: 1967-1986; ICD9: 1987-1995; ICD10: since 1996; ICD-O-3: cancer registry)
Separate inpatient and outpatient data based on the PALA (service type) variable (HILMO)
Create other register-specific variables, e.g. PARITY, NRO CHILD and NRO FETUSES, or the kidney variables, in their respective registers
*Create a hospital stay duration variable (hospital arrival date subtracted from hospital departure date; HILMO)
*Create an approximate event date by blurring/masking the exact event date
*Remove denials (individuals who have asked to have their data removed from FinnGen)
*Done later in data processing
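Several of the derivations above can be illustrated with a short Python sketch. This is not the register team's actual code; it only shows one plausible way to derive the birth date and sex from a PIC, compute an event age, pick the ICD version from the diagnosis year, and compute a hospital stay in days. All function names, the "female"/"male" and "8"/"9"/"10" output codings, the 365.25-day year, and the handling of only the classic century markers (+, -, A) are assumptions made for illustration. The PIC used in the example is a fictitious test value.

```python
from datetime import date

# Century marker at the 7th character of the PIC.
# Assumption: only the classic markers are handled in this sketch.
CENTURY = {"+": 1800, "-": 1900, "A": 2000}


def birth_date_from_pic(pic: str) -> date:
    """Parse the DDMMYY part and the century marker of the PIC."""
    day, month, yy = int(pic[0:2]), int(pic[2:4]), int(pic[4:6])
    return date(CENTURY[pic[6]] + yy, month, day)


def sex_from_pic(pic: str) -> str:
    """10th character (index 9): even digit -> female, odd digit -> male."""
    return "female" if int(pic[9]) % 2 == 0 else "male"


def event_age(birth: date, event: date) -> float:
    """Age at the event in decimal years (365.25 days/year is an assumption)."""
    return round((event - birth).days / 365.25, 2)


def icd_version(diagnosis_year: int) -> str:
    """Map the diagnosis year to the ICD version, using the ranges listed above."""
    if 1967 <= diagnosis_year <= 1986:
        return "8"
    if 1987 <= diagnosis_year <= 1995:
        return "9"
    if diagnosis_year >= 1996:
        return "10"
    raise ValueError(f"No ICD version defined for year {diagnosis_year}")


def hospital_days(arrival: date, departure: date) -> int:
    """Length of stay: arrival date subtracted from departure date."""
    return (departure - arrival).days


if __name__ == "__main__":
    pic = "131052-308T"  # fictitious example PIC
    birth = birth_date_from_pic(pic)
    arrival, departure = date(2001, 3, 1), date(2001, 3, 5)
    print(birth.isoformat())            # yyyy-mm-dd format
    print(sex_from_pic(pic))
    print(event_age(birth, arrival))
    print(icd_version(arrival.year))
    print(hospital_days(arrival, departure))
```

In the real pipeline the PIC is replaced with the FINNGENID before release, so a derivation like this could only run on the raw data, before that replacement.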