Data Masking/Blurring of Visit Dates
All FinnGen individual-level data is pseudo-anonymised: Personal Identity Codes (PICs) are replaced by FinnGen IDs, and only pseudo-anonymised individual-level data is available in the Sandbox.
In DF1-DF7, all register codes with fewer than five cases in the detailed longitudinal data, and all endpoints with fewer than five cases in the endpoint and longitudinal endpoint data, have been removed from the data.
From DF8v3 onwards, all register codes in the detailed longitudinal data and all endpoints in the endpoint and endpoint longitudinal data, including those with fewer than five cases, are included in the data released to the Sandbox.
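The DF1-DF7 removal rule can be illustrated with a minimal sketch. It assumes that "cases" means distinct individuals carrying a code, which the text does not state explicitly; the function name, the row layout and the example IDs are hypothetical and do not describe the actual FinnGen pipeline.

```python
from collections import defaultdict

MIN_CASES = 5  # threshold named in the text for DF1-DF7


def drop_rare_codes(rows, min_cases=MIN_CASES):
    """Keep only rows whose code is carried by at least `min_cases` individuals.

    `rows` is an iterable of (finngen_id, code) pairs (hypothetical layout).
    Assumption: a "case" is one distinct individual with the code.
    """
    rows = list(rows)
    individuals_per_code = defaultdict(set)
    for finngen_id, code in rows:
        individuals_per_code[code].add(finngen_id)
    return [
        (finngen_id, code)
        for finngen_id, code in rows
        if len(individuals_per_code[code]) >= min_cases
    ]


# Made-up example: code "A" has five individuals, code "B" only four.
example_rows = [(f"FG{i:04d}", "A") for i in range(5)]
example_rows += [(f"FG{i:04d}", "B") for i in range(4)]
kept_rows = drop_rare_codes(example_rows)
```

In this example every row with code "B" is dropped and every row with code "A" is kept.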
In order to protect individual-level data, exact event dates cannot be released with phenotype data. Each exact event date is randomized to an approximate event day (APPROX_EVENT_DAY) by adding an offset of +/- 1-15 days to the exact event day.
The offset is individual-specific: the same number of days is added to all events of a given individual.
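A minimal sketch of an individual-specific offset follows. It assumes the offset is a whole number of days drawn uniformly from 1-15 with a random sign; the text gives only the range, not the distribution. The function names and the example IDs are hypothetical and are not FinnGen's actual implementation.

```python
import random
from datetime import date, timedelta


def draw_offset(rng):
    """Draw a non-zero offset of 1 to 15 days with a random sign (assumed uniform)."""
    return rng.choice([-1, 1]) * rng.randint(1, 15)


def blur_events(events, rng):
    """Shift every event date by its individual's offset.

    `events` is a list of (finngen_id, exact_day) pairs (hypothetical layout).
    Returns the blurred events and the offset drawn for each individual.
    """
    offsets = {}
    blurred = []
    for finngen_id, exact_day in events:
        if finngen_id not in offsets:
            # One draw per individual, reused for all of that individual's events.
            offsets[finngen_id] = draw_offset(rng)
        approx_day = exact_day + timedelta(days=offsets[finngen_id])
        blurred.append((finngen_id, approx_day))
    return blurred, offsets


# Made-up events for two hypothetical individuals.
events = [
    ("FG0001", date(2010, 1, 10)),
    ("FG0001", date(2010, 3, 1)),
    ("FG0002", date(2011, 5, 5)),
]
blurred, offsets = blur_events(events, random.Random(0))
```

Because every event of an individual is shifted by the same number of days, the time between two events that share an offset is identical in the exact and the approximate dates.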
Until DF10, the offset is not consistent across registers. The APPROX_DAY is usually calculated separately in each register (e.g. … vs. …). However, in the … the same individual-specific offset is used for a given individual in all registers included in the data.
From DF11 onwards, the offset is consistent across registers: the same offset per person (consistent for all events of one person) is used for all FinnGen register files.
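The practical difference between the two schemes shows up when comparing events of one individual across registers. The sketch below uses made-up dates and offsets, and a hypothetical helper function, to compare the apparent gap between two events under separate per-register offsets (until DF10) and under a single shared offset (from DF11).

```python
from datetime import date, timedelta


def approx_gap_days(day_a, offset_a, day_b, offset_b):
    """Gap in days between two events after each is shifted by its own offset."""
    approx_a = day_a + timedelta(days=offset_a)
    approx_b = day_b + timedelta(days=offset_b)
    return (approx_b - approx_a).days


# Hypothetical individual with one event in each of two registers.
event_a = date(2015, 3, 1)
event_b = date(2015, 3, 20)
true_gap = (event_b - event_a).days

# Until DF10: offsets drawn separately per register (worst case: +15 and -15).
gap_separate = approx_gap_days(event_a, 15, event_b, -15)

# From DF11: one offset per person across all register files.
gap_shared = approx_gap_days(event_a, 7, event_b, 7)
```

With separate offsets the apparent gap can differ from the true gap by up to 30 days, so the order of two events that are close in time can even appear reversed. With a shared offset the gap is preserved exactly.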