Smoking Data
Smoking data harmonized from FinnGen registers (February 2026).
The file contains smoking measurements from FinnGen registers in longitudinal format, with one row per measurement event per person. It also includes an aggregated person-level ever-smoker/never-smoker variable. Persons with no smoking information in any register have a single row with FINNGENID only and empty values in all other columns.
File structure
Data
Smoking_harmonized_longitudinal_v1.tsv.gz
Smoking data harmonized from FinnGen registers
Misc
Data description
Number of persons: 519972
Number of persons with at least one smoking measurement: 402760
Number of persons without any smoking information: 117212
Number of measurement events: 2877207
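These counts can in principle be recomputed from the file itself. The sketch below is a minimal illustration, not FinnGen tooling. It assumes the file is UTF-8, tab-separated, and has a header row with the column names listed under Data fields. It also assumes that a row with an empty SMOKE value is the single placeholder row of a person without smoking information, and that every other row is one measurement event. The helper names `iter_rows` and `summarize` are hypothetical.

```python
import csv
import gzip
from typing import Dict, Iterable, Iterator


def iter_rows(path: str) -> Iterator[Dict[str, str]]:
    """Stream the gzipped TSV as dicts keyed by column name (assumed header row)."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as fh:
        yield from csv.DictReader(fh, delimiter="\t")


def summarize(rows: Iterable[Dict[str, str]]) -> Dict[str, int]:
    """Count persons and measurement events.

    Assumption: a person without smoking information has one row with an
    empty SMOKE value; every row with a non-empty SMOKE is one event.
    """
    persons = set()
    measured = set()
    events = 0
    for row in rows:
        fid = row["FINNGENID"]
        persons.add(fid)
        if row.get("SMOKE"):  # empty string or missing value means no measurement
            measured.add(fid)
            events += 1
    return {
        "persons": len(persons),
        "persons_with_measurement": len(measured),
        "persons_without_information": len(persons - measured),
        "measurement_events": events,
    }
```

Because `iter_rows` is a generator, `summarize(iter_rows("Smoking_harmonized_longitudinal_v1.tsv.gz"))` processes the file without holding all rows in memory. If the assumptions hold, the result should be comparable to the numbers above.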
Data fields
FINNGENID
FinnGen participant identifier
APPROX_EVENT_DAY
Approximate date of the smoking measurement
SMOKE
Smoking status at the measurement
SOURCE1
The source of the measurement
SOURCE2
Further detail within SOURCE1, e.g. specific biobank or data source
EVER_SMOKER
Person-level aggregate: YES if any measurement indicates smoking (CURRENT, EVER, FORMER), NO if all measurements indicate non-smoking (NEVER, NO, PASSIVE). The value is the same on every row of a person, and empty for persons without any smoking information.
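The EVER_SMOKER rule can be expressed as a small function. This is a minimal sketch of the rule as stated above, not the code used to build the file. It assumes that an empty SMOKE value marks a person without smoking information (such persons have empty values in all columns except FINNGENID, so EVER_SMOKER is returned here as None), and it raises an error on any value outside the six documented ones. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

# SMOKE values grouped as in the EVER_SMOKER description.
SMOKING_VALUES = {"CURRENT", "EVER", "FORMER"}
NON_SMOKING_VALUES = {"NEVER", "NO", "PASSIVE"}


def ever_smoker(smoke_values: Iterable[str]) -> Optional[str]:
    """Aggregate one person's SMOKE values into "YES", "NO" or None."""
    values = [v for v in smoke_values if v]  # drop empty SMOKE cells
    if not values:
        return None  # no smoking information at all
    unknown = set(values) - SMOKING_VALUES - NON_SMOKING_VALUES
    if unknown:
        raise ValueError(f"unexpected SMOKE values: {sorted(unknown)}")
    # YES if any measurement indicates smoking; otherwise all are non-smoking.
    return "YES" if any(v in SMOKING_VALUES for v in values) else "NO"


def ever_smoker_by_person(rows: Iterable[Dict[str, str]]) -> Dict[str, Optional[str]]:
    """Apply ever_smoker() to every FINNGENID in the longitudinal table."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        grouped[row["FINNGENID"]].append(row.get("SMOKE") or "")
    return {fid: ever_smoker(values) for fid, values in grouped.items()}
```

A single smoking measurement is enough to make a person YES, regardless of how many non-smoking measurements they have. This is why the inconsistencies described under Data notes do not change EVER_SMOKER.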
Data sources
DLCO
DLCO data from biobanks
Biobank specific
EXTENDED_HILMO
Hilmo and AvoHilmo structured smoking status
See https://koodistopalvelu.kanta.fi/codeserver/pages/classification-view-page.xhtml?classificationKey=220&versionKey=295
KANTA_DELIVERY
Kanta medication delivery
Nicotine replacement products (e.g. patches and gum) and other smoking cessation medications: ATC codes N07BA01, N07BA03, N07BA04
KANTA_LAB
Kanta lab measurements
Urine cotinine (u-cot) measurements above 200 ng/mL are classified as smoker, following HUS reference values
KANTA_PRESCRIPTION
Kanta medicine prescriptions
Nicotine replacement products (e.g. patches and gum) and other smoking cessation medications: ATC codes N07BA01, N07BA03, N07BA04
MINIMUM_EXTENDED
Baseline data at biobank intake
Biobank specific
SERVICE_SECTOR_DETAILED_LONGITUDINAL_ICD
Cause of visit in Hilmo (inpatient and outpatient) and AvoHilmo
ICD-10: F17, F17.1, F17.2, F17.20, F17.29, F17.3, F17.8, F17.9, Z72.0, Z71.6, T65.2, K13.21, K03.61; ICD-8/9: 98990, 3051A
SERVICE_SECTOR_DETAILED_LONGITUDINAL_ICPC
AvoHilmo cause of visit
ICPC2 code P17 (tobacco abuse)
SERVICE_SECTOR_DETAILED_LONGITUDINAL_MOP
AvoHilmo dental health care measures
Smoking status yes/no
SERVICE_SECTOR_DETAILED_LONGITUDINAL_NOM
Surgical procedures from Hilmo inpatient and outpatient registers
Nomesco code IHA22 (smoking intervention)
SERVICE_SECTOR_DETAILED_LONGITUDINAL_PURCH
Kela drug purchase register
Nicotine replacement products (e.g. patches and gum) and other smoking cessation medications: ATC codes N07BA01, N07BA03, N07BA04
SPIROMETRY
Spirometry data from biobanks. Limited to biobanks that recorded smoking status.
Biobank specific
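The code-based criteria in the table above reduce to set membership tests, plus one numeric threshold for the u-cot (urine cotinine) lab value. The sketch below shows one way to apply them. It assumes exact matching of the listed codes rather than prefix matching, since the table lists individual subcodes. When comparing codes it also ignores dots, surrounding whitespace and letter case, on the assumption that registers may store codes with or without the dot. The vocabulary labels (such as "ICD10" and "ATC") and the function names are hypothetical. The table does not say which SMOKE value a matching record receives, so the functions only return True or False.

```python
from typing import Dict, FrozenSet

# Code lists copied from the Data sources table; vocabulary labels are hypothetical.
CODE_LISTS: Dict[str, FrozenSet[str]] = {
    "ICD10": frozenset({"F17", "F17.1", "F17.2", "F17.20", "F17.29", "F17.3",
                        "F17.8", "F17.9", "Z72.0", "Z71.6", "T65.2",
                        "K13.21", "K03.61"}),
    "ICD8_9": frozenset({"98990", "3051A"}),
    "ICPC2": frozenset({"P17"}),
    "NOMESCO": frozenset({"IHA22"}),
    "ATC": frozenset({"N07BA01", "N07BA03", "N07BA04"}),
}

# Urine cotinine above this value (ng/mL) is classified as smoker.
U_COT_THRESHOLD_NG_PER_ML = 200.0


def normalize_code(code: str) -> str:
    """Strip surrounding whitespace, uppercase, and drop dots ("f17.20" -> "F1720")."""
    return code.strip().upper().replace(".", "")


# Normalized lookup sets, built once.
_NORMALIZED = {vocab: {normalize_code(c) for c in codes}
               for vocab, codes in CODE_LISTS.items()}


def matches_smoking_code(code: str, vocabulary: str) -> bool:
    """True if the code exactly matches a listed code (no prefix matching)."""
    return normalize_code(code) in _NORMALIZED[vocabulary]


def u_cot_indicates_smoking(value_ng_per_ml: float) -> bool:
    """Strictly greater than 200 ng/mL, as in the KANTA_LAB rule."""
    return value_ng_per_ml > U_COT_THRESHOLD_NG_PER_ML
```

With exact matching, a finer code that is not listed (for example F17.21) is not flagged. Switching to prefix matching would change the definition, not just the implementation.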
Smoking variable values
The available values of the SMOKE variable depend on the source register, because the registers record smoking at different levels of detail. The smoking status has been harmonized to the following values:
CURRENT
Current smoker
EVER
Has smoked at some time (not known if current or former)
FORMER
Former smoker
NEVER
Never smoked
NO
Not currently a smoker
PASSIVE
Passive smoker
Data notes
Because smoking status is combined from many registers, an individual's measurements may be inconsistent with each other. In total, 5521 individuals transition from ever-smoker to never-smoker, which should not be possible, and 2108 individuals have measurements on the same day with different smoking statuses, from either the same register or different registers.
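Both kinds of inconsistency can be located per person. The sketch below is a minimal illustration that rests on several assumptions:

- APPROX_EVENT_DAY is an ISO date string (YYYY-MM-DD), so string order is chronological.
- An ever-smoker to never-smoker transition means a NEVER measurement on a day strictly after a CURRENT, EVER or FORMER measurement. NO and PASSIVE are not counted here, since NO only denies current smoking.
- A same-day conflict means two different SMOKE values on one day, regardless of source.

The function name is hypothetical, and these definitions are not guaranteed to match the ones behind the counts above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

EVER_VALUES = {"CURRENT", "EVER", "FORMER"}


def find_inconsistencies(
    rows: Iterable[Dict[str, str]],
) -> Tuple[Set[str], Set[str]]:
    """Return (ever-to-never persons, same-day-conflict persons)."""
    # person -> day -> set of SMOKE values recorded on that day
    statuses: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for row in rows:
        if row.get("SMOKE"):  # skip placeholder rows without information
            statuses[row["FINNGENID"]][row["APPROX_EVENT_DAY"]].add(row["SMOKE"])

    ever_to_never: Set[str] = set()
    same_day_conflict: Set[str] = set()
    for fid, by_day in statuses.items():
        if any(len(values) > 1 for values in by_day.values()):
            same_day_conflict.add(fid)
        seen_ever = False
        for day in sorted(by_day):  # ISO dates sort chronologically
            values = by_day[day]
            # Check before updating, so only a strictly later NEVER counts.
            if seen_ever and "NEVER" in values:
                ever_to_never.add(fid)
                break
            if values & EVER_VALUES:
                seen_ever = True
    return ever_to_never, same_day_conflict
```

A person with an ever-smoking value and NEVER on the same day is reported only as a same-day conflict, not as a transition, so the two sets can be inspected separately.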
Many registers record only current smokers and leave non-smokers unrecorded, which biases the data towards smokers.
Duplicate measurements have been removed.