Smoking Data

Smoking data harmonized from FinnGen registers (February 2026).

The file contains smoking measurements from FinnGen registers in longitudinal format, with one row per measurement event per person. It also includes an aggregated person-level ever-smoker/never-smoker variable. Persons with no smoking information in any register have a single row with FINNGENID only and empty values in all other columns.
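The layout described above can be read with a short sketch like the following. It assumes the file has a header row with the column names listed under Data fields, is UTF-8 encoded, and stores missing values as empty strings; the function names `read_smoking_rows` and `group_measurements` are illustrative, not part of the release.

```python
import csv
import gzip
from collections import defaultdict


def read_smoking_rows(path):
    """Yield one dict per row of the gzip-compressed, tab-separated file."""
    # Assumption: the first line is a header with the documented column names.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        yield from csv.DictReader(handle, delimiter="\t")


def group_measurements(rows):
    """Map each FINNGENID to the list of its measurement rows.

    Persons without smoking information keep an empty list, because their
    single row carries only FINNGENID and an empty SMOKE value.
    """
    by_person = defaultdict(list)
    for row in rows:
        measurements = by_person[row["FINNGENID"]]  # registers the person
        if row.get("SMOKE"):  # empty string or None: no smoking information
            measurements.append(row)
    return dict(by_person)
```

Grouping by person keeps the persons without measurements visible as empty lists instead of silently dropping them.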

File structure

Data

| File | Description |
| --- | --- |
| Smoking_harmonized_longitudinal_v1.tsv.gz | Smoking data harmonized from FinnGen registers |

Misc

Data description

Number of persons: 519972

Number of persons with at least one smoking measurement: 402760

Number of persons without any smoking information: 117212

Number of measurement events: 2877207
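The person counts above are consistent with each other (402760 + 117212 = 519972). A minimal sketch of how such counts could be recomputed from the rows is shown below. It assumes each row is a dict keyed by column name and that a measurement event is any row with a non-empty SMOKE value; `summarize` is a hypothetical helper name.

```python
def summarize(rows):
    """Count persons and measurement events in longitudinal smoking rows."""
    persons = set()
    persons_with_measurement = set()
    events = 0
    for row in rows:
        persons.add(row["FINNGENID"])
        # Assumption: a non-empty SMOKE value marks a measurement event.
        if row.get("SMOKE"):
            persons_with_measurement.add(row["FINNGENID"])
            events += 1
    return {
        "persons": len(persons),
        "with_measurement": len(persons_with_measurement),
        "without_measurement": len(persons - persons_with_measurement),
        "events": events,
    }
```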

Data fields

| Field | Description |
| --- | --- |
| FINNGENID | FinnGen person identifier |
| APPROX_EVENT_DAY | Date of the smoking measurement |
| SMOKE | Smoking status at the measurement |
| SOURCE1 | The source of the measurement |
| SOURCE2 | Further detail within SOURCE1, e.g. specific biobank or data source |
| EVER_SMOKER | Person-level aggregate: YES if any measurement indicates smoking (CURRENT, EVER, FORMER), NO if all measurements indicate non-smoking (NEVER, NO, PASSIVE). Same value across all rows per person. |
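The EVER_SMOKER rule can be sketched as a small function. This is an illustration of the rule as worded here, not the pipeline's actual code; returning an empty string for persons with no measurements is an assumption based on the empty columns described for such persons, and `ever_smoker` is a hypothetical name.

```python
SMOKING_VALUES = {"CURRENT", "EVER", "FORMER"}
NON_SMOKING_VALUES = {"NEVER", "NO", "PASSIVE"}


def ever_smoker(smoke_values):
    """Aggregate one person's SMOKE values into YES, NO, or an empty string."""
    observed = [value for value in smoke_values if value]
    # One smoking measurement is enough to mark the person as an ever-smoker.
    if any(value in SMOKING_VALUES for value in observed):
        return "YES"
    # NO requires at least one measurement, all of them non-smoking.
    if observed and all(value in NON_SMOKING_VALUES for value in observed):
        return "NO"
    # Assumption: persons without any measurement get an empty value.
    return ""
```

For example, `ever_smoker(["NEVER", "CURRENT"])` gives `"YES"`, because a single smoking measurement outweighs any number of non-smoking ones.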

Data sources

| SOURCE1 value | Register source | Code and smoking classification |
| --- | --- | --- |
| DLCO | DLCO data from biobanks | Biobank specific |
| EXTENDED_HILMO | Hilmo and AvoHilmo structural smoking status | See https://koodistopalvelu.kanta.fi/codeserver/pages/classification-view-page.xhtml?classificationKey=220&versionKey=295 |
| KANTA_DELIVERY | Kanta medication delivery | Nicotine patches and gum/smoking cessation meds: N07BA01, N07BA03, N07BA04 |
| KANTA_LAB | Kanta lab measurements | Lab measurement u-cot > 200 ng/ml classified as smoker following HUS reference values |
| KANTA_PRESCRIPTION | Kanta medicine prescriptions | Nicotine patches and gum/smoking cessation meds: N07BA01, N07BA03, N07BA04 |
| MINIMUM_EXTENDED | Baseline data at biobank intake | Biobank specific |
| SERVICE_SECTOR_DETAILED_LONGITUDINAL_ICD | Hilmo inpatient and outpatient and AvoHilmo cause of visit | ICD-10: F17, F17.1, F17.2, F17.20, F17.29, F17.3, F17.8, F17.9, Z72.0, Z71.6, T65.2, K13.21, K03.61; ICD 8/9: 98990, 3051A |
| SERVICE_SECTOR_DETAILED_LONGITUDINAL_ICPC | AvoHilmo cause of visit | ICPC2 code P17 (tobacco abuse) |
| SERVICE_SECTOR_DETAILED_LONGITUDINAL_MOP | AvoHilmo dental health care measures | Smoking status yes/no |
| SERVICE_SECTOR_DETAILED_LONGITUDINAL_NOM | Surgical procedures from Hilmo inpatient and outpatient registers | Nomesco code IHA22 (smoking intervention) |
| SERVICE_SECTOR_DETAILED_LONGITUDINAL_PURCH | Kela drug purchase register | Nicotine patches and gum/smoking cessation meds: N07BA01, N07BA03, N07BA04 |
| SPIROMETRY | Spirometry data from biobanks. Limited to biobanks that recorded smoking status. | Biobank specific |
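Two of the code-based criteria listed for the data sources, the medication codes and the cotinine threshold, can be illustrated as simple predicates. This is a sketch only: exact matching on the full code and the function names are assumptions, and the harmonization pipeline may match codes differently.

```python
# Medication codes listed for the Kanta and Kela medication sources.
CESSATION_CODES = {"N07BA01", "N07BA03", "N07BA04"}

# Urine cotinine (u-cot) threshold in ng/ml listed for KANTA_LAB.
COTININE_THRESHOLD_NG_ML = 200.0


def is_cessation_medication(code):
    """True if the code is one of the listed nicotine/cessation medication codes."""
    # Assumption: codes are compared exactly after trimming and upper-casing.
    return code.strip().upper() in CESSATION_CODES


def cotinine_indicates_smoking(u_cot_ng_ml):
    """True if the u-cot value is strictly above the 200 ng/ml threshold."""
    return u_cot_ng_ml > COTININE_THRESHOLD_NG_ML
```

The comparison is strict, following the "u-cot > 200 ng/ml" wording, so a value of exactly 200 does not count as a smoker.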

Smoking variable values

The available values of the SMOKE variable depend on the source register, because the registers record smoking status at different levels of detail. The smoking status has been harmonized to the following possible values:

| SMOKE value | Description |
| --- | --- |
| CURRENT | Current smoker |
| EVER | Has smoked at some time (not known if current or former) |
| FORMER | Former smoker |
| NEVER | Never smoked |
| NO | Not a smoker currently |
| PASSIVE | Passive smoker |

Data notes

Because the smoking status is combined from many registers, the measurements of an individual may be inconsistent with each other. 5521 individuals transition from ever-smoker to never-smoker, which should not be possible. 2108 individuals have measurements on the same day with different smoking statuses, either from the same register or from different registers.
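Users who want to flag these inconsistencies in their own analyses could start from a sketch like the one below. It rests on several assumptions: APPROX_EVENT_DAY is an ISO date (YYYY-MM-DD), an ever-to-never transition means a NEVER measurement dated after a CURRENT, EVER, or FORMER measurement, and rows are dicts keyed by column name. The exact definitions behind the counts above may differ, and `find_inconsistencies` is a hypothetical name.

```python
from collections import defaultdict
from datetime import date

SMOKING_VALUES = {"CURRENT", "EVER", "FORMER"}


def find_inconsistencies(rows):
    """Return (ever_to_never_ids, same_day_conflict_ids) as two sets."""
    by_person = defaultdict(list)
    for row in rows:
        # Skip the empty rows of persons without smoking information.
        if row.get("SMOKE") and row.get("APPROX_EVENT_DAY"):
            day = date.fromisoformat(row["APPROX_EVENT_DAY"])
            by_person[row["FINNGENID"]].append((day, row["SMOKE"]))

    ever_to_never = set()
    same_day_conflict = set()
    for person, events in by_person.items():
        # Same-day conflict: more than one distinct status on a single day.
        statuses_by_day = defaultdict(set)
        for day, status in events:
            statuses_by_day[day].add(status)
        if any(len(statuses) > 1 for statuses in statuses_by_day.values()):
            same_day_conflict.add(person)

        # Ever-to-never: a NEVER dated after the earliest smoking measurement.
        smoking_days = [day for day, status in events if status in SMOKING_VALUES]
        never_days = [day for day, status in events if status == "NEVER"]
        if smoking_days and never_days and min(smoking_days) < max(never_days):
            ever_to_never.add(person)
    return ever_to_never, same_day_conflict
```

A NO measurement after CURRENT is deliberately not flagged, since NO only means the person is not a smoker at that time.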

Many of the registers only record current smokers, with non-smokers left unrecorded, biasing the data towards smokers.

Duplicate measurements have been removed.
