# Smoking Data

Smoking data harmonized from FinnGen registers (March 2026).

The file contains smoking measurements from FinnGen registers in longitudinal format, with one row per measurement event per person. It also includes an aggregated person-level ever-smoker/never-smoker variable. Persons with no smoking information in any register have a single row with FINNGENID only and empty values in all other columns.

### File structure

#### Data

| File                                         | Description                                    |
| -------------------------------------------- | ---------------------------------------------- |
| smoking\_harmonized\_longitudinal\_v2.tsv.gz | Smoking data harmonized from FinnGen registers |

Path of the data: /finngen/library-red/finngen\_R14/harmonized\_data/smoking\_data/
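
As a minimal sketch of reading the file with the Python standard library: the example below assumes the file is tab-separated with a header row that uses the field names documented on this page, and that missing values are stored as empty strings. Neither is confirmed here. The helper names `parse_rows` and `load_smoking_file` and the example IDs are hypothetical.

```python
import csv
import gzip
import io

# Directory and file name as given on this page.
DATA_DIR = "/finngen/library-red/finngen_R14/harmonized_data/smoking_data/"
FILE_NAME = "smoking_harmonized_longitudinal_v2.tsv.gz"


def parse_rows(handle):
    """Return a list with one dict per row of an open tab-separated text handle."""
    return list(csv.DictReader(handle, delimiter="\t"))


def load_smoking_file(path=DATA_DIR + FILE_NAME):
    """Read the gzipped file in text mode (loads every row into memory)."""
    with gzip.open(path, "rt", newline="") as handle:
        return parse_rows(handle)


# Tiny in-memory stand-in for the real file; IDs and values are made up.
sample = io.StringIO(
    "FINNGENID\tAPPROX_EVENT_DAY\tSMOKE\tSMOKE_AGE\tSOURCE1\tSOURCE2\tEVER_SMOKER\n"
    "FG_EXAMPLE_1\t2015-03-02\tCURRENT\t41.2\tKANTA_LAB\t\tYES\n"
    "FG_EXAMPLE_2\t\t\t\t\t\t\n"
)
rows = parse_rows(sample)

# Persons without smoking information have only FINNGENID filled in.
no_information = [r["FINNGENID"] for r in rows if not r["SMOKE"]]
print(no_information)
```

Inside the sandbox, `load_smoking_file()` would be called instead of parsing the in-memory sample. If the real file encodes missing values differently (for example as `NA`), the emptiness check needs adjusting.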

### Misc

#### Data description

* Number of persons: 519972
* Number of persons with at least one smoking measurement: 402348
* Number of persons without any smoking information: 117624
* Number of measurement events: 2875465
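
The two person counts add up to the total (402348 + 117624 = 519972). The sketch below shows one way these figures could be recomputed from rows that have already been read into dictionaries. It assumes a row counts as a measurement event when its SMOKE value is non-empty. The function name `summarize` and the example rows are hypothetical.

```python
def summarize(rows):
    """Count persons and measurement events in rows keyed by the documented field names."""
    all_ids = {r["FINNGENID"] for r in rows}
    # Assumption: a row is a measurement event when SMOKE is non-empty.
    measured = [r for r in rows if r["SMOKE"]]
    measured_ids = {r["FINNGENID"] for r in measured}
    return {
        "persons": len(all_ids),
        "persons_with_measurement": len(measured_ids),
        "persons_without_information": len(all_ids - measured_ids),
        "measurement_events": len(measured),
    }


# Made-up rows: one person with two events, one person without any information.
example_rows = [
    {"FINNGENID": "FG_EXAMPLE_1", "SMOKE": "CURRENT"},
    {"FINNGENID": "FG_EXAMPLE_1", "SMOKE": "FORMER"},
    {"FINNGENID": "FG_EXAMPLE_2", "SMOKE": ""},
]
summary = summarize(example_rows)
print(summary)
```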

#### Data fields

| Field              | Description                                                                                                                                                                                        |
| ------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| FINNGENID          | FINNGENID                                                                                                                                                                                          |
| APPROX\_EVENT\_DAY | Date of the smoking measurement                                                                                                                                                                    |
| SMOKE              | Smoking status at the measurement                                                                                                                                                                  |
| SMOKE\_AGE         | Age in years at the measurement                                                                                                                                                                    |
| SOURCE1            | The source of the measurement                                                                                                                                                                      |
| SOURCE2            | Further detail within SOURCE1, e.g. specific biobank or data source                                                                                                                                |
| EVER\_SMOKER       | Person-level aggregate: YES if any measurement indicates smoking (CURRENT, EVER, FORMER), NO if all measurements indicate non-smoking (NEVER, NO, PASSIVE). Same value across all rows per person. |
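
The EVER\_SMOKER rule can be sketched as a small function over one person's SMOKE values. This is an illustration of the rule as described in the table, not the pipeline that produced the file. Returning an empty string for persons without any smoking information is an assumption based on the note that such persons have empty values in all columns except FINNGENID. The function name `ever_smoker` is hypothetical.

```python
SMOKING = {"CURRENT", "EVER", "FORMER"}
NON_SMOKING = {"NEVER", "NO", "PASSIVE"}


def ever_smoker(statuses):
    """Aggregate one person's SMOKE values into YES, NO, or "" (no information)."""
    observed = {s for s in statuses if s}
    if observed & SMOKING:
        # Any single smoking-type measurement is enough for YES.
        return "YES"
    if observed and observed <= NON_SMOKING:
        # Every measurement indicates non-smoking.
        return "NO"
    # Assumption: no usable measurement means the value stays empty.
    return ""


print(ever_smoker(["NEVER", "NO", "FORMER"]))
print(ever_smoker(["NEVER", "PASSIVE"]))
```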

#### Data sources

| SOURCE1 value                                  | Register source                                                                  | Code and smoking classification                                                                                                     |
| ---------------------------------------------- | -------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------- |
| DLCO                                           | DLCO data from biobanks                                                          | Biobank specific                                                                                                                    |
| EXTENDED\_HILMO                                | Hilmo and AvoHilmo structural smoking status                                     | See the [Kanta code server classification](https://koodistopalvelu.kanta.fi/codeserver/pages/classification-view-page.xhtml?classificationKey=220\&versionKey=295). |
| KANTA\_DELIVERY                                | Kanta medication delivery                                                        | Nicotine patches and gum/smoking cessation meds: N07BA01, N07BA03, N07BA04                                                          |
| KANTA\_LAB                                     | Kanta lab measurements                                                           | Lab measurement u-cot > 200 ng/ml classified as smoker following HUS reference values                                               |
| KANTA\_PRESCRIPTION                            | Kanta medicine prescriptions                                                     | Nicotine patches and gum/smoking cessation meds: N07BA01, N07BA03, N07BA04                                                          |
| MINIMUM\_EXTENDED                              | Baseline data at biobank intake                                                  | Biobank specific                                                                                                                    |
| SERVICE\_SECTOR\_DETAILED\_LONGITUDINAL\_ICD   | Hilmo inpatient and outpatient and AvoHilmo cause of visit                       | ICD-10: F17, F17.1, F17.2, F17.20, F17.29, F17.3, F17.8, F17.9, Z72.0, Z71.6, T65.2, K13.21, K03.61; ICD 8/9: 98990, 3051A          |
| SERVICE\_SECTOR\_DETAILED\_LONGITUDINAL\_ICPC  | AvoHilmo cause of visit                                                          | ICPC2 code P17 (tobacco abuse)                                                                                                      |
| SERVICE\_SECTOR\_DETAILED\_LONGITUDINAL\_MOP   | AvoHilmo dental health care measures                                             | Smoking status yes/no                                                                                                               |
| SERVICE\_SECTOR\_DETAILED\_LONGITUDINAL\_NOM   | Surgical procedures from Hilmo inpatient and outpatient registers                | Nomesco code IHA22 (smoking intervention)                                                                                           |
| SERVICE\_SECTOR\_DETAILED\_LONGITUDINAL\_PURCH | Kela drug purchase register                                                      | Nicotine patches and gum/smoking cessation meds: N07BA01, N07BA03, N07BA04                                                          |
| SPIROMETRY                                     | Spirometry data from biobanks. Limited to biobanks that recorded smoking status. | Biobank specific                                                                                                                    |
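
Two of the classification rules in the table are simple enough to express directly. The sketch below illustrates the cotinine threshold used for KANTA\_LAB and the ATC code list shared by the medication sources. It does not reproduce the actual harmonization code, and the table does not state which SMOKE value these sources are mapped to, so the functions only return booleans. The function and constant names are hypothetical.

```python
# Threshold from the KANTA_LAB row: u-cot > 200 ng/ml is classified as smoker.
COTININE_THRESHOLD_NG_ML = 200

# ATC codes listed for the delivery, prescription and purchase sources.
CESSATION_ATC_CODES = {"N07BA01", "N07BA03", "N07BA04"}


def cotinine_indicates_smoking(u_cot_ng_ml):
    """True when a urine cotinine value is strictly above the threshold."""
    return u_cot_ng_ml > COTININE_THRESHOLD_NG_ML


def is_cessation_medication(atc_code):
    """True when the ATC code exactly matches one of the listed codes."""
    return atc_code.strip().upper() in CESSATION_ATC_CODES


print(cotinine_indicates_smoking(350.0))
print(cotinine_indicates_smoking(200.0))
print(is_cessation_medication("N07BA03"))
```

Exact matching on the ATC code is a design choice here; the listed codes are all full seven-character codes, so no prefix matching is attempted.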

#### Smoking variable values

The values available for the SMOKE variable depend on the source register, because the registers record smoking status at different levels of detail. The smoking status has been harmonized to the following possible values:

| SMOKE value | Description                                              |
| ----------- | -------------------------------------------------------- |
| CURRENT     | Current smoker                                           |
| EVER        | Has smoked at some time (not known if current or former) |
| FORMER      | Former smoker                                            |
| NEVER       | Never smoked                                             |
| NO          | Not a smoker currently                                   |
| PASSIVE     | Passive smoker                                           |

#### Data notes

Smoking measurements recorded at age 12 or younger were removed, as they're unlikely to reflect established smoking behavior.

Because smoking status is combined from many registers, the measurements of a single individual can be inconsistent with each other. 5500 individuals transition from ever-smoker to never-smoker, which should not be possible. 2107 individuals have smoking measurements on the same day with different smoking statuses, either from the same register or from different registers.
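
Both kinds of inconsistency can be flagged before analysis. The sketch below is one possible check, under stated assumptions: APPROX\_EVENT\_DAY is an ISO `YYYY-MM-DD` string (so string comparison orders dates correctly), and an ever-to-never transition means a NEVER measurement dated strictly after a CURRENT, EVER or FORMER measurement. The page does not give the exact definition used to arrive at the counts above, so the results may differ. Function names and example rows are hypothetical.

```python
from collections import defaultdict

SMOKING = {"CURRENT", "EVER", "FORMER"}


def same_day_conflicts(rows):
    """IDs with more than one distinct SMOKE value on the same day."""
    statuses = defaultdict(set)
    for r in rows:
        if r["SMOKE"]:
            statuses[(r["FINNGENID"], r["APPROX_EVENT_DAY"])].add(r["SMOKE"])
    return {pid for (pid, _day), seen in statuses.items() if len(seen) > 1}


def ever_to_never(rows):
    """IDs with a NEVER measurement dated after a smoking-type measurement."""
    first_smoking = {}
    for r in rows:
        if r["SMOKE"] in SMOKING:
            pid, day = r["FINNGENID"], r["APPROX_EVENT_DAY"]
            if pid not in first_smoking or day < first_smoking[pid]:
                first_smoking[pid] = day
    return {
        r["FINNGENID"]
        for r in rows
        if r["SMOKE"] == "NEVER"
        and r["FINNGENID"] in first_smoking
        and r["APPROX_EVENT_DAY"] > first_smoking[r["FINNGENID"]]
    }


# Made-up rows for illustration only.
example_rows = [
    {"FINNGENID": "A", "APPROX_EVENT_DAY": "2010-01-01", "SMOKE": "CURRENT"},
    {"FINNGENID": "A", "APPROX_EVENT_DAY": "2012-05-05", "SMOKE": "NEVER"},
    {"FINNGENID": "B", "APPROX_EVENT_DAY": "2011-02-02", "SMOKE": "NO"},
    {"FINNGENID": "B", "APPROX_EVENT_DAY": "2011-02-02", "SMOKE": "CURRENT"},
    {"FINNGENID": "C", "APPROX_EVENT_DAY": "2013-03-03", "SMOKE": "NEVER"},
]
conflicts = same_day_conflicts(example_rows)
transitions = ever_to_never(example_rows)
print(sorted(conflicts), sorted(transitions))
```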

Many of the registers only record current smokers, with non-smokers left unrecorded, biasing the data towards smokers.

Duplicate measurements have been removed.

