Interpretation of Endpoint Definition file
This document explains how to attribute an endpoint to events in the detailed longitudinal data using the rules from the endpoint definition file. The notes at the end of this document cover some specifics that are easy to miss at first.
Each endpoint is defined by a set of rules, given as one line in the endpoint definition file. The detailed longitudinal file contains health events (rows in that file) that are looked up against these rules. Each rule adds events to, or removes events from, the list of candidate events. Once all rules have been applied, the remaining candidate events are attributed to the endpoint.
Endpoint: occurrence of a health event defined by rules that match on the health register data.
Candidate events: list of events that could be attributed to the endpoint. This list grows and shrinks as the endpoint rules are applied.
Consider: add event to the list of candidate events.
Discard: remove event from the list of candidate events.
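As a minimal sketch of these terms, the candidate list can be modelled as a Python set of event identifiers. The names `consider` and `discard` and the event ids are hypothetical and only serve the illustration; they are not part of any FinnGen tooling.

```python
# Hypothetical event identifiers; real events come from the detailed longitudinal file.
candidates = set()

def consider(event_ids):
    """Add events to the list of candidate events."""
    candidates.update(event_ids)

def discard(event_ids):
    """Remove events from the list of candidate events."""
    candidates.difference_update(event_ids)

consider({"ev1", "ev2", "ev3"})   # an inclusion rule matched three events
discard({"ev2"})                  # an exclusion rule matched one of them
remaining = sorted(candidates)    # events that would be attributed to the endpoint
```

Using a set keeps the sketch idempotent: considering the same event twice does not duplicate it.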
The endpoint definition file version 1.3 has the following metadata columns:
NAME (naming): Reference name in the FinnGen endpoint data
LONGNAME (naming): Descriptive name
Latin (naming): Latin name
TAGS (categorisation): List of categories the endpoint belongs to
LEVEL (categorisation): Level in the ICD-10 hierarchy
OMIT (categorisation): Is a core GWAS? (NA: yes, 1 or 2: no)
PARENT (categorisation): Parent in the ICD-10 hierarchy
version (changelog): Data freeze in which the endpoint was introduced
Modification_date (changelog): Date of last modification
Modified_by (changelog): Author of last modification
Modification_reason (changelog): Purpose of modification
Special: Free text notes
The rules are defined by the following columns in the endpoint definition file. Further details on each column follow the table.
Column name | Purpose | Code type | Registry (SOURCE) | Extra rules
SEX | Filter at the FINNGENID level | – | – | –
INCLUDE | Use other endpoints to find events | – | – | –
PRE_CONDITIONS | Filter at the event level | – | – | –
CONDITIONS | Filter at the FINNGENID level | – | – | –
OUTPAT_ICD | Inclusion lookup | ICD-10 | PRIM_OUT |
OUTPAT_OPER | Inclusion lookup | NOMESCO | PRIM_OUT |
 | Diagnosis selection hint | – | INPAT, OUTPAT | –
HD_ICD_10_ATC | Inclusion lookup | ATC | INPAT, OUTPAT |
HD_ICD_10 | Inclusion lookup | ICD-10 | INPAT, OUTPAT |
HD_ICD_9 | Inclusion lookup | ICD-9 | INPAT, OUTPAT |
HD_ICD_8 | Inclusion lookup | ICD-8 | INPAT, OUTPAT |
HD_ICD_10_EXCL | Exclusion lookup | ICD-10 | INPAT, OUTPAT |
HD_ICD_9_EXCL | Exclusion lookup | ICD-9 | INPAT, OUTPAT |
HD_ICD_8_EXCL | Exclusion lookup | ICD-8 | INPAT, OUTPAT |
 | Diagnosis selection hint | – | DEATH | –
COD_ICD_10 | Inclusion lookup | ICD-10 | DEATH |
COD_ICD_9 | Inclusion lookup | ICD-9 | DEATH |
COD_ICD_8 | Inclusion lookup | ICD-8 | DEATH |
COD_ICD_10_EXCL | Exclusion lookup | ICD-10 | DEATH |
COD_ICD_9_EXCL | Exclusion lookup | ICD-9 | DEATH |
COD_ICD_8_EXCL | Exclusion lookup | ICD-8 | DEATH |
OPER_NOM | Inclusion lookup | NOMESCO | OPER_IN, OPER_OUT |
OPER_HL | Inclusion lookup | Finnish hospital league | OPER_IN, OPER_OUT |
OPER_HP1 | Inclusion lookup | Demanding heart patient, old codes | OPER_IN, OPER_OUT |
 | Inclusion lookup | Demanding heart patient, new codes | OPER_IN, OPER_OUT |
KELA_REIMB | Inclusion lookup | KELA reimbursement code | REIMB |
KELA_REIMB_ICD | Inclusion lookup | ICD-10, ICD-9 | REIMB |
 | Additional requirement hint | – | PURCH | –
KELA_ATC | Inclusion lookup | ATC | PURCH |
 | Additional requirement hint | – | PURCH | –
 | Inclusion lookup | VNRO | PURCH | –
CANC_TOPO | Inclusion lookup | ICD-O-3 topography | CANC |
CANC_TOPO_EXCL | Exclusion lookup | ICD-O-3 topography | CANC |
CANC_MORPH | Inclusion lookup | ICD-O-3 morphology | CANC |
CANC_MORPH_EXCL | Exclusion lookup | ICD-O-3 morphology | CANC |
CANC_BEHAV | Inclusion lookup | ICD-O-3 behavior | CANC |
OUTPAT_ICD
Consider events where:
SOURCE: is PRIM_OUT
and CATEGORY: contains ICD
and the OUTPAT_ICD regex matches CODE1
OUTPAT_OPER
Consider events where:
SOURCE: is PRIM_OUT
and CATEGORY: starts with OP
and the OUTPAT_OPER regex matches CODE1
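The two primary care lookups can be sketched as below. This is an illustration only: events are assumed to be plain dicts keyed by the column names of the detailed longitudinal file, and the function names are hypothetical. `re.match` anchors at the start of the code, which is in line with the prefix matching described later in this document.

```python
import re

def matches_outpat_icd(event, outpat_icd):
    """Check one event (a dict) against an OUTPAT_ICD rule value."""
    return (
        event["SOURCE"] == "PRIM_OUT"
        and "ICD" in event["CATEGORY"]
        and re.match(outpat_icd, event["CODE1"]) is not None
    )

def matches_outpat_oper(event, outpat_oper):
    """Check one event (a dict) against an OUTPAT_OPER rule value."""
    return (
        event["SOURCE"] == "PRIM_OUT"
        and event["CATEGORY"].startswith("OP")
        and re.match(outpat_oper, event["CODE1"]) is not None
    )
```

The CATEGORY values used to exercise these functions (for instance `ICD0` or `OP1`) would be made up for the example; only the "contains ICD" and "starts with OP" conditions come from the rules above.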
Diagnosis selection hint for INPAT and OUTPAT
Values:
YES: only look at events with CATEGORY: 0 for the rules of HD_ICD_10, HD_ICD_9, HD_ICD_8, HD_ICD_10_EXCL, HD_ICD_9_EXCL and HD_ICD_8_EXCL
NA: (nothing to filter)
This rule restricts the lookup to the main diagnosis of hospital discharge events (as opposed to side diagnoses, where CATEGORY is not 0).
HD_ICD_10_ATC
Consider events where:
SOURCE: is INPAT or OUTPAT
and the HD_ICD_10_ATC regex matches CODE3
This rule must be applied together with the HD_ICD_10 rule: an event has to match both rules at the same time.
For example, an endpoint definition with HD_ICD_10 = E610 and HD_ICD_10_ATC = ANY will match an event that has:
SOURCE: INPAT or OUTPAT
and ICDVER: 10
and HD_ICD_10 regex matches CODE1 or CODE2
and any code in CODE3 (there must be a code there; it cannot be empty)
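The combined lookup can be sketched as follows. This is a hedged illustration: events are assumed to be dicts, ICDVER is assumed to be stored as a string, missing codes are assumed to be `None` or empty, and the function names are hypothetical.

```python
import re

def rule_matches(rule, code):
    """ANY only requires a non-empty code; other values are used as a prefix regex."""
    if not code:
        return False
    if rule == "ANY":
        return True
    return re.match(rule, code) is not None

def matches_hd_icd_10_with_atc(event, hd_icd_10, hd_icd_10_atc):
    """True when the event satisfies HD_ICD_10 and HD_ICD_10_ATC at the same time."""
    return (
        event["SOURCE"] in ("INPAT", "OUTPAT")
        and event["ICDVER"] == "10"
        and (rule_matches(hd_icd_10, event["CODE1"])
             or rule_matches(hd_icd_10, event["CODE2"]))
        and rule_matches(hd_icd_10_atc, event["CODE3"])
    )
```

With HD_ICD_10 = E610 and HD_ICD_10_ATC = ANY, an INPAT event with E610 in CODE1 matches only if CODE3 holds some code.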
HD_ICD_10
Consider events where:
SOURCE: is INPAT or OUTPAT
and the HD_ICD_10 regex matches CODE1 or CODE2
and ICDVER: is 10
HD_ICD_9
Consider events where:
SOURCE: is INPAT or OUTPAT
and the HD_ICD_9 regex matches CODE1 or CODE2
and ICDVER: is 9
HD_ICD_8
Consider events where:
SOURCE: is INPAT or OUTPAT
and the HD_ICD_8 regex matches CODE1 or CODE2
and ICDVER: is 8
HD_ICD_10_EXCL
Discard events where:
SOURCE: is INPAT or OUTPAT
and the HD_ICD_10_EXCL regex matches CODE1 or CODE2
and ICDVER: is 10
HD_ICD_9_EXCL
Discard events where:
SOURCE: is INPAT or OUTPAT
and the HD_ICD_9_EXCL regex matches CODE1 or CODE2
and ICDVER: is 9
HD_ICD_8_EXCL
Discard events where:
SOURCE: is INPAT or OUTPAT
and the HD_ICD_8_EXCL regex matches CODE1 or CODE2
and ICDVER: is 8
Diagnosis selection hint for DEATH
Values:
YES: only look at events with CATEGORY: U or I for the rules of COD_ICD_10, COD_ICD_9, COD_ICD_8, COD_ICD_10_EXCL, COD_ICD_9_EXCL and COD_ICD_8_EXCL
NA: (nothing to filter)
This rule restricts the lookup to the main diagnosis of cause of death events (CATEGORY: U for underlying and I for immediate cause of death, as opposed to contributing causes of death, where CATEGORY starts with c).
COD_ICD_10
Consider events where:
SOURCE: is DEATH
and the COD_ICD_10 regex matches CODE1 or CODE2
and ICDVER: is 10
COD_ICD_9
Consider events where:
SOURCE: is DEATH
and the COD_ICD_9 regex matches CODE1 or CODE2
and ICDVER: is 9
COD_ICD_8
Consider events where:
SOURCE: is DEATH
and the COD_ICD_8 regex matches CODE1 or CODE2
and ICDVER: is 8
COD_ICD_10_EXCL
Discard events where:
SOURCE: is DEATH
and the COD_ICD_10_EXCL regex matches CODE1 or CODE2
and ICDVER: is 10
COD_ICD_9_EXCL
Discard events where:
SOURCE: is DEATH
and the COD_ICD_9_EXCL regex matches CODE1 or CODE2
and ICDVER: is 9
COD_ICD_8_EXCL
Discard events where:
SOURCE: is DEATH
and the COD_ICD_8_EXCL regex matches CODE1 or CODE2
and ICDVER: is 8
OPER_NOM
Consider events where:
SOURCE: is OPER_IN or OPER_OUT
and the OPER_NOM regex matches CODE1
and CATEGORY: contains NOM
OPER_HL
Consider events where:
SOURCE: is OPER_IN or OPER_OUT
and the OPER_HL regex matches CODE1
and CATEGORY: contains FHL
OPER_HP1
Consider events where:
SOURCE: is OPER_IN or OPER_OUT
and the OPER_HP1 regex matches CODE1
and CATEGORY: contains HPO
OPER_HP2
Consider events where:
SOURCE: is OPER_IN or OPER_OUT
and the OPER_HP2 regex matches CODE1
and CATEGORY: contains HPN
KELA_REIMB
Consider events where:
SOURCE: is REIMB
and the KELA_REIMB regex matches CODE1
KELA_REIMB_ICD
Consider events where:
SOURCE: is REIMB
and the KELA_REIMB_ICD regex matches CODE2
This rule must be applied together with the KELA_REIMB rule: an event has to match both rules at the same time.
Additional requirement hint for KELA_ATC
Values:
NA: 3 or more events matching the KELA_ATC rule are needed to attribute the endpoint
SINGLE_OK: 1 or more events matching the KELA_ATC rule are needed to attribute the endpoint
YES: the KELA_ATC rule is not sufficient by itself; another rule must also match to attribute the endpoint
This rule sets additional requirements on the KELA_ATC rule.
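The three hint values can be sketched as a small decision function. This is only an illustration under stated assumptions: NA is represented as `None`, the function name and parameters are hypothetical, and for YES the sketch assumes that at least one KELA_ATC event is still required in addition to a match from another rule.

```python
def kela_atc_counts(hint, n_kela_atc_events, other_rule_matched):
    """Decide whether KELA_ATC matches may contribute to attributing the endpoint.

    hint: None (NA), "SINGLE_OK" or "YES".
    n_kela_atc_events: number of events of the individual matching KELA_ATC.
    other_rule_matched: True if any rule other than KELA_ATC matched.
    """
    if hint == "YES":
        # Not sufficient by itself: another rule must match as well.
        # Requiring at least one purchase here is an assumption of this sketch.
        return n_kela_atc_events >= 1 and other_rule_matched
    if hint == "SINGLE_OK":
        return n_kela_atc_events >= 1
    # NA: three or more matching purchase events are needed.
    return n_kela_atc_events >= 3
```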
KELA_ATC
Consider events where:
SOURCE: is PURCH
and the KELA_ATC regex matches CODE1
The two remaining PURCH rules (the second additional requirement hint and the VNRO inclusion lookup) are not used.
CANC_TOPO
Consider events where:
SOURCE: is CANC
and the CANC_TOPO regex matches CODE1
CANC_TOPO_EXCL
Discard events where:
SOURCE: is CANC
and the CANC_TOPO_EXCL regex matches CODE1
CANC_MORPH
Consider events where:
SOURCE: is CANC
and the CANC_MORPH regex matches CODE2
CANC_MORPH_EXCL
Discard events where:
SOURCE: is CANC
and the CANC_MORPH_EXCL regex matches CODE2
CANC_BEHAV
Consider events where:
SOURCE: is CANC
and the CANC_BEHAV regex matches CODE3
INCLUDE
Value: other endpoint names, separated by |
Attribute the current endpoint to an individual if the individual has at least one of the endpoints listed in INCLUDE.
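A minimal sketch of the INCLUDE lookup, assuming the endpoints already attributed to an individual are available as a set of names. The function name and the endpoint names used to exercise it are hypothetical.

```python
def attributed_via_include(include_value, endpoints_of_individual):
    """True if the individual has at least one endpoint listed in INCLUDE."""
    if not include_value:  # NA or empty: nothing to include
        return False
    included = include_value.split("|")
    return any(name in endpoints_of_individual for name in included)
```

Because an included endpoint can itself have an INCLUDE value, the included endpoints need to be resolved before the endpoint that refers to them.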
PRE_CONDITIONS
Value:
condition on EVENT_AGE or EVENT_YEAR
EMERG: (unused, nothing to do)
NA: (nothing to do)
Discard events not matching PRE_CONDITIONS from the list of candidate events.
This rule usually applies a filter on age or year at the event. It filters out some events from the existing list of candidate events.
CONDITIONS
An individual must fit the CONDITIONS rule to be attributed the endpoint.
SEX
Values:
1: only keep males
2: only keep females
NA: (nothing to filter, the endpoint is not sex-specific)
This filter should be applied as the last filter.
When a rule is written as ANY, the event must have a code for that rule, but the actual code does not matter.
This is useful when matching an event against multiple rules, for example:
HD_ICD_10: K250
and HD_ICD_10_ATC: ANY
This example requires that an event has any ATC code and at the same time has the ICD-10 code K250. The endpoint will match drug-induced events, since an ATC code is required, but the actual ATC code doesn't matter.
The rule must match starting from the beginning of the code. In regex terms, the rule value has to be prepended with a ^, and this modified value is then used as a regex.
For example, a match-prefix rule with a value of I21 matches I2100 but doesn't match AEI21.
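A small sketch of prefix matching with Python's `re` module. The helper name `match_prefix` is hypothetical, and wrapping the rule value in a non-capturing group is an assumption made here so that a value containing alternatives (for example `I21|I22`) stays anchored as a whole.

```python
import re

def match_prefix(rule_value, code):
    """Apply a rule value as a regex anchored at the start of the code."""
    pattern = "^(?:" + rule_value + ")"
    return re.search(pattern, code) is not None

starts_with_rule = match_prefix("I21", "I2100")   # code begins with I21
rule_in_middle = match_prefix("I21", "AEI21")     # I21 appears later in the code
```

Here `starts_with_rule` is `True` and `rule_in_middle` is `False`.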
An ampersand & between two codes indicates a cause-symptom pair (specific to Finnish ICD-10). In that case, both the cause code and the symptom code must be found in the same event.
For example, HD_ICD_10 = M07&L405 will match an event that has both M07 (in CODE1 or CODE2) and L405 (in CODE1 or CODE2).
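One way to check a cause-symptom pair is sketched below. It assumes that each half of the pair is applied as a prefix regex, and that CODE1 and CODE2 of the event are passed in as strings or `None`; the function name is hypothetical.

```python
import re

def matches_pair(rule_value, code1, code2):
    """True if both halves of a 'cause&symptom' rule are found in CODE1 or CODE2."""
    cause, symptom = rule_value.split("&")
    codes = [c for c in (code1, code2) if c]
    has_cause = any(re.match(cause, c) for c in codes)
    has_symptom = any(re.match(symptom, c) for c in codes)
    return has_cause and has_symptom
```

The order of the two codes in the event does not matter in this sketch, since each half is searched in both columns.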
A rule value starting with a percent sign % indicates a mode rule. The event will be considered only if the code is the most common amongst its sibling ICD codes for an individual.
For example, %J450 would match events of an individual only if J450 is the most common code among that individual's codes starting with J45.
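The mode rule can be sketched with a counter over the codes of one individual. Several points are assumptions of this sketch rather than documented behaviour: siblings are taken to be the codes sharing the rule code minus its last character, codes are compared exactly, and a tie for the most common code counts as a match. The function name is hypothetical.

```python
from collections import Counter

def mode_rule_matches(rule_value, individual_codes):
    """True if the rule's code is the most common among its sibling codes."""
    code = rule_value.lstrip("%")
    parent = code[:-1]  # e.g. J45 for J450
    siblings = Counter(c for c in individual_codes if c.startswith(parent))
    if siblings[code] == 0:
        return False  # the individual never has this code
    return siblings[code] == max(siblings.values())
```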
When an endpoint has multiple cancer rules (from CANC_TOPO, CANC_TOPO_EXCL, CANC_MORPH, CANC_MORPH_EXCL, CANC_BEHAV), it is not enough to match only one of them: all cancer rules that are defined must be satisfied by the event.
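A hedged sketch of the "all defined cancer rules must be satisfied" requirement. Events and rules are assumed to be dicts, undefined rules are assumed to be absent or `None`, and the behavior rule is assumed to be looked up in CODE3; the names `CANCER_CHECKS` and `cancer_rules_satisfied` are hypothetical.

```python
import re

# (rule column, event column, True for inclusion / False for exclusion)
CANCER_CHECKS = [
    ("CANC_TOPO", "CODE1", True),
    ("CANC_TOPO_EXCL", "CODE1", False),
    ("CANC_MORPH", "CODE2", True),
    ("CANC_MORPH_EXCL", "CODE2", False),
    ("CANC_BEHAV", "CODE3", True),  # assumption: behavior is looked up in CODE3
]

def cancer_rules_satisfied(event, rules):
    """True only if every cancer rule defined for the endpoint is satisfied."""
    if event["SOURCE"] != "CANC":
        return False
    defined = [check for check in CANCER_CHECKS if rules.get(check[0])]
    if not defined:
        return False  # no cancer rule is defined for this endpoint
    for name, column, is_inclusion in defined:
        hit = re.match(rules[name], event.get(column) or "") is not None
        if hit != is_inclusion:
            return False  # inclusion rule missed, or exclusion rule hit
    return True
```

An exclusion rule is treated as satisfied when its regex does not match, which mirrors the "discard" behaviour of the exclusion lookups.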
The mark $!$ is used to state that someone has checked and there is no suitable code for this endpoint in a given registry.
For example, if an endpoint has HD_ICD_9 with a value of $!$, it means someone has gone through the whole Finnish ICD-9 and reported that it contains no code that can be used for this endpoint.
A single event can span multiple rows in the detailed longitudinal data files: events are unique by (FINNGENID, SOURCE, INDEX), but not by row. Rows with the same values for FINNGENID, SOURCE and INDEX must be treated as one single event when performing look-ups.
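Grouping rows into events can be sketched with a dictionary keyed by the three identifying columns. Rows are assumed to be dicts read from the detailed longitudinal file, and the function name is hypothetical.

```python
from collections import defaultdict

def group_rows_into_events(rows):
    """Group rows that share (FINNGENID, SOURCE, INDEX) into one event each."""
    events = defaultdict(list)
    for row in rows:
        key = (row["FINNGENID"], row["SOURCE"], row["INDEX"])
        events[key].append(row)
    return dict(events)
```

Look-ups can then run once per key, checking the rules against all rows of that event rather than against each row in isolation.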
The ICD-10, ICD-9 and ICD-8 used by FinnGen are specific Finnish versions which differ slightly from the international ones. This means for example that the ICD-10 found in FinnGen data are a bit different from the WHO ICD-10 or the US ICD-10-CM.
In the FinnGen data, the ICD-O-3 is used for cancer codes.
The dot . and the comma , are not present in the codes in the FinnGen files, e.g. J45.1 would be J451 in the endpoint definition file and the detailed longitudinal file.
For rules that are regexes: a dot . means "any character" and not an actual dot.
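Both points can be illustrated in a few lines. The helper `normalize_code` is hypothetical and only shows how a dotted code from another source could be brought into the dot-free form used in the FinnGen files.

```python
import re

def normalize_code(code):
    """Drop dots and commas so the code looks like the ones in the FinnGen files."""
    return code.replace(".", "").replace(",", "")

normalized = normalize_code("J45.1")                    # dotted code without its dot
dot_is_wildcard = re.match("J4.1", "J451") is not None  # '.' stands for any character
```

In the regex `J4.1`, the dot consumes the `5` of `J451`, so the rule matches even though the code itself contains no dot.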
Endpoints with specific control rules are not documented here (yet!)
Name in FinnGen data (SOURCE) | Registry description
CANC | Cancer
DEATH | Cause of death
INPAT | HILMO inpatient
OPER_IN | HILMO inpatient (operations)
OUTPAT | HILMO specialist outpatient
OPER_OUT | HILMO specialist outpatient (operations)
PRIM_OUT | AvoHILMO: primary care outpatient
PURCH | Kela drug purchase
REIMB | Kela drug reimbursement
Kela: the Social Insurance Institution of Finland
HILMO: Finnish care registers for health care
Documentation from the FinnGen Handbook