Interpretation of Endpoint Definition file
This document explains how to attribute an endpoint to events in the detailed longitudinal data using the rules from the endpoint definition file (latest version at FinnGen: Clinical Endpoints). Have a look at the list of gotchas at the end of this document for some specificities that are easy to miss at first.
Each endpoint is defined by a set of rules, given as one line in the endpoint definition file. The detailed longitudinal file contains health events (rows in that file) that are looked up against these rules. Each rule adds events to, or removes events from, the list of candidate events. Once all rules have been applied, the remaining candidate events are attributed to the endpoint.
When explaining the rules, the following terms are used:
Endpoint: occurrence of a health event defined by rules that match on the health register data.
Candidate events: list of events that could be attributed to the endpoint. This list grows and shrinks as the endpoint rules are applied.
Consider: add event to the list of candidate events.
Discard: remove event from the list of candidate events.
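The consider/discard vocabulary above can be sketched as follows. This is only an illustration, not FinnGen's implementation: events are represented as plain dictionaries, the function name apply_rules is hypothetical, and the codes are made-up examples.

```python
def apply_rules(events, consider_rules, discard_rules):
    """Return the events left in the candidate list once all rules are applied.

    consider_rules and discard_rules are lists of predicates (event -> bool).
    """
    # "Consider": an event becomes a candidate if any inclusion rule matches it.
    candidates = [
        event for event in events
        if any(rule(event) for rule in consider_rules)
    ]
    # "Discard": a candidate is removed if any exclusion rule matches it.
    return [
        event for event in candidates
        if not any(rule(event) for rule in discard_rules)
    ]


# Illustrative events and rules (not taken from a real endpoint definition).
events = [
    {"SOURCE": "INPAT", "CODE1": "I210"},
    {"SOURCE": "INPAT", "CODE1": "I219"},
    {"SOURCE": "PURCH", "CODE1": "C07AB02"},
]
consider = [lambda e: e["SOURCE"] == "INPAT" and e["CODE1"].startswith("I21")]
discard = [lambda e: e["CODE1"] == "I219"]

attributed = apply_rules(events, consider, discard)
print(attributed)
```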
Overview of the Endpoint Definition File
The endpoint definition file version 1.3 has the following metadata columns:
NAME (naming): Reference name in the FinnGen endpoint data
LONGNAME (naming): Descriptive name
Latin (naming): Latin name
TAGS (categorisation): List of categories the endpoint belongs to
LEVEL (categorisation): Level in the ICD-10 hierarchy
OMIT (categorisation): Is a core GWAS? (NA: yes, 1 or 2: no)
PARENT (categorisation): Parent in the ICD-10 hierarchy
version (changelog): introduced in data freeze
Modification_date (changelog): date of last modification
Modified_by (changelog): author of last modification
Modification_reason (changelog): purpose of modification
Special: free text notes
The rules are defined by the following columns in the endpoint definition file. Further details on each column and on the extra rules follow the table.
SEX: Filter at the FINNGENID level
INCLUDE: Use other endpoints to find events
PRE_CONDITIONS: Filter at the event level
CONDITIONS: Filter at the FINNGENID level
Inclusion lookup
ICD-10
INPAT, OUTPAT
Event Rules
OUTPAT_ICD
Consider events where:
SOURCE: is PRIM_OUT and
CATEGORY: contains ICD and
CODE1: matches the OUTPAT_ICD regex
OUTPAT_OPER
Consider events where:
SOURCE: is PRIM_OUT and
CATEGORY: starts with OP and the
OUTPAT_OPER regex matches CODE1
HD_MAINONLY
Values
YES: only look at events with CATEGORY: 0 for the rules of HD_ICD_10, HD_ICD_9, HD_ICD_8, HD_ICD_10_EXCL, HD_ICD_9_EXCL and HD_ICD_8_EXCL
NA: (nothing to filter)
This rule restricts the lookup to the main diagnosis of hospital discharge events (as opposed to side diagnoses, where CATEGORY is not 0).
HD_ICD_10_ATC
Consider events where:
SOURCE: is INPAT or OUTPAT and the
HD_ICD_10_ATC regex matches CODE3
This rule must be applied by looking for events that match both this rule and the HD_ICD_10 rule at the same time.
For example, an endpoint definition with HD_ICD_10 = E610 and HD_ICD_10_ATC = ANY will match an event that has:
SOURCE: INPAT or OUTPAT and
ICDVER: 10 and
HD_ICD_10 regex matches CODE1 or CODE2 and any code in
CODE3 (but there must be a code there, it cannot be empty)
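A minimal sketch of the combined check described above, under several assumptions: an event is a dictionary holding SOURCE, ICDVER, CODE1, CODE2 and CODE3; ICDVER is stored as a string; and regexes are matched from the start of the code. The function name and the ATC code in the usage example are illustrative only.

```python
import re


def matches_hd_icd_10_with_atc(event, hd_icd_10, hd_icd_10_atc):
    """True if a single event satisfies HD_ICD_10 and HD_ICD_10_ATC together."""
    if event["SOURCE"] not in ("INPAT", "OUTPAT") or event["ICDVER"] != "10":
        return False
    # HD_ICD_10 is looked up in CODE1 or CODE2.
    icd_ok = any(
        code and re.match(hd_icd_10, code)
        for code in (event["CODE1"], event["CODE2"])
    )
    # HD_ICD_10_ATC is looked up in CODE3, which must not be empty.
    atc = event["CODE3"]
    if not atc:
        return False
    # ANY accepts whatever code is present; otherwise the regex must match.
    atc_ok = hd_icd_10_atc == "ANY" or re.match(hd_icd_10_atc, atc) is not None
    return icd_ok and atc_ok


with_atc = {"SOURCE": "INPAT", "ICDVER": "10",
            "CODE1": "E610", "CODE2": None, "CODE3": "A12CB01"}
without_atc = dict(with_atc, CODE3=None)

print(matches_hd_icd_10_with_atc(with_atc, "E610", "ANY"))
print(matches_hd_icd_10_with_atc(without_atc, "E610", "ANY"))
```

The first call prints True and the second prints False, because an empty CODE3 never satisfies ANY.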
HD_ICD_10
Consider events where:
SOURCE: is INPAT or OUTPAT and the
HD_ICD_10 regex matches CODE1 or CODE2 and
ICDVER: is 10
HD_ICD_9
Consider events where:
SOURCE: is INPAT or OUTPAT and the
HD_ICD_9 regex matches CODE1 or CODE2 and
ICDVER: is 9
HD_ICD_8
Consider events where:
SOURCE: is INPAT or OUTPAT and the
HD_ICD_8 regex matches CODE1 or CODE2 and
ICDVER: is 8
HD_ICD_10_EXCL
Discard events where:
SOURCE: is INPAT or OUTPAT and the
HD_ICD_10_EXCL regex matches CODE1 or CODE2 and
ICDVER: is 10
HD_ICD_9_EXCL
Discard events where:
SOURCE: is INPAT or OUTPAT and the
HD_ICD_9_EXCL regex matches CODE1 or CODE2 and
ICDVER: is 9
HD_ICD_8_EXCL
Discard events where:
SOURCE: is INPAT or OUTPAT and the
HD_ICD_8_EXCL regex matches CODE1 or CODE2 and
ICDVER: is 8
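The hospital discharge rules for one ICD version could be combined roughly as below. This is a sketch under assumptions, not the reference implementation: events are dictionaries, ICDVER and CATEGORY are stored as strings, regexes are matched from the start of the code, and the function name hd_candidates and the sample codes are hypothetical.

```python
import re


def hd_candidates(events, icdver, include_regex, exclude_regex=None, main_only=False):
    """Apply HD_ICD_<ver>, HD_ICD_<ver>_EXCL and HD_MAINONLY for one ICD version."""
    selected = []
    for ev in events:
        if ev["SOURCE"] not in ("INPAT", "OUTPAT") or ev["ICDVER"] != icdver:
            continue
        # HD_MAINONLY = YES: keep only main diagnoses (CATEGORY 0).
        if main_only and ev["CATEGORY"] != "0":
            continue
        codes = [c for c in (ev["CODE1"], ev["CODE2"]) if c]
        # Consider: the inclusion regex must match CODE1 or CODE2.
        if not any(re.match(include_regex, c) for c in codes):
            continue
        # Discard: the exclusion regex must not match CODE1 or CODE2.
        if exclude_regex and any(re.match(exclude_regex, c) for c in codes):
            continue
        selected.append(ev)
    return selected


events = [
    {"SOURCE": "INPAT", "ICDVER": "10", "CATEGORY": "0", "CODE1": "I210", "CODE2": None},
    {"SOURCE": "INPAT", "ICDVER": "10", "CATEGORY": "1", "CODE1": "I211", "CODE2": None},
    {"SOURCE": "OUTPAT", "ICDVER": "10", "CATEGORY": "0", "CODE1": "I219", "CODE2": None},
    {"SOURCE": "INPAT", "ICDVER": "9", "CATEGORY": "0", "CODE1": "4100A", "CODE2": None},
]

main_hits = hd_candidates(events, "10", "I21", exclude_regex="I219", main_only=True)
all_hits = hd_candidates(events, "10", "I21", exclude_regex="I219")
print([ev["CODE1"] for ev in main_hits])
print([ev["CODE1"] for ev in all_hits])
```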
COD_MAINONLY
Values
YES: only look at events with CATEGORY: U or I for the rules of COD_ICD_10, COD_ICD_9, COD_ICD_8, COD_ICD_10_EXCL, COD_ICD_9_EXCL, and COD_ICD_8_EXCL
NA: (nothing to filter)
This rule restricts the lookup to the main diagnosis of cause of death events (CATEGORY: U for underlying and I for immediate cause of death), as opposed to contributing causes of death, where CATEGORY starts with c.
COD_ICD_10
Consider events where:
SOURCE: is DEATH and the
COD_ICD_10 regex matches CODE1 or CODE2 and
ICDVER: is 10
COD_ICD_9
Consider events where:
SOURCE: is DEATH and the
COD_ICD_9 regex matches CODE1 or CODE2 and
ICDVER: is 9
COD_ICD_8
Consider events where:
SOURCE: is DEATH and the
COD_ICD_8 regex matches CODE1 or CODE2 and
ICDVER: is 8
COD_ICD_10_EXCL
Discard events where:
SOURCE: is DEATH and the
COD_ICD_10_EXCL regex matches CODE1 or CODE2 and
ICDVER: is 10
COD_ICD_9_EXCL
Discard events where:
SOURCE: is DEATH and the
COD_ICD_9_EXCL regex matches CODE1 or CODE2 and
ICDVER: is 9
COD_ICD_8_EXCL
Discard events where:
SOURCE: is DEATH and the
COD_ICD_8_EXCL regex matches CODE1 or CODE2 and
ICDVER: is 8
OPER_NOM
Consider events where:
SOURCE: is OPER_IN or OPER_OUT and the
OPER_NOM regex matches CODE1 and
CATEGORY: contains NOM
OPER_HL
Consider events where:
SOURCE: is OPER_IN or OPER_OUT and the
OPER_HL regex matches CODE1 and
CATEGORY: contains FHL
OPER_HP1
Consider events where:
SOURCE: is OPER_IN or OPER_OUT and the
OPER_HP1 regex matches CODE1 and
CATEGORY: contains HPO
OPER_HP2
Consider events where:
SOURCE: is OPER_IN or OPER_OUT and the
OPER_HP2 regex matches CODE1 and
CATEGORY: contains HPN
KELA_REIMB
Consider events where:
SOURCE: is REIMB and
KELA_REIMB regex matches CODE1
KELA_REIMB_ICD
Consider events where:
SOURCE: is REIMB and
KELA_REIMB_ICD regex matches CODE2
This rule must be applied by looking for events that match both this rule and the KELA_REIMB rule at the same time.
KELA_ATC_NEEDOTHER
Values
NA: 3 events or more of the KELA_ATC rule are needed to attribute the endpoint
SINGLE_OK: 1 event or more of the KELA_ATC rule is needed to attribute the endpoint
YES: the KELA_ATC rule is not sufficient by itself, another rule must be matching to attribute the endpoint
This rule sets additional requirements on the KELA_ATC rule.
KELA_ATC
Consider events where:
SOURCE: is PURCH and
KELA_ATC regex matches CODE1
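The interaction between KELA_ATC and KELA_ATC_NEEDOTHER might be sketched as follows. Several points are assumptions rather than facts from the definition file: NA is represented as None, regexes are matched from the start of the code, a single matching purchase is taken to be enough under YES once another rule has matched, and the function name and ATC codes are illustrative.

```python
import re


def kela_atc_supports_endpoint(events, kela_atc, need_other, other_rule_matched=False):
    """Decide whether KELA_ATC purchases support attributing the endpoint."""
    # Count drug purchase events whose CODE1 matches the KELA_ATC regex.
    n_matches = sum(
        1 for ev in events
        if ev["SOURCE"] == "PURCH" and re.match(kela_atc, ev["CODE1"])
    )
    if need_other == "SINGLE_OK":
        return n_matches >= 1
    if need_other == "YES":
        # Purchases alone are not sufficient: another rule must also match.
        return n_matches >= 1 and other_rule_matched
    # NA (None here): at least 3 matching purchases are required.
    return n_matches >= 3


purchases = [
    {"SOURCE": "PURCH", "CODE1": "A10BA02"},
    {"SOURCE": "PURCH", "CODE1": "A10BA02"},
]

print(kela_atc_supports_endpoint(purchases, "A10B", None))
print(kela_atc_supports_endpoint(purchases, "A10B", "SINGLE_OK"))
print(kela_atc_supports_endpoint(purchases, "A10B", "YES", other_rule_matched=True))
```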
KELA_VNRO
This rule is not used.
KELA_VNRO_NEEDOTHER
This rule is not used.
CANC_TOPO
Consider events where:
SOURCE: is CANC and the
CANC_TOPO regex matches CODE1
CANC_TOPO_EXCL
Discard events where:
SOURCE: is CANC and the
CANC_TOPO_EXCL regex matches CODE1
CANC_MORPH
Consider events where:
SOURCE: is CANC and the
CANC_MORPH regex matches CODE2
CANC_MORPH_EXCL
Discard events where:
SOURCE: is CANC and the
CANC_MORPH_EXCL regex matches CODE2
CANC_BEHAV
Consider events where:
SOURCE: is CANC and the
CANC_BEHAV regex matches CODE3
INCLUDE
Value
other endpoint names, separated by |
Attribute the current endpoint to an individual if it has at least one of the endpoints in INCLUDE.
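A possible way to resolve INCLUDE is sketched below. It assumes that INCLUDE is followed recursively (an included endpoint may itself include others), which the text above does not state explicitly. The endpoint names, the shape of the definitions dictionary and the function name has_endpoint are all hypothetical.

```python
def has_endpoint(name, definitions, direct_hits, _seen=None):
    """True if the individual has endpoint `name`, directly or through INCLUDE.

    definitions: endpoint name -> {"INCLUDE": "A|B" or None}
    direct_hits: names of endpoints the individual has from event rules alone
    """
    seen = set() if _seen is None else _seen
    if name in seen:  # guard against cyclic INCLUDE chains
        return False
    seen.add(name)
    if name in direct_hits:
        return True
    include = definitions.get(name, {}).get("INCLUDE")
    if not include:
        return False
    # The endpoint is attributed if at least one included endpoint is present.
    return any(
        has_endpoint(child, definitions, direct_hits, seen)
        for child in include.split("|")
    )


definitions = {
    "PARENT": {"INCLUDE": "CHILD_A|CHILD_B"},
    "CHILD_A": {"INCLUDE": None},
    "CHILD_B": {"INCLUDE": "GRANDCHILD"},
    "GRANDCHILD": {"INCLUDE": None},
}

print(has_endpoint("PARENT", definitions, {"GRANDCHILD"}))
print(has_endpoint("CHILD_A", definitions, {"GRANDCHILD"}))
```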
PRE_CONDITIONS
Value
condition on EVENT_AGE or EVENT_YEAR
EMERG: (unused, nothing to do)
NA: (nothing to do)
Discard events not matching PRE_CONDITIONS from the list of candidate events.
This rule usually applies a filter on age or year at the event. It filters out some events from the existing list of candidate events.
CONDITIONS
An individual must fit the CONDITIONS rule to be attributed the endpoint.
SEX
Values
1: only keep males
2: only keep females
NA: (nothing to filter, the endpoint is not sex-specific)
This filter should be applied as the last filter.
Extra rules
any-code
When a rule value is written as ANY, the event must have a code in the field checked by that rule, but any code is accepted.
This rule is useful when matching an event against multiple rules, for example:
HD_ICD_10: K250 and
HD_ICD_10_ATC: ANY
This example requires that an event has any ATC code and at the same time has the ICD-10 code K250. The endpoint will match drug-induced events since it requires there is an ATC code, but the actual ATC code doesn't matter.
match-prefix
The rule must match from the beginning of the event code. In regex terms, the rule value has to be prepended with a ^, and this modified value is then used as the regex.
For example, a match-prefix rule with a value of I21 matches I2100 but doesn't match AEI21.
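In Python, the match-prefix behaviour could be sketched like this. The helper name prefix_regex is hypothetical, and wrapping the value in a non-capturing group is a precaution taken in this sketch (not something the rule description mentions) so that the ^ applies to every alternative of a value such as I21|I22.

```python
import re


def prefix_regex(rule_value):
    """Compile a rule value so that it only matches at the start of a code."""
    # Without the group, "^I21|I22" would anchor only the first alternative.
    return re.compile("^(?:" + rule_value + ")")


print(bool(prefix_regex("I21").search("I2100")))
print(bool(prefix_regex("I21").search("AEI21")))
print(bool(prefix_regex("I21|I22").search("AEI22")))
```

Only the first call prints True. Using re.match with the unmodified value gives the same result for a single alternative, since re.match always starts at the beginning of the string.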
cause-symptom
An ampersand & between two codes indicates a cause-symptom pair (specific to Finnish ICD-10). In that case, both the cause code and the symptom code must be found in the same event.
For example, HD_ICD_10 = M07&L405 will match an event that has both M07 (in CODE1 or CODE2) and L405 (in CODE1 or CODE2).
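A minimal sketch of the cause-symptom check, assuming the rule value holds exactly one pair separated by &, that each side is matched from the start of the code, and that the event (a dictionary here) has already been assembled from all of its rows. The function name is hypothetical.

```python
import re


def matches_cause_symptom(event, rule_value):
    """True if both codes of a cause&symptom pair are found in the same event."""
    cause, symptom = rule_value.split("&")
    codes = [c for c in (event["CODE1"], event["CODE2"]) if c]
    has_cause = any(re.match(cause, c) for c in codes)
    has_symptom = any(re.match(symptom, c) for c in codes)
    return has_cause and has_symptom


both = {"CODE1": "M073", "CODE2": "L405"}
cause_only = {"CODE1": "M073", "CODE2": None}

print(matches_cause_symptom(both, "M07&L405"))
print(matches_cause_symptom(cause_only, "M07&L405"))
```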
mode
A rule value starting with a percent sign % indicates a mode rule. The event will be considered only if the code is the most common amongst its sibling ICD codes for an individual.
For example %J450 would match events of an individual only if J450 is the most common code among the codes starting with J45.
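The mode rule might be evaluated along these lines. Two assumptions are made that the text does not settle: sibling codes are those sharing all but the last character of the rule code (J45 for J450), and a tie for the highest count still counts as "most common". The function name is hypothetical.

```python
from collections import Counter


def is_mode_code(rule_value, individual_codes):
    """True if the %-rule code is the most common among its sibling codes."""
    code = rule_value.lstrip("%")
    family = code[:-1]  # assumed sibling prefix, e.g. "J45" for "J450"
    counts = Counter(c for c in individual_codes if c.startswith(family))
    if not counts:
        return False
    # Ties are accepted in this sketch.
    return counts.get(code, 0) == max(counts.values())


print(is_mode_code("%J450", ["J450", "J450", "J459", "I10"]))
print(is_mode_code("%J450", ["J450", "J459", "J459"]))
```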
canc-all
When an endpoint has multiple cancer rules (from CANC_TOPO, CANC_TOPO_EXCL, CANC_MORPH, CANC_MORPH_EXCL, CANC_BEHAV) then it is not enough to match only one of them: all cancer rules that are defined must be satisfied by the event.
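The canc-all requirement can be sketched as a single check per event. The sketch assumes events and endpoint definitions are dictionaries, that undefined rules are absent or empty, that regexes are matched from the start of the code, and that CANC_BEHAV is matched against CODE3. The names CANCER_RULES and matches_all_cancer_rules and the sample codes are illustrative.

```python
import re

# Event field checked by each cancer rule, and whether the rule is an exclusion.
CANCER_RULES = {
    "CANC_TOPO": ("CODE1", False),
    "CANC_TOPO_EXCL": ("CODE1", True),
    "CANC_MORPH": ("CODE2", False),
    "CANC_MORPH_EXCL": ("CODE2", True),
    "CANC_BEHAV": ("CODE3", False),
}


def matches_all_cancer_rules(event, definition):
    """True if the event satisfies every cancer rule defined for the endpoint."""
    defined = {rule: spec for rule, spec in CANCER_RULES.items() if definition.get(rule)}
    if event["SOURCE"] != "CANC" or not defined:
        return False
    for rule, (field, is_exclusion) in defined.items():
        hit = re.match(definition[rule], event[field] or "") is not None
        # An inclusion rule must hit; an exclusion rule must not.
        if hit == is_exclusion:
            return False
    return True


event = {"SOURCE": "CANC", "CODE1": "C509", "CODE2": "8500", "CODE3": "3"}

print(matches_all_cancer_rules(event, {"CANC_TOPO": "C50", "CANC_BEHAV": "3"}))
print(matches_all_cancer_rules(event, {"CANC_TOPO": "C50", "CANC_BEHAV": "2"}))
```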
mark-no-code
The mark $!$ is used to state that someone has checked and there is no suitable code for this endpoint in a given registry.
For example, if an endpoint has HD_ICD_9 with a value of $!$ then it means someone has gone through the whole Finnish ICD-9 and found no code that can be used for this endpoint.
Gotchas
One single event can span multiple rows in the detailed longitudinal data files: events are unique by (FINNGENID, SOURCE, INDEX), but not by row. Rows with the same values for FINNGENID, SOURCE, INDEX must be looked at as one single event when performing look-ups.
The ICD-10, ICD-9 and ICD-8 used by FinnGen are specific Finnish versions which differ slightly from the international ones. This means for example that the ICD-10 found in FinnGen data are a bit different from the WHO ICD-10 or the US ICD-10-CM.
In the FinnGen data, the ICD-O-3 is used for cancer codes.
The dot . and the comma , are not present in the codes in the FinnGen files, e.g. J45.1 would be J451 in the endpoint definition file and the detailed longitudinal file.
For rules that are regexes: a dot . means "any character" and not an actual dot.
Endpoints with specific control rules are not documented here (yet!)
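Two of the gotchas above lend themselves to a short sketch: grouping rows into events by (FINNGENID, SOURCE, INDEX), and writing codes without dots or commas. Rows are assumed to be dictionaries keyed by column name; the function names and the sample identifier are made up for illustration.

```python
from collections import defaultdict


def group_rows_into_events(rows):
    """Group rows of the detailed longitudinal file into events."""
    events = defaultdict(list)
    for row in rows:
        # Rows sharing this key belong to one and the same event.
        key = (row["FINNGENID"], row["SOURCE"], row["INDEX"])
        events[key].append(row)
    return dict(events)


def normalise_code(code):
    """Drop dots and commas, e.g. 'J45.1' becomes 'J451'."""
    return code.replace(".", "").replace(",", "")


rows = [
    {"FINNGENID": "FG0000001", "SOURCE": "INPAT", "INDEX": "1", "CODE1": "J451"},
    {"FINNGENID": "FG0000001", "SOURCE": "INPAT", "INDEX": "1", "CODE1": "I10"},
    {"FINNGENID": "FG0000001", "SOURCE": "INPAT", "INDEX": "2", "CODE1": "J459"},
]

events = group_rows_into_events(rows)
print(len(events))
print(normalise_code("J45.1"))
```

Here the three rows collapse into two events, and any rule lookup would then inspect all rows of an event together.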
Appendix: list of registries
Name in FinnGen data (SOURCE): Registry description
CANC: Cancer
DEATH: Cause of death
INPAT: HILMO inpatient
OPER_IN: HILMO inpatient (operations)
OUTPAT: HILMO specialist outpatient
OPER_OUT: HILMO specialist outpatient (operations)
PRIM_OUT: AvoHILMO: primary care outpatient
PURCH: Kela drug purchase
REIMB: Kela drug reimbursement
Appendix: coding systems and translations
Where to find the translation file for phenotype data, documentation from the FinnGen Handbook
Glossary
Kela: the Social Insurance Institution of Finland
HILMO: Finnish care registers for health care