How to check case counts from the data
There are several different ways to search for the number of individuals with a certain endpoint, or a certain FinnGen health register code. Below are a few examples of how to do such searches using FinnGen data.
If you would like to know the number of individuals with a certain endpoint, the easiest way to find out is to use the Risteys browser. Please take a look at the Risteys topic for more instructions on how to use Risteys to find the number of individuals with a certain endpoint, and for more in-depth endpoint statistics.
You can also check the number of individuals with a certain endpoint by using R in the Sandbox.
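# R.utils is needed by fread to read the gzipped endpoint file
library(data.table); library(dplyr); library(R.utils)
end<-fread("finngen_R8_endpoint.txt.gz",
data.table=F)
# Keep only the ID column and the endpoint of interest
end1<-end[,c("FINNGENID", "U22_COVID19_CONFIRMED")]
# Cases are coded as 1 in the endpoint file
covid_conf<-filter(end1, U22_COVID19_CONFIRMED==1);dim(covid_conf)
# Number of unique individuals with the endpoint
length(unique(covid_conf$FINNGENID))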
Remember that the locations of the endpoint and control files, as well as the endpoint definitions and the endpoint short and long names, can be found in the corresponding section of this documentation, which also explains how to read the endpoint definition file.
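If you want case counts for several endpoints at once, the filtering above can be wrapped in a small helper. This is a minimal sketch, not an official FinnGen tool: it assumes, as in the example above, that cases are coded 1, and further assumes that controls are coded 0 and excluded individuals are NA. The function name endpoint_counts, the toy data and the column HYPOTHETICAL_ENDPOINT are made up for illustration.

```r
# Hypothetical helper: tabulate cases (1), controls (0) and excluded (NA)
# for each requested endpoint column. Assumes one row per FINNGENID.
endpoint_counts <- function(end, endpoints) {
  res <- t(sapply(endpoints, function(ep) {
    x <- end[[ep]]
    c(cases    = sum(x == 1, na.rm = TRUE),
      controls = sum(x == 0, na.rm = TRUE),
      excluded = sum(is.na(x)))
  }))
  as.data.frame(res)
}

# Made-up toy data mimicking the layout of the endpoint file
toy <- data.frame(
  FINNGENID = c("FG1", "FG2", "FG3", "FG4", "FG5"),
  U22_COVID19_CONFIRMED = c(1, 0, 1, NA, 0),
  HYPOTHETICAL_ENDPOINT = c(0, 0, 1, 0, NA),
  stringsAsFactors = FALSE
)

counts <- endpoint_counts(toy, c("U22_COVID19_CONFIRMED", "HYPOTHETICAL_ENDPOINT"))
print(counts)
```

With the real data you would read the file as above (end<-fread("finngen_R8_endpoint.txt.gz", data.table=F)) and pass the endpoint names you are interested in.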
If you would like to know how many individuals in the FinnGen data have a certain health register code in the detailed longitudinal data, you can also use the Atlas tool in the Sandbox for this purpose.
Before you start any search, remember to check that you are searching for the correct codes. For example, the codes used in Finnish registers are somewhat different from the international codes. Take a look at the relevant documentation to find out where the translations of the Finnish register codes can be found. From these files you can search for the condition you are interested in and the codes related to that condition. (NB: codes should always be selected with the help of medical professionals who understand how the codes are used and how the underlying register data affects their usage.)
Here we use ICD10 code L20 for atopic dermatitis as an example. We are interested in how many individuals in the FinnGen data have been diagnosed with ICD10 code L20.
First, launch Atlas by going to the Applications menu on the Sandbox and selecting Finngen>Atlas.
Once Atlas opens, you can type L20 in the Search menu on the left, and it will show you all the codes matching L20. There is also a column "RC" (record count), which tells how many times that code has been seen in the data. Some individuals may have the code recorded more than once, so the record count is different from a count of individuals. When you type "atopic" in the Search menu, you can see other codes related to atopic conditions, such as S87, which is an ICPC2 code. (ICPC2 codes are assigned in primary care.)
Take a look at the Atlas documentation for more detailed instructions. By default, all registers included in the detailed longitudinal data are used in the search. However, if only certain registers are of interest, the results can be filtered by register by following the instructions in the same documentation.
Currently, Atlas contains only the detailed longitudinal data file. Some other specialty registers, such as the kidney, vaccination, and birth registers, are not yet included in the detailed longitudinal data file. If you are interested in counts from these registers (e.g. you would like to know how many individuals have a certain code in one of them), or you would like to search codes from the detailed longitudinal data using R, take a look at the instructions below.
If you would like to know how many individuals have certain health register code(s) in the detailed longitudinal data or in one of the specialty registers, and you are comfortable using R, you can also do the search in the Sandbox using R.
Let's use the same example as above: how many individuals have atopic dermatitis, as determined by ICD10 code L20? For this example, let's search only the inpatient Hilmo registry (INPAT) and the specialist outpatient Hilmo registry (OUTPAT), both of which are included in the detailed longitudinal data.
The first step is to double-check that you are searching for the correct codes, using the register code translation files as described above.
When you are sure about the codes you are interested in, you can do the search as below.
Here we search for ICD10 code L20 in INPAT and OUTPAT. We search for all ICD10 codes that begin with L20 (the ^ in front of the code means that the code has to begin there), because all of them are related to atopic dermatitis, and we use both the symptom (CODE1) and cause (CODE2) codes in the search.
library(data.table); library(dplyr); library(stringr)
# Read the detailed longitudinal data (one row per register entry)
foo<-fread("finngen_R8_detailed_longitudinal_data.txt",
data.table=F)
# Keep only inpatient and specialist outpatient Hilmo records
foo1<-filter(foo, SOURCE=="INPAT" | SOURCE=="OUTPAT")
# Keep ICD10 records where CODE1 or CODE2 begins with L20
foo2<-filter(foo1, ICDVER=="10" & (str_detect(CODE1,"^L20") | str_detect(CODE2,"^L20")))
# Count unique individuals (one individual can have many records)
length(unique(foo2$FINNGENID))
We can search the same way, for example, for the ATC codes of bacterial vaccines (codes beginning with J07A) in the vaccination register:
library(data.table); library(dplyr); library(stringr)
# Read the vaccination register
foo<-fread("finngen_R8_vaccination_register.txt",
data.table=F)
# Keep records whose ATC code (DRUG column) begins with J07A
foo2<-filter(foo, str_detect(DRUG,"^J07A"))
# Count unique individuals
length(unique(foo2$FINNGENID))
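The R searches for L20 and J07A follow the same pattern, so they can be wrapped in one helper that also reports the number of matching records next to the number of unique individuals (the same distinction as between Atlas's RC column and a count of individuals). This is a minimal base R sketch, not an official FinnGen function: the name count_code_prefix and its arguments are hypothetical, the column names are taken from the examples on this page, and the toy rows are made up.

```r
# Hypothetical helper: count records and unique individuals whose code in
# any of code_cols begins with prefix, optionally restricted by SOURCE/ICDVER.
count_code_prefix <- function(dat, prefix, code_cols = c("CODE1", "CODE2"),
                              sources = NULL, icdver = NULL) {
  keep <- rep(TRUE, nrow(dat))
  if (!is.null(sources)) keep <- keep & dat$SOURCE %in% sources
  if (!is.null(icdver))  keep <- keep & dat$ICDVER %in% icdver
  pattern <- paste0("^", prefix)
  # A row matches if any code column starts with the prefix (NA never matches)
  hit <- Reduce(`|`, lapply(code_cols, function(col) grepl(pattern, dat[[col]])))
  sel <- dat[keep & hit, , drop = FALSE]
  c(records = nrow(sel), individuals = length(unique(sel$FINNGENID)))
}

# Made-up rows mimicking the detailed longitudinal data
dld <- data.frame(
  FINNGENID = c("FG1", "FG1", "FG2", "FG3", "FG4", "FG5"),
  SOURCE    = c("INPAT", "OUTPAT", "OUTPAT", "PRIM_OUT", "INPAT", "OUTPAT"),
  ICDVER    = c("10", "10", "10", "10", "10", "9"),
  CODE1     = c("L209", "L208", "J459", "L209", "K509", "6918"),
  CODE2     = c(NA, NA, "L209", NA, NA, NA),
  stringsAsFactors = FALSE
)
ad <- count_code_prefix(dld, "L20", sources = c("INPAT", "OUTPAT"), icdver = "10")
print(ad)

# Made-up rows mimicking the vaccination register
vacc <- data.frame(
  FINNGENID = c("FG1", "FG1", "FG2"),
  DRUG      = c("J07AJ52", "J07AJ52", "J07BB02"),
  stringsAsFactors = FALSE
)
vac <- count_code_prefix(vacc, "J07A", code_cols = "DRUG")
print(vac)
```

In the toy data, FG1 has two matching L20 records, so the record count (3) is larger than the number of individuals (2). The PRIM_OUT row and the ICD9 row are dropped by the source and ICD version filters.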
Finally, if you don't have Sandbox access and you would like to do a lookup like the ones above, you can send a lookup request by email, and we will take care of it!