There are two variations of this example. The first example gets the data for a list of FINNGENIDs and adds English descriptions of codes to data by using FinnGenUtilsR package, and the second example reads the list of FINNGENIDs from a text file having a column titled FINNGENID (as one of the columns).
Get all the data for a list of FINNGENID's
In this version, the full data is fetched for a given list of FINNGENID's (which may be a single id as well).
# helper function
# returns a data frame with all the data for given FINNGENID's
get_all_data <- function(...){
finngenids <- list(...)
sql<-paste0(
"SELECT * ",
"FROM finngen-production-library.",
".sandbox_tools_r10.",
"finngen_r10_service_sector_detailed_longitudinal_v2 ",
"WHERE FINNGENID IN (", paste0("'", finngenids, "'", collapse = ","), ")"
)
tb <- bq_project_query(projectid, sql)
}
# give a list of FINNGENIDs here (these are fake ID's)
tb <- get_all_data('FG00001', 'FG00002', 'FG00003')
# extend with code translation by using FinnGenUtilsR package
df <- FinnGenUtilsR::fg_bq_append_code_info_to_longitudinal_data(
projectid, tb,
"finngen-production-library.medical_codes.fg_codes_info_v1"
)
df_with_codes <- bq_table_download(df) %>%
arrange(EVENT_AGE) %>%
mutate(AGE = round(EVENT_AGE)) %>%
select(FINNGENID, AGE, SOURCE, name_en, CODE1, CODE2, CODE3)
# # for a small data set we can use an interactive table
# DT::datatable(df_with_codes)
# this works even with large data sets
df_with_codes
Get all the data for a file of FINNGENID's
The second variation reads the FINNGENID's from a file which is more convenient if you have many of them. FINNGENIDs are given as a tab-separated text file (.tsv) where there must be a column named "FINNGENID". This can be a file generated by other FinnGen tools, e.g. Cohort Operations.
# create an example data file with fake FINNGENID's
finngenids <- tribble(
~FINNGENID, ~OTHER_COLUMN, ~YET_ANOTHER_COLUMN,
'FG00002', 1, 2,
'FG00003', 3, 4,
'FG00004', 5, 6
)
write_tsv(finngenids, "FINNGENIDS.TSV")
get_all_data <- function(...){
finngenids <- read_tsv(filename, show_col_types = FALSE) %>% pull(FINNGENID)
sql<-paste0(
"SELECT * ",
"FROM finngen-production-library.",
".sandbox_tools_r10.",
"finngen_r10_service_sector_detailed_longitudinal_v2 ",
"WHERE FINNGENID IN (", paste0("'", finngenids, "'", collapse = ","), ")"
)
tb <- bq_project_query(projectid, sql)
}
# give FINNGENIDs here
tb <- get_all_data("FINNGENIDS.TSV")
df <- FinnGenUtilsR::fg_bq_append_code_info_to_longitudinal_data(
projectid, tb,
"finngen-production-library.medical_codes.fg_codes_info_v1"
)
df_with_codes <- bq_table_download(df) %>%
arrange(EVENT_AGE) %>%
mutate(AGE = round(EVENT_AGE)) %>%
select(FINNGENID, AGE, SOURCE, name_en, CODE1, CODE2, CODE3)
# # for a small data set we can use an interactive table
# DT::datatable(df_with_codes)
# this works with larger data sets
df_with_codes
Location of the script's in the Sandbox
Below is a path to the R Markdown file in Sandbox. This R Markdown file can be converted to an html page using "knit" command. The R Markdown is a plain text file, so you can copy code snippets that you need.