BigQuery Python - Case Study - Comorbidity - Upset plot
This case study shows how you can plot the comorbidities of a FinnGen endpoint as an UpSet plot.
Location of the script
/finngen/library-green/scripts/code_snippets/codeSnippet_comorbidities_endpoint.py
You can copy and paste the code from the explanation below, or take it directly from the file itself.
For the F5_ALZHDEMENT endpoint, comorbidities include type 2 diabetes (T2D), cardiovascular diseases (I9_CVD), depression (F5_DEPRESSIO), and gastrointestinal diseases (K11_GIDISEASES).
We can extract the patients in each of these endpoints and see how the F5_ALZHDEMENT patients overlap with the comorbid endpoints.
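The overlap itself comes down to set intersection on patient identifiers. The following is a minimal sketch on made-up IDs; the identifiers and the `overlap_with_index` helper are hypothetical and are not part of the library script or of FinnGen data.

```python
def overlap_with_index(index_ids, comorbid_ids_by_endpoint):
    """Count how many index-endpoint patients also appear in each comorbid endpoint."""
    index_set = set(index_ids)
    return {
        endpoint: len(index_set & set(ids))
        for endpoint, ids in comorbid_ids_by_endpoint.items()
    }


# Made-up identifiers, for illustration only
alzheimer_ids = ["FG001", "FG002", "FG003", "FG004"]
comorbid = {
    "T2D": ["FG002", "FG003", "FG010"],
    "I9_CVD": ["FG001", "FG002", "FG003", "FG011"],
    "F5_DEPRESSIO": ["FG004", "FG012"],
}

overlap = overlap_with_index(alzheimer_ids, comorbid)
print(overlap)
```

Pairwise counts like these only cover two endpoints at a time; the UpSet plot built further down extends the same idea to every combination of endpoints.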
NOTE: FOR NOW THIS ONLY WORKS IN THE ANACONDA ENVIRONMENT, SO USE THE DOCKER IMAGE. RUN THE COMMAND BELOW FIRST AND THEN START PYTHON.
docker run -v /home/ivm:/home/ivm -it eu.gcr.io/finngen-sandbox-v3-containers/anaconda_python/anaconda3:1.0 /bin/bash
The queries are simple: each one extracts the patients (FINNGENID) belonging to one endpoint. The details are shown below.
#
import pandas_gbq
import os, sys
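# Set the environment variables needed for plotting.
# os.environ is used because os.system('export ...') runs in a separate
# shell and does not change the environment of this Python process.
os.environ['QT_PLUGIN_PATH'] = '/home/ivm/anaconda3/plugins'
os.environ['FONTCONFIG_PATH'] = '/home/ivm/anaconda3/etc/fonts'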
from google.cloud import bigquery
import pandas as pd
from upsetplot import from_contents
from upsetplot import UpSet
# Connect to client
client = bigquery.Client()
# Queries
query_alzhdem = """ SELECT FINNGENID
FROM `finngen-production-library.sandbox_tools_r10.endpoint_cohorts_r10_v1`
WHERE ENDPOINT = 'F5_ALZHDEMENT'
"""
query_type2d = """ SELECT FINNGENID
FROM `finngen-production-library.sandbox_tools_r10.endpoint_cohorts_r10_v1`
WHERE ENDPOINT = 'T2D'
"""
query_cvd = """ SELECT FINNGENID
FROM `finngen-production-library.sandbox_tools_r10.endpoint_cohorts_r10_v1`
WHERE ENDPOINT = 'I9_CVD'
"""
query_depress = """ SELECT FINNGENID
FROM `finngen-production-library.sandbox_tools_r10.endpoint_cohorts_r10_v1`
WHERE ENDPOINT = 'F5_DEPRESSIO'
"""
query_gids = """ SELECT FINNGENID
FROM `finngen-production-library.sandbox_tools_r10.endpoint_cohorts_r10_v1`
WHERE ENDPOINT = 'K11_GIDISEASES'
"""
# Job configuration
job_config = bigquery.QueryJobConfig()
# Run the queries
query_result_alzhdem = client.query(query_alzhdem,job_config=job_config)
query_result_type2d = client.query(query_type2d,job_config=job_config)
query_result_cvd = client.query(query_cvd,job_config=job_config)
query_result_depress = client.query(query_depress,job_config=job_config)
query_result_gids = client.query(query_gids,job_config=job_config)
Save the results to different dataframes
query_result_dataframe_alzhdem = query_result_alzhdem.to_dataframe()
query_result_dataframe_type2d = query_result_type2d.to_dataframe()
query_result_dataframe_cvd = query_result_cvd.to_dataframe()
query_result_dataframe_depress = query_result_depress.to_dataframe()
query_result_dataframe_gids = query_result_gids.to_dataframe()
Combine endpoint dataframes to get overlap of FINNGENIDs
comorbidEndpoints = from_contents({
    'AlzheimerDementia': query_result_dataframe_alzhdem['FINNGENID'].to_list(),
    'Type2Diabetes': query_result_dataframe_type2d['FINNGENID'].to_list(),
    'CardioVascularDiseases': query_result_dataframe_cvd['FINNGENID'].to_list(),
    'Depression': query_result_dataframe_depress['FINNGENID'].to_list(),
    'GastroIntestinalDisease': query_result_dataframe_gids['FINNGENID'].to_list(),
})
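With `subset_size = 'count'`, each bar of the UpSet plot counts the IDs that belong to exactly one combination of endpoints. The sketch below reproduces that counting in plain Python on made-up IDs, which can help when sanity-checking the bars. It illustrates the idea only and is not how upsetplot is implemented; `combination_counts` is a hypothetical helper.

```python
from collections import Counter


def combination_counts(contents):
    """Count IDs per exact combination of labels they belong to."""
    membership = {}
    for label, ids in contents.items():
        for patient_id in set(ids):
            membership.setdefault(patient_id, set()).add(label)
    # Each ID contributes to exactly one combination (its full set of labels)
    return Counter(frozenset(labels) for labels in membership.values())


# Made-up identifiers, for illustration only
contents = {
    "AlzheimerDementia": ["FG001", "FG002", "FG003", "FG006"],
    "Type2Diabetes": ["FG002", "FG003", "FG004", "FG006"],
    "Depression": ["FG003", "FG005"],
}

counts = combination_counts(contents)
for combo, n in sorted(counts.items(), key=lambda item: (-item[1], sorted(item[0]))):
    print(sorted(combo), n)
```

Because every ID lands in exactly one combination, the counts add up to the number of distinct IDs across all endpoints.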
UpSet plot of the FINNGENIDs overlap among the endpoints
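# You can play with more parameters of UpSet plot
from matplotlib import pyplot as plt
# .plot() draws on the current matplotlib figure and returns a dict of axes,
# so keep it in its own variable instead of reusing the name plt
axes = UpSet(comorbidEndpoints, subset_size = 'count', show_counts = True).plot()
# Save the plot
plt.savefig('/home/ivm/AlzheimerDementia_Comorbidites_UpsetPlot.png')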
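The five queries above differ only in the endpoint name, so a single query with an `IN` filter could fetch all cohorts in one go. The sketch below builds such a query and groups the returned rows by endpoint. It assumes the `ENDPOINT` column can be selected alongside `FINNGENID`; `build_cohort_query` and `group_ids_by_endpoint` are hypothetical helpers, and the rows are made up.

```python
import re

TABLE = "finngen-production-library.sandbox_tools_r10.endpoint_cohorts_r10_v1"


def build_cohort_query(endpoints, table=TABLE):
    """Build one query that returns (ENDPOINT, FINNGENID) for all endpoints."""
    for name in endpoints:
        # Endpoint names are interpolated into SQL, so allow only safe characters
        if not re.fullmatch(r"[A-Za-z0-9_]+", name):
            raise ValueError(f"Unexpected endpoint name: {name!r}")
    quoted = ", ".join(f"'{name}'" for name in endpoints)
    return (
        "SELECT ENDPOINT, FINNGENID\n"
        f"FROM `{table}`\n"
        f"WHERE ENDPOINT IN ({quoted})"
    )


def group_ids_by_endpoint(rows):
    """Turn (endpoint, id) pairs into a dict of ID lists, one list per endpoint."""
    grouped = {}
    for endpoint, finngen_id in rows:
        grouped.setdefault(endpoint, []).append(finngen_id)
    return grouped


query = build_cohort_query(
    ["F5_ALZHDEMENT", "T2D", "I9_CVD", "F5_DEPRESSIO", "K11_GIDISEASES"]
)
print(query)

# Made-up rows standing in for the query result
rows = [("T2D", "FG002"), ("F5_ALZHDEMENT", "FG001"), ("T2D", "FG003")]
grouped = group_ids_by_endpoint(rows)
print(grouped)
```

In the sandbox, the query string would be passed to `client.query(...)` as in the script above, and the grouped dictionary could then be handed to `from_contents`.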