BigQuery Python - Case Study - Comorbidity - Upset plot

This case study shows how to plot the comorbidities of a FinnGen endpoint as an UpSet plot.

Location of the script

/finngen/library-green/scripts/code_snippets/codeSnippet_comorbidities_endpoint.py

You can copy and paste the code from the explanation below or take it directly from the file itself.

For the F5_ALZHDEMENT endpoint, comorbidities include type 2 diabetes (T2D), cardiovascular diseases (I9_CVD), depression (F5_DEPRESSIO), and gastrointestinal diseases (K11_GIDISEASES).

We can extract the patients in each of these endpoints and see how the F5_ALZHDEMENT patients overlap with the comorbid endpoints.
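The overlap described above is a set intersection on FINNGENIDs. As a minimal sketch, assuming the IDs of each endpoint are already available as Python lists, the number of index-endpoint patients found in each comorbid endpoint can be counted as follows. The helper name `overlap_with_index` and the IDs are made up for illustration and are not part of the script.

```python
def overlap_with_index(index_ids, comorbid):
    """Count how many index-endpoint IDs also appear in each comorbid endpoint.

    Hypothetical helper, not part of the FinnGen script.
    """
    index_set = set(index_ids)
    return {name: len(index_set & set(ids)) for name, ids in comorbid.items()}


# Made-up IDs for illustration only; these are not real FINNGENIDs
alzheimer_ids = ['ID1', 'ID2', 'ID3', 'ID4']
comorbid_ids = {
    'T2D': ['ID2', 'ID3', 'ID9'],
    'F5_DEPRESSIO': ['ID4', 'ID8'],
}

overlap = overlap_with_index(alzheimer_ids, comorbid_ids)
print(overlap)
```

The UpSet plot built later on this page shows the same kind of information for every combination of endpoints at once.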

NOTE: For now this only works in the Anaconda environment, so use the Docker image. Run the command below first and then start Python.

docker run -v /home/ivm:/home/ivm -it eu.gcr.io/finngen-sandbox-v3-containers/anaconda_python/anaconda3:1.0 /bin/bash

The queries are simple: each one extracts the patients of a single endpoint. The details can be seen below.

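# Imports
import pandas_gbq
import os
# Set the environment variables inside the Python process itself.
# os.system('export ...') only changes a short-lived subshell and
# has no effect on the running Python session.
os.environ['QT_PLUGIN_PATH'] = '/home/ivm/anaconda3/plugins'
os.environ['FONTCONFIG_PATH'] = '/home/ivm/anaconda3/etc/fonts'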
from google.cloud import bigquery
import pandas as pd
from upsetplot import from_contents
from upsetplot import UpSet

# Connect to client
client = bigquery.Client()

# Queries
query_alzhdem = """ SELECT FINNGENID
                    FROM `finngen-production-library.sandbox_tools_r10.endpoint_cohorts_r10_v1`
                    WHERE ENDPOINT = 'F5_ALZHDEMENT'
                """
query_type2d  = """ SELECT FINNGENID
                    FROM `finngen-production-library.sandbox_tools_r10.endpoint_cohorts_r10_v1`
                    WHERE ENDPOINT = 'T2D'
                """
query_cvd     = """ SELECT FINNGENID
                    FROM `finngen-production-library.sandbox_tools_r10.endpoint_cohorts_r10_v1`
                    WHERE ENDPOINT = 'I9_CVD'
                """
query_depress = """ SELECT FINNGENID
                    FROM `finngen-production-library.sandbox_tools_r10.endpoint_cohorts_r10_v1`
                    WHERE ENDPOINT = 'F5_DEPRESSIO'
                """
query_gids    = """ SELECT FINNGENID
                    FROM `finngen-production-library.sandbox_tools_r10.endpoint_cohorts_r10_v1`
                    WHERE ENDPOINT = 'K11_GIDISEASES'
                """

# Job configuration
job_config = bigquery.QueryJobConfig()

# Run the queries
query_result_alzhdem = client.query(query_alzhdem,job_config=job_config)
query_result_type2d  = client.query(query_type2d,job_config=job_config)
query_result_cvd     = client.query(query_cvd,job_config=job_config)
query_result_depress = client.query(query_depress,job_config=job_config)
query_result_gids    = client.query(query_gids,job_config=job_config)
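The five queries above differ only in the endpoint name, so the query strings could also be generated in a loop. The following is a minimal sketch under that assumption; the helper `build_endpoint_query` and the name check are illustrative additions, not part of the original script.

```python
import re

# Table name taken from the queries above
TABLE = "finngen-production-library.sandbox_tools_r10.endpoint_cohorts_r10_v1"


def build_endpoint_query(endpoint, table=TABLE):
    """Return the SELECT statement for one endpoint (hypothetical helper)."""
    # Accept only letters, digits and underscores so the name can be
    # placed inside the SQL string without surprises.
    if not re.fullmatch(r"[A-Za-z0-9_]+", endpoint):
        raise ValueError(f"Unexpected endpoint name: {endpoint!r}")
    return (
        "SELECT FINNGENID\n"
        f"FROM `{table}`\n"
        f"WHERE ENDPOINT = '{endpoint}'"
    )


endpoints = ["F5_ALZHDEMENT", "T2D", "I9_CVD", "F5_DEPRESSIO", "K11_GIDISEASES"]
queries = {endpoint: build_endpoint_query(endpoint) for endpoint in endpoints}
print(queries["T2D"])
```

Each entry of `queries` can then be passed to `client.query(...)` in the same way as the individual query strings above.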

Save the results to different dataframes

query_result_dataframe_alzhdem = query_result_alzhdem.to_dataframe()
query_result_dataframe_type2d  = query_result_type2d.to_dataframe()
query_result_dataframe_cvd     = query_result_cvd.to_dataframe()
query_result_dataframe_depress = query_result_depress.to_dataframe()
query_result_dataframe_gids    = query_result_gids.to_dataframe()

Combine endpoint dataframes to get overlap of FINNGENIDs

comorbidEndpoints = from_contents({'AlzheimerDementia':query_result_dataframe_alzhdem['FINNGENID'].to_list(),
                                   'Type2Diabetes': query_result_dataframe_type2d['FINNGENID'].to_list(),
                                   'CardioVascularDiseases': query_result_dataframe_cvd['FINNGENID'].to_list(),
                                   'Depression': query_result_dataframe_depress['FINNGENID'].to_list(),
                                   'GastroIntestinalDisease': query_result_dataframe_gids['FINNGENID'].to_list()})
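To make clear what the combined structure represents, here is a plain-Python sketch of the counting an UpSet plot with subset_size = 'count' displays: for each exact combination of endpoints, the number of IDs that belong to that combination and to no other endpoint. The function `exclusive_overlap_counts` and the IDs are hypothetical and only illustrate the idea; upsetplot does this work internally.

```python
from collections import Counter


def exclusive_overlap_counts(contents):
    """Count IDs per exact combination of the sets they belong to."""
    membership = {}
    for name, ids in contents.items():
        for id_ in set(ids):
            membership.setdefault(id_, set()).add(name)
    return Counter(frozenset(names) for names in membership.values())


# Made-up IDs for illustration only; these are not real FINNGENIDs
toy = {
    'AlzheimerDementia': ['ID1', 'ID2', 'ID3', 'ID6'],
    'Type2Diabetes': ['ID2', 'ID3', 'ID4', 'ID6'],
    'Depression': ['ID3', 'ID5'],
}

counts = exclusive_overlap_counts(toy)
for combo, n in sorted(counts.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(combo), n)
```

In this toy data, two IDs are in both AlzheimerDementia and Type2Diabetes but not in Depression, which is the kind of bar the plot draws for that combination.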

UpSet plot of the FINNGENIDs overlap among the endpoints

# Import pyplot first so that the name plt refers to the module
from matplotlib import pyplot as plt
# You can experiment with other parameters of the UpSet plot
axes = UpSet(comorbidEndpoints, subset_size = 'count', show_counts = True).plot()
# Save the plot
plt.savefig('/home/ivm/AlzheimerDementia_Comorbidites_UpsetPlot.png')

