BigQuery Python - Downstream analysis - Active Ingredient - Bar plot

We introduced details on connection to BigQuery using python packages here. After the connection there are many different ways you can use the query results. In this we detail a use case scenario for how you can query for data from BigQuery tables and use it in your analysis.

Location of the script

/finngen/library-green/scripts/code_snippets/codeSnippet_activeIngredient.py

You can copy paste from below explanation or take the code directly from the file itself.

As example, you want to know what are the active ingredients of top drug purchases in the year 1995 and do a simple bar plot to showcase the results. The following logic is as follows

  1. We want to know what each individual drug purchase was in year 1995 which is found in table finngen_r10_service_sector_detailed_longitudinal in the dataset sandbox_tools_r10 in the project finngen-production-library.

  2. The drug information can be found in the column CODE3 which is the VNR code.

  3. We can get VNR code information from table finngen-vnr in the dataset medical_codes in the project finngen-production-library.

  4. We can extract active ingredient and see the how many purchases were made for each active ingredient which is the column Substance.

  5. We will plot the top 10 active ingredients

You can extract the information using the below query

# Import packages
import os, sys
from google.cloud import bigquery
from matplotlib import pyplot as plt

# Connect to client
client = bigquery.Client()

# Run the query
query = """ SELECT FVNR.Substance AS ActiveIngredient,
			 COUNT(*) AS N_PURCHASES
            FROM `finngen-production-library.sandbox_tools_r10.finngen_r10_service_sector_detailed_longitudinal_v1` AS FDL
            JOIN `finngen-production-library.medical_codes.finngen_vnr_v1` as FVNR
            ON FDL.CODE3 = FVNR.VNR
            WHERE FDL.SOURCE='PURCH' AND 
                  EXTRACT(YEAR from FDL.APPROX_EVENT_DAY) = 1995 AND 
                  FVNR.Substance IS NOT NULL AND 
                  FDL.CODE3 IS NOT NULL
            GROUP BY ActiveIngredient
            ORDER BY N_PURCHASES DESC              
            LIMIT 10
	"""
# Job configuration
job_config = bigquery.QueryJobConfig()
# Run the query
query_result = client.query(query,job_config=job_config)

You can print the results before plotting them

You can copy paste the following code to plot the top 10 active ingredients

You can also run the query and do bar plot of top 10 active ingredients in the anaconda python environment with much more ease because of pandas_gbq package.

Last updated

Was this helpful?