# BigQuery Python - Case Study - Sex different - Tornado plot

In this we detail a scenario for how you **plot sex differences distributed across different age bins** within a FinnGen endpoint.

**Location of the script**

`/finngen/library-green/scripts/code_snippets/codeSnippet_sexDifference_endpoint.py`

You can copy-paste from the below explanation or take the code directly from the file itself.

From the samples in the **F5\_ALZHDEMENT** endpoint, it would be interesting to see what the age range of male and female samples looks like. The first thing is to extract the sex and age information and then proceed to plot them.

You can get the sex information of the FINNGENIDs in the column **SEX** from the table **r10\_cov\_v1\_fid** in the dataset **sandbox\_tools\_r10** in the project **finngen-production-library**. You can extract the information using the below query

```
# Import packages
import os, sys
from google.cloud import bigquery
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Connect to client
client = bigquery.Client()

# Run the query
query = """ WITH endpointSubSet AS(
                  SELECT FINNGENID
                  FROM `finngen-production-library.sandbox_tools_r10.endpoint_cohorts_r10_v1`
                  WHERE ENDPOINT = 'F5_ALZHDEMENT'
            )
            SELECT ESS.FINNGENID AS FINNGENID,
                   DL.EVENT_AGE AS EVENT_AGE,
                   COV.sex AS SEX
            FROM endpointSubSet AS ESS
            JOIN `finngen-production-library.sandbox_tools_r10.finngen_r10_service_sector_detailed_longitudinal_v1` as DL
            ON ESS.FINNGENID = DL.FINNGENID
            JOIN `finngen-production-library.sandbox_tools_r10.covariates_r10_v1` as COV
            ON ESS.FINNGENID = COV.fid
            WHERE DL.CODE1 LIKE '%F00%' # ICD code of Alzheimer's Dementia endpoint
        """
# Job configuration
job_config = bigquery.QueryJobConfig()
# Run the query
query_result = client.query(query,job_config=job_config)
```

Save the results to a data frame

```
columnNames = list(list(query_result)[0].keys())
temporaryList = []
for row in query_result:
      temporaryList.append(list(row))
# Convert the temporaryList to dataframe using pandas package
query_result_dataframe = pd.DataFrame(temporaryList, columns = columnNames)
```

Before plotting, we need to create age bins for male and female

```
# Create Age bins and add the age bins to respective EVENT_AGE
ageBins = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
query_result_dataframe['AgeBin'] = pd.cut(query_result_dataframe['EVENT_AGE'],bins=ageBins)

# We need to count male and female population in the respective age bins
ageBinSexCount = query_result_dataframe.pivot_table(['EVENT_AGE'],index=['AgeBin'], columns='SEX',aggfunc=len, fill_value=0)
# Change the column names to more readable format
ageBinSexCount.columns = ['_'.join(str(s).strip() for s in col if s) for col in ageBinSexCount.columns]
ageBinSexCount.reset_index(inplace=True)
ageBinSexCount = ageBinSexCount.rename(columns={'EVENT_AGE_female':'Female','EVENT_AGE_male':'Male'})
```

Plot the sex difference in different age bins

```
# The Male counts will be on the negative side of x-axis so have to change the values to negative
ageBinSexCount['Male'] =  ageBinSexCount['Male'] / -1
# Plot parameters you can change
large = 22; med = 16; small = 12
params = {'axes.titlesize': large,
       'legend.fontsize': med,
       'figure.figsize': (16, 10),
       'axes.labelsize': med,
       'axes.titlesize': med,
       'xtick.labelsize': med,
       'ytick.labelsize': med,
       'figure.titlesize': large}
plt.rcParams.update(params)
# Plot background style. You can use different styles
plt.style.use('seaborn-whitegrid')
sns.set_style("white")
# This is just for Python Jupyter notebook where you want the plot to be inline. You can ignore this for now
%matplotlib inline
# The age bins will be changed to decreasing order as old age will be on top and young age will be bottom
ages = list(reversed(ageBinSexCount['AgeBin'].tolist()))
# Initiate the plot
plt.rcParams["figure.figsize"] = (15, 8)
ax1 = sns.barplot(x='Male', y='AgeBin', data=ageBinSexCount, order=ages, palette="Blues")
ax2 = sns.barplot(x='Female', y='AgeBin', data=ageBinSexCount, order=ages, palette="Greens")
plt.title("Sex distribution in age bins of ALZHDEMENT endpoint")
plt.xticks(ticks=[-10000, -5000, -2500, -1000, 0, 1000, 2500, 5000, 10000],
    labels = ['10k', '5k', '2.5k', '1k', '0', '1k', '2.5k', '5k', '10k']
   )
plt.grid()
plt.xlabel("Male/Female")
# You can also save the plot
plt.savefig('/home/ivm/sexdistribution_pyramid.png')
```


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.finngen.fi/working-in-the-sandbox/which-tools-are-available/miscellaneous-helper-scripts-tools/tool-to-annotate-variants-with-rsids-1/bigquery-python-case-study-sex-different-tornado-plot.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
