How to run GWAS using SAIGE
Pipeline for running GWAS (for binary or quantitative phenotypes) using SAIGE. NB! The examples for the SAIGE pipeline have not been updated after DF10.
In FinnGen data releases 1-6, the GWAS was performed using SAIGE, which performs single-variant association tests for binary and quantitative traits. For binary traits, SAIGE uses the saddlepoint approximation (SPA) to account for case-control imbalance. Releases 7+ use REGENIE, which we recommend for newer analyses.
!! NB !! Please be cautious with how many GWAS you create and the number of phenotypes you include. Submitting more than ten GWAS jobs simultaneously, or a GWAS with more than 15 phenotypes, may jam the process and can make your organization's pipeline unusable for others. If you are going to launch more than 5 GWASs, or a GWAS with tens of phenotypes, please contact us beforehand so that we can temporarily increase the resources of your organization's Sandbox and downscale them afterward.
You can find the example files for running SAIGE in Sandbox under /finngen/library-green/scripts/saige/:
.json files (note: the file must be edited before running): saige_R6.json and saige_R10.json
.wdl file: saige.wdl
sub-wdl file: saige_sub.zip
These are examples for running SAIGE on the endpoint E4_DIABETES in R6 (saige_R6.json) and R10 (saige_R10.json).
You may use some or all of the default covariates, or add new ones. If you would like to add a new covariate to the SAIGE run, please follow the dedicated instructions for doing so.
The parts you (may) need to edit in the json file are:
saige.test_combine.bgenlistfile: Path to a .txt file listing the bgen files to run the test for, each on its own row. An example file for R6 can be found at /finngen/shared/r6_saige/20201006_141639/files/saige_pipeline_R6/input_files/R6_bgen_filelist.txt
saige.null.phenofile: Path to a phenotype-covariate file (a text file including the phenotype codes and the covariates used). An example file for R6 can be found at /finngen/library-red/finngen_R6/phenotype_2.0/data/finngen_R6_cov_pheno_1.0.txt.gz
Hint: if you would like to use a subset of individuals in your analysis, mark the samples that you don't want to use as NA in your phenotype and add this new phenotype as a new column in the existing phenofile. Then edit the phenotype list so that it contains the name of your new phenotype column.
saige.phenolistfile: Path to a .txt file containing the list of phenotypes to run, in a single column with each phenotype on its own row. These codes must correspond exactly to the column names in the phenotype file you use for running SAIGE, for example E4_DIABETES.
saige.null.bedfile:
Path to .bed file for your genetic relatedness matrix (GRM). An example file for R6 can be found at /finngen/library-red/finngen_R6/grm_1.0/data/finngen_R6_grm_v1_ld_0.1.bed
saige.null.covariates: List of covariates used in the analysis, separated by commas, for example "age,gender".
saige.test_combine.test.samplefile: Path to a .txt file listing the sample IDs used in the analysis, one sample ID per row. An example file for R6 can be found at /finngen/shared/r6_saige/20201006_141639/files/saige_pipeline_R6/input_files/finngen_r6_sample_list.txt
Note: the number of samples in the sample file must match the number of samples in the .bgen file(s).
Note also that the FinnGen IDs in the saige.test_combine.test.samplefile you specify in the .json file should match, and be in the same order as, the FinnGen IDs in the saige.null.bedfile. For example, in the .json file for R8, "saige.test.samplefile": "path/to/finngen_r8_bgen_sample_list.txt" is compatible with "saige.null.bedfile": "/path/to/R8_GRM_V0_LD_0.1.bed".
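The ordering requirement between the sample file and the GRM files can be checked before submitting. The following is a minimal sketch, not part of the SAIGE pipeline itself. It assumes that a PLINK .fam file sits next to the .bed file and that the individual ID is in its second column (standard PLINK layout); the function names are hypothetical.

```python
from pathlib import Path


def read_sample_ids(sample_file):
    """Return the sample IDs from a text file with one ID per row."""
    lines = Path(sample_file).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def read_fam_ids(fam_file):
    """Return the individual IDs (second column) from a PLINK .fam file."""
    ids = []
    for line in Path(fam_file).read_text().splitlines():
        fields = line.split()
        if len(fields) >= 2:
            ids.append(fields[1])
    return ids


def first_mismatch(sample_ids, fam_ids):
    """Return the 0-based index of the first difference, or None if identical."""
    for index, (sample_id, fam_id) in enumerate(zip(sample_ids, fam_ids)):
        if sample_id != fam_id:
            return index
    if len(sample_ids) != len(fam_ids):
        # One list is a strict prefix of the other.
        return min(len(sample_ids), len(fam_ids))
    return None
```

Calling `first_mismatch(read_sample_ids(sample_path), read_fam_ids(fam_path))` with your own paths returns `None` when both files list the same IDs in the same order, and otherwise the row at which they first diverge.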
In SAIGE, you choose between a logistic and a linear model by setting saige_pipeline_ps.traitType in the .json file: use binary for a logistic model (binary traits) and quantitative for a linear model (continuous traits).
You can submit your SAIGE job to the pipeline system via the command line with the following command:
Once your job has finished, you can find the output of your SAIGE run in /finngen/pipeline/cromwell/workflows/saige/[WORKFLOW_ID]/call-test_combine/shard-#/sub.test_combine/[SUB_WORKFLOW_ID]/call-combine/ (each phenotype has its own shard folder, numbered from 0; if you have only one phenotype, there will be only one folder, shard-0).
In the output folder(s) you will find:
2 summary statistics files:
{prefix}_{pheno}.gz: Summary statistics file with columns cleaned to match the PheWeb format
{prefix}_{pheno}.saige.gz: Full summary statistics file
Plots (under the glob-* subfolder):
{prefix}_{pheno}.gz_pheweb_pval_manhattan.png: Manhattan plot
{prefix}_{pheno}.gz_pheweb_pval_manhattan_loglog.png: Log-adjusted Manhattan plot
{prefix}_{pheno}.gz_pval_qqplot.png: Quantile-quantile (QQ) plot
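When a run covers several phenotypes, collecting the outputs from every shard folder by hand gets tedious. The sketch below walks the folder layout described above and groups the files per shard. It is only an illustration: the function name is hypothetical, and it searches recursively below call-combine because the exact nesting of the files inside that folder is an assumption.

```python
from pathlib import Path


def find_saige_outputs(workflow_dir):
    """Map each shard folder name to the output files found below call-combine."""
    pattern = "call-test_combine/shard-*/sub.test_combine/*/call-combine"
    outputs = {}
    for combine_dir in sorted(Path(workflow_dir).glob(pattern)):
        # .../shard-N/sub.test_combine/<sub workflow id>/call-combine
        shard = combine_dir.parts[-4]
        gz_files = sorted(p for p in combine_dir.rglob("*.gz") if p.is_file())
        outputs[shard] = {
            # Full summary statistics end in .saige.gz.
            "full": [p for p in gz_files if p.name.endswith(".saige.gz")],
            # The remaining .gz files are the PheWeb-formatted statistics.
            "pheweb": [p for p in gz_files if not p.name.endswith(".saige.gz")],
            # Manhattan and QQ plots.
            "plots": sorted(p for p in combine_dir.rglob("*.png") if p.is_file()),
        }
    return outputs
```

Pass the workflow folder, that is /finngen/pipeline/cromwell/workflows/saige/[WORKFLOW_ID], as `workflow_dir`. If a shard contains more than one sub-workflow folder, this sketch keeps only the last one it visits.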
Before you can submit your job, you need to get the needed files and edit the .json file. The easiest way is to copy the files into your /finngen/red/ folder from library-red and modify them if needed. You need to modify the .json file (see points 1-6) and create a covariate + phenotype file and a phenotype list file.
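Creating the covariate + phenotype file and the phenotype list file can be scripted. The sketch below adds a new phenotype column in which samples outside a chosen subset are marked NA, and writes a one-column phenotype list. It assumes that the phenofile is gzipped and tab-separated with a header row, and that the ID column is called FINNGENID; check both against your own file. All function names and the new column name are hypothetical.

```python
import csv
import gzip


def subset_value(row, source_pheno, keep_ids, id_col="FINNGENID"):
    """Return the phenotype value for samples in keep_ids and "NA" for the rest."""
    return row[source_pheno] if row[id_col] in keep_ids else "NA"


def add_subset_phenotype(in_path, out_path, source_pheno, new_pheno, keep_ids,
                         id_col="FINNGENID"):
    """Write a copy of the phenofile with new_pheno appended as the last column."""
    with gzip.open(in_path, "rt", newline="") as src, \
            gzip.open(out_path, "wt", newline="") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        if new_pheno in reader.fieldnames:
            raise ValueError(f"column {new_pheno!r} already exists")
        writer = csv.DictWriter(dst, fieldnames=list(reader.fieldnames) + [new_pheno],
                                delimiter="\t", lineterminator="\n")
        writer.writeheader()
        # Stream row by row so that a large phenofile is never held in memory.
        for row in reader:
            row[new_pheno] = subset_value(row, source_pheno, keep_ids, id_col)
            writer.writerow(row)


def write_phenolist(path, phenotypes):
    """Write the phenotype list: one phenotype code per row."""
    with open(path, "w") as handle:
        handle.write("\n".join(phenotypes) + "\n")
```

For example, `add_subset_phenotype(in_path, out_path, "E4_DIABETES", "E4_DIABETES_SUBSET", keep_ids)` followed by `write_phenolist(list_path, ["E4_DIABETES_SUBSET"])` produces the two files that saige.null.phenofile and saige.phenolistfile would then point to.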
saige.traitType: Set to binary or quantitative. Specifies whether your phenotype(s) is binary or quantitative, and thus whether to run a logistic or a linear model.
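The .json edits listed above can also be applied with a short script instead of a text editor. The sketch below loads a copy of the example .json, overrides selected values, and refuses keys that are not already in the file, because key names can differ between releases of the example files. The function names are hypothetical, and the paths in the example dictionary are placeholders.

```python
import json


def update_inputs(inputs, overrides):
    """Return a copy of inputs with overrides applied; reject keys not already present."""
    unknown = sorted(key for key in overrides if key not in inputs)
    if unknown:
        raise KeyError(f"keys not found in the example .json: {unknown}")
    updated = dict(inputs)
    updated.update(overrides)
    return updated


def edit_json_file(in_path, out_path, overrides):
    """Read the example .json, apply overrides, and write the result to out_path."""
    with open(in_path) as handle:
        inputs = json.load(handle)
    updated = update_inputs(inputs, overrides)
    with open(out_path, "w") as handle:
        json.dump(updated, handle, indent=4)
        handle.write("\n")
    return updated


# Placeholder values only: replace the paths with your own files.
EXAMPLE_OVERRIDES = {
    "saige.null.phenofile": "/path/to/my_cov_pheno.txt.gz",
    "saige.phenolistfile": "/path/to/my_phenolist.txt",
    "saige.null.covariates": "age,gender",
    "saige.traitType": "binary",
}
```

Rejecting unknown keys is a deliberate choice here: a misspelled key would otherwise be added silently next to the real one and the run would still use the old value.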
Once your files are ready, copy-paste your .wdl and .json files into the pipeline system; you should then be able to run your GWAS.
See how the Sandbox paths and pipelines are mapped.
Remember to save your job's ID to keep track of your job and view its output.