# How to run GWAS using REGENIE

## Introduction

From FinnGen release 7 (R7), all FinnGen core endpoint GWAS were performed using [REGENIE](https://www.nature.com/articles/s41588-021-00870-7), which is similar to SAIGE (used for GWAS in releases 1-6) but is more computationally efficient and provides better effect size estimates with fewer false positives. REGENIE is therefore the [recommended software for FinnGen GWAS](/working-in-the-sandbox/running-analyses-in-sandbox/how-to-run-genome-wide-association-studies-gwas.md#regenie-or-saige) but we still provide instructions on [running GWAS with SAIGE](/working-in-the-sandbox/running-analyses-in-sandbox/how-to-run-genome-wide-association-studies-gwas/how-to-run-gwas-using-saige.md) in case you wish to mirror the analyses performed in releases 1-6.

**Note:** The REGENIE pipeline can also be run using [custom GWAS tools](/working-in-the-sandbox/which-tools-are-available/untitled.md) and initiated directly from the [Cohort Operations](/working-in-the-sandbox/which-tools-are-available/cohort-operations-tool-co.md) tool. In addition to the additive model, recessive and dominant analyses are available in the [Custom GWAS CLI](/working-in-the-sandbox/which-tools-are-available/untitled/custom-gwas-command-line-cli-tool.md). From Sandbox update 10.2 onwards, [binary](/working-in-the-sandbox/which-tools-are-available/untitled/custom-gwas-command-line-cli-tool/custom-gwas-cli-binary-mode.md) and [quantitative](/working-in-the-sandbox/which-tools-are-available/untitled/custom-gwas-command-line-cli-tool/custom-gwas-cli-quantitative-mode.md) phenotype analyses are also available in the [Custom GWAS CLI](/working-in-the-sandbox/which-tools-are-available/untitled/custom-gwas-command-line-cli-tool.md).

**!! NB !!** Please be cautious with how many GWAS you create and how many phenotypes you include. If you are going to launch more than 5 GWAS, or a GWAS with tens of phenotypes, please contact [finngen-servicedesk@helsinki.fi](mailto:humgen-servicedesk@helsinki.fi) so that we can temporarily increase the resources of your organization's Sandbox and downscale them afterward. After the resources have been increased, we recommend submitting a single GWAS job every 30 minutes (in a bash script you can use `sleep 30m` in your loop), i.e. two jobs per hour, which allows you to run \~40 jobs in 24 hours. This helps avoid jamming the pipeline and lets other users in your organization use it as well.
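The pacing recommended above (one submission every 30 minutes) could be sketched in Python roughly as follows. The `finngen-cli rw` invocation is the one shown under "Using command line" further down this page; the function name `submit_paced`, its arguments and the injectable `run`/`sleep` hooks are illustrative assumptions, not part of any FinnGen tool.

```python
import subprocess
import time


def submit_paced(json_files, wdl, deps_zip, pause_seconds=30 * 60,
                 run=subprocess.run, sleep=time.sleep):
    """Submit one REGENIE job per .json file, pausing between submissions.

    `run` and `sleep` are injectable so the pacing logic can be dry-run
    without calling finngen-cli or actually waiting (hypothetical helper).
    """
    submitted = []
    for index, json_file in enumerate(json_files):
        # Same command as in the "Using command line" section of this page.
        cmd = ["finngen-cli", "rw", "-w", wdl, "-i", json_file, "-d", deps_zip]
        run(cmd, check=True)
        submitted.append(cmd)
        # No need to wait after the last submission.
        if index < len(json_files) - 1:
            sleep(pause_seconds)
    return submitted
```

Calling `submit_paced(["pheno1.json", "pheno2.json"], "regenie.wdl", "regenie_sub_wdl.zip")` with hypothetical file names would submit the first job, wait 30 minutes, then submit the second. Remember to record the workflow ID that each submission reports.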

### Unmodifiable ("Green") REGENIE pipeline (R12 and later)

The easiest method to run GWAS for releases R12 onwards is using the [unmodifiable REGENIE workflow](https://finngen.gitbook.io/finngen-handbook/working-in-the-sandbox/running-analyses-in-sandbox/how-to-run-genome-wide-association-studies-gwas/how-to-run-gwas-using-regenie-unmodifiable-pipeline), available in the Pipelines tool. This version of the REGENIE GWAS pipeline allows you to input your phenotype file, specify analysis covariates and select the test type (additive, recessive or dominant). The advantage of using the unmodifiable workflow is that your GWAS results will be automatically transferred to the green library (without requiring a download request) and will be added to the [user-results PheWeb](https://userresults.finngen.fi/), which provides visualisations and a summary of your results.

If you need more customization than the unmodifiable workflow allows (e.g., non-default REGENIE settings), then see instructions for the modifiable ("red") REGENIE pipeline below.

### Standard REGENIE pipeline (all releases)

If you wish to run REGENIE GWAS in release 11 or earlier, use more advanced REGENIE options and/or customize the GWAS analysis in another way, we provide REGENIE GWAS workflow files (.wdl), which are scripts that define the analysis flow, and configuration files (.json), which define the input variables. Examples of these can be found in the Sandbox at

`/finngen/library-green/scripts/regenie/`:

* **.json** files (**need to be edited!**):
  * `regenie_example_R9.json`
  * `regenie_example_R10.json`
  * `regenie_example_R11.json`, and
  * `regenie_example_R12.json`
* **.wdl** file: `regenie.wdl` \*
* **sub-.wdl** files as one zipped file: `regenie_sub_wdl.zip` \*

These are examples to help you understand how to run REGENIE, using the endpoint `J10_ASTHMA_EXMORE` in DF9 (`regenie_example_R9.json`), in DF10 (`regenie_example_R10.json`), DF11 (`regenie_example_R11.json`) and DF12 (`regenie_example_R12.json`).

\*NOTE: All the files (.wdl and .json) were updated in April 2023 (see the Users' meeting recording from April 2023), and .json files that were used before then may not work with the current .wdl.

\*NOTE: The example below is listed as LIBRARY\_RED, but you will not be able to write your custom files there. You will need to use the tag SANDBOX\_RED and upload your files to /finngen/red using gsutil.

\*NOTE: The phenotype file must be gzipped.

#### Covariate + phenotype file

You may use some or all of the default covariates or add new covariates. If you would like to add a covariate to the REGENIE run, please follow the instructions on [how to make a covariate + phenotype file for GWAS pipeline](/working-in-the-sandbox/running-analyses-in-sandbox/how-to-run-genome-wide-association-studies-gwas/adding-new-covariates-in-gwas-using-regenie-and-saige.md).
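As a rough illustration of the file layout this pipeline expects (described under `regenie.cov_pheno` in the next section: tab-separated, gzipped, first two columns `FID` and `IID` holding the same sample ID, and no spaces), here is a minimal Python sketch for writing and checking such a file. The function names and the `MY_PHENO` column are hypothetical. The linked instructions remain the authoritative way to build the file.

```python
import gzip


def write_cov_pheno(path, rows, columns):
    """Write a gzipped, tab-separated file with FID and IID set to the same ID.

    rows: iterable of (sample_id, [value, ...]); columns: names of the
    phenotype/covariate columns that follow FID and IID.
    """
    with gzip.open(path, "wt", newline="\n") as handle:
        handle.write("\t".join(["FID", "IID", *columns]) + "\n")
        for sample_id, values in rows:
            fields = [sample_id, sample_id, *(str(v) for v in values)]
            if any(" " in field for field in fields):
                raise ValueError(f"space found in row for {sample_id}")
            handle.write("\t".join(fields) + "\n")


def check_cov_pheno(path):
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    with gzip.open(path, "rt") as handle:
        header_line = handle.readline()
        if " " in header_line:
            problems.append("line 1: contains a space")
        if header_line.rstrip("\n").split("\t")[:2] != ["FID", "IID"]:
            problems.append("line 1: first two columns must be FID and IID")
        for line_no, line in enumerate(handle, start=2):
            if " " in line:
                problems.append(f"line {line_no}: contains a space")
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2 or fields[0] != fields[1]:
                problems.append(f"line {line_no}: FID and IID differ")
    return problems
```

Running `check_cov_pheno` on your file before uploading it is a cheap way to catch the most common formatting mistakes. It does not validate the phenotype or covariate values themselves.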

#### Prepare your files for REGENIE

Before you can submit your job, you need to download the [example files](#example-files-for-the-regenie-pipeline) and edit the .json file, which looks like this:

![](/files/-MhYMdMT-ShpabwtteKE)

The parts you should edit in the .json file are highlighted in the figure, and are:

* `regenie.phenolist:` the path to a phenotype list file. A phenotype list file is a text file with each row representing a phenotypic trait (similar to SAIGE), for example:

```
I9_CHD
T1D_WIDE
```

(Note: Multiple correlated phenotypes with missing values of less than 5% can be grouped as a single row separated by a tab in the file. However, we still recommend running each phenotype separately.)

* `regenie.cov_pheno:` the path to a phenotype-covariate file. The pheno-covariate file is a tab-separated (possibly gzipped) .txt file containing all phenotype and covariate columns. The first two columns of the file should be FID and IID. Please provide the **same sample ID in both columns**:

```
FID    IID
FGID1    FGID1
FGID2    FGID2
FGID3    FGID3
```

**NB:** Make sure that there are no spaces in the pheno-covariate file!

* `regenie.covariates:` a list of covariate column names, separated by commas; for example: `"age,gender"`. NOTE: The example .json file (`regenie_example_R9.json`) already defines the covariates used in the R9 core GWAS: age, sex, genotyping batch and PC1-10.
* `regenie.is_binary:` `true` if your phenotype is binary (e.g. case-control), `false` if quantitative (e.g. BMI). Defines whether to run a [logistic or linear model](/working-in-the-sandbox/running-analyses-in-sandbox/how-to-run-genome-wide-association-studies-gwas.md#logistic-or-linear-model). See another example in [Running quantitative GWAS with REGENIE](/working-in-the-sandbox/running-analyses-in-sandbox/how-to-run-genome-wide-association-studies-gwas/running-quantitative-gwas-with-regenie.md).

If you want to run a recessive or dominant model, you also need to edit:

* `regenie.sub_step2.step2.test:` defines the [association model type](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0254947) (`additive`, `recessive` or `dominant`) used in the GWAS. In the example model it is `additive` ("normal" GWAS). Unless you are specifically running a recessive or dominant model, there is no need to change this setting.
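The edits listed above can also be applied programmatically rather than by hand. The following Python sketch assumes that the example .json file is a flat JSON object whose keys are exactly the input names listed above; the function name `make_config` and all file paths are hypothetical.

```python
import json

ALLOWED_TESTS = {"additive", "recessive", "dominant"}


def make_config(example_json, out_json, phenolist, cov_pheno, covariates,
                is_binary=True, test="additive"):
    """Copy an example config and overwrite the fields that need editing."""
    if test not in ALLOWED_TESTS:
        raise ValueError(f"test must be one of {sorted(ALLOWED_TESTS)}")
    with open(example_json) as handle:
        config = json.load(handle)
    config["regenie.phenolist"] = phenolist
    config["regenie.cov_pheno"] = cov_pheno
    # Covariates are passed as one comma-separated string, e.g. "age,gender".
    config["regenie.covariates"] = ",".join(covariates)
    config["regenie.is_binary"] = is_binary
    config["regenie.sub_step2.step2.test"] = test
    with open(out_json, "w") as handle:
        json.dump(config, handle, indent=4)
    return config
```

All other keys of the example file (reference paths, docker images and so on) are left untouched, so the result stays compatible with the .wdl that the example was written for. The `phenolist` and `cov_pheno` values should be bucket paths in the form given in `buckets.txt`.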

### Binary (case-control) or quantitative endpoint?

In REGENIE, you choose between a logistic and a linear model by setting `regenie.is_binary` in the .json file: `true` runs a logistic model (binary traits) and `false` runs a linear model (continuous traits). If you are running a REGENIE model for a quantitative trait, you can also use [this example](/working-in-the-sandbox/running-analyses-in-sandbox/how-to-run-genome-wide-association-studies-gwas/running-quantitative-gwas-with-regenie.md).

#### Submit your REGENIE job

If you're running REGENIE using Sandbox Pipelines, it's a good idea to first read the sections [Pipelines is based on Cromwell and WDL](/working-in-the-sandbox/running-analyses-in-sandbox/pipelines-tool-instructions/pipelines-is-based-on-cromwell-and-wdl.md), [How to use the Pipelines tool](/working-in-the-sandbox/running-analyses-in-sandbox/pipelines-tool-instructions/how-to-use-the-pipelines-area.md) and [How to submit a pipeline from the command line](/working-in-the-sandbox/running-analyses-in-sandbox/pipelines-tool-instructions/how-to-submit-a-pipeline-from-a-command-line.md).

### Using command line

Once your files are in order, you can submit your run by typing the following command in the FinnGen terminal:

```
finngen-cli rw -w /path/to/regenie.wdl \
                -i /path/to/your.json \
                -d /path/to/regenie_sub_wdl.zip
```

REMEMBER to save your job ID `[WORKFLOW_ID]` to keep track of your job and to be able to view the output! See also tips on [how to find a pipeline job ID](/working-in-the-sandbox/running-analyses-in-sandbox/pipelines-tool-instructions/tips-on-how-to-find-a-pipeline-job-id.md). The `[WORKFLOW_ID]` and your job can be monitored from the Pipelines tool:

![](/files/d6Pt2kJBSuETo38jJXlv)

#### Output

Once your job has finished successfully, you can find your output files in: `/finngen/pipeline/cromwell/workflows/regenie/[WORKFLOW_ID]/call-sub_step2/shard-#/sub.regenie_step2/[SUBWORKFLOW_ID]/call-gather/shard-#/`
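Because the output path contains several shard and sub-workflow levels, it can be convenient to collect the result files with a glob. The Python sketch below assumes that the summary statistics sit directly inside the `call-gather/shard-#/` directories, which may not hold for every workflow version. It keeps the `[pheno].gz` files and skips `[pheno].regenie.gz`, following the recommendation under "Some things to consider" on this page. The function name `find_sumstats` is hypothetical.

```python
from pathlib import Path


def find_sumstats(workflow_id,
                  root="/finngen/pipeline/cromwell/workflows/regenie"):
    """List [pheno].gz files under a workflow's call-gather output folders."""
    # Mirrors the path layout given above, with wildcards for the shard
    # numbers and the sub-workflow ID.
    pattern = ("call-sub_step2/shard-*/sub.regenie_step2/*/"
               "call-gather/shard-*/*.gz")
    base = Path(root) / workflow_id
    return sorted(
        path for path in base.glob(pattern)
        if not path.name.endswith(".regenie.gz")
    )
```

If the function returns an empty list for a finished job, list the directory by hand first, since the files may be nested one level deeper than assumed here.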

Running REGENIE (R7) in the Sandbox was presented in the [User Meeting 24th of August 2021](https://www.finngen.fi/en/members/recordings/finngen-data-users-meeting-24th-august-2021).

### Some things to consider:

* Make sure that all files you have edited yourself, such as your edited phenotype-covariate file, are in `/finngen/red/`. Note that to copy files to `/finngen/red/`, you need to use [`gsutil cp`](/working-in-the-sandbox/quirks-and-features/how-to-download-results-from-your-ivm.md) and the `gs://fg-production-sandbox-<NO>-red/` path for `/finngen/red`. If you're unsure of your sandbox number (`<NO>`), you can see the bucket paths in `buckets.txt` on your Desktop (`~/Desktop/buckets.txt`): look for "Sandbox ivm bucket" for the path to your sandbox's red bucket.
* Bucket paths in the .json file need to follow the form proposed in `buckets.txt` when specifying the inputs (e.g. for the modified .json file).
* Make sure that you are using the latest version of REGENIE in `regenie.sub_step1.step1.docker` and `regenie.sub_step2.docker`.
* \[pheno].gz is the preferred summary statistics format (not \[pheno].regenie.gz) for downstream analyses, e.g. finemapping.
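The mapping between `/finngen/red/` and the red bucket described in the first point above can be written down as a small Python helper. This is only a sketch: the function names are hypothetical, and the bucket name should always be checked against `buckets.txt` rather than built from a guessed sandbox number.

```python
def red_bucket_uri(path_in_red, sandbox_no):
    """Translate a /finngen/red/... path into its gs:// bucket equivalent."""
    prefix = "/finngen/red/"
    if not path_in_red.startswith(prefix):
        raise ValueError(f"expected a path under {prefix}")
    relative = path_in_red[len(prefix):]
    return f"gs://fg-production-sandbox-{sandbox_no}-red/{relative}"


def upload_command(local_file, path_in_red, sandbox_no):
    """Build the `gsutil cp` argument list for copying a file to the red bucket."""
    return ["gsutil", "cp", local_file, red_bucket_uri(path_in_red, sandbox_no)]
```

The URI returned by `red_bucket_uri` is also the form to use for your own input files in the .json file.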

See how the Sandbox paths and pipelines are mapped [here](/working-in-the-sandbox/running-analyses-in-sandbox/pipelines-tool-instructions/sandbox-path-and-pipeline-mappings.md).

**Related:**

* [If your pipeline job fails](/working-in-the-sandbox/running-analyses-in-sandbox/pipelines-tool-instructions/if-your-pipeline-job-fails.md)


