# How to run PRS pipeline

This pipeline calculates polygenic risk scores (PRS) for FinnGen individuals from external summary statistics. See the [GitHub page](https://github.com/FINNGEN/CS-PRS-pipeline) for more information about the pipeline.

There are two pipelines in the Sandbox (SB):

* unmodifiable pipeline for weights
* modifiable pipeline for scores (and weights)

## Input table

Both pipelines require an input table containing the metadata for the munging step and the weight calculation. Munging handles (almost) all possible scenarios in terms of SNP ID format (rsid, chrom\_pos\_ref\_alt, chrom:pos) and genome build, in order to generate weights that can be used with FinnGen data.

The default value of this parameter in both pipelines can be used as a template. Here is an example of how the file should look:

![](https://lh6.googleusercontent.com/yCizIx81uKJ53QaWscdqZicCpsUU7Vy3D4x5ZDP2Ss-SJ5AXMh6AxlOKIWQqce1vxByQlXOdnwuClqsleutOXHyG_VYgxiNSDyWB84tiEDmS9FUoDiqUTgTHiQYME6hjuMCdcgMf)

* The metadata file needs to be tab-separated with no spaces in any field (Cromwell specs). We recommend taking the default file referenced in the JSON, opening it in Excel, inserting your own data and exporting it as TSV, including the header row.
* Mandatory fields are in **bold** in the sample above. Others can be filled with a placeholder, like NA (but no empty fields!)
* `filename` is the basename of the file (the path to it is specified in another json field)
* `pheno` is used for region exclusion in scores (see scores pipeline inputs for more info)
* `n_total` is the sample size
* if `variant` is provided, the script will try to extract information from the SNPID either by using the rsid (if it's present) or extracting the first two integers (ideally chrom and pos) from the string
* `chrom` & `pos` will be used to define the variant as `CHROM_POS_REF_ALT` if no RSID is found in the variant field
* `effect` should either be `BETA` or `OR`
* `pval` is the name of the p-value column

The GWAS summary statistics file (referred to in the `filename` column) needs to be gzipped and have a `.gz` extension.
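The formatting rules above can be checked before submitting a run. The following is a minimal sketch, assuming a POSIX shell with `awk`; the function name `check_prs_meta` is hypothetical and not part of the pipeline. It only checks structure (tab-separated rows with a consistent field count, no empty fields, no spaces, `filename` values ending in `.gz`) and does not validate the content of individual columns.

```shell
# Hypothetical helper (not part of the pipeline): structural checks on the metadata table.
check_prs_meta() {
  # $1: path to the metadata table (TSV, header row included)
  awk -F'\t' '
    BEGIN { bad = 0 }
    NR == 1 {
      # Remember the header width and locate the filename column by name.
      ncol = NF
      for (i = 1; i <= NF; i++) if ($i == "filename") fcol = i
      if (!fcol) { print "header: no filename column found"; bad = 1 }
    }
    NF != ncol { printf "line %d: %d fields, header has %d\n", NR, NF, ncol; bad = 1 }
    {
      for (i = 1; i <= NF; i++) {
        if ($i == "") { printf "line %d: field %d is empty (use NA)\n", NR, i; bad = 1 }
        if (index($i, " ") > 0) { printf "line %d: field %d contains a space\n", NR, i; bad = 1 }
      }
    }
    NR > 1 && fcol && $fcol !~ /\.gz$/ { printf "line %d: %s does not end in .gz\n", NR, $fcol; bad = 1 }
    END { exit bad }
  ' "$1"
}
```

For example, `check_prs_meta my_meta.tsv && echo OK` prints `OK` only when no problem was reported; `my_meta.tsv` is a placeholder file name.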

In the WDL, the tsv is reduced to only a subset of required fields. You can also check manually/locally if it runs properly by running the following command (all on one line):

`cat TABLE_ABOVE.tsv | sed -E 1d | cut -f 1,3,8-17 > sumstats.txt`

Please check that all fields are present (filled with NA where no value applies) and that they match the expected output.
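The reduction and the field check can be combined in a small shell function, sketched here under the assumption of a POSIX shell with `sed`, `cut` and `awk`. The name `reduce_meta` is hypothetical; the `sed`/`cut` step is the command shown above, and the expected count of 12 fields follows from the field list `1,3,8-17`.

```shell
# Hypothetical helper: reproduce the reduction locally and verify its shape.
reduce_meta() {
  # $1: metadata table (TSV with header row), $2: output file
  sed -E 1d "$1" | cut -f 1,3,8-17 > "$2"
  awk -F'\t' '
    BEGIN { bad = 0 }
    # Fields 1, 3 and 8-17 give 12 columns; fewer means the table is too narrow.
    NF != 12 { printf "row %d: %d fields instead of 12\n", NR, NF; bad = 1 }
    {
      for (i = 1; i <= NF; i++)
        if ($i == "") { printf "row %d: field %d is empty (use NA)\n", NR, i; bad = 1 }
    }
    END { exit bad }
  ' "$2"
}
```

Running `reduce_meta TABLE_ABOVE.tsv sumstats.txt` writes the reduced file and returns a non-zero exit status if any row is short or contains an empty field.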

## Unmodifiable pipeline for weights only

This pipeline generates weights (in both rsid and chrompos format) and automatically exports them to the green library, as they contain no personal data.

### Inputs

The required inputs are:

* sandbox\_prs\_weights.prefix: prefix of outputs
* sandbox\_prs\_weights.ss\_meta : the metadata file mentioned above
* sandbox\_prs\_weights.munge.ss\_data\_path: path where the sumstats (filename) are located
* sandbox\_prs\_weights.weights.bim\_file: bim file (by default FinnGen hm3 bim) that can be used to force CS-PRS to work with a selection of snps. The final snplist will be the intersection between sumstats snps, FinnGen hm3 snps and the ones in the bim file
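As an illustration, the four inputs can be written to a Cromwell-style inputs JSON as sketched below. The helper name `write_weights_inputs` and every value passed to it are placeholders; only the four keys come from the list above, and treating each value as a plain string is an assumption to verify against the default inputs of the pipeline.

```shell
# Hypothetical helper: write the four required inputs to a JSON file.
# Values are inserted verbatim, so they must not contain double quotes or backslashes.
write_weights_inputs() {
  # $1: output prefix, $2: metadata table, $3: sumstats directory,
  # $4: bim file, $5: JSON file to write
  cat > "$5" <<EOF
{
  "sandbox_prs_weights.prefix": "$1",
  "sandbox_prs_weights.ss_meta": "$2",
  "sandbox_prs_weights.munge.ss_data_path": "$3",
  "sandbox_prs_weights.weights.bim_file": "$4"
}
EOF
}
```

A call such as `write_weights_inputs my_prs /path/to/meta.tsv /path/to/sumstats /path/to/snps.bim inputs.json` uses made-up paths; replace them with the locations of your own files.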

### Outputs

The pipeline will produce the following files:

* `rsid_weights` : snp weights in rsid format
* `chrompos_weights` : snp weights in chrompos format
* `logs` : all CS-PRS logs for the weights

## Modifiable pipeline for scores

This pipeline will generate weights *and* scores, but the results will require permission to be exported.

### Inputs

The required inputs are:

* sandbox\_prs.gwas\_data\_path : as above, the path where the sumstats are located
* sandbox\_prs.gwas\_meta : the metadata file mentioned above
* sandbox\_prs.scores.bed\_file : bed file for scores (default FinnGen hm3 bed)
* sandbox\_prs.scores.regions : file mapping phenotypes to regions to exclude (described below)

**regions** is a file (use the current one as a template) that allows scores to be calculated with regions excluded based on phenotype (for the time being, APOE for Alzheimer’s). If one specifies a mapping between phenos and regions, an extra score labeled no\_regions is produced for each mapped pheno. The phenotype is passed in the `gwas_meta` table under `pheno`, and the two strings need to match.
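Because the match between the two files is by string, it can help to list which `pheno` values from the metadata table actually occur in the regions file. The sketch below assumes a POSIX shell with `awk`, `sort` and `grep`; the function name `phenos_in_regions` is hypothetical. It makes no assumption about the layout of the regions file and only searches it for each `pheno` value as a whole word.

```shell
# Hypothetical helper: report which pheno values occur in the regions file.
phenos_in_regions() {
  # $1: metadata table (TSV with a header containing a pheno column)
  # $2: regions file
  awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "pheno") col = i; next }
    col { print $col }
  ' "$1" | sort -u | while IFS= read -r pheno; do
    # -F: literal string, -w: whole-word match, -q: quiet
    if grep -qwF -- "$pheno" "$2"; then
      printf '%s\tlisted\n' "$pheno"
    else
      printf '%s\tnot_listed\n' "$pheno"
    fi
  done
}
```

A `not_listed` phenotype is not an error in itself; it only indicates that no matching string was found in the regions file for that phenotype.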

### Outputs

The pipeline will produce the following files:

* `out_scores` : scores for FinnGen samples
* `out_weights` : weights in chrompos format
* `out_logs` : all CS-PRS logs for the weights


