How to run PRS pipeline

This pipeline calculates polygenic risk scores (PRS) from external summary statistics for FinnGen individuals. More information about the pipeline is available on its GitHub page. Two pipelines are available in the Sandbox (SB):

unmodifiable pipeline for weights
modifiable pipeline for scores (and weights)

Input table

Both pipelines require an input table containing the metadata for the munging step and the weight calculation. Munging handles (almost) all possible scenarios in terms of SNP ID format (rsid, chrom_pos_ref_alt, chrom:pos) and genome build, in order to generate weights that can be used with FinnGen data.

The default value of this parameter in both pipelines can be used as a template. Here is an example of how the file should look:

The metadata file needs to be tab-separated with no spaces in any field (Cromwell specs). We recommend taking the default file in the json, opening it in Excel, inserting your own data and exporting it as TSV, including the header row!
Mandatory fields are in bold in the sample above. Others can be filled with a placeholder, like NA (but no empty fields!).
filename is the basename of the file (the path to it is specified in another json field)
pheno is used for region exclusion in scores (see scores pipeline inputs for more info)
n_total is the sample size
if variant is provided, the script tries to extract information from the SNP ID, either by using the rsid (if present) or by extracting the first two integers (ideally chrom and pos) from the string
chrom and pos are used to define the variant as CHROM_POS_REF_ALT if no rsid is found in the variant field
effect should be either BETA or OR
pval is the name of the p-value column
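The variant rule described in the list can be illustrated with a small sketch. This is not the pipeline's munging code: the function name parse_variant and the output labels (rsid, chrompos, unparsed) are hypothetical, and the sketch only mimics the two stated rules (use the rsid if one is present, otherwise take the first two integers in the string).

```shell
# Hypothetical illustration of the variant rule, not the actual munging script.
parse_variant() {
  printf '%s\n' "$1" | awk '{
    # Rule 1: an rsid anywhere in the string wins.
    if (match($0, /rs[0-9]+/)) {
      print "rsid", substr($0, RSTART, RLENGTH)
      next
    }
    # Rule 2: otherwise collect the first two runs of digits (ideally chrom and pos).
    n = 0; rest = $0
    while (n < 2 && match(rest, /[0-9]+/)) {
      num[++n] = substr(rest, RSTART, RLENGTH)
      rest = substr(rest, RSTART + RLENGTH)
    }
    if (n == 2) print "chrompos", num[1], num[2]
    else print "unparsed", $0
  }'
}

parse_variant "rs429358"            # prints: rsid rs429358
parse_variant "chr19:44908684:T:C"  # prints: chrompos 19 44908684
```

In this sketch, an identifier with a non-numeric chromosome and no rsid (for example X:12345:A:G) yields only one integer and is reported as unparsed, which is why filling the separate chrom and pos fields is the safer option for such files.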

The GWAS summary statistics file (referred to in the filename column) needs to be gzipped and have a .gz extension.
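A quick local check of this requirement could look like the sketch below. The helper name check_gz is hypothetical. It only verifies the extension and that gzip can read the file; it does not look at the contents.

```shell
# Hypothetical helper: confirm a sumstats file has a .gz extension and is valid gzip.
check_gz() {
  case "$1" in
    *.gz) ;;
    *) echo "$1: missing .gz extension" >&2; return 1 ;;
  esac
  # gzip -t tests the integrity of the compressed file without extracting it.
  gzip -t "$1" 2>/dev/null || { echo "$1: not a valid gzip file" >&2; return 1; }
}
```

For an uncompressed file, running gzip on it (for example gzip my_sumstats.tsv, where the file name is a placeholder) replaces it with my_sumstats.tsv.gz, which check_gz should then accept.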

In the WDL, the TSV is reduced to the subset of required fields. You can check locally that this step works on your table by running the following command (all on one line):

cat TABLE_ABOVE.tsv| sed -E 1d | cut -f 1,3,8-17 > sumstats.txt

Please check that all fields are present (filled with NA where no value applies) and that they match the expected output.
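This check can be partly automated. The sketch below wraps the same sed and cut reduction in a hypothetical helper, reduce_meta, and then verifies the result. Since cut -f 1,3,8-17 keeps 2 + 10 = 12 columns, every output line should have 12 non-empty fields and contain no spaces. Whether the values themselves are sensible still has to be checked by eye.

```shell
# Hypothetical helper: reduce a metadata TSV as in the command above, then sanity-check it.
# $1 = metadata TSV including the header row, $2 = output file
reduce_meta() {
  # Drop the header row and keep fields 1, 3 and 8-17.
  sed -E 1d "$1" | cut -f 1,3,8-17 > "$2"
  awk -F'\t' '
    BEGIN { bad = 0 }
    NF != 12 { printf "line %d: %d fields, expected 12\n", NR, NF; bad = 1 }
    / /      { printf "line %d: contains a space\n", NR; bad = 1 }
    {
      for (i = 1; i <= NF; i++)
        if ($i == "") { printf "line %d: field %d is empty\n", NR, i; bad = 1 }
    }
    END { exit bad }
  ' "$2"
}
```

A call such as reduce_meta TABLE_ABOVE.tsv sumstats.txt && echo "table OK" prints one message per problem found and returns a non-zero status if any line fails.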

Unmodifiable pipeline for weights only

This pipeline generates weights (in both rsid and chrompos format) and automatically exports them to the green library, as they contain no personal data.

Inputs

The required inputs are:

sandbox_prs_weights.prefix: prefix of outputs
sandbox_prs_weights.ss_meta : the metadata file mentioned above
sandbox_prs_weights.munge.ss_data_path: path where the sumstats (filename) are located
sandbox_prs_weights.weights.bim_file: bim file (by default FinnGen hm3 bim) that can be used to force CS-PRS to work with a selection of snps. The final snplist will be the intersection between sumstats snps, FinnGen hm3 snps and the ones in the bim file
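As a rough sketch, the four inputs above could be collected in an inputs JSON like the one written below. The keys are the ones listed above. Every value (the prefix, the paths and the output file name weights_inputs.json) is a hypothetical placeholder to be replaced with your own, and whether the default bim file needs to be overridden at all depends on your use case.

```shell
# Write an example inputs file; all values are placeholders, not real paths.
cat > weights_inputs.json <<'EOF'
{
  "sandbox_prs_weights.prefix": "my_prs",
  "sandbox_prs_weights.ss_meta": "/path/to/my_meta.tsv",
  "sandbox_prs_weights.munge.ss_data_path": "/path/to/sumstats_dir",
  "sandbox_prs_weights.weights.bim_file": "/path/to/custom.bim"
}
EOF
```

Here ss_data_path points to the location that holds the gzipped files named in the filename column of the metadata table.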

Outputs

The pipeline will produce the following files:

rsid_weights : snp weights in rsid format
chrompos_weights : snp weights in chrompos format
logs : all CS-PRS logs for the weights

Modifiable pipeline for scores

This pipeline will generate weights *and* scores, but the results will require permission to be exported.

Inputs

The required inputs are:

sandbox_prs.gwas_data_path : as above, the path where the sumstats are located
sandbox_prs.gwas_meta : the metadata file mentioned above
sandbox_prs.scores.bed_file : bed file for scores (default FinnGen hm3 bed)
sandbox_prs.scores.regions

regions is a file (use the current one as a template) that allows scores to be calculated while excluding regions based on phenotype (for the time being, APOE for Alzheimer’s). If a mapping between phenos and regions is specified, then for each pheno an extra score labeled no_regions is produced. The phenotype is passed in the gwas_meta table under pheno. The two strings need to match.

Outputs

The pipeline will produce the following files:

out_scores : scores for FinnGen samples
out_weights : weights in chrompos format
out_logs : all CS-PRS logs for the weights
