How to run the PRS pipeline

This pipeline calculates polygenic risk scores (PRS) for FinnGen individuals from external summary statistics. More information about the pipeline is available on its GitHub page. Two pipelines are available in the Sandbox (SB):

  • unmodifiable pipeline for weights

  • modifiable pipeline for scores (and weights)

Input table

Both pipelines require an input table containing the metadata for the munging step and the weight calculation. Munging handles (almost) all possible scenarios in terms of SNP ID format (rsid, chrom_pos_ref_alt, chrom:pos) and genome build, in order to generate weights that can be used with FinnGen data.

The default value of this parameter in both pipelines can be used as a template. Here is an example of how the file should look:

  • The metadata file needs to be tab-separated, with no spaces in any field (Cromwell specs). We recommend taking the default file referenced in the JSON, opening it in Excel, inserting your own data, and exporting it as TSV, including the header row.

  • Mandatory fields are in bold in the sample above. Other fields can be filled with a placeholder such as NA, but no field may be left empty.

  • filename is the basename of the file (the path to it is specified in another json field)

  • pheno is used for region exclusion in scores (see scores pipeline inputs for more info)

  • n_total is the sample size

  • if variant is provided, the script will try to extract information from the SNP ID, either by using the rsid (if present) or by extracting the first two integers (ideally chrom and pos) from the string

  • chrom & pos will be used to define the variant as CHROM_POS_REF_ALT if no RSID is found in the variant field

  • effect should be either BETA or OR

  • pval is the name of the p-value column in the summary statistics file
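
The fallback described for the variant field (use the rsid if present, otherwise take the first two integers as chrom and pos) can be illustrated with a short shell sketch. This is not the pipeline's munging code: the function name parse_variant and the regular expressions are assumptions made for illustration, and the example IDs are made up.

```shell
#!/usr/bin/env bash
# Illustrative only: mimics the documented fallback, not the actual munging script.
# parse_variant is a hypothetical helper name.
parse_variant() {
  local id="$1"
  local rsid
  # Prefer an rsid if one is present anywhere in the string.
  rsid=$(printf '%s\n' "$id" | grep -oE 'rs[0-9]+' | head -n 1)
  if [ -n "$rsid" ]; then
    printf '%s\n' "$rsid"
    return 0
  fi
  # Otherwise take the first two integers as chrom and pos.
  printf '%s\n' "$id" | grep -oE '[0-9]+' | head -n 2 | paste -sd ' ' -
}

parse_variant "rs12345"          # prints: rs12345
parse_variant "chr1_12345_A_G"   # prints: 1 12345
parse_variant "1:12345"          # prints: 1 12345
```

Note that a purely integer-based fallback like this one would not capture non-numeric chromosome labels such as X; how the real script treats those cases is not covered here.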

The GWAS summary statistics file (referred to in the filename column) needs to be gzipped and have a .gz extension.
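
The formatting rules above (tab-separated, no spaces, no empty fields, gzipped sumstats) can be checked locally before submitting a run. The following is a minimal sketch, not part of the pipeline: check_meta and check_gz are hypothetical helper names, and the toy tables use placeholder column names rather than the real header.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check a metadata TSV and a sumstats file before a run.
# check_meta and check_gz are hypothetical helpers, not pipeline code.
check_meta() {
  awk -F'\t' '
    NR == 1    { ncol = NF }   # the header row defines the expected column count
    NF != ncol { printf "line %d: expected %d fields, found %d\n", NR, ncol, NF; bad = 1 }
    / /        { printf "line %d: contains a space\n", NR; bad = 1 }
    { for (i = 1; i <= NF; i++) if ($i == "") { printf "line %d: field %d is empty\n", NR, i; bad = 1 } }
    END { exit bad + 0 }
  ' "$1"
}

check_gz() {
  case "$1" in
    *.gz) gzip -t "$1" ;;   # also verifies the file really is gzip-compressed
    *)    echo "$1: missing .gz extension"; return 1 ;;
  esac
}

# Toy tables with placeholder columns a, b, c.
tmp=$(mktemp -d)
printf 'a\tb\tc\nx\tNA\tz\n' > "$tmp/good.tsv"
printf 'a\tb\tc\nx\t\tz z\n' > "$tmp/bad.tsv"
check_meta "$tmp/good.tsv" && echo "good.tsv: OK"
check_meta "$tmp/bad.tsv"  || echo "bad.tsv: needs fixing"
```

In practice one would call check_meta on the exported metadata file and check_gz on each file named in the filename column.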

In the WDL, the TSV is reduced to the subset of required fields. You can check locally that this step works on your file by running the following command (all on one line):

sed 1d TABLE_ABOVE.tsv | cut -f 1,3,8-17 > sumstats.txt

Please check that all fields are present (filled with NA where applicable) and that they match the expected output.
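
Since the field list 1,3,8-17 selects 12 columns, one way to carry out this check is to confirm that every row of the reduced file has 12 non-empty fields. The sketch below runs the same sed and cut steps on a toy 17-column table; the column names c1..c17 and values v1..v17 are placeholders, not the real header.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the reduction on a toy 17-column table and verify the result.
tmp=$(mktemp -d)

# Toy header plus one data row; paste -s joins the lines with tabs.
{
  seq -f 'c%g' 1 17 | paste -s -
  seq -f 'v%g' 1 17 | paste -s -
} > "$tmp/meta.tsv"

# Drop the header and keep columns 1, 3 and 8 to 17.
sed -E 1d "$tmp/meta.tsv" | cut -f 1,3,8-17 > "$tmp/sumstats.txt"

# Every remaining row should have exactly 12 non-empty fields.
awk -F'\t' '
  NF != 12 { print "line " NR ": " NF " fields"; bad = 1 }
  { for (i = 1; i <= NF; i++) if ($i == "") { print "line " NR ": empty field " i; bad = 1 } }
  END { exit bad + 0 }
' "$tmp/sumstats.txt" && echo "sumstats.txt looks consistent"
```

To check a real table, point the awk command at the sumstats.txt produced from your own metadata file.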

Unmodifiable pipeline for weights only

This pipeline generates weights (in both rsid and chrompos format) and automatically exports them to the green library, since they contain no personal data.

Inputs

The required inputs are:

  • sandbox_prs_weights.prefix: prefix of outputs

  • sandbox_prs_weights.ss_meta: the metadata file mentioned above

  • sandbox_prs_weights.munge.ss_data_path: path where the sumstats (filename) are located

  • sandbox_prs_weights.weights.bim_file: bim file (by default the FinnGen hm3 bim) that can be used to force CS-PRS to work with a selection of SNPs. The final SNP list will be the intersection of the sumstats SNPs, the FinnGen hm3 SNPs and the SNPs in the bim file.
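
As a rough illustration, the four inputs listed above could be collected in a Cromwell inputs JSON along these lines. Only the key names come from this page; every value, and the output file name weights_inputs.json, is a placeholder to be replaced with your own prefix and paths.

```shell
#!/usr/bin/env bash
# Sketch: write an inputs JSON for the weights pipeline.
# All values are placeholders; only the key names come from the list above.
cat > weights_inputs.json <<'EOF'
{
  "sandbox_prs_weights.prefix": "my_prs",
  "sandbox_prs_weights.ss_meta": "/path/to/meta.tsv",
  "sandbox_prs_weights.munge.ss_data_path": "/path/to/sumstats_dir",
  "sandbox_prs_weights.weights.bim_file": "/path/to/custom.bim"
}
EOF

echo "wrote weights_inputs.json"
```

If the default FinnGen hm3 bim is sufficient, the bim_file entry can presumably be left at its default value.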

Outputs

The pipeline will produce the following files:

  • rsid_weights : snp weights in rsid format

  • chrompos_weights : snp weights in chrompos format

  • logs : all CS-PRS logs for the weights

Modifiable pipeline for scores

This pipeline will generate weights *and* scores, but the results will require permission to be exported.

Inputs

The required inputs are:

  • sandbox_prs.gwas_data_path : as above, the path where the sumstats are located

  • sandbox_prs.gwas_meta : the metadata file mentioned above

  • sandbox_prs.scores.bed_file : bed file for scores (default FinnGen hm3 bed)

  • sandbox_prs.scores.regions

regions is a file (use the current one as a template) that allows scores to be calculated with specific regions excluded, based on phenotype (for the time being, APOE for Alzheimer’s). If a mapping between phenos and regions is specified, an extra score labeled no_regions is produced for each mapped pheno. The phenotype is taken from the pheno column of the gwas_meta table; the two strings need to match.

Outputs

The pipeline will produce the following files:

  • out_scores : scores for FinnGen samples

  • out_weights : weights in chrompos format

  • out_logs : all CS-PRS logs for the weights
