How to run PRS pipeline
This pipeline calculates polygenic risk scores (PRS) for FinnGen individuals from external summary statistics. More information about the pipeline is available on its GitHub page. There are two pipelines in SB:
unmodifiable pipeline for weights
modifiable pipeline for scores (and weights)
Input table
Both pipelines require an input table containing the metadata for the munging step and the weight calculation. Munging handles (almost) all possible scenarios in terms of SNP id format (rsid, chrom_pos_ref_alt, chrom:pos) and genome build, in order to generate weights that can be used with FinnGen data.
One can use the default parameter of both pipelines as a template. Here is an example of how the file should look:
The metadata file needs to be tab-separated, with no spaces in any field (Cromwell specs). We recommend taking the default file referenced in the json, opening it in Excel, inserting your own data and exporting it as tsv, including the header row!
Mandatory fields are in bold in the sample above. Others can be filled with a placeholder such as NA (but no empty fields!).
filename: the basename of the file (the path to it is specified in another json field)
pheno: used for region exclusion in scores (see the scores pipeline inputs for more info)
n_total: the sample size
variant: if provided, the script will try to extract information from the SNPID, either by using the rsid (if present) or by extracting the first two integers (ideally chrom and pos) from the string
chrom & pos: used to define the variant as CHROM_POS_REF_ALT if no RSID is found in the variant field
effect: should be either BETA or OR
pval: the name of the p-value column
Each GWAS summary statistics file (referred to in the ‘filename’ column) needs to be gzipped and have a .gz extension
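The variant-parsing rule described above can be sketched as follows. This is only an illustration under stated assumptions, not the pipeline's actual munging code: the function name `parse_variant` and both regular expressions are hypothetical, and the real script may handle more cases.

```python
import re

# Hypothetical patterns; the real munging script may use different ones.
RSID_RE = re.compile(r"rs\d+", re.IGNORECASE)
INT_RE = re.compile(r"\d+")


def parse_variant(variant):
    """Return ("rsid", id), ("chrompos", (chrom, pos)), or None if neither is found."""
    rsid = RSID_RE.search(variant)
    if rsid:
        return ("rsid", rsid.group(0).lower())
    # No rsid: fall back to the first two integers, ideally chrom and pos.
    numbers = INT_RE.findall(variant)
    if len(numbers) >= 2:
        return ("chrompos", (int(numbers[0]), int(numbers[1])))
    return None


for example in ["rs12345", "1_12345_A_G", "chr1:12345", "X:12345"]:
    print(example, "->", parse_variant(example))
```

In this sketch an id such as `X:12345` contains only one integer and is therefore not resolved, which shows why the "first two integers" fallback works best when both chrom and pos are numeric.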
In the WDL, the tsv is reduced to only a subset of required fields. You can also check manually/locally if it runs properly by running the following command (all on one line):
cat TABLE_ABOVE.tsv| sed -E 1d | cut -f 1,3,8-17 > sumstats.txt
Please check that all fields are present (with NA where a value is missing) and that they match the expected output.
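For a quick local check without the shell, the sketch below mimics `sed 1d | cut -f 1,3,8-17` in Python and also flags empty fields and fields containing spaces. It is an illustrative helper rather than part of the pipeline: the function name `reduce_metadata` is hypothetical, the toy table is made up, and the column positions are simply taken from the command above.

```python
def reduce_metadata(lines):
    """Mimic `sed 1d | cut -f 1,3,8-17` and report empty or space-containing fields."""
    keep = [0, 2] + list(range(7, 17))  # zero-based indices of columns 1, 3, 8-17
    rows, problems = [], []
    for line_no, line in enumerate(lines[1:], start=2):  # lines[0] is the header
        fields = line.rstrip("\n").split("\t")
        for col_no, value in enumerate(fields, start=1):
            if value == "":
                problems.append(f"line {line_no}, column {col_no}: empty field")
            elif " " in value:
                problems.append(f"line {line_no}, column {col_no}: contains a space")
        if len(fields) < 17:
            problems.append(f"line {line_no}: only {len(fields)} columns, expected at least 17")
            continue
        rows.append([fields[i] for i in keep])
    return rows, problems


# Toy table with 17 columns: one clean row and one row with an empty third field.
header = "\t".join(f"col{i}" for i in range(1, 18))
clean = "\t".join(f"v{i}" for i in range(1, 18))
broken = "\t".join("" if i == 3 else f"v{i}" for i in range(1, 18))

rows, problems = reduce_metadata([header, clean, broken])
print(len(rows), "rows kept")
for problem in problems:
    print(problem)
```

To run it on a real table, pass the lines of the tsv file (for example from `open(...).readlines()`) instead of the toy rows.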
Unmodifiable pipeline for weights only
This pipeline generates weights (in both rsid and chrompos format) and automatically exports them to the green library, as they contain no personal data.
Inputs
The required inputs are:
sandbox_prs_weights.prefix: prefix of outputs
sandbox_prs_weights.ss_meta : the metadata file mentioned above
sandbox_prs_weights.munge.ss_data_path: path where the sumstats (filename) are located
sandbox_prs_weights.weights.bim_file: bim file (by default FinnGen hm3 bim) that can be used to force CS-PRS to work with a selection of snps. The final snplist will be the intersection between sumstats snps, FinnGen hm3 snps and the ones in the bim file
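The intersection rule for the final snplist can be pictured with plain sets. The sketch below is an illustration only, not the CS-PRS code: the helper `bim_snp_ids` and the toy SNP ids are made up, and it relies only on the standard PLINK .bim layout, in which the second whitespace-separated column holds the variant id.

```python
def bim_snp_ids(bim_lines):
    """Collect variant ids from .bim lines (column 2 of the PLINK .bim format)."""
    return {line.split()[1] for line in bim_lines if line.strip()}


# Toy data for illustration only.
sumstats_snps = {"rs1", "rs2", "rs3", "rs4"}
finngen_hm3_snps = {"rs2", "rs3", "rs4", "rs5"}
custom_bim = [
    "1\trs3\t0\t1000\tA\tG",
    "1\trs4\t0\t2000\tC\tT",
    "1\trs9\t0\t3000\tG\tA",
]

# A SNP is kept only if it appears in all three sources.
final_snplist = sumstats_snps & finngen_hm3_snps & bim_snp_ids(custom_bim)
print(sorted(final_snplist))
```

Because the result is an intersection, a custom bim file can only narrow the snplist; it cannot add SNPs that are missing from the sumstats or from the FinnGen hm3 set.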
Outputs
The pipeline will produce the following files:
rsid_weights: snp weights in rsid format
chrompos_weights: snp weights in chrompos format
logs: all CS-PRS logs for the weights
Modifiable pipeline for scores
This pipeline will generate weights *and* scores, but the results will require permission to be exported.
Inputs
The required inputs are:
sandbox_prs.gwas_data_path : as above, the path where the sumstats are located
sandbox_prs.gwas_meta : the metadata file mentioned above
sandbox_prs.scores.bed_file : bed file for scores (default FinnGen hm3 bed)
sandbox_prs.scores.regions : a file (use the current one as a template) that allows one to calculate scores excluding regions based on phenotypes (for the time being, APOE for Alzheimer’s). If one specifies a mapping between phenos and regions, an extra score labeled no_regions is produced for each pheno. The phenotype is passed in the gwas_meta table under pheno. The two strings need to match.
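As a rough sketch of how these inputs fit together, the snippet below assembles the four keys into a JSON document. Only the key names come from the list above; every value is a hypothetical placeholder (none of the paths is a real FinnGen location) and must be replaced with your own.

```python
import json

# Placeholder values: replace every path with a real one before use.
inputs = {
    "sandbox_prs.gwas_data_path": "/path/to/sumstats/",
    "sandbox_prs.gwas_meta": "/path/to/metadata.tsv",
    "sandbox_prs.scores.bed_file": "/path/to/bed_file",
    "sandbox_prs.scores.regions": "/path/to/regions_file",
}

inputs_json = json.dumps(inputs, indent=2)
print(inputs_json)
```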
Outputs
The pipeline will produce the following files:
out_scores: scores for FinnGen samples
out_weights: weights in chrompos format
out_logs: all CS-PRS logs for the weights