Using Polygenic Risk Scores
Large-scale genetic association studies comparing disease cases with controls have identified thousands of genetic loci associated with various diseases. Similar studies have been done for traits such as height, lipid levels, and educational attainment. Individually, the detected loci typically modify disease risk only minimally, but their cumulative impact across the genome can be considerable.
Polygenic risk scores (PRSs) measure this cumulative genetic burden, and their effect on risk stratification and risk prediction has been shown for many diseases and traits.
Several approaches exist for developing PRS. The simplest methods linearly sum the contribution of each regional peak, focusing only on variants with low p-values. Newer methods take a more sophisticated approach that accounts for the whole genomic structure () and assigns a weight to each variant, increasing the weight of the most significant contributions and reducing the weight of irrelevant and statistically correlated signals to 0 (Figure 1). Once the weights have been calculated, each individual's score in the target population is obtained by summing the weights of the alleles that individual carries across the genome. Details on the statistical modeling underlying PRS generation can be found in .
Figure 1: General principle of newer methods used for building PRS
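The summation step described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by FinnGen: the function name `polygenic_score`, the toy weights, and the individual labels are all hypothetical, and genotypes are assumed to be coded as the number of effect alleles carried (0, 1, or 2) at each variant.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele counts for one individual.

    dosages: effect-allele counts (0, 1, or 2) per variant (assumed coding)
    weights: per-variant weights; a weight of 0 removes a variant from the score
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must cover the same variants")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical weights for four variants; the second has been reduced to 0.
weights = [0.5, 0.0, -0.25, 1.0]

# Hypothetical genotypes for three individuals.
cohort = {
    "ind_a": [0, 1, 2, 1],
    "ind_b": [2, 0, 1, 0],
    "ind_c": [1, 1, 0, 2],
}

scores = {name: polygenic_score(d, weights) for name, d in cohort.items()}

# The raw values carry no meaning on their own; only the ordering matters.
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranking)
```

In practice the weights come from a PRS method applied to GWAS summary statistics, and the sum runs over far more variants than shown here.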
Ultimately, all PRS algorithms produce, for each individual, a score that is meaningless by itself, but that allows us to rank the individuals in terms of relative risk to each other. Common ways to present PRS effects include:
scaling the PRS to a mean of zero and a standard deviation of one, which allows effect sizes to be reported per one standard deviation increase in the PRS. Individual PRS values can then be interpreted in much the same way as the growth charts familiar to clinicians; for instance, we can say that “an individual has a PRS of +2.0 SD”.
categorizing individuals into groups based on levels of PRS. No widely accepted categories exist, but some commonly used categories include quintiles or a comparison between individuals above the 90th percentile vs the rest of the distribution.
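Both presentations can be sketched with the Python standard library. This is an illustrative example under stated assumptions: the function names and raw scores are hypothetical, the scores are standardized against the cohort itself using the population standard deviation, and groups are assigned by rank within the cohort (other percentile conventions exist).

```python
from statistics import mean, pstdev


def standardize(scores):
    """Rescale scores to mean 0 and standard deviation 1 within the cohort."""
    mu = mean(scores)
    sd = pstdev(scores)  # population SD; an assumption of this sketch
    if sd == 0:
        raise ValueError("all scores are identical; cannot standardize")
    return [(s - mu) / sd for s in scores]


def quintiles(scores):
    """Assign each individual to a quintile (1 = lowest, 5 = highest) by rank."""
    n = len(scores)
    groups = []
    for s in scores:
        lower = sum(1 for other in scores if other < s)  # individuals scoring lower
        groups.append((5 * lower) // n + 1)
    return groups


def above_90th_percentile(scores):
    """Flag individuals for whom at least 90% of the cohort scores lower."""
    n = len(scores)
    return [sum(1 for other in scores if other < s) >= 0.9 * n for s in scores]


# Hypothetical raw PRS values for ten individuals.
raw = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]

z = standardize(raw)
groups = quintiles(raw)
high_risk = above_90th_percentile(raw)

for score, zi, g, flag in zip(raw, z, groups, high_risk):
    print(f"raw={score:5.1f}  z={zi:+.2f} SD  quintile={g}  top 10%={flag}")
```

When scores are compared across studies, the mean and standard deviation are often taken from a reference population rather than from the cohort being scored; the sketch above does not cover that case.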
A reporting framework for PRS studies can be found at:
By summarizing common genetic effects across the genome, we capture germline genetic susceptibility to disease into a single measure, the PRS. The PRS can be used for several types of analyses, from understanding biological processes underlying diseases, to estimating their potential role as clinical tools for risk stratification and targeting individuals for risk mitigation. Examples of PRS use cases can be found from .
With improved and larger GWAS, PRS computations will continue to improve. Moreover, the methodologies used for generating PRS are constantly improving. An important limitation of PRS is that the majority of the research has been performed in individuals of European ancestry (), and an important goal for the field is to improve the diversity of PRS studies, including the development of methods that allow PRS modeling in individuals of admixed ancestry.
In FinnGen, we provide the community with a large number of PRS that have already been calculated and are ready for use. Custom PRS for diseases and traits of interest can also be generated with the . The current method used by FinnGen for generating PRS is: .
. [A review describing the concept of genetic architecture, which is relevant for understanding and applying PRSs.]
. [A summary of the methodologies used for building, evaluating and applying risk prediction models that include information from genetic testing and environmental risk factors. New methods for building PRSs have been developed after this article was published, but the review is a great summary of general methodology and terminology used in PRS studies.]
. Nat. Rev. Genet. 19, 581–590 (2018). [This review lays out the principles of PRS for clinical risk stratification in common diseases.]