GWAS Analysis
Genome-wide association studies, or GWAS, are one of the most common ways to test for statistical associations between genetic variants and traits. In a case-control setting, a GWAS statistically tests whether a genetic variant occurs more frequently in cases than in controls. A FinnGen GWAS looks at millions of variants across the whole genome.
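As a rough illustration of the case-control comparison described above, the sketch below runs an allelic chi-square test for a single variant. The allele counts are invented for the example and the function name `allelic_chi2` is hypothetical. It is a simplified stand-in, not the method used for FinnGen results, and in practice this kind of test would be repeated for every variant.

```python
import math

def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if allele frequency were equal in cases and controls
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, the upper tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Invented counts: 1000 cases and 1000 controls, so 2000 alleles per group.
chi2, p = allelic_chi2(case_alt=460, case_ref=1540, ctrl_alt=400, ctrl_ref=1600)
print(f"chi2 = {chi2:.2f}, p = {p:.3g}")
```

A small p-value here would suggest that the variant allele is more (or less) common in cases than in controls for this one variant.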
These studies typically estimate the effect of each variant across the genome on a phenotype of interest, along with the statistical significance of that effect. The steps involved in conducting a GWAS are shown diagrammatically in an excellent paper that we recommend reading by :
One of the simplest ways to model GWAS data is with a linear model:
y = µ + xβ + ε, where:
● y is the phenotype
● x is the genotype, coded as 0, 1, or 2:
0 meaning the individual has no copies of the variant allele (homozygous reference),
1 meaning they are heterozygous, and
2 meaning they are homozygous for the variant
● µ is the mean phenotype value of individuals with no copies of the variant
● β is the effect each copy of the variant has on the mean phenotype
● ε is a normally distributed error term (a reasonable approximation for many biological traits).
If you have all of these, running a basic GWAS and getting a p-value in R, a commonly used statistical programming tool, is as simple as running:
This will output information about your dataset. For more information about R, see the section.
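For readers who want to see what such a fit computes, the following Python sketch fits the linear model y = µ + xβ + ε to a single simulated variant by ordinary least squares. The values chosen for µ, β, the allele frequency, and the sample size are invented, and the function name `fit_additive_model` is hypothetical. The p-value uses a normal approximation to the t distribution, which is an assumption that is only reasonable for large samples.

```python
import math
import random
from statistics import NormalDist, mean

def fit_additive_model(genotypes, phenotypes):
    """Least-squares fit of y = mu + x*beta + eps for one variant."""
    n = len(genotypes)
    x_bar, y_bar = mean(genotypes), mean(phenotypes)
    sxx = sum((x - x_bar) ** 2 for x in genotypes)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(genotypes, phenotypes))
    beta = sxy / sxx               # effect per copy of the variant
    mu = y_bar - beta * x_bar      # mean phenotype at genotype 0
    # Residual variance estimated with n - 2 degrees of freedom
    rss = sum((y - mu - beta * x) ** 2 for x, y in zip(genotypes, phenotypes))
    se_beta = math.sqrt(rss / (n - 2) / sxx)
    z = beta / se_beta
    # Assumption: n is large, so the test statistic is treated as standard normal.
    p_value = 2 * NormalDist().cdf(-abs(z))
    return mu, beta, se_beta, p_value

# Simulated data (all parameter values are invented for illustration).
random.seed(1)
true_mu, true_beta, allele_freq = 10.0, 0.5, 0.3
# Genotype = number of variant copies across an individual's two chromosomes.
x = [sum(random.random() < allele_freq for _ in range(2)) for _ in range(5000)]
y = [true_mu + g * true_beta + random.gauss(0, 1) for g in x]

mu_hat, beta_hat, se, p = fit_additive_model(x, y)
print(f"mu = {mu_hat:.3f}, beta = {beta_hat:.3f}, se = {se:.4f}, p = {p:.3g}")
```

The estimates of µ and β should land close to the simulated values, and a full GWAS would repeat a fit of this kind for each variant in turn.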
Note: GWAS results for all available endpoints/phenotypes from FinnGen are available in .
, the National Human Genome Research Institute, USA.
 at the University of Helsinki provides an excellent introduction to the topic for those interested in a more in-depth look at running their own GWAS, and much of the background here was sourced from his which are free to use.