GWAS Analysis

Genome-wide association studies, or GWAS, are one of the most common ways to test for statistical associations between genetic variants and a trait. In a case-control setting, a GWAS statistically tests whether a genetic variant occurs more frequently in cases than in controls. A FinnGen GWAS looks at millions of variants across the whole genome.
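For a single variant, the case-versus-control comparison can be illustrated with an allelic chi-square test on a 2x2 table of allele counts. The block below is only a minimal sketch in Python: the function name `allelic_chi_square` and all counts are hypothetical, and it is not the method used to produce FinnGen results.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table."""
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + control_alt, case_ref + control_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if allele and case status were independent
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, the chi-square upper tail equals erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts, invented for illustration only
chi2, p_value = allelic_chi_square(
    case_alt=300, case_ref=700, control_alt=200, control_ref=800
)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}")
```

A GWAS repeats a test of this kind for every variant, which is why the number of tests runs into the millions.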

These studies typically estimate the effect of each variant across the genome on a phenotype of interest, together with the statistical significance of that effect. The steps for conducting a GWAS are shown diagrammatically in an excellent paper that we recommend reading: Uffelmann et al. (2021), Nature Reviews Methods Primers.

Figure: Steps for conducting a GWAS. Adapted from Uffelmann et al. (2021), Nature Reviews Methods Primers.

One of the simplest ways to model GWAS data is with a linear model:

y = µ + xβ + ε, where:

● y is the phenotype

● x is the genotype, coded as either 0, 1, or 2:

  • 0 meaning the individual has no copies of the variant allele (homozygous reference),

  • 1 meaning they are heterozygous (one copy), and

  • 2 meaning they are homozygous for the variant (two copies)

● µ is the mean phenotype value of individuals with no copies of the variant

● β is the effect each copy of the variant has on the mean phenotype

● ε is a normally distributed error term (a reasonable approximation for most biological data).
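The genotype coding and the roles of µ and β can be sketched in a few lines of Python. This is an illustration only: the helper names `genotype_dosage` and `expected_phenotype`, the allele labels, and the values of µ and β are all hypothetical.

```python
def genotype_dosage(allele_1, allele_2, variant_allele):
    """Count copies of the variant allele carried by one individual: 0, 1, or 2."""
    return int(allele_1 == variant_allele) + int(allele_2 == variant_allele)


def expected_phenotype(x, mu, beta):
    """Mean phenotype under y = mu + x * beta, leaving out the error term."""
    return mu + x * beta


# Hypothetical parameter values chosen only for illustration
mu, beta = 170.0, 1.5

# Homozygous reference, heterozygous, and homozygous variant, with "G" as the variant allele
for alleles in [("A", "A"), ("A", "G"), ("G", "G")]:
    x = genotype_dosage(alleles[0], alleles[1], variant_allele="G")
    print(alleles, "x =", x, "expected y =", expected_phenotype(x, mu, beta))
```

Each additional copy of the variant shifts the expected phenotype by β, which is what makes this an additive model.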

If you have all of these, running a basic GWAS test and getting a P-value in R, a commonly used statistical programming tool, is as simple as fitting this linear model, for example with R's built-in lm() function and printing its summary.

This will output information about your dataset. For more information about R, see the Getting Started with R section.
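To show what such a fit computes, the block below is a minimal sketch in Python (not R) that simulates data from the model y = µ + xβ + ε and estimates µ, β, and a P-value for β by ordinary least squares. Everything in it is an assumption made for illustration: the function name `simple_linear_regression`, the simulated sample size, the allele frequency, and the true parameter values. The P-value uses a normal approximation to the test statistic, which is reasonable only for large samples.

```python
import math
import random


def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = mu + x * beta with one predictor."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

    beta = sxy / sxx
    mu = y_mean - beta * x_mean

    # Standard error of beta from the residual variance
    residual_ss = sum((yi - (mu + beta * xi)) ** 2 for xi, yi in zip(x, y))
    se_beta = math.sqrt(residual_ss / (n - 2) / sxx)

    # Two-sided P-value under a normal approximation (an assumption; fine for large n)
    z = beta / se_beta
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return mu, beta, se_beta, p_value


# Simulated data with hypothetical parameters, for illustration only
random.seed(1)
true_mu, true_beta, allele_freq = 10.0, 0.5, 0.3
x = [sum(random.random() < allele_freq for _ in range(2)) for _ in range(5000)]
y = [true_mu + xi * true_beta + random.gauss(0, 1) for xi in x]

mu_hat, beta_hat, se_beta, p_value = simple_linear_regression(x, y)
print(f"mu = {mu_hat:.3f}, beta = {beta_hat:.3f}, se = {se_beta:.3f}, p = {p_value:.2e}")
```

R's lm() reports a P-value based on the t distribution rather than the normal approximation used here; at this sample size the two are very close.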

Note: GWAS results for all available endpoints/phenotypes from FinnGen are available in FinnGen PheWeb.

Additional reading

Click here to read more about how you can run GWAS in Sandbox using FinnGen data
