# GWAS Analysis

Genome-wide association studies (GWAS) are one of the most common ways to test for statistical associations between genetic variants and traits. For a disease trait, a GWAS tests statistically whether a genetic variant occurs more frequently in cases than in controls. A FinnGen GWAS looks at millions of variants across the whole genome.
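
To make the case-control comparison concrete, the sketch below counts variant and reference alleles in cases and controls for a single variant and tests whether the allele frequencies differ. The counts are invented for illustration, and a chi-squared test on allele counts is just one simple option assumed here, not necessarily the method used in FinnGen analyses.

```r
# Hypothetical allele counts for one variant (each person carries two alleles)
allele_counts <- matrix(
  c(320, 680,     # cases:    variant, reference
    250, 750),    # controls: variant, reference
  nrow = 2, byrow = TRUE,
  dimnames = list(group  = c("cases", "controls"),
                  allele = c("variant", "reference"))
)

# Variant allele frequency in each group
allele_freq <- allele_counts[, "variant"] / rowSums(allele_counts)
print(allele_freq)

# Chi-squared test of independence between group and allele
test <- chisq.test(allele_counts)
print(test$p.value)
```

A small P-value suggests the variant allele is not equally common in cases and controls. In a GWAS, a test of this kind is repeated for every variant.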

A GWAS typically estimates the effect of each variant across the genome on a phenotype of interest, together with the statistical significance of that effect. The steps for conducting a GWAS are shown in the diagram below, adapted from [Uffelmann et al (2021) *Nature Reviews Methods Primers*](https://www.nature.com/articles/s43586-021-00056-9), an excellent paper that we recommend reading:

![Adapted from Uffelmann et al (2021) Nature Reviews Methods Primers](https://3072695768-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MhYL0UTLjqsuIdK0SSO%2Fuploads%2Fgit-blob-95484ee8818d3c1a160c63ca402e3e1e8c22956f%2Fimage%20\(777\).png?alt=media)

One of the simplest ways to model GWAS data is with a linear model:

y = µ + xβ + ε, where:

* *y* is the phenotype
* *x* is the genotype, coded as 0, 1, or 2:
  * 0 means the individual has no copies of the variant allele (homozygous reference),
  * 1 means they are heterozygous, and
  * 2 means they are homozygous for the variant
* µ is the mean phenotype of individuals with no copies of the variant
* β is the effect each copy of the variant has on the mean phenotype
* ε is a normally distributed error term (a good approximation for most biological data).

If you have the phenotype and genotype data, running a basic single-variant association test and getting a P-value in R, a commonly used statistical programming tool, is as simple as running:

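```r
# Fit the linear model y = µ + xβ + ε by least squares
lm.fit <- lm(y ~ x)
# Print the coefficient estimates, standard errors and P-values
summary(lm.fit)
```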

This prints a summary of the fitted model, including the estimated effect of the variant (β), its standard error, and its P-value. For more information about R, see the [Getting Started with R](https://docs.finngen.fi/background-reading/how-to-get-started-with-r) section.
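
As a minimal sketch of how this works end to end, the example below simulates a genotype and a phenotype under the linear model above, fits the model, and pulls the effect estimate and P-value out of the result. The sample size, allele frequency, and true values of µ and β are made-up assumptions chosen only for illustration; they are not FinnGen data.

```r
set.seed(1)    # make the simulation reproducible

n    <- 1000   # assumed number of individuals
maf  <- 0.3    # assumed frequency of the variant allele
mu   <- 10     # assumed mean phenotype with 0 copies of the variant
beta <- 0.5    # assumed effect per copy of the variant

# Genotypes coded 0/1/2: number of variant alleles out of two
x <- rbinom(n, size = 2, prob = maf)

# Phenotype from y = mu + x * beta + normally distributed error
y <- mu + x * beta + rnorm(n, mean = 0, sd = 1)

lm.fit <- lm(y ~ x)

# Coefficient table: rows "(Intercept)" and "x",
# columns "Estimate", "Std. Error", "t value", "Pr(>|t|)"
coefs <- summary(lm.fit)$coefficients

beta_hat <- coefs["x", "Estimate"]   # estimated effect per allele copy
p_value  <- coefs["x", "Pr(>|t|)"]   # P-value for the test of beta = 0

print(c(beta_hat = beta_hat, p_value = p_value))
```

With these settings the estimate should land close to the assumed β, and the P-value should be very small. In a real GWAS the same kind of test is repeated for every variant in the genome.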

**Note**: GWAS results for all available endpoints/phenotypes from FinnGen are available in [FinnGen PheWeb](https://results.finngen.fi).

#### Additional reading

* [Uffelmann et al (2021) *Nature Reviews Methods Primers*](https://www.nature.com/articles/s43586-021-00056-9).
* [GWAS Primer](https://www.genome.gov/about-genomics/fact-sheets/Genome-Wide-Association-Studies-Fact-Sheet), the National Human Genome Research Institute, USA.
* [Matti Pirinen’s Genome-wide Association Studies course (course code: LSI34002)](https://www.mv.helsinki.fi/home/mjxpirin/GWAS_course/) at the University of Helsinki is an excellent introduction for those interested in a more in-depth look at running their own GWAS. Much of the background here was sourced from his [course notes](https://www.mv.helsinki.fi/home/mjxpirin/GWAS_course/material/GWAS1.pdf), which are free to use.

[Click here to read more about how you can run GWAS in Sandbox using FinnGen data](https://docs.finngen.fi/working-in-the-sandbox/which-tools-are-available/untitled)
