# GWAS Analysis

Genome-wide association studies, or GWAS, are one of the most common ways to test for statistical associations between genetic variants and a phenotype. A GWAS statistically tests whether a genetic variant occurs more frequently in cases than in controls. A FinnGen GWAS looks at millions of variants across the whole genome.

These studies typically estimate the effect of each variant across the genome on a phenotype of interest, together with the statistical significance of that effect. The steps to conduct a GWAS are shown in the diagram below, adapted from an excellent paper that we recommend reading, [Uffelmann et al. (2021) *Nature Reviews Methods Primers*](https://www.nature.com/articles/s43586-021-00056-9):

![Adapted from Uffelmann et al (2021) Nature Reviews Methods Primers](/files/aOjjD015k5vtdOFlR4pv)

One of the simplest ways to model GWAS data is with a linear model:

*y* = µ + *x*β + ε, where:

* *y* is the phenotype
* *x* is the genotype, coded as 0, 1, or 2:
  * 0 meaning the individual is homozygous reference (no copies of the variant allele),
  * 1 meaning they are heterozygous (one copy), and
  * 2 meaning they are homozygous variant (two copies)
* µ is the mean phenotype of individuals without the variant
* β is the effect each copy of the variant has on the mean phenotype
* ε is a normally distributed error term (a reasonable approximation for many biological traits).
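As a short worked reading of the model, taking the expected phenotype for each genotype value shows what β means. This assumes the additive 0/1/2 coding above and an error term with mean zero:

$$
\begin{aligned}
E[y \mid x = 0] &= \mu \\
E[y \mid x = 1] &= \mu + \beta \\
E[y \mid x = 2] &= \mu + 2\beta
\end{aligned}
$$

Under this model, testing a variant for association amounts to testing the null hypothesis β = 0.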

If you have phenotype and genotype data, fitting this model for a single variant and getting a P-value in R, a commonly used statistical programming tool, is as simple as running:

```
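# Regress the phenotype y on the genotype x (coded 0, 1, or 2)
fit <- lm(y ~ x)
# The row for x reports the estimate of β and its P-value
summary(fit)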
```

This prints a summary of the fitted model, including the estimate of β, its standard error, and the P-value for *x*. For more information about R, see the [Getting Started with R](/background-reading/how-to-get-started-with-r.md) section.
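As a minimal sketch of the model in practice, the following R code simulates genotypes and a phenotype for one variant, fits the linear model, and extracts the effect estimate and P-value. The sample size, allele frequency, and true values of µ and β are arbitrary assumptions chosen for illustration; none of this is FinnGen data.

```r
# Simulated example: all values below are made up for illustration.
set.seed(1)

n    <- 1000   # number of individuals (hypothetical)
maf  <- 0.3    # frequency of the variant allele (hypothetical)
mu   <- 10     # mean phenotype of individuals without the variant
beta <- 0.5    # true effect of each copy of the variant

# Genotypes coded 0, 1, or 2: number of copies of the variant allele
x <- rbinom(n, size = 2, prob = maf)

# Phenotype generated from y = mu + x * beta + normally distributed error
y <- mu + x * beta + rnorm(n, mean = 0, sd = 1)

# Fit the linear model for this single variant
fit <- lm(y ~ x)

# Coefficient matrix: one row per term, with estimate, standard error,
# t value, and P-value as columns
coefs <- summary(fit)$coefficients

beta_hat <- coefs["x", "Estimate"]
p_value  <- coefs["x", "Pr(>|t|)"]

print(c(beta_hat = beta_hat, p_value = p_value))
```

With a simulated effect of this size, the estimate should land close to the true β and the P-value should be very small. A GWAS repeats this kind of test for each variant across the genome.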

**Note**: GWAS results for all available endpoints/phenotypes from FinnGen are available in [FinnGen PheWeb](https://results.finngen.fi).

#### Additional reading

* [Uffelmann et al. (2021) *Nature Reviews Methods Primers*](https://www.nature.com/articles/s43586-021-00056-9).
* [GWAS Primer](https://www.genome.gov/about-genomics/fact-sheets/Genome-Wide-Association-Studies-Fact-Sheet), the National Human Genome Research Institute, USA.
* [Matti Pirinen’s Genome-wide Association Studies course (course code: LSI34002)](https://www.mv.helsinki.fi/home/mjxpirin/GWAS_course/) at the University of Helsinki provides an excellent introduction for those interested in a more in-depth look at running their own GWAS. Much of the background here was sourced from his [course notes](https://www.mv.helsinki.fi/home/mjxpirin/GWAS_course/material/GWAS1.pdf), which are free to use.

[Click here to read more about how you can run GWAS in Sandbox using FinnGen data](/working-in-the-sandbox/which-tools-are-available/untitled.md)

