# Analysis covariates

This page was last updated for R13.

### Sandbox directory

Analysis covariates are available in the following Sandbox directory:

`/finngen/library-red/finngen_R[RELEASE]/analysis_covariates`

### Data files

The analysis covariate file is a tab-separated, gzip-compressed text file that contains covariate and endpoint data for each sample. The file contains three sets of columns:

* column 1: Sample ID
* columns 2 to N: covariates including principal components, \~200 columns for R13
* columns N+1 to N+(number of endpoints): the individual's phenotype status for each FinnGen endpoint
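
The layout above (a header line followed by one tab-separated row per sample, gzip-compressed) can be inspected without decompressing the file to disk. The following is a minimal sketch using only the Python standard library; the function names `read_header` and `iter_samples` are illustrative and not part of any FinnGen tooling.

```python
import csv
import gzip


def read_header(path):
    """Return the column names from the first line of a gzip-compressed TSV."""
    with gzip.open(path, "rt", newline="") as handle:
        return next(csv.reader(handle, delimiter="\t"))


def iter_samples(path):
    """Yield one dict per sample row, keyed by column name (values are strings)."""
    with gzip.open(path, "rt", newline="") as handle:
        yield from csv.DictReader(handle, delimiter="\t")
```

Here `path` would point at the covariate file inside the Sandbox directory given above; the file name itself is not stated on this page, so it is left as a parameter. Rows are streamed one at a time, which avoids holding the whole file in memory.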

The covariate file does not contain FinnGen genotypes for individuals with non-Finnish ancestry. For more complete phenotype data, see [the phenotype files](https://docs.finngen.fi/finngen-data-specifics/red-library-data-individual-level-data/what-phenotype-files-are-available-in-sandbox-1).

Users often subset this file in R to run their own analyses or to add further analysis columns.
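
Subsetting does not have to happen in R. As a rough sketch of the same idea in Python (standard library only), the function below copies a chosen set of columns into a new gzip-compressed file. The name `subset_columns` and its parameters are hypothetical, and the column names passed in are whatever the caller selects from the header.

```python
import csv
import gzip


def subset_columns(src_path, dst_path, keep):
    """Copy only the columns named in `keep` from one gzipped TSV to another."""
    with gzip.open(src_path, "rt", newline="") as src, \
         gzip.open(dst_path, "wt", newline="") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        fieldnames = reader.fieldnames or []
        # Fail early if a requested column is not present in the header.
        missing = [name for name in keep if name not in fieldnames]
        if missing:
            raise KeyError(f"columns not found in header: {missing}")
        writer = csv.DictWriter(dst, fieldnames=keep, delimiter="\t",
                                extrasaction="ignore", lineterminator="\n")
        writer.writeheader()
        for row in reader:
            writer.writerow(row)
```

A call such as `subset_columns(src, dst, ["FINNGENID", "PC1", "PC2"])` would keep the sample ID and two principal components, assuming those columns exist in the file.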

#### Some column descriptions:

| **Column name**                        | **Description**                                                                                                      |
| -------------------------------------- | -------------------------------------------------------------------------------------------------------------------- |
| FINNGENID                              | Sample ID                                                                                                            |
| AGE\_AT\_DEATH\_OR\_END\_OF\_FOLLOWUP  | Age of sample at death or end of follow-up                                                                           |
| batch                                  | Genotyping batch of the sample                                                                                       |
| n\_var                                 | Number of genotyped variants                                                                                         |
| chip                                   | Chip used for genotyping                                                                                             |
| IS\_AFFY                               | Whether the sample was genotyped using an Affymetrix chip                                                            |
| IS\_FINNGEN1\_CHIP                     | Whether the sample was genotyped using the FinnGen v1 chip                                                           |
| IS\_FINNGEN2\_CHIP                     | Whether the sample was genotyped using the FinnGen v2 chip                                                           |
| IS\_AFFY\_\*                           | Whether the chip genotypes were called using the specified version of the calling algorithm                          |
| AGE\_AT\_DEATH\_OR\_END\_OF\_FOLLOWUP2 | AGE\_AT\_DEATH\_OR\_FOLLOWUP\*AGE\_AT\_DEATH\_OR\_FOLLOWUP                                                           |
| BATCH\*                                | Whether the sample was part of that genotyping batch. Can be used to control for batch-specific effects in analysis. |
| PC\*                                   | Individual's score on that principal component                                                                       |
| \*\_IRN                                | Inverse rank-normalized quantitative endpoints                                                                       |

For other columns, refer to the [minimum extended phenotype](https://docs.finngen.fi/finngen-data-specifics/red-library-data-individual-level-data/what-phenotype-files-are-available-in-sandbox-1/minumum-extended-phenotype-data) and the [endpoint data](https://docs.finngen.fi/finngen-data-specifics/red-library-data-individual-level-data/what-phenotype-files-are-available-in-sandbox-1/endpoint-and-endpoint-longitudinal-data) pages.
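
The wildcard names in the table above lend themselves to a quick inventory of the header. The sketch below is one possible way to group column names by those patterns; the function name `group_columns` and the group labels are illustrative assumptions, not FinnGen conventions. The patterns are deliberately as loose as the table's: `PC*`, for example, would also match any other column that happens to start with `PC`.

```python
from fnmatch import fnmatchcase

# Wildcard patterns copied from the column table; the labels are illustrative.
PATTERNS = {
    "principal_components": "PC*",
    "batch_indicators": "BATCH*",
    "calling_algorithm_flags": "IS_AFFY_*",
    "irn_endpoints": "*_IRN",
}


def group_columns(header):
    """Group column names by the (case-sensitive) wildcard patterns above."""
    groups = {label: [] for label in PATTERNS}
    for name in header:
        for label, pattern in PATTERNS.items():
            if fnmatchcase(name, pattern):
                groups[label].append(name)
    return groups
```

Matching is case-sensitive, so the lowercase `batch` column is not counted among the `BATCH*` indicator columns, and `IS_AFFY` itself is kept separate from the `IS_AFFY_*` flags.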

### Further information

The covariate file is used for GWAS and other analyses. The following covariates are used in FinnGen's core GWAS analyses:

* Age
* Sex
* First 10 principal components
* Genotyping batch (FinnGen v1 or v2 chip and legacy genotyping batch)
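
As a hedged illustration of how such a covariate list might be assembled from the file header, the sketch below collects an age column, a sex column, the first ten principal components, and the batch indicator columns. Several things here are assumptions: the age and sex column names are passed in by the caller because this page does not state which columns the core analyses use, the principal components are assumed to be named `PC1`, `PC2`, and so on, and batch indicators are assumed to be the columns starting with `BATCH`. The function `core_covariate_columns` is hypothetical and does not reproduce FinnGen's actual pipeline configuration.

```python
def core_covariate_columns(header, age_col, sex_col, n_pcs=10):
    """Return age, sex, the first `n_pcs` PCs and all BATCH* columns, in that order."""
    # Assumed naming scheme for principal components: PC1, PC2, ...
    pcs = [f"PC{i}" for i in range(1, n_pcs + 1)]
    # Assumed: batch indicator columns share the (case-sensitive) prefix BATCH.
    batches = [name for name in header if name.startswith("BATCH")]
    wanted = [age_col, sex_col, *pcs, *batches]
    missing = [name for name in wanted if name not in header]
    if missing:
        raise KeyError(f"columns not found in header: {missing}")
    return wanted
```

Which batch indicators to keep in a model, and which to drop as a reference level, is a modelling decision that this sketch leaves to the user.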

**Note**: This file is usually released slightly later than the phenotype files because it can only be created once the PCA results are available.
