Sandbox download requests – rules and examples for minimum N
Background: For data privacy reasons, only aggregate-level data derived from at least 5 individuals may be exported from Sandbox. Every subgroup used or visible in the analysis results must therefore contain >=5 individuals for the results to be downloadable. No individual data points or IDs may be shown.
Most common download request types
1) Case/control analysis results (for instance GWAS results run with SAIGE or REGENIE)
These can be exported if the case group and the control group each have >=5 individuals. Columns that show genotype counts for each variant among cases and controls can be kept even if a count is <5.
2) Histograms
Each bar shown should be based on data from >=5 individuals. If a bar carries other identifiers (such as a red area for females and a blue area for males), each of these should also correspond to a sample group of >=5 individuals.
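The histogram rule can be checked before plotting by counting how many individuals fall into each bar segment. The following is a minimal sketch, assuming one value and one group label per individual; the function name, the bin width, and the data are made up for illustration.

```python
from collections import Counter

MIN_N = 5  # minimum number of individuals per visible group

def small_bars(values, groups, bin_width):
    """Return {(bin_start, group): n} for every bar segment with n < MIN_N."""
    counts = Counter(
        ((v // bin_width) * bin_width, g) for v, g in zip(values, groups)
    )
    return {key: n for key, n in counts.items() if n < MIN_N}

# Made-up ages and sexes, one entry per individual.
ages = [21, 23, 24, 25, 27, 28, 29, 22, 26, 20, 31, 33, 35, 36, 38, 39]
sexes = ["F"] * 5 + ["M"] * 5 + ["F"] * 5 + ["M"]

problems = small_bars(ages, sexes, bin_width=10)
print(problems)  # {(30, 'M'): 1}
```

Here the bar for ages 30-39 has 6 individuals in total, but its male segment is based on a single individual, so the colored version of the histogram would not pass.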
3) Curves
Each curve should be drawn from >=5 individuals. In survival curves, for instance, the N will at some point fall below 5, but this is OK if the entire group from which the curve has been drawn has enough individuals. However, the curve should not contain any other identifiers (like colored areas) unless they also refer to >=5 individuals. Please note that curves should not contain vertical bars or any other markers that show individual events (a staircase-like curve is allowed, however, as illustrated in the example below).
A slightly-modified real-life example of an approved curve (endpoint name, event details, SNP, and genotype counts changed)

4) Scatter plots
These are allowed if each dot is an average over at least 5 individuals. In some cases an alternative, such as a density plot, may be a better choice than a scatter plot.
Exception: PCA plots (where each data point reflects a single individual) are allowed to be downloaded if total N in plot is >=5.
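One way to turn an individual-level scatter plot into one where each dot is an average of at least 5 individuals is to sort the points by x and average consecutive groups. This is only a sketch under that assumption; the function name and the grouping strategy are illustrative, not a prescribed method.

```python
MIN_N = 5  # minimum number of individuals behind each plotted dot

def binned_means(xs, ys, group_size=MIN_N):
    """Average consecutive groups of points (sorted by x).

    Returns a list of (mean_x, mean_y, n) with n >= group_size for every dot.
    A leftover group smaller than group_size is merged into the previous one.
    """
    if group_size < MIN_N:
        raise ValueError("group_size must be at least MIN_N")
    pairs = sorted(zip(xs, ys))
    if len(pairs) < group_size:
        return []  # too few individuals to show anything
    chunks = [pairs[i:i + group_size] for i in range(0, len(pairs), group_size)]
    if len(chunks[-1]) < group_size:
        tail = chunks.pop()
        chunks[-1].extend(tail)
    return [
        (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c), len(c))
        for c in chunks
    ]

# Made-up data: 12 individuals with y = 2 * x.
xs = list(range(12))
ys = [2 * x for x in xs]
dots = binned_means(xs, ys)
print(dots)  # [(2.0, 4.0, 5), (8.0, 16.0, 7)]
```

Only the averaged dots would then be plotted and exported, never the underlying individual values.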
Slightly modified real-life example of an approved density plot for two variables

5) Pie charts
Each sector should be derived from the data of >=5 individuals. If there are other identifiers, such as a sector being further divided, the subparts should also be derived from >=5 individuals.
6) Code
Code can be exported as long as it contains "pure" commands only and no table or header views of the data analyzed with it. If summary statistics, counts, or similar are shown in the code or in its comments, the minimum N should be reported. The code must not contain any FinnGen IDs, so remove them prior to export.
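Before requesting a code export, it can help to scan the script for tokens that look like individual IDs. The sketch below assumes a placeholder pattern: `ID_PATTERN` is hypothetical and is not the actual FinnGen ID format, so replace it with a pattern that matches the IDs in your own data.

```python
import re

# ASSUMPTION: placeholder pattern for illustration only, NOT the real ID format.
ID_PATTERN = re.compile(r"\bFG[0-9A-Z]{8}\b")

def find_id_like_tokens(text, pattern=ID_PATTERN):
    """Return (line_number, token) for every ID-like token found in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((lineno, match.group()))
    return hits

# Made-up script content with an ID-like token left in a comment.
script = (
    'df = read_table("phenotypes.tsv")\n'
    "# checked sample FG0000ABCD manually\n"
    "print(len(df))\n"
)
hits = find_id_like_tokens(script)
print(hits)  # [(2, 'FG0000ABCD')]
```

A scan like this only catches what the pattern describes, so a manual read-through of comments and pasted output is still needed.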
Some exceptions:
1) Allele/genotype frequencies and counts
Such statistics are allowed to be exported even if the allele/genotype is present in <5 individuals.
However, the allele counts for SNPs within haplotypes should be derived from >=5 individuals. This requirement extends to the haplotype frequencies as well. Please also note that any extra information about a haplotype group must also fulfill the N>=5 rule (for instance case/control counts in a haplotype group).
2) TBI files
Binary files are usually not allowed, since admins do not have a way to check them. TBI files are an exception: they can be exported if you have generated one for your summary statistics. However, please keep in mind that they can also be generated outside Sandbox from the data you have downloaded.
3) Basic descriptive statistics
Min/max/median/quartile values, shown for instance in box plots, often point to single individuals. They can currently still be exported, but we recommend adding some fluctuation to them, especially when other data presented in the study/manuscript creates a risk that someone could be identified by combining multiple pieces of results. Values can be shown either via a boxplot or as a table of exact values. Note, however, that the boxplot should not contain any additional dots that point to single individual values (like outlier dots around the min and max values).
A slightly modified real-life example of an approved boxplot (endpoint name and case/control counts changed)

Imaginary example of basic descriptive statistics in a table (would be approved):
ENDPOINT_EVENT_AGE
min             2.56
1st quartile    10.44
median          34.23
mean            33.01
3rd quartile    44.99
max             100.2
Additional things to consider:
1) Identify your subgroups correctly
For instance, you could be running a case-control analysis for different PRS bins. In that case it is not enough to consider the total number of cases and controls: the case and control groups within each bin should also have >=5 individuals.
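The PRS-bin situation can be illustrated with a small count check over the combination of bin and case/control status. This is a sketch with made-up data; the field names `prs_bin` and `status` are assumptions for the example.

```python
from collections import Counter

MIN_N = 5  # minimum number of individuals per subgroup

def undersized_subgroups(records, keys):
    """Return {subgroup: n} for every combination of keys with n < MIN_N."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return {group: n for group, n in counts.items() if n < MIN_N}

# Made-up study design: (PRS bin, status, number of individuals).
design = [
    ("low", "case", 6), ("low", "control", 8),
    ("high", "case", 3), ("high", "control", 9),
]
records = [
    {"prs_bin": b, "status": s} for b, s, n in design for _ in range(n)
]

# Totals look fine: 9 cases and 17 controls.
print(undersized_subgroups(records, keys=("status",)))  # {}
# But one subgroup within a bin is too small.
flagged = undersized_subgroups(records, keys=("prs_bin", "status"))
print(flagged)  # {('high', 'case'): 3}
```

The same check applies to any other stratification (sex, age group, genotype group) that is visible in the exported results.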
2) Limited results from groups with N < 5
Results based on data from at least 5 individuals are generally considered anonymous and can be downloaded. We strongly recommend that all results should have N >= 5.
However, it is possible to export limited information about small groups (N < 5). If you need to do this, please take on the following extra responsibilities:
- Evaluate whether keeping small groups (N < 5) adds scientific value to your study.
- If there are results from small groups (N < 5), the exact sample counts and other related statistics (such as p-value, percentage and standard deviation) must be concealed by marking them as "< 5".
- Before even placing a download request, the user must make sure that the concealed data from the small group (N < 5) cannot be deduced from other data in the table (such as the total sum of N) or from other parts of the manuscript.
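The concealment and the deduction check described in the list above can be sketched as follows. The function names are illustrative, and the check covers only the simplest case (a single concealed cell next to a visible total); it is not a complete disclosure-risk assessment.

```python
MIN_N = 5  # counts below this must be concealed

def mask_counts(counts):
    """Replace every count below MIN_N with the string '< 5'."""
    return {k: (n if n >= MIN_N else f"< {MIN_N}") for k, n in counts.items()}

def deducible_from_total(masked, total_shown):
    """True if a visible total and exactly one concealed cell would let a
    reader recover the hidden count by simple subtraction."""
    n_masked = sum(1 for v in masked.values() if isinstance(v, str))
    return total_shown and n_masked == 1

# Made-up group counts.
counts = {"A": 40, "B": 12, "C": 3}
masked = mask_counts(counts)
print(masked)  # {'A': 40, 'B': 12, 'C': '< 5'}

# With the total N = 55 also in the table, C = 55 - 40 - 12 is recoverable.
print(deducible_from_total(masked, total_shown=True))   # True
print(deducible_from_total(masked, total_shown=False))  # False
```

In the first case the total would have to be removed or concealed as well, and related statistics such as percentages need the same treatment.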
3) Exception to the N>=5 rule
The N>=5 rule is the primary guideline, and we strongly recommend using it to ensure the anonymity of your results. In exceptional cases the N>=3 rule may be applied, but only if it is essential for the analysis and the scientific validity of the results.
Please note that applying the N>=3 rule does not override the requirement that only anonymized data may be downloaded from the Sandbox.
If you apply the N>=3 rule to your results, you must also provide an explanation of the measures taken to ensure anonymity. This ensures that the anonymity of the data is thoroughly assessed and properly documented. This explanation must be included as part of your download request.
4) File formats
Admins are able to inspect files that open in the terminal or in the most common graphical programs, such as Excel and Word. Files in other formats, such as binary files, will be rejected because we are not able to inspect them.
5) What files do you actually need
Please restrict your download request to the files that you really need. They all have to go through manual inspection and therefore keeping the files to an absolute minimum will help admins and you will receive the files faster.
6) Timing
Kindly note that we give support for download requests approximately from 08:00 to 16:00 Finnish time. There is no support on weekends or on public holidays. It will usually take a few working days to inspect your files, so files from last-minute requests will likely not reach you in time.
7) Keep your zip and tar files clean
If there are hidden files or folders or locked files in your request, it will be automatically rejected.
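A quick self-check for hidden files and folders can be run on an archive before submitting it. The sketch below uses Python's standard `zipfile` module and treats any path component starting with a dot as hidden; this is an assumption about what counts as hidden, and it does not detect locked files. For tar files, `tarfile.open(...).getnames()` provides the same kind of name list.

```python
import io
import zipfile
from pathlib import PurePosixPath

def hidden_members(names):
    """Return archive member paths that contain a dot-prefixed component."""
    hidden = []
    for name in names:
        parts = PurePosixPath(name).parts
        if any(p.startswith(".") and p not in (".", "..") for p in parts):
            hidden.append(name)
    return hidden

# Build a small in-memory zip with made-up contents for the demonstration.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("results/summary.tsv", "n\n12\n")
    zf.writestr("results/.Rhistory", "")
    zf.writestr(".ipynb_checkpoints/notes.txt", "")

buf.seek(0)
with zipfile.ZipFile(buf) as zf:
    flagged_members = hidden_members(zf.namelist())

print(flagged_members)
# ['results/.Rhistory', '.ipynb_checkpoints/notes.txt']
```

Removing the flagged entries and re-creating the archive before placing the request avoids an automatic rejection for this reason.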
