# Managing memory in Sandbox and data filtering tips

Choosing the right machine size for your task saves costs, because [Sandbox billing](/working-in-the-sandbox/billing-information-and-where-to-find-more-details.md) is based partly on the size of the machine used.

The 'Basic Machine' (1 vCPU, 3.75 GB) is sufficient for standard use such as [navigating the Sandbox](/working-in-the-sandbox/quirks-and-features/navigating-the-sandbox.md), [building cohorts in Atlas](/working-in-the-sandbox/which-tools-are-available/atlas/detailed-guide/how-to-define-a-cohort-in-atlas.md), and [starting pipelines](/working-in-the-sandbox/running-analyses-in-sandbox.md). Loading phenotype data in R needs a lot of memory and requires the 'Rather Big Machine' (16 vCPUs, 104 GB).

### Saving data

Saving data to your home disk (/home/ivm/) in Sandbox consumes home disk space, which does not depend on the IVM size. See [Checking the space in home disk](#check-space-usage-in-home-disk) and [Resizing the home disk](#resizing-home-disk).

![](/files/SLxPMZCmJtKIjJ0v7MnH)

It is possible to consume more memory than your IVM has. When memory runs out, the IVM becomes very slow or gets stuck. If your IVM is unresponsive, you can force it to shut down, after which you can continue working normally by creating a new IVM (from the ‘Start machine’ button, see figure above).

To force the IVM to shut down, check whether the [start button in the left sidebar](/working-in-the-sandbox/quirks-and-features/how-to-shut-down-your-ivm.md) is available and click it. If the start button is not available, contact <humgen-servicedesk@helsinki.fi>; an admin at the service desk can force your IVM to shut down. After the IVM is terminated, you can continue working normally by creating a new IVM. **Note! Forcing the IVM to shut down will cause loss of all unsaved data. In the worst case, it may corrupt your persistent disk, causing loss of all data in your /home/ivm folder.**

To plan memory usage, you can check how much memory there is in your IVM. Open Terminal Emulator and type `free -m` (the `-m` option reports the values in mebibytes).

![](/files/0D5eyb1r7833O0Wk2Bti)
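For scripted checks, the figures that `free -m` prints can also be read from `/proc/meminfo`. The following is a minimal sketch, assuming a Linux IVM whose kernel reports the `MemAvailable` field; the variable names `avail_mib` and `total_mib` are illustrative only.

```shell
#!/bin/sh
# /proc/meminfo reports values in kB; convert them to MiB.
total_mib=$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)
avail_mib=$(awk '/^MemAvailable:/ {print int($2 / 1024)}' /proc/meminfo)

echo "Memory: ${avail_mib} MiB available of ${total_mib} MiB"
```

Comparing the available figure with the size of the file you plan to load gives a rough idea of whether the current machine size is enough.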

To check memory and CPU usage per process, type `top` in Terminal. Press `q` to exit.

![](/files/KfWwKlKmIrvHbbYwzhN8)

### Memory managing in RStudio

Reading big files such as phenotype, genotype, or longitudinal data into RStudio consumes a lot of memory and requires the ‘Rather Big Machine’ (16 vCPUs, 104 GB). The memory usage widget in the RStudio Environment pane shows how much memory the RStudio session is currently using and how much is left. In the example below, the session is using 232 MiB. For a detailed report, click the small triangle to open a drop-down menu and select "Memory Usage Report". In the example report, the current session is using 43% of the memory while 57% is free.

![](/files/Q5ooO9SwbdJ4uflJIBAc)

![](/files/nLG2aJZ8WhRRpPSHtSNo)

### Filtering in Terminal

Filtering data with Unix commands consumes considerably less memory than filtering with R. Filtering in RStudio requires loading the whole file, e.g. the detailed longitudinal data, into memory and therefore needs the ‘Rather Big Machine’. In contrast, the same filtering can be done in Terminal on the ‘Basic Machine’, because the data is streamed line by line. After the data is prefiltered in Terminal, it can be loaded into R/RStudio for further analyses, possibly on the Basic Machine.

For example, to filter for J45 (the ICD10 code for asthma) with a Linux command in Terminal:

`zcat path/to/finngen_R8_detailed_longitudinal.txt.gz | grep J45 > my_result_file.txt`

The filtered file, containing all rows with the text “J45”, will appear in the directory where you ran the command (your /home/ivm directory if you have not changed directory). The result file can then be loaded into R/RStudio for further analysis. To load the pre-filtered table in R/RStudio:

`library(data.table)`

`my_result_file = fread("/home/ivm/my_result_file.txt", data.table = FALSE)`

**NB! If you filter at the command line, check the code set carefully in R. For example, F29 means psychosis in ICD10 and eye discomfort in ICPC2, so a simple command-line filter like the one above returns rows from both code sets, and you need to check in R that the code set is correct.**
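The code set caveat above can also be handled at the command line by matching on specific columns instead of the whole line, which additionally keeps the header row that a plain `grep` drops. The sketch below uses a made-up toy file: the column names, their positions, and the values are assumptions for illustration only and do not describe the real longitudinal file, so check the real header first.

```shell
#!/bin/sh
# Toy stand-in for a tab-separated, gzipped data file (hypothetical columns).
workdir=$(mktemp -d)
printf 'ID\tSOURCE\tCODE1\tICDVER\n'  > "$workdir/toy.txt"
printf 'A1\tINPAT\tJ45\t10\n'        >> "$workdir/toy.txt"
printf 'A2\tPRIM_OUT\tF29\t\n'       >> "$workdir/toy.txt"
printf 'A3\tINPAT\tF29\t10\n'        >> "$workdir/toy.txt"
printf 'A4\tOUTPAT\tJ459\t10\n'      >> "$workdir/toy.txt"
gzip "$workdir/toy.txt"

# NR == 1 keeps the header row. The second condition keeps rows whose
# third column starts with F29 and whose fourth column is "10".
zcat "$workdir/toy.txt.gz" \
  | awk -F'\t' 'NR == 1 || ($3 ~ /^F29/ && $4 == "10")' \
  > "$workdir/f29_icd10.txt"

cat "$workdir/f29_icd10.txt"
```

With this toy input, only the header and the A3 row end up in the result file; the A2 row has the same code but no ICD version and is left out.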

We may not need all the columns in the file to perform our analyses. Subsetting 10 columns to 5 columns will cut the size of the file roughly in half, assuming the columns are of similar width.

To preview the columns

![](/files/Vs317vHjkHYUIG1SDWsK)

To select columns

![](/files/mM3LiqzYY0qVaMubmE2X)
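As a text version of the two steps above, the following sketch lists the column names with their positions and then keeps a subset of the columns. It assumes a tab-separated, gzipped file; the toy file and its column names are hypothetical.

```shell
#!/bin/sh
# Toy stand-in for a tab-separated, gzipped data file (hypothetical columns).
workdir=$(mktemp -d)
printf 'ID\tSOURCE\tAGE\tCODE1\tCODE2\n'   > "$workdir/toy.txt"
printf 'A1\tINPAT\t34.5\tJ45\tNA\n'       >> "$workdir/toy.txt"
printf 'A2\tOUTPAT\t51.2\tF29\tNA\n'      >> "$workdir/toy.txt"
gzip "$workdir/toy.txt"

# List the column names one per line, numbered by position.
zcat "$workdir/toy.txt.gz" | head -n 1 | tr '\t' '\n' | cat -n

# Keep columns 1, 3 and 4 only; cut splits on tabs by default.
zcat "$workdir/toy.txt.gz" | cut -f 1,3,4 > "$workdir/toy_subset.txt"
cat "$workdir/toy_subset.txt"
```

The numbered listing makes it easier to pick the positions to pass to `cut -f` when the file has many columns.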

### Check free space in Home Disk

The home disk is the user's private disk (the /home/ivm/ folder in Sandbox) where users can save their own files. Only the account owner has access to the private home disk. By default, the size of the home disk is 10 GB. The amount of space in the home disk does not depend on the IVM size (Basic, Advanced, or Rather Big Machine).

To check the size and amount of space in your home disk type in Terminal

`df -h /home`

The output shows the size of the home disk, used space, available space, percentage of space used, and the folder where the disk is mounted.

![](/files/1VFs36AyzhQkOdAvq9Mz)

If the home disk is running out of space, it is recommended to free space by removing unneeded files and folders, e.g. with the `rm` command in Terminal. Be careful: `rm` is irreversible, and removed files and folders cannot be restored. The `-i` flag makes `rm` prompt for confirmation before each removal.

To remove a file

`rm -i my_file.txt`

![](/files/fCVzB4ANrqswlM2gFkYZ)

To remove a folder and all of its content

`rm -ri my_folder`

It is also possible to [resize the home disk](#resizing-home-disk) to enable more space for the user's files and folders.

The trash bin may hold a lot of files consuming home disk space. Make sure to clear the trash bin from time to time.
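Before deleting anything, it can help to see which folders take up the most space. The sketch below uses `du` and `sort` on a temporary toy directory that stands in for /home/ivm; the folder and file names are made up for illustration.

```shell
#!/bin/sh
# Toy directory standing in for /home/ivm.
demo_home=$(mktemp -d)
mkdir -p "$demo_home/results" "$demo_home/scratch"
# One file of about 2 MiB and one of about 1 KiB.
head -c 2097152 /dev/urandom > "$demo_home/scratch/big.bin"
head -c 1024 /dev/urandom > "$demo_home/results/small.bin"

# Size of each top-level entry in KiB, largest first.
du -sk "$demo_home"/* | sort -rn | head -n 10

# Path of the largest entry (second tab-separated field of the first row).
largest=$(du -sk "$demo_home"/* | sort -rn | head -n 1 | cut -f 2)
echo "Largest entry: $largest"
```

On the real home disk, replace `"$demo_home"` with `/home/ivm`. Hidden entries (names starting with a dot) are not matched by `*` and need to be listed separately if you want to include them.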

Docker images and containers can take a lot of space on the user's home disk. They most commonly accumulate when running Cohort Operation and other container-based applications in Sandbox. The original container images are stored in a shared cloud repository and are pulled from there automatically to the IVM when an application such as Cohort Operation is run, so they can be removed from the home disk relatively safely to free disk space. More information can be found on the Docker web pages and on this [handbook page](https://finngen.gitbook.io/finngen-handbook/working-in-the-sandbox/quirks-and-features/docker-images/how-to-get-a-new-docker-image-to-sandbox).

To manage Docker resources in the IVM, the `docker <resource> prune` commands are very helpful (see details [here](https://docs.docker.com/config/pruning/)). The following command cleans up unused Docker resources (images, containers, and networks) from the IVM:

`docker system prune`
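Before pruning, it can be useful to check how much space Docker is actually using. The sketch below only reports usage and removes nothing; the guard around the `docker` call and the `docker_status` variable are illustrative additions, not part of any Sandbox tooling.

```shell
#!/bin/sh
# Report Docker disk usage (images, containers, volumes, build cache).
if command -v docker >/dev/null 2>&1; then
    docker system df || echo "Docker is installed but the daemon is not reachable"
    docker_status="checked"
else
    echo "docker command not found"
    docker_status="missing"
fi

# Narrower alternatives to `docker system prune` (each asks for confirmation):
#   docker container prune   # removes stopped containers only
#   docker image prune -a    # removes all images not used by any container
```

The narrower `prune` commands are handy when you want to keep, for example, the images but clear out stopped containers.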

### Resizing Home Disk

You can resize your home disk on the front page of the Virtual Machine using ![](/files/HSnPZW5DOzYiT5QDnkgi)

**Note that the change is permanent!** Once you have upgraded your home disk size, you can’t reduce it.

By default, the size of the home disk is 10 GB. To confirm the current size, open Terminal and type

`df -h | grep home`

![](/files/cX6kgGiTrCyhJegT3CQ6)

Then close the IVM, resize the home disk up to 20 GB, and start the smallest IVM again. **Note that once done you can’t revert this action.** Your home disk will permanently be 20 GB instead of 10 GB and it will [cost](/working-in-the-sandbox/billing-information-and-where-to-find-more-details.md) accordingly.

![](/files/xv6zsT4tBo7E7JfA5lZW)

After the home disk is resized, repeating the command `df -h | grep home` shows that the home disk now has 20 G of space in total, of which 95 M is used and 19 G is free.

![](/files/Hbi6xzpjL0hak5vLW839)

To see how many CPUs the IVM has, type `lscpu | grep 'CPU(s):'`. Note that the number of CPUs has increased from 1 to 2.

![](/files/zoRAZcyV2zXOuqHAb6EJ)

Before you resize your home disk, please consider that it can’t be reverted and that it will affect your IVM [costs](/working-in-the-sandbox/billing-information-and-where-to-find-more-details.md).

### Attaching extra disk for temporary storage

Since Sandbox version 12.9, users can attach an additional disk to the IVM with a customizable size between 10 and 10,000 GB. Disk management is accessed from the IVM selection page (only visible when the IVM is shut down). Click "Attach Disk" to begin the process.

<div align="center"><img src="/files/Hs5WDVgjUOjs5AYmuHXd" alt=""></div>

The attached disk is visible in the IVM selection page and can be deleted from there.

<figure><img src="/files/fWq5g2i9FWkHhvdCItTH" alt=""><figcaption></figcaption></figure>

Once created, the disk can be found in the IVM under `/mnt/persistent-disk-N`, where N identifies the disk. **Removing the disk will cause all data on it to be deleted**, and the costs linked to the disk space will no longer accumulate. Be sure that you have copied all results to your /home/ivm or red bucket before deleting the disk, as the data cannot be recovered afterwards.
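Copying results off the extra disk and verifying the copy before deleting the disk could look like the sketch below. Temporary folders stand in for the attached disk and the home disk, and the `results` folder and file name are hypothetical.

```shell
#!/bin/sh
# Temporary folders stand in for the real locations.
extra_disk=$(mktemp -d)   # e.g. /mnt/persistent-disk-N in the IVM
home_dir=$(mktemp -d)     # e.g. /home/ivm
mkdir -p "$extra_disk/results"
echo "beta,se,p" > "$extra_disk/results/summary.csv"

# Copy the folder recursively, preserving timestamps and permissions.
cp -a "$extra_disk/results" "$home_dir/"

# diff -r exits with status 0 only if both trees have identical content.
if diff -r "$extra_disk/results" "$home_dir/results" >/dev/null; then
    copy_ok=yes
else
    copy_ok=no
fi
echo "Copy verified: $copy_ok"
```

Deleting the disk only after the check reports `yes` reduces the risk of losing results. Remember that the copy consumes space on the home disk, so check the free space there first.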

