Sandbox paths and pipeline mappings

A list of the folders accessible in the Sandbox, with their full bucket paths and mappings

Sandbox paths and their bucket locations and mappings

The table below lists the sandbox (internal/IVM) paths, their "mappings", the external bucket paths and a description of each location. When running pipelines (e.g. when setting file or folder locations in analysis input .json files), the Sandbox (IVM) path will not work; only the sandbox mapping or the external bucket path can be used for this purpose.

| Sandbox (IVM) path | Sandbox mapping | External bucket path | Description |
| --- | --- | --- | --- |
| /finngen/library-green/ | LIBRARY_GREEN/ | gs://finngen-production-library-green/ | FinnGen's core analysis results, summary data, anonymous data |
| /finngen/library-red/ | LIBRARY_RED/ | gs://finngen-production-library-red/ | Phenotype, Genotype data, individual-level data |
| /finngen/green/ | SANDBOX_GREEN/ | gs://fg-production-sandbox-Y_green/ | Researchers' own uploaded data. |
| /finngen/red/ | SANDBOX_RED/ | gs://fg-production-sandbox-Y-red/ | Researchers' own analysis results, the organization's own "red" bucket |
| /finngen/pipelines/ | SANDBOX_PIPELINE/ | gs://fg-production-sandbox-Y-pipeline/ | Pipeline results. |
| /home/ivm/ | | | Home disk. Users' personal data and scripts. Cannot be accessed |
| /finngen/shared/ | LIBRARY_SHARED/ | gs://finngen-production-library-shared/ | Shared data between all Sandboxes (organizations) |
| /finngen/library-green/finngen_RX/unmodifiable_pipelines/ | CUSTOM_GWAS/ | gs://library_green/finngen_RX/unmodifiable_pipeline/ | Output location of unmodifiable pipeline results. |

Note 1. The green, red and pipelines buckets (/finngen/green/, /finngen/red/ and /finngen/pipelines/) are specific to your organisation, and data in these locations cannot be seen by other organisations. If you use the external (bucket) path to access these locations, you will need to replace the letter Y with the number of your Sandbox. If you do not know your organisation's sandbox number, you can find it in the green, red and pipelines bucket paths listed in a file named buckets.txt on your sandbox desktop.

Note 2. For unmodifiable pipeline results, remember to replace the letter X in the sandbox or external path with the release number of the unmodifiable pipeline (e.g. 12).

Note 3. The greendownloads and greenuploads buckets are not accessible from the Sandbox.
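As a rough illustration of how the table can be applied, the sketch below rewrites a Sandbox (IVM) path into its mapping form by swapping the leading folder for the mapping name in the same table row. The function name to_mapping is hypothetical and not part of the Sandbox. The sketch assumes that everything after the listed folder carries over unchanged, and it does not treat the unmodifiable pipelines row (CUSTOM_GWAS/) specially.

```shell
# Hypothetical helper: convert a Sandbox (IVM) path to its mapping form,
# using the prefixes from the table above.
to_mapping() {
  case "$1" in
    /finngen/library-green/*) echo "LIBRARY_GREEN/${1#/finngen/library-green/}" ;;
    /finngen/library-red/*)   echo "LIBRARY_RED/${1#/finngen/library-red/}" ;;
    /finngen/green/*)         echo "SANDBOX_GREEN/${1#/finngen/green/}" ;;
    /finngen/red/*)           echo "SANDBOX_RED/${1#/finngen/red/}" ;;
    /finngen/pipelines/*)     echo "SANDBOX_PIPELINE/${1#/finngen/pipelines/}" ;;
    /finngen/shared/*)        echo "LIBRARY_SHARED/${1#/finngen/shared/}" ;;
    *)
      # Paths such as /home/ivm/ have no mapping in the table.
      echo "no mapping listed for: $1" >&2
      return 1
      ;;
  esac
}
```

For example, to_mapping /finngen/red/username/file.txt prints the mapping-style path for that file, while a path under /home/ivm/ is rejected because the table lists no mapping for it.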

Finding bucket paths and mappings within sandbox

You can look up your organisation's Sandbox number and its bucket paths in the buckets.txt file on the Sandbox desktop (see Navigating the Sandbox). The full path of this file is /home/ivm/Desktop/buckets.txt.

The full bucket paths are also stored in environment variables in your terminal environment. For example, to see the bucket paths of the red and pipeline buckets in the terminal, you can use the echo command:

echo $RED_BUCKET

echo $PIPELINE_BUCKET

You can also use $RED_BUCKET and $PIPELINE_BUCKET to refer to those paths in the terminal when performing file operations (e.g. copying, moving or deleting) with the gsutil command.
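A minimal sketch of using these variables with gsutil is given below. The function name list_bucket is hypothetical. It assumes the variables hold a bucket path without a trailing slash, and it refuses to run when the value is empty, so that a missing variable does not turn into an unintended gsutil call.

```shell
# Hypothetical helper: list the top level of a bucket,
# failing early if the path is empty.
list_bucket() {
  # $1: bucket path, typically "$RED_BUCKET" or "$PIPELINE_BUCKET"
  if [ -z "$1" ]; then
    echo "bucket path is empty; is the environment variable set?" >&2
    return 1
  fi
  # Strip one trailing slash, if present, before appending our own.
  gsutil ls "${1%/}/"
}
```

It could be called as list_bucket "$RED_BUCKET" or list_bucket "$PIPELINE_BUCKET". Quoting the variable keeps an unset value as an empty argument rather than dropping it.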

Copying files to your organisation's red bucket

To make files accessible for analyses run in the cloud (e.g. any pipeline submitted using the Pipelines tool), the files first need to be copied to an externally accessible location. The best location for this purpose is the red bucket, which is specific to each organisation's sandbox environment and can be used to store red (individual-level) or green (summary-level) data. To copy a file to the red bucket, open the terminal ("Terminal Emulator" from the Applications menu) and run the command:

gsutil cp /path/to/file_to_copy.txt $RED_BUCKET/username/

where /path/to/file_to_copy.txt is the full sandbox path of the file you want to copy and username is your sandbox username. If the folder $RED_BUCKET/username/ doesn't already exist, this command will create it.

To use this file for analyses submitted to the cloud, the path you would need for the input .json file would therefore be SANDBOX_RED/username/file_to_copy.txt, using the mapping format.
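The relationship between the copy destination and the mapping path can be sketched as a small helper. The function name red_mapping_path is hypothetical. It assumes the file was copied to $RED_BUCKET/username/ exactly as in the command above, so that only the file name and username are needed to build the path for the input .json file.

```shell
# Hypothetical helper: print the mapping-style path for a file
# that was copied to "$RED_BUCKET/<username>/".
red_mapping_path() {
  src="$1"       # sandbox path of the file that was copied
  username="$2"  # your sandbox username
  echo "SANDBOX_RED/${username}/$(basename "$src")"
}
```

Running red_mapping_path /path/to/file_to_copy.txt username reproduces the path given in the example above.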

It is recommended that you copy files only to your own red bucket folder (i.e. $RED_BUCKET/username/), so that you don't accidentally overwrite other users' files and can find your files again when needed. It is good practice to create subfolders within your own red bucket folder to keep your files organised, e.g. by copying the required files to specific subfolders. An example could be:

gsutil cp /path/to/myphenotypes.txt.gz $RED_BUCKET/username/Phenotype_data/R12/
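When several files belong in the same subfolder, a small loop can keep the destination consistent. The sketch below is one possible approach, not a Sandbox feature: the function name copy_to_subfolder and the DRY_RUN variable are hypothetical. With DRY_RUN set, the commands are printed instead of run, which allows the destination to be checked before anything is copied.

```shell
# Hypothetical helper: copy several files into one subfolder
# of your own red bucket folder.
copy_to_subfolder() {
  dest="$1"   # e.g. "$RED_BUCKET/username/Phenotype_data/R12"
  shift
  for f in "$@"; do
    # When DRY_RUN is set and non-empty, the command is echoed, not executed.
    ${DRY_RUN:+echo} gsutil cp "$f" "${dest%/}/"
  done
}
```

For example, setting DRY_RUN=1 and calling copy_to_subfolder "$RED_BUCKET/username/Phenotype_data/R12" followed by the files to copy prints one gsutil cp command per file. Unsetting DRY_RUN and repeating the call performs the copies.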
