Add documentation how to avoid data modification by tools #14239

Merged
merged 6 commits into from
Jul 4, 2022
19 changes: 19 additions & 0 deletions doc/source/admin/production.md
@@ -152,6 +152,25 @@ To get started with setting up local data, please see [Data Integration](https:/

File sizes have grown very large thanks to rapidly advancing sequencer technology, and it is not always practical to upload these files through the browser. Thankfully, a simple solution is to allow Galaxy users to upload them via FTP and import those files in to their histories. Configuration for FTP is explained on the [File Upload via FTP](special_topics/ftp.md) page.

### Protect Galaxy against data loss due to misbehaving tools

Tools have access to the paths of input and output data sets, which are stored in
``file_path``, and by default the credentials used for running tools are the same
as those used for running Galaxy. Thus it is possible for tools to modify data in Galaxy's
``file_path``. Examples of such changes are:

- Creation of additional files, e.g. indices. This is a problem for cleaning up data, because Galaxy does not know about these files.
- Removal of input or output files of the tools. This will create problems for other tools using these data sets (note that most tool repositories use CI tests to avoid this, but the problem may still occur).

Note that a tool is only given the paths to its inputs and outputs, but if the default configuration is used for other paths (e.g. the configuration directory), these paths are also easily accessible.

There are two approaches to protect Galaxy against this:

- Use different credentials for running tools. This can be configured using the ``real_system_username`` config variable.
- Configure Galaxy to run jobs in a container and enable ``outputs_to_working_directory``. The tool will then run in an environment that allows write access only to the job working directory. All other paths will be accessible read-only.
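
As a rough illustration, the two options named above might appear in `galaxy.yml` as in the sketch below. The values are assumptions for illustration only. Running jobs as a different user and running jobs in containers both require additional job and cluster configuration that is not shown here.

```yaml
galaxy:
  # Approach 1 (sketch): choose which stored user attribute is used to derive
  # the system user that jobs run as. The value below is an assumed example,
  # and the remaining setup for running jobs as another user is not shown.
  real_system_username: user_email

  # Approach 2 (sketch): have tools write their outputs to the job working
  # directory instead of directly into file_path. Container execution itself
  # is enabled in the job configuration, which is not shown here.
  outputs_to_working_directory: true
```

The two settings are independent, so a deployment following only one of the approaches would set only the corresponding option.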

For both approaches, more information can be found in the [job configuration](jobs.md) documentation; see also [using a compute cluster](cluster.md).
Member

Suggested change
For both more information can be found in the [job configuration](jobs.md) documentatiion and see also [using a compute cluster](cluster.md).
More information on pulsar configuration can be found in the [job configuration](jobs.md) documentation, and the other two are explained in [using a compute cluster](cluster.md).

Member

I'm not sure it's enough information for the pulsar option, jobs.md doesn't really cover much right? Would it maybe be useful to link to https://training.galaxyproject.org/training-material/topics/admin/tutorials/interactive-tools/tutorial.html#securing-interactive-tools (or better, have us extract that pulsar bit and link to that?)

Contributor Author

I'm fine with both, but don't feel competent wrt pulsar to move the pulsar bit from GTN.


## Advanced configuration

### Load balancing and web application scaling