diff --git a/README.md b/README.md index e9a38d3..bda0be7 100644 --- a/README.md +++ b/README.md @@ -1,406 +1,20 @@ -# Docker template - -This repository is a template for managing a Docker image with automated -building and publishing via GitHub Actions. - -The README here also contains a guide on getting-started with Docker/Singularity containers on the -[Uni Hamburg (UHH) Maxwell cluster](https://docs.desy.de/maxwell/), which -should also be applicable to other HPC clusters. - -If you know already what Docker and Singularity are, you can go straight to the -[Quick-start](#quickstart) section. - - - -## TL;DR - -- Docker images allow users to create *isolated* and *reproducible* environments -- Especially the *reproducible* part is crucial for computing in Science -- Singularity is a container runtime that is installed on most HPC clusters and - can run Docker images -- Using docker/singularity containers can be seen as a more robust alternative - to `conda` environments (if we just talk about creating python environments) -- Once set up, singularity allows you and your colleagues to use exactly the - same environment - **no more "it works on my machine"** - -## Table of contents - - - -- [Docker template](#docker-template) - - [TL;DR](#tldr) - - [Table of contents](#table-of-contents) - - [Requirements](#requirements) - - [Quickstart](#quickstart) - - [What is Docker?](#what-is-docker) - - [What is Singularity?](#what-is-singularity) - - [Running Docker containers on Maxwell](#running-docker-containers-on-maxwell) - - [Mandatory configuration](#mandatory-configuration) - - [Running your first container](#running-your-first-container) - - [Creating your own images](#creating-your-own-images) - - [Setting up the DockerHub repo](#setting-up-the-dockerhub-repo) - - [Setting up the GitHub repo](#setting-up-the-github-repo) - - [Versioning your images](#versioning-your-images) - - [Pulling the image to Maxwell (or any other HPC)](#pulling-the-image-to-maxwell-or-any-other-hpc) - - [Set up VSCode for remote development with singularity](#set-up-vscode-for-remote-development-with-singularity) - - [`Dockerfile`](#dockerfile) - - [`.ssh/config` file setup](#sshconfig-file-setup) - - [VSCode settings](#vscode-settings) - - - -**Note**: -Singularity has been renamed to Apptainer some time ago, but the Maxwell cluster -still uses a version which is called `singularity`. Just so you won't be confused when -you google stuff about commands etc. at some point. - -## Requirements - -**For running containers on the cluster**: - -- Access to the cluster - -**For creating your own containers**: - -- A GitHub account -- A DockerHub account - -Strictly speaking, you don't need a GitHub account to create your own containers, -but it is a good idea to use GitHub to version your Dockerfiles and to use GitHub -Actions to automatically build and push your images to DockerHub. - -## Quickstart - -- Create a DockerHub repo with the name of your image -- Create a GitHub repo from this template -- Add the DockerHub username, repo and token to the GitHub repo secrets/variables - - `DOCKERHUB_TOKEN`: your DockerHub token (as secret) - - `DOCKERHUB_USERNAME`: your DockerHub username (as variable) - - `DOCKERHUB_REPO`: the name of your image / repo on DockerHub (as variable) -- Adjust the `Dockerfile` and `docker-publish.yml` files to your needs - -## What is Docker? 
- -Given that this is a getting-started guide for physicists, who are probably -looking for a way to create reproducible environments for their data analysis, -you can think of Docker as a way to create reproducible python environments. - -If you are familiar with `conda`, you can think of Docker as a more robust -alternative to `conda` environments. - -There is a nice video on YouTube that explains the basics of Docker in 100 -seconds: -[Docker in 100 seconds](https://www.youtube.com/watch?v=Gjnup-PuquQ). - -## What is Singularity? - -Singularity is a container runtime that can run Docker containers. -Usually, you would use Singularity to run containers on HPC clusters where you don't -have root access and can't install Docker. - -Singularity reduces the isolation of Docker containers slightly, in terms of -which used ID is used within the container and what parts of the host file system -you can access. In Singularity, your home directory is mounted into the container -by default and the user in the container is the same as the user on the -host system. -This simplifies the process of running containers on HPC clusters, because you don't -have to worry about mounting directories and permissions. - -[Nice article that briefly compares Docker and Singularity](https://pythonspeed.com/articles/containers-filesystem-data-processing). - -## Running Docker containers on Maxwell - -This section will guide you through the process of setting everything up for -running Docker containers on Maxwell using Singularity. - -For this, we assume that you have the name/URL of a docker image that you want -to run. -Later sections will explain how you can create your own images and can -build/version/manage them using GitHub and DockerHub. - -### Mandatory configuration - -In order to avoid running into storage limit problems, we will assign the -singularity cache to a directory in your Dust directory (this is -Maxwell-specific). -You can of course also use a different directory if you want, but make sure -that you have enough space there. - -First we create a directory where we want to store the cache and the temporary -files created by singularity: - -```bash -mkdir -p /gpfs/dust/maxwell/user/$USER/.singularity/cache -mkdir -p /gpfs/dust/maxwell/user/$USER/.singularity/tmp -``` - -Then we need to tell singularity to use these directories by setting the -environment variables `SINGULARITY_CACHEDIR` and `SINGULARITY_TMPDIR`. - -Add the following to your `.bashrc` (or `.zshrc` if you use `zsh`): - -```bash -export SINGULARITY_CACHEDIR=/gpfs/dust/maxwell/user/$USER/.singularity/cache -export SINGULARITY_TMPDIR=/gpfs/dust/maxwell/user/$USER/.singularity/tmp -``` - -### Running your first container - -With this setup, you can now run your first container. -For this, we will use the `hello-world` container from DockerHub. - -```shell -singularity run docker://hello-world -``` - -You should see the following output: - -```shell -INFO: Converting OCI blobs to SIF format -INFO: Starting build... -Getting image source signatures -Copying blob 719385e32844 done -Copying config 0dcea989af done -Writing manifest to image destination -Storing signatures -2023/08/30 17:56:13 info unpack layer: sha256:719385e32844401d57ecfd3eacab360bf551a1491c05b85806ed8f1b08d792f6 -INFO: Creating SIF file... -WARNING: passwd file doesn't exist in container, not updating -WARNING: group file doesn't exist in container, not updating - -Hello from Docker! -This message shows that your installation appears to be working correctly. 
- -To generate this message, Docker took the following steps: - 1. The Docker client contacted the Docker daemon. - 2. The Docker daemon pulled the "hello-world" image from the Docker Hub. - (amd64) - 3. The Docker daemon created a new container from that image which runs the - executable that produces the output you are currently reading. - 4. The Docker daemon streamed that output to the Docker client, which sent it - to your terminal. - -To try something more ambitious, you can run an Ubuntu container with: - $ docker run -it ubuntu bash - -Share images, automate workflows, and more with a free Docker ID: - https://hub.docker.com/ - -For more examples and ideas, visit: - https://docs.docker.com/get-started/ - ``` - -**What happened here?** - -When you executed the `singularity run` command, Singularity first downloaded -the Docker image from DockerHub and then converted it to a Singularity image -(`.sif` file). -After that, it executed the image and streamed the output to your terminal. -The output `Hello from Docker!` and the lines after that are the output of -the `hello-world` container. - -## Creating your own images - -This section will guide you through the process of creating your own images -and how to version them using GitHub and DockerHub. - -Versioning your images is very useful, because it allows you to go back to -previous versions of your image if you run into problems with the current -version. - -The main idea of this setup is that you have a GitHub repository that contains -your Dockerfile and a GitHub Action that automatically builds and pushes your -image to DockerHub whenever you push a new commit to the repository. - -For commits on the `main` branch the image will be pushed to DockerHub with -the tag `latest`. -For tagged commits, the image will be also pushed to DockerHub with the tag -``. - -### Setting up the DockerHub repo - -First, create a DockerHub repository. -You can do this by clicking on the `Create Repository` button on the DockerHub -website. - - - -Choose a name for your repository and add a description if you want. -Then click on `Create`. - - - -Afterwards, you need to create a personal access token. -You can do this by clicking on your profile picture in the top right corner -and then clicking on `Account Settings`. - -Then click on `Security` in the left sidebar and then on `New Access Token`. - - - -Choose a name for your token and click on `Generate`. - - - -Copy the token and store it somewhere safe (you won't be able to see it again -after you close the window). - - - -### Setting up the GitHub repo - -You can use this repository as a template for your own repository. -Click on the `Use this template` button on the GitHub website. -Now you need to add the DockerHub API key, username and repository name -to the GitHub repository secrets / variables. - -Go to the repository settings and click on `Secrets` in the left sidebar. -Then click on `New repository secret`. 
- - - -Add the following secrets: - -- `DOCKERHUB_TOKEN`: the token you created in the previous section (shown in the screenshot - below) - - - -Afterwards, click on "Variables" and add the following variable: - -- `DOCKERHUB_USERNAME`: your DockerHub username -- `DOCKERHUB_REPO`: the name of your image / repo on DockerHub - - -There are two files in this repository that are important for building and -pushing your image to DockerHub: - -- `Dockerfile` -- `.github/workflows/docker-publish.yml` - -The `Dockerfile` is the file that contains the instructions for building your -image (check out the [Docker in 100 seconds](https://www.youtube.com/watch?v=Gjnup-PuquQ) -video linked above for a quick introduction to Dockerfiles). - -Content of the `Dockerfile`: - -```dockerfile -FROM python:3.11 -RUN pip install numpy -``` - -This Dockerfile will create an image based on the `python:3.11` image and -install `numpy` in it. -This will then create the environment that you can use for your data analysis, -send it to your colleagues etc. - -The `docker-publish.yml` file contains the GitHub Action that builds and pushes -your image to DockerHub. -GitHub Actions allows you to automate certain tasks on GitHub, like building -and pushing your Docker image in our case. -Check out the video [CI/CD in 100 seconds](https://www.youtube.com/watch?v=scEDHsr3APg) -for a quick introduction to CI/CD (we just use the automation part here, there -is no testing involved). - -### Versioning your images - -The GitHub Action that we set up in the previous section will automatically -build and push your image to DockerHub whenever you push a new commit to the -repository. -However, it will always push the image with the tag `latest`. - -If you want to version your images, you can do this by creating a new tag -for your commit. -You can do this by clicking on the `Releases` tab in your repository and then -on `Create a new release`. - -### Pulling the image to Maxwell (or any other HPC) - -Now that you have your image on DockerHub, you can pull it to Maxwell and run -it with singularity. - -```shell -singularity run docker:///: -``` - -I.e. if your username is `johndoe`, your repo is `myimage` and your tag is -`v1.0`, you would run: - -```shell -singularity run docker://johndoe/myimage:v1.0 -``` - -The first time you run this command, singularity will download the image from -DockerHub and convert it to a singularity image. -Afterwards, it will execute the image and stream the output to your terminal. - -The conversion can take quite some time once your image gets larger. - -You can also specify a name of the singularity image file, which allows you -to even share that file with colleagues (they just need `read` permission for that -file). - -```shell -singularity build docker:///: ///.sif -``` - -## Set up VSCode for remote development with singularity - -You can also use singularity containers for remote development -in VSCode. - -To set up everything, install the remote development extension pack in VSCode. - -### `Dockerfile` - -If you want to use this image for remote development with VSCode, you need to -install either `curl` or `wget` in the image, because the VSCode server -installation script uses one of these tools to download the server files. 
- -Add the following line to your `Dockerfile`: - -```dockerfile -RUN apt-get update && apt-get install -y curl wget -``` - -### `.ssh/config` file setup - -Add the following to your `.ssh/config` file: - -``` -Host singularity_image~* - RemoteCommand export SINGULARITY_CACHEDIR=/gpfs/dust/maxwell/user//.singularity/cache && export SINGULARITY_TMPDIR=/gpfs/dust/maxwell/user//.singularity/tmp && singularity shell --nv -B /gpfs/dust/maxwell/user path/to/image.sif - RequestTTY yes - -Host max-wgse-sing singularity_image~max-wgse-sing - HostName max-wgse.desy.de - User - -``` - -### VSCode settings - -Add the following to your VSCode settings (you can open the `settings.json` file -via the VSCode command palette with `Ctrl+Shift+P` and then typing -`Preferences: Open User Settings (JSON)`): - -```json -"remote.SSH.serverInstallPath": { - "singularity_image~max-wgse-sing": "/gpfs/dust/maxwell/user//.vscode-container/", -}, -"remote.SSH.enableRemoteCommand": true, -``` - -This will enable the remote command feature in VSCode, which allows you to -run the singularity container on the remote machine and then connect to it. - -It will also set the path to where you want to store the VSCode container files -on the remote machine (specifying this path is recommended, especially if you are -working with multiple containers on the same machine, because otherwise the -VSCode server files will be stored in the home directory, and you will have to -delete them manually if you want to switch to a different container). - -Also create the directory on the remote machine: - -```shell -mkdir -p /gpfs/dust/maxwell/user//.vscode-container/ -```
+# Safe execution environment for AI-generated code
+This repository contains the Dockerfile that provides a safe execution environment for the AI agents in the '[Agents of Discovery](https://github.com/uhh-pd-ml/AgentsOfDiscovery)' repository. Precompiled images are available on [DockerHub](https://hub.docker.com/r/olgarius1/ai_agent).
+The container restricts which files the agent can access and write, but the agent can still fill the writable space with data. We do not take responsibility for any damage caused by agents executed in this environment.
+Two versions are available:
+- **latest**: Default version with the following Python packages available:
+  - numpy
+  - matplotlib
+  - pandas
+  - openai
+  - pylint
+  - scipy
+  - seaborn
+  - h5py
+  - tables
+  - scikit-learn
+- **pytorch**: Additionally has PyTorch installed to offer more complex ML capabilities
+
+For usage, please refer to the '[Agents of Discovery](https://github.com/uhh-pd-ml/AgentsOfDiscovery)' repository.
+
+This repository is based on [this Docker template](https://github.com/joschkabirk/docker-template).
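+
+If you just want to try the image outside of that setup, the sketch below shows one way to run it with plain `docker`. The mounted `workspace/` directory, the script name, and the `python` invocation are illustrative assumptions; the actual mount points and entrypoint used by the agents are defined in the Agents of Discovery repository.
+
+```bash
+# Pull the prebuilt image (use the "pytorch" tag for the PyTorch variant)
+docker pull olgarius1/ai_agent:latest
+
+# Run a generated script inside the container.
+# The bind mount below is an assumption: it makes ./workspace the only
+# host directory the container can write to.
+docker run --rm \
+  -v "$(pwd)/workspace:/workspace" \
+  olgarius1/ai_agent:latest \
+  python /workspace/agent_script.py
+```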
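+
+Since these are regular Docker images, they can also be used without root access on an HPC cluster via Singularity/Apptainer, as described in the Docker template this repository is based on. A minimal sketch:
+
+```bash
+# Pull the image from DockerHub and open a shell inside it
+# (Singularity converts the Docker image to a .sif file on first use)
+singularity shell docker://olgarius1/ai_agent:latest
+```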