Enhancement request: Use deterministic UID/GID in Dockerfile #1555

ghsa-retrieval · 2025-01-21T15:37:43Z

Is your enhancement request related to a problem? Please describe.
The UID and GID in the Dockerfile can change because the user and group are created by name only. This affects deployments that need to identify the user exactly. For example, security context settings in Kubernetes/Helm charts, such as runAsUser and runAsGroup, cannot be applied because the UID and GID are not known ahead of time and may change between versions. Similarly, user namespace configurations rely on this information.

What are the benefits of the requested enhancement?
The user and group are no longer assigned a non-deterministic ID. You can set up user namespaces in a predictable way.

Describe the solution you would like
Modify the adduser and addgroup commands in the Dockerfile to assign an explicit numerical UID and GID instead of relying on the name alone. The chosen UID and GID should not already be in use in the Python base image.
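
As a rough sketch of the idea, the script below checks whether a candidate ID is already taken in a passwd- or group-format file and, if both IDs are free, prints the creation commands. The IDs 1000/1000, the user name "app", and the helper names id_in_use and plan_user_creation are assumptions for illustration, not values taken from the project. The printed flags follow the Debian-style adduser/addgroup wrappers and should be verified against the base image.

```shell
#!/bin/sh
# Hypothetical values for illustration; the project may choose different IDs.
APP_USER="app"
APP_UID=1000
APP_GID=1000

# Succeeds if the numeric ID ($2) appears in the third field of a
# passwd- or group-format file ($1), i.e. the ID is already taken.
id_in_use() {
    awk -F: -v id="$2" '$3 == id { found = 1 } END { exit !found }' "$1"
}

# Prints the creation commands if both IDs are free, otherwise reports
# the clash. Arguments: passwd file, group file (default: system files).
plan_user_creation() {
    passwd_file="${1:-/etc/passwd}"
    group_file="${2:-/etc/group}"
    if id_in_use "$passwd_file" "$APP_UID"; then
        echo "UID $APP_UID is already in use" >&2
        return 1
    fi
    if id_in_use "$group_file" "$APP_GID"; then
        echo "GID $APP_GID is already in use" >&2
        return 1
    fi
    # Assumed Debian-style flags; these would go in a Dockerfile RUN step.
    echo "addgroup --system --gid $APP_GID $APP_USER"
    echo "adduser --system --uid $APP_UID --gid $APP_GID $APP_USER"
}

# Example: run inside the base image to check against its own files.
#   plan_user_creation /etc/passwd /etc/group
```

Running the check inside the base image before fixing the values is one way to avoid colliding with accounts the image already defines.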

Additional notes
Assigning an explicit numerical UID and GID, instead of relying on a name alone, is also recommended by Docker: https://docs.docker.com/build/building/best-practices/#user


tdruez · 2025-01-27T07:33:39Z

@ghsa-retrieval Could you confirm that the changes at https://github.com/aboutcode-org/scancode.io/pull/1569/files are good enough for your needs?

ghsa-retrieval · 2025-01-27T11:19:23Z

@tdruez Yes, but this change should come with a big warning because it will likely cause issues for existing installations where the container image previously used a different UID/GID and stored its data in a volume. This can result in a failure to start due to permission errors and would require running chown on the volume. While this could have happened unintentionally in the past as well, given the non-deterministic nature of the assignment, here it is expected to break.
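
To gauge whether an existing volume would be affected, one option is to list entries whose owner differs from the new IDs. This is only a sketch: the helper name list_ownership_mismatches is hypothetical, and the path and 1000:1000 in the usage comment are placeholders rather than the values chosen by the project.

```shell
#!/bin/sh
# Lists entries under a directory ($1) that are NOT owned by the expected
# numeric UID ($2) and GID ($3). Empty output means no chown is needed.
list_ownership_mismatches() {
    find "$1" \( ! -user "$2" -o ! -group "$3" \) -print
}

# Example (placeholder path and IDs):
#   list_ownership_mismatches /var/scancodeio/workspace 1000 1000
```

If the function prints nothing, the data under the given path is already owned by the expected user and group.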

For comparison, my local compose install without the patch shows the following IDs:
UID = 101
GID = 108
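
For reference, values like these can be read with the standard id utility. The snippet is a minimal sketch; print_ids is a hypothetical helper name, and the service name "web" in the comment is taken from the compose file.

```shell
#!/bin/sh
# Prints the numeric UID and GID of the user running this script.
print_ids() {
    echo "UID = $(id -u)"
    echo "GID = $(id -g)"
}
print_ids

# Inside a running compose service, the same information is available via:
#   docker compose exec web id
```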

However, I have observed the IDs fluctuating on Kubernetes deployments, so I'm not sure if people just got lucky with their compose deployments in the past or if docker compose does some magic under the hood for this case.

tdruez · 2025-01-27T11:48:47Z

@ghsa-retrieval Thanks for the insight! Do you have any suggestions on how we can address this with minimal impact on existing instances?

ghsa-retrieval · 2025-01-27T12:57:23Z

@tdruez Good question. It seems that docker compose, unlike Kubernetes, does not provide an option to automatically modify the user permissions. What you could do is add a new service to the docker compose file that fixes the permissions before web, worker, and nginx start. I'm not sure if there is a cleaner solution, since this only needs to run once, when updating to the new container image version.

Note: This is just a quick example. You would likely want to put the UID and GID in the .env file and reference the variables instead of hardcoding them, and use a properly tagged version of the alpine image. I have also not checked whether the directories contain files that were given an owner other than the "app" user. Startup performance will be impacted if there are many files to modify.

services:
  db:
    image: postgres:13
    env_file:
      - docker.env
    volumes:
      - db_data:/var/lib/postgresql/data/
    shm_size: "1gb"
    restart: always

  redis:
    image: redis
    # Enable redis data persistence using the "Append Only File" with the
    # default policy of fsync every second. See https://redis.io/topics/persistence
    command: redis-server --appendonly yes
    volumes:
      - redis_data:/data
    restart: always
    
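  # One-shot service: align ownership of the mounted paths with the fixed
  # UID/GID (1000:1000 in this example) before the app services start.
  chown:
    image: alpine:latest
    restart: "no"
    command: sh -c "
        chown -R 1000:1000 /opt/scancodeio/.env &&
        chown -R 1000:1000 /etc/scancodeio &&
        chown -R 1000:1000 /var/scancodeio/workspace &&
        chown -R 1000:1000 /var/scancodeio/static"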
    volumes:
      - .env:/opt/scancodeio/.env
      - /etc/scancodeio/:/etc/scancodeio/
      - workspace:/var/scancodeio/workspace/
      - static:/var/scancodeio/static/

  web:
    build: .
    command: wait-for-it --strict --timeout=60 db:5432 -- sh -c "
        ./manage.py migrate &&
        ./manage.py collectstatic --no-input --verbosity 0 --clear &&
        gunicorn scancodeio.wsgi:application --bind :8000 --timeout 600 --workers 8 ${GUNICORN_RELOAD_FLAG}"
    env_file:
      - docker.env
    expose:
      - 8000
    volumes:
      - .env:/opt/scancodeio/.env
      - /etc/scancodeio/:/etc/scancodeio/
      - workspace:/var/scancodeio/workspace/
      - static:/var/scancodeio/static/
    depends_on:
      chown:
         condition: service_completed_successfully
      db:
         condition: service_started

  worker:
    build: .
    # Ensure that potential db migrations run first by waiting until "web" is up
    command: wait-for-it --strict --timeout=120 web:8000 -- sh -c "
        ./manage.py rqworker --worker-class scancodeio.worker.ScanCodeIOWorker
                             --queue-class scancodeio.worker.ScanCodeIOQueue
                             --verbosity 1"
    env_file:
      - docker.env
    volumes:
      - .env:/opt/scancodeio/.env
      - /etc/scancodeio/:/etc/scancodeio/
      - workspace:/var/scancodeio/workspace/
    depends_on:
      chown:
         condition: service_completed_successfully
      redis:
         condition: service_started
      db:
         condition: service_started
      web:
         condition: service_started

  nginx:
    image: nginx:alpine
    ports:
      - "${NGINX_PUBLISHED_HTTP_PORT:-80}:80"
      - "${NGINX_PUBLISHED_HTTPS_PORT:-443}:443"
    volumes:
      - ./etc/nginx/conf.d/:/etc/nginx/conf.d/
      - /var/www/html:/var/www/html
      - static:/var/scancodeio/static/
    depends_on:
      web:
         condition: service_started
    restart: always

  clamav:
    image: clamav/clamav
    volumes:
      - clamav_data:/var/lib/clamav
      - workspace:/var/scancodeio/workspace/
    restart: always

volumes:
  db_data:
  redis_data:
  clamav_data:
  static:
  workspace:
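
The note about not hardcoding the IDs could be handled with a small script along these lines. It is a sketch under assumptions: the variable names APP_UID and APP_GID and the helper fix_ownership are hypothetical, and the defaults simply mirror the 1000:1000 used in the example.

```shell
#!/bin/sh
# Hypothetical variable names; defaults mirror the 1000:1000 of the example.
APP_UID="${APP_UID:-1000}"
APP_GID="${APP_GID:-1000}"

# Recursively assigns APP_UID:APP_GID to every path given as an argument.
# Missing paths are skipped with a message; a failed chown aborts the run.
fix_ownership() {
    for path in "$@"; do
        if [ -e "$path" ]; then
            chown -R "$APP_UID:$APP_GID" "$path" || return 1
        else
            echo "skipping missing path: $path" >&2
        fi
    done
}

# Example, using the paths mounted in the compose file:
#   fix_ownership /opt/scancodeio/.env /etc/scancodeio \
#       /var/scancodeio/workspace /var/scancodeio/static
```

Reading the IDs from the environment keeps the one-shot service and the image build in sync when the values are defined in a single place.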

tdruez added a commit that referenced this issue Jan 27, 2025

Use deterministic UID/GID in Dockerfile #1555

40504bb

Signed-off-by: tdruez <[email protected]>
