130 changes: 130 additions & 0 deletions docs/adr/0005-replace-apache-with-nginx-gunicorn.md
@@ -0,0 +1,130 @@
# ADR 5: Replace Apache with Nginx + Gunicorn for Manager Service

- **Author(s)**: [Mikolaj Kasprzak](https://github.com/MikolajKasprzak)
- **Date**: 2025-10-21
- **Status**: `Accepted`
Contributor

The primary concern is that the Manager service runs as root, which can be fixed without major architectural changes. We should first clean up technical debt - simplify/remove init scripts, fix permissions, and remove legacy dependencies - similar to what was done for the Controller in PR #235.

Long-term separation of the HTTP server and Manager service makes sense, so this ADR should focus solely on that aspect.


## Context

The SceneScape Manager service was originally hosted using Apache with mod_wsgi, which introduced several operational and security challenges:

### Key Problems:

1. **Security vulnerabilities**: Apache required root privileges for initialization, creating unnecessary attack surface
Contributor

Apache does not need to run as root inside a container. By using a non-privileged port and writable PID/log paths, the container can run entirely as non-root (daemon) while still serving requests and loading all modules.

Switching to NGINX would face the same issue: the master process normally starts as root to bind to port 80 and write the PID file, then drops privileges to the nginx user. Non-root operation is possible if you use a non-privileged port (e.g., 8080) and writable directories.

Example non-root Apache Dockerfile

FROM httpd:2.4

COPY ./index.html /usr/local/apache2/htdocs/index.html

# Change Apache to listen on a non-root port
RUN sed -i 's/^Listen 80/Listen 8080/' /usr/local/apache2/conf/httpd.conf

# Add PidFile directive to use /tmp and make logs directory writable by daemon user
RUN echo "PidFile /tmp/httpd.pid" >> /usr/local/apache2/conf/httpd.conf \
    && chown -R daemon:daemon /usr/local/apache2/logs

# Switch to non-root user after all configuration changes
USER daemon
EXPOSE 8080

CMD ["httpd-foreground", "-DFOREGROUND", "-f", "/usr/local/apache2/conf/httpd.conf"]

Testing

Example index.html:

$ cat index.html 
hello world!

The run.sh convenience test script:

#!/bin/bash

docker build . -t non-root-apache
if [ "$(docker ps -aq -f name=non-root-apache)" ]; then
  docker rm -f non-root-apache >/dev/null 2>&1 || true
fi
docker run -dit --name non-root-apache -p 8080:8080 non-root-apache
docker logs non-root-apache 
curl localhost:8080
ps aux | grep httpd

Result:

$ ./run.sh 
[+] Building 0.0s (9/9) FINISHED  docker:default
 => [internal] load build definition from Dockerfile  0.0s
 => => transferring dockerfile: 666B  0.0s
 => [internal] load metadata for docker.io/library/httpd:2.4  0.0s
 => [internal] load .dockerignore  0.0s
 => => transferring context: 2B  0.0s
 => [internal] load build context  0.0s
 => => transferring context: 31B  0.0s
 => [1/4] FROM docker.io/library/httpd:2.4  0.0s
 => CACHED [2/4] COPY ./index.html /usr/local/apache2/htdocs/index.html  0.0s
 => CACHED [3/4] RUN sed -i 's/^Listen 80/Listen 8080/' /usr/local/apache2/conf/httpd.conf  0.0s
 => CACHED [4/4] RUN echo "PidFile /tmp/httpd.pid" >> /usr/local/apache2/conf/httpd.conf     && chown -R daemon:daemon /usr/local/apache2/logs  0.0s
 => exporting to image  0.0s
 => => exporting layers  0.0s
 => => writing image sha256:5c40698332da9cf13b01895ae1b4c4380d814c5774f010602a8b6ed67e9039ac  0.0s
 => => naming to docker.io/library/non-root-apache  0.0s
e9a5b9bb2eab519b4c62e1bbeaaf494f09bf44f0bd58b35703dcd43e5854c8cf

AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 172.17.0.2. Set the 'ServerName' directive globally to suppress this message
AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 172.17.0.2. Set the 'ServerName' directive globally to suppress this message
[Thu Oct 23 10:34:35.363237 2025] [mpm_event:notice] [pid 1:tid 1] AH00489: Apache/2.4.65 (Unix) configured -- resuming normal operations
[Thu Oct 23 10:34:35.363348 2025] [core:notice] [pid 1:tid 1] AH00094: Command line: 'httpd -D FOREGROUND -D FOREGROUND -f /usr/local/apache2/conf/httpd.conf'

hello world!

daemon   2837032 23.0  0.0   6212  4480 pts/0    Ss+  12:34   0:00 httpd -DFOREGROUND -DFOREGROUND -f /usr/local/apache2/conf/httpd.conf
daemon   2837239  0.0  0.0 1997396 2600 pts/0    Sl+  12:34   0:00 httpd -DFOREGROUND -DFOREGROUND -f /usr/local/apache2/conf/httpd.conf
daemon   2837240  0.0  0.0 1997452 3240 pts/0    Sl+  12:34   0:00 httpd -DFOREGROUND -DFOREGROUND -f /usr/local/apache2/conf/httpd.conf
daemon   2837244  0.0  0.0 1997396 2600 pts/0    Sl+  12:34   0:00 httpd -DFOREGROUND -DFOREGROUND -f /usr/local/apache2/conf/httpd.conf
jdanieck 2837332  0.0  0.0   6548  1920 pts/1    S+   12:34   0:00 grep httpd

Contributor Author

I'm fully aware that Apache can be run without root permissions. Read it more carefully and read the init script, and you will see that Apache in the container is run as a non-root user. The root user is used to configure Apache and to create the non-root user (during the init script), because Apache is installed from an apt package.

The problem is that there is no point in creating an image similar to the official httpd one on top of the Ubuntu 22.04 base that the Manager uses.
The image is already very big and resembles a monolithic approach rather than following container best practices. Why lose time creating that image when we can create a separate container that uses the nginx/httpd image?

Contributor

The immediate goal was to remove the root user for this release with minimal changes, and that's exactly what I propose; it's a meaningful security and compliance improvement worth the effort. Broader architectural redesigns can be addressed later at lower priority. Let's take this offline, as I see it requires more discussion.

Contributor

Read it more carefully

@MikolajKasprzak, please review the CODE_OF_CONDUCT.md, and I hope you'll apply it in your communication next time.

Contributor Author

Sorry if I offended you - no harm was intended. I wanted to clarify that Apache requires root privileges for initialization (in the context of our current solution), whereas @jdanieck's comment suggests he interpreted the documentation as saying Apache needs to run as root.

2. **Complex configuration**: Multi-layered Apache configuration with mod_wsgi, SSL termination, and proxy rules was difficult to maintain
3. **Volume permission issues**: Dynamic UID/GID changes led to unreliable file system permissions
4. **Deployment complexity**: Monolithic container with Apache + Django made scaling and debugging difficult
5. **Resource overhead**: Apache's process model was heavier than needed for a Python web application

### Technical Debt:

- Complex initialization scripts (`scenescape-init`, `webserver-init`) with root privilege requirements
- Dynamic user/group ID management causing permission denied errors
- Mixed responsibilities in single container (web server + application server)
- Legacy configuration files and unused dependencies

The system needed a more secure, maintainable, and cloud-native architecture aligned with modern containerization best practices.

## Decision

We will replace the Apache + mod_wsgi architecture with an **Nginx + Gunicorn** separation-of-concerns approach:

### New Architecture:

#### Docker Compose Deployment:
Contributor

Please use mermaid diagrams to depict all deployments mentioned in the doc

  • the original deployment (in the context)
  • proposed compose deployment
  • proposed k8s deployments
  • (optionally) alternatives


1. **Manager Container**:
- Runs Gunicorn WSGI server as unprivileged user
- Serves Django application on internal port 8000
- Simplified entrypoint script without root privileges

2. **Nginx Container**:
- Handles SSL/TLS termination (HTTPS on port 443)
- Serves static files efficiently
- Proxies dynamic requests to Gunicorn
- Manages WebSocket proxy for MQTT broker
- Runs as separate service for better isolation
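
The Manager container side of this split could be configured with a Gunicorn configuration file, which is a plain Python module. The following is a minimal sketch, not the configuration from this PR: the file name, the `GUNICORN_WORKERS` environment variable, the worker heuristic, and the `/dev/shm` heartbeat directory are assumptions for illustration. Only port 8000 comes from the text above.

```python
# gunicorn.conf.py (hypothetical file name and values, not taken from the PR)
import multiprocessing
import os

# Listen on the internal port named in the ADR; Nginx proxies requests here.
# Port 8000 is unprivileged, so no root privileges are needed to bind it.
bind = "0.0.0.0:8000"

# Assumed sizing heuristic: (2 x CPU cores) + 1 workers, overridable through
# a hypothetical GUNICORN_WORKERS environment variable.
workers = int(
    os.environ.get("GUNICORN_WORKERS", multiprocessing.cpu_count() * 2 + 1)
)

# Write access and error logs to stdout/stderr so the container runtime
# collects them and no writable log directory is required.
accesslog = "-"
errorlog = "-"

# Keep worker heartbeat files on an in-memory filesystem (assumption: the
# container provides /dev/shm).
worker_tmp_dir = "/dev/shm"
```

Such a file would be passed with `gunicorn -c gunicorn.conf.py <project>.wsgi:application`, where the WSGI module path is a placeholder for the Manager's actual Django project.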

#### Kubernetes Deployment:

1. **Sidecar Pattern**:
- **Django Container**: Gunicorn WSGI server on port 8000
- **Nginx Sidecar**: Static files and internal routing within Pod
- **Broker**: Kubernetes Ingress should be sufficient; no sidecar is needed
- Shared volumes for static files and media between containers
- Communication over localhost within Pod

2. **Kubernetes Ingress**:
- Handles external traffic routing and SSL/TLS termination
- Can manage certificate lifecycle with cert-manager
- Routes HTTP/HTTPS traffic to nginx sidecar
- Provides load balancing and high availability

3. **Volume Management**:
- Init container sets proper permissions once at startup
- Fixed UID/GID (1000:1000) eliminates dynamic user changes
- Shared emptyDir volumes for static files within Pod
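
With a fixed UID/GID, the entrypoint can verify its runtime identity instead of changing it. The sketch below shows one way such a startup check might look. The function name and the idea of running it from the entrypoint are assumptions, not part of this PR; the 1000:1000 values come from the list above.

```python
import os

EXPECTED_UID = 1000  # fixed UID from the ADR
EXPECTED_GID = 1000  # fixed GID from the ADR


def identity_problems(uid, gid, writable_dirs, can_write=os.access):
    """Return a list of human-readable problems; an empty list means OK.

    `can_write` defaults to os.access and can be replaced in tests.
    """
    problems = []
    if uid == 0:
        problems.append("process is running as root")
    elif (uid, gid) != (EXPECTED_UID, EXPECTED_GID):
        problems.append(
            f"expected {EXPECTED_UID}:{EXPECTED_GID}, got {uid}:{gid}"
        )
    # Volumes should already be writable because the init container
    # fixed their ownership once at startup.
    for path in writable_dirs:
        if not can_write(path, os.W_OK):
            problems.append(f"{path} is not writable")
    return problems
```

An entrypoint could call `identity_problems(os.getuid(), os.getgid(), [...])` with the Manager's actual mount points and exit early if the list is not empty.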

### Configuration Changes:

#### Docker Compose:

- Replace complex Apache config with simple nginx.conf
- Eliminate `webserver-init` and `scenescape-init` scripts
- Use single `entrypoint.sh` for Django initialization
- Add CSRF trusted origins for reverse proxy setup
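
For the CSRF point, a Django settings fragment along the following lines is one possibility. This is a sketch under assumptions: the `CSRF_TRUSTED_ORIGINS` environment variable, its default value, and the `parse_origins` helper are hypothetical. The setting names themselves are standard Django settings.

```python
import os


def parse_origins(raw):
    """Split a comma-separated string of origins, dropping empty entries."""
    return [item.strip() for item in raw.split(",") if item.strip()]


# Hypothetical environment variable and default; Django 4.0+ requires each
# origin to include its scheme, e.g. "https://scenescape.example.com".
CSRF_TRUSTED_ORIGINS = parse_origins(
    os.environ.get("CSRF_TRUSTED_ORIGINS", "https://localhost")
)

# TLS terminates at Nginx, so Django only sees plain HTTP from the proxy.
# Trust the X-Forwarded-Proto header to decide whether a request is secure.
# This is safe only if the proxy always sets or strips that header itself.
SECURE_PROXY_SSL_HEADER = ("HTTP_X_FORWARDED_PROTO", "https")

# Use the X-Forwarded-Host header from the proxy when building absolute URLs.
USE_X_FORWARDED_HOST = True
```

Reading the origins from the environment keeps the image identical across deployments; whether that fits the Manager's existing settings layout is for the implementation to decide.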

#### Kubernetes:

- Helm chart with sidecar nginx configuration
- Kubernetes Ingress resource for external access
- ConfigMaps for nginx configuration
- Init containers for static file collection
- Service-mesh-ready architecture (if TLS is required on the internal network)

## Alternatives Considered

### Option A: Nginx + Gunicorn with Kubernetes Sidecar (Selected)

- **Pros**: Industry standard, security best practices, clean separation, excellent performance, Kubernetes-native
- **Cons**: Requires container architecture changes, initial migration effort, slightly more complex Pod spec

### Option B: Pure Kubernetes Ingress (Considered)
Contributor

I'd like us to explore a more Kubernetes-native approach before making a decision; we might even need a POC. What specific challenges are we seeing with WebSocket/MQTT support? There are many Ingress controller implementations, so perhaps it's just a matter of choosing the right one.

Also, in recent Kubernetes versions, the Gateway API was introduced as a more advanced alternative to Ingress, offering richer routing capabilities. It might be worth considering for our use case.

Contributor Author

We can use the Gateway API; it provides the same capabilities as the old Ingress approach.

I can extend the description of this section. We always want to use Kubernetes Ingress; we just need a second nginx as a sidecar container, or a different approach, to host our static files. You shouldn't use Ingress for that.

The challenges are configuring timeouts and a production-ready configuration for WebSocket connections: nothing impossible or time-consuming.


- **Pros**: Fully cloud-native, managed SSL, automatic scaling
- **Cons**: Complex static file handling, limited WebSocket support, MQTT proxy challenges

### Option C: Use WhiteNoise

- **Pros**: Can be used on top of Option B, static files handled by WhiteNoise, no need for nginx sidecar
- **Cons**: Manager code changes
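
To give a sense of the size of those code changes, the following settings fragment sketches a typical WhiteNoise setup. It assumes Django 4.2 or later (for the `STORAGES` setting); the `STATIC_ROOT` path and the surrounding middleware entries are placeholders rather than the Manager's actual configuration.

```python
# Fragment of a Django settings module; paths and neighbouring middleware
# entries are placeholders, not the Manager's real configuration.
MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    # WhiteNoise should come directly after SecurityMiddleware so it can
    # serve static files before the rest of the middleware stack runs.
    "whitenoise.middleware.WhiteNoiseMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.common.CommonMiddleware",
]

STATIC_URL = "/static/"
STATIC_ROOT = "/app/staticfiles"  # hypothetical collectstatic target

# Django 4.2+ storage configuration: compress static files and add content
# hashes to their names so they can be cached aggressively.
STORAGES = {
    "default": {"BACKEND": "django.core.files.storage.FileSystemStorage"},
    "staticfiles": {
        "BACKEND": "whitenoise.storage.CompressedManifestStaticFilesStorage",
    },
}
```

With this in place, Gunicorn serves the files collected by `collectstatic` itself, which is what would remove the need for the nginx sidecar in the Kubernetes deployment.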

## Consequences

### Positive

- **Enhanced Security**: No root privileges required in application container
- **Simplified Maintenance**: Clean separation of concerns
- **Improved Scalability**: Independent scaling of web server and application server
- **Reduced Attack Surface**: Minimal container with only necessary components
- **Easier Debugging**: Clear separation between infrastructure (nginx) and application (Django)
- **Volume Reliability**: Fixed permissions eliminate dynamic UID/GID issues for the manager and Apache users
- **Kubernetes Native**: Sidecar pattern enables proper cloud-native deployment
- **TLS Management**: Kubernetes Ingress with cert-manager (or a different solution) can automate the certificate lifecycle, or we can keep the current approach

### Negative

- **Migration Effort**: Requires updating deployment scripts and documentation
- **Two Containers**: Slightly more complex docker-compose setup
- **Initial Setup**: Need to configure nginx proxy rules and SSL certificates
- **Compatibility**: May require CSRF and WebSocket configuration adjustments

## References

- [Nginx official documentation](https://nginx.org/en/docs/)
- [Gunicorn deployment guide](https://docs.gunicorn.org/en/stable/deploy.html)
- [Django production deployment best practices](https://docs.djangoproject.com/en/4.2/howto/deployment/)
- [Container security best practices](https://kubernetes.io/docs/concepts/security/overview/)
- [Kubernetes Sidecar pattern](https://kubernetes.io/docs/concepts/workloads/pods/)
- [Kubernetes Ingress controllers](https://kubernetes.io/docs/concepts/services-networking/ingress/)
- [Kubernetes Init Containers](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/)