Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: Refactor Dockerfile & entrypoint #8923

Merged
merged 88 commits into from
Feb 13, 2025
Merged

ci: Refactor Dockerfile & entrypoint #8923

merged 88 commits into from
Feb 13, 2025

Conversation

upbqdn
Copy link
Member

@upbqdn upbqdn commented Oct 10, 2024

Motivation

CI contains a bunch of nits:

  • The entrypoint script quietly uses its own Zebra config if the user passes an invalid path to the config. This causes some of our CI tests to pass, even though they shouldn't since they're passing non-existent configs to Zebra in Docker.
  • Some tests for the scanner don't run.
  • The suite for all unit tests runs twice unnecessarily.
  • We no longer have experimental features in Zebra, but we still have a failing workflow that tries to build a Docker image with experimental features.
  • Zebra runs under root inside Docker, which is a security risk.
  • The runtime Docker stage for production images contains unneeded packages, making the images larger than necessary and increasing the surface for security vulnerabilities.
  • The deps Docker stage for tests contains unneeded packages, which slows CI down.
  • The entrypoint script contains unused variables and is rather complex.
  • The EXPOSE instruction contains two ports, but Zebra uses only one of them at a time, and it doesn't contain some other ports Zebra uses.
  • Rust features are passed from CI to Dockerfile and entrypoint in many variables. In the Dockerfile, we currently have:
    • FEATURES (which comes from RUST_PROD_FEATURES in CI)
    • TEST_FEATURES (which comes from RUST_TEST_FEATURES in CI)
    • EXPERIMENTAL_FEATURES (which should probably come from RUST_EXPERIMENTAL_FEATURES in CI, containing "shielded-scan", but it doesn't, so the Dockerfile uses the default value "journald prometheus filter-reload")
    • ENTRYPOINT_FEATURES = "$FEATURES $TEST_FEATURES"
    • ENTRYPOINT_FEATURES_EXPERIMENTAL = $ENTRYPOINT_FEATURES $EXPERIMENTAL_FEATURES
  • CI uses some vars that don't exist in the entrypoint.
  • The RPC server is enabled by default in our production Docker images, even though Zebra doesn't enable it by default.

Close #9210.

Solution

  • Fix the tests for the scanner.
  • Don't run all unit tests twice.
  • Use only one variable to pass Rust features from CI to Dockerfile and entrypoint.
  • Remove the workflow for building Docker images with experimental features.
  • Create a non-privileged system user in the runtime Docker stage and switch to it.
  • Don't use gosu.
  • Remove all packages from the runtime stage.
  • Remove unneeded packages from the deps stage
  • Fix some malfunctioning CI tests.
  • Don't use the EXPOSE instruction in Docker.
  • Bump the Rust version in Dockerfile.
  • Change the location of the entrypoint in Docker images from /etc/zebrad to /usr/local/bin.
  • Refactor the structure of the entrypoint; remove redundant env vars, and add docs.
  • Explicitly specify the location of the conf and cache dirs in Docker according to https://specifications.freedesktop.org/basedir-spec/latest/.
  • Prepare the Dockerfile and entrypoint files for deploying a Testnet mining instance.
  • Rename the ZEBRA_CACHED_STATE_DIR env var to ZEBRA_CACHE_DIR since that dir no longer contains only the state but also the network cache and the cookie file.
  • Don't enable the RPC server by default. This is a significant breaking change for users.
  • Refactor how we configure Zebra inside Docker. This is a breaking change.

Tests

  • Manually test that zebrad runs under the new zebra user:

    Running

    docker build -f docker/Dockerfile --target runtime -t zebra:local .
    docker run -d --rm --name zebra_local zebra:local
    docker exec -it -u root zebra_local bash
    apt-get update && apt-get install -y procps
    ps aux

    displays

    USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    zebra          1 86.3  2.7 6605720 2691512 ?     Ssl  09:49  31:03 zebrad -c /etc/zebrad/zebrad.toml
    root         150  0.0  0.0   4188  3368 pts/0    Ss   10:23   0:00 bash
    root         438  0.0  0.0   8088  4044 pts/0    R+   10:25   0:00 ps aux
    

PR Checklist

  • The PR name is suitable for the change log.
  • The solution is tested.
  • The PR has a priority label.

@upbqdn upbqdn added C-bug Category: This is a bug A-devops Area: Pipelines, CI/CD and Dockerfiles C-trivial Category: A trivial change that is not worth mentioning in the CHANGELOG P-Medium ⚡ labels Oct 10, 2024
@upbqdn upbqdn self-assigned this Oct 10, 2024
@upbqdn upbqdn requested a review from a team as a code owner October 10, 2024 10:44
@upbqdn upbqdn requested review from arya2 and removed request for a team October 10, 2024 10:44
@upbqdn upbqdn marked this pull request as draft October 10, 2024 10:45
@upbqdn upbqdn removed the request for review from arya2 October 10, 2024 10:45
@oxarbitrage oxarbitrage added the do-not-merge Tells Mergify not to merge this PR label Oct 10, 2024
Copy link
Member

@gustavovalverde gustavovalverde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All my comments have been addressed. LGTM!

@mpguerra mpguerra removed the request for review from oxarbitrage February 13, 2025 09:04
mergify bot added a commit that referenced this pull request Feb 13, 2025
@mergify mergify bot merged commit 4132c0e into main Feb 13, 2025
141 of 142 checks passed
@mergify mergify bot deleted the docker-refactor branch February 13, 2025 09:15
@upbqdn upbqdn removed the C-trivial Category: A trivial change that is not worth mentioning in the CHANGELOG label Feb 13, 2025
gustavovalverde pushed a commit that referenced this pull request Feb 13, 2025
* Refactor formatting & docs

* Refactor the `runtime` stage in Dockerfile

* Remove unused code from `entrypoint.sh`

* Simplify `entrypoint.sh` setup

* Revise docs & formatting

* Adjust default values for env vars

* Bump Rust v from 1.79 to 1.81 in Dockerfile

* Refactor `entrypoint.sh`

* Refactor `Dockerfile`

* Add TODOs for monitoring stage to Dockerfile

* Refactor `Dockerfile`

* Add TODOs for monitoring stage to Dockerfile

* Fix a typo

* Allow running `zebrad` in test mode

* Allow custom config for `zebrad` in test mode

* Remove `curl` from the `runtime` Docker image

* Remove redundant echos

* Remove a malfunctioning CD test

The test was using a custom config file set in `test_variables`.
However, the file was not included in the Docker image, and the
entrypoint script created a new, default one under the original file's
path. Zebra then loaded this new file, and the test passed because the
pattern in `grep_patterns` matched Zebra's output containing the
original path, even though the config file was different.

* Remove a redundant CI test

* Remove all packages from the `runtime` stage

* Docs cosmetics

* Clarify docs

* Bump Rust version

* Remove a security note

* Explicitly specify network cache dir

* Explicitly specify cookie dir

* Set UID, GID and home dir for the `zebra` user

* Set a working dir for the `zebra` user

* Don't remove `FEATURES`

* Try re-introducing the `testnet-conf` check

* `ZEBRA_CACHED_STATE_DIR` -> `ZEBRA_CACHE_DIR`

This dir doesn't hold only the state cache anymore, but also the cache
for network peers, and the cookie file.

* Refactor the dir structure

* Check that `ZEBRA_CONF_PATH` exists in the image

* Improve the check for `ZEBRA_CONF_PATH`

* Use different flag in the `ZEBRA_CONF_PATH` check

* Simplify the `ZEBRA_CONF_PATH` check

* Fix spelling

* Comment out the `testnet-conf` CI check

* Add commented out `test-zebra-conf-path` CI check

* Reintroduce `testnet-conf` CI check

* Update the `custom-conf` CI check

* Add `v2.1.0.toml` conf file

* Refine the `v2.1.0.toml` conf file

* Remove `ZEBRA_LISTEN_ADDR` from the entrypoint

* Remove `ZEBRA_CHECKPOINT_SYNC` from the entrypoint

* Stop supporting configuration of the RPC port

* Add default conf file

* Prepare Zebra's config in the entrypoint script

* Remove unneeded packages from the `deps` target

* Docs cosmetics

* Use only `$FEATURES` in entrypoint

* Simplify handling of Rust features

* Add a TODO

* Add CI debug statements

* Don't require test vars in conf test

* Reintroduce `protoc`

* Remove `-e NETWORK`

* Remove `ZEBRA_FORCE_USE_COLOR=1`

* Remove `ZEBRA_CACHE_DIR=/var/cache/zebrad-cache`

* Reintroduce the "custom-conf" test

* Set up test env the same way as prod

* Don't repeatedly check for conf file in entrypoint

* Simplify file ownership in Dockerfile

* Fix checkpoint tests in entrypoint

* Fix Zebra config CI tests

* `LIGHTWALLETD_DATA_DIR` -> `LWD_CACHE_DIR`

* Add config for `LWD_CACHE_DIR` to Dockerfile

* `/var/cache/zebrad-cache` -> `~/.cache/zebra`

* `var/cache/lwd-cache` -> `/home/zebra/.cache/lwd`

* Remove `LOG_COLOR=false` from GCP setup

* Don't specify `LWD_CACHE_DIR` in CI tests

* Don't switch to `zebra` user for tests in Docker

* Join "experimental" and "all" tests in CI

* Remove outdated docs

* Refactor tests with fake activation heights

* Fix tests for scanner
elijahhampton pushed a commit to elijahhampton/zebra that referenced this pull request Feb 25, 2025
* Refactor formatting & docs

* Refactor the `runtime` stage in Dockerfile

* Remove unused code from `entrypoint.sh`

* Simplify `entrypoint.sh` setup

* Revise docs & formatting

* Adjust default values for env vars

* Bump Rust v from 1.79 to 1.81 in Dockerfile

* Refactor `entrypoint.sh`

* Refactor `Dockerfile`

* Add TODOs for monitoring stage to Dockerfile

* Refactor `Dockerfile`

* Add TODOs for monitoring stage to Dockerfile

* Fix a typo

* Allow running `zebrad` in test mode

* Allow custom config for `zebrad` in test mode

* Remove `curl` from the `runtime` Docker image

* Remove redundant echos

* Remove a malfunctioning CD test

The test was using a custom config file set in `test_variables`.
However, the file was not included in the Docker image, and the
entrypoint script created a new, default one under the original file's
path. Zebra then loaded this new file, and the test passed because the
pattern in `grep_patterns` matched Zebra's output containing the
original path, even though the config file was different.

* Remove a redundant CI test

* Remove all packages from the `runtime` stage

* Docs cosmetics

* Clarify docs

* Bump Rust version

* Remove a security note

* Explicitly specify network cache dir

* Explicitly specify cookie dir

* Set UID, GID and home dir for the `zebra` user

* Set a working dir for the `zebra` user

* Don't remove `FEATURES`

* Try re-introducing the `testnet-conf` check

* `ZEBRA_CACHED_STATE_DIR` -> `ZEBRA_CACHE_DIR`

This dir doesn't hold only the state cache anymore, but also the cache
for network peers, and the cookie file.

* Refactor the dir structure

* Check that `ZEBRA_CONF_PATH` exists in the image

* Improve the check for `ZEBRA_CONF_PATH`

* Use different flag in the `ZEBRA_CONF_PATH` check

* Simplify the `ZEBRA_CONF_PATH` check

* Fix spelling

* Comment out the `testnet-conf` CI check

* Add commented out `test-zebra-conf-path` CI check

* Reintroduce `testnet-conf` CI check

* Update the `custom-conf` CI check

* Add `v2.1.0.toml` conf file

* Refine the `v2.1.0.toml` conf file

* Remove `ZEBRA_LISTEN_ADDR` from the entrypoint

* Remove `ZEBRA_CHECKPOINT_SYNC` from the entrypoint

* Stop supporting configuration of the RPC port

* Add default conf file

* Prepare Zebra's config in the entrypoint script

* Remove unneeded packages from the `deps` target

* Docs cosmetics

* Use only `$FEATURES` in entrypoint

* Simplify handling of Rust features

* Add a TODO

* Add CI debug statements

* Don't require test vars in conf test

* Reintroduce `protoc`

* Remove `-e NETWORK`

* Remove `ZEBRA_FORCE_USE_COLOR=1`

* Remove `ZEBRA_CACHE_DIR=/var/cache/zebrad-cache`

* Reintroduce the "custom-conf" test

* Set up test env the same way as prod

* Don't repeatedly check for conf file in entrypoint

* Simplify file ownership in Dockerfile

* Fix checkpoint tests in entrypoint

* Fix Zebra config CI tests

* `LIGHTWALLETD_DATA_DIR` -> `LWD_CACHE_DIR`

* Add config for `LWD_CACHE_DIR` to Dockerfile

* `/var/cache/zebrad-cache` -> `~/.cache/zebra`

* `var/cache/lwd-cache` -> `/home/zebra/.cache/lwd`

* Remove `LOG_COLOR=false` from GCP setup

* Don't specify `LWD_CACHE_DIR` in CI tests

* Don't switch to `zebra` user for tests in Docker

* Join "experimental" and "all" tests in CI

* Remove outdated docs

* Refactor tests with fake activation heights

* Fix tests for scanner
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-devops Area: Pipelines, CI/CD and Dockerfiles C-bug Category: This is a bug P-Medium ⚡
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove CI functionality related to experimental features
5 participants