
container-toolkit does not modify the containerd config correctly when there are multiple instances of the containerd binary #982

Open
tariq1890 opened this issue Mar 11, 2025 · 5 comments


tariq1890 commented Mar 11, 2025

NOTE: This issue is specific to container-toolkit when run as an OCI container via gpu-operator

To enable the NVIDIA-specific container runtime handlers, the toolkit must overlay its config changes on the existing containerd configuration, which means it first has to retrieve that configuration. It currently does so in two steps (sketched below):

i) Run containerd config dump (chroot into the host system)
ii) If i) fails, fall back to retrieving the config TOML from the file specified in CONTAINERD_CONFIG
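
For illustration, a minimal shell sketch of that two-step retrieval (the toolkit implements this in Go; the /host mount point and the error handling shown here are assumptions, not the actual implementation):

# Minimal sketch, assuming the host root filesystem is mounted at /host and
# CONTAINERD_CONFIG names the config file to fall back to.
if config=$(chroot /host containerd config dump 2>/dev/null); then
  echo "$config"                    # step i) succeeded: use the CLI's view of the config
else
  cat "/host${CONTAINERD_CONFIG}"   # step ii): fall back to reading the config TOML from disk
fi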

This algorithm falls short in the scenario of multiple containerd instances running on the same host.

Consider the example of a k0s-based node that runs off the containerd embedded within the k0s system.

The same host also has a vanilla containerd installed, so there are containerd binaries in two locations:

  1. /usr/bin/containerd
  2. /var/lib/k0s/bin/containerd

In this case, we expect the toolkit to modify the config of the k0s-embedded containerd. What actually happens is that step i) of the algorithm runs and executes the containerd binary located in /usr/bin, since that is the binary resolved via the PATH environment variable.

Here, we would have wanted the toolkit to fall back to step ii), which would then retrieve the desired config from the k0s-managed containerd; but the fallback is never triggered because step i) succeeds.
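
To make the failure mode concrete, this is roughly what step i) sees on such a host (a hedged illustration; the /host mount point is an assumption, and the output shown is simply the scenario described above):

chroot /host which containerd
# -> /usr/bin/containerd            (PATH resolution picks the vanilla binary)
chroot /host containerd config dump
# -> succeeds with the vanilla containerd's config, so the CONTAINERD_CONFIG fallback never runs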

Reproduction

To reproduce this issue:

i. Install a vanilla containerd package on the host and ensure that it is running.
ii. Install k0s and set up a k0s cluster.
iii. Install gpu-operator (which includes the toolkit) with the necessary config overrides to point it at k0s:

helm install gpu-operator -n gpu-operator --create-namespace \
  nvidia/gpu-operator $HELM_OPTIONS \
    --version=v24.9.2 \
    --set toolkit.env[0].name=CONTAINERD_CONFIG \
    --set toolkit.env[0].value=/etc/k0s/containerd.d/nvidia.toml \
    --set toolkit.env[1].name=CONTAINERD_SOCKET \
    --set toolkit.env[1].value=/run/k0s/containerd.sock \
    --set toolkit.env[2].name=CONTAINERD_RUNTIME_CLASS \
    --set toolkit.env[2].value=nvidia
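
Once the toolkit container has run, one informal way to observe the problem (not an official diagnostic; the paths are the ones used in this reproduction and elsewhere in this thread) is to compare the drop-in the toolkit wrote with the config the k0s-managed containerd actually uses:

cat /etc/k0s/containerd.d/nvidia.toml                                 # overlay written by the toolkit
/var/lib/k0s/bin/containerd -c /etc/k0s/containerd.toml config dump   # config seen by the k0s containerd

On an affected node the overlay is derived from the vanilla containerd's config dump rather than from the k0s configuration.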

diamonwiggins commented Mar 11, 2025

Closed a duplicate issue I had created in the gpu-operator repo - NVIDIA/gpu-operator#1323. Some additional context and a temporary workaround are below for others running into this.

Additional Context

GPU Operator v24.9.x switched to fetching the container runtime configuration via the containerd CLI, causing failures specifically in Kubernetes distributions like k0s that ship their own statically compiled containerd binaries. Although #777 added a fallback for when the containerd CLI doesn't exist, the problem persists here because there is (to my knowledge) no explicit way to force the toolkit to always use the configuration file instead of the containerd CLI.

References:
  • Original related issue (#1109)
  • PR implementing the fallback to file-based retrieval (#777)

Temporary Workaround

Either:

  • Downgrade GPU Operator to v24.6.2 and override the driver version to 550.127.05, or
  • Use GPU Operator v24.9.2 and downgrade the NVIDIA Container Toolkit to version 1.16.2, e.g.:
helm install --wait --generate-name \
    -n gpu-operator --create-namespace \
    nvidia/gpu-operator \
    --version=v24.9.2 \
    --set toolkit.version=v1.16.2-ubuntu20.04 \
    --set toolkit.env[0].name=CONTAINERD_CONFIG \
    --set toolkit.env[0].value=/etc/k0s/containerd.d/nvidia.toml \
    --set toolkit.env[1].name=CONTAINERD_SOCKET \
    --set toolkit.env[1].value=/run/k0s/containerd.sock \
    --set toolkit.env[2].name=CONTAINERD_RUNTIME_CLASS \
    --set toolkit.env[2].value=nvidia


elezar commented Mar 12, 2025

@diamonwiggins since you have an environment that reproduces this issue, would you be able to verify:

  • that a socket can be passed to the containerd config dump command
  • that containerd config dump returns the "correct" config when that socket is specified


elezar commented Mar 12, 2025

@tariq1890 is the following really the desired functionality:

Here, we would have wanted the toolkit to fall back to step ii), which would then retrieve the desired config from the k0s-managed containerd; but the fallback is never triggered because step i) succeeds.

Would retrieving the config from the k0s containerd binary not also be sufficient? Note that we appear to be generating an NVIDIA-specific config in the containerd.d folder rather than a full config, which indicates that the behaviour may need to be more complex in this case.

@diamonwiggins

since you have an environment that reproduces this issue, would you be able to verify:

  • that a socket can be passed to the containerd config dump command
  • that containerd config dump returns the "correct" config when that socket is specified

@elezar The following two commands produce the same config for me:

containerd --address=/run/k0s/containerd.sock config dump

containerd --address=/run/containerd/containerd.sock config dump

No matter what address I pass in, the grpc address in the dumped config never seems to change. It's always:

[grpc]
  address = "/run/containerd/containerd.sock"
  gid = 0
  max_recv_message_size = 16777216
  max_send_message_size = 16777216
  tcp_address = ""
  tcp_tls_ca = ""
  tcp_tls_cert = ""
  tcp_tls_key = ""
  uid = 0

If I uncomment and modify the grpc address via /etc/containerd/config.toml, then the correct address is shown in the config dump, but it still doesn't look like the "correct" config when /run/k0s/containerd.sock is set.

Running containerd -c /etc/k0s/containerd.toml config dump generates a config that has NVIDIA-specific information, so it feels like the CLI gives precedence to the config file over the selected socket.


elezar commented Mar 17, 2025

Thanks for checking the behaviour @diamonwiggins. This means that we can't "simply" specify the socket when running an arbitrary containerd binary and expect the config to be consistent. One option we would have is to allow the path to the containerd binary to be specified as an argument to the toolkit container instead of looking it up on the PATH. (This is what I was referring to in #982 (comment), but I did not express it clearly.)

Could you confirm that running the config dump command against the k0s containerd binary renders the expected config?
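
For reference, the check being asked for would look something like the following on the host (binary and config paths taken from earlier comments in this thread; a suggested verification, not a documented procedure):

chroot /host /var/lib/k0s/bin/containerd config dump
# or, pointing it explicitly at the k0s config file:
chroot /host /var/lib/k0s/bin/containerd -c /etc/k0s/containerd.toml config dump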
