podman with bind mount leaves cgroup debris and prevents container restart #730
After getting into this situation, I can't start even non-bind mounted containers.
If I don't use bind mounts at all and leave
@mrunalp I think this might be runc not running the oci-systemd-hook poststop hook. @aalba6675 Could you see if the journal reports that oci-systemd-hook ran in poststop?
@rhatdan The poststop hook doesn't seem to be run. This is the journal when the container is stopped. It leaves behind a lot of cgroups mounted on impossibly long paths like
Journal:
Old school (after the container is stopped):
Is there a way to manually clear these cgroups/mounts? Once I get into this state I can't start simple containers (i.e. those without bind mounts).
Can't you just umount them?
Trying:
I get either
or
Another observation: notify_on_release is 0 in the
@mheon - any idea why this would be triggered by a kludgy bind mount ( Non-bind-mount containers are functioning without leaving cgroup debris. Once this situation is triggered (lots of cgroups mounted on
@aalba6675 The mounts situation makes it sound like it could be oci-umount-hook firing, and less like our other CGroup issues (though it's weird you're not seeing CGroups left over even in cases where mounts aren't involved; we should still be leaking one or two). I'm not familiar enough with that hook to know for sure what the cause might be, though.
@mheon you are right, I spoke too soon. The cgroups are only visible in the legacy tools libcgroup-tools.
There is leaking in
In the case of bind mounts, there seems to be a recursive loop, as the directory
The recursive CGroup path seems like a separate bug - our current CGroup issues are mainly a lack of cleanup, whereas this seems to actually be duplicating the Conmon scope repeatedly. Interesting - I'll look at this more tomorrow.
Thanks! Let me summarize a reproducer for Fedora 28:
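The exact reproducer commands from this comment were not preserved in the thread; below is a rough sketch of the scenario being described, where the image name (`mysystemd`), container name, and bind-mount source path are placeholder assumptions rather than the original values:

```bash
# Keep /tmp private so oci-systemd-hook can move its mount
# (see projectatomic/oci-systemd-hook#92).
sudo mount --make-private /tmp

# Start a systemd-based fedora:28 container with a bind mount.
# "mysystemd" and /srv/data are placeholders, not the original names.
sudo podman run -d --name systemd-test -v /srv/data:/data mysystemd /usr/sbin/init

# Cycle the container; after the first stop, leftover cgroup mounts
# pile up under /sys/fs/cgroup/systemd/machine.slice, and a later
# start fails as described in this issue.
sudo podman stop systemd-test
sudo podman start systemd-test
sudo podman stop systemd-test
sudo podman start systemd-test
```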
To provide an update here, I'm working on a more general overhaul of our CGroup handling now. Hopefully, once it's ready, it will address this and our other CGroup issues.
My original thought was that this was the oci-umount hook, given that it seems to only occur with mounts. However, oci-umount does nothing with CGroups, so that doesn't seem likely. oci-systemd-hook does use both mounts and CGroups, so that seems to be a likely candidate.
Yes, this is definitely oci-systemd-hook, but the problem, I believe, is that runc is not firing oci-systemd-hook in poststop.
This is probably a reference to opencontainers/runc#1797.
In some discussion on
We aren't consuming this yet, but these pkg/hooks changes lay the groundwork for future libpod changes to support post-exit hooks [1,2].

[1]: containers#730
[2]: opencontainers/runc#1797

Signed-off-by: W. Trevor King <[email protected]>
Hi, I have an observation not related to the exit/poststop hook: when the container is merely started, there is already a doubled path. We can't blame the runc poststop hook for this, so is it oci-systemd-hook that starts with a doubled path mount? The container is running (it has not tried to exit at all):
When I stop the container, the doubled path remains mounted, but I can now manually unmount both the single and doubled paths from the host.
@aalba6675 Is this after a restart (container has already started and stopped once, then started again)?
@mheon - no - this is a clean start from boot with the
It seems that if I take the trouble to
This allows callers to avoid delegating to OCI runtimes for cases where they feel that the runtime hook handling is unreliable [1].

[1]: #730 (comment)

Signed-off-by: W. Trevor King <[email protected]>
Closes: #855
Approved by: rhatdan
Instead of delegating to the runtime, since some runtimes do not seem to handle these reliably [1].

[1]: containers#730 (comment)

Signed-off-by: W. Trevor King <[email protected]>
Instead of delegating to the runtime, since some runtimes do not seem to handle these reliably [1].

[1]: #730 (comment)

Signed-off-by: W. Trevor King <[email protected]>
Closes: #864
Approved by: rhatdan
@aalba6675 Can you try this with Podman 0.6.2 to see if it's fixed? We execute postrun hooks ourselves now, instead of calling out to
I was able to cause this problem with 0.6.2 but I'm not sure how to reproduce it or unmount the cgroups... |
Verified here. We're further than we were before -
Sorry for being MIA - just reproduced this on
Things are looking better:
@mheon I saw the error message
The failure is due to the doubled child path: @thoraxe - after the pod is stopped, you should be able to manually umount the paths on the host; to clean up, I do the following
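The cleanup commands from this comment were dropped from the thread as captured here; the following is only a sketch of the kind of manual unmounting being described, assuming the leaked mounts show up under machine.slice in the host's mount table:

```bash
# List the leaked cgroup mounts left behind for the conmon scope
# (the machine.slice pattern is an assumption based on this thread).
grep machine.slice /proc/self/mountinfo

# Unmount them, longest paths first so nested (doubled) mounts go before
# their parents; run as root on the host after the pod is stopped.
grep machine.slice /proc/self/mountinfo | awk '{print $5}' | sort -r |
while read -r mnt; do
    umount "$mnt"
done

# Any empty scope directories that remain can then be removed with rmdir.
```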
I spent some more time debugging this yesterday. New conclusions:
Matt, did you attempt to remove the pid failure, and it still failed to find the directory, or was this just on a non-systemd container?
@rhatdan I do believe we had another error, let me see what that was.
@rhatdan It's trying to hit c/storage config when c/storage has already deleted the container.
So in this case everything was cleaned up correctly?
It looks like everything is being cleaned up with no volume mounts present, but I'm not 100% sure, given that
I can confirm that, while the container is exiting instantly if mounts are present, it is leaving mounts lying around in
@mheon I think you are seeing cascading effects of bugs. #893 may also be a duplicate of this bug; if you don't clean up the cgroup mounts manually on the host before your ExecReload, the cgroup will leak and you will see the "explosion".
@aalba6675 I think #893 is probably separate, as the CGroup mounts aren't being created there - On
A workaround is to patch the template to
First off, we should probably not use /tmp at all, except when we are doing this as a non-privileged user. We should be doing this under /run/libpod.
On Fedora 28, /run has shared propagation, so this would at least need a tmpfs at /run/libpod/tmp with private propagation, and then a template like /run/libpod/tmp/ocitmp.XXXXXX.
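A minimal sketch of that idea, assuming a dedicated tmpfs is mounted at /run/libpod/tmp with private propagation (this is an illustration of the suggestion above, not the actual fix that was merged):

```bash
# Give the hook its own scratch tmpfs under /run/libpod and make it
# private so mounts created there do not propagate back to the host.
mkdir -p /run/libpod/tmp
mount -t tmpfs tmpfs /run/libpod/tmp
mount --make-private /run/libpod/tmp

# The hook's temporary-directory template would then be
# /run/libpod/tmp/ocitmp.XXXXXX instead of /tmp/ocitmp.XXXXXX.
```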
How about projectatomic/oci-systemd-hook#98 to fix this?
Could you try it out?
@aalba6675 Could you check out podman-0.7.1, which added podman container cleanup and could fix some of these issues?
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
Description
When podman stops a systemd container with bind mounts, it leaves behind a lot of cgroup debris.
This prevents the container from starting the third time.
Steps to reproduce the issue:
Workaround: currently, to get working bind mounts, I have to run `mount --make-private /tmp`; otherwise oci-systemd-hook cannot move the mount to the overlay. This is on Fedora 28 ("Cannot move mount from /tmp/ocitmp.XXXX to .../merged/run", projectatomic/oci-systemd-hook#92).

Create a systemd-based fedora:28 container
Describe the results you received:
After first start/stop cycle there is cgroup debris:
journal
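A quick way to check for this debris on the host after a start/stop cycle (a sketch; the machine.slice path is inferred from earlier comments in this thread):

```bash
# Show cgroup mounts associated with the container's scope; on a clean
# host there should be no doubled machine.slice paths left behind.
findmnt -t cgroup | grep machine.slice

# Or count them directly in the raw mount table.
grep -c machine.slice /proc/self/mountinfo
```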
Describe the results you expected:
start/stop without any issue
Additional information you deem important (e.g. issue happens only occasionally):
Output of `podman version`:

Output of `podman info`:

```
host:
  MemFree: 17688948736
  MemTotal: 33667493888
  SwapFree: 0
  SwapTotal: 0
  arch: amd64
  cpus: 8
  hostname: podman.localdomain
  kernel: 4.16.5-300.fc28.x86_64
  os: linux
  uptime: 10h 3m 7.91s (Approximately 0.42 days)
insecure registries:
  registries: []
registries:
  registries:
store:
  ContainerStore:
    number: 4
  GraphDriverName: overlay
  GraphOptions:
  GraphRoot: /var/lib/containers/storage
  GraphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
  ImageStore:
    number: 2
  RunRoot: /var/run/containers/storage
```