running prestart hook 0 caused error #71

Open
pschiffe opened this issue Jul 19, 2017 · 19 comments

Comments

@pschiffe

Hello,

oci-systemd-hook on latest RHEL atomic host doesn't work with runc.

# atomic host status
State: idle
Deployments:
● rhel-atomic-host:rhel-atomic-host/7/x86_64/standard
             Version: 7.3.6 (2017-06-23 16:20:45)
              Commit: e073a47baa605a99632904e4e05692064302afd8769a15290d8ebe8dbfd3c81b
# rpm -q oci-systemd-hook runc
oci-systemd-hook-0.1.7-4.gite533efa.el7.x86_64
runc-1.0.0-6.gite800860.el7.x86_64

This is the error I'm getting:

runc[15209]: container_linux.go:259: starting container process caused "process_linux.go:345: container init caused \"process_linux.go:328: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: \\\"\""

Config.json template is here: https://github.com/pschiffe/gce-system-container/blob/master/image/config.json.template

Can you help? How can I debug this?

@rhatdan
Member

rhatdan commented Jul 19, 2017

It might be a mismatch between the version of runc and the version of oci-systemd-hook? Did those two versions ship together? How did you call out to oci-systemd-hook?

@pschiffe
Author

Those two packages should align correctly as both are from the latest version of rhel atomic host (7.3.6):

# rpm -q oci-systemd-hook runc
oci-systemd-hook-0.1.7-4.gite533efa.el7.x86_64
runc-1.0.0-6.gite800860.el7.x86_64

This is how I'm calling hooks in the config.json.template:

    "hooks": {
        "prestart": [
            {
                "path": "/usr/libexec/oci/hooks.d/oci-systemd-hook"
            },
            {
                "path": "/usr/libexec/oci/hooks.d/oci-register-machine"
            }
        ],
        "poststop": [
            {
                "path": "/usr/libexec/oci/hooks.d/oci-systemd-hook"
            },
            {
                "path": "/usr/libexec/oci/hooks.d/oci-register-machine"
            }
        ]
    },

@rhatdan
Member

rhatdan commented Jul 19, 2017

The hooks expect their first argument to be the stage name (prestart/poststart). Add those arguments and it should work.

Hooks cannot tell which phase they are running in unless you pass it in as argv[1].
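
To illustrate the point about argv[1], here is a minimal Python sketch of how a hook might pick its stage from the command line. It is not the actual oci-systemd-hook code; the function name `stage_from_argv` and the set of accepted stage names are assumptions made for this example.

```python
# Hypothetical helper: decide which stage a hook is running in from its argv.
# The accepted names below are an assumption for illustration.
VALID_STAGES = ("prestart", "poststart", "poststop")


def stage_from_argv(argv):
    """Return the stage named in argv[1], or None if it is missing or unknown."""
    if len(argv) < 2:
        # Only argv[0] (the program name) was passed, so the stage is unknown.
        return None
    if argv[1] not in VALID_STAGES:
        return None
    return argv[1]
```

In a real hook this would be called as `stage_from_argv(sys.argv)`, and a `None` result would be the case where the hook has to guess or fail.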

@pschiffe
Author

Unfortunately, even with the args I see the same error:

runc[16416]: container_linux.go:259: starting container process caused "process_linux.go:345: container init caused \"process_linux.go:328: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: \\\"\""

I've updated the config.json hooks to look like this:

    "hooks": {
        "prestart": [
            {
                "path": "/usr/libexec/oci/hooks.d/oci-systemd-hook",
                "args": [ "prestart" ]
            },
            {
                "path": "/usr/libexec/oci/hooks.d/oci-register-machine",
                "args": [ "prestart" ]
            }
        ],
        "poststop": [
            {
                "path": "/usr/libexec/oci/hooks.d/oci-systemd-hook",
                "args": [ "poststop" ]
            },
            {
                "path": "/usr/libexec/oci/hooks.d/oci-register-machine",
                "args": [ "poststop" ]
            }
        ]
    },

BTW, my previous configuration (without the args) worked fine on rhel atomic host 7.3.2.

@pschiffe
Author

I've tried the same image on centos atomic host continuous:

# atomic host status
State: idle
Deployments:
● centos-atomic-continuous:centos-atomic-host/7/x86_64/devel/continuous
                   Version: 7.2017.490 (2017-07-19 16:05:22)
                    Commit: a948637e77018755831659c791a8cf8595f6d267d4fddba24a6b42f0ec6f1bd7

# rpm -q oci-systemd-hook runc
oci-systemd-hook-0.1.7-4.gite533efa.el7.x86_64
runc-1.0.0-9.git6394544.el7.x86_64

But I see the same error:

runc[12188]: container_linux.go:265: starting container process caused "process_linux.go:339: container init caused \"process_linux.go:322: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: \\\"\""

@rhatdan
Member

rhatdan commented Jul 20, 2017

@mrunalp WDYT?

@pschiffe Are you seeing anything in the journal? Are you seeing the same thing in Fedora?

@pschiffe
Author

Hmm, I see something more in journal:

Jul 20 12:18:03 centosah-1.c.ose-refarch.internal systemd[1]: Starting Linux Guest Environment for Google Compute Engine...
Jul 20 12:18:03 centosah-1.c.ose-refarch.internal systemd[1]: Scope libcontainer-12188-systemd-test-default-dependencies.scope has no PIDs. Refusing.
Jul 20 12:18:03 centosah-1.c.ose-refarch.internal systemd[1]: Scope libcontainer-12188-systemd-test-default-dependencies.scope has no PIDs. Refusing.
Jul 20 12:18:03 centosah-1.c.ose-refarch.internal systemd[1]: Created slice libcontainer_12188_systemd_test_default.slice.
Jul 20 12:18:03 centosah-1.c.ose-refarch.internal systemd[1]: Starting libcontainer_12188_systemd_test_default.slice.
Jul 20 12:18:03 centosah-1.c.ose-refarch.internal systemd[1]: Removed slice libcontainer_12188_systemd_test_default.slice.
Jul 20 12:18:03 centosah-1.c.ose-refarch.internal systemd[1]: Stopping libcontainer_12188_systemd_test_default.slice.
Jul 20 12:18:03 centosah-1.c.ose-refarch.internal systemd[1]: Started libcontainer container gce-agents.
Jul 20 12:18:03 centosah-1.c.ose-refarch.internal systemd[1]: Starting libcontainer container gce-agents.
Jul 20 12:18:03 centosah-1.c.ose-refarch.internal prestart[12201]: systemdhook <error>: root not found in state: Success
Jul 20 12:18:03 centosah-1.c.ose-refarch.internal systemd[1]: Stopped libcontainer container gce-agents.
Jul 20 12:18:03 centosah-1.c.ose-refarch.internal systemd[1]: Stopping libcontainer container gce-agents.
Jul 20 12:18:03 centosah-1.c.ose-refarch.internal poststop[12204]: systemdhook <error>: root not found in state: Success
Jul 20 12:18:03 centosah-1.c.ose-refarch.internal runc[12188]: container_linux.go:265: starting container process caused "process_linux.go:339: container init caused \"process_linux.go:322: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: \\\"\""
Jul 20 12:18:03 centosah-1.c.ose-refarch.internal systemd[1]: gce-agents.service: main process exited, code=exited, status=1/FAILURE
Jul 20 12:18:03 centosah-1.c.ose-refarch.internal systemd[1]: Unit gce-agents.service entered failed state.
Jul 20 12:18:03 centosah-1.c.ose-refarch.internal systemd[1]: gce-agents.service failed.

@giuseppe

@pschiffe you should specify args as:

        "prestart": [
            {
                "path": "/usr/libexec/oci/hooks.d/oci-systemd-hook",
                "args": [ "oci-systemd-hook", "prestart" ]
            },

because the args are passed unchanged to exec. args[0] can be anything, as it is not used.
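
A small Python sketch of why the first element matters. It models, under stated assumptions, how a runtime turns one hook entry into the argv that the hook process sees: "args" is handed to exec unchanged, and when "args" is absent the argv falls back to just the path (this fallback mirrors the documented behaviour of Go's os/exec and is assumed here, not taken from the runc source). The names `hook_argv` and `stage_seen_by_hook` are hypothetical.

```python
def hook_argv(hook):
    """Return the argv a hook process would see for one config.json hook entry.

    Assumption: "args" is passed to exec as-is; if it is missing or empty,
    argv is just [path].
    """
    args = hook.get("args") or []
    return list(args) if args else [hook["path"]]


def stage_seen_by_hook(hook):
    """Return argv[1] as the hook would read it, or None if there is none."""
    argv = hook_argv(hook)
    return argv[1] if len(argv) > 1 else None


path = "/usr/libexec/oci/hooks.d/oci-systemd-hook"
no_args = {"path": path}
stage_only = {"path": path, "args": ["prestart"]}
full = {"path": path, "args": ["oci-systemd-hook", "prestart"]}

for entry in (no_args, stage_only, full):
    print(hook_argv(entry), "->", stage_seen_by_hook(entry))
```

With `"args": [ "prestart" ]` the string lands in argv[0], so the hook sees no argv[1] at all. That is consistent with the journal lines earlier in this thread, where the hook process is logged under the name `prestart[12201]`.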

It would be nice if it still worked without any args, as it did before. I am taking a look right now.

@pschiffe
Author

Didn't help, still the same error:

Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: Starting Linux Guest Environment for Google Compute Engine...
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: Scope libcontainer-12731-systemd-test-default-dependencies.scope has no PIDs. Refusing.
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: Scope libcontainer-12731-systemd-test-default-dependencies.scope has no PIDs. Refusing.
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: Created slice libcontainer_12731_systemd_test_default.slice.
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: Starting libcontainer_12731_systemd_test_default.slice.
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: Removed slice libcontainer_12731_systemd_test_default.slice.
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: Stopping libcontainer_12731_systemd_test_default.slice.
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: Started libcontainer container gce-agents.
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: Starting libcontainer container gce-agents.
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal oci-systemd-hook[12744]: systemdhook <error>: root not found in state: Success
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: Stopped libcontainer container gce-agents.
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: Stopping libcontainer container gce-agents.
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal oci-systemd-hook[12747]: systemdhook <error>: root not found in state: Success
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal runc[12731]: container_linux.go:265: starting container process caused "process_linux.go:339: container init caused \"process_linux.go:322: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: \\\"\""
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: gce-agents.service: main process exited, code=exited, status=1/FAILURE
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: Unit gce-agents.service entered failed state.
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: gce-agents.service failed.
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: gce-agents.service holdoff time over, scheduling restart.
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: start request repeated too quickly for gce-agents.service
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: Failed to start Linux Guest Environment for Google Compute Engine.
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: Unit gce-agents.service entered failed state.
Jul 20 12:50:12 centosah-1.c.ose-refarch.internal systemd[1]: gce-agents.service failed.

    "hooks": {
        "prestart": [
            {
                "path": "/usr/libexec/oci/hooks.d/oci-systemd-hook",
                "args": [ "oci-systemd-hook", "prestart" ]
            },
            {
                "path": "/usr/libexec/oci/hooks.d/oci-register-machine",
                "args": [ "oci-register-machine", "prestart" ]
            }
        ],
        "poststop": [
            {
                "path": "/usr/libexec/oci/hooks.d/oci-systemd-hook",
                "args": [ "oci-systemd-hook", "poststop" ]
            },
            {
                "path": "/usr/libexec/oci/hooks.d/oci-register-machine",
                "args": [ "oci-register-machine", "poststop" ]
            }
        ]
    },

@rhatdan
Member

rhatdan commented Jul 20, 2017

This means that the config file you generated does not indicate where the "root" is. oci-systemd-hook and oci-umount need to go to the root of the container to mount or unmount content. Can you add this to your config?

@rhatdan
Member

rhatdan commented Jul 20, 2017

@giuseppe we cannot rely on whether the pid is 0 or not, since runc specifies three ways to run a hook:
prestart
poststart
poststop
We can't tell the difference between a prestart and a poststart call. Since we are building this feature into cri-o, we don't want to ignore the specification.

@pschiffe
Author

@rhatdan do you have an example how to specify the "root"? I can't find what you mean. I have in the config the following:

    "root": {
        "path": "rootfs",
        "readonly": true
    },

and in the process section, there is "cwd": "/",

@giuseppe

giuseppe commented Jul 20, 2017

We can decide whether to be backward compatible and handle only prestart and poststop when the type is not specified. On the other hand, there are probably not many users out there, so we can just enforce it and not worry about supporting this additional case in the future.

Anyway, the current development version supports the case where the hook type is not specified. The issue reported here is caused by a hook version that is too old. It is fixed upstream with:

commit 69858fa
Author: Daniel J Walsh [email protected]
Date: Fri Jun 23 19:31:05 2017 +0000

Needs this change to work directly with runc and cri-o

The path bundle path passed to a container is called bundle, not bundlepath.
Also root is not in the state file, but can be retrieved from the config.json.

Signed-off-by: Daniel J Walsh <[email protected]>

Also this other commit is required, otherwise oci-systemd-hook will just segfault:

commit 40ab578
Author: Jason Wessel [email protected]
Date: Wed Jul 12 09:21:44 2017 -0700

Allow container definitions where rootfs is not an absolute path

In earlier versions of the runc frame work the rootfs path was passed
as a key with the initial json that was passed on the stdin and it was
automatically computed to be an absolute path.

This translation to an absolute path must be done in the hook based on
the bundlePath.  This allows the config.json to be relocated by the
container hosting system storage without modifying the config.json.

Signed-off-by: Jason Wessel <[email protected]>
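
The two commits above describe taking the root from config.json and making a relative rootfs path absolute using the bundle path. A minimal Python sketch of that idea follows. It is not the hook's actual code; the function name `resolve_rootfs` is hypothetical, and the state key `bundle` and the config key `root.path` are assumed from the commit messages and the config excerpt quoted in this thread.

```python
import json
import os


def resolve_rootfs(state, config=None):
    """Return an absolute rootfs path for the container described by `state`.

    `state` is the parsed state JSON a hook receives on stdin (assumed to
    carry a "bundle" key). `config` is the parsed config.json; if omitted it
    is read from <bundle>/config.json.
    """
    bundle = state["bundle"]
    if config is None:
        with open(os.path.join(bundle, "config.json")) as f:
            config = json.load(f)
    root = config["root"]["path"]
    if not os.path.isabs(root):
        # A relative path such as "rootfs" is interpreted relative to the bundle.
        root = os.path.join(bundle, root)
    return os.path.normpath(root)
```

A hook would typically call this as `resolve_rootfs(json.load(sys.stdin))`. Resolving against the bundle keeps config.json relocatable, which is the motivation given in the second commit message.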

@rhatdan
Member

rhatdan commented Jul 20, 2017

So we need to get this updated in RHEL7 package. @lsm5 Can you create a new oci-systemd-hook package for RHEL?

@fkluknav

Rebased to 1e84754 in the RHEL 7.4 branch.

@pschiffe
Author

Would it be possible to also get it to the rhel atomic host 7.3?

@rhatdan
Member

rhatdan commented Jul 21, 2017

There is only one stream of extras, so if you install oci-umount after it gets shipped, it will be placed into RHEL 7.3.

@wking

wking commented Feb 22, 2018

Some of the discussion here is about how the hook decides which stage it is running in. The most portable approach to that is to use status in stdin's state JSON (cri-o/cri-o#1360). I have an open PR for that against the very-similar oci-umount: containers/oci-umount#35. If/when that lands I'll cherry-pick it over here.
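
As a rough sketch of the status-based approach, the hook can read the state JSON from stdin and map its `status` field to a stage. The mapping below (created, running, stopped) is an assumption based on a reading of the OCI runtime spec lifecycle, not something confirmed in this thread, and runtimes of this era may report different values. The names `STATUS_TO_STAGE` and `stage_from_state` are hypothetical.

```python
import json

# Assumed mapping from the state's "status" value to the hook stage.
STATUS_TO_STAGE = {
    "created": "prestart",
    "running": "poststart",
    "stopped": "poststop",
}


def stage_from_state(state_json):
    """Return the stage implied by the state JSON text, or None if unknown."""
    state = json.loads(state_json)
    return STATUS_TO_STAGE.get(state.get("status"))
```

In a hook this would be called as `stage_from_state(sys.stdin.read())`, which removes the dependence on what the config author put in "args".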

@caoruidong

Something that may be useful: intel/cc-oci-runtime#270

runc is passing root rather than bundlePath to the hooks.
oci-systemd-hook is compounding the problem by requiring the erroneous root value.
