CRaC restore fails with ClassNotFoundException on Jar path #33226

Closed
shmyer opened this issue Jul 17, 2024 · 6 comments
Labels
for: external-project Needs a fix in external project status: invalid An issue that we don't feel is valid

Comments

shmyer commented Jul 17, 2024

Affects: 6.1.10
JDK: zulu21.34.19-ca-crac-jdk21.0.3-linux_x64
Running on a Linux VM


I am trying to use CRaC with a Spring Boot app. I have come across many issues so far, including logback appenders causing jdk.crac.impl.CheckpointOpenFileException upon checkpoint creation (spring-projects/spring-boot#38548) and the Eureka Discovery Client leaving a connection open because it fetches the registry before the checkpoint. I have been able to work around those issues and get checkpointing to work.

Now I am stuck on the restore. As you can see in the attached log, the restore code is trying to load my Spring Boot Jar as a class and of course it can't find that. I don't quite understand why it does that.

I've also attached the CRIU dump and restore logs below. They seem fine to me, but I might be wrong.


Spring Boot Log:

24476: Error (criu/tty.c:843): tty: Can't set tty params on 0x26, trying to skip...: Inappropriate ioctl for device
2024-07-17T12:18:51.537Z  INFO 24476 --- [app] [           main] o.s.c.support.DefaultLifecycleProcessor  : Restarting Spring-managed lifecycle beans after JVM restore
2024-07-17T12:18:51.694Z  INFO 24476 --- [app] [           main] o.s.c.support.DefaultLifecycleProcessor  : Spring-managed lifecycle restart completed (restored JVM running for 693 ms)
2024-07-17T12:18:51.762Z  WARN 24476 --- [app] [           main] ConfigServletWebServerApplicationContext : Exception encountered during context initialization - cancelling refresh attempt: org.springframework.context.ApplicationContextException: Failed to restore CRaC checkpoint on refresh
2024-07-17T12:18:51.855Z  INFO 24476 --- [app] [           main] com.netflix.discovery.DiscoveryClient    : Shutting down DiscoveryClient ...
2024-07-17T12:18:54.864Z  INFO 24476 --- [app] [           main] com.netflix.discovery.DiscoveryClient    : Unregistering ...
2024-07-17T12:18:55.297Z  INFO 24476 --- [app] [           main] com.netflix.discovery.DiscoveryClient    : DiscoveryClient_app/<eureka-host>:app:8702 - deregister  status: 404
2024-07-17T12:18:55.301Z  INFO 24476 --- [app] [           main] com.netflix.discovery.DiscoveryClient    : Completed shut down of DiscoveryClient
2024-07-17T12:18:55.322Z  INFO 24476 --- [app] [           main] o.apache.catalina.core.StandardService   : Stopping service [Tomcat]
2024-07-17T12:18:55.373Z  INFO 24476 --- [app] [           main] .s.b.a.l.ConditionEvaluationReportLogger :

Error starting ApplicationContext. To display the condition evaluation report re-run your application with 'debug' enabled.
2024-07-17T12:18:55.441Z ERROR 24476 --- [app] [           main] o.s.boot.SpringApplication               : Application run failed

org.springframework.context.ApplicationContextException: Failed to restore CRaC checkpoint on refresh
        at org.springframework.context.support.DefaultLifecycleProcessor$CracDelegate.checkpointRestore(DefaultLifecycleProcessor.java:539) ~[spring-context-6.1.10.jar!/:6.1.10]
        at org.springframework.context.support.DefaultLifecycleProcessor.onRefresh(DefaultLifecycleProcessor.java:194) ~[spring-context-6.1.10.jar!/:6.1.10]
        at org.springframework.context.support.AbstractApplicationContext.finishRefresh(AbstractApplicationContext.java:981) ~[spring-context-6.1.10.jar!/:6.1.10]
        at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:627) ~[spring-context-6.1.10.jar!/:6.1.10]
        at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.refresh(ServletWebServerApplicationContext.java:146) ~[spring-boot-3.3.1.jar!/:3.3.1]
        at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:754) ~[spring-boot-3.3.1.jar!/:3.3.1]
        at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:456) ~[spring-boot-3.3.1.jar!/:3.3.1]
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:335) ~[spring-boot-3.3.1.jar!/:3.3.1]
        at org.springframework.boot.builder.SpringApplicationBuilder.run(SpringApplicationBuilder.java:149) ~[spring-boot-3.3.1.jar!/:3.3.1]
        at de.app.platform.aggregate.CustomApplicationBuilder.run(CustomApplicationBuilder.java:36) ~[app-platform-aggregate-18.0.0-b002eaed.jar!/:18.0.0-b002eaed]
        at de.app.MyApplication.main(MyApplication.java:10) ~[!/:18.0.0-b002eaed]
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) ~[na:na]
        at java.base/java.lang.reflect.Method.invoke(Method.java:580) ~[na:na]
        at org.springframework.boot.loader.launch.Launcher.launch(Launcher.java:91) ~[app-18.0.0-b002eaed.jar:18.0.0-b002eaed]
        at org.springframework.boot.loader.launch.Launcher.launch(Launcher.java:53) ~[app-18.0.0-b002eaed.jar:18.0.0-b002eaed]
        at org.springframework.boot.loader.launch.JarLauncher.main(JarLauncher.java:58) ~[app-18.0.0-b002eaed.jar:18.0.0-b002eaed]
Caused by: org.crac.RestoreException: null
        at org.crac.Core$Compat.checkpointRestore(Core.java:150) ~[crac-1.4.0.jar!/:na]
        at org.crac.Core.checkpointRestore(Core.java:237) ~[crac-1.4.0.jar!/:na]
        at org.springframework.context.support.DefaultLifecycleProcessor$CracDelegate.checkpointRestore(DefaultLifecycleProcessor.java:530) ~[spring-context-6.1.10.jar!/:6.1.10]
        ... 15 common frames omitted
        Suppressed: java.security.PrivilegedActionException: null
                at java.base/java.security.AccessController.doPrivileged(AccessController.java:575) ~[na:na]
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore1(Core.java:230) ~[na:na]
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:294) ~[na:na]
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:273) ~[na:na]
                at jdk.crac/jdk.crac.Core.checkpointRestore(Core.java:72) ~[jdk.crac:na]
                at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) ~[na:na]
                at java.base/java.lang.reflect.Method.invoke(Method.java:580) ~[na:na]
                at org.crac.Core$Compat.checkpointRestore(Core.java:141) ~[crac-1.4.0.jar!/:na]
                ... 17 common frames omitted
        Caused by: java.lang.ClassNotFoundException: /<path-to-jar>/app-18.0.0-b002eaed.jar
                at java.base/java.lang.Class.forName0(Native Method)
                at java.base/java.lang.Class.forName(Class.java:534)
                at java.base/java.lang.Class.forName(Class.java:513)
                at java.base/jdk.internal.crac.mirror.Core$2.run(Core.java:233)
                at java.base/jdk.internal.crac.mirror.Core$2.run(Core.java:230)
                at java.base/java.security.AccessController.doPrivileged(AccessController.java:571)
                ... 24 common frames omitted

CRIU Logs:
dump4.log
restore.log

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged or decided on label Jul 17, 2024
@sdeleuze sdeleuze self-assigned this Jul 17, 2024
@sdeleuze
Contributor

If you are using containers, be aware that configuring capabilities may be required; see https://github.com/sdeleuze/spring-boot-crac-demo/blob/main/restore.sh for an example. Also, you may want to ensure that the path to app-18.0.0-b002eaed.jar does not change between checkpoint and restore (which could be the case with volumes, etc.).

Is app-18.0.0-b002eaed.jar the executable JAR of your Spring Boot app?

@sdeleuze sdeleuze added the status: waiting-for-feedback We need additional information before we can continue label Jul 17, 2024
Author

shmyer commented Jul 17, 2024

I am not in a container environment. I am on a Linux VM on a VMware host. Could capabilities still be an issue here? I am currently on a 4.12 Linux kernel, which does not yet have the CHECKPOINT_RESTORE capability. It seems that on older Linux kernels, SYS_ADMIN is the capability required for checkpoint/restore. I am using a non-root user.

However, as far as I understand, CRIU is running as root anyway, since I had to ask our sysadmins to do the following:
https://docs.azul.com/core/crac/crac-debugging#failures-in-native-checkpoint-or-restore

sudo chown root:root /path/to/criu
sudo chmod u+s /path/to/criu

Without that it didn't get past the CRIU part of the restore. But according to my restore.log the CRIU part of the restore seems to be working now.

Yes, the file's location is the same during the creation of the checkpoint and during the restore.
Yes, this Jar file is the executable JAR of my Spring Boot app.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Jul 17, 2024
@sdeleuze
Contributor

This looks more like a JDK/CRaC-level issue, so I'm not sure what we can do about it on the Framework side. Do you agree?

@sdeleuze sdeleuze added status: waiting-for-feedback We need additional information before we can continue and removed status: feedback-provided Feedback has been provided labels Jul 18, 2024
@spring-projects-issues
Collaborator

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

@spring-projects-issues spring-projects-issues added the status: feedback-reminder We've sent a reminder that we need additional information before we can continue label Jul 25, 2024
Author

shmyer commented Jul 30, 2024

I guess you're right. In the end I've decided to abandon my plans to use CRaC. It doesn't seem mature enough to me.

@shmyer shmyer closed this as completed Jul 30, 2024
@bclozel bclozel closed this as not planned Won't fix, can't repro, duplicate, stale Jul 30, 2024
@bclozel bclozel added status: invalid An issue that we don't feel is valid for: external-project Needs a fix in external project and removed status: waiting-for-feedback We need additional information before we can continue status: waiting-for-triage An issue we've not yet triaged or decided on status: feedback-reminder We've sent a reminder that we need additional information before we can continue labels Jul 30, 2024

rvansa commented Feb 25, 2025

Hi @shmyer, CRaC support in frameworks could really deserve some extra work. The capabilities issue is irrelevant here; it seems that you are restoring with java -XX:CRaCRestoreFrom=... /<path-to-jar>/app-18.0.0-b002eaed.jar ... on the command line.
Similar to java -cp ... org.acme.HelloWorld foo bar, java -XX:CRaCRestoreFrom=... org.acme.HelloWorld foo bar will try to load the HelloWorld class and invoke the main() method with those args (instead of returning from Core.checkpointRestore()); the error message could be improved to make this more obvious.
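To make the mechanism above concrete, here is a minimal sketch, assuming (as the `Class.forName` frames under `jdk.internal.crac.mirror.Core` in the stack trace suggest) that the first non-option argument on restore is treated as a class name. The class `RestoreArgDemo` and the JAR path are hypothetical and only illustrate why a file path passed in that position ends in a ClassNotFoundException.

```java
// Hypothetical demo class; not part of CRaC or Spring.
public class RestoreArgDemo {

    // Try to resolve the given string as a class name, the way a launcher
    // would resolve a main-class argument, and report the outcome.
    static String tryLoad(String name) {
        try {
            Class.forName(name);
            return "loaded";
        } catch (ClassNotFoundException e) {
            return "ClassNotFoundException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A JAR path is not a class name, so the lookup fails.
        System.out.println(tryLoad("/path/to/app.jar"));
        // A real fully qualified class name resolves normally.
        System.out.println(tryLoad("java.lang.String"));
    }
}
```

If that reading is right, the restore command would carry only the -XX:CRaCRestoreFrom=... option, with no JAR path after it, so that execution continues by returning from Core.checkpointRestore().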
