Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop file-based logging appenders on checkpoint and restart them on restore #38548

Open
nobodyiam opened this issue Nov 25, 2023 · 2 comments
Open
Labels
theme: checkpoint-restore Issues related to coordinated restore at checkpoint type: enhancement A general enhancement
Milestone

Comments

@nobodyiam
Copy link

  • spring-boot version: 3.2.0
  • spring version: 6.1.1
  • azul version: zulu17.46.29-ca-crac-jdk17.0.9-linux_x64

I am currently working on integrating CRaC support with Spring Boot 3.2.0, as detailed in this GitHub pull request.
During the checkpoint phase, I encountered an issue. It appears that the application's logging file, /opt/logs/qa-bot.log (which you can find defined here), may not be closing properly.
The application utilizes Logback for logging. Could this be a configuration oversight on my part, or is Logback possibly not yet compatible?

2023-11-25T14:51:58.172+08:00  WARN 2341104 --- [main] onfigReactiveWebServerApplicationContext : Exception encountered during context initialization - cancelling refresh attempt: org.springframework.context.ApplicationContextException: Failed to take CRaC checkpoint on refresh
2023-11-25T14:51:58.272+08:00  INFO 2341104 --- [main] .s.b.a.l.ConditionEvaluationReportLogger :

Error starting ApplicationContext. To display the condition evaluation report re-run your application with 'debug' enabled.
2023-11-25T14:51:58.326+08:00 ERROR 2341104 --- [main] o.s.boot.SpringApplication               : Application run failed

org.springframework.context.ApplicationContextException: Failed to take CRaC checkpoint on refresh
        at org.springframework.context.support.DefaultLifecycleProcessor$CracDelegate.checkpointRestore(DefaultLifecycleProcessor.java:534)
        at org.springframework.context.support.DefaultLifecycleProcessor.onRefresh(DefaultLifecycleProcessor.java:193)
        at org.springframework.context.support.AbstractApplicationContext.finishRefresh(AbstractApplicationContext.java:965)
        at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:619)
        at org.springframework.boot.web.reactive.context.ReactiveWebServerApplicationContext.refresh(ReactiveWebServerApplicationContext.java:66)
        at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:753)
        at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:455)
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:323)
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:1342)
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:1331)
        at com.apolloconfig.apollo.ai.qabot.QABotApplication.main(QABotApplication.java:14)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at org.springframework.boot.loader.launch.Launcher.launch(Launcher.java:91)
        at org.springframework.boot.loader.launch.Launcher.launch(Launcher.java:53)
        at org.springframework.boot.loader.launch.JarLauncher.main(JarLauncher.java:58)
Caused by: org.crac.CheckpointException: null
        at org.crac.Core$Compat.checkpointRestore(Core.java:144)
        at org.crac.Core.checkpointRestore(Core.java:237)
        at org.springframework.context.support.DefaultLifecycleProcessor$CracDelegate.checkpointRestore(DefaultLifecycleProcessor.java:528)
        ... 17 common frames omitted
        Suppressed: jdk.internal.crac.mirror.impl.CheckpointOpenFileException: /opt/logs/qa-bot.log
                at java.base/jdk.internal.crac.JDKFileResource.lambda$beforeCheckpoint$1(JDKFileResource.java:108)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore1(Core.java:169)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:286)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:265)
                at jdk.crac/jdk.crac.Core.checkpointRestore(Core.java:72)
                at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
                at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                at java.base/java.lang.reflect.Method.invoke(Method.java:568)
                at org.crac.Core$Compat.checkpointRestore(Core.java:141)
                ... 19 common frames omitted
        Suppressed: jdk.internal.crac.mirror.impl.CheckpointOpenResourceException: FD fd=10 type=unknown path=anon_inode:[eventpoll]
                at java.base/jdk.internal.crac.mirror.Core.translateJVMExceptions(Core.java:117)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore1(Core.java:188)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:286)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:265)
                at jdk.crac/jdk.crac.Core.checkpointRestore(Core.java:72)
                at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
                at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                at java.base/java.lang.reflect.Method.invoke(Method.java:568)
                at org.crac.Core$Compat.checkpointRestore(Core.java:141)
                ... 19 common frames omitted
        Suppressed: jdk.internal.crac.mirror.impl.CheckpointOpenResourceException: FD fd=11 type=unknown path=anon_inode:[eventfd]
                at java.base/jdk.internal.crac.mirror.Core.translateJVMExceptions(Core.java:117)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore1(Core.java:188)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:286)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:265)
                at jdk.crac/jdk.crac.Core.checkpointRestore(Core.java:72)
                at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
                at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                at java.base/java.lang.reflect.Method.invoke(Method.java:568)
                at org.crac.Core$Compat.checkpointRestore(Core.java:141)
                ... 19 common frames omitted
        Suppressed: jdk.internal.crac.mirror.impl.CheckpointOpenResourceException: FD fd=12 type=unknown path=anon_inode:[timerfd]
                at java.base/jdk.internal.crac.mirror.Core.translateJVMExceptions(Core.java:117)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore1(Core.java:188)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:286)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:265)
                at jdk.crac/jdk.crac.Core.checkpointRestore(Core.java:72)
                at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
                at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                at java.base/java.lang.reflect.Method.invoke(Method.java:568)
                at org.crac.Core$Compat.checkpointRestore(Core.java:141)
                ... 19 common frames omitted
        Suppressed: jdk.internal.crac.mirror.impl.CheckpointOpenSocketException: FD fd=13 type=socket path=socket:[91583428]
                at java.base/jdk.internal.crac.mirror.Core.translateJVMExceptions(Core.java:115)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore1(Core.java:188)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:286)
                at java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:265)
                at jdk.crac/jdk.crac.Core.checkpointRestore(Core.java:72)
                at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
                at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                at java.base/java.lang.reflect.Method.invoke(Method.java:568)
                at org.crac.Core$Compat.checkpointRestore(Core.java:141)
                ... 19 common frames omitted
@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Nov 25, 2023
@snicoll

This comment was marked as outdated.

@snicoll snicoll closed this as not planned Won't fix, can't repro, duplicate, stale Nov 25, 2023
@snicoll snicoll added status: duplicate A duplicate of another issue and removed status: waiting-for-triage An issue we've not yet triaged labels Nov 25, 2023
@sdeleuze
Copy link
Contributor

Per my comment in spring-projects/spring-framework#31680 (comment), I propose to reopen this issue since it looks possible to support that on Boot side as demonstrated by my experimental commit. I would suggest a Spring Lifecycle based implementation, and of course we need to check we can reliably reproduce what I tried in this experiment. Please let me know if you need my contribution.

@wilkinsona wilkinsona reopened this Nov 29, 2023
@wilkinsona wilkinsona changed the title CheckpointOpenFileException occurred with spring boot log files Stop file-based logging appenders on checkpoint and restart them on restore Nov 29, 2023
@wilkinsona wilkinsona added type: enhancement A general enhancement theme: checkpoint-restore Issues related to coordinated restore at checkpoint and removed status: duplicate A duplicate of another issue labels Nov 29, 2023
@wilkinsona wilkinsona added this to the 3.x milestone Nov 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
theme: checkpoint-restore Issues related to coordinated restore at checkpoint type: enhancement A general enhancement
Projects
None yet
Development

No branches or pull requests

5 participants