[Merged by Bors] - activation: panic if post service exits unexpectedly #5300

Closed
fasmat wants to merge 8 commits into develop from 5267-panic-on-post-service-exit


@fasmat (Member) commented Nov 23, 2023

Motivation

Closes #5267. Merge after spacemeshos/post#249.

Changes

  • The PoST supervisor now passes --max-retries as a command-line parameter to the post service.
  • The number of retries can be configured via PostSupervisorConfig (it is not exposed via the node config or command-line parameters) and defaults to 10. This gives the node about 50 seconds after starting the post service to become ready for the service to connect.

Test Plan

  • Updated existing tests to match the new behavior
  • Added a test that checks that an incorrect setup, which leaves the post service unable to connect to the node, causes zap.Fatal to be called

TODO

  • [x] Explain motivation or link existing issue(s)
  • [x] Test changes and document test plan
  • [x] Update documentation as needed
  • [x] Update changelog as needed

@fasmat self-assigned this Nov 23, 2023
Review thread on activation/post_supervisor.go (outdated, resolved)
@fasmat force-pushed the 5267-panic-on-post-service-exit branch from 4b21309 to 882bf8a on November 23, 2023 19:08
@fasmat (Member, Author) commented Nov 23, 2023

@poszu the new version of post-rs does not seem to compile on Windows and shows unusual errors on macOS (it also complains that it was built for a newer macOS version than the one it runs on).

@fasmat changed the title from "Panic if post service exits unexpectedly" to "activation: panic if post service exits unexpectedly" Nov 23, 2023
@fasmat (Member, Author) commented Nov 23, 2023

The new version of post-rs also seems to require NumUnits to be at least 4, which broke a few tests: https://github.com/spacemeshos/go-spacemesh/actions/runs/6973779216/job/18978370093?pr=5300

@poszu (Contributor) commented Nov 24, 2023

> The new version of post-rs also seems to validate NumUnits to be at least 4 which broke a few tests: https://github.com/spacemeshos/go-spacemesh/actions/runs/6973779216/job/18978370093?pr=5300

This is the mainnet default. You can override it with --min-num-units.
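For test setups that initialise fewer than 4 units, the node side would need to pass a lower minimum to post-rs. A minimal Go sketch, assuming the flag takes its value as the next argument and that post-rs rejects any NumUnits below the minimum; `checkNumUnits` only mirrors the behaviour described in this thread, and both it and `postArgs` are hypothetical helpers, not post-rs or go-spacemesh code.

```go
package main

import (
	"fmt"
	"strconv"
)

// mainnetMinNumUnits is the default minimum post-rs enforces, per this thread.
const mainnetMinNumUnits = 4

// checkNumUnits mirrors the validation described above: NumUnits must be at
// least the configured minimum.
func checkNumUnits(numUnits, minNumUnits uint32) error {
	if numUnits < minNumUnits {
		return fmt.Errorf("num units %d below minimum %d", numUnits, minNumUnits)
	}
	return nil
}

// postArgs builds the hypothetical override a test setup would append when
// launching the post service with a small NumUnits.
func postArgs(minNumUnits uint32) []string {
	return []string{"--min-num-units", strconv.FormatUint(uint64(minNumUnits), 10)}
}

func main() {
	fmt.Println(checkNumUnits(2, mainnetMinNumUnits)) // num units 2 below minimum 4
	fmt.Println(checkNumUnits(2, 1))                  // <nil>
	fmt.Println(postArgs(1))                          // [--min-num-units 1]
}
```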

@poszu (Contributor) commented Nov 24, 2023

> @poszu the new version of post-rs seems to not compile on windows and shows unusual errors on macOS (it also complains that it was built for a newer macOS than it is executed on).

Weird, it builds fine on Windows in the post-rs CI.

Regarding Mac: it was previously built with macOS SDK 12.3 but is now built with the default SDK (no override). Building with an older SDK was a workaround for a bug in the RandomX library (causing crashes on Mac) that is now fixed.

codecov bot commented Nov 24, 2023

Codecov Report

Attention: 11 lines in your changes are missing coverage. Please review.

Comparison is base (0c3457a) 77.5% compared to head (e95ce21) 77.4%.
Report is 2 commits behind head on develop.

| File | Patch % | Lines |
|---|---|---|
| activation/post_supervisor.go | 84.7% | 8 missing and 3 partials ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff            @@
##           develop   #5300     +/-   ##
=========================================
- Coverage     77.5%   77.4%   -0.1%     
=========================================
  Files          253     253             
  Lines        29632   29659     +27     
=========================================
+ Hits         22966   22979     +13     
- Misses        5204    5215     +11     
- Partials      1462    1465      +3     
```
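The percentages in the diff can be reproduced from the line counts as hits / (hits + misses + partials). A minimal Go sketch, assuming the report truncates to one decimal place rather than rounding to nearest: the head ratio is about 77.477%, so it only reads 77.4% (and the delta only reads -0.1%) under truncation. Integer arithmetic in tenths of a percent is used to avoid floating-point edge cases; the helper names are hypothetical.

```go
package main

import "fmt"

// coverageTenths returns hits / (hits+misses+partials) in tenths of a
// percent. Integer division truncates, modelling a round-down policy.
func coverageTenths(hits, misses, partials int) int {
	lines := hits + misses + partials
	return hits * 1000 / lines
}

// formatTenths renders a non-negative value in tenths of a percent as "NN.N%".
func formatTenths(t int) string {
	return fmt.Sprintf("%d.%d%%", t/10, t%10)
}

func main() {
	base := coverageTenths(22966, 5204, 1462) // develop: 29632 lines
	head := coverageTenths(22979, 5215, 1465) // this PR: 29659 lines
	fmt.Println("develop:", formatTenths(base), "head:", formatTenths(head))
	// develop: 77.5% head: 77.4%
}
```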


@fasmat (Member, Author) commented Nov 24, 2023

bors merge

@spacemesh-bors commented:

👎 Rejected by too few approved reviews

```go
}
eg.Wait()
ps.logger.Fatal("post service exited", zap.Error(err))
```
A contributor commented:

I think it makes more sense to raise Fatal in the node once the error bubbles up. There are several examples of this, for example the peersync process.

@fasmat (Member, Author) replied Nov 24, 2023

You mean just returning an error here and handling it with log.Fatal in node.go?

The problem is that the error here happens asynchronously and doesn't bubble up at the moment. Returning it here would do nothing (besides preventing the internal errgroup from accepting new goroutines).

@fasmat (Member, Author) commented Nov 24, 2023

bors merge

spacemesh-bors bot pushed a commit that referenced this pull request Nov 24, 2023
## Motivation
Closes #5267. Merge after spacemeshos/post#249.

## Changes
- The PoST supervisor now passes `--max-retries` as a command-line parameter to the post service.
- The number of retries can be configured via `PostSupervisorConfig` (it is not exposed via the node config or command-line parameters) and defaults to 10. This gives the node about 50 seconds after starting the post service to become ready for the service to connect.

## Test Plan
- Updated existing tests to match the new behavior
- Added a test that checks that an incorrect setup, which leaves the post service unable to connect to the node, causes `zap.Fatal` to be called

## TODO
<!-- This section should be removed when all items are complete -->
- [x] Explain motivation or link existing issue(s)
- [x] Test changes and document test plan
- [x] Update documentation as needed
- [x] Update [changelog](../CHANGELOG.md) as needed
@spacemesh-bors commented:

Pull request successfully merged into develop.

Build succeeded.

spacemesh-bors bot changed the title from "activation: panic if post service exits unexpectedly" to "[Merged by Bors] - activation: panic if post service exits unexpectedly" Nov 24, 2023
spacemesh-bors bot closed this Nov 24, 2023
spacemesh-bors bot deleted the 5267-panic-on-post-service-exit branch November 24, 2023 17:54
Development

Successfully merging this pull request may close these issues.

Node should panic in PoST supervised mode if service doesn't connect
3 participants