
Add option to use CATCHUP_SKIP_KNOWN_RESULTS_FOR_TESTING in PubnetParallelCatchupV2 #227

Merged

Conversation

@ThomasBrady (Contributor) commented Dec 2, 2024

Overview

  • Adds a --catchup-use-known-results-for-testing CLI flag to supercluster, which results in PubnetParallelCatchupV2 running with CATCHUP_SKIP_KNOWN_RESULTS_FOR_TESTING=true in its config
  • Adds a worker.catchup_skip_known_results_for_testing kube option, which is set to true if supercluster was run with --catchup-use-known-results-for-testing
  • The catchup workers' config volume is sourced from the stellar-core-config configMap
  • If worker.use_known_results_config is true, the data of the stellar-core-config configMap includes CATCHUP_SKIP_KNOWN_RESULTS_FOR_TESTING = true
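Concretely, the conditional configMap entry described above might look something like the following Helm-style sketch (the template layout and value paths here are hypothetical, not the mission's actual chart):

```yaml
# values.yaml (hypothetical): default for the new worker option
worker:
  catchup_skip_known_results_for_testing: false
---
# stellar-core-config configMap template (sketch): the experimental
# option is only emitted into stellar-core.cfg when the worker
# option is set to true
apiVersion: v1
kind: ConfigMap
metadata:
  name: stellar-core-config
data:
  stellar-core.cfg: |
    # ... base catchup worker config ...
    {{- if .Values.worker.catchup_skip_known_results_for_testing }}
    CATCHUP_SKIP_KNOWN_RESULTS_FOR_TESTING = true
    {{- end }}
```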

Testing:

  • Currently, the catchup worker invocation is set to --ll=INFO to allow for the log messages which confirm that known results are downloaded and used for catchup
  • Locally I have run PubnetParallelCatchupV2 via helm and confirmed via logs that the use_known_results_config flag results in results being downloaded, using:

    stellar_core_image: docker-registry.services.stellar-ops.com/dev/stellar-core:22.0.1-2178.fb725ab9b.focal-do-not-use-in-prd-buildtests
    use_known_results_config: true

Future

It would be nice to have a dynamic (or templated) generation of config files for this mission. We use a TOML library in the other missions to generate config files, perhaps we can do something similar for PubnetParallelCatchupV2?
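Such templated generation could be sketched as follows (illustrative Python rather than the mission's F#; the function name, option names, and base-config values here are hypothetical, not the actual supercluster API):

```python
def make_worker_cfg(skip_known_results: bool) -> str:
    """Build a minimal stellar-core.cfg (TOML) for a catchup worker.

    `skip_known_results` mirrors the proposed
    worker.catchup_skip_known_results_for_testing option; when set,
    the experimental flag is emitted into the generated config.
    """
    lines = [
        # Base options every worker needs (placeholder values).
        'LOG_FILE_PATH = ""',
        "CATCHUP_COMPLETE = false",
    ]
    if skip_known_results:
        lines.append("CATCHUP_SKIP_KNOWN_RESULTS_FOR_TESTING = true")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_worker_cfg(skip_known_results=True))
```

Generating the whole file from one place like this keeps conditional options together, instead of scattering string appends across the configMap definition.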

@jayz22 (Contributor) left a comment
This approach is okay for now, but I think after the experimental phase is done we should properly move this option to the stellar-core command line.
Conditionally appending options to the configMap can get out of hand quickly and be hard to maintain.
Let's not forget to create a follow-up issue for that.

src/App/Program.fs: two review threads (outdated, resolved)
@ThomasBrady force-pushed the add-catchup-skip-known-results-missionops branch 2 times, most recently from e1eebc6 to 2c8c0bc on December 6, 2024 23:47
@ThomasBrady requested a review from jayz22 on December 6, 2024 23:50
@ThomasBrady (Author) commented

I was able to run with this option enabled in the test-stellar-supercluster pipeline https://buildmeister-v3.stellar-ops.com/job/Core/job/stellar-supercluster-test/81/

@jayz22 commented Dec 9, 2024

> I was able to run with this option enabled in the test-stellar-supercluster pipeline https://buildmeister-v3.stellar-ops.com/job/Core/job/stellar-supercluster-test/81/

Looks good, do you want to switch the default flag to false as @anupsdf suggested, or do that on the Jenkins side?

@ThomasBrady force-pushed the add-catchup-skip-known-results-missionops branch from 2c8c0bc to 78ac6f5 on December 9, 2024 20:15
@ThomasBrady (Author) commented

> I was able to run with this option enabled in the test-stellar-supercluster pipeline https://buildmeister-v3.stellar-ops.com/job/Core/job/stellar-supercluster-test/81/
>
> Looks good, do you want to switch the default flag to false as @anupsdf suggested, or do that on the Jenkins side?

Thanks. I just pushed a change to make this default to true. I also have this PR to update stellar-supercluster.

Do you have permissions to edit the Jenkins schedule to ensure we do a once weekly non-skipping catchup run?

@jayz22 (Contributor) left a comment

> I was able to run with this option enabled in the test-stellar-supercluster pipeline https://buildmeister-v3.stellar-ops.com/job/Core/job/stellar-supercluster-test/81/
>
> Looks good, do you want to switch the default flag to false as @anupsdf suggested, or do that on the Jenkins side?
>
> Thanks. I just pushed a change to make this default to true. I also have this PR to update stellar-supercluster.
>
> Do you have permissions to edit the Jenkins schedule to ensure we do a once weekly non-skipping catchup run?

No, you'd need to set that up. There is also the policy question of when and how we use this mode (which is separate from this PR). I'm not sure we have a time-based CI job currently.

Just want to double check: are we sure about setting the default to true? Or should we instead have this option be opt-in, i.e. configured from outside (which is what @MonsieurNicolas suggested at the beginning)? Since that issue is closed, I suppose all the work mentioned has been completed and it is okay to turn this on by default?

src/App/Program.fs: review thread (outdated, resolved)
@ThomasBrady (Author) commented

> I was able to run with this option enabled in the test-stellar-supercluster pipeline https://buildmeister-v3.stellar-ops.com/job/Core/job/stellar-supercluster-test/81/
>
> Looks good, do you want to switch the default flag to false as @anupsdf suggested, or do that on the Jenkins side?
>
> Thanks. I just pushed a change to make this default to true. I also have this PR to update stellar-supercluster.
> Do you have permissions to edit the Jenkins schedule to ensure we do a once weekly non-skipping catchup run?
>
> No, you'd need to set that up. There is also the policy question of when and how we use this mode (which is separate from this PR). I'm not sure we have a time-based CI job currently.
>
> Just want to double check: are we sure about setting the default to true? Or should we instead have this option be opt-in, i.e. configured from outside (which is what @MonsieurNicolas suggested at the beginning)? Since that issue is closed, I suppose all the work mentioned has been completed and it is okay to turn this on by default?

The linked issue is referencing stellar-core catchup. This mode is opt-in in stellar-core (in fact one can only opt in when running catchup on a test build). As discussed with Anup in standup this morning, we want all supercluster runs to use this mode by default, with a less frequent weekly non-skip ParallelCatchupV2 run. Based on that, I think it makes sense to default this mode to enabled in our supercluster runs.

Commit: "…mplating to conditionally include CATCHUP_SKIP_KNOWN_RESULTS_FOR_TESTING in PubnetParallelCatchupV2 workers stellar-core.cfg when this flag is provided."
@ThomasBrady force-pushed the add-catchup-skip-known-results-missionops branch from 78ac6f5 to 70e4d6f on December 9, 2024 21:57
@jayz22 commented Dec 10, 2024

> The linked issue is referencing stellar-core catchup. This mode is opt-in in stellar-core (in fact one can only opt in when running catchup on a test build). As discussed with Anup in standup this morning, we want all supercluster runs to use this mode by default, with a less frequent weekly non-skip ParallelCatchupV2 run. Based on that, I think it makes sense to default this mode to enabled in our supercluster runs.

Sounds good, thanks

@jayz22 jayz22 merged commit 0bf8516 into stellar:main Dec 10, 2024
1 check passed