@ademidoff (Member) commented Oct 16, 2025

This PR collects all contributions to the HA epic.

FB - SUBMODULES-4078

Caution

Please DO NOT merge!


* Initial plan

* PMM-14402 Rename PMM_TEST_HA_* environment variables to PMM_HA_*

Co-authored-by: BupycHuk <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: BupycHuk <[email protected]>
* PMM-13812 Dynamic migrations.

* PMM-13812 Changes.

* PMM-13812 Make changes.

* PMM-13812 Make.

* PMM-13812 Fix gen.

* PMM-13812 Make.

* PMM-13812 Lint.

* PMM-13812 Debug.

* PMM-13812 Debug.

* PMM-13812 Debug.

* PMM-13812 Changes.

* PMM-13812 Changes.

* PMM-13812 Typo in template.

* PMM-13812 Format.

* PMM-13812 Logging.

* PMM-13812 Remove SQL after use.

* PMM-13812 Fix for last migration.

* PMM-13812 Debug.

* PMM-13812 EOF fix.

* PMM-13812 Fix version numbering.

* PMM-13812 Test with static names.

* PMM-13812 Test skip DB engine.

* PMM-13812 Schema migrations engine.

* PMM-13812 Right table path in engine.

* PMM-13812 Debug nginx configuration

* PMM-13812 Generate ssl certs for nginx in build time

* PMM-13812 Debug nginx logs

* PMM-13812 Check supervisord status

* PMM-13812 Remove ENGINE from PG.

* PMM-13812 Force schema_migrations engine for cluster.

* PMM-13812 Apply on correct table.

* PMM-13812 Add debug.

* PMM-13812 Try add ORDER BY for schema_migrations creation.

* PMM-13812 Refactor.

* PMM-13812 Add env variable to decide if cluster or not.

* PMM-13812 Lint/refactor.

* PMM-13812 Allow provide specific cluster name.

* PMM-13812 Add missing "ON CLUSTER" keyword.

* PMM-13812 Handle empty cluster name.

* PMM-13812 Debug.

* PMM-13812 Fix for not provided cluster names.

* Revert "PMM-13812 Debug."

This reverts commit 5b6779d.

* PMM-13812 Typo.

* PMM-13812 Gets error from isCluster.

* PMM-13812 Using always default DB to check if cluster is ready.

* PMM-13812 Fix template to get proper format.

* PMM-13812 Debug - double quotes and more logging.

* PMM-13812 Debug fix.

* PMM-13812 Debug.

* PMM-13812 Debug - not escape part of query in params.

* PMM-13812 Include cluster name in DB creation.

* PMM-13812 TODO flags?

* PMM-13812 Include cluster name in ready check.

* PMM-13812 Unify quotes.

* PMM-13812 Improve doc and limitations.

* PMM-13812 Env var to kingpin.

* PMM-13812 Changed based on Nurlan's draft.

* PMM-13812 Revert Alex debug changes.

* PMM-13812 Another revert.

* PMM-13812 Temp static files to satisfy tests.

* PMM-13812 Changes.

* PMM-13812 Remove static for clickhouse client.

* PMM-13812 Refactor.

* PMM-13812 Lint.

* PMM-13812 Lint.

* PMM-13812 License.

* PMM-13812 Format.

* PMM-13812 Header.

* PMM-13812 License.

* PMM-13812 Lint.

* PMM-13812 Lint.

* PMM-13812 Lint.

* PMM-13812 Formatting.

* PMM-13812 Redact pass in logs.

* Update qan-api2/utils/templatefs/templatefs.go

Co-authored-by: Michael Okoko <[email protected]>

* PMM-13812 Changes requested by Alex.

---------

Co-authored-by: Alex Demidoff <[email protected]>
Co-authored-by: Michael Okoko <[email protected]>
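
Several commits above deal with adding the "ON CLUSTER" keyword, including the cluster name in DB creation, and handling an empty cluster name. A minimal sketch of that idea follows. The function names `onClusterClause` and `createDatabaseSQL`, the backtick quoting, and the example cluster name are assumptions for illustration, not the code in qan-api2.

```go
package main

import (
	"fmt"
	"strings"
)

// onClusterClause returns the " ON CLUSTER ..." suffix for a ClickHouse DDL
// statement, or an empty string when no cluster name is configured
// (single-node setup). The name is assumed to be a trusted configuration
// value, so no escaping is attempted here.
func onClusterClause(cluster string) string {
	cluster = strings.TrimSpace(cluster)
	if cluster == "" {
		return ""
	}
	return fmt.Sprintf(" ON CLUSTER `%s`", cluster)
}

// createDatabaseSQL builds a CREATE DATABASE statement that is only
// cluster-wide when a cluster name was provided.
func createDatabaseSQL(db, cluster string) string {
	return fmt.Sprintf("CREATE DATABASE IF NOT EXISTS `%s`%s", db, onClusterClause(cluster))
}

func main() {
	fmt.Println(createDatabaseSQL("pmm", ""))
	// "pmmclickhouse" is a hypothetical cluster name.
	fmt.Println(createDatabaseSQL("pmm", "pmmclickhouse"))
}
```

Keeping the clause in one helper means every statement (database creation, `schema_migrations`, the ready check) treats an empty cluster name the same way.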

codecov bot commented Oct 16, 2025

Codecov Report

❌ Patch coverage is 0.91743% with 108 lines in your changes missing coverage. Please review.
✅ Project coverage is 44.49%. Comparing base (3d14106) to head (9ceab4f).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| managed/services/ha/highavailability.go | 0.00% | 100 Missing ⚠️ |
| managed/cmd/pmm-managed/main.go | 0.00% | 8 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##               v3    #4677      +/-   ##
==========================================
- Coverage   44.56%   44.49%   -0.08%     
==========================================
  Files         363      363              
  Lines       45706    45785      +79     
==========================================
+ Hits        20369    20371       +2     
- Misses      23677    23755      +78     
+ Partials     1660     1659       -1     
| Flag | Coverage Δ |
|---|---|
| admin | 17.33% <ø> (ø) |
| agent | 53.31% <ø> (ø) |
| managed | 44.41% <0.91%> (-0.12%) ⬇️ |
| vmproxy | 74.13% <ø> (ø) |


* PMM-13812 Dynamic migrations.

* PMM-13812 Changes.

* PMM-13812 Make changes.

* PMM-13812 Make.

* PMM-13812 Fix gen.

* PMM-13812 Make.

* PMM-13812 Lint.

* PMM-13812 Debug.

* PMM-13812 Debug.

* PMM-13812 Debug.

* PMM-13812 Changes.

* PMM-13812 Changes.

* PMM-13812 Typo in template.

* PMM-13812 Format.

* PMM-13812 Logging.

* PMM-13812 Remove SQL after use.

* PMM-13812 Fix for last migration.

* PMM-13812 Debug.

* PMM-13812 EOF fix.

* PMM-13812 Fix version numbering.

* PMM-13812 Test with static names.

* PMM-13812 Test skip DB engine.

* PMM-13812 Schema migrations engine.

* PMM-13812 Right table path in engine.

* PMM-13812 Debug nginx configuration

* PMM-13812 Generate ssl certs for nginx in build time

* PMM-13812 Debug nginx logs

* PMM-13812 Check supervisord status

* PMM-13812 Remove ENGINE from PG.

* PMM-13812 Force schema_migrations engine for cluster.

* PMM-13812 Apply on correct table.

* PMM-13812 Add debug.

* PMM-13812 Try add ORDER BY for schema_migrations creation.

* PMM-13812 Refactor.

* PMM-13812 Add env variable to decide if cluster or not.

* PMM-13812 Lint/refactor.

* PMM-13812 Allow provide specific cluster name.

* PMM-13812 Add missing "ON CLUSTER" keyword.

* PMM-13812 Handle empty cluster name.

* PMM-13812 Debug.

* PMM-13812 Fix for not provided cluster names.

* Revert "PMM-13812 Debug."

This reverts commit 5b6779d.

* PMM-13812 Typo.

* PMM-13812 Gets error from isCluster.

* PMM-13812 Using always default DB to check if cluster is ready.

* PMM-13812 Fix template to get proper format.

* PMM-13812 Debug - double quotes and more logging.

* PMM-13812 Debug fix.

* PMM-13812 Debug.

* PMM-13812 Debug - not escape part of query in params.

* PMM-13812 Include cluster name in DB creation.

* PMM-13812 TODO flags?

* PMM-13812 Include cluster name in ready check.

* PMM-13812 Unify quotes.

* PMM-13812 Improve doc and limitations.

* PMM-13812 Env var to kingpin.

* PMM-13812 Changed based on Nurlan's draft.

* PMM-13812 Revert Alex debug changes.

* PMM-13812 Another revert.

* PMM-13812 Enhance high availability service with persistent Raft storage and improved message handling. Added setup for BoltDB log and stable stores, implemented non-blocking message sending, and adjusted Raft configuration for better stability. Updated logging levels based on environment variable.

* Refactor high availability service to improve Raft handling and logging. Removed unused message channel, updated logging for applied log entries, and clarified snapshot and restore methods for stateless FSM. Adjusted Raft configuration for minimal snapshots and improved error handling in BroadcastMessage.

* Update go.mod and go.sum to include new dependencies for Raft and BoltDB. Added github.com/hashicorp/raft-boltdb/v2 v2.3.1 and github.com/boltdb/bolt v1.3.1, along with indirect dependencies for improved storage and message handling. Updated existing dependencies to their latest versions for better stability and performance.

* Refactor high availability service to simplify Raft cluster bootstrapping logic. Removed redundant bootstrap checks and improved error handling when joining the memberlist cluster.

* PMM-13812 Temp static files to satisfy tests.

* PMM-13812 Changes.

* Update snapshot store logging in high availability service to use logrus writer instead of stderr for improved log management.

* Update managed/services/ha/highavailability.go

Co-authored-by: Copilot <[email protected]>

* PMM-13812 Remove static for clickhouse client.

* Enhance high availability service by introducing new timeout constants for Raft operations, improving error handling in cluster bootstrapping, and updating snapshot methods to ensure proper resource management. This refactor aims to improve stability and clarity in the service's Raft implementation.

* Remove unnecessary resource management code in the Restore method of the high availability service. This change simplifies the method by eliminating the deferred closing of the reader, as the FSM is stateless and does not require state restoration.

* Update setupRaftStorage function to return a FileSnapshotStore instead of a generic SnapshotStore, enhancing the storage setup for Raft in the high availability service.

* PMM-13812 Refactor.

* PMM-13812 Lint.

* PMM-13812 Lint.

* PMM-13812 License.

* PMM-13812 Format.

* PMM-13812 Header.

* PMM-13812 License.

* PMM-13812 Lint.

* PMM-13812 Lint.

* PMM-13812 Lint.

* PMM-13812 Formatting.

* Refactor addMemberlistNodeToRaft method to improve readability by extracting server address formatting into a separate variable. This change enhances code clarity in the high availability service's Raft implementation.

---------

Co-authored-by: Jiří Čtvrtka <[email protected]>
Co-authored-by: Jiří Čtvrtka <[email protected]>
Co-authored-by: Alex Demidoff <[email protected]>
Co-authored-by: Copilot <[email protected]>
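
One of the commits above mentions non-blocking message sending. In Go this is usually done with `select` and a `default` branch. The sketch below shows the general pattern only. `trySend` is a hypothetical helper, and a later commit in this list removes the unused message channel, so this may not match the code that ends up in managed/services/ha/highavailability.go.

```go
package main

import "fmt"

// trySend attempts to deliver msg without blocking the caller. It returns
// false when the channel cannot accept the message right now (buffer full or
// no receiver ready), so the caller can drop or log it instead of stalling.
func trySend(ch chan<- string, msg string) bool {
	select {
	case ch <- msg:
		return true
	default:
		return false
	}
}

func main() {
	ch := make(chan string, 1) // buffer of one message

	fmt.Println(trySend(ch, "first"))  // fits into the buffer
	fmt.Println(trySend(ch, "second")) // buffer is full, send is skipped
}
```

The trade-off is that messages can be dropped under load, which is acceptable only when the sender must never stall and the consumer can tolerate gaps.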
@percona percona deleted a comment from it-percona-cla Oct 16, 2025

CLAassistant commented Oct 23, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
3 out of 4 committers have signed the CLA.

✅ JiriCtvrtka
✅ BupycHuk
✅ ademidoff
❌ Copilot