Skip to content

HIVE-28976 : Enhance Commit message in notification_log to correctly filter events during incremental replication #5841

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

harshal-16
Copy link
Contributor

@harshal-16 harshal-16 commented Jun 3, 2025

Details:

  • Added write ID, database name into commit & abort message
  • Updated filter in CommitTxnHandler to utilise it during replication dump process

Testing:

  • Added test cases
  • Tested on YCloud

What changes were proposed in this pull request?

  • In commit & abort message in notification_log will have additional write IDs and databases details to be used by replication.
  • During Replication dump, if hive.repl.filter.transactions is true then, it will use these details there are some DDLs which needs new write ID but it does not add an entry into txn_write_notification_log table which was only source of truth earlier to figure out whether that txn was read / write related.

Why are the changes needed?

  • In presence of hive.repl.filter.transactions property, replication will fail on the DR side post 11 days if user runs Truncate operation (or any other which implements DDLDescWithWriteId). Reason being it generates alloc_write_id event in notification_log table. In presence of above filter on source side we want to filter out commit_txn which are associated with read operation or it's not related to database in replication.
  • To decide this we rely on another historical table txn_write_notification_log, now, in insert and other DML operatioins it does get an entry. But for truncate / DDLDescWithWriteId it doesn't get any entry into that which leads to skipping the commit operation on the source and on the DR side, we open the implicit txn if alloc_write_id event is present and we wait for commit / abort to come from the source
  • This results into txn being open for a long time and then it eventually times out and adds repl_incompatible flag on the database on the DR side
  • So, now there are 2 options to address this issue. Either add an dummy entry into txn_write_notification_log, which may open up some other issue and we will have to keep on monitoring the code change because if hive adds any new command which implements DDLDescWithWriteId then it can break the replication again in future
  • The 2nd option is to enhance the commit message, which we are storing in the notifiation_log and add some additional context, which replication dump can use to figure it out.

Does this PR introduce any user-facing change?

  • No

How was this patch tested?

  • Added test cases
  • Tested on actual cluster

@harshal-16 harshal-16 marked this pull request as ready for review June 5, 2025 05:54
@harshal-16 harshal-16 changed the title CDPD-83520 : Enhance Commit message in notification_log to correctly filter events during incremental replication HIVE-28976 : Enhance Commit message in notification_log to correctly filter events during incremental replication Jun 5, 2025
@deniskuzZ
Copy link
Member

@harshal-16, please fill out all the details in the PR summary

* Details:
  * Commit txn of alter mv command was getting populated by write event infos.
  * This was causing a repl failure on load.
  * This is now fixed.

* Testing:
  * Added itest.
 to correctly filter events during incremental replication

Details:
  * Added write ID, database name into commit & abort message
  * Updated filter in CommitTxnHandler to utilise it during replication dump process

Testing:
  * Added test cases
  * Tested on live cluster
Copy link

Copy link
Contributor

@pudidic pudidic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Looks good to me.


Assert.assertEquals(999L, commitTxnMsg.getTxnId().longValue());
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you remove the blank lines?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants