
PHOENIX-7567 Replication Log Writer (Synchronous mode) #2144

Draft: wants to merge 10 commits into base: PHOENIX-7562-feature

apurtell (Contributor) commented May 9, 2025

No description provided.

apurtell requested review from kadirozde and Copilot May 9, 2025 17:49

Copilot AI left a comment

Pull Request Overview

This PR implements a synchronous replication log writer using a Disruptor-based mechanism for efficient and reliable log appends and syncs. Key changes include:

  • Adding the Disruptor dependency and version tag in the Maven POM files.
  • Implementing the new ReplicationLogWriter with asynchronous append and sync logic plus retry/rotation on failures.
  • Introducing comprehensive test cases for replication log writer and log manager behavior.
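For readers unfamiliar with the Disruptor pattern, the append/sync flow summarized above can be approximated with the JDK alone. This is a minimal, hypothetical sketch and not the code in this PR: an ArrayBlockingQueue stands in for the Disruptor ring buffer, a single consumer thread stands in for the event handler, and an in-memory list stands in for the LogFileWriter. QueuedLogWriter and its methods are invented names for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: a bounded queue stands in for the Disruptor ring buffer and a
// single consumer thread stands in for the event handler. Not Phoenix code.
class QueuedLogWriter {
    // An event is either an append (payload set) or a sync marker (syncFuture set).
    private static final class Event {
        final String payload;
        final CompletableFuture<Void> syncFuture;

        Event(String payload, CompletableFuture<Void> syncFuture) {
            this.payload = payload;
            this.syncFuture = syncFuture;
        }
    }

    private final BlockingQueue<Event> ring;
    private final List<String> durable = new ArrayList<>(); // stands in for LogFileWriter output

    QueuedLogWriter(int capacity) {
        ring = new ArrayBlockingQueue<>(capacity);
        Thread consumer = new Thread(this::consume, "log-consumer");
        consumer.setDaemon(true);
        consumer.start();
    }

    // Blocks while the queue is full: this is the backpressure on producers.
    void append(String record) throws InterruptedException {
        ring.put(new Event(record, null));
    }

    // Publishes a sync marker behind the pending appends and waits until the consumer
    // has written everything queued before it.
    void sync() throws Exception {
        CompletableFuture<Void> done = new CompletableFuture<>();
        ring.put(new Event(null, done));
        done.get();
    }

    synchronized List<String> durableRecords() {
        return new ArrayList<>(durable);
    }

    private void consume() {
        List<String> batch = new ArrayList<>();
        try {
            while (true) {
                Event e = ring.take();
                if (e.syncFuture == null) {
                    batch.add(e.payload); // accumulate appends into the current batch
                } else {
                    synchronized (this) {
                        durable.addAll(batch); // write the whole batch, then "sync"
                    }
                    batch.clear();
                    e.syncFuture.complete(null); // release the waiting sync() caller
                }
            }
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A bounded queue gives the same block-when-full backpressure as a full ring buffer, though the Disruptor itself preallocates its ring entries and is designed to avoid the lock contention of a queue like this one.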

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

  • pom.xml: Added the disruptor version property and dependency for the core module.
  • phoenix-core/src/test/java/org/apache/phoenix/replication/ReplicationLogWriterTest.java: New tests covering synchronous sync, retry behavior, and blocking semantics.
  • phoenix-core/src/test/java/org/apache/phoenix/replication/ReplicationLogManagerTest.java: Tests validating log rotation based on time, size, and stale writer exceptions.
  • phoenix-core-server/src/main/java/org/apache/phoenix/replication/ReplicationLogWriter.java: Implementation of the Disruptor-based replication log writer with sync and retry logic.
  • phoenix-core-server/pom.xml: Added disruptor dependency for the server module.

* happen if the log file writer is unable to make progress, due to a HDFS level disruption.
* Should we enter that condition this method will block until the append can be inserted.
* @param tableName The name of the HBase table the mutation applies to.
* @param commitId The commit identifier (e.g., SCN) associated with the mutation.
Contributor

Is this basically the timestamp the coproc assigns to the mutation?

apurtell (Contributor Author), May 12, 2025

No. The mutation and the cells in the mutation carry that timestamp. The "commit identifier" is a Phoenix-level transaction ID, and it can be whatever Phoenix needs it to be. If we don't need this concept at this layer, it can be removed. It is defined in the design document and is part of the replication log spec (it can also be removed from those places).

apurtell (Contributor Author) commented May 14, 2025

I had in mind two classes with a separation of concerns, *Manager (lifecycle) and *Writer (Disruptor), but my thinking has evolved since I started hacking on this. It has been drifting for a couple of days, but it didn't come together until today. There will be a single class, ReplicationLog (at least initially). This will make more sense:

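ReplicationLog log = ReplicationLog.get();
try {
    // Appends are batched; they are written out to the log file on sync().
    log.append(...);
    log.append(...);
    log.append(...);
    // Waits for the preceding appends to be synced; throws IOException on failure or timeout.
    log.sync();
    return;
} catch (IOException e) {
    // Decide what to do about the exception. Abort, possibly.
}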

I figure it will be easier to refactor this as we add more features, like store-and-forward, than to start with a half-baked idea.

apurtell (Contributor Author)

d56a9ae unifies the code into ReplicationLog and the new ReplicationLogTest contains the combined unit tests.

apurtell requested a review from Copilot May 14, 2025 19:52

Copilot AI left a comment

Pull Request Overview

This PR adds functionality for synchronous replication in the log writer by introducing a new generation field, and it integrates the Disruptor library as a dependency.

  • Added disruptor dependency in both root and module pom.xml files
  • Introduced a generation field with corresponding getter and setter in LogFileWriter.java, and updated the toString method to include this field

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

  • pom.xml: Added a new property and dependency for the Disruptor library
  • phoenix-core-server/src/main/java/org/apache/phoenix/replication/log/LogFileWriter.java: Introduced a generation field, including its getter, setter, and inclusion in the toString method
  • phoenix-core-server/pom.xml: Added the Disruptor dependency without an explicit version, likely relying on dependency management
Comments suppressed due to low confidence (1)

phoenix-core-server/pom.xml:182

  • The disruptor dependency is added without an explicit version; ensure that version management is handled via a parent POM or dependency management section to avoid unexpected upgrades.
        <dependency>

apurtell (Contributor Author)

I think the basic test coverage is pretty good now, and I will probably take this out of draft status today. Let me think about whether a couple more test cases would make sense.

Here is the test coverage to date:

Basic Append and Sync:

testAppendAndSync(): Verifies that multiple appends are correctly batched and written to the underlying LogFileWriter upon sync(), and that the order of operations (appends then sync) is maintained.

Failure Handling and Retry (Append/Sync):

testAppendFailureAndRetry(): Simulates an append failure on the LogFileWriter. It checks that the ReplicationLog correctly rotates to a new writer and retries the append (and subsequent sync) on the new writer.

testSyncFailureAndRetry(): Simulates a sync failure on the LogFileWriter. It verifies that after an append, a failed sync leads to a writer rotation, and the original batch (including the append whose sync failed) is replayed and synced on the new writer.
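The rotate-and-replay behavior these two tests exercise can be illustrated with a small synchronous sketch. It is hypothetical and not the PR's implementation: SimpleWriter stands in for LogFileWriter, a Supplier plays the role of creating the rotated writer, the unsynced batch is kept in memory so it can be replayed onto the new writer, and the bounded attempt count is an assumption of the sketch.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical minimal writer interface standing in for LogFileWriter.
interface SimpleWriter {
    void append(String record) throws IOException;
    void sync() throws IOException;
}

// Sketch: keep the current batch until a sync succeeds, so that on failure we can
// rotate to a fresh writer and replay the whole batch onto it.
class RetryingLog {
    private final Supplier<SimpleWriter> writerFactory; // hypothetical rotation hook
    private final int maxAttempts;
    private final List<String> unsynced = new ArrayList<>();
    private SimpleWriter current;
    private int rotations = 0;

    RetryingLog(Supplier<SimpleWriter> writerFactory, int maxAttempts) {
        this.writerFactory = writerFactory;
        this.maxAttempts = maxAttempts;
        this.current = writerFactory.get();
    }

    void append(String record) throws IOException {
        unsynced.add(record);
        try {
            current.append(record);
        } catch (IOException e) {
            rotateAndReplay(); // the replay includes the record that just failed
        }
    }

    void sync() throws IOException {
        for (int attempt = 1; ; attempt++) {
            try {
                current.sync();
                unsynced.clear(); // the batch is now durable
                return;
            } catch (IOException e) {
                if (attempt >= maxAttempts) {
                    throw new IOException("sync failed after " + attempt + " attempts", e);
                }
                rotateAndReplay();
            }
        }
    }

    int rotationCount() {
        return rotations;
    }

    private void rotateAndReplay() throws IOException {
        current = writerFactory.get();
        rotations++;
        for (String r : unsynced) {
            current.append(r);
        }
    }
}
```

Keeping the batch until sync succeeds is what makes replay possible; the cost is that memory use grows with the amount of unsynced data.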

Backpressure (Ring Buffer Full):

testBlockingWhenRingFull(): Tests the backpressure mechanism. It simulates a slow consumer (inner LogFileWriter) to fill the Disruptor ring buffer and confirms that subsequent append() calls block until space is available, then eventually succeed.

Sync Timeout:

testSyncTimeout(): Simulates a LogFileWriter.sync() operation that takes longer than the configured timeout. It verifies that the ReplicationLog.sync() call throws an IOException caused by a TimeoutException.
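A bounded wait of the kind testSyncTimeout() checks for can be expressed with CompletableFuture.get(timeout, unit). In this sketch, SyncWaiter and awaitSync are hypothetical names; a TimeoutException is wrapped in an IOException so callers see one exception type for every sync failure while the timeout remains visible as the cause.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: wait for a pending sync for at most timeoutMs and report every
// failure mode as an IOException, keeping a timeout visible as the cause.
final class SyncWaiter {
    private SyncWaiter() {}

    static void awaitSync(CompletableFuture<Void> pendingSync, long timeoutMs) throws IOException {
        try {
            pendingSync.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            throw new IOException("sync did not complete within " + timeoutMs + " ms", e);
        } catch (ExecutionException e) {
            throw new IOException("sync failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            throw new IOException("interrupted while waiting for sync", e);
        }
    }
}
```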

Concurrent Producers:

testConcurrentProducers(): Tests concurrent appends from multiple threads, ensuring all records are eventually processed and written by the LogFileWriter in the correct order (verified by commitId).

Time-Based Rotation:

testTimeBasedRotation(): Verifies that the log rotates automatically after the configured time interval, even if no new appends trigger a size check.

testRotationTask(): Specifically tests the LogRotationTask runnable, ensuring it rotates the log when the time threshold is met.
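A rotation runnable of the shape testRotationTask() exercises can be sketched with an injectable clock so it runs without real waiting. TimeRotationTask, its constructor parameters, and the rotate hook are hypothetical names, not those used in the PR. In practice such a task would be scheduled periodically, for example with ScheduledExecutorService.scheduleAtFixedRate, which is what lets an idle log rotate without any append triggering a check.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical time-based rotation task. Intended to be run periodically; each run
// rotates only if the current writer has been open for at least maxAgeMs.
class TimeRotationTask implements Runnable {
    private final LongSupplier clockMs; // injectable clock so the task can be tested directly
    private final long maxAgeMs;
    private final Runnable rotate;      // stands in for the log's rotate-writer hook
    private final AtomicLong openedAtMs;

    TimeRotationTask(LongSupplier clockMs, long maxAgeMs, Runnable rotate) {
        this.clockMs = clockMs;
        this.maxAgeMs = maxAgeMs;
        this.rotate = rotate;
        this.openedAtMs = new AtomicLong(clockMs.getAsLong());
    }

    @Override
    public void run() {
        long now = clockMs.getAsLong();
        if (now - openedAtMs.get() >= maxAgeMs) {
            rotate.run();
            openedAtMs.set(now); // the new writer's age starts now
        }
    }
}
```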

Size-Based Rotation:

testSizeBasedRotation(): Checks that the log rotates when the current writer's file size exceeds the configured size threshold.
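Size-based rotation amounts to tracking how many bytes the current file holds and opening a new file once the threshold is exceeded. This sketch is hypothetical: "files" are in-memory lists and SizeRotatingLog is an invented name, whereas the test description refers to the current writer's actual file size.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of size-based rotation over in-memory "files".
class SizeRotatingLog {
    private final long maxBytes;
    private final List<List<byte[]>> files = new ArrayList<>();
    private long currentBytes = 0;

    SizeRotatingLog(long maxBytes) {
        this.maxBytes = maxBytes;
        files.add(new ArrayList<>()); // the initial "file"
    }

    void append(byte[] record) {
        files.get(files.size() - 1).add(record);
        currentBytes += record.length;
        // Check after the write, so a single oversized record still lands in one file.
        if (currentBytes > maxBytes) {
            files.add(new ArrayList<>()); // rotate: start a new, empty file
            currentBytes = 0;
        }
    }

    int fileCount() {
        return files.size();
    }
}
```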

Close Behavior:

testClose(): Ensures that close() flushes pending appends, closes the underlying writer, and prevents further operations. Also checks for idempotency of close().
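The close() contract described here (flush pending appends, close the underlying writer once, reject later operations) fits in a few lines. ClosableLog is a hypothetical, synchronous stand-in: its pending appends are a plain list rather than queued events, and the "underlying writer" is just a counter.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an idempotent close(): flush once, close once, then fail fast.
class ClosableLog {
    private final List<String> pending = new ArrayList<>();
    private final List<String> flushed = new ArrayList<>(); // stands in for the underlying file
    private boolean closed = false;
    private int writerCloseCount = 0; // how many times the underlying writer was closed

    synchronized void append(String record) throws IOException {
        if (closed) {
            throw new IOException("log is closed");
        }
        pending.add(record);
    }

    synchronized void close() {
        if (closed) {
            return; // second and later calls are no-ops
        }
        closed = true;
        flushed.addAll(pending); // flush appends that were still buffered
        pending.clear();
        writerCloseCount++;      // stands in for closing the underlying writer
    }

    synchronized List<String> flushedRecords() {
        return new ArrayList<>(flushed);
    }

    synchronized int getWriterCloseCount() {
        return writerCloseCount;
    }
}
```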

Error Handling during Rotation/Initialization:

testFailedRotation(): Simulates a failure during the creation of a new writer during rotation. It verifies that the ReplicationLog continues to use the old writer for subsequent appends and syncs, rather than failing completely.

testEventProcessingException(): Simulates a RuntimeException during the LogEventHandler's processing of an append event. It verifies that the LogExceptionHandler catches this, closes the ReplicationLog (and its writer), and subsequent operations fail.

testSyncFailureAllRetriesExhausted(): Simulates persistent sync failures across multiple retries (and rotations). It verifies that the sync operation eventually fails with a timeout after exhausting retries.

testFsInitFailure(): Simulates a failure during FileSystem.mkdirs() when initializing the ReplicationLog. Verifies that init() throws an IOException.

Rotation During Batch:

testRotationDuringBatch(): Tests the scenario where a rotation occurs while a batch of appends is in-flight (i.e., published to the ring buffer but not yet processed by the event handler). It verifies that the in-flight batch is correctly replayed to the new writer after rotation.
