Add configurable retry mechanism to broadcast interface #901

Merged
hackerwins merged 5 commits into main from broadcast-error-handling on Sep 30, 2024

Conversation

gwbaik9717
Contributor

@gwbaik9717 gwbaik9717 commented Sep 17, 2024

What this PR does / why we need it?

This PR introduces a broadcast interface with configurable retry mechanisms. In the event of a network failure, the broadcast will automatically retry based on the provided settings.

Any background context you want to provide?

Here is the updated broadcast interface. The BroadcastOptions interface includes two optional settings: an error callback (error) and a retry limit (maxRetries). Retry options apply only to network errors; retries will not occur for non-network errors such as broadcasting unserializable payloads.

// document.ts

public broadcast(topic: string, payload: Json, options?: BroadcastOptions) {
  // skipped
}

export interface BroadcastOptions {
  /**
   * `error` is called when an error occurs.
   */
  error?: ErrorFn;

  /**
   * `maxRetries` is the maximum number of retries.
   */
  maxRetries?: number; // Default is 0 meaning it won't retry on error.
}

// usage
doc.broadcast("YOUR_TOPIC", "SERIALIZABLE_PAYLOAD", {
  maxRetries: 10,
  error: (error) => {
    console.log(error);
  },
});
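
The rule described above (retry only on network errors, up to maxRetries) can be sketched roughly as follows. This is a minimal illustration, not the SDK's implementation: `sendWithRetry`, `RetryOptions`, `isNetworkError`, and the `'unavailable'` error code are hypothetical names chosen for the example, and the delay between attempts is omitted to keep it short.

```typescript
// Hypothetical option shape mirroring the two fields described above.
interface RetryOptions {
  maxRetries?: number; // 0 (the default here) means "do not retry"
  error?: (err: unknown) => void;
}

// Assumption: network failures are recognizable by a `code` field.
// The real SDK's error classification is not shown on this page.
function isNetworkError(err: unknown): boolean {
  return (
    typeof err === 'object' &&
    err !== null &&
    (err as { code?: unknown }).code === 'unavailable'
  );
}

// Calls `send` once, then retries up to `maxRetries` times, but only when
// the failure is a network error. Returns the number of attempts made.
async function sendWithRetry(
  send: () => Promise<void>,
  options: RetryOptions = {},
): Promise<number> {
  const maxRetries = options.maxRetries ?? 0;
  let attempts = 0;
  for (;;) {
    attempts++;
    try {
      await send();
      return attempts;
    } catch (err) {
      // Non-network errors, or an exhausted retry budget, are reported once.
      if (!isNetworkError(err) || attempts > maxRetries) {
        options.error?.(err);
        return attempts;
      }
      // Network error with retries remaining: loop and try again.
    }
  }
}
```

With `maxRetries: 2`, a send that keeps failing with a network error is attempted three times in total (one initial try plus two retries) before the error callback fires.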

I'm concerned that the maxRetries option is not as intuitive as the shouldQueueEventIfNotReady option provided by Liveblocks.

We could offer an equivalent option in one of two ways:

  1. The simplest approach would be to set maxRetries to Infinity, though this may lead to performance issues.

  2. Another approach is to implement a loop similar to runSyncLoop for broadcasts, continuously checking if the network is restored. However, adding a separate loop for broadcasting feels redundant at this point. Therefore, I've opted to keep the solution as simple as possible for now.

What are the relevant tickets?

Fixes #891

Checklist

  • Added relevant tests or not required
  • Didn't break anything

Summary by CodeRabbit


  • New Features

    • Introduced DefaultBroadcastOptions for enhanced broadcasting settings, including retry logic.
    • Added flexibility in the broadcasting method to handle options for error handling and retries.
  • Bug Fixes

    • Improved error handling during network failures with refined retry capabilities.
  • Tests

    • Added new test cases to validate broadcasting behavior under network failure scenarios, ensuring correct retry logic.

This commit introduces error handling for network failures by implementing a retry mechanism. The broadcasting operation will now attempt to resend the message up to a defined maximum number of retries with a defined retry interval when a network error occurs, ensuring resilience against temporary connectivity issues.

coderabbitai bot commented Sep 17, 2024

Walkthrough

The changes introduce a new constant, DefaultBroadcastOptions, and modify the broadcast methods in the Client class to accept an optional options parameter. This enhancement allows for configurable retry logic during network failures. Additionally, new test cases are added to validate the broadcasting behavior under various network conditions, ensuring robust error handling and improved reliability.

Changes

| File | Change Summary |
|---|---|
| packages/sdk/src/client/client.ts | Added DefaultBroadcastOptions and modified broadcast method to accept an options parameter. |
| packages/sdk/test/integration/client_test.ts | Updated tests to include Json type and added cases for retry logic during network failures. |

Assessment against linked issues

| Objective | Addressed | Explanation |
|---|---|---|
| Exception handling for network offline state (#891) | | |
| Provide a method to return specific error for broadcast API failures (#891) | | |
| Implement a mechanism to queue broadcast requests and retry (#891) | | Changes do not include a queuing mechanism for requests. |

Possibly related PRs

Poem

In the meadow where bunnies play,
New options for broadcasts come our way.
With retries in tow, we hop with glee,
Handling errors as smooth as can be!
So let's cheer for the code that's bright,
Broadcasting joy, from morning till night! 🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 950b89a and 5608e24.

📒 Files selected for processing (1)
  • packages/sdk/test/integration/client_test.ts (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/sdk/test/integration/client_test.ts

@gwbaik9717 gwbaik9717 requested a review from sejongk September 21, 2024 02:24
@gwbaik9717
Contributor Author

gwbaik9717 commented Sep 24, 2024

@hackerwins

There are a few important topics that need further discussion:

1. Individual Retries vs. Queue-based Retry Mechanism:
Currently, the retry mechanism is implemented in a way where each request manages its own retry logic, with a set interval between retries. This was designed to respect the interval for each request.

However, @sejongk proposed an alternative approach: instead of retrying each request individually, why not queue up failed requests and process them once the network is reconnected? Since these events need to be sent anyway once the network is available, a queue-based retry mechanism could potentially be more efficient. We could also keep the interface as simple as Liveblocks':

{
    shouldQueueEventIfNotReady: bool
}

In this approach, we might need a separate loop, similar to the current runSyncLoop, to monitor the network status and flush the queue as soon as the connection is restored. Here is the pseudocode:

broadcastRetryQueue = []  // Global queue for failed requests
attempts = 0              // Consecutive loop iterations without connectivity

Broadcast(request) {
  response = rpcClient.Broadcast(request)

  // Add to queue if retryable
  if ConnectionError(response) {
    broadcastRetryQueue.append(request)
  }
}

runBroadcastLoop() {
  while true {
    if NetworkAvailable() {
      ProcessQueue()
      attempts = 0
    } else {
      attempts = attempts + 1
    }
    delay = ExponentialBackoff(attempts)
    sleep(delay)
  }
}
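
As a rough TypeScript rendering of the queue idea in the pseudocode above, the sketch below keeps failed requests in FIFO order and flushes them when the caller believes the network is back. All names here (`BroadcastRetryQueue`, `QueuedBroadcast`, the injected `send` and `isConnectionError` functions) are hypothetical and not part of the SDK; the monitoring loop and backoff are left out.

```typescript
// Hypothetical request shape for illustration only.
interface QueuedBroadcast {
  topic: string;
  payload: unknown;
}

class BroadcastRetryQueue {
  private queue: QueuedBroadcast[] = [];
  private send: (req: QueuedBroadcast) => Promise<void>;
  private isConnectionError: (err: unknown) => boolean;

  constructor(
    send: (req: QueuedBroadcast) => Promise<void>,
    isConnectionError: (err: unknown) => boolean,
  ) {
    this.send = send;
    this.isConnectionError = isConnectionError;
  }

  // Number of requests still waiting to be sent.
  get pending(): number {
    return this.queue.length;
  }

  // Try to send now; keep the request only if the failure is retryable.
  async broadcast(req: QueuedBroadcast): Promise<void> {
    try {
      await this.send(req);
    } catch (err) {
      if (!this.isConnectionError(err)) throw err;
      this.queue.push(req);
    }
  }

  // Send queued requests in FIFO order. Stops at the first connection
  // error so the remaining requests stay queued. Returns how many were sent.
  async flush(): Promise<number> {
    let sent = 0;
    for (;;) {
      const req = this.queue.shift();
      if (req === undefined) break;
      try {
        await this.send(req);
        sent++;
      } catch (err) {
        if (this.isConnectionError(err)) {
          this.queue.unshift(req); // still offline: put it back and stop
          break;
        }
        // Non-retryable failure: drop this request and continue.
      }
    }
    return sent;
  }
}
```

A loop like runBroadcastLoop would call `flush()` whenever connectivity is detected. Note that requests sent directly through `broadcast()` while the queue is non-empty can still overtake queued ones, so this sketch alone does not guarantee ordering.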

It might also be a good idea to apply exponential backoff when determining the delay, as network connectivity issues can persist for varying durations. Gradually increasing the retry interval can prevent overwhelming the network with frequent retries and give the system time to recover before the next attempt. Here is the reference from the AWS SDK documentation:

seconds_to_sleep_i = min(b*r^i, MAX_BACKOFF)
In the preceding algorithm, the following values apply:

b = random number within the range of: 0 <= b <= 1

r = 2

MAX_BACKOFF = 20 seconds for most SDKs. See your specific SDK guide or source code for confirmation.
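
The formula quoted above translates almost directly into code. The sketch below is one way to write it; `backoffSeconds` is a hypothetical helper name, and the random source is injectable only so the function can be checked deterministically. Note that `Math.random()` returns values in [0, 1), so b never reaches exactly 1.

```typescript
// Delay in seconds before retry number `i` (0-based), following
// seconds_to_sleep_i = min(b * r^i, MAX_BACKOFF).
function backoffSeconds(
  i: number,
  random: () => number = Math.random,
  r: number = 2,
  maxBackoff: number = 20,
): number {
  const b = random(); // jitter factor
  return Math.min(b * Math.pow(r, i), maxBackoff);
}
```

For example, with b fixed at 1 the third retry (i = 3) waits 8 seconds, and any retry from i = 5 onward is capped at 20 seconds.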

2. Broadcast Dependency Issue:
Another topic for discussion is whether we should guarantee the order of broadcast events. Currently, there are no explicit measures in place to ensure this. If there is any dependency between broadcast events (such as in a chat application), the current structure does not ensure that they will be delivered in the correct order, even with a queue. If maintaining the order of broadcast events is critical, additional steps will be required to address this issue.
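
To make the ordering concern concrete, one possible "additional step" (purely an assumption for illustration, not something this PR implements or proposes) is to tag each broadcast with a per-sender sequence number and have receivers hold back events that arrive early. `SequencedEvent` and `ReorderBuffer` are hypothetical names.

```typescript
// Hypothetical event shape: `seq` is assigned by the sender, starting at 0.
interface SequencedEvent {
  seq: number;
  payload: unknown;
}

class ReorderBuffer {
  private next = 0; // sequence number expected next
  private pending = new Map<number, SequencedEvent>();

  // Accepts an event in any order and returns the events that have
  // become deliverable, in sequence order.
  push(ev: SequencedEvent): SequencedEvent[] {
    if (ev.seq < this.next) return []; // duplicate of an already delivered event
    this.pending.set(ev.seq, ev);
    const ready: SequencedEvent[] = [];
    let e = this.pending.get(this.next);
    while (e !== undefined) {
      ready.push(e);
      this.pending.delete(this.next);
      this.next++;
      e = this.pending.get(this.next);
    }
    return ready;
  }
}
```

If event 1 arrives before event 0, the buffer holds it and releases both in order once event 0 shows up. A real design would also need a policy for events that never arrive.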


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

🧹 Outside diff range and nitpick comments (5)
packages/sdk/src/document/document.ts (3)

87-97: Approved: New BroadcastOptions interface looks good.

The new BroadcastOptions interface aligns well with the PR objectives, providing configurable retry mechanisms for handling network failures. The optional error callback and maxRetries properties offer flexibility for users to customize error handling and retry behavior.

Consider adding a @see tag in the JSDoc comment to link to related interfaces or methods, such as the broadcast method, for improved documentation navigation.


412-412: Approved: LocalBroadcastEvent interface update is consistent.

The addition of the optional options property of type BroadcastOptions to the LocalBroadcastEvent interface is consistent with the changes made to the BroadcastEvent interface. This update allows local broadcast events to include retry options, maintaining consistency across the broadcast-related interfaces.

For improved consistency with the BroadcastEvent interface, consider adding a similar JSDoc comment to explain the purpose of the options property in the LocalBroadcastEvent interface.


2090-2097: Approved: broadcast method update implements retry options.

The broadcast method has been successfully updated to accept an optional options parameter of type BroadcastOptions. This change aligns with the PR objectives to introduce configurable retry mechanisms for handling network failures. The method correctly includes the options in the LocalBroadcastEvent, allowing the retry logic to be applied to broadcast operations.

Consider the following improvements to enhance the method's robustness and usability:

  1. Add input validation for the topic and payload parameters to ensure they meet any required criteria (e.g., non-empty topic, valid JSON payload).
  2. Implement error handling within the method to catch and process any exceptions that might occur during the broadcast operation.
  3. If there are default retry options, consider merging them with the provided options to ensure consistent behavior.

Example implementation:

public broadcast(topic: string, payload: Json, options?: BroadcastOptions) {
  if (!topic || typeof topic !== 'string') {
    throw new Error('Invalid topic: must be a non-empty string');
  }

  // Merge with default options if necessary
  const mergedOptions = { ...DEFAULT_BROADCAST_OPTIONS, ...options };

  try {
    const broadcastEvent: LocalBroadcastEvent = {
      type: DocEventType.LocalBroadcast,
      value: { topic, payload },
      options: mergedOptions,
    };

    this.publish([broadcastEvent]);
  } catch (error) {
    if (mergedOptions.error) {
      mergedOptions.error(error);
    } else {
      // Default error handling
      console.error('Broadcast failed:', error);
    }
  }
}

This implementation adds basic input validation, error handling, and merges the provided options with default options (if any).

packages/sdk/src/client/client.ts (2)

167-171: Specify units in comments for clarity.

To improve code readability, consider specifying the units (e.g., milliseconds) for initialRetryInterval and maxBackoff in the comments.


659-701: Consider implementing a queue-based retry mechanism for broadcasts.

As discussed in the PR comments, an alternative approach is to implement a queue-based retry mechanism. This could improve efficiency by managing retries globally and handling network reconnection events more effectively. It may also simplify the interface and provide better control over the order of broadcast events.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 7e192e0 and 7cd4c86.

📒 Files selected for processing (3)
  • packages/sdk/src/client/client.ts (5 hunks)
  • packages/sdk/src/document/document.ts (6 hunks)
  • packages/sdk/test/integration/client_test.ts (7 hunks)
🔇 Additional comments not posted (10)
packages/sdk/src/document/document.ts (1)

406-406: Approved: BroadcastEvent interface update is consistent.

The addition of the optional options property of type BroadcastOptions to the BroadcastEvent interface is consistent with the PR objectives. This change allows for the inclusion of retry options in broadcast events, enhancing the flexibility of the broadcast functionality.

packages/sdk/src/client/client.ts (3)

46-46: Import of Json and BroadcastOptions is appropriate.

The necessary types are correctly imported from '@yorkie-js-sdk/src/document/document'.


319-322: Event handler correctly handles optional options and error callback.

The use of optional chaining ensures safe access to event.options and its properties without causing errors if they are undefined.
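
As a small illustration of the pattern this comment refers to, the sketch below invokes an error callback only when both `options` and `options.error` are present. The `OnError` and `EventLike` types and the `reportError` helper are hypothetical stand-ins, not the SDK's actual types.

```typescript
type OnError = (error: Error) => void;

// Hypothetical minimal event shape with optional options.
interface EventLike {
  options?: { error?: OnError; maxRetries?: number };
}

// Calls the error callback if one was provided. Returns whether it was called.
function reportError(event: EventLike, err: Error): boolean {
  const handler = event.options?.error; // undefined if options or error is missing
  handler?.(err); // optional call: a no-op when handler is undefined
  return handler !== undefined;
}
```

Neither a missing `options` object nor a missing `error` callback throws; the call is simply skipped.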


622-622: Addition of optional options parameter to broadcast method is appropriate.

This change maintains backward compatibility while allowing users to customize the broadcast behavior with retry options.

packages/sdk/test/integration/client_test.ts (6)

36-36: Importing 'Json' enhances type safety

The addition of import { Json } from '@yorkie-js-sdk/src/document/document'; is appropriate since you're updating EventCollector instances to use Json instead of any, improving type safety.


906-908: Update 'broadcast' method call to use options object is correct

The broadcast method now correctly accepts an options object with an error handler. This change aligns with the updated method signature and enhances readability.


920-920: Updating EventCollector to use 'Json' improves type safety

Changing the EventCollector type parameter from any to Json ensures that the payload conforms to expected JSON structures, enhancing type safety.


951-951: Updating EventCollector to use 'Json' improves type safety

Changing the EventCollector type parameter from any to Json ensures that the payload conforms to expected JSON structures, enhancing type safety.


986-986: Updating EventCollector to use 'Json' improves type safety

Changing the EventCollector type parameter from any to Json ensures that the payload conforms to expected JSON structures, enhancing type safety.


1024-1025: Updating EventCollectors to use 'Json' improves type safety

Changing eventCollector1 and eventCollector2 type parameters from any to Json ensures that the payloads conform to expected JSON structures, enhancing type safety.

@gwbaik9717 gwbaik9717 force-pushed the broadcast-error-handling branch from 7cd4c86 to 67db886 Compare September 25, 2024 23:04

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
packages/sdk/src/client/client.ts (3)

164-171: LGTM! Consider making broadcast options configurable at the client level.

The addition of DefaultBroadcastOptions is a good practice for providing default configuration values. However, to increase flexibility, consider making these options configurable at the client level. This would allow users to set their preferred default values for all broadcast operations when initializing the client.

You could modify the ClientOptions interface to include these broadcast options:

export interface ClientOptions {
  // ... existing options ...
  broadcastOptions?: {
    maxRetries?: number;
    initialRetryInterval?: number;
    maxBackoff?: number;
  };
}

Then, in the Client constructor, you could merge these with the default options:

constructor(rpcAddr: string, opts?: ClientOptions) {
  // ... existing code ...
  this.broadcastOptions = {
    ...DefaultBroadcastOptions,
    ...opts?.broadcastOptions,
  };
  // ... rest of the constructor
}

This approach would provide more flexibility while still maintaining sensible defaults.


Line range hint 619-701: LGTM! Retry mechanism improves robustness, but there's a minor issue.

The addition of a retry mechanism with exponential backoff is a great improvement to the broadcast method. It enhances the robustness of the system by handling transient network issues. The error handling distinguishing between retryable and non-retryable errors is also well implemented.

However, there's a potential off-by-one error in the retry count increment:

The retryCount is incremented after scheduling the retry, which means the same retryCount is used for multiple retries. This affects the backoff timing calculation. To fix this, move the increment before scheduling the retry:

if (retryCount < maxRetries) {
+   retryCount++;
    setTimeout(() => doLoop(), exponentialBackoff(retryCount - 1));
-   retryCount++;
    logger.info(
      `[BC] c:"${this.getKey()}" retry attempt ${retryCount}/${maxRetries}`,
    );

Also, consider optimizing the exponential backoff calculation by memoizing the exponentialBackoff function or pre-calculating the backoff times. This can be especially beneficial if maxRetries is large:

const backoffTimes = Array.from({ length: maxRetries }, (_, i) =>
  Math.min(DefaultBroadcastOptions.initialRetryInterval * 2 ** i, maxBackoff)
);

// Then in the retry logic:
setTimeout(() => doLoop(), backoffTimes[retryCount - 1]);

This optimization reduces repeated calculations of the backoff time.
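
One caveat with a pre-calculated table: the author's later comment in this thread sets the default maxRetries to Infinity, and `Array.from({ length: Infinity })` throws a RangeError. Computing the delay on demand avoids that. The sketch below is hedged accordingly: `backoffDelayMs` is a hypothetical helper, and the 1000 ms and 20000 ms defaults are assumed values, since this page does not show the actual values of initialRetryInterval and maxBackoff.

```typescript
// Delay in milliseconds before the retry with the given 0-based index.
// Doubles each time and saturates at maxBackoffMs.
function backoffDelayMs(
  retryIndex: number,
  initialRetryIntervalMs: number = 1000, // assumed value for illustration
  maxBackoffMs: number = 20000, // assumed value for illustration
): number {
  return Math.min(initialRetryIntervalMs * 2 ** retryIndex, maxBackoffMs);
}
```

Because the result is capped by `Math.min`, very large retry indexes are safe: `2 ** retryIndex` eventually overflows to Infinity and the delay simply stays at the cap.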


46-46: LGTM! Consider using object destructuring for cleaner default value assignment.

The addition of the import statement for Json and BroadcastOptions types is necessary and correct. The use of options with fallback to default values is also a good practice.

To make the code slightly more concise and easier to read, consider using object destructuring with default values:

const { maxRetries = DefaultBroadcastOptions.maxRetries, maxBackoff = DefaultBroadcastOptions.maxBackoff } = options ?? {};

This approach combines the retrieval of values from options and the assignment of default values in a single, clear statement.

Also applies to: 645-647

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 7cd4c86 and 67db886.

📒 Files selected for processing (3)
  • packages/sdk/src/client/client.ts (5 hunks)
  • packages/sdk/src/document/document.ts (6 hunks)
  • packages/sdk/test/integration/client_test.ts (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/sdk/src/document/document.ts
  • packages/sdk/test/integration/client_test.ts
🔇 Additional comments not posted (1)
packages/sdk/src/client/client.ts (1)

Line range hint 1-1180: Overall assessment: Significant improvements to broadcast functionality with minor optimization opportunities.

The changes in this file greatly enhance the robustness of the broadcast functionality by introducing a retry mechanism with exponential backoff and improved error handling. The addition of DefaultBroadcastOptions and the option to customize these in the broadcast method provide good flexibility.

Key improvements:

  1. Retry mechanism for handling transient network issues.
  2. Exponential backoff strategy to avoid overwhelming the server.
  3. Distinction between retryable and non-retryable errors.

Suggested optimizations:

  1. Fix the potential off-by-one error in retry count increment.
  2. Consider pre-calculating backoff times for performance.
  3. Make broadcast options configurable at the client level for greater flexibility.
  4. Use object destructuring for cleaner default value assignment.

These changes significantly improve the system's resilience to network issues while maintaining a clean and flexible API. The suggested optimizations, if implemented, would further enhance the code's efficiency and maintainability.

@gwbaik9717
Contributor Author

gwbaik9717 commented Sep 25, 2024

@sejongk @hackerwins

We've decided to proceed with the current implementation (Individual Retries) for now. We can test it out and make improvements as we go.

Additionally, I’ve removed the retryInterval option due to performance concerns. Instead, I’ve implemented an exponential backoff mechanism for retries. This simplifies the interface and improves efficiency by dynamically adjusting the retry timing based on the current attempt.

Lastly, I've set the default maxRetries to Infinity. Unless the user sets it to 0 or less, the retry mechanism keeps retrying the operation until the specified number of retries is reached.

@gwbaik9717 gwbaik9717 requested a review from sejongk September 28, 2024 07:42
Contributor

@sejongk sejongk left a comment

LGTM 👍

@hackerwins hackerwins changed the title Exception Handling for Network Offline State in Broadcast API Add configurable retry mechanism to broadcast interface Sep 30, 2024
@hackerwins hackerwins merged commit 9bf42e4 into main Sep 30, 2024
2 checks passed
@hackerwins hackerwins deleted the broadcast-error-handling branch September 30, 2024 11:03
@krapie
Member

krapie commented Sep 30, 2024

I hope to see many use cases of broadcasting of Yorkie! 😄

JOOHOJANG pushed a commit that referenced this pull request Oct 22, 2024
Implement a new BroadcastOptions interface to allow for automatic retries
on network failures during broadcasts. This enhancement improves resilience
against temporary network issues, ensuring more reliable message delivery.
The maxRetries option allows users to control retry behavior, with a default
of 0 (no retries). Only network errors trigger retries; other errors, such
as unserializable payloads, will not initiate retry attempts.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Exception Handling for Network Offline State in Broadcast API
4 participants