[Refactor] Improve Retryer specs #86

Open · wants to merge 1 commit into base: master
lib/temporal/client/retryer.rb (12 additions, 14 deletions)
@@ -9,19 +9,17 @@ module Retryer
# List pulled from RpcRetryOptions in the Java SDK
# https://github.com/temporalio/sdk-java/blob/ad8831d4a4d9d257baf3482ab49f1aa681895c0e/temporal-serviceclient/src/main/java/io/temporal/serviceclient/RpcRetryOptions.java#L32
# No amount of retrying will help in these cases.
-def self.do_not_retry_errors
-  [
-    GRPC::AlreadyExists,
-    GRPC::Cancelled,
-    GRPC::FailedPrecondition,
-    GRPC::InvalidArgument,
-    # If the activity has timed out, the server will return this and will never accept a retry
-    GRPC::NotFound,
-    GRPC::PermissionDenied,
-    GRPC::Unauthenticated,
-    GRPC::Unimplemented,
-  ]
-end
+DO_NOT_RETRY_ERRORS = [
+  GRPC::AlreadyExists,
+  GRPC::Cancelled,
+  GRPC::FailedPrecondition,
+  GRPC::InvalidArgument,
+  # If the activity has timed out, the server will return this and will never accept a retry
+  GRPC::NotFound,
+  GRPC::PermissionDenied,
+  GRPC::Unauthenticated,
+  GRPC::Unimplemented,
+].freeze

A Contributor commented on the new constant: 👍

# Used for backoff retries in certain cases when calling temporal server.
# on_retry - a proc that's executed each time you need to retry
@@ -33,7 +31,7 @@ def self.with_retries(times: DEFAULT_RETRIES, on_retry: nil, &block)
  loop do
    begin
      return yield
-   rescue *do_not_retry_errors
+   rescue *DO_NOT_RETRY_ERRORS
      raise
    rescue => e
      raise e if retry_i >= times
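For reference, this is roughly how the helper being refactored is called. A minimal usage sketch only, assuming the module is reachable as Temporal::Client::Retryer (inferred from the file path above) and using some_grpc_call as a stand-in for whichever RPC is being guarded; the default retry count and backoff behaviour are unchanged by this PR:

require 'temporal/client/retryer'

attempts = 0

result = Temporal::Client::Retryer.with_retries(on_retry: proc { attempts += 1 }) do
  # Errors listed in DO_NOT_RETRY_ERRORS are re-raised immediately and on_retry
  # is never invoked; any other error is retried with backoff, up to the default
  # number of attempts, calling on_retry each time a retry is needed.
  some_grpc_call
end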
@@ -78,37 +78,30 @@
expect(on_retry_calls).to equal(retries)
end

-{
-  GRPC::AlreadyExists => false,
-  GRPC::Cancelled => false,
-  GRPC::FailedPrecondition => false,
-  GRPC::InvalidArgument => false,
-  GRPC::NotFound => false,
-  GRPC::PermissionDenied => false,
-  GRPC::Unauthenticated => false,
-  GRPC::Unimplemented => false,
-  StandardError => true,
-  GRPC::Unknown => true,
-  GRPC::Unavailable => true
-}.each do |error_class, expect_retry|
-  it "#{expect_retry ? 'does' : 'does not'} retry #{error_class}" do
-    on_retry_calls = 0
-    retried = false
-    on_retry = Proc.new {
-      on_retry_calls += 1
-    }
-
-    begin
-      described_class.with_retries(on_retry: on_retry) do
-        if !retried
-          retried = true
-          raise error_class.new("nope")
-        end
-      end
-    rescue => e
-      expect(e.class).to eq(error_class)
-    ensure
-      expect(on_retry_calls).to equal(expect_retry ? 1 : 0)
-    end
-  end
-end
+described_class::DO_NOT_RETRY_ERRORS.each do |error_class|

drewhoskins-stripe (Contributor): I'd advise against this change because it assumes described_class::DO_NOT_RETRY_ERRORS is specified correctly and digs into implementation details. I'd rather the test be "adversarial" and assert what correct behavior should be.
Also, AFAICT, it no longer tests the cases where we do want to retry. It seems wrong not to test the other side.

antstorm (Contributor Author): @drewhoskins-stripe do you have a failure mode in mind? Would it be someone adding an error to the list that should be retryable instead? There isn't a great way to test against that unless you test all possible error classes, which is not a real solution.
The retry case is covered by the spec above this one, but I see now that it's a bit different; I'll add the missing bit.

antstorm (Contributor Author): Actually, looking at this again: the retry case is identical to the spec above because of the if !retried escape hatch. When a retry was expected, the error was never raised outside of the .with_retries block, so it was not picked up by the rescue and not checked. This is, however, covered by a few specs above, e.g. "backs off and stops retrying eventually".

drewhoskins-stripe (Contributor): These are not the same test. In one case, you retry a few times. In this one, you don't retry at all.

drewhoskins-stripe (Contributor), Jul 1, 2021:
> do you have a failure mode in mind?
Yep, it's a bit subtle but worth discussing.
Tests are the documentation of what the code is expected to do; they represent the "contract." If a test of the public-facing API changes, a code reviewer should scrutinize it more.
Suppose the retryer is used in several locations, and an author removes one of these errors because, for the place where they use the retryer, we should retry. A test should fail to alert the author (and the code reviewer) that a key contract is being changed, so they can go look at where the retryer is used. They can then have the conversation of whether the retryer should take a dynamic list of errors that's passed in.

antstorm (Contributor Author), Jul 3, 2021:
> These are not the same test. In one case, you retry a few times. In this one, you don't retry at all.
I don't get it. In the current version you would retry once and then throw, which is no different from, say, the "can succeed after retries" test case, or "executes the on_retry callback" if the intention is to count callback calls. What exact use case do you think is not covered after the proposed change?
I agree that having a "contract" in place is beneficial, but in practice what would happen is that whoever removes an error from the list will also update the spec with an equal change. There's another side to it, though: someone might add an error to the list and not update the spec. In that case the spec will not fail and the contract gets broken. That can probably be solved by a specific test that checks which exact errors are listed as non-retriable.

context "when #{error_class} is raised" do
let(:error) { error_class.new('nope') }
let(:on_retry) { Proc.new {} }

it 'does not retry' do
calls = 0

expect do
described_class.with_retries do
calls += 1
raise error
end
end
rescue => e
expect(e.class).to eq(error_class)
ensure
expect(on_retry_calls).to equal(expect_retry ? 1 : 0)
end.to raise_error(error)

expect(calls).to eq(1)
end

it 'does not call on_retry' do
allow(on_retry).to receive(:call).and_call_original

described_class.with_retries(on_retry: on_retry) { raise error } rescue nil

expect(on_retry).not_to have_received(:call)
end
end
end
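One way to address the concern raised in the thread about losing coverage for the retryable side would be a companion spec that enumerates errors which should still be retried. This is only a sketch: it assumes StandardError, GRPC::Unknown and GRPC::Unavailable remain retryable (as in the spec this PR removes) and that the surrounding spec file already stubs or tolerates the backoff sleep between attempts:

[StandardError, GRPC::Unknown, GRPC::Unavailable].each do |error_class|
  context "when #{error_class} is raised" do
    it 'retries and can succeed on a later attempt' do
      calls = 0

      result = described_class.with_retries do
        calls += 1
        # Fail only on the first attempt so the retry path is exercised
        raise error_class.new('nope') if calls == 1

        'done'
      end

      expect(result).to eq('done')
      expect(calls).to eq(2)
    end
  end
end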
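The specific test antstorm suggests at the end of the thread could pin exactly which errors are treated as non-retryable, so that adding or removing an entry forces a deliberate spec change. A sketch only, with the expected list copied from the constant introduced in this PR:

it 'lists exactly the documented non-retryable gRPC errors' do
  # Hard-code the expected classes here rather than reading the constant,
  # so any change to the list fails this spec and has to be reviewed.
  expect(described_class::DO_NOT_RETRY_ERRORS).to contain_exactly(
    GRPC::AlreadyExists,
    GRPC::Cancelled,
    GRPC::FailedPrecondition,
    GRPC::InvalidArgument,
    GRPC::NotFound,
    GRPC::PermissionDenied,
    GRPC::Unauthenticated,
    GRPC::Unimplemented
  )
end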