Skip to content

Conversation

teohhanhui
Copy link

@teohhanhui teohhanhui commented Jun 4, 2025

Motivation

See #300

Solution

Change Timeout middleware to return HTTP 503 Service Unavailable status code, as this is the most appropriate HTTP status code to the extent of its knowledge.

Returning HTTP 408 Request Timeout status code wrongly indicates a client error.

status code

Returning HTTP `408 Request Timeout` status code wrongly indicates a
client error.
@seanmonstar
Copy link
Collaborator

While in general this does seem like a better error code, it's not always true. For instance, if the inner service times out because it's currently waiting on req.body.collect(), then that correctly would be a 408.

Looking at the linked issue, we could probably solve this by adding some configuration to the service/layer, so you can customize the response code returned.

@teohhanhui
Copy link
Author

teohhanhui commented Jun 4, 2025

For instance, if the inner service times out because it's currently waiting on req.body.collect()

Doesn't that tie into TimeoutBody?

EDIT:

Uhh... no.

The timeout is enforced between consecutive Frame polls, and it resets after each poll. The total time to produce a Body could exceed the timeout duration without timing out, as long as no single interval between polls exceeds the timeout.


adding some configuration to the service/layer

We'd have to take in a Future, right?

@jplatte
Copy link
Member

jplatte commented Jun 4, 2025

How about deprecating fn new in favor of a new fn with_status_code? Then we can publish the change as non-breaking patch release and people will be encouraged to make a choice about the status code. I think I'd want the status code to be entirely free form, with docs suggesting to use 503 or 408.

@teohhanhui
Copy link
Author

teohhanhui commented Jun 4, 2025

But the problem is, if we just provide a configuration with a fixed status code, that still does not solve the problem of "which case is it actually"? Or in other words, there's no way for the caller to do the right thing as well.

@teohhanhui
Copy link
Author

Actually I believe we should challenge the entire assumption made here:

tower’s Timeout middleware uses an error to signal timeout, i.e. it changes the error type to BoxError. For HTTP services that is rarely what you want as returning errors will terminate the connection without sending a response.

If we look at MDN docs for HTTP 408 Request Timeout here:

Note: Some servers will shut down a connection without sending this message.

Or the RFC 9110 section on HTTP 503 Service Unavailable here:

Note: The existence of the 503 status code does not imply that a server has to use it when becoming overloaded. Some servers might simply refuse the connection.

The general expectation seems to be that servers don't have to return these status codes, and could simply refuse or close the connection.

So my proposal is, unless we have a way for the inner service to signal which kind of state it is currently in: ReadingRequest | ProcessingRequest | WaitingForUpstreamResponse, we should simply deprecate this middleware and refer users to the Timeout middleware in tower.

@teohhanhui
Copy link
Author

Further thoughts:

HTTP 503 Service Unavailable is the only generally correct status code in this case. HTTP 408 Request Timeout and HTTP 504 Gateway Timeout are both only possible to be handled by the inner service, and as such we should expect the inner service to set a timeout every time they need to:

  1. Read the request.
  2. Wait for an upstream server response.

(If it's important to return those specific HTTP status codes to distinguish where things hit a timeout.)

Conclusion: I don't think tower-http can solve this, nor is it useful to provide a way to set a fixed status code for the Timeout middleware. It'll just lead to the caller making their best guess (which by definition cannot be right because the situation is dynamic), which would be counterproductive.

Proposed solution: We should add documentation to suggest handling timeouts for the above 2 cases.

@FalkWoldmann
Copy link

How about deprecating fn new in favor of a new fn with_status_code? Then we can publish the change as non-breaking patch release and people will be encouraged to make a choice about the status code. I think I'd want the status code to be entirely free form, with docs suggesting to use 503 or 408.

Hi @jplatte, I really like this approach because I have the requirement of returning a 204 error on timeout. Should I open this proposal in a separate PR?

@jplatte
Copy link
Member

jplatte commented Aug 14, 2025

Feel free to open a new PR, yeah. But what the heck are you doing where timeout should return 204 (which is certainly not an error code)? 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants