Fix hanging requests with filtered steal #3016


Razz4780
Contributor

@Razz4780 Razz4780 commented Jan 13, 2025

So...

This started as a small refactor, done in the hope of fixing the hanging requests issue. I could not find any bug that could cause the problem, and only later found out that there was none. The repro application I was using handled only one connection at a time, and the first HTTP connection was not closed by the k8s proxy (most probably so that it could be reused later). As a result, the second request would hang on intproxy's HTTP handshake attempt. Since we want to be user friendly, this PR introduces reuse of local HTTP connections, which solves the problem. However, since it started as a refactor, it's big. Sorry.
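To illustrate the hang described above, here is a minimal sketch using only the Rust standard library. It is not the actual repro application or intproxy code: `serve_one_at_a_time`, `request`, and `demo` are hypothetical names, and a plain TCP echo stands in for HTTP. The server serves one connection until the peer closes it, so a second connection is accepted by the OS but never answered, while reusing the first connection works.

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the repro application: it serves one connection
/// at a time and moves on only when the current peer closes its connection.
fn serve_one_at_a_time(listener: TcpListener) {
    for stream in listener.incoming() {
        let mut stream = match stream {
            Ok(stream) => stream,
            Err(_) => continue,
        };
        let mut buf = [0u8; 64];
        // Echo data back until the peer closes (read returns 0) or an error occurs.
        loop {
            match stream.read(&mut buf) {
                Ok(0) | Err(_) => break,
                Ok(n) => {
                    if stream.write_all(&buf[..n]).is_err() {
                        break;
                    }
                }
            }
        }
    }
}

/// Sends `msg` and waits up to `timeout` for a reply.
/// Returns `true` if any reply bytes arrived in time.
fn request(stream: &mut TcpStream, msg: &[u8], timeout: Duration) -> bool {
    stream.set_read_timeout(Some(timeout)).unwrap();
    stream.write_all(msg).unwrap();
    let mut buf = [0u8; 64];
    matches!(stream.read(&mut buf), Ok(n) if n > 0)
}

/// Returns (first request ok, request on a new connection ok, request on the reused connection ok).
fn demo() -> (bool, bool, bool) {
    let listener = TcpListener::bind("127.0.0.1:0").unwrap();
    let addr = listener.local_addr().unwrap();
    thread::spawn(move || serve_one_at_a_time(listener));

    let timeout = Duration::from_millis(500);

    // The first connection is served and then left open, like a connection kept for reuse.
    let mut first = TcpStream::connect(addr).unwrap();
    let first_ok = request(&mut first, b"one", timeout);

    // A fresh connection lands in the listener's backlog but is never served.
    let mut second = TcpStream::connect(addr).unwrap();
    let second_ok = request(&mut second, b"two", timeout);

    // Sending the next request over the already open connection works.
    let reused_ok = request(&mut first, b"three", timeout);

    (first_ok, second_ok, reused_ok)
}
```

Calling `demo()` is expected to show the first and the reused request succeeding while the request on the fresh connection times out, which mirrors why reusing the local connection avoids the hang.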

Changes summarized:

  1. StreamingBody was moved from mirrord-protocol to mirrord-intproxy without any notable changes. There was no need for it to be in the protocol crate.
  2. BodyExt trait in mirrord-protocol was renamed to BatchedBody. The only notable change is moving from a custom Future implementation (FramesFut) to using now_or_never. I was afraid of it in the past, now I'm not. I tested this under heavy load and did not detect any difference. Using now_or_never simplifies things, because some code no longer needs to be async.
  3. All requests in the intproxy are now of HttpRequest<StreamingBody> type, to remove ugly generics and match expressions. HttpRequestFallback enum, along with lots of conversion code, was removed from mirrord-protocol.
  4. HttpResponseFallback type was moved to the agent without any notable changes. There was no need for it to be in the protocol crate.
  5. ReversePortForwarder and its tests were fixed. It was never streaming responses' bodies, because IncomingProxy was not notified about agent protocol version. This change is not related to the issue, but the problem came up in the CI.
  6. Removed the h2::Error::is_reset check and the dependency on h2 completely. Instead of checking whether the HTTP error is transient, we check whether it's not transient (using hyper::Error methods, e.g. hyper::Error::is_user). I think it's simpler and safer, since retrying a request is not harmful.
  7. Added a simple BoundTcpSocket struct to the incoming proxy, which wraps logic for binding the same interface as user socket. Now we can actually see the bound socket address in tracing.
  8. Added a ClientStore struct that caches unused local HTTP connections and cleans them up after some timeout.
  9. HTTP requests stolen with a filter are now handled completely independently on the local side. Since HTTP is stateless, this is fine. Each HTTP request has its own dedicated HttpGatewayTask inside IncomingProxy. To reuse connections, they share a ClientStore instance.
  10. Improved how connections stolen or mirrored in whole are handled in the IncomingProxy. Each connection is handled by its own TcpProxyTask, which knows whether the connection is stolen or mirrored. If it's mirrored, the data is no longer sent to the main IncomingProxy task; it is immediately discarded. If it's stolen, the connection is no longer artificially kept alive until silent for a second (this mechanism makes sense only in mirror mode and can introduce weird behavior in steal mode).
  11. The Interceptor task was removed completely. Now we have two separate tasks: HttpGatewayTask and TcpProxyTask.
  12. MetadataStore was moved to its own module without any notable changes.
  13. Added a unit test that verifies connection reuse (the original issue).
  14. IncomingProxy now optimizes HTTP response variant. If the whole response body is available when the response head is received, we no longer send the chunked response variant. Instead we respond with the framed variant. This allows us to use only one mirrord_protocol message.
  15. IncomingProxy now does subscription checks when receiving a new connection/request. If we receive a connection/request on a remote port that we are no longer subscribed to, we unsubscribe immediately, without attempting to connect to the user application.
  16. Improved tracing around IncomingProxy, e.g. added the time spent on polling response frames.
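
Items 8 and 9 describe caching unused local HTTP connections and dropping them after a timeout. The following is a minimal sketch of that idea using only the standard library. It is an assumption-laden illustration, not the actual ClientStore from this PR: `IdleStore`, its methods, and the explicit `now` parameter are all hypothetical, and the connection type is left generic so no real HTTP client is involved.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::time::{Duration, Instant};

/// Hypothetical, simplified cache of idle connections.
/// This is NOT the mirrord `ClientStore`, only a sketch of the caching idea.
struct IdleStore<C> {
    idle_timeout: Duration,
    /// Idle connections per local address, each with the time it became idle.
    idle: HashMap<SocketAddr, Vec<(C, Instant)>>,
}

impl<C> IdleStore<C> {
    fn new(idle_timeout: Duration) -> Self {
        Self {
            idle_timeout,
            idle: HashMap::new(),
        }
    }

    /// Stores a connection that has just finished serving a request.
    fn put(&mut self, addr: SocketAddr, conn: C, now: Instant) {
        self.idle.entry(addr).or_default().push((conn, now));
    }

    /// Takes the most recently stored connection to `addr` that has not expired.
    /// The caller opens a new connection when this returns `None`.
    fn take(&mut self, addr: SocketAddr, now: Instant) -> Option<C> {
        self.evict_expired(now);
        let conns = self.idle.get_mut(&addr)?;
        let (conn, _) = conns.pop()?;
        if conns.is_empty() {
            self.idle.remove(&addr);
        }
        Some(conn)
    }

    /// Drops every connection that has been idle for at least `idle_timeout`.
    fn evict_expired(&mut self, now: Instant) {
        let timeout = self.idle_timeout;
        self.idle.retain(|_, conns| {
            conns.retain(|(_, since)| now.saturating_duration_since(*since) < timeout);
            !conns.is_empty()
        });
    }

    /// Number of idle connections currently held.
    fn len(&self) -> usize {
        self.idle.values().map(Vec::len).sum()
    }
}
```

In this sketch, several independent request handlers could share one store (for example behind a mutex): each takes a cached connection if one is available, otherwise connects anew, and puts the connection back when the request is done. Passing `now` explicitly is a design choice made here only to keep the example easy to test.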

@Razz4780 Razz4780 force-pushed the michals/mbe-649-filtered-steal-hangs-on-playground branch from edb6727 to 22ce712 Compare January 13, 2025 11:04
@Razz4780 Razz4780 marked this pull request as ready for review January 13, 2025 11:06
@Razz4780 Razz4780 changed the title Fix hanging requests with filtered steal Fix hanging requests with filtered steal (WIP) Jan 13, 2025
@Razz4780 Razz4780 force-pushed the michals/mbe-649-filtered-steal-hangs-on-playground branch from 0c90fbf to 7508902 Compare January 16, 2025 21:32
@Razz4780 Razz4780 changed the title Fix hanging requests with filtered steal (WIP) Fix hanging requests with filtered steal Jan 16, 2025
@Razz4780 Razz4780 requested a review from meowjesty January 20, 2025 09:21
Member

@meowjesty meowjesty left a comment


Even moar stuff

Resolved review threads: mirrord/intproxy/src/proxies/incoming.rs (8), mirrord/intproxy/src/proxies/incoming/tasks.rs (1)
Member

@meowjesty meowjesty left a comment


Finally been through everything!

I have only nits and docs requests. The refactor seems to make the HTTP stuff simpler, and that sparks joy.

Resolved review threads: mirrord/intproxy/src/proxies/incoming/http.rs (8), mirrord/intproxy/src/proxies/incoming/metadata_store.rs (1)
Member

@meowjesty meowjesty left a comment


Think there's a place with TPC instead of TCP, other than that

👍

@Razz4780 Razz4780 added this pull request to the merge queue Jan 21, 2025
Merged via the queue into metalbear-co:main with commit 2ee5a2c Jan 21, 2025
17 checks passed
@Razz4780 Razz4780 deleted the michals/mbe-649-filtered-steal-hangs-on-playground branch January 21, 2025 15:57
2 participants