Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TCP Steal/mirror doesn't work with shared sockets #864

Closed
aviramha opened this issue Dec 28, 2022 · 13 comments · Fixed by #2667
Closed

TCP Steal/mirror doesn't work with shared sockets #864

aviramha opened this issue Dec 28, 2022 · 13 comments · Fixed by #2667
Assignees
Labels
bug Something isn't working

Comments

@aviramha
Copy link
Member

aviramha commented Dec 28, 2022

Bug Description

When a parent process binds a socket, then children sockets listen/accept on that socket mirrord doesn't intercept those calls because the fd isn't managed (doesn't exist in our fork).
Using this issue to document use cases where this happens for now:

  • uvicorn when using --reload flag.
  • hypercorn

Steps to Reproduce

from fastapi import FastAPI
import sys
app = FastAPI()


@app.get("/")
async def root():
    print("cake")
    return {"message": "Hello World"}


@app.get("/hello/{name}")
async def say_hello(name: str):
    print("pake")
    return {"message": f"Hello {name}"}

run with mirrord exec --target pod/pod uvicorn -- --reload --port 80 main:app

Backtrace

No response

Relevant Logs

No response

Your operating system and version

macOS

Local process

python

Local process version

No response

Additional Info

No response

@aviramha aviramha added the bug Something isn't working label Dec 28, 2022
@aviramha
Copy link
Member Author

Possible solutions:

  1. hooking fork/exec, then passing down the SOCKET/FILES fd struct as environment variable for fds that aren't CLOEXEC
  2. Use shared memory (shmem) for SOCKET/FILES (dunno about this..)
  3. on listen with missing fd (this happens on uvicorn + reloader case) use getsockname to resolve addr and then decide if to add the socket and backfill the information. This is pretty good approach but has some caveats:
  • It would work only on the listen case, not if listen is called in parent and accept in child (not sure if this happens and what happens in this case)
  • the socket information we would get from getsockname would be the "fake port" and not the one the remote would need to listen for, so maybe this is impossible ;(

@aviramha
Copy link
Member Author

Please let us know if this is impacting anyone so we can know if we need to prioritize it.

@Razz4780 Razz4780 self-assigned this Jul 12, 2023
@danielloader
Copy link

danielloader commented Sep 10, 2023

This would probably affect me if I were not already blocked by this issue

@yoav-orca
Copy link

I'm not blocked by this, but it does affect my productivity.
MacOS M1 Pro

@aviramha
Copy link
Member Author

aviramha commented Nov 8, 2023

I'm not blocked by this, but it does affect my productivity.

MacOS M1 Pro

Hey, does it still happen to you on latest version?

@yoav-orca
Copy link

I'm running 3.75.0 and I'm not getting traffic with --reload and I do get traffic without it

@aviramha
Copy link
Member Author

aviramha commented Jan 1, 2024

Also doesn't work when using Uvicorn with more than one worker process.

@surbas
Copy link

surbas commented Jan 22, 2024

Happening to me on 3.84.1. I am not using the operator. Not sure if that matters.

@brandfocus
Copy link

Ran into the same issue. Might be worth flagging this in the docs somewhere until its fixed

@meowjesty meowjesty self-assigned this Jun 27, 2024
@meowjesty
Copy link
Member

I've been exploring this, and here are a few solutions I can think of:

  • 🌎 how can it be global, if the code is flat

The main issue we have is that our sockfds are stored in a per-layer global SOCKETS, which are not being shared between forks. So the proposal here is to just move the SOCKETS global to the intproxy, and change every operation that calls SOCKETS.[get|take](&sockfd) to use make_proxy_request_with_response (or something similar).

A problem with this solution is that now every sockfd becomes shared between execs and forks, which is not ideal. Another (minor) problem is that we must manually remember to call insert after take, as some operations call SOCKETS.take to modify the socket, meh.

  • 🫏 every masterpiece has its cheap copy

Instead of moving the whole SOCKETS to intproxy, whenever we detect a fork (new layer connection?), we send a copy of the SOCKETS minus the ones with CLOEXEC to the intproxy, which in turn sets the new layer connection's SOCKETS to this curated copy.

  • 🍑 thicc intproxy

Delegate more work to the intproxy, by moving the socket/ops there, and creating some sort of MirrordSocket API that keeps an association between a sockfd and the pid that created it, so we can avoid leaking sockfds.

That's it, so far. I'm leaning more towards the 🫏 solution, as 🌎 is a bug waiting to happen, and 🍑 is too big of a pointless refactor.

@aviramha
Copy link
Member Author

🫏 is what would make most sense and is how it works in terms of OS level too.
We can actually pass all SOCKETS downstream and iterate each fd and see if it's still open (so we'll just follow OS logic if cloexec or not)

@aviramha
Copy link
Member Author

Fixed in #2610

@meowjesty
Copy link
Member

Looks like the problem is still there on macOS + vscode.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants