
Properly Implement brk/sbrk and mmap #4

Closed
yzhang71 opened this issue Jul 23, 2024 · 8 comments

Comments

@yzhang71
Contributor

We implemented brk/sbrk in glibc (userspace) by over-allocating to a page-aligned address and exposing a pseudo-break to the caller. This makes malloc fully functional for small chunks using sbrk. For larger chunks, malloc triggers the mmap path, which is not yet handled. Next, we should move brk/sbrk into the runtime as syscalls and guard the pseudo-break with a mutex to prevent race conditions.

We also need to properly handle mmap. The WASI-libc implementation of mmap is an emulation on top of malloc, which, after discussion with Nick and Coulson, is not acceptable for us. Therefore, we need to handle it in the runtime. For now, I have used an ad-hoc solution based on malloc as well.
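A minimal sketch of the pseudo-break scheme described above, assuming clang's `__builtin_wasm_memory_grow` builtin for growing linear memory; the variable names are invented and initialization of the break to the end of the data segment is elided, so this illustrates the over-allocation and the mutex rather than the actual lind-wasm glibc code:

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define WASM_PAGE 65536  /* wasm linear memory page size */

static uintptr_t pseudo_break;  /* break value exposed to callers (hypothetical) */
static uintptr_t real_break;    /* page-aligned end of grown memory */
static pthread_mutex_t brk_lock = PTHREAD_MUTEX_INITIALIZER;

void *sbrk(intptr_t increment) {
    pthread_mutex_lock(&brk_lock);
    uintptr_t old = pseudo_break;
    uintptr_t want = old + increment;
    if (want > real_break) {
        /* Over-allocate to a page boundary so small requests don't
         * grow linear memory on every call. */
        size_t pages = (want - real_break + WASM_PAGE - 1) / WASM_PAGE;
        if (__builtin_wasm_memory_grow(0, pages) == (size_t)-1) {
            pthread_mutex_unlock(&brk_lock);
            errno = ENOMEM;
            return (void *)-1;
        }
        real_break += (uintptr_t)pages * WASM_PAGE;
    }
    pseudo_break = want;
    pthread_mutex_unlock(&brk_lock);
    return (void *)old;
}
```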

@rennergade
Contributor

My opinion here:

  1. Eventually our goal should be to move all the internals for mmap/brk out of glibc. The question then is: where should they go? For both, we'll have to manipulate the linear memory allocation. I think the most intuitive way is to add some functionality to wasmtime to do this, though that works against our goal of portability. We'll eventually have to add some runtime-specific functionality for fork/exec as well, and I'm not sure there's any way around that. I think our best option is probably a single file of runtime-specific implementations for all of these, behind a general API that can eventually be extended to other runtimes.

  2. We need to properly manage memory within each wasm instance so that we can manage mmap and brk separately; otherwise we run into situations where they attempt to use the same resources. We've mentioned a hacky approach where we allocate a large chunk for mmaps at runtime, but that would obviously 1. be very inefficient and 2. still require additional infrastructure to manage. The long-term solution is most likely to implement a virtual memory map (VMMap) similar to NaCl's.
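For reference, a NaCl-style virtual memory map can be as simple as an ordered list of mapped intervals per cage, used both to find holes for new mappings and to validate addresses. A minimal sketch with invented names (not the actual data structure):

```c
#include <stdint.h>

/* One mapped interval; entries are kept sorted by start address. */
struct vmmap_entry {
    uint64_t start, len;
    int prot, flags;              /* MAP_SHARED vs MAP_PRIVATE matters on fork */
    struct vmmap_entry *next;
};

/* Scan the sorted entries for a gap in [lo, hi) large enough to hold
 * `len` bytes; returns 0 when no hole exists (0 is a safe sentinel
 * because lo sits above the NULL page in practice). */
static uint64_t vmmap_find_hole(struct vmmap_entry *head, uint64_t lo,
                                uint64_t hi, uint64_t len) {
    uint64_t cur = lo;
    for (struct vmmap_entry *e = head; e; e = e->next) {
        if (e->start >= cur + len)
            return cur;                  /* gap before this entry fits */
        if (e->start + e->len > cur)
            cur = e->start + e->len;     /* skip past this mapping */
    }
    return (hi >= cur + len) ? cur : 0;  /* tail gap */
}
```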

@JustinCappos
Member

I'd advise against having system calls handled directly by the runtime / caging infrastructure if we can at all avoid it. Even calling from our microvisor back into the runtime to do the memory mapping, etc., seems preferable to me.

@rennergade
Contributor

rennergade commented Jul 24, 2024

Yeah, I agree. I think the three things the runtime currently does that we'd need to move or modify are:

  1. Creating a new instance of a cage (important for fork).

  2. Creating new threads (for pthread_create, currently done by wasi-threads).

  3. Modifying the bounds of linear memory (important for brk/mmap).

Those are the scenarios we probably need to put some thought into.

@rennergade
Contributor

We discussed how to move this forward today in the weekly meeting. Thank you @yizhuoliang for joining us.

Here's how we're breaking this down to proceed:

libc/wasmtime integration
@qianxichen233

  • Add MAKE_SYSCALL stubs to libc to export the mmap/munmap/brk/shmat/shmdt calls to wasmtime/RawPOSIX (see the stub sketch after this list).
  • Figure out how to initialize memory so we can use the entire address space on cage startup. This probably involves growing linear memory to its maximum and then mapping the unused range PROT_NONE.
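The stub shape might look roughly like the following; the MAKE_SYSCALL signature, the syscall number, and the negated-errno convention are assumptions about the eventual interface, not the actual lind-wasm code:

```c
#include <errno.h>
#include <stdint.h>
#include <sys/mman.h>

#define MMAP_SYSCALL_NO 21  /* made-up number for the example */

/* Assume MAKE_SYSCALL(number, name, a1..a6) traps out of the wasm module
 * into the wasmtime/RawPOSIX dispatcher and returns an i64 that is either
 * a value or a negated errno. */
void *mmap(void *addr, size_t len, int prot, int flags, int fd, off_t off) {
    int64_t ret = MAKE_SYSCALL(MMAP_SYSCALL_NO, "mmap",
                               (uint64_t)(uintptr_t)addr, (uint64_t)len,
                               prot, flags, fd, (int64_t)off);
    if (ret < 0) {
        errno = (int)-ret;
        return MAP_FAILED;
    }
    return (void *)(uintptr_t)ret;
}
```

On the startup side, growing linear memory to its maximum and then marking the unused range PROT_NONE would reserve the whole address space up front, so later mmap calls only need to flip protections on pages they actually claim.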

RawPOSIX integration

It's important that any use of the vmmap occurs in the dispatcher step and not in the actual syscalls, i.e., mmap finds a hole address and sends that address with MAP_FIXED into mmap_syscall, or write() checks the address in the dispatcher before calling a valid write_syscall.
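In C-flavored pseudocode (the real dispatcher lives in Rust, and `struct cage`, `vmmap_add`, and `mmap_syscall` are hypothetical names), the rule looks like this:

```c
#include <errno.h>
#include <stdint.h>
#include <sys/mman.h>

/* The dispatcher, not the syscall itself, consults the per-cage vmmap:
 * it picks a free hole and pins the placement with MAP_FIXED. */
int64_t dispatch_mmap(struct cage *c, size_t len, int prot, int flags,
                      int fd, off_t off) {
    uint64_t addr = vmmap_find_hole(c->vmmap, c->map_lo, c->map_hi, len);
    if (addr == 0)
        return -ENOMEM;
    int64_t ret = mmap_syscall(c, addr, len, prot, flags | MAP_FIXED, fd, off);
    if (ret >= 0)
        vmmap_add(c->vmmap, addr, len, prot, flags);  /* record the mapping */
    return ret;
}
```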

@JustinCappos
Member

JustinCappos commented Oct 31, 2024

I spoke with Dennis a bit about mmap yesterday, and our thinking is that there could be a separate memory region for each process for mmaps. We could statically allocate a larger part of the address space than mmap needs and then grow/shrink it in response to requests. Happy to discuss if this isn't clear.
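As a rough illustration of that idea (sizes and names invented, and this is the simple scheme under discussion rather than a settled design), the per-process mmap area could be a statically reserved range with a movable top:

```c
#include <stddef.h>
#include <stdint.h>

/* A statically reserved per-process mmap region: space is handed out at
 * `top`, which grows toward `limit` and can shrink again when the
 * topmost mapping is freed. */
struct mmap_region {
    uint64_t base;   /* start of the reserved area */
    uint64_t top;    /* current high-water mark */
    uint64_t limit;  /* end of the reserved area */
};

static int64_t region_alloc(struct mmap_region *r, size_t len) {
    if (r->top + len > r->limit)
        return -1;   /* request exceeds the static reservation */
    uint64_t addr = r->top;
    r->top += len;
    return (int64_t)addr;
}
```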

Let me know if this is similar to your thoughts. I'm more trying to understand what the options are, rather than to push hard for a specific solution.

@rennergade
Contributor

In theory this is how the VMMap would handle things at a basic level. I think there are several reasons why using the VMMap instead of a "greedy" implementation is preferable/necessary:

  1. It's not very straightforward to manage fragmentation, specifically in scenarios with large mappings. I could see this potentially being a problem with something like postgres.

  2. If we don't track which addresses are valid, we can potentially fault in trusted code, e.g. write() sends an invalid buffer to an in-memory pipe and segfaults. This becomes less of an issue if these things are done in grates, but I believe there are still trusted operations on memory in 3i that could be affected by this.

  3. Managing mappings on fork(). In NaCl, when we copy the address space we have to copy SHARED and unshared mappings separately: we handle shared mappings via mremap, while we copy unshared mappings via process_vm_writev. Without tracking, I'm not sure we'd know which mappings are shared or how to handle them.
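For point 3, the Linux primitives themselves are real (`process_vm_writev` from <sys/uio.h>), though the helper around them is only a sketch: private mappings get copied byte-for-byte into the child at the same address, while shared mappings would instead be re-established against the same shared backing.

```c
#define _GNU_SOURCE
#include <sys/uio.h>
#include <unistd.h>

/* Copy one private (unshared) mapping from the parent into the child at
 * the same address. Shared mappings are handled separately by remapping
 * the shared backing rather than copying pages. */
static int copy_private_mapping(pid_t child, void *addr, size_t len) {
    struct iovec local  = { .iov_base = addr, .iov_len = len };
    struct iovec remote = { .iov_base = addr, .iov_len = len };
    ssize_t n = process_vm_writev(child, &local, 1, &remote, 1, 0);
    return n == (ssize_t)len ? 0 : -1;
}
```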

@rennergade
Contributor

The Rust port of the VMMap is about 95% finished, and it looks like it's implemented in a way that should be much more performant than NaCl's. So I believe we have a path forward here.

@rennergade
Contributor

Added in #56
