
Adds priority-inheritance futexes #131584


Closed
wants to merge 3 commits

Conversation

ruihe774

Linked: #131514, #128231

This PR uses FUTEX_LOCK_PI and FUTEX_UNLOCK_PI on Linux to implement Mutex on top of priority-inheritance futexes.

Quoted from man 2 futex:

Priority inversion is the problem that occurs when a high-priority task is blocked waiting to acquire a lock held by a low-priority task, while tasks at an intermediate priority continuously preempt the low-priority task from the CPU. Consequently, the low-priority task makes no progress toward releasing the lock, and the high-priority task remains blocked.

Priority inheritance is a mechanism for dealing with the priority-inversion problem. With this mechanism, when a high-priority task becomes blocked by a lock held by a low-priority task, the priority of the low-priority task is temporarily raised to that of the high-priority task, so that it is not preempted by any intermediate level tasks, and can thus make progress toward releasing the lock. To be effective, priority inheritance must be transitive, meaning that if a high-priority task blocks on a lock held by a lower-priority task that is itself blocked by a lock held by another intermediate-priority task (and so on, for chains of arbitrary length), then both of those tasks (or more generally, all of the tasks in a lock chain) have their priorities raised to be the same as the high-priority task.

I'm still working on an implementation of PI-futex on FreeBSD.
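For orientation, here is a minimal sketch (not this PR's actual code) of the userspace side of the FUTEX_LOCK_PI / FUTEX_UNLOCK_PI protocol described in futex(2): the futex word holds 0 when unlocked and the owner's TID when locked, and the kernel is only entered on contention. The tid() helper is a stand-in using a raw SYS_gettid syscall; see the later discussion about gettid() availability and caching.

use std::ptr;
use std::sync::atomic::{AtomicU32, Ordering::{Acquire, Relaxed, Release}};

// Stand-in helper: fetch our TID via a raw syscall.
fn tid() -> u32 {
    (unsafe { libc::syscall(libc::SYS_gettid) }) as u32
}

pub fn lock(futex: &AtomicU32) {
    // Fast path: 0 (unlocked) -> our TID, entirely in user space.
    if futex.compare_exchange(0, tid(), Acquire, Relaxed).is_err() {
        // Contended: the kernel queues us with priority inheritance and
        // writes our TID into the futex word once we own the lock.
        unsafe {
            libc::syscall(
                libc::SYS_futex,
                futex.as_ptr(),
                libc::FUTEX_LOCK_PI | libc::FUTEX_PRIVATE_FLAG,
                0,                             // val: ignored by FUTEX_LOCK_PI
                ptr::null::<libc::timespec>(), // no timeout
            );
        }
    }
}

pub fn unlock(futex: &AtomicU32) {
    // Fast path: our TID -> 0. This fails if the kernel set FUTEX_WAITERS,
    // in which case the kernel hands the lock to the next waiter.
    if futex.compare_exchange(tid(), 0, Release, Relaxed).is_err() {
        unsafe {
            libc::syscall(
                libc::SYS_futex,
                futex.as_ptr(),
                libc::FUTEX_UNLOCK_PI | libc::FUTEX_PRIVATE_FLAG,
            );
        }
    }
}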

@rustbot
Collaborator

rustbot commented Oct 12, 2024

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @cuviper (or someone else) some time within the next two weeks.

Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

@rustbot rustbot added O-unix Operating system: Unix-like S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Oct 12, 2024
@rust-log-analyzer

This comment has been minimized.

@ruihe774
Author

ruihe774 commented Oct 12, 2024

After some investigation, I think it is not worth implementing pi futexes using UMUTEX_PRIO_INHERIT on FreeBSD. Drawbacks are:

Given that pthread mutexes on FreeBSD implement priority inheritance, a possible solution is to switch to the pthread backend on FreeBSD.

@slanterns
Contributor

cc @joboet

@rust-log-analyzer

This comment has been minimized.

@ruihe774 ruihe774 marked this pull request as ready for review October 12, 2024 11:16
@rustbot
Collaborator

rustbot commented Oct 12, 2024

The Miri subtree was changed

cc @rust-lang/miri

Member

@RalfJung RalfJung left a comment

Thanks for implementing this in Miri! However, the implementation is unfortunately quite hard to follow -- this definitely needs more comments. You cannot assume that the reader of this code knows the futex API by heart.

// It's not uncommon for `addr` to be passed as another type than `*mut i32`, such as `*const AtomicI32`.
let futex_val = this.read_scalar_atomic(&addr, AtomicReadOrd::Relaxed)?.to_i32()?;
if val == futex_val {
let futex_val = this.read_scalar_atomic(&addr, AtomicReadOrd::SeqCst)?.to_u32()?;
Member

@RalfJung RalfJung Oct 12, 2024

There is a huge comment above why we are doing a fence here, and now you just replaced the fence by something else. Why?

Please stick to the original implementation. It will be very hard to review this if you make deep fundamental changes like this. SeqCst writes and SeqCst fences are very much not equivalent.

Author

SeqCst writes and SeqCst fences are very much not equivalent.

I have no idea. Could you please explain it or provide some materials?

Member

I'm afraid a full introduction to the C++ memory model is beyond the scope of this thread. Mara wrote a book about it, available at https://marabos.nl/atomics/, but I don't know if it goes into the fact that SeqCst fences + relaxed accesses are not equivalent to SeqCst accesses -- that is really advanced, and I don't know any place that thoroughly explains it.

Please keep the fence, and make all the reads/writes Relaxed, like it was before. Carefully read the comment to ensure the fence is put in the right place, given that there are some new accesses being added here.
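For illustration, here are the two patterns being contrasted (a generic sketch, not Miri's code):

use std::sync::atomic::{fence, AtomicU32, Ordering};

// Pattern the review asks to keep: a Relaxed access combined with an
// explicit SeqCst fence.
fn load_with_fence(word: &AtomicU32) -> u32 {
    fence(Ordering::SeqCst);      // fences synchronize with other SeqCst fences
    word.load(Ordering::Relaxed)  // the access itself stays Relaxed
}

// A SeqCst *access* gives different guarantees and is not a drop-in
// replacement for the fence-plus-Relaxed pattern above.
fn load_seqcst(word: &AtomicU32) -> u32 {
    word.load(Ordering::SeqCst)
}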

Author

What about write_scalar_atomic to &addr? Do I need to put a fence after it?

Member

I don't know. The old logic was carefully figured out by a bunch of people together. Any adjustment of it will require someone to really dig into this, understand the logic, and make sure the adjustment makes sense.

Author

🤯

Member

Yeah sorry it's complicated. :/ I can try to help, but my time is very limited.

If there is some way to land this without doing the Miri change, that would also work. E.g. we could keep using futex::Mutex instead of pi_futex::Mutex when cfg(miri) is set.

Author

I think it's correct now in my latest commit. Someone can report it if it's not, but for now I assume it is, as it passes the tests.

Member

@RalfJung RalfJung Oct 13, 2024

Tests passing doesn't mean it is correct, it just means it is good enough to not blow up. ;) Concurrency primitives are prone to subtle bugs.

If you want to land the Miri changes, we'll definitely need a test as requested here, in particular a version of concurrent_wait_wake for PI futexes.

Author

If you want to land the Miri changes, we'll definitely need a test as requested #131584 (comment), in particular a version of concurrent_wait_wake for PI futexes.

I've added one in the latest commit. Plz have a look 😃

@cuviper
Member

cuviper commented Oct 12, 2024

I don't think I'm the best person to review this... maybe:

r? m-ou-se

@rustbot rustbot assigned m-ou-se and unassigned cuviper Oct 12, 2024
@ruihe774
Author

ruihe774 commented Oct 12, 2024

I modified the (internal) interface of the pal sys::sync::Mutex to return a MutexState. Linux poisons PI futexes by itself, so a return value is needed to propagate the poison state to the outer wrapper. Most line changes in source files of unrelated platforms are caused by this.

@RalfJung
Member

RalfJung commented Oct 12, 2024 via email

@ruihe774
Author

ruihe774 commented Oct 12, 2024

When does this poisoning happen? Is that also something Miri should (eventually) emulate?

From man futex(2):

[If] the owner of the futex/RT-mutex dies unexpectedly, then the kernel cleans up the RT-mutex and hands it over to the next waiter. This in turn requires that the user-space value is updated accordingly. To indicate that this is required, the kernel sets the FUTEX_OWNER_DIED bit in the futex word along with the thread ID of the new owner. User space can detect this situation via the presence of the FUTEX_OWNER_DIED bit and is then responsible for cleaning up the stale state left over by the dead owner.

Linux automatically unlocks the futex and sets the FUTEX_OWNER_DIED bit if the owner of the futex dies. Sure, we have MutexGuard, which poisons the Mutex when panicking:

impl<T: ?Sized> Drop for MutexGuard<'_, T> {
    #[inline]
    fn drop(&mut self) {
        unsafe {
            self.lock.poison.done(&self.poison);
            self.lock.inner.unlock();
        }
    }
}

However, self.poison is a no-op when panic = "abort" (IDK why; this is another topic); a thread can die when panic = "abort" as well. And the thread can die between poison.done() and inner.unlock() (poison.done() stores the flag using Relaxed, so it's possible that more operations are shuffled in between; IDK why it is implemented this way; this is also another topic). So there are cases where the Mutex is not properly poisoned and we have to rely on the FUTEX_OWNER_DIED bit returned by Linux.
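As a sketch of how the kernel's signal can be consumed (the MutexState variant names here are hypothetical, not this PR's actual names):

// Value from <linux/futex.h>.
const FUTEX_OWNER_DIED: u32 = 0x4000_0000;

// Hypothetical state returned by the pal lock routine.
enum MutexState {
    Ok,
    OwnerDied,
}

fn state_after_lock(futex_word: u32) -> MutexState {
    // After a successful FUTEX_LOCK_PI the futex word holds our TID, possibly
    // with FUTEX_OWNER_DIED set if the previous owner died while holding the lock.
    if futex_word & FUTEX_OWNER_DIED != 0 {
        MutexState::OwnerDied
    } else {
        MutexState::Ok
    }
}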

P.S. the simplest way to kill a thread at an arbitrary point might be to set up a signal handler and call pthread_exit() in it.

Is that also something Miri should (eventually) emulate?

It's hard. BTW, Linux also implements deadlock detection (EDEADLK). I have no idea how to implement these in Miri.

@RalfJung
Member

RalfJung commented Oct 12, 2024 via email

@ruihe774
Author

What does it mean for the "owner to die"? Does it mean the thread finishes, the process gets killed, or what?

The thread terminates (normally, killed, or whatever) with the mutex held.

@ruihe774
Author

ruihe774 commented Oct 12, 2024

FWIW I'm also working on a Condvar implementation (ruihe774@53155b2) based on futex requeue to avoid thundering-herd formation.

Futex requeue is also available on OpenBSD and Fuchsia.

@RalfJung
Member

One thing that would definitely be good for the Miri side is extending src/tools/miri/tests/pass-dep/concurrency/linux-futex.rs to invoke the new APIs. See in particular the concurrent_wait_wake test there which is related to the SeqCst fence.

@bjorn3
Member

bjorn3 commented Oct 12, 2024

If the kernel unlocked the mutex because the owner died, regular lock poisoning is not enough. It is safe to ignore lock poisoning. Instead, you'd have to consider the mutex permanently locked and make all future attempts at locking it either block forever or panic rather than return a poison error. If a mutex guard is forgotten, it should never be exposed in an unlocked state again, as unsafe code may depend on it staying locked permanently. Also, rustc itself has a place where it leaks an RwLock reader to ensure nobody locks it with write permissions again, as doing that could cause miscompilations or other bugs.

@ruihe774
Author

If the kernel unlocked the mutex because the owner died, regular lock poisoning is not enough. It is safe to ignore lock poisoning. Instead, you'd have to consider the mutex permanently locked and make all future attempts at locking it either block forever or panic rather than return a poison error. If a mutex guard is forgotten, it should never be exposed in an unlocked state again, as unsafe code may depend on it staying locked permanently. Also, rustc itself has a place where it leaks an RwLock reader to ensure nobody locks it with write permissions again, as doing that could cause miscompilations or other bugs.

Makes sense 👍. I've updated my code.

@ruihe774
Author

@m-ou-se @joboet I'm looking forward to hearing from you 😃

@bors
Collaborator

bors commented Oct 15, 2024

☔ The latest upstream changes (presumably #131727) made this pull request unmergeable. Please resolve the merge conflicts.

@ruihe774

This comment was marked as resolved.

@rust-log-analyzer

This comment has been minimized.

@RalfJung
Member

I wonder how I can have different Miri test case expected outputs for different platforms.

Seems like you figured it out, but please add comments next to the ignore/only explaining that there is another test covering this case, and where that test can be found.

Member

@RalfJung RalfJung left a comment

I haven't had the time to look at the new PI code yet (also still waiting for a signal from t-libs that they want to pursue this), but here are some comments on the other part. As style comments they also apply to the PI code, if similar patterns occur there.

If it comes down to it, we don't have to block this PR on finessing the PI shims, we can always improve them later.

@@ -145,13 +175,21 @@ pub fn futex<'tcx>(
// It's not uncommon for `addr` to be passed as another type than `*mut i32`, such as `*const AtomicI32`.
let futex_val = this.read_scalar_atomic(&addr, AtomicReadOrd::Relaxed)?.to_i32()?;
if val == futex_val {
// Check that the top waiter (if exists) is waiting using FUTEX_WAIT_*.
Member

Please explain why we do this. Is there a test covering this case?

Author

According to the manpage:

       EINVAL (FUTEX_WAKE, FUTEX_WAKE_OP, FUTEX_WAKE_BITSET,
              FUTEX_REQUEUE, FUTEX_CMP_REQUEUE) The kernel detected an
              inconsistency between the user-space state at uaddr and
              the kernel state—that is, it detected a waiter which waits
              in FUTEX_LOCK_PI or FUTEX_LOCK_PI2 on uaddr.

       EINVAL (FUTEX_LOCK_PI, FUTEX_LOCK_PI2, FUTEX_TRYLOCK_PI,
              FUTEX_UNLOCK_PI) The kernel detected an inconsistency
              between the user-space state at uaddr and the kernel
              state.  This indicates either state corruption or that the
              kernel found a waiter on uaddr which is waiting via
              FUTEX_WAIT or FUTEX_WAIT_BITSET.

It results in EINVAL if these two families of ops are mixed, so I added detection for that scenario.
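A sketch of the shape of that check (illustrative only, not Miri's actual helpers):

// Reject mixing PI and non-PI operations on one futex word, mirroring the
// kernel's EINVAL behaviour quoted above.
fn check_no_mixed_waiters(existing_waiter_is_pi: Option<bool>, op_is_pi: bool) -> Result<(), i32> {
    match existing_waiter_is_pi {
        // A waiter of the other kind is already queued on this futex word.
        Some(is_pi) if is_pi != op_is_pi => Err(libc::EINVAL),
        _ => Ok(()),
    }
}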

Comment on lines 236 to 237
if this.futex_waiter_count(addr_usize) != 0 {
if this.futex_top_waiter_extra(addr_usize).is_some() {
Member

Please use &&. Also, like above: please explain why, and make sure we have a test.

Given that this takes up N waiters, why are we only testing the top waiter?

Author

If the top waiter has no extra, all waiters have no extra; if it has one, all waiters have one. This is checked each time we add a waiter, so it is guaranteed that all waiters are the same with respect to whether they have an extra or not.

Member

Ah, okay. Please document this invariant in the FutexWaiter type.
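One possible way to write that invariant down (a sketch; the field name, the ThreadId stand-in, and the doc wording are illustrative, not the PR's actual code):

type ThreadId = u32; // stand-in for Miri's ThreadId

struct FutexWaiter {
    thread: ThreadId,
    /// The bitset used by FUTEX_*_BITSET, or u32::MAX for other operations.
    bitset: u32,
    /// `Some(tid)` if this waiter blocked via a PI operation (tid being its
    /// `gettid()` result), `None` otherwise.
    /// Invariant: all waiters on a given futex word are either all PI or all
    /// non-PI; this is checked whenever a waiter is added.
    pi_tid: Option<u32>,
}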

Comment on lines 45 to 46
// Ok(None) for EINVAL set, Ok(Some(None)) for no timeout (infinity), Ok(Some(Some(...))) for a timeout.
// Forgive me, I don't want to create an enum for this return value.
Member

I think this becomes cleaner if you don't pass in dest, and return an InterpResult<'tcx, Result<Option<(...)>, IoError>>. Then the caller should do set_last_error.

Comment on lines 180 to 182
this.set_last_error(LibcError("EINVAL"))?;
this.write_scalar(Scalar::from_target_isize(-1, this), dest)?;
return interp_ok(());
Member

This can become one line via return this.set_last_error_and_return(...).

@@ -128,6 +128,8 @@ struct FutexWaiter {
thread: ThreadId,
/// The bitset used by FUTEX_*_BITSET, or u32::MAX for other operations.
bitset: u32,
/// Extra info stored for this waiter.
extra: Option<u32>,
Member

So this field encodes whether the waiter is PI waiter? Just calling it extra doesn't make that very clear.

Author

If it is a PI waiter, this field stores Some(tid), where tid is the result of gettid(); if it is not, this field stores None.

We need to store tid here because we need to write the tid of the top waiter to futex when waking it.

Yes, the naming is not clear. I'll change it later.

Member

@RalfJung RalfJung Oct 16, 2024

We need to store tid here because we need to write the tid of the top waiter to futex when waking it.

Why can't the thread-that-is-woken-up do that itself in the wakeup callback?

Miri isn't an OS kernel so some things are a bit nicer. When a thread blocks it registers a callback that will be invoked on wakeup so it can do whatever it needs to at that moment, "atomically" as part of the wakeup.

Author

Why can't the thread-that-is-woken-up do that itself in the wakeup callback?

It's possible; however, it requires some tightly coupled logic in this.futex_wait() that checks whether we are a PI waiter, grabs the tid, and writes the futex. I'd prefer the current implementation.

Member

I'll try to keep this in mind for the review of the PI paths, to see if I can find an elegant alternative.

Member

@Amanieu Amanieu left a comment

I reviewed the implementation of PI futexes in the kernel. It seems to be using a fair unlock protocol where, on unlock, ownership of the mutex is passed directly to the next waiting thread.

While fair unlocking has theoretical benefits, in practice it tends to be much slower than unfair locking which leaves the mutex in an unlocked state and just wakes up a waiter. The reason is that waking up a thread has a long latency but it's very common for a single thread to repeatedly lock and unlock the same mutex. If lock ownership is forcibly transferred to a waiting thread then this prevents any other thread from acquiring the mutex until that thread wakes up.

As such, I don't think PI futexes are suitable for use as the default mutex implementation for Rust. They should be provided in a crate for specialized use cases where PI is needed and the performance cost of fair unlocking is acceptable.

}

pub fn locked() -> State {
(unsafe { libc::gettid() }) as _
Member

There are 2 issues with gettid:

  • It always performs a syscall, which we really don't want in the uncontended fast path.
  • It's only available from glibc 2.30, which is newer than our minimum (2.17).
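One direction that addresses both points is to cache the result of a raw SYS_gettid syscall in a thread-local (a sketch under that assumption, not this PR's code; the fork-related caveats are discussed below):

use std::cell::Cell;

thread_local! {
    // 0 means "not fetched yet"; a real TID is never 0.
    static CACHED_TID: Cell<u32> = Cell::new(0);
}

fn cached_tid() -> u32 {
    CACHED_TID.with(|cell| {
        let mut t = cell.get();
        if t == 0 {
            // Raw syscall: available on any glibc, unlike gettid() (glibc >= 2.30),
            // and only paid once per thread instead of on every lock.
            t = (unsafe { libc::syscall(libc::SYS_gettid) }) as u32;
            cell.set(t);
        }
        t
    })
}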

Author

It always performs a syscall, which we really don't want in the uncontended fast path.

Do you mean to use a thread-local storage to cache the tid?

I'm afraid that it may cause bugs in some corner cases.

Member

@the8472 the8472 Nov 21, 2024

We can store it in the Thread structure.

We could additionally install a MADV_WIPEONFORK page containing an atomic function pointer that resolves either to "get the cached result" or "do a syscall" functions... or just a flag.
Though that requires 4.14. An atfork handler would work too but adding one might have other exciting consequences.

Author

@ruihe774 ruihe774 Nov 21, 2024

Though that requires 4.14. An atfork handler would work too but adding one might have other exciting consequences.

We cannot use atfork. e.g. clone and clone3 do not call atfork handlers. We cannot assume the programmers always use syscall wrappers from glibc.

We can store it in the Thread structure.

Same issue. The programmers can call raw syscalls and have Thread not updated to reflect a new thread. I have no idea whether this is defined to be UB in Rust. Given that calling raw syscalls is "unsafe", maybe we can assume the tid stored in Thread is valid?

We could additionally install a MADV_WIPEONFORK page containing an atomic function pointer that resolves either to "get the cached result" or "do a syscall" functions... or just a flag.

Somewhat complicated. I can implement this, but I'm afraid it's not zero-cost (we need one memory page per thread to store the tid).

Member

@the8472 the8472 Nov 21, 2024

We cannot use atfork. e.g. clone and clone3 do not call atfork handlers. We cannot assume the programmers always use syscall wrappers from glibc.

libc authors have said that not using the wrappers and then calling libc functions after the clone is UB. Since the standard library on those targets relies on libc that's also Rust UB.

Somewhat complicated. I can implement this, but I'm afraid it's not zero-cost (we need one memory page per thread to store tid.)

I don't think so. A shared flag or a sort of generation marker for all threads should be sufficient.

Author

The thread ID is cached in thread-local storage in the latest commit. Would you please review it?

@ruihe774
Author

I reviewed the implementation of PI futexes in the kernel. It seems to be using a fair unlock protocol where, on unlock, ownership of the mutex is passed directly to the next waiting thread.

While fair unlocking has theoretical benefits, in practice it tends to be much slower than unfair locking which leaves the mutex in an unlocked state and just wakes up a waiter. The reason is that waking up a thread has a long latency but it's very common for a single thread to repeatedly lock and unlock the same mutex. If lock ownership is forcibly transferred to a waiting thread then this prevents any other thread from acquiring the mutex until that thread wakes up.

As such, I don't think PI futexes are suitable for use as the default mutex implementation for Rust. They should be provided in a crate for specialized use cases where PI is needed and the performance cost of fair unlocking is acceptable.

Yes, there may be overhead. However, fair unlocking only happens when the futex is in the contended state.

pub unsafe fn unlock(&self) {
    if self.futex.compare_exchange(pi::locked(), pi::unlocked(), Release, Relaxed).is_err() {
        // We only wake up one thread. When that thread locks the mutex,
        // the kernel will mark the mutex as contended automatically
        // (futex != pi::locked() in this case),
        // which makes sure that any other waiting threads will also be
        // woken up eventually.
        self.wake();
    }
}

So "waking up a thread has a long latency but it's very common for a single thread to repeatedly lock and unlock the same mutex" does not apply here: in the uncontended case, we do not enter kernel space.

@RalfJung
Member

RalfJung commented Nov 21, 2024 via email

@ruihe774
Author

The macos lock we use is fair, right? How does it deal with this?

Yes. And as a result it is relatively slower than on other platforms.

I think @joboet is more familiar with the macOS implementation.

@joboet
Member

joboet commented Nov 21, 2024

macOS uses unfair locking by default and the thread IDs are stored in a thread-local variable, so it avoids most of the cost of priority-inheritance.

@ruihe774
Author

macOS uses unfair locking by default and the thread IDs are stored in a thread-local variable, so it avoids most of the cost of priority-inheritance.

Is it mainlined? I see that the current std still uses pthread on macOS.

@bjorn3
Member

bjorn3 commented Nov 21, 2024

#122408 (which hasn't been merged yet) switches macOS to futexes.

@ruihe774
Author

#122408 (which hasn't been merged yet) switches macOS to futexes.

Right. If macOS uses a thread-local variable to store the thread ID, we can do the same on Linux to avoid fetching the tid every time.

This uses FUTEX_LOCK_PI and FUTEX_UNLOCK_PI
on Linux.
@ruihe774
Author

I dropped the miri implementation due to merge conflicts. It can be added back later.

@rust-log-analyzer

This comment has been minimized.

@Amanieu
Member

Amanieu commented Nov 21, 2024

Yes, there may be overhead. However, fair unlocking only happens when the futex is in the contended state.

So "waking up a thread has a long latency but it's very common for a single thread to repeatedly lock and unlock the same mutex" does not apply here: in the uncontended case, we do not enter kernel space.

The contended case is specifically the one that I am concerned about. If you have 2 threads both contending on a single lock, an unfair mutex allows one thread to re-acquire the lock while the other thread is still in the process of waking up. A fair mutex keeps the mutex locked and transfers ownership to the other thread directly.

The problem is that waking up a sleeping thread is slow and it may take a while until it is scheduled if the system is contended. It's very common for threads to repeatedly lock and unlock the same lock in a sequence (e.g. calling a method that acquires a lock in a loop). If that happens with a fair mutex then neither thread is making progress for the full duration of the wakeup, and this delay is incurred on every unlock.

This is the reason why today most mutex implementations are unfair. For example, quoting from https://webkit.org/blog/6161/locking-in-webkit/ (which is the post that inspired the creation of parking_lot):

However, allowing barging instead of enforcing FIFO allows for much higher throughput when a lock is heavily contended. Heavy contention in systems like WebKit that use very fine-grained locks implies that multiple threads are repeatedly locking and unlocking the same lock. In the worst case, a thread will make very little progress between two critical sections protected by the same lock. In a barging lock, if a thread unlocks a lock that had threads parked then it is still eligible to immediately reacquire it if it gets to the next critical section before the unparked thread gets scheduled. Barging permits threads engaged in microcontention to take turns acquiring the lock many times per turn. On the other hand, FIFO locks force contenders to form a convoy where they only get to hold the lock once per turn. This makes the program run much slower than with a barging lock because of the huge number of context switches – one per lock acquisition!

PI futexes as currently implemented in the Linux kernel enforce the use of fair unlocking and thus suffer from this performance penalty. This is the reason why they are not used in glibc by default. As such, I think they are unsuitable as the default implementation of Mutex in Rust.

@rust-log-analyzer
Collaborator

The job x86_64-gnu-tools failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
tests/pass/float_nan.rs ... ok
tests/pass/0weak_memory_consistency.rs ... ok

FAILED TEST: tests/pass/concurrency/sync.rs (revision `stack`)
command: MIRI_ENV_VAR_TEST="0" MIRI_TEMP="/tmp/miri-uitest-mvfqeD" RUST_BACKTRACE="1" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage1/bin/miri" "--error-format=json" "--sysroot=/checkout/obj/build/x86_64-unknown-linux-gnu/miri-sysroot" "-Dwarnings" "-Dunused" "-Ainternal_features" "-Zui-testing" "--out-dir" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage1-tools/miri_ui/tests/pass/concurrency" "tests/pass/concurrency/sync.rs" "--cfg=stack" "-Zmiri-disable-isolation" "-Zmiri-strict-provenance" "-Zmiri-preemption-rate=0" "--edition" "2021"
error: test got exit status: 1, but expected 0
 = note: compilation failed, but was expected to succeed

error: actual output differed from expected
Execute `./miri test --bless` to update `tests/pass/concurrency/sync.stack.stderr` to the actual output
--- tests/pass/concurrency/sync.stack.stderr
+++ <stderr output>
+error: unsupported operation: Miri does not support `futex` syscall with op=6
+  --> RUSTLIB/std/src/sys/pal/PLATFORM/pi_futex.rs:LL:CC
+   |
+LL | /                 libc::syscall(
+LL | |                     libc::SYS_futex,
+LL | |                     ptr::from_ref(futex.deref()),
+LL | |                     libc::FUTEX_LOCK_PI | libc::FUTEX_PRIVATE_FLAG,
+LL | |                     // remaining args are unused
+LL | |                 )
+   | |_________________^ Miri does not support `futex` syscall with op=6
+   |
+   |
+   = help: this is likely not a bug in the program; it indicates that the program performed an operation that Miri does not support
+   = note: BACKTRACE on thread `unnamed-ID`:
+   = note: inside `std::sys::pal::PLATFORM::pi_futex::linux::futex_lock` at RUSTLIB/std/src/sys/pal/PLATFORM/pi_futex.rs:LL:CC
+   = note: inside `std::sys::sync::mutex::pi_futex::Mutex::lock_contended` at RUSTLIB/std/src/sys/sync/mutex/pi_futex.rs:LL:CC
+   = note: inside `std::sys::sync::mutex::pi_futex::Mutex::lock` at RUSTLIB/std/src/sys/sync/mutex/pi_futex.rs:LL:CC
+   = note: inside `std::sync::Mutex::<i32>::lock` at RUSTLIB/std/src/sync/mutex.rs:LL:CC
+  --> tests/pass/concurrency/sync.rs:LL:CC
+   |
+LL |             let mut data = data.lock().unwrap();
+   |                            ^^^^^^^^^^^
---

Location:
   /cargo/registry/src/index.crates.io-6f17d22bba15001f/ui_test-0.26.5/src/lib.rs:357

Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
error: test failed, to rerun pass `--test ui`
Caused by:
  process didn't exit successfully: `/checkout/obj/build/x86_64-unknown-linux-gnu/stage1-tools/x86_64-unknown-linux-gnu/release/deps/ui-de50b20aa7a9761c --quiet` (exit status: 1)
Command has failed. Rerun with -v to see more details.
  local time: Thu Nov 21 22:28:06 UTC 2024
  network time: Thu, 21 Nov 2024 22:28:06 GMT
##[error]Process completed with exit code 1.
Post job cleanup.

@ruihe774
Author

Yes, there may be overhead. However, fair unlocking only happens when the futex is in the contended state.
So "waking up a thread has a long latency but it's very common for a single thread to repeatedly lock and unlock the same mutex" does not apply here: in the uncontended case, we do not enter kernel space.

The contended case is specifically the one that I am concerned about. If you have 2 threads both contending on a single lock, an unfair mutex allows one thread to re-acquire the lock while the other thread is still in the process of waking up. A fair mutex keeps the mutex locked and transfers ownership to the other thread directly.

The problem is that waking up a sleeping thread is slow and it may take a while until it is scheduled if the system is contended. It's very common for threads to repeatedly lock and unlock the same lock in a sequence (e.g. calling a method that acquires a lock in a loop). If that happens with a fair mutex then neither thread is making progress for the full duration of the wakeup, and this delay is incurred on every unlock.

This is the reason why today most mutex implementations are unfair. For example, quoting from https://webkit.org/blog/6161/locking-in-webkit/ (which is the post that inspired the creation of parking_lot):

However, allowing barging instead of enforcing FIFO allows for much higher throughput when a lock is heavily contended. Heavy contention in systems like WebKit that use very fine-grained locks implies that multiple threads are repeatedly locking and unlocking the same lock. In the worst case, a thread will make very little progress between two critical sections protected by the same lock. In a barging lock, if a thread unlocks a lock that had threads parked then it is still eligible to immediately reacquire it if it gets to the next critical section before the unparked thread gets scheduled. Barging permits threads engaged in microcontention to take turns acquiring the lock many times per turn. On the other hand, FIFO locks force contenders to form a convoy where they only get to hold the lock once per turn. This makes the program run much slower than with a barging lock because of the huge number of context switches – one per lock acquisition!

PI futexes as currently implemented in the Linux kernel enforce the use of fair unlocking and thus suffer from this performance penalty. This is the reason why they are not used in glibc by default. As such, I think they are unsuitable as the default implementation of Mutex in Rust.

Makes sense.

I hope Linux can provide an unfair PI futex in the future.
