pthread_cond_wait: Add mutex to protect the waiter count #15566
base: master
Conversation
@hujun260 and @xiaoxiang781216 can you please check this?
Force-pushed from 769a28f to 262e5dd
Does the NuttX atomic library not offer atomic_load()?
Use atomic_read.
atomic_read requires the atomic_t type, which is a volatile int. Are we supposed to use atomic_t inside the kernel, even if the toolchain implements atomic_load et al.? The atomic.h API is a bit unclear to me after the recent modifications.
The recent improvements ensure the atomic API can be used on all architectures, even without toolchain support.
sem_getvalue returns ERROR and sets errno if it fails; we don't want to return OK in this case, we want to return the non-negated error number.
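(A minimal sketch of the handling being requested, assuming the usual sem_getvalue/get_errno convention; illustrative only, not the literal patch:)

```c
int sval;

if (sem_getvalue(&cond->sem, &sval) == ERROR)
  {
    /* Return the non-negated error number, not OK */

    return get_errno();
  }
```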
Should we add atomic_load, since atomic_read does not ensure memory ordering?
The interface follows the Linux kernel design.
Ok. In my case using atomic_read is OK, since reading cond->wait_count does not need to be ordered, so I'll just change to atomic_read. In any case, if ever needed / wanted, atomic_load can be added very simply:
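(A hedged sketch of such an addition, assuming atomic_t is a volatile int and the compiler provides the __atomic builtins; the name and placement are illustrative:)

```c
/* Like atomic_read(), but with acquire ordering: later loads and
 * stores cannot be reordered before this load.
 */

static inline int atomic_load(FAR const atomic_t *v)
{
  return __atomic_load_n(v, __ATOMIC_ACQUIRE);
}
```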
Force-pushed from 216afde to b7e7f21
Tested OK on rv-virt:knsh64. Thanks :-)
https://gist.github.com/lupyuen/03e520876f9ba15a64918d5faba663b4
nsh> uname -a
NuttX 10.4.0 b7e7f216c2 Jan 16 2025 18:29:57 risc-v rv-virt
nsh> ostest
ostest_main: Exiting with status 0
Is someone capable of interpreting the sim target error? I tried running the target / citest locally, but it's giving me …
Force-pushed from b7e7f21 to fcbe648
I'm running with Docker, might take a while...

sudo docker run \
  -it \
  ghcr.io/apache/nuttx/apache-nuttx-ci-linux:latest \
  /bin/bash
cd
git clone https://github.com/tiiuae/nuttx --branch fix_pthread_cond
git clone https://github.com/apache/nuttx-apps apps
pushd nuttx ; echo NuttX Source: https://github.com/apache/nuttx/tree/$(git rev-parse HEAD) ; popd
pushd apps ; echo NuttX Apps: https://github.com/apache/nuttx-apps/tree/$(git rev-parse HEAD) ; popd
cd nuttx/tools/ci
./cibuild.sh -c -A -N -R testlist/sim-01.dat

## TODO: Dump the CI Test Log
ls -l ~/nuttx/boards/sim/sim/sim/configs/citest/logs/sim/sim
cat ~/nuttx/boards/sim/sim/sim/configs/citest/logs/sim/sim/*

## Repeat for risc-v-05
./cibuild.sh -c -A -N -R testlist/risc-v-05.dat
ls -l ~/nuttx/boards/risc-v/qemu-rv/rv-virt/configs/citest/logs/rv-virt/qemu
cat ~/nuttx/boards/risc-v/qemu-rv/rv-virt/configs/citest/logs/rv-virt/qemu/*
Looks like some sort of deadlock, let me figure it out.
Hmmm strange...
The problem is related to cond->mutex somehow. If I remove locking / unlocking it from pthread_cond_broadcast, sim:citest boots and runs. Another thing I noticed is that if I remove C++ support, the system boots and the tests pass. The first thing the system does in flat mode is run the static C++ constructors (roughly sketched below):
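(A hedged sketch of the usual flat-mode constructor pass; symbol and function names are approximate, not the exact NuttX sources:)

```c
typedef void (*initializer_t)(void);

/* Bounds of the .init_array section, provided by the linker script */

extern initializer_t _sinit[];
extern initializer_t _einit[];

static void cxx_initialize(void)
{
  FAR initializer_t *initp;

  for (initp = _sinit; initp < _einit; initp++)
    {
      initializer_t initializer = *initp;
      if (initializer != NULL)
        {
          initializer();               /* Run one static constructor */
        }
    }
}
```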
So the issue must be somewhere there. I still don't understand what's wrong with the lock; nothing even calls pthread_cond_broadcast as far as I can see.
Force-pushed from fcbe648 to c58794a
Using compare&exchange seems to have done the trick. Last thing that remains is to verify this is still POSIX compliant, I guess CI (sim:citest) runs the ltp test cases ? |
Yep. LTP runs on sim:citest.
Ok, if the tests pass I consider this issue resolved. All my local tests show green (and my original problem is now gone).
Tested OK on rv-virt:knsh64. Thanks :-)
https://gist.github.com/lupyuen/8f8aa014ddaf9b3623e6c834fe861764
nsh> uname -a
NuttX 10.4.0 c58794a955 Jan 17 2025 17:13:46 risc-v rv-virt
nsh> ostest
ostest_main: Exiting with status 0
Force-pushed from c58794a to 39387e1
The load/compare and RMW to wait_count need protection. Using atomic operations should resolve both issues.

NOTE: The assumption that the user will call pthread_cond_signal / pthread_cond_broadcast with the mutex given to pthread_cond_wait held is simply not true. It MAY hold it, but it is not forced. Thus, using the user space lock for protecting the wait counter as well is not valid!

Quoting POSIX [1]:

"The pthread_cond_signal() or pthread_cond_broadcast() functions may be called by a thread whether or not it currently owns the mutex that threads calling pthread_cond_wait() or pthread_cond_timedwait() have associated with the condition variable during their waits; however, if predictable scheduling behaviour is required, then that mutex is locked by the thread calling pthread_cond_signal() or pthread_cond_broadcast()."

[1] https://pubs.opengroup.org/onlinepubs/7908799/xsh/pthread_cond_signal.html
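(To make the NOTE concrete, a hedged user-space illustration; the variable names are made up:)

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
static volatile bool ready = false;

/* Waiter: associates mutex m with the condition variable */

static void *waiter(void *arg)
{
  pthread_mutex_lock(&m);
  while (!ready)
    {
      pthread_cond_wait(&c, &m);
    }
  pthread_mutex_unlock(&m);
  return NULL;
}

/* Signaler: POSIX allows this thread to signal WITHOUT holding m,
 * so the kernel cannot rely on m to serialize access to the waiter
 * count.  (Real code would still need to synchronize 'ready'.)
 */

static void *signaler(void *arg)
{
  ready = true;
  pthread_cond_signal(&c);
  return NULL;
}
```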
Force-pushed from 39387e1 to 9bd3f30
Summary
The load/compare and RMW to wait_count need protection. Bumping the counter can be done by atomic_fetch_add, as there is no race there. However, reading the counter when signaling needs to take a lock, so that if multiple threads signal at the same time, only one has exclusive access to the counter.
NOTE:
The assumption that the user will call pthread_cond_signal / pthread_cond_broadcast with the mutex given to pthread_cond_wait held is simply not true. It MAY hold it, but it is not forced. Thus, using the user space lock for protecting the wait counter as well is not valid!
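(A hedged sketch of that scheme; the field names cond->wait_count, cond->lock, and cond->sem are assumptions for illustration, not the literal patch:)

```c
/* Waiter side: registering as a waiter is a single RMW, so
 * atomic_fetch_add is enough; no lock is needed here.
 */

int pthread_cond_wait(FAR pthread_cond_t *cond, FAR pthread_mutex_t *mutex)
{
  int ret;

  atomic_fetch_add(&cond->wait_count, 1);
  pthread_mutex_unlock(mutex);
  ret = sem_wait(&cond->sem);          /* Block until signaled */
  pthread_mutex_lock(mutex);
  return ret;
}

/* Signaler side: the read-then-consume of wait_count is protected by
 * a dedicated lock so concurrent signalers cannot both consume the
 * same waiter.
 */

int pthread_cond_signal(FAR pthread_cond_t *cond)
{
  pthread_mutex_lock(&cond->lock);
  if (atomic_read(&cond->wait_count) > 0)
    {
      atomic_fetch_sub(&cond->wait_count, 1);
      sem_post(&cond->sem);
    }
  pthread_mutex_unlock(&cond->lock);
  return OK;
}
```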
Impact
This fixes regressions from #14581 and #14786.
Testing
MPFS with multiple threads using pthread_cond.
rv-virt:smp64
Direct reference from POSIX:
The pthread_cond_signal() or pthread_cond_broadcast() functions may be called by a thread whether or not it currently owns the mutex that threads calling pthread_cond_wait() or pthread_cond_timedwait() have associated with the condition variable during their waits; however, if predictable scheduling behaviour is required, then that mutex is locked by the thread calling pthread_cond_signal() or pthread_cond_broadcast().
[1] https://pubs.opengroup.org/onlinepubs/7908799/xsh/pthread_cond_signal.html