Skip to content

Conversation

dvdsk
Copy link
Member

@dvdsk dvdsk commented Sep 25, 2025

cpal::Stream is not send on Mac. I need microphone to be send. This 'solves' the non-send stream by parking it on a separate thread.

@roderickvd
Copy link
Member

roderickvd commented Sep 25, 2025

Not directly related to scope of this PR, but as I was reviewing it I thought we can do better than sleeping for 5 ms in the iterator implementation. Here's a condvar-based approach (untested) that eliminates polling wakeup overhead and latency:

pub struct Microphone {
    data_signal: Arc<(Mutex<()>, Condvar)>,
}

fn next(&mut self) -> Option<Self::Item> {
    loop {
        if let Ok(sample) = self.buffer.pop() {
            return Some(sample);
        } else if self.error_occurred.load(Ordering::Relaxed) {
            return None;
        } else {
            // Block until notified instead of sleeping 5ms
            let (lock, cvar) = &*self.data_signal;
            let guard = lock.lock().unwrap();
            let _ = cvar.wait(guard).unwrap();
        }
    }
}

and in the callback:

move |data, _info| {
    let was_empty = tx.slots() == 0;
    for sample in /* ... */ {
        let _ = tx.push(sample);
    }
    if was_empty && tx.slots() > 0 {
        let (_lock, cvar) = &*data_signal;
        cvar.notify_one();
    }
}

and to wake up the iterator on error:

let error_callback = {
    let error_occurred = error_occurred.clone();
    move |source| {
        error_occurred.store(true, Ordering::Relaxed);
        let (_lock, cvar) = &*data_signal;
        cvar.notify_one();
        error_callback(source);
    }
};

@dvdsk
Copy link
Member Author

dvdsk commented Sep 26, 2025

that eliminates polling wakeup overhead and latency

I thought about this and I might be wrong however I think we do not actually incur extra latency here. Since we need to buffer at the output anyway.

Once ever 5ms we will drag out every sample from the Ringbuffer through the source chain into the output buffer.

Lets work through an example to be sure, lets say we have a sample rate of 100 and the delay is instead 1 second. The output cpal period is 100 samples too:

our pipeline and timings:

  • 1 delay from Microphone is a delay of 100 samples
  • all kinds of audio effects are applied, they do not incur an extra delay
  • some effect internally buffers 100 samples

Now cpal is going to fill its buffer calling next 100 times. It takes 1second before it gets more samples from the microphone. At that point (with this sample rate) the microphone has emitted 100 samples so it gets the 100 samples 'instantly'. The final latency is min(the cpal period, the microphone sleep period). In this case even if the sleep period was 0 the latency would not drop.

What the sleep gains us:

  • It might (I have no idea if it actually helps) be nice to the OS, it gets a little more predictability regarding scheduling.
  • Simple to read and understand

What convar could give us:

  • Eliminates any extra latency in case the cpal period is really small.
  • Could only notify after cpal period of samples have been emitted (very nice) would need some extra logic.

I want to zoom in to that last point. I think our audio pipeline might need to know more about the output so parameters like this can be auto tuned. Its similar to the situation with spanless sources: we want the source to know the output samplerate. Or the pausable draft/architecture: the source must be able to invert control flow and control sources following it and through them eventually the output.

Last weekend I spoke to Billy (author of Firewheel) for the next few weekends I'll see if I can build a rodio like API around firewheel. If that succeeds I want to gradually add a second experimental 'source' build on top of firewheel. My proposal is to land that and see if we can use firewheel to handle these concerns. They've had ultra low latency as a design goal.


Sorry for the long post, not time to cut it down :)

@roderickvd
Copy link
Member

You’re right that the burst-filling behavior mitigates some of the latency concerns, but I think your analysis assumes data is transferred to cpal exactly when the microphone thread wakes up. In reality, if cpal needs samples 2ms after your thread went to sleep, it waits 3ms for no reason. This creates jitter.

Also, you mentioned sleep being “nice to the OS” but it’s the other way around: polling with sleep is actually worse for the CPU. Your thread wakes up every 5ms regardless of whether there’s data, checks the buffer, then sleeps again. A condvar just parks the thread completely until the producer signals there’s actual data available. Much more efficient.

@dvdsk
Copy link
Member Author

dvdsk commented Sep 27, 2025

In reality, if cpal needs samples 2ms after your thread went to sleep, it waits 3ms for no reason. This creates jitter.

Good point, thanks for bringing that up. I might also be overly cautious with CPU cyles here, I tend to forget how damn powerful and energy efficient modern hardware is. Though I'd miss the simplicity of the sleep.

I'm left wondering if a condvar is the best approach, they are always a bit akward.... Part of me just wants to swap everything out for an mpsc::channel. But that probably would impact perf.... (Take a look at the impl of mpsc it is not trivial).

`cpal::Stream` is not send on Mac. I need microphone to be send.
This 'solves' the non-send stream by parking it on a separate thread.
@roderickvd
Copy link
Member

Part of me just wants to swap everything out for an mpsc::channel. But that probably would impact perf.... (Take a look at the impl of mpsc it is not trivial).

Then how about using ringbuf instead of rtrb?

@dvdsk
Copy link
Member Author

dvdsk commented Sep 27, 2025

What we mostly need is a benchmark for latency/jitter. There is throughput benchmarking of most (ring)buffers here: mgeier/rtrb#39

I believe ringbuf came out as quite slow.

Anyways I do not think its worth it to put too much effort in this now. Since it might get re-implemented once we overhaul/switch audio engine. I'm gonna go back to that now

@roderickvd
Copy link
Member

Out of all the solutions, with a condvar you'll have lowest jitter, lowest system load, without the performance penalty of MPSC/SPSC synchronization 🤷‍♂️

@dvdsk
Copy link
Member Author

dvdsk commented Sep 27, 2025

Out of all the solutions, with a condvar you'll have lowest jitter, lowest system load, without the performance penalty of MPSC/SPSC synchronization 🤷‍♂️

Feel free to go ahead to do it 👍. Merge this first or after?

@roderickvd
Copy link
Member

I’ll add it before merge, probably this evening.

To your point of readability, ringbuf will definitely be easier on the eyes, but I’ll remember to add some code comments what we’re doing and why. I think low-latency is a highly desirable trait that’s worth a bit of wizardry.

Block iterator until new audio data arrives, reducing CPU usage and
jitter. Notify waiting threads from the audio callback when new data is
available.
@roderickvd
Copy link
Member

Here you go. This also adds the logic to always write full blocks (buffer periods) if we know the buffer size.

By the way, I think it should be possible to make cpal::Stream Send on macOS: RustAudio/cpal#1021

This logic still is an improvement even when that PR is merged.

This simplifies notification logic and avoids issues when the ring
buffer is smaller than the CPAL period or partially full.
@roderickvd
Copy link
Member

roderickvd commented Sep 28, 2025

This also adds the logic to always write full blocks (buffer periods) if we know the buffer size.

Sleeping over this, I simplified the logic to just notify on each callback because there are all sorts of edge cases that could cause this to hang otherwise. Like a buffer smaller than the cpal period size (wrong, but not impossible particularly because the cpal period size isn't guaranteed to be exact) or a buffer that's partially full and cannot accept a full period.

Because the callback is anyway aligned with a cpal period, it still bursts as we want to.

@dvdsk
Copy link
Member Author

dvdsk commented Sep 28, 2025

By the way, I think it should be possible to make cpal::Stream Send on macOS: RustAudio/cpal#1021

That would allow us to get rid of the ugly separate sendable microphone entirely. Very nice!

@roderickvd
Copy link
Member

Would you help me review that PR over at cpal? That community is pretty dormant. I’m trying to revive it, but for now, I don’t expect much help there.

When we get that merged I still recommend taking the condvar bits from here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants