
Understanding Android audio towards achieving low latency response

Igor Zinken edited this page Dec 28, 2019 · 8 revisions

The latency issue

Latency is the perceived delay between an audio-triggering action and hearing the resulting output. For instance, if you press a key on a keyboard, you expect to hear a sound immediately rather than after a noticeable delay.

The reason is that after a key is pressed, we are likely to start synthesizing audio (or playing back a sample) by writing it into an audio buffer. This buffer is queued to be played by the hardware as soon as the previous buffer has finished playing. As such, these buffers must be small enough to ensure a quick response. However, making the buffer too small may overload the CPU, as it has to work harder to deliver more buffers in the same amount of time, which results in glitches during playback. The golden rule is: the larger the buffer, the more stable the audio, but also the higher the latency. The key is to find the right buffer size for an acceptable and stable environment!

Ideally, for a "live instrument" feel the latency should be below 70 ms. Most musicians wouldn't consider a latency above 20 ms acceptable (and would ideally prefer less than 5 ms!), but we must remember that we're dealing with consumer hardware here. Still, there is a lot to be gained by making the right decisions when programming audio applications.
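The link between buffer size and latency is plain arithmetic: one buffer lasts (frames ÷ sample rate) seconds, and every buffer waiting in the queue adds that much again. The following is a minimal C++ sketch; the function names are hypothetical (not part of MWEngine), and it only models the delay caused by buffering, ignoring the extra delay added by the OS mixer and the hardware itself.

```cpp
#include <cassert>

// Duration of a single buffer in milliseconds: frames / sampleRate * 1000.
double bufferDurationMs(int framesPerBuffer, int sampleRate) {
    assert(sampleRate > 0);
    return 1000.0 * framesPerBuffer / sampleRate;
}

// Buffering delay when `queuedBuffers` buffers sit between the renderer and
// the speaker (e.g. 2 = one buffer playing while the next waits in the queue).
double bufferingDelayMs(int framesPerBuffer, int sampleRate, int queuedBuffers) {
    return queuedBuffers * bufferDurationMs(framesPerBuffer, sampleRate);
}

// Largest buffer size (in frames) whose single-buffer duration fits a budget.
int maxFramesForBudget(double budgetMs, int sampleRate) {
    assert(sampleRate > 0 && budgetMs >= 0.0);
    return static_cast<int>(budgetMs * sampleRate / 1000.0);
}
```

For example, at 48 kHz a 512-frame buffer lasts roughly 10.7 ms, while a 5 ms budget allows at most 240 frames per buffer, and a 20 ms budget at most 960.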

So, why not use the Android SDK's AudioTrack?

As the Google Android SDK provides an elegant API for development in the high-level Java / Kotlin languages, the interface and model of the MikroWave application were developed entirely in Java. Accordingly, the initial MWEngine was also written in Java, relying on the AudioTrack class for audio output.

While this gave satisfying results when used with the sequencer, the latency was too high (measured at around 250 ms!) when using the on-screen keyboard or when adjusting audio properties during playback, rendering it practically useless for anything that requires an instantaneous response.

Going the native route using C++ and hardware APIs

This led to porting the audio engine to C++ to run natively on the Android device (in other words: outside of the JVM, bypassing the AudioTrack API and avoiding the performance hit of garbage collection).

Using OpenSL, all audio buffers are handed directly to the native audio output path, greatly reducing latency on newer Android versions. On older Android versions the performance will be at least similar to using AudioTrack, but different hardware configurations can still benefit greatly from this native approach. And read on: there is a lot to be gained on older devices and operating systems as well!

When Google released AAudio, support for it could easily be added to MWEngine, as the programming language and toolchain were now the same.

So OpenSL/AAudio means instant low latency?

Not really. This is a topic on which Google provides little information, and it has left most developers stumped. There are several considerations to take into account before reaching low latency. Some of these are:

Render audio in a non-locking thread

Using a circular (ring) buffer avoids locking in the render thread, so the render thread never has to wait for another thread (possibly scheduled at a different priority) to release a lock. This allows buffers to be queued and keeps a continuous read / write cycle going.
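A minimal sketch of the idea, assuming exactly one producer thread (e.g. the sequencer) and one consumer thread (the audio callback). This is a generic single-producer / single-consumer lock-free ring buffer using `std::atomic`, not MWEngine's actual implementation; the class name `SpscRingBuffer` is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Single-producer / single-consumer ring buffer of float samples.
// Neither side ever takes a mutex, so the render thread cannot be stalled
// by a lock held by another thread. Memory is allocated once, up front.
class SpscRingBuffer {
public:
    // One extra slot is kept empty so that "full" and "empty" can be told apart.
    explicit SpscRingBuffer(std::size_t capacity) : buffer_(capacity + 1) {
        assert(capacity > 0);
    }

    // Producer side: returns the number of samples actually written.
    std::size_t write(const float* data, std::size_t count) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        const std::size_t size = buffer_.size();
        const std::size_t freeSpace = (tail + size - head - 1) % size;
        const std::size_t n = count < freeSpace ? count : freeSpace;
        for (std::size_t i = 0; i < n; ++i)
            buffer_[(head + i) % size] = data[i];
        // Publish the new samples to the consumer.
        head_.store((head + n) % size, std::memory_order_release);
        return n;
    }

    // Consumer side (audio callback): returns the number of samples read.
    std::size_t read(float* out, std::size_t count) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        const std::size_t size = buffer_.size();
        const std::size_t available = (head + size - tail) % size;
        const std::size_t n = count < available ? count : available;
        for (std::size_t i = 0; i < n; ++i)
            out[i] = buffer_[(tail + i) % size];
        // Hand the consumed slots back to the producer.
        tail_.store((tail + n) % size, std::memory_order_release);
        return n;
    }

private:
    std::vector<float> buffer_;
    std::atomic<std::size_t> head_{0}; // next write index, advanced by the producer
    std::atomic<std::size_t> tail_{0}; // next read index, advanced by the consumer
};
```

If `read()` returns fewer samples than requested, the callback should fill the remainder with silence rather than wait, since waiting is exactly what the ring buffer is meant to avoid.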

Although Android is essentially a Linux platform, the SCHED_FIFO real-time scheduling policy is not available to applications, as it might interfere with keeping battery consumption to a minimum!

The right sample rate

Certain Android devices have a native sample rate of 48 kHz, while you may be synthesizing your audio at 44.1 kHz. In that case, the audio output is routed through the system resampler to be upsampled to 48 kHz, and this extra stage in the output path increases the overall latency!
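The fix is to render at whatever rate the device reports as native. A minimal sketch, assuming the native rate has already been obtained from the platform and passed in as a plain integer (with 0 meaning "unknown"); both helper names and the 44.1 kHz fallback are hypothetical choices for illustration.

```cpp
#include <cassert>

// Hypothetical helper: pick the rate the engine should render at.
// Rendering at the device's native rate keeps the system resampler out of
// the output path. A `nativeRate` of 0 is treated as "unknown", in which
// case `preferredRate` is used instead.
int chooseRenderSampleRate(int nativeRate, int preferredRate = 44100) {
    assert(preferredRate > 0);
    return nativeRate > 0 ? nativeRate : preferredRate;
}

// True when audio rendered at `renderRate` would have to be resampled
// before reaching a device running at `nativeRate`.
bool needsResampling(int renderRate, int nativeRate) {
    return nativeRate > 0 && renderRate != nativeRate;
}
```

Rendering at 44100 Hz on a 48000 Hz device would make `needsResampling` return true; rendering at the reported native rate avoids that extra stage.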

The right buffer size

As mentioned before, larger buffers improve stability, while smaller buffers lower latency. However, it's not just a matter of balancing minimum latency against maximum stability: the buffer size should also line up with the device's native buffer size, meaning one should be a whole-number multiple of the other.

For instance, some devices report a recommended buffer size of 4800 samples, which, apart from being quite large, is also an unusual number. By halving 4800 repeatedly (2400, 1200, 600, 300, 150), we can however reach a low, stable buffer size of 75 samples, which still divides evenly into 4800. Other devices may report a recommended buffer size of 512 samples, in which case the lowest usable buffer size (keeping the above rule in mind) could be 64 samples.

Using a value that does not line up with the native size (for instance, 75 samples per buffer on a device that requires 64 samples per buffer) may cause glitches, as the buffer callback is occasionally called twice per timeslice. This can go unnoticed if you have CPU cycles to spare, but more likely it will slowly but surely snowball into audio dropouts and skipped buffers.
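The halving approach from the examples above can be sketched in a few lines. This is a minimal illustration, not MWEngine's code: the function names are hypothetical, and the floor of 64 frames used in the usage note is an assumed value chosen to reproduce both examples. Note that repeated halving only finds divisors of the form native ÷ 2ⁿ; other divisors (such as 4800 ÷ 3 = 1600) are not explored.

```cpp
#include <cassert>

// Hypothetical helper: repeatedly halve the device's recommended buffer size
// while the result stays a whole number and does not drop below
// `minimumFrames`. Every value produced divides the native size evenly.
int lowestEvenDivisorBufferSize(int nativeFrames, int minimumFrames) {
    assert(nativeFrames > 0 && minimumFrames > 0);
    int frames = nativeFrames;
    while (frames % 2 == 0 && frames / 2 >= minimumFrames)
        frames /= 2;
    return frames;
}

// True when `frames` lines up with the native buffer size,
// i.e. one is a whole-number multiple of the other.
bool isCompatibleBufferSize(int frames, int nativeFrames) {
    assert(frames > 0 && nativeFrames > 0);
    return nativeFrames % frames == 0 || frames % nativeFrames == 0;
}
```

With a floor of 64 frames, a native size of 4800 yields 75 and a native size of 512 yields 64, matching the examples; a 75-frame buffer is not compatible with a 512-frame native size.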

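You can query the device's native sample rate and buffer size using the AudioManager class (via getProperty() with PROPERTY_OUTPUT_SAMPLE_RATE and PROPERTY_OUTPUT_FRAMES_PER_BUFFER, available since API level 17).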

Small buffer sizes

It has been noted that devices reporting a very low buffer size may not necessarily be honest about their performance. For MikroWave, recommended buffer sizes below 256 samples are doubled until they reach the threshold of 256 samples. When in doubt, provide a facility to adjust the buffer size within your application.
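The doubling rule can be sketched as follows. The function name is hypothetical, and the sketch assumes a value of exactly 256 samples is accepted as meeting the threshold. A useful property of doubling is that the result remains a whole-number multiple of the reported size, so it still lines up with the device's native buffer size as described earlier.

```cpp
#include <cassert>

// Hypothetical helper: a reported buffer size below `thresholdFrames` is
// doubled until it is no longer below the threshold. Because the value is
// only ever doubled, the result stays a multiple of the reported size.
int saneBufferSize(int reportedFrames, int thresholdFrames = 256) {
    assert(reportedFrames > 0);
    int frames = reportedFrames;
    while (frames < thresholdFrames)
        frames *= 2;
    return frames;
}
```

For example, a reported size of 96 becomes 384, 128 becomes 256, and values already at or above 256 are left untouched.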
