Performance Testing
Aeron is designed to provide high-throughput and low-latency message transport from publishers to subscribers.
A number of tests are provided in the samples module to test both latency and throughput.
Comparisons of Aeron with other algorithms for RTT IPC latency can be found here.
A guide for running tests on AWS can be found here.
For some context, Todd co-authored a paper in a previous life.
Adjust socket limits:
Linux
$ sudo sysctl net.core.rmem_max=2097152
$ sudo sysctl net.core.wmem_max=2097152
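Before or after changing the limits, it can help to confirm what the kernel currently allows. The following is a minimal sketch, assuming a Linux host where the values are exposed under /proc/sys/net/core; the helper name `check_limit` is hypothetical and not part of Aeron.

```shell
#!/bin/sh
# Hypothetical helper: report whether a kernel limit meets the required value.
# Arguments: name, current value, required value.
check_limit() {
    if [ "$2" -ge "$3" ]; then
        echo "$1: ok ($2)"
    else
        echo "$1: too low ($2 < $3)"
    fi
}

# 2097152 bytes matches the sysctl values used above.
required=2097152
for key in rmem_max wmem_max; do
    path="/proc/sys/net/core/$key"
    if [ -r "$path" ]; then
        check_limit "net.core.$key" "$(cat "$path")" "$required"
    else
        echo "net.core.$key: not available on this system"
    fi
done
```

A value reported as too low means the 2m socket buffer settings requested via the Aeron properties may not be granted in full by the OS.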
FreeBSD / Darwin
$ sudo sysctl -w kern.ipc.maxsockbuf=4194304
$ sudo sysctl -w net.inet.tcp.sendspace=2097152
$ sudo sysctl -w net.inet.tcp.recvspace=2097152
Note: Make sure that the Aeron directory is created on a RAM disk. See Best-Practices-Guide#macdarwin on how to do that.
Windows
Due to the higher system call overhead on Windows, it helps to use larger socket buffers than on Linux, e.g. try 2-4x larger. As Windows does not have a /dev/shm, it is necessary to install a RAM disk such as http://www.radeonramdisk.com/ for the Aeron directory. A RAM disk avoids the disk write latency for the memory mapped files used to communicate between the clients and driver.
Explanation of configuration options:
- -XX:+UseBiasedLocking: The driver has no contended locks, so it can benefit from avoiding the CAS operations needed to take a lock.
- -XX:BiasedLockingStartupDelay=0: The Aeron driver can easily be running before the default delay of biased locking is passed.
- -XX:+UnlockDiagnosticVMOptions -XX:GuaranteedSafepointInterval=300000: To reduce the frequency of the JVM bringing all threads to a safepoint.
- -XX:+UseParallelOldGC: Use parallel garbage collection for the full collections.
- -Djava.net.preferIPv4Stack=true: The IPv4 stack can be a more efficient path than IPv6 within the Java JNI implementation.
- -Daeron.mtu.length=8k: Increase the size of the maximum transmission unit to reduce system calls in a throughput scenario.
- -Daeron.socket.so_sndbuf=2m: Increase the size of OS socket send buffer (SO_SNDBUF) to account for Bandwidth Delay Product (BDP) on a high bandwidth network.
- -Daeron.socket.so_rcvbuf=2m: Increase the size of OS socket receive buffer (SO_RCVBUF) to account for Bandwidth Delay Product (BDP) on a high bandwidth network.
- -Daeron.rcv.initial.window.length=2m: Set the initial window for flow control to account for BDP.
- -Daeron.term.buffer.sparse.file=false: Do not use sparse files for the term buffers to avoid page faults.
- -Daeron.pre.touch.mapped.memory=true: Pre-touch memory mapped files to fault the pages into client processes.
- -Dagrona.disable.bounds.checks=true: Disable bounds checking to reduce instruction path on private secure networks.
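The socket buffer and initial window settings above are sized with the Bandwidth Delay Product in mind. As a rough sketch, BDP in bytes is the link bandwidth in bits per second multiplied by the round-trip time in seconds, divided by 8. The helper name `bdp_bytes` and the example link figures below are illustrative assumptions, not measurements.

```shell
#!/bin/sh
# Hypothetical helper: estimate the Bandwidth Delay Product in bytes.
# Arguments: bandwidth in bits per second, round-trip time in microseconds.
bdp_bytes() {
    # bits/s * microseconds / 8 bits per byte / 1,000,000 microseconds per second
    echo $(( $1 * $2 / 8 / 1000000 ))
}

# Assumed example: a 10 Gbit/s link with a 200 microsecond round trip.
bdp_bytes 10000000000 200   # prints 250000
```

If the estimate for your network exceeds the configured buffer or window length, the sender may stall waiting on flow control before the link is saturated, so the 2m values are a starting point rather than a fixed recommendation.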
Sample scripts in the aeron-samples module make the following more convenient to run.
Run the media driver:
$ java -cp aeron-all/build/libs/aeron-all-<version>.jar \
-XX:+UnlockDiagnosticVMOptions \
-XX:GuaranteedSafepointInterval=300000 \
-XX:+UseBiasedLocking \
-XX:BiasedLockingStartupDelay=0 \
-XX:+UseParallelOldGC \
-Djava.net.preferIPv4Stack=true \
-Daeron.mtu.length=8k \
-Daeron.socket.so_sndbuf=2m \
-Daeron.socket.so_rcvbuf=2m \
-Daeron.rcv.initial.window.length=2m \
-Dagrona.disable.bounds.checks=true \
-Daeron.term.buffer.sparse.file=false \
-Daeron.pre.touch.mapped.memory=true \
io.aeron.samples.LowLatencyMediaDriver
Run the Subscriber:
$ java -cp aeron-all/build/libs/aeron-all-<version>.jar \
-XX:+UseParallelOldGC \
-Dagrona.disable.bounds.checks=true \
-Daeron.sample.frameCountLimit=256 \
io.aeron.samples.RateSubscriber
Run the Publisher:
$ java -cp aeron-all/build/libs/aeron-all-<version>.jar \
-XX:+UseParallelOldGC \
-Daeron.sample.messageLength=32 \
-Daeron.sample.messages=500000000 \
-Dagrona.disable.bounds.checks=true \
io.aeron.samples.StreamingPublisher
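The subscriber and publisher commands above repeat the same classpath and several of the same options. One way to cut down on copy-and-paste errors is a small wrapper that assembles the command line and prints it for inspection. This is only a sketch: `AERON_JAR`, `COMMON_OPTS` and `print_java_cmd` are hypothetical names, not part of the scripts shipped in aeron-samples.

```shell
#!/bin/sh
# Hypothetical: path to the built jar; override AERON_JAR for your build.
AERON_JAR="${AERON_JAR:-aeron-all/build/libs/aeron-all-<version>.jar}"

# Options common to the subscriber and publisher commands above.
COMMON_OPTS="-XX:+UseParallelOldGC -Dagrona.disable.bounds.checks=true"

# Print the java command for a sample class followed by any extra options.
# First argument: main class; remaining arguments: additional options.
print_java_cmd() {
    main_class=$1
    shift
    # COMMON_OPTS is left unquoted on purpose so it splits into separate options.
    echo java -cp "$AERON_JAR" $COMMON_OPTS "$@" "$main_class"
}

print_java_cmd io.aeron.samples.RateSubscriber -Daeron.sample.frameCountLimit=256
print_java_cmd io.aeron.samples.StreamingPublisher \
    -Daeron.sample.messageLength=32 -Daeron.sample.messages=500000000
```

Printing rather than executing keeps the sketch safe to try; once the output matches the commands shown above, the printed line can be run directly.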
IPC throughput via Shared Memory that bypasses the network:
$ java -cp aeron-all/build/libs/aeron-all-<version>.jar \
-Dagrona.disable.bounds.checks=true \
-Daeron.sample.messageLength=32 \
io.aeron.samples.EmbeddedIpcThroughput
IPC throughput via Shared Memory that bypasses the network and uses exclusive publications:
$ java -cp aeron-all/build/libs/aeron-all-<version>.jar \
-Dagrona.disable.bounds.checks=true \
-Daeron.sample.messageLength=32 \
io.aeron.samples.EmbeddedExclusiveIpcThroughput
Currently no specific changes are required.
Run the media driver:
$ java -cp aeron-all/build/libs/aeron-all-<version>.jar \
-XX:+UnlockDiagnosticVMOptions \
-XX:GuaranteedSafepointInterval=300000 \
-XX:+UseBiasedLocking \
-XX:BiasedLockingStartupDelay=0 \
-XX:+UseParallelOldGC \
-Djava.net.preferIPv4Stack=true \
-Dagrona.disable.bounds.checks=true \
io.aeron.samples.LowLatencyMediaDriver
Run the Subscriber:
$ java -cp aeron-all/build/libs/aeron-all-<version>.jar \
-XX:+UnlockDiagnosticVMOptions \
-XX:GuaranteedSafepointInterval=300000 \
-XX:+UseParallelOldGC \
-Daeron.pre.touch.mapped.memory=true \
-Dagrona.disable.bounds.checks=true \
io.aeron.samples.Pong
Run the Publisher:
$ java -cp aeron-all/build/libs/aeron-all-<version>.jar \
-XX:+UnlockDiagnosticVMOptions \
-XX:GuaranteedSafepointInterval=300000 \
-XX:+UseParallelOldGC \
-Daeron.sample.messages=100000 \
-Daeron.sample.messageLength=32 \
-Daeron.pre.touch.mapped.memory=true \
-Dagrona.disable.bounds.checks=true \
io.aeron.samples.Ping
Aeron supports the recording and replay of live streams from persistent storage. The samples for testing its performance can give a good feel for what your hardware is capable of. Further details of the Aeron Archive can be found on the wiki.
- Java Flight Recorder:
-XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints
- Get the hotspot compiler log:
-XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:+PrintAssembly
(see also http://mechanical-sympathy.blogspot.co.uk/2013/06/printing-generated-assembly-code-from.html)
With the Java Driver it is possible to disable bounds checking by using a system property; however, with the C driver and clients it is not as straightforward. If a user needs the extra performance boost and is willing to take the associated risk, the DISABLE_BOUNDS_CHECK
define can be used at compile time. This is not set by default.