Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cancel() runs completion handlers immediately but should post/defer them instead #210

Open
dkl opened this issue Aug 30, 2023 · 3 comments

Comments

@dkl
Copy link

dkl commented Aug 30, 2023

Hi,

socket::cancel() appears to call pending completion handlers immediately, instead of delaying their execution until after the call returns. As a result there are dead locks when calling more socket operations (such as async_send()) from the completion handlers, because the socket uses a non-recursive mutex internally. This differs from other boost::asio objects such as boost::asio::steady_timer, which allow this case, so for example you can restart a timer from inside the operation_aborted completion handler.

An example to show the issue:

// Build: g++ -Wall -g azmq_cancel_timing.cpp -lzmq -lboost_filesystem -o azmq_cancel_timing

#include <azmq/socket.hpp>
#include <azmq/version.hpp>
#include <boost/asio/io_context.hpp>
#include <boost/asio/ip/udp.hpp>
#include <boost/asio/steady_timer.hpp>
#include <chrono>
#include <stdio.h>
#include <zmq.hpp>

int main()
{
	printf("boost version: %i\n", BOOST_VERSION);
	printf("azmq version: %i\n", AZMQ_VERSION);

	boost::asio::io_context ioctx;

	boost::asio::steady_timer timer(ioctx);
	timer.expires_from_now(std::chrono::seconds(1));
	timer.async_wait(
		[](boost::system::error_code const& ec)
		{
			printf("timer async_wait completion handler, ec = %s\n", ec.message().c_str());
		}
	);

	boost::asio::ip::udp::socket udpsocket(ioctx, boost::asio::ip::udp::endpoint(boost::asio::ip::make_address("127.0.0.1"), 0));
	std::array<uint8_t, 1000> buffer1;
	udpsocket.async_receive(boost::asio::buffer(buffer1),
		[](boost::system::error_code const& ec, size_t)
		{
			printf("udp socket async_receive completion handler, ec = %s\n", ec.message().c_str());
		}
	);

	azmq::socket azmqsocket(ioctx, ZMQ_PULL);
	azmqsocket.set_option(azmq::socket::linger(0));
	azmqsocket.connect("tcp://127.0.0.1:12345");
	std::array<uint8_t, 1000> buffer2;
	azmqsocket.async_receive(boost::asio::buffer(buffer2),
		[](boost::system::error_code const& ec, size_t)
		{
			printf("azmq socket async_receive completion handler, ec = %s\n", ec.message().c_str());
		}
	);

	printf("timer.cancel()...\n");
	timer.cancel();
	printf("timer.cancel()... done\n");

	printf("udpsocket.cancel()...\n");
	udpsocket.cancel();
	printf("udpsocket.cancel()... done\n");

	printf("azmqsocket.cancel()...\n");
	azmqsocket.cancel();
	printf("azmqsocket.cancel()... done\n");

	printf("io_context.run()...\n");
	ioctx.run();
	printf("io_context.run()... done\n");

	return 0;
}

Actual output, azmq socket competion handler called during cancel(), instead of later like the others:

boost version: 108200
azmq version: 10002
timer.cancel()...
timer.cancel()... done
udpsocket.cancel()...
udpsocket.cancel()... done
azmqsocket.cancel()...
azmq socket async_receive completion handler, ec = Operation canceled
azmqsocket.cancel()... done
io_context.run()...
timer async_wait completion handler, ec = Operation canceled
udp socket async_receive completion handler, ec = Operation canceled
io_context.run()... done

Expected output: All completion handlers are called later through the io_service.

@Degoah
Copy link

Degoah commented Oct 17, 2023

Any plans to work on this finding? I have also been caught by this one.

@sehe
Copy link

sehe commented Oct 27, 2023

This behaviour absolutely needs to be fixed because it violates Asio guarantees:

"Asynchronous completion handlers will only be called from threads that are currently calling io_context::run()

@Degoah conjectures here simply skipping cancel_ops() appears to maybe fix the surface symptom:

https://github.com/zeromq/azmq/blob/master/azmq/detail/socket_service.hpp#L483

However, I'm not qualified to judge whether this is okay w.r.t. lifetimes (as the descriptor is being unregistered - which may or may not be okay?).

Also, note it doesn't solve the issue that the executor is not being honored as it should according to the Asio specs:

https://compiler-explorer.com/z/W83xMGqve, which (with the cancel_ops() removed as described) outputs

image

That's still wrong, as the azmqsocket callback should be invoked ON the strand.

@Degoah
Copy link

Degoah commented Oct 30, 2023

However, I'm not qualified to judge whether this is okay w.r.t. lifetimes (as the descriptor is being unregistered - which may or may not be okay?).

What I have understood from the implementation is, that de-registering the descriptor is ok, as it gets implicitly registered again in context of a new call to async_receive/async_send and an internal call to schedule(...).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants