It seems that the io_context destructor will hang in its internal shutdown() call to zmq_ctx_term() if there are still pending azmq::socket operations/completion handlers and the azmq::socket object still exists. Presumably zmq_ctx_term() blocks because it waits for every socket created in the context to be closed, and the socket kept alive by the pending handler is never closed. This can happen if the program extends the lifetime of the azmq::socket object into the completion handler (via shared_ptr/shared_from_this(), etc.) and then exits io_context.run() by calling io_context.stop(), for example in reaction to receiving SIGINT/SIGTERM.
Is this normal/expected? One "obvious" solution is to call socket.cancel() to abort and destroy all the pending completion handlers and then also destroy all the azmq::socket objects, instead of using io_context.stop(). This comment in a cppzmq issue suggests that this is even necessary: zeromq/cppzmq#139 (comment)
However, other boost::asio objects do not seem to have such requirements (though I don't know whether that's intentional or just coincidence). Should there perhaps be some sort of auto-close mechanism to avoid blocking the io_context destructor?
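For reference, a minimal sketch of that cancel()-based shutdown might look like the following. The signal_set handler, the endpoint, and the shutdown ordering are illustrative assumptions on my part, not anything prescribed by azmq; only socket.cancel(), async_receive() and connect() are taken from the repro below.

// Build (mirroring the repro below): g++ -Wall -g azmq_cancel_shutdown.cpp -lzmq -lboost_filesystem -o azmq_cancel_shutdown
#include <azmq/socket.hpp>
#include <boost/asio/signal_set.hpp>
#include <array>
#include <memory>
#include <signal.h>
#include <stdio.h>

int main()
{
    boost::asio::io_context ioctx;
    auto socket = std::make_shared<azmq::socket>(ioctx, ZMQ_PULL);
    socket->set_option(azmq::socket::linger(0));
    socket->connect("tcp://127.0.0.1:5555"); // arbitrary endpoint, just to have a pending receive

    std::array<uint8_t, 1> buffer;
    socket->async_receive(boost::asio::buffer(buffer),
        [socket](boost::system::error_code const& ec, size_t)
        {
            // After cancel() this runs with ec == operation_aborted and then
            // drops its captured shared_ptr copy.
            printf("async_receive completion handler, ec = %s\n", ec.message().c_str());
        }
    );

    // Instead of io_context::stop(): on SIGINT/SIGTERM cancel the pending
    // operation and drop the last local reference, so the socket is actually
    // closed before the io_context is destroyed.
    boost::asio::signal_set signals(ioctx, SIGINT, SIGTERM);
    signals.async_wait(
        [&socket](boost::system::error_code const&, int)
        {
            socket->cancel();
            socket.reset();
        }
    );

    ioctx.run();  // returns once no work is left; no pending azmq handlers remain
    return 0;     // io_context destructor no longer blocks in zmq_ctx_term()
}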
Small example reproducing the hang:
// Build: g++ -Wall -g azmq_shutdown_hang.cpp -lzmq -lboost_filesystem -o azmq_shutdown_hang
#include <azmq/socket.hpp>
#include <zmq.hpp>
#include <array>
#include <memory>
#include <stdio.h>

int main()
{
    boost::asio::io_context ioctx;
    auto socket = std::make_shared<azmq::socket>(ioctx, ZMQ_PULL);
    socket->set_option(azmq::socket::linger(0));
    socket->connect("tcp://127.0.0.1:0");

    std::array<uint8_t, 1> buffer;
    // Capturing the shared_ptr<socket> into the completion handler lambda extends the socket's lifetime beyond that of the io_context.
    // Usually the socket would be destroyed first (if it or the shared_ptr is declared after the io_context), but in this case it is not.
    // The io_context destructor should destroy the pending operation and its completion handler (without calling it),
    // which would also finally destroy the socket, but apparently the io_context hangs instead.
    socket->async_receive(boost::asio::buffer(buffer),
        [socket](boost::system::error_code const& ec, size_t)
        {
            printf("async_receive completion handler, ec = %s\n", ec.message().c_str());
        }
    );

    // Calling cancel() removes the pending async operation, so the socket is destroyed before the io_context again; then it does not hang.
    //socket->cancel();

    printf("destroying io_context, does it hang?...\n");
    return 0;
}
Backtrace of the hang:
#0 0x00007ffff7b3fd7f in __GI___poll (fds=0x7fffffffd7a0, nfds=1, timeout=-1)
at ../sysdeps/unix/sysv/linux/poll.c:29
#1 0x00007ffff7f34dde in zmq::signaler_t::wait(int) const () from /usr/local/lib/libzmq.so.5
#2 0x00007ffff7f11d72 in zmq::mailbox_t::recv(zmq::command_t*, int) () from /usr/local/lib/libzmq.so.5
#3 0x00007ffff7f0321f in zmq::ctx_t::terminate() () from /usr/local/lib/libzmq.so.5
#4 0x00007ffff7f5575e in zmq_ctx_term () from /usr/local/lib/libzmq.so.5
#5 0x00005555555829a6 in std::_Sp_counted_deleter<void*, int (*)(void*), std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x5555555d0ae0) at /usr/include/c++/11/bits/shared_ptr_base.h:442
#6 0x000055555556e7d7 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5555555d0ae0)
at /usr/include/c++/11/bits/shared_ptr_base.h:168
#7 0x000055555556bdbd in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fffffffda28,
__in_chrg=<optimized out>) at /usr/include/c++/11/bits/shared_ptr_base.h:705
#8 0x000055555556566c in std::__shared_ptr<void, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7fffffffda20,
__in_chrg=<optimized out>) at /usr/include/c++/11/bits/shared_ptr_base.h:1154
#9 0x000055555556cb72 in std::__shared_ptr<void, (__gnu_cxx::_Lock_policy)2>::reset (this=0x5555555d0338)
at /usr/include/c++/11/bits/shared_ptr_base.h:1272
#10 0x0000555555567bbe in azmq::detail::socket_service::shutdown_service (this=0x5555555d0310)
at /usr/local/include/azmq/detail/socket_service.hpp:206
#11 0x0000555555565171 in boost::asio::io_context::service::shutdown (this=0x5555555d0310)
at /usr/local/include/boost/asio/impl/io_context.ipp:148
#12 0x0000555555560637 in boost::asio::detail::service_registry::shutdown_services (this=0x5555555d0180)
at /usr/local/include/boost/asio/detail/impl/service_registry.ipp:44
#13 0x0000555555560b9b in boost::asio::execution_context::shutdown (this=0x7fffffffdb00)
at /usr/local/include/boost/asio/impl/execution_context.ipp:41
#14 0x00005555555650a4 in boost::asio::io_context::~io_context (this=0x7fffffffdb00, __in_chrg=<optimized out>)
at /usr/local/include/boost/asio/impl/io_context.ipp:58
#15 0x000055555555c1be in main () at azmq_shutdown_hang.cpp:35
Tested with boost 1.82.0, libzmq master, azmq master.