It seems that the io_context destructor will hang in its internal shutdown() call to zmq_ctx_term() if there are still pending azmq::socket operations/completion handlers and the azmq::socket object still exists. Presumably zmq_ctx_term() blocks because it waits for every socket created in the context to be closed, and the socket kept alive by the pending handler is never closed. This can happen if the program extends the lifetime of the azmq::socket object into the completion handler (via shared_ptr/shared_from_this(), etc.) and then exits io_context.run() by calling io_context.stop(), for example in reaction to receiving SIGINT/SIGTERM.
Is this normal/expected? One "obvious" solution is to call socket.cancel() to abort and destroy all the pending completion handlers and then also destroy all the azmq::socket objects, instead of using io_context.stop(). This comment in a cppzmq issue suggests that this is even necessary: zeromq/cppzmq#139 (comment)
However, other boost::asio objects do not seem to have such requirements (though I don't know whether that's intentional or just coincidence). Should there perhaps be some sort of auto-close mechanism to avoid blocking the io_context destructor?
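For reference, a minimal sketch of that cancel()-based shutdown might look like the following. The signal_set handler, the endpoint, and the shutdown ordering are illustrative assumptions on my part, not anything prescribed by azmq; only socket.cancel(), async_receive() and connect() are taken from the repro below.

// Build (mirroring the repro below): g++ -Wall -g azmq_cancel_shutdown.cpp -lzmq -lboost_filesystem -o azmq_cancel_shutdown
#include <azmq/socket.hpp>
#include <boost/asio/signal_set.hpp>
#include <array>
#include <memory>
#include <signal.h>
#include <stdio.h>

int main()
{
    boost::asio::io_context ioctx;
    auto socket = std::make_shared<azmq::socket>(ioctx, ZMQ_PULL);
    socket->set_option(azmq::socket::linger(0));
    socket->connect("tcp://127.0.0.1:5555"); // arbitrary endpoint, just to have a pending receive

    std::array<uint8_t, 1> buffer;
    socket->async_receive(boost::asio::buffer(buffer),
        [socket](boost::system::error_code const& ec, size_t)
        {
            // After cancel() this runs with ec == operation_aborted and then
            // drops its captured shared_ptr copy.
            printf("async_receive completion handler, ec = %s\n", ec.message().c_str());
        }
    );

    // Instead of io_context::stop(): on SIGINT/SIGTERM cancel the pending
    // operation and drop the last local reference, so the socket is actually
    // closed before the io_context is destroyed.
    boost::asio::signal_set signals(ioctx, SIGINT, SIGTERM);
    signals.async_wait(
        [&socket](boost::system::error_code const&, int)
        {
            socket->cancel();
            socket.reset();
        }
    );

    ioctx.run();  // returns once no work is left; no pending azmq handlers remain
    return 0;     // io_context destructor no longer blocks in zmq_ctx_term()
}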
Small example reproducing the hang:
// Build: g++ -Wall -g azmq_shutdown_hang.cpp -lzmq -lboost_filesystem -o azmq_shutdown_hang
#include <azmq/socket.hpp>
#include <zmq.hpp>
#include <array>
#include <memory>
#include <stdio.h>

int main()
{
    boost::asio::io_context ioctx;
    auto socket = std::make_shared<azmq::socket>(ioctx, ZMQ_PULL);
    socket->set_option(azmq::socket::linger(0));
    socket->connect("tcp://127.0.0.1:0");

    std::array<uint8_t, 1> buffer;
    // Capturing the shared_ptr<socket> into the completion handler lambda extends the socket's lifetime beyond that of the io_context.
    // Usually the socket would be destroyed first (if it or the shared_ptr is declared after the io_context), but in this case it is not.
    // The io_context destructor should destroy the pending operation and its completion handler (without calling it),
    // which would also finally destroy the socket, but apparently the io_context hangs instead.
    socket->async_receive(boost::asio::buffer(buffer),
        [socket](boost::system::error_code const& ec, size_t)
        {
            printf("async_receive completion handler, ec = %s\n", ec.message().c_str());
        }
    );

    // Calling cancel() removes the pending async operation, so the socket is destroyed before the io_context again; then it does not hang.
    //socket->cancel();

    printf("destroying io_context, does it hang?...\n");
    return 0;
}
Backtrace of the hang:
#0 0x00007ffff7b3fd7f in __GI___poll (fds=0x7fffffffd7a0, nfds=1, timeout=-1)
at ../sysdeps/unix/sysv/linux/poll.c:29
#1 0x00007ffff7f34dde in zmq::signaler_t::wait(int) const () from /usr/local/lib/libzmq.so.5
#2 0x00007ffff7f11d72 in zmq::mailbox_t::recv(zmq::command_t*, int) () from /usr/local/lib/libzmq.so.5
#3 0x00007ffff7f0321f in zmq::ctx_t::terminate() () from /usr/local/lib/libzmq.so.5
#4 0x00007ffff7f5575e in zmq_ctx_term () from /usr/local/lib/libzmq.so.5
#5 0x00005555555829a6 in std::_Sp_counted_deleter<void*, int (*)(void*), std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x5555555d0ae0) at /usr/include/c++/11/bits/shared_ptr_base.h:442
#6 0x000055555556e7d7 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5555555d0ae0)
at /usr/include/c++/11/bits/shared_ptr_base.h:168
#7 0x000055555556bdbd in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fffffffda28,
__in_chrg=<optimized out>) at /usr/include/c++/11/bits/shared_ptr_base.h:705
#8 0x000055555556566c in std::__shared_ptr<void, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7fffffffda20,
__in_chrg=<optimized out>) at /usr/include/c++/11/bits/shared_ptr_base.h:1154
#9 0x000055555556cb72 in std::__shared_ptr<void, (__gnu_cxx::_Lock_policy)2>::reset (this=0x5555555d0338)
at /usr/include/c++/11/bits/shared_ptr_base.h:1272
#10 0x0000555555567bbe in azmq::detail::socket_service::shutdown_service (this=0x5555555d0310)
at /usr/local/include/azmq/detail/socket_service.hpp:206
#11 0x0000555555565171 in boost::asio::io_context::service::shutdown (this=0x5555555d0310)
at /usr/local/include/boost/asio/impl/io_context.ipp:148
#12 0x0000555555560637 in boost::asio::detail::service_registry::shutdown_services (this=0x5555555d0180)
at /usr/local/include/boost/asio/detail/impl/service_registry.ipp:44
#13 0x0000555555560b9b in boost::asio::execution_context::shutdown (this=0x7fffffffdb00)
at /usr/local/include/boost/asio/impl/execution_context.ipp:41
#14 0x00005555555650a4 in boost::asio::io_context::~io_context (this=0x7fffffffdb00, __in_chrg=<optimized out>)
at /usr/local/include/boost/asio/impl/io_context.ipp:58
#15 0x000055555555c1be in main () at azmq_shutdown_hang.cpp:35
Tested with boost 1.82.0, libzmq master, azmq master.