Asynchronous channel communicator construction #6881

@AndreyKulikov2022

Expected Behavior

The code runs to completion: hpx::collectives::create_channel_communicator constructs the channel communicator on each locality.

#include <hpx/hpx_main.hpp> // IWYU pragma: keep

#include <cstddef>  // for size_t, ptrdiff_t
#include <cstdint>  // for uint32_t
#include <iostream> // for cout
#include <vector>   // for vector

#include <hpx/actions/base_action.hpp>              // for HPX_REGISTER_ACTION_2
#include <hpx/actions_base/plain_action.hpp>        // for HPX_PLAIN_ACTION
#include <hpx/async_distributed/async.hpp>          // for async
#include <hpx/collectives/argument_types.hpp>       // for that_site_arg, num_sites_arg
#include <hpx/collectives/channel_communicator.hpp> // for channel_communicator, get, set
#include <hpx/hpx_finalize.hpp>            // for terminate
#include <hpx/iostream.hpp>                // for cout, ostream
#include <hpx/naming_base/id_type.hpp>     // for id_type
#include <hpx/runtime.hpp>                 // for find_all_localities, get_os_thread_count
#include <hpx/serialization/serialize.hpp> // for operator>>, operator<<

void hpx_channel_communicator_test() {
    using hpx::collectives::num_sites_arg, hpx::collectives::that_site_arg;
    constexpr auto channel_communicator_name = "test/";
    const std::uint32_t n_localities = hpx::get_initial_num_localities();
    auto comm = hpx::collectives::create_channel_communicator(hpx::launch::sync,
                    channel_communicator_name, num_sites_arg(n_localities));
}

HPX_PLAIN_ACTION(hpx_channel_communicator_test, HPXChannelCommunicatorTestAction)

int main() {
    const std::vector<hpx::id_type> localities = hpx::find_all_localities();
    std::cout << localities.size() << " localities are used.\n";
    std::vector<hpx::future<void>> futures;
    futures.reserve(localities.size());
    for (const auto& locality : localities) {
        futures.push_back(hpx::async<HPXChannelCommunicatorTestAction>(locality));
    }
    hpx::wait_all(futures);
    std::cout << "Done\n";
    return 0;
}

Actual Behavior

The program often hangs. The issue appeared after updating from HPX 1.10 to 1.11.

Steps to Reproduce the Problem

Run ./my_app

Sometimes multiple runs are required to catch the issue.
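Because the hang is intermittent, it can help to automate the repeated runs. The following is a minimal sketch of such a harness, assuming a POSIX system. It is not part of HPX: `run_with_timeout` and `count_hangs` are hypothetical helpers, and treating exit code 127 as "could not start" is a convention of this sketch.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <chrono>
#include <string>
#include <thread>
#include <vector>

// Outcome of a single run of the command.
enum class RunOutcome { finished, hung, not_started };

// Hypothetical helper: runs `command` (program name plus arguments) and
// reports whether it exited before `limit` elapsed. A run that is still
// alive at the deadline is killed with its process group and reported as hung.
RunOutcome run_with_timeout(const std::vector<std::string>& command,
                            std::chrono::milliseconds limit) {
    if (command.empty()) return RunOutcome::not_started;

    // Build a null-terminated argv array before forking.
    std::vector<char*> argv;
    for (const auto& arg : command) argv.push_back(const_cast<char*>(arg.c_str()));
    argv.push_back(nullptr);

    const pid_t pid = fork();
    if (pid < 0) return RunOutcome::not_started;
    if (pid == 0) {
        setpgid(0, 0);  // own process group, so a launcher's children die too
        execvp(argv[0], argv.data());
        _exit(127);  // only reached if exec failed
    }
    setpgid(pid, pid);  // same call from the parent closes the startup race

    const auto deadline = std::chrono::steady_clock::now() + limit;
    int status = 0;
    while (std::chrono::steady_clock::now() < deadline) {
        const pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid) {
            // Assumption of this sketch: exit code 127 means exec failed.
            const bool exec_failed = WIFEXITED(status) && WEXITSTATUS(status) == 127;
            return exec_failed ? RunOutcome::not_started : RunOutcome::finished;
        }
        if (r < 0) return RunOutcome::not_started;
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
    }
    kill(-pid, SIGKILL);       // negative pid addresses the whole process group
    waitpid(pid, &status, 0);  // reap the child to avoid a zombie
    return RunOutcome::hung;
}

// Runs the command `runs` times and returns how many of those runs hung.
int count_hangs(const std::vector<std::string>& command, int runs,
                std::chrono::milliseconds limit) {
    int hangs = 0;
    for (int i = 0; i < runs; ++i) {
        if (run_with_timeout(command, limit) == RunOutcome::hung) ++hangs;
    }
    return hangs;
}
```

For example, `count_hangs({"./my_app"}, 20, std::chrono::seconds(30))` would run the reproducer 20 times and count the runs that did not finish within 30 seconds. Both numbers are arbitrary choices; if the application is started through a launcher, pass that full command line instead.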

Observation: the hang does not seem to occur when, by luck, the communicator is constructed on the localities in the order Loc0, Loc1, Loc2.

Changing

auto comm = hpx::collectives::create_channel_communicator(hpx::launch::sync,
                    channel_communicator_name, num_sites_arg(n_localities));

to

auto comm = hpx::collectives::create_channel_communicator(channel_communicator_name, num_sites_arg(n_localities)).get();

resolves the hang, but if this is the intended usage, it is not clear from the documentation.

Specifications

  • HPX Version: 1.11.0
  • Platform (compiler, OS, Architecture): g++13.2.0, Ubuntu 24.04.1 LTS, x86_64
  • Platform 2: macOS 15.6.1, clang-1700.0.13.5, arm64-apple-darwin24.6.0
