ros2doctor.test.test_qos_compatibility.{test_check,test_report} are flaky #1088

@christophebedard

Description

Generated by Generative AI

No response

Operating System:

Ubuntu (amd64, arm64), RHEL

ROS version or commit hash:

Rolling

RMW implementation (if applicable):

rmw_fastrtps_cpp, rmw_connextdds (rmw_zenoh_cpp is excluded from these tests)

RMW Configuration (if applicable):

No response

Client library (if applicable):

rclpy, since these are ros2cli tests

'ros2 doctor --report' output

No response

Steps to reproduce issue

ros2doctor.ros2doctor.test.test_qos_compatibility.TestROS2DoctorQoSCompatibility.test_check seems to be failing pretty consistently. See these CI jobs for #1045:

  1. https://ci.ros2.org/job/ci_linux/24559/testReport/junit/ros2doctor.ros2doctor.test/test_qos_compatibility/test_qos_compatibility/
  2. https://ci.ros2.org/job/ci_linux-aarch64/18607/testReport/junit/ros2doctor.ros2doctor.test/test_qos_compatibility/test_qos_compatibility/
  3. https://ci.ros2.org/job/ci_linux-rhel/4016/testReport/junit/ros2doctor.ros2doctor.test/test_qos_compatibility/test_qos_compatibility/
  4. https://build.ros2.org/job/Rpr__ros2cli__ubuntu_noble_amd64/268/testReport/junit/ros2doctor.ros2doctor.test/test_qos_compatibility/test_qos_compatibility/

The test expects ros2 doctor to report failed checks due to incompatible QoS between the talker/listener nodes, but for some reason the check doesn't fail as expected:

print('Failed modules:', *fail_category)

assert 'Failed modules' in lines_list[-1]
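
To make the failing assertion concrete, here is a minimal sketch of how the last line of the output differs between the two outcomes. The `summarize` helper and its arguments are hypothetical and only mimic the output strings quoted in this issue; this is not the actual ros2doctor implementation.

```python
def summarize(failed_modules, total_checks):
    """Return the summary lines for a doctor-style run (hypothetical helper)."""
    if failed_modules:
        return [
            f'{len(failed_modules)}/{total_checks} check(s) failed',
            '',
            # Same shape as print('Failed modules:', *fail_category)
            'Failed modules: ' + ' '.join(failed_modules),
        ]
    return [f'All {total_checks} checks passed']


# The test inspects only the last line, like lines_list[-1] in test_check.
with_incompatibility = summarize(['middleware'], 5)
without_incompatibility = summarize([], 5)
print(with_incompatibility[-1])
print(without_incompatibility[-1])
```

If the nodes with incompatible QoS are not running when ros2 doctor runs, the second case applies and `'Failed modules' in lines_list[-1]` is false.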

Expected behavior

Test passes.

This is basically what it does:

  1. generate_test_description() launches some nodes
    1. Start talker node with best effort reliability QoS
      $ python3 src/ros2/ros2cli/ros2doctor/test/fixtures/talker_node_with_best_effort_qos.py
      [WARN] [1754691758.054030373] [talker_node]: New subscription discovered on topic 'chatter', requesting incompatible QoS. No messages will be sent to it. Last incompatible policy: RELIABILITY
    2. Start listener node with reliable QoS
      $ python3 /home/christophe.bedard/ros2_ws/src/ros2/ros2cli/ros2doctor/test/fixtures/listener_node_with_reliable_qos.py
      [WARN] [1754691758.054028212] [listener]: New publisher discovered on topic 'chatter', offering incompatible QoS. No messages will be received from it. Last incompatible policy: RELIABILITY
  2. test_check runs ros2 doctor, which reports the QoS incompatibility, since a best effort publisher and a reliable subscription are not compatible: https://docs.ros.org/en/rolling/Concepts/Intermediate/About-Quality-of-Service-Settings.html#qos-compatibilities
    $ ros2 doctor
    ...
    /home/christophe.bedard/ros2_ws/build/ros2doctor/ros2doctor/api/qos_compatibility.py: 51: UserWarning: ERROR: QoS compatibility error found on topic '/incompatible_chatter': Best effort publisher and reliable subscription;
    
    1/5 check(s) failed
    
    Failed modules: middleware
  3. test_report runs ros2 doctor -r (or ros2 doctor --report), which prints the full report showing the incompatible topics
    $ ros2 doctor -r
    ...
       QOS COMPATIBILITY LIST
    topic [type]            : /incompatible_chatter [std_msgs/msg/String]
    publisher node          : talker_node
    subscriber node         : listener
    compatibility status    : ERROR: Best effort publisher and reliable subscription;
    ...
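
The reliability rule that the steps above rely on can be sketched as follows. This is a simplified, self-contained model of the compatibility table in the linked QoS documentation, covering only the reliability policy; the `Reliability` enum and `reliability_compatible` function are illustrative names, not rclpy or ros2doctor APIs.

```python
from enum import Enum


class Reliability(Enum):
    BEST_EFFORT = 'best effort'
    RELIABLE = 'reliable'


def reliability_compatible(publisher, subscription):
    """Return True if the publisher's offer satisfies the subscription's request."""
    # A subscription that requests RELIABLE cannot be matched by a publisher
    # that only offers BEST_EFFORT; every other combination is compatible.
    return not (
        publisher is Reliability.BEST_EFFORT
        and subscription is Reliability.RELIABLE
    )


for pub in Reliability:
    for sub in Reliability:
        status = 'OK' if reliability_compatible(pub, sub) else 'ERROR'
        print(f'{pub.value} publisher, {sub.value} subscription: {status}')
```

The only failing combination is the one the test fixtures set up on /incompatible_chatter: a best effort publisher with a reliable subscription.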

Actual behavior

Test fails.

Additional information

Looking at the full test output, it looks like the pub/sub nodes with incompatible QoS settings often crash, e.g.: https://build.ros2.org/job/Rpr__ros2cli__ubuntu_noble_amd64/268/consoleText

[INFO] [launch]: All log files can be found below /home/buildfarm/.ros/log/2025-08-07-20-22-03-354398-7a761ca0605f-5887
[INFO] [launch]: Default logging verbosity is set to INFO
[INFO] [daemon-stop-12]: process started with pid [6165]
[INFO] [daemon-stop-12]: process has finished cleanly [pid 6165]
[INFO] [daemon-start-13]: process started with pid [6168]
[INFO] [daemon-start-13]: process has finished cleanly [pid 6168]
[INFO] [python3-14]: process started with pid [6185]
[INFO] [python3-15]: process started with pid [6186]
[INFO] [python3-16]: process started with pid [6187]
[INFO] [python3-17]: process started with pid [6188]
[INFO] [python3-14]: sending signal 'SIGINT' to process[python3-14]
[INFO] [python3-15]: sending signal 'SIGINT' to process[python3-15]
[INFO] [python3-16]: sending signal 'SIGINT' to process[python3-16]
[INFO] [python3-17]: sending signal 'SIGINT' to process[python3-17]
[python3-14] Traceback (most recent call last):
[python3-14]   File "/tmp/ws/src/ros2cli/ros2doctor/test/fixtures/talker_node_with_best_effort_qos.py", line 15, in <module>
[python3-14]     import rclpy
[python3-14]   File "/opt/ros/rolling/lib/python3.12/site-packages/rclpy/__init__.py", line 44, in <module>
[python3-14]     from typing import Any
[python3-14]   File "/usr/lib/python3.12/typing.py", line 23, in <module>
[python3-15] Traceback (most recent call last):
[python3-15]   File "/tmp/ws/src/ros2cli/ros2doctor/test/fixtures/listener_node_with_reliable_qos.py", line 15, in <module>
[python3-15]     import rclpy
[python3-15]   File "/opt/ros/rolling/lib/python3.12/site-packages/rclpy/__init__.py", line 44, in <module>
[python3-15]     from typing import Any
[python3-15]   File "/usr/lib/python3.12/typing.py", line 23, in <module>
[python3-14]     import collections
[python3-14]   File "/usr/lib/python3.12/collections/__init__.py", line 36, in <module>
[python3-15]     import collections
[python3-15]   File "/usr/lib/python3.12/collections/__init__.py", line 38, in <module>
[python3-17] Traceback (most recent call last):
[python3-17]   File "/tmp/ws/src/ros2cli/ros2doctor/test/fixtures/listener_node_with_reliable_qos.py", line 15, in <module>
[python3-17]     import rclpy
[python3-17]   File "/opt/ros/rolling/lib/python3.12/site-packages/rclpy/__init__.py", line 44, in <module>
[python3-17]     from typing import Any
[python3-17]   File "/usr/lib/python3.12/typing.py", line 30, in <module>
[python3-17]     import re as stdlib_re  # Avoid confusion with the re we export.
[python3-17]     ^^^^^^^^^^^^^^^^^^^^^^
[python3-17]   File "/usr/lib/python3.12/re/__init__.py", line 125, in <module>
[python3-15]     from reprlib import recursive_repr as _recursive_repr
[python3-15]   File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
[python3-15]   File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
[python3-15]   File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
[python3-15]   File "<frozen importlib._bootstrap_external>", line 991, in exec_module
[python3-15]   File "<frozen importlib._bootstrap_external>", line 1124, in get_code
[python3-15]   File "<frozen importlib._bootstrap_external>", line 753, in _compile_bytecode
[python3-15] KeyboardInterrupt
[python3-16] Traceback (most recent call last):
[python3-16]   File "/tmp/ws/src/ros2cli/ros2doctor/test/fixtures/talker_node_with_reliable_qos.py", line 15, in <module>
[python3-16]     import rclpy
[python3-16]   File "/opt/ros/rolling/lib/python3.12/site-packages/rclpy/__init__.py", line 44, in <module>
[python3-16]     from typing import Any
[python3-16]   File "/usr/lib/python3.12/typing.py", line 1905, in <module>
[python3-14]     from operator import eq as _eq
[python3-14]   File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
[python3-14]   File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
[python3-14]   File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
[python3-14]   File "<frozen importlib._bootstrap_external>", line 991, in exec_module
[python3-14]   File "<frozen importlib._bootstrap_external>", line 1124, in get_code
[python3-14]   File "<frozen importlib._bootstrap_external>", line 753, in _compile_bytecode
[python3-14] KeyboardInterrupt
[ERROR] [python3-15]: process has died [pid 6186, exit code -2, cmd '/usr/bin/python3 /tmp/ws/src/ros2cli/ros2doctor/test/fixtures/listener_node_with_reliable_qos.py --ros-args -r chatter:=incompatible_chatter'].
[ERROR] [python3-14]: process has died [pid 6185, exit code -2, cmd '/usr/bin/python3 /tmp/ws/src/ros2cli/ros2doctor/test/fixtures/talker_node_with_best_effort_qos.py --ros-args -r chatter:=incompatible_chatter'].
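
When going through several console logs, a small script can pull out the "process has died" lines. The sketch below assumes the line format shown in the log above and assumes that a negative exit code is the negated signal number (the convention used by Python's subprocess module); the function names are made up for illustration.

```python
import re
import signal

# Matches lines like:
# [ERROR] [python3-15]: process has died [pid 6186, exit code -2, cmd '...'].
DIED_RE = re.compile(
    r"\[ERROR\] \[(?P<name>[^\]]+)\]: process has died "
    r"\[pid (?P<pid>\d+), exit code (?P<code>-?\d+), cmd '(?P<cmd>.*)'\]"
)

SAMPLE_LOG = """\
[INFO] [python3-14]: sending signal 'SIGINT' to process[python3-14]
[ERROR] [python3-15]: process has died [pid 6186, exit code -2, cmd '/usr/bin/python3 /tmp/ws/src/ros2cli/ros2doctor/test/fixtures/listener_node_with_reliable_qos.py --ros-args -r chatter:=incompatible_chatter'].
[ERROR] [python3-14]: process has died [pid 6185, exit code -2, cmd '/usr/bin/python3 /tmp/ws/src/ros2cli/ros2doctor/test/fixtures/talker_node_with_best_effort_qos.py --ros-args -r chatter:=incompatible_chatter'].
"""


def describe_exit(code):
    """Translate an exit code into a signal name when it is negative."""
    if code < 0:
        try:
            return signal.Signals(-code).name
        except ValueError:
            return f'signal {-code}'
    return f'exit status {code}'


def find_dead_processes(log_text):
    """Return (process name, pid, exit description) for each died process."""
    results = []
    for line in log_text.splitlines():
        match = DIED_RE.search(line)
        if match:
            code = int(match.group('code'))
            results.append(
                (match.group('name'), int(match.group('pid')), describe_exit(code))
            )
    return results


for name, pid, reason in find_dead_processes(SAMPLE_LOG):
    print(f'{name} (pid {pid}) died: {reason}')
```

Under that assumption, exit code -2 corresponds to SIGINT, which is consistent with the KeyboardInterrupt tracebacks in the node output.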

Then no pub/sub pairs are found by ros2 doctor -r, so it fails to report the QoS incompatibility and the test fails:

[ros2doctor-cli-29]    QOS COMPATIBILITY LIST
[ros2doctor-cli-29] compatibility status    : No publisher/subscriber pairs found
test_check[rmw_fastrtps_cpp] (test_qos_compatibility.TestROS2DoctorQoSCompatibility.test_check[rmw_fastrtps_cpp]) ... FAIL
test_report[rmw_fastrtps_cpp] (test_qos_compatibility.TestROS2DoctorQoSCompatibility.test_report[rmw_fastrtps_cpp]) ... FAIL

======================================================================
FAIL: test_check[rmw_fastrtps_cpp] (test_qos_compatibility.TestROS2DoctorQoSCompatibility.test_check[rmw_fastrtps_cpp])
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/ros/rolling/lib/python3.12/site-packages/launch_testing/markers.py", line 61, in _wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ws/src/ros2cli/ros2doctor/test/test_qos_compatibility.py", line 157, in test_check
    assert 'Failed modules' in lines_list[-1]
AssertionError: assert 'Failed modules' in 'All 5 checks passed'

======================================================================
FAIL: test_report[rmw_fastrtps_cpp] (test_qos_compatibility.TestROS2DoctorQoSCompatibility.test_report[rmw_fastrtps_cpp])
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/ros/rolling/lib/python3.12/site-packages/launch_testing/markers.py", line 61, in _wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ws/src/ros2cli/ros2doctor/test/test_qos_compatibility.py", line 168, in test_report
    assert ('topic [type]            : /compatible_chatter [std_msgs/msg/String]\n'
AssertionError: assert 'topic [type]            : /compatible_chatter [std_msgs/msg/String]\npublisher node          : talker_node\nsubscriber node         : listener\ncompatibility status    : OK' in '\n   ACTION LIST\naction                 : none\naction server count    : 0\naction client count    : 0\n\n   ROS ENV...ntain sensitive or private data.\n================================================================================\n\n'
 +  where '\n   ACTION LIST\naction                 : none\naction server count    : 0\naction client count    : 0\n\n   ROS ENV...ntain sensitive or private data.\n================================================================================\n\n' = <launch_testing.tools.process.ProcessProxy object at 0x75485cc3d7f0>.output

----------------------------------------------------------------------
Ran 2 tests in 36.416s

FAILED (failures=2)

Some of the buildfarmer logs attribute ros2doctor.test.test_qos_compatibility.test_qos_compatibility test failures to ros2/rmw_cyclonedds#535, but I'm not 100% sure it's related, because this is not happening with rmw_cyclonedds_cpp.

We can see KeyboardInterrupt in the node output above, raised while the nodes are still in import rclpy. Are the nodes just taking too long to launch and getting killed with SIGINT? By default, Python handles SIGINT by raising KeyboardInterrupt, which would explain the tracebacks.
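
The default SIGINT behaviour can be reproduced outside of launch with a short POSIX-only sketch. The child script here is a stand-in that just sleeps; it is not one of the test fixtures, and `interrupt_child` is a made-up helper name.

```python
import signal
import subprocess
import sys

# Stand-in for a node that is still busy when the signal arrives.
CHILD_SOURCE = """
import time
print('ready', flush=True)
time.sleep(30)
"""


def reset_sigint():
    # Make sure the child starts with the default SIGINT disposition, even if
    # this script itself was started with SIGINT ignored.
    signal.signal(signal.SIGINT, signal.SIG_DFL)


def interrupt_child():
    """Start the child, send it SIGINT, and return (returncode, stderr)."""
    proc = subprocess.Popen(
        [sys.executable, '-c', CHILD_SOURCE],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        preexec_fn=reset_sigint,
    )
    # Wait until the interpreter is running user code, so that Python's own
    # SIGINT handler is installed before the signal is sent.
    proc.stdout.readline()
    proc.send_signal(signal.SIGINT)
    _, stderr = proc.communicate(timeout=10)
    return proc.returncode, stderr


if __name__ == '__main__':
    returncode, stderr = interrupt_child()
    print('return code:', returncode)
    print(stderr)
```

The child's stderr ends with a KeyboardInterrupt traceback raised from wherever the interpreter happened to be, similar to the tracebacks above that are cut off in the middle of imports.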

Metadata

Labels

bug (Something isn't working)
