diff --git a/docs/closed_set.md b/docs/closed_set.md index 03bc3b4..5a117cd 100644 --- a/docs/closed_set.md +++ b/docs/closed_set.md @@ -16,7 +16,7 @@ In some cases, a more minimal installation is desirable (e.g., containers). The 1. Add the CUDA repositories [here](https://developer.nvidia.com/cuda-downloads) by installing the `deb (network)` package or -```bash +```shell # make sure you pick the correct ubuntu version! wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb sudo dpkg -i cuda-keyring_1.1-1_all.deb @@ -34,7 +34,7 @@ libnvinfer-dev/unknown,now 10.4.0.26-1+cuda12.6 amd64 3. Install TensorRT and CUDA if necessary: -```bash +```shell # use the corresponding version number from the previous step or omit nvcc if already installed sudo apt install libnvinfer-dev libnvonnxparsers-dev libnvinfer-plugin-dev cuda-nvcc-12-6 ``` @@ -42,13 +42,13 @@ sudo apt install libnvinfer-dev libnvonnxparsers-dev libnvinfer-plugin-dev cuda- ### Building Once the necessary dependencies are installed and this repository has been placed in a workspace, run the following: -``` -catkin build +```shell +colcon build ``` -You can run the following to make sure everything is working: -``` -catkin test semantic_inference +You can run the following to validate that `semantic_inference` built correctly: +```shell +colcon test --packages-select semantic_inference ``` ### Models @@ -61,7 +61,7 @@ By default, the code uses [this](https://drive.google.com/file/d/1XRcsyLSvqqhqNI > We recommend using models within the [dense2d](https://drive.google.com/drive/folders/17p_ZZIxI9jI_3GjjtbMijC2WFnc9Bz-a?usp=sharing) folder, which are named corresponding to the labelspace they output to. > The top-level models are deprecated as they do not follow this naming scheme (they all output to the ade20k label space). 
-By default, the closed set node looks under the directory `$HOME/.semantic_inference$` for models (this works on Linux or as long as `HOME` is set). +By default, the closed set node looks under the directory `$HOME/.semantic_inference` for models (this works on Linux or as long as `HOME` is set). It is possible to change this directory by specifying the `SEMANTIC_INFERENCE_MODEL_DIR` environment variable. To use a specific downloaded model, use the argument `model_file:=MODEL_FILE` when running the appropriate launch file (where `MODEL_NAME` is the filename of the model relative to the configured model directory). Specifying an absolute filepath will override the default model directory. @@ -73,37 +73,36 @@ See [here](exporting.md) for details on exporting a new model. ### Python utilities You may find it useful to set up some of the included model utilities. From the top-level of this repository, run: -``` -python3 -m virtualenv /path/to/new/environment -source /path/to/new/environment/bin/activate +```shell +python3 -m virtualenv +source /bin/activate pip install ./semantic_inference ``` ## Using closed-set segmentation online -To use the open-set segmentation as part of a larger system, include [semantic_inference.launch](../semantic_inference_ros/launch/semantic_inference.launch) in your launch file. Often this will look like this: -```xml - - - - - - - - +To use the closed-set segmentation as part of a larger system, include [closed_set.launch.yaml](../semantic_inference_ros/launch/closed_set.launch.yaml) in your launch file. Often this will look like this: +```yaml +launch: + # ... rest of launch file ... + - set_remap: {from: "color/image_raw", to: "YOUR_INPUT_TOPIC_HERE"} + - include: {file: "$(find-pkg-share semantic_inference_ros)/launch/closed_set.launch.yaml"} ``` +> **Note**
+> You'll probably also want to namespace the included launch file and corresponding remap via a `group` tag and `push_ros_namespace` with the camera name as the namespace. + ## Adding New Datasets To adapt to a new dataset (or to make a new grouping of labels), you will have to: - - Make a new grouping config (see [this](config/label_groupings/ade150_outdoor.yaml) or [this](config/label_groupings/ade150_indoor.yaml) for examples) + - Make a new grouping config (see [this](../semantic_inference_ros/config/label_groupings/ade20k_outdoor.yaml) or [this](../semantic_inference_ros/config/label_groupings/ade20k_indoor.yaml) for examples) - Pass in the appropriate arguments to the launch file (`labelspace_name`) -You can view the groupings for a particular labelspace by running `semantic_inference labelspace compare`. +You can view the groupings for a particular labelspace by running `semantic-inference labelspace compare`. For a grouping of the ade20k labelspace: -```bash -source ~/path/to/environment/bin/activate -cd path/to/semantic_inference -semantic_inference labelspace compare resources/ade20k_categories.csv config/label_groupings/ade150_indoor.yaml +```shell +source /bin/activate +cd +semantic-inference labelspace compare semantic_inference/resources/ade20k_categories.csv semantic_inference_ros/config/label_groupings/ade20k_indoor.yaml ``` diff --git a/docs/exporting.md b/docs/exporting.md index e5fce10..2c8e72a 100644 --- a/docs/exporting.md +++ b/docs/exporting.md @@ -1,34 +1,15 @@ # Exporting Instructions -When exporting a new model, you will also need to write a config similar to [this](../semantic_inference/config/models/ade20k-efficientvit_seg_l2.yaml). +When exporting a new model, you will also need to write a config similar to [this](../semantic_inference_ros/config/models/ade20k-efficientvit_seg_l2.yaml). 
## ADE20k baseline models You can export all models (that are compatible) from the MIT scene parsing challenge via [this script](../exporting/export_mit_semseg.py). At the moment, only `hrnetv2-c1` and `mobilenetv2dilated-c1_deepsup` are compatible. This may change as the newer onnx export method in torch becomes stable (or not, it is unclear whether or not the custom batch norm operator will ever work with the export). -To run the script, you will want to create a separate virtual environment and install dependencies lists in the [pyproject file](../semantic_inference/setup.cfg). +To run the script, you will want to create a separate virtual environment and install the dependencies listed in the [pyproject file](../semantic_inference/pyproject.toml). ## EfficientViT Make a separate virtual environment (with `python3.10`) and just install efficientvit to the environment (pip installing from the git repo url worked well). Run [this script](../exporting/export_efficientvit.py). You may need other dependencies (`click`, `matplotlib`, etc.). - -## OneFormer - -Requires pytorch 1.13.1 (i.e., last release before 2.0) and CUDA 11.7. Do NOT use conda - -For CUDA: -``` -sudo apt install cuda-libraries-11-7 cuda-libraries-dev-11-7 cuda-nvrtc-dev-11-7 cuda-nvcc-11-7 -sudo update-alternatives --config cuda -``` - -Other tweaks: -- Install detectron2 directly from the repo -- Remove pinned versions from the requirements file (as well as natten) -- Install the corresponding version of natten from the SHI-labs website - -In general, the export does not seem to work. -Newer versions of pytorch lead to significantly decreased model peformance and there are too many problems with how inputs and outputs are passed around. -See [this script](../exporting/export_oneformer.py) for the closest attempt. 
diff --git a/docs/open_set.md b/docs/open_set.md index 945f4da..02a83df 100644 --- a/docs/open_set.md +++ b/docs/open_set.md @@ -2,33 +2,30 @@ ## Setting Up -The open-set segmentation interface works with and without ROS. For working with ROS, we assume you have already built your catkin workspace with this repository in it beforehand (i.e., by running `catkin build`). +The open-set segmentation interface works with and without ROS. For working with ROS, we assume you have already built your workspace with this repository in it beforehand (i.e., by running `colcon build`). > **Note
** > If you intend only to use the open-set segmentation interface, you may want to turn off building against TensorRT, which you can do by the following: -> ``` -> catkin config -a -DSEMANTIC_INFERENCE_USE_TRT=OFF +> ```shell +> colcon build --cmake-args --no-warn-unused-cli -DSEMANTIC_INFERENCE_USE_TRT=OFF > ``` ### Installing We assume you are using a virtual environment. You may want to install `virtualenv` (usually `sudo apt install python3-virtualenv`) if you haven't already. To set up a virtual environment for use with ROS: -``` -python3 -m virtualenv -p /usr/bin/python3 --system-site-packages /desired/path/to/environment +```shell +python3 -m virtualenv -p /usr/bin/python3 --system-site-packages ``` Otherwise, omit the ``--system-site-packages`` option: -``` -python3 -m virtualenv -p /usr/bin/python3 --download /desired/path/to/environment +```shell +python3 -m virtualenv -p /usr/bin/python3 --download ``` -> :warning: **Warning**
-> Note that newer versions of `setuptools` are incompatible with `--system-site-packages` on Ubuntu 20.04. Do not use `--download` and `--system-site-packages` and expect the installation to work (specifically with external packages specified by git url). - Then, install `semantic_inference` -```bash -cd /path/to/repo -source /path/to/environment +```shell +cd +source /bin/activate pip install ./semantic_inference[openset] # note that the openset extra is required for open-set semantic segmentation ``` You may see dependency version errors from pip if installing into an environment created with `--system-site-packages`. This is expected. @@ -40,33 +37,29 @@ It is also possible to install via an editable install (i.e., by using `-e` when Note that both CLIP and FastSAM automatically download the relevant model weights when they are first run. Running with the original SAM may require downloading the model weights. See the official SAM repository [here](https://github.com/facebookresearch/segment-anything) for more details. -## Using open-set segmentation online - -To use the open-set segmentation as part of a larger system, include [openset_segmentation.launch](../semantic_inference_ros/launch/openset_segmentation.launch) in your launch file. Often this will look like this: -```xml - +## Trying out open-set segmentation nodes - +Similar to the example [here](../README.md#usage), you can run any of the open-set launch files: - - - - +```shell +source /bin/activate +## this example just produces an embedding vector per image +# ros2 launch semantic_inference_ros image_embedding_node.launch.yaml +ros2 launch semantic_inference_ros open_set.launch.yaml +``` +and then run +```shell +ros2 bag play PATH_TO_BAG --remap INPUT_TOPIC:=/color/image_raw ``` -Note there are some arguments you may want to specify when including `openset_segmentation.launch` that are not shown here, specifically the configuration for the model to use. 
-## Pre-generating semantics +You should see a single embedding vector published under `/semantic/feature` and, if running the full open-set segmenter, the segmentation results under `/semantic/image_raw` along with visualizations of the results under `/semantic_color/image_raw` and `/semantic_overlay/image_raw`. -It is also possible to pre-generate semantics when working with recorded data. -To create a rosbag containing the original bag contents *plus* the resulting open-set segmentation, run the following -``` -rosrun semantic_inference_ros make_rosbag --copy /path/to/input_bag \ - /color_topic:/output_topic \ - -o /path/to/desired/output_bag -``` -replacing `/color_topic` and `/output_topic` with appropriate topic names (usually `/camera_name/color/image_raw` and `/camera_name/semantic/image_raw`). +## Using open-set segmentation online -Additional options exist. -Running without `--copy` will output just the open-set segmentation at the path specified by `-o`. -If no output path is specified, the semantics will be added in-place to the bag after a confirmation prompt (you can disable the prompt with `-y`). -Additional information and documentation is available via `--help`. +To use the open-set segmentation as part of a larger system, include [open_set.launch.yaml](../semantic_inference_ros/launch/open_set.launch.yaml) in your launch file. Often this will look like this: +```yaml +launch: + # ... rest of launch file ... 
+ - set_remap: {from: "color/image_raw", to: "YOUR_INPUT_TOPIC_HERE"} + - include: {file: "$(find-pkg-share semantic_inference_ros)/launch/open_set.launch.yaml"} +``` diff --git a/semantic_inference/pyproject.toml b/semantic_inference/pyproject.toml index b3149dd..29bf976 100644 --- a/semantic_inference/pyproject.toml +++ b/semantic_inference/pyproject.toml @@ -22,12 +22,14 @@ dependencies = [ "imageio", "onnx", "pandas", + "rich", "rosbags", "ruamel.yaml", "seaborn", "torch", "torchvision", "spark_config@git+https://github.com/MIT-SPARK/Spark-Config.git", + "numpy<2", ] [tool.setuptools.packages.find] diff --git a/semantic_inference/python/semantic_inference/commands/labelspace.py b/semantic_inference/python/semantic_inference/commands/labelspace.py index 95668e2..8f76775 100644 --- a/semantic_inference/python/semantic_inference/commands/labelspace.py +++ b/semantic_inference/python/semantic_inference/commands/labelspace.py @@ -33,6 +33,8 @@ import pathlib import click +import rich.console +import rich.table from ruamel.yaml import YAML yaml = YAML(typ="safe", pure=True) @@ -58,44 +60,34 @@ def _load_catmap(filepath, cat_index=0, name_index=-1): def _load_groups(filename): with filename.expanduser().absolute().open("r") as fin: - contents = yaml.load(fin.read(), Loader=yaml.SafeLoader) + contents = yaml.load(fin.read()) - names = {index + 1: group["name"] for index, group in enumerate(contents["groups"])} - return names + info = {} + for index, group in enumerate(contents["groups"]): + for label in group["labels"]: + info[label] = (index, group["name"]) + return info -def _show_labels(groups, catmap): - print("{:<39}||{:<39}".format("orig: label → name", "grouping: label → name")) - print("-" * 80) - N_max = max(max(groups), max(catmap)) - for i in range(N_max): - orig_str = f"{i} → {catmap[i]}" if i in catmap else "n/a" - new_str = f"{i} → {groups[i]}" if i in groups else "n/a" - print(f"{orig_str:<39}||{new_str:<39}") +def _make_label_table(groups, catmap): + 
table = rich.table.Table(title="Label Mapping") + table.add_column("Original Label") + table.add_column("Original Name(s)", max_width=30) + table.add_column("New Label") + table.add_column("New Name") -def _get_match_str(name, index, matches): - if index in matches: - match_str = f" - {name}: {index} → {matches[index]}" - return f"{match_str:<39}" + for i in range(max(catmap)): + label = f"{i}" + orig_names = ", ".join(catmap.get(i, "--").split(";")) - match_str = f" - {name}: {index} → ?" - match_str = f"{match_str:<39}" - return click.style(match_str, fg="red") + group_info = groups.get(i) + if group_info is None: + table.add_row(label, orig_names, "--", "--") + else: + table.add_row(label, orig_names, str(group_info[0]), group_info[1]) - -def _show_matches(groups, catmap, group_matches, cat_matches): - print( - "{:<39}||{:<39}".format( - "orig name: label → match", "grouping name: label → match" - ) - ) - print("-" * 80) - N_max = max(max(groups), max(catmap)) - for i in range(N_max): - orig_str = _get_match_str(catmap[i], i, cat_matches) if i in catmap else "n/a" - new_str = _get_match_str(groups[i], i, group_matches) if i in groups else "n/a" - print(f"{orig_str}||{new_str}") + return table @click.group(name="labelspace") @@ -104,7 +96,7 @@ def cli(): pass -@click.command() +@cli.command() @click.argument("labelspace", type=click.Path(exists=True)) @click.argument("grouping", type=click.Path(exists=True)) @click.option("-n", "--name-index", default=-1, type=int, help="index for name column") @@ -117,20 +109,5 @@ def compare(labelspace, grouping, name_index, label_index): groups = _load_groups(grouping_config_path) catmap = _load_catmap(labelspace_path, cat_index=label_index, name_index=name_index) - group_matches = {} - for index, group in groups.items(): - for label, name in catmap.items(): - if group == name: - group_matches[index] = label - break - - cat_matches = {} - for index, group in catmap.items(): - for label, name in groups.items(): - if group == 
name: - cat_matches[index] = label - break - - _show_labels(groups, catmap) - print("") - _show_matches(groups, catmap, group_matches, cat_matches) + console = rich.console.Console() + console.print(_make_label_table(groups, catmap)) diff --git a/semantic_inference/python/semantic_inference/models/__init__.py b/semantic_inference/python/semantic_inference/models/__init__.py index 4f6348b..b6874c5 100644 --- a/semantic_inference/python/semantic_inference/models/__init__.py +++ b/semantic_inference/python/semantic_inference/models/__init__.py @@ -29,6 +29,7 @@ # import torch +from semantic_inference.models.feature_visualizers import * from semantic_inference.models.mask_functions import * from semantic_inference.models.openset_segmenter import * from semantic_inference.models.patch_extractor import * diff --git a/semantic_inference/python/semantic_inference/models/feature_visualizers.py b/semantic_inference/python/semantic_inference/models/feature_visualizers.py new file mode 100644 index 0000000..71ba754 --- /dev/null +++ b/semantic_inference/python/semantic_inference/models/feature_visualizers.py @@ -0,0 +1,37 @@ +from dataclasses import dataclass, field + +import numpy as np +import spark_config as sc + +from semantic_inference.misc import Logger +from semantic_inference.models.openset_segmenter import Results + + +class ComponentVisualizer: + """Visualize features by three components.""" + + def __init__(self, config): + self.config = config + if len(self.config.components) != 3: + Logger.error(f"Invalid components specified: {self.config.components}!") + self.config.components = [0, 1, 2] + + self._indices = np.array(self.config.components) + + def call(self, results: Results) -> np.ndarray: + colors = results.features[:, self._indices].numpy() + colors = self.config.scale * (colors + self.config.offset) + colors = 255 * np.clip(colors, 0.0, 1.0) + colors = np.vstack(([0, 0, 0], colors)) + colors = colors.astype(np.uint8) + return colors[results.instances] + + 
+@sc.register_config("feature_visualizer", "component", ComponentVisualizer) +@dataclass +class ComponentVisualizerConfig(sc.Config): + """Configuration for component visualizer.""" + + components: list[int] = field(default_factory=lambda: [-1, -2, -3]) + offset: float = 0.5 + scale: float = 0.5 diff --git a/semantic_inference/python/semantic_inference/models/openset_segmenter.py b/semantic_inference/python/semantic_inference/models/openset_segmenter.py index 36b2223..eaa5daa 100644 --- a/semantic_inference/python/semantic_inference/models/openset_segmenter.py +++ b/semantic_inference/python/semantic_inference/models/openset_segmenter.py @@ -36,9 +36,9 @@ import numpy as np import torch import torch.nn.functional as F +from spark_config import Config, config_field from torch import nn -from semantic_inference.config import Config, config_field from semantic_inference.models.mask_functions import ConstantMask from semantic_inference.models.patch_extractor import ( PatchExtractor, diff --git a/semantic_inference/python/semantic_inference/models/patch_extractor.py b/semantic_inference/python/semantic_inference/models/patch_extractor.py index 2a6229e..bffb48e 100644 --- a/semantic_inference/python/semantic_inference/models/patch_extractor.py +++ b/semantic_inference/python/semantic_inference/models/patch_extractor.py @@ -35,10 +35,10 @@ import torch import torch.nn.functional as F import torchvision.ops +from spark_config import Config from torch import nn from torchvision.transforms import v2 -from semantic_inference.config import Config from semantic_inference.misc import Logger from semantic_inference.models.mask_functions import ConstantMask diff --git a/semantic_inference/python/semantic_inference/models/segment_refinement.py b/semantic_inference/python/semantic_inference/models/segment_refinement.py index 8b96599..3a6acc7 100644 --- a/semantic_inference/python/semantic_inference/models/segment_refinement.py +++ 
b/semantic_inference/python/semantic_inference/models/segment_refinement.py @@ -33,10 +33,9 @@ import torch import torchvision +from spark_config import Config from torch import nn -from semantic_inference.config import Config - @dataclass class SegmentRefinementConfig(Config): diff --git a/semantic_inference/python/semantic_inference/models/wrappers.py b/semantic_inference/python/semantic_inference/models/wrappers.py index bfb1202..2457ae1 100644 --- a/semantic_inference/python/semantic_inference/models/wrappers.py +++ b/semantic_inference/python/semantic_inference/models/wrappers.py @@ -35,9 +35,9 @@ import torch import torch.nn as nn import torchvision +from spark_config import Config, register_config from semantic_inference import root_path -from semantic_inference.config import Config, register_config def models_path(): diff --git a/semantic_inference/python/test/test_config.py b/semantic_inference/python/test/test_config.py deleted file mode 100644 index 4a22a5c..0000000 --- a/semantic_inference/python/test/test_config.py +++ /dev/null @@ -1,119 +0,0 @@ -# BSD 3-Clause License -# -# Copyright (c) 2021-2024, Massachusetts Institute of Technology. -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions are met: -# -# 1. Redistributions of source code must retain the above copyright notice, this -# list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright notice, -# this list of conditions and the following disclaimer in the documentation -# and/or other materials provided with the distribution. -# -# 3. Neither the name of the copyright holder nor the names of its -# contributors may be used to endorse or promote products derived from -# this software without specific prior written permission. 
-# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE -# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE -# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL -# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR -# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER -# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, -# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -# -"""Test that configuration structs work as expected.""" - -from dataclasses import dataclass -from typing import Any - -import semantic_inference - - -@semantic_inference.register_config("test", name="foo") -@dataclass -class Foo(semantic_inference.Config): - """Test configuration struct.""" - - a: float = 5.0 - b: int = 2 - c: str = "hello" - - -@semantic_inference.register_config("test", name="bar") -@dataclass -class Bar(semantic_inference.Config): - """Test configuration struct.""" - - bar: str = "world" - d: int = 15 - - -@dataclass -class Parent(semantic_inference.Config): - """Test configuration struct.""" - - child: Any = semantic_inference.config_field("test", default="foo") - param: float = -1.0 - - -def test_dump(): - """Make sure dumping works as expected.""" - foo = Foo() - result = foo.dump() - expected = {"a": 5.0, "b": 2, "c": "hello"} - assert result == expected - - bar = Bar() - result = bar.dump() - expected = {"bar": "world", "d": 15} - assert result == expected - - parent = Parent() - result = parent.dump() - expected = {"child": {"type": "foo", "a": 5.0, "b": 2, "c": "hello"}, "param": -1.0} - assert result == expected - - -def test_update(): - """Test that update 
works as expected.""" - foo = Foo() - assert foo == Foo() - - # empty update does nothing - foo.update({}) - assert foo == Foo() - - foo.update({"a": 10.0}) - assert foo == Foo(a=10.0) - - foo.update({"b": 1, "c": "world"}) - assert foo == Foo(a=10.0, b=1, c="world") - - parent = Parent() - parent.update({"child": {"b": 1, "c": "world"}, "param": -2.0}) - assert parent == Parent(child=Foo(a=5.0, b=1, c="world"), param=-2.0) - - -def test_save_load(tmp_path): - """Test that saving and loading works.""" - filepath = tmp_path / "config.yaml" - parent = Parent(child=Bar(bar="hello", d="2.0"), param=-2.0) - parent.save(filepath) - result = semantic_inference.Config.load(Parent, filepath) - assert parent == result - - -def test_factory(): - """Test that the factory works.""" - registered = semantic_inference.ConfigFactory.registered() - assert len(registered) > 0 - assert "test" in registered - registered_names = [x[0] for x in registered["test"]] - assert "foo" in registered_names - assert "bar" in registered_names diff --git a/semantic_inference_ros/CMakeLists.txt b/semantic_inference_ros/CMakeLists.txt index d4c3ca4..e71d1b7 100644 --- a/semantic_inference_ros/CMakeLists.txt +++ b/semantic_inference_ros/CMakeLists.txt @@ -74,8 +74,8 @@ install( LIBRARY DESTINATION lib RUNTIME DESTINATION lib/${PROJECT_NAME} ) -install(PROGRAMS app/clip_publisher_node app/language_embedding_node app/make_rosbag - app/merge_rosbag app/open_set_node DESTINATION lib/${PROJECT_NAME} +install(PROGRAMS app/image_embedding_node app/open_set_node app/text_embedding_node + DESTINATION lib/${PROJECT_NAME} ) install(DIRECTORY include/${PROJECT_NAME}/ DESTINATION include/${PROJECT_NAME}/) install(DIRECTORY launch DESTINATION share/${PROJECT_NAME}) diff --git a/semantic_inference_ros/app/clip_publisher_node b/semantic_inference_ros/app/image_embedding_node similarity index 84% rename from semantic_inference_ros/app/clip_publisher_node rename to semantic_inference_ros/app/image_embedding_node index 
647d855..b03bbbf 100755 --- a/semantic_inference_ros/app/clip_publisher_node +++ b/semantic_inference_ros/app/image_embedding_node @@ -4,18 +4,18 @@ from dataclasses import dataclass, field import rclpy +import spark_config as sc import torch from rclpy.node import Node import semantic_inference.models as models import semantic_inference_ros -from semantic_inference import Config from semantic_inference_msgs.msg import FeatureVectorStamped from semantic_inference_ros import Conversions, ImageWorkerConfig @dataclass -class ClipPublisherConfig(Config): +class ClipPublisherConfig(sc.Config): """Configuration for ClipPublisherNode.""" worker: ImageWorkerConfig = field(default_factory=ImageWorkerConfig) @@ -28,7 +28,10 @@ class ClipPublisherNode(Node): def __init__(self): """Start subscriber and publisher.""" super().__init__("clip_publisher_node") - config = semantic_inference_ros.load_from_ros(ClipPublisherConfig, "~") + config = sc.Config.loads( + ClipPublisherConfig, + self.declare_parameter("config", "").get_parameter_value().string_value, + ) self.get_logger().info(f"Initializing with {config}") self._device = models.default_device() @@ -38,9 +41,10 @@ class ClipPublisherNode(Node): self._transform = self._transform.to(self._device) self.get_logger().info("Finished initializing!") - self._pub = self.create_publisher(FeatureVectorStamped, "~/feature", 1) + self._embedder = semantic_inference_ros.PromptEncoder(self, self._model) + self._pub = self.create_publisher(FeatureVectorStamped, "semantic/feature", 1) self._worker = semantic_inference_ros.ImageWorker( - self, config.worker, "~/image", self._spin_once + self, config.worker, "color/image_raw", self._spin_once ) def _spin_once(self, header, img): diff --git a/semantic_inference_ros/app/language_embedding_node b/semantic_inference_ros/app/language_embedding_node deleted file mode 100755 index c453b83..0000000 --- a/semantic_inference_ros/app/language_embedding_node +++ /dev/null @@ -1,53 +0,0 @@ -#!/usr/bin/env 
python3 -"""Node to encode language prompts as embeddings.""" - -from dataclasses import dataclass -from typing import Any - -import rospy - -import semantic_inference_ros -from semantic_inference import Config, config_field -from semantic_inference.models import default_device - - -@dataclass -class LanguageEmbeddingNodeConfig(Config): - """Configuration for LanguageEmbeddingNode.""" - - model: Any = config_field("clip", default="clip") - use_cuda: bool = True - - -class LanguageEmbeddingNode: - """Node implementation.""" - - def __init__(self): - """Construct a feature encoder node.""" - self.config = semantic_inference_ros.load_from_ros( - LanguageEmbeddingNodeConfig, ns="~" - ) - - rospy.loginfo(f"'{rospy.get_name()}': Initializing with {self.config.show()}") - self._model = self.config.model.create().to( - default_device(self.config.use_cuda) - ) - rospy.loginfo(f"'{rospy.get_name()}': finished initializing!") - self._embedder = semantic_inference_ros.PromptEncoder(self._model) - - def spin(self): - """Wait until ROS shuts down.""" - rospy.spin() - - -def main(): - """Start the node.""" - rospy.init_node("language_embedding_node") - semantic_inference_ros.setup_ros_log_forwarding() - - node = LanguageEmbeddingNode() - node.spin() - - -if __name__ == "__main__": - main() diff --git a/semantic_inference_ros/app/make_rosbag b/semantic_inference_ros/app/make_rosbag deleted file mode 100755 index 3a77ef4..0000000 --- a/semantic_inference_ros/app/make_rosbag +++ /dev/null @@ -1,193 +0,0 @@ -#!/usr/bin/env python3 -"""Add openset image segmentation for all color images.""" - -import pathlib -import shutil -import tempfile -import time - -import click -import rosbag -import torch -import tqdm - -from semantic_inference import Config -from semantic_inference.models import ( - OpensetSegmenter, - OpensetSegmenterConfig, - default_device, -) -from semantic_inference_ros.ros_conversions import Conversions - - -def _split_topics(topic_str): - if ":" not in topic_str: - 
return topic_str, topic_str - - parts = topic_str.split(":") - return parts[0], parts[1] - - -def _get_type_map(bag): - info = bag.get_type_and_topic_info()[1] - topics = list(info.keys()) - types = [x[0] for x in info.values()] - return {k: v for k, v in zip(topics, types)} - - -def _get_clip_topic(topic): - new_topic = pathlib.Path(topic) - return str(new_topic.parent / "clip_vector") - - -def _show_topic_map(topic_map, include_clip_vector): - if len(topic_map) == 0: - return "{}" - - contents = "" - for t_in, t_out in topic_map.items(): - msg = f" - {t_in} (in) → " - offset = len(msg) - msg += f"{t_out} (out)" - if include_clip_vector: - msg += "\n" + " " * offset + f"{_get_clip_topic(t_out)} (clip)" - - contents += msg + "\n" - - return contents[:-1] - - -def _write_bag(model, path_in, topic_map, samples, add_clip_vector): - with tempfile.NamedTemporaryFile(suffix=".bag", delete=False) as fout: - path_out = fout.name - click.secho(f"Writing to {path_out}", fg="green") - with rosbag.Bag(str(path_in), "r") as bag, rosbag.Bag(path_out, "w") as bag_out: - topic_type_map = _get_type_map(bag) - - topics = [x for x in topic_map] - N = bag.get_message_count(topic_filters=topics) - - for topic, msg, t in tqdm.tqdm(bag.read_messages(topics=topics), total=N): - img = Conversions.to_image(msg, msg_type=topic_type_map.get(topic)) - - tic = time.perf_counter_ns() - with torch.no_grad(): - ret = model.segment(img, is_rgb_order=False).cpu() - - toc = time.perf_counter_ns() - sample_ns = toc - tic - samples.append(sample_ns) - - msg_out = Conversions.to_feature_image(msg.header, ret) - bag_out.write(topic_map[topic], msg_out, t) - if not add_clip_vector: - continue - - clip_topic = _get_clip_topic(topic_map[topic]) - clip_msg = Conversions.to_stamped_feature( - msg.header, ret.image_embedding - ) - bag_out.write(clip_topic, clip_msg, t) - - return path_out - - -def _copy_bag_contents(path_in, path_out): - click.secho(f"Writing messages to '{path_out}' from '{path_in}'", 
fg="green") - path_in = str(path_in) - path_out = str(path_out) - with rosbag.Bag(path_in, "r") as bag, rosbag.Bag(path_out, "a") as bag_out: - N = bag.get_message_count() - for topic, msg, t in tqdm.tqdm(bag.read_messages(), total=N): - bag_out.write(topic, msg, t) - - -def _load_model(config_path): - if config_path is not None: - config_path = pathlib.Path(config_path).expanduser().absolute() - - if config_path is None or not config_path.exists(): - click.secho("No config provided, using default!", fg="yellow") - config = OpensetSegmenterConfig() - else: - config = Config.load(OpensetSegmenterConfig, config_path) - click.secho(f"Using segmenter config from '{config_path}'", fg="green") - - click.secho("Initializing segmenter...", fg="green") - model = OpensetSegmenter(config).to(default_device()) - model.eval() - click.secho("Finished initializing segmenter!", fg="green") - return model - - -@click.command() -@click.argument("input_path", type=click.Path(exists=True)) -@click.argument("topics", type=str, nargs=-1) -@click.option("--output", "-o", default=None, type=click.Path()) -@click.option("--copy/--no-copy", default=False, help="copy bag to output") -@click.option("--clip-vec/--no-clip-vec", default=True, help="add image clip vector") -@click.option( - "--config-path", "-c", default=None, type=click.Path(), help="segmentation config" -) -@click.option("--yes", "-y", is_flag=True, help="skip prompts") -@click.option( - "--timing-log-path", - "-t", - default=None, - type=click.Path(), - help="timing log path save", -) -def main(input_path, topics, output, copy, clip_vec, config_path, yes, timing_log_path): - """ - Parse a rosbag and compute openset segmentation for each image. - - Args: - input_path: Input rosbag to read RGB images from - topics: RGB Image topics to read - - Usage: - make_rosbag /path/to/bag [INPUT_TOPIC[:TOPIC_REMAP]...] 
- """ - input_path = pathlib.Path(input_path).expanduser().absolute() - if output is not None: - output = pathlib.Path(output).expanduser().absolute() - if output.exists() and output != input_path: - click.secho(f"Output bag '{output}' already exists!", fg="red") - if not yes: - click.confirm("Overwrite?", abort=True, default=False) - - overwriting = output is None or output == input_path - if overwriting: - click.secho(f"Writing to input bag '{input_path}'!", fg="yellow") - if not yes: - click.confirm("Proceed?", abort=True, default=False) - - click.secho(f"Reading from '{input_path}'", fg="green") - click.secho("Topics:", fg="green") - topics = [_split_topics(x) for x in topics] - topic_map = {t_in: t_out for t_in, t_out in topics} - click.secho(_show_topic_map(topic_map, clip_vec), fg="green") - - model = _load_model(config_path) - - timing_samples = [] - temp_path = _write_bag(model, input_path, topic_map, timing_samples, clip_vec) - - if overwriting: - _copy_bag_contents(temp_path, input_path) - else: - if copy: - click.secho(f"Copying input '{input_path}' → '{output}'", fg="green") - shutil.copy2(input_path, output) - _copy_bag_contents(temp_path, output) - else: - shutil.move(temp_path, output) - - if timing_log_path is not None: - with open(timing_log_path, "w") as f: - for sample in timing_samples: - f.write(f"{sample}\n") - - -if __name__ == "__main__": - main() diff --git a/semantic_inference_ros/app/merge_rosbag b/semantic_inference_ros/app/merge_rosbag deleted file mode 100755 index e26e6dd..0000000 --- a/semantic_inference_ros/app/merge_rosbag +++ /dev/null @@ -1,71 +0,0 @@ -#!/usr/bin/env python3 -"""Add the contents of one bag to another.""" - -import pathlib - -import click -import rosbag -import tqdm - - -@click.command() -@click.argument("from-bag", type=click.Path(exists=True)) -@click.argument("to-bag", type=click.Path(exists=True)) -@click.argument("topics", nargs=-1, type=str) -@click.option( - "-d", - "--dry-run", - is_flag=True, - 
default=False, - help="show information without running", -) -def main(from_bag, to_bag, topics, dry_run): - """ - Copy messages from FROM_BAG to TO_BAG. - - \b - FROM_BAG: Path to bag to copy from - TO_BAG: Path to bag to copy to - TOPICS: Topics to copy from FROM_BAG to TO_BAG. Defaults to all topics - - Note that topics can be specified with an optional remapping (from_name[:to_name]) - """ - from_bag = str(pathlib.Path(from_bag).expanduser().absolute()) - to_bag = str(pathlib.Path(to_bag).expanduser().absolute()) - name_pairs = [ - (x, x) if ":" not in x else (x.split(":")[0], x.split(":")[1]) for x in topics - ] - topic_map = {x[0]: x[1] for x in name_pairs} - - click.secho(f"FROM_BAG: {from_bag}", fg="green") - click.secho(f"TO_BAG: {to_bag}", fg="green") - if len(topics) == 0: - click.secho("Copying all message", fg="green") - else: - click.secho("Copying the following topics:", fg="green") - for topic, name in topic_map.items(): - if topic == name: - click.secho(f" - {topic} (no change)", fg="green") - else: - click.secho(f" - {topic} → {name}", fg="green") - - if dry_run: - return - - click.echo("Opening bags...") - - with rosbag.Bag(from_bag, "r") as in_bag, rosbag.Bag(to_bag, "a") as out_bag: - click.echo("Starting copy") - - topic_filter = None if len(topics) == 0 else [x for x in topic_map] - bag_iter = in_bag.read_messages(topics=topic_filter) - num_messsages = in_bag.get_message_count(topic_filters=topic_filter) - progress = tqdm.tqdm(bag_iter, total=num_messsages, desc="Copying messages") - - for topic, msg, t in progress: - new_topic = topic_map.get(topic, topic) - out_bag.write(new_topic, msg, t) - - -if __name__ == "__main__": - main() diff --git a/semantic_inference_ros/app/open_set_node b/semantic_inference_ros/app/open_set_node index 98ed6c5..263615d 100755 --- a/semantic_inference_ros/app/open_set_node +++ b/semantic_inference_ros/app/open_set_node @@ -1,27 +1,33 @@ #!/usr/bin/env python3 """Node that runs openset segmentation.""" +import 
pathlib from dataclasses import dataclass, field +from typing import Any import rclpy +import spark_config as sc import torch from rclpy.node import Node +from sensor_msgs.msg import Image import semantic_inference.models as models import semantic_inference_ros -from semantic_inference import Config from semantic_inference_msgs.msg import FeatureImage, FeatureVectorStamped from semantic_inference_ros import Conversions, ImageWorkerConfig @dataclass -class OpenSetNodeConfig(Config): +class OpenSetNodeConfig(sc.Config): """Configuration for ClipPublisherNode.""" worker: ImageWorkerConfig = field(default_factory=ImageWorkerConfig) model: models.OpensetSegmenterConfig = field( default_factory=models.OpensetSegmenterConfig ) + visualizer: Any = sc.config_field( + "feature_visualizer", default="component", required=False + ) class OpenSetNode(Node): @@ -30,7 +36,15 @@ class OpenSetNode(Node): def __init__(self): """Start subscriber and publisher.""" super().__init__("open_set_node") - self.config = semantic_inference_ros.load_from_ros(OpenSetNodeConfig, ns="~") + config_path = ( + self.declare_parameter("config_path", "").get_parameter_value().string_value + ) + config_path = pathlib.Path(config_path).expanduser().absolute() + if not config_path.exists() and config_path != "": + self.get_logger().warn(f"config path '{config_path}' does not exist!") + self.config = OpenSetNodeConfig() + else: + self.config = sc.Config.load(OpenSetNodeConfig, config_path) self.get_logger().info(f"Initializing with {self.config.show()}") device = models.default_device() @@ -38,14 +52,19 @@ class OpenSetNode(Node): self._model.eval() self.get_logger().info("Finished initializing!") - self._pub = self.create_publisher(FeatureImage, "~/semantic/image_raw", 1) + self._pub = self.create_publisher(FeatureImage, "semantic/image_raw", 1) self._clip_pub = self.create_publisher( - FeatureVectorStamped, "~/semantic/feature", 1 + FeatureVectorStamped, "semantic/feature", 1 ) self._worker = 
semantic_inference_ros.ImageWorker( - self, self.config.worker, "~/color/image_raw", self._spin_once + self, self.config.worker, "color/image_raw", self._spin_once ) self._embedder = semantic_inference_ros.PromptEncoder(self, self._model.encoder) + self._visualizer = self.config.visualizer.create() + if self._visualizer is not None: + self._color_pub = self.create_publisher( + Image, "semantic_color/image_raw", 1 + ) def _spin_once(self, header, img): with torch.no_grad(): @@ -57,6 +76,12 @@ class OpenSetNode(Node): Conversions.to_stamped_feature(header, ret.image_embedding) ) + if self._visualizer is not None: + color_img = self._visualizer.call(ret) + self._color_pub.publish( + Conversions.to_image_msg(header, color_img, encoding="rgb8") + ) + def stop(self): """Stop the underlying image worker.""" self._worker.stop() diff --git a/semantic_inference_ros/app/text_embedding_node b/semantic_inference_ros/app/text_embedding_node new file mode 100755 index 0000000..84403aa --- /dev/null +++ b/semantic_inference_ros/app/text_embedding_node @@ -0,0 +1,55 @@ +#!/usr/bin/env python3 +"""Node to encode language prompts as embeddings.""" + +from dataclasses import dataclass +from typing import Any + +import rclpy +import spark_config as sc +from rclpy.node import Node + +import semantic_inference_ros +from semantic_inference.models import default_device + + +@dataclass +class TextEmbeddingNodeConfig(sc.Config): + """Configuration for TextEmbeddingNode.""" + + model: Any = sc.config_field("clip", default="clip") + use_cuda: bool = True + + +class TextEmbeddingNode(Node): + """Node implementation.""" + + def __init__(self): + """Construct a feature encoder node.""" + super().__init__("text_embedding_node") + self.config = sc.Config.loads( + TextEmbeddingNodeConfig, + self.declare_parameter("config", "").get_parameter_value().string_value, + ) + + self.get_logger().info(f"Initializing with {self.config.show()}") + self._model = self.config.model.create().to( + 
default_device(self.config.use_cuda) + ) + self.get_logger().info("finished initializing!") + self._embedder = semantic_inference_ros.PromptEncoder(self, self._model) + + +def main(): + """Start the node.""" + rclpy.init() + + try: + node = TextEmbeddingNode() + semantic_inference_ros.setup_ros_log_forwarding(node) + rclpy.spin(node) + finally: + rclpy.try_shutdown() + + +if __name__ == "__main__": + main() diff --git a/semantic_inference_ros/config/openset_segmentation.yaml b/semantic_inference_ros/config/openset/fastsam-clip_vit14l.yaml similarity index 100% rename from semantic_inference_ros/config/openset_segmentation.yaml rename to semantic_inference_ros/config/openset/fastsam-clip_vit14l.yaml diff --git a/semantic_inference_ros/include/semantic_inference_ros/segmentation_nodelet.h b/semantic_inference_ros/include/semantic_inference_ros/segmentation_nodelet.h deleted file mode 100644 index 0c9e7eb..0000000 --- a/semantic_inference_ros/include/semantic_inference_ros/segmentation_nodelet.h +++ /dev/null @@ -1,84 +0,0 @@ -/* ----------------------------------------------------------------------------- - * BSD 3-Clause License - * - * Copyright (c) 2021-2024, Massachusetts Institute of Technology. - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions are met: - * - * 1. Redistributions of source code must retain the above copyright notice, this - * list of conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright notice, - * this list of conditions and the following disclaimer in the documentation - * and/or other materials provided with the distribution. - * - * 3. Neither the name of the copyright holder nor the names of its - * contributors may be used to endorse or promote products derived from - * this software without specific prior written permission. 
- * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" - * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE - * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE - * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL - * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR - * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER - * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, - * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - * * -------------------------------------------------------------------------- */ - -#pragma once - -#include -#include -#include -#include - -#include -#include -#include -#include - -#include -#include -#include - -#include "semantic_inference_ros/output_publisher.h" -#include "semantic_inference_ros/ros_log_sink.h" -#include "semantic_inference_ros/worker.h" - -namespace semantic_inference { - -class SegmentationNode : public rclcpp::Node { - public: - using ImageWorker = Worker; - - struct Config { - Segmenter::Config segmenter; - OutputPublisher::Config output; - WorkerConfig worker; - ImageRotator::Config image_rotator; - }; - - explicit SegmentationNode(const rclcpp::NodeOptions& options); - - virtual ~SegmentationNode(); - - void start(); - - private: - void runSegmentation(const sensor_msgs::msg::Image::ConstSharedPtr& msg); - - Config config_; - std::unique_ptr segmenter_; - ImageRotator image_rotator_; - std::unique_ptr worker_; - - std::unique_ptr transport_; - std::unique_ptr output_pub_; - image_transport::Subscriber sub_; -}; - -} // namespace semantic_inference diff --git a/semantic_inference_ros/launch/clip_publisher.launch.yaml 
b/semantic_inference_ros/launch/clip_publisher.launch.yaml deleted file mode 100644 index a6ea96b..0000000 --- a/semantic_inference_ros/launch/clip_publisher.launch.yaml +++ /dev/null @@ -1,11 +0,0 @@ ---- -launch: - - arg: {name: model_name, default: ViT-L/14, description: Model to use for segmentation} - - arg: {name: min_period_s, default: '0.0', description: Minimum time between inputs} - - node: - pkg: semantic_inference_ros - exec: clip_publisher_node - name: clip_publisher_node - param: - - {name: model/model_name, value: $(var model_name)} - - {name: worker/min_separation_s, value: $(var min_period_s)} diff --git a/semantic_inference_ros/launch/image_embedding_node.launch.yaml b/semantic_inference_ros/launch/image_embedding_node.launch.yaml new file mode 100644 index 0000000..4b0ce44 --- /dev/null +++ b/semantic_inference_ros/launch/image_embedding_node.launch.yaml @@ -0,0 +1,24 @@ +--- +launch: + - arg: {name: model_name, default: ViT-L/14, description: Model to use for segmentation} + - arg: {name: min_period_s, default: '0.0', description: Minimum time between inputs} + - arg: {name: compressed_rgb, default: 'false', description: Triggers decompression for RGB stream} + - node: + if: $(var compressed_rgb) + pkg: image_transport + exec: republish + name: decompress_rgb + param: + - {name: in_transport, value: compressed} + - {name: out_transport, value: raw} + remap: + - {from: in/compressed, to: color/image_raw/compressed} + - {from: out, to: color/image_raw} + - node: + pkg: semantic_inference_ros + exec: image_embedding_node + name: image_embedding_node + param: + - name: config + value: '{model: {model_name: $(var model_name)}, worker: {min_separation_s: $(var min_period_s)}}' + type: str diff --git a/semantic_inference_ros/launch/open_set.launch.yaml b/semantic_inference_ros/launch/open_set.launch.yaml index 84601b9..4e0fd9b 100644 --- a/semantic_inference_ros/launch/open_set.launch.yaml +++ b/semantic_inference_ros/launch/open_set.launch.yaml @@ 
-1,19 +1,22 @@ --- launch: - - arg: {name: config_path, default: $(find-pkg-share semantic_inference_ros)/config/openset_segmentation.yaml, description: Configuration file for object detector} - - arg: {name: min_period_s, default: '0.0', description: Minimum time between inputs} + - arg: {name: config_path, default: $(find-pkg-share semantic_inference_ros)/config/openset/fastsam-clip_vit14l.yaml, description: Configuration file for object detector} - arg: {name: compressed_rgb, default: 'false', description: Triggers decompression for RGB stream} - node: if: $(var compressed_rgb) pkg: image_transport exec: republish name: decompress_rgb - args: compressed in:=semantic_inference/color/image_raw raw out:=semantic_inference/color/image_raw - - configured_node: + param: + - {name: in_transport, value: compressed} + - {name: out_transport, value: raw} + remap: + - {from: in/compressed, to: color/image_raw/compressed} + - {from: out, to: color/image_raw} + - node: pkg: semantic_inference_ros exec: open_set_node name: semantic_inference on_exit: shutdown - config: - - {from: $(var config_path)} - - {name: worker/min_separation_s, value: $(var min_period_s)} + param: + - {name: config_path, value: $(var config_path), type: str} diff --git a/semantic_inference_ros/launch/text_embedding_node.launch.yaml b/semantic_inference_ros/launch/text_embedding_node.launch.yaml new file mode 100644 index 0000000..ade2d22 --- /dev/null +++ b/semantic_inference_ros/launch/text_embedding_node.launch.yaml @@ -0,0 +1,12 @@ +--- +launch: + - arg: {name: model_name, default: ViT-L/14, description: Language encoder to use} + - arg: {name: use_cuda, default: 'true', description: use GPU} + - node: + pkg: semantic_inference_ros + exec: text_embedding_node + name: text_embedding_node + param: + - name: config + value: '{model: {model_name: $(var model_name)}, use_cuda: $(var use_cuda)}' + type: str diff --git a/semantic_inference_ros/scripts/make_legacy_rosbag.py 
b/semantic_inference_ros/scripts/make_legacy_rosbag.py deleted file mode 100644 index f83c7fe..0000000 --- a/semantic_inference_ros/scripts/make_legacy_rosbag.py +++ /dev/null @@ -1,206 +0,0 @@ -#!/usr/bin/env python3 -# BSD 3-Clause License -# -# Copyright (c) 2021-2024, Massachusetts Institute of Technology. -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions are met: -# -# 1. Redistributions of source code must retain the above copyright notice, this -# list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright notice, -# this list of conditions and the following disclaimer in the documentation -# and/or other materials provided with the distribution. -# -# 3. Neither the name of the copyright holder nor the names of its -# contributors may be used to endorse or promote products derived from -# this software without specific prior written permission. -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE -# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE -# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL -# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR -# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER -# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, -# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-# -"""Script to make a new rosbag with semantics for a given color image topic.""" - -import pathlib - -import click -import cv2 -import numpy as np -import rosbag -import tqdm -import yaml -import zmq -from cv_bridge import CvBridge - - -def _get_config_dir(): - path_to_script = pathlib.Path(__file__).absolute() - return path_to_script.parent.parent / "config" - - -def _to_color(rgb_float_values): - return (255 * np.array([float(x) for x in rgb_float_values])).astype(np.uint8) - - -def _make_color_map(config_path, colors): - with (config_path / "colors" / f"{colors}.yaml").open("r") as fin: - contents = yaml.load(fin.read(), Loader=yaml.SafeLoader) - - class_colors = {} - class_labels = {} - all_labels = [] - default_color = ( - _to_color(contents["default_color"]) - if "default_color" in contents - else np.zeros(3, dtype=np.uint8) - ) - for key in contents: - parts = key.split("/") - if len(parts) < 3: - continue - - if parts[2] == "color": - color = _to_color(contents[key]) - class_colors[int(parts[1])] = color - continue - - if parts[2] == "labels": - for x in contents[key]: - class_labels[int(x)] = int(parts[1]) - all_labels.append(int(x)) - continue - - all_labels = sorted(all_labels) - N = max(all_labels) - table = np.zeros((N + 1, 3), dtype=np.uint8) - for i in range(0, N + 1): - if i in class_labels: - table[i, :] = class_colors[class_labels[i]] - else: - table[i, :] = default_color - - return table.T - - -def _apply_color(labels, colormap): - semantic_img = np.take(colormap, labels, mode="raise", axis=1) - return np.transpose(semantic_img, [1, 2, 0]) - - -def _run_inference(socket, image): - socket.send_pyobj(image) - return socket.recv_pyobj() - - -@click.command() -@click.argument("path_to_bag", type=click.Path(exists=True)) -@click.argument("topic") -@click.option("--labels-topic", default="/semantic_color/labels/image_raw") -@click.option("--semantics-topic", default="/semantic_color/semantics/image_raw") -@click.option("--colors", 
default="ade20k_mp3d") -@click.option("--compression", default="bz2") -@click.option("-p", "--port", default=5555, type=int) -@click.option("-u", "--url", default="127.0.0.1", type=str) -@click.option("-n", "--total", default=None, type=int) -@click.option("--is-compressed", is_flag=True, default=False) -@click.option("--write-colors", is_flag=True, default=False) -@click.option("-w", "--write-every-n", default=0, type=int) -@click.option("-e", "--infer-every-n", default=0, type=int) -@click.option("-a", "--overlay-alpha", default=0.5, type=float) -@click.option("-y", "--force-overwrite", is_flag=True, default=False) -@click.option("-o", "--output", default=None, type=str) -def main( - path_to_bag, - topic, - labels_topic, - semantics_topic, - colors, - compression, - port, - url, - total, - is_compressed, - write_colors, - write_every_n, - infer_every_n, - overlay_alpha, - force_overwrite, - output, -): - """Run everything.""" - path_to_bag = pathlib.Path(path_to_bag).expanduser().absolute() - config_path = _get_config_dir() - colormap = _make_color_map(config_path, colors) - bridge = CvBridge() - - context = zmq.Context() - socket = context.socket(zmq.REQ) - socket.connect(f"tcp://{url}:{port}") - - if output is None: - new_path = path_to_bag.parent / f"{path_to_bag.stem}_semantics.bag" - else: - new_path = pathlib.Path(output).expanduser().absolute() - - if new_path.exists() and not force_overwrite: - click.confirm(f"output path {new_path} exists! 
overwrite: ", abort=True) - - bag_out = rosbag.Bag(str(new_path), "w", compression=compression) - num_written = 0 - num_read = 0 - with rosbag.Bag(str(path_to_bag), "r") as bag: - N_msgs = bag.get_message_count(topic) - for _, msg, t in tqdm.tqdm(bag.read_messages(topics=[topic]), total=N_msgs): - if is_compressed: - img = bridge.compressed_imgmsg_to_cv2( - msg, desired_encoding="passthrough" - ) - else: - img = bridge.imgmsg_to_cv2(msg, desired_encoding="passthrough") - - num_read += 1 - if infer_every_n != 0 and (num_read - 1) % infer_every_n != 0: - continue - - labels = _run_inference(socket, img) - labels = labels.astype(np.uint16) - labels_msg = bridge.cv2_to_imgmsg(labels, encoding="mono16") - labels_msg.header = msg.header - bag_out.write(labels_topic, labels_msg, t) - - semantics = _apply_color(labels, colormap) - - if write_colors: - semantics_msg = bridge.cv2_to_imgmsg(semantics, encoding="rgb8") - semantics_msg.header = msg.header - bag_out.write(semantics_topic, semantics_msg, t) - - num_written += 1 - if total and num_written >= total: - break - - if write_every_n != 0 and num_written % write_every_n == 0: - rgb_img = img[:, :, ::-1] - cv2.imwrite(f"rgb_{num_written:06d}.png", img) - cv2.imwrite(f"semantics_{num_written:06d}.png", semantics[:, :, ::-1]) - a = overlay_alpha - a_inv = 1 - overlay_alpha - overlay = (a_inv * rgb_img + a * semantics).astype(np.uint8) - cv2.imwrite( - f"semantics_overlay_{num_written:06d}.png", overlay[:, :, ::-1] - ) - - bag_out.close() - - -if __name__ == "__main__": - main() diff --git a/semantic_inference_ros/semantic_inference_ros/__init__.py b/semantic_inference_ros/semantic_inference_ros/__init__.py index e89d1ef..702d18e 100644 --- a/semantic_inference_ros/semantic_inference_ros/__init__.py +++ b/semantic_inference_ros/semantic_inference_ros/__init__.py @@ -30,7 +30,6 @@ """Useful ROS utility functions.""" from semantic_inference_ros.image_worker import ImageWorkerConfig, ImageWorker # NOQA -from 
semantic_inference_ros.ros_config import load_from_ros # NOQA from semantic_inference_ros.ros_conversions import Conversions # NOQA from semantic_inference_ros.ros_logging import setup_ros_log_forwarding # NOQA from semantic_inference_ros.prompt_encoder import PromptEncoder # NOQA diff --git a/semantic_inference_ros/semantic_inference_ros/image_worker.py b/semantic_inference_ros/semantic_inference_ros/image_worker.py index 16603ac..af53316 100644 --- a/semantic_inference_ros/semantic_inference_ros/image_worker.py +++ b/semantic_inference_ros/semantic_inference_ros/image_worker.py @@ -36,8 +36,8 @@ import rclpy import sensor_msgs.msg +from spark_config import Config -from semantic_inference import Config from semantic_inference_ros.ros_conversions import Conversions diff --git a/semantic_inference_ros/semantic_inference_ros/ros_config.py b/semantic_inference_ros/semantic_inference_ros/ros_config.py deleted file mode 100644 index 25b197a..0000000 --- a/semantic_inference_ros/semantic_inference_ros/ros_config.py +++ /dev/null @@ -1,87 +0,0 @@ -# BSD 3-Clause License -# -# Copyright (c) 2021-2024, Massachusetts Institute of Technology. -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions are met: -# -# 1. Redistributions of source code must retain the above copyright notice, this -# list of conditions and the following disclaimer. -# -# 2. Redistributions in binary form must reproduce the above copyright notice, -# this list of conditions and the following disclaimer in the documentation -# and/or other materials provided with the distribution. -# -# 3. Neither the name of the copyright holder nor the names of its -# contributors may be used to endorse or promote products derived from -# this software without specific prior written permission. 
-# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE -# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE -# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL -# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR -# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER -# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, -# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -# -"""Module containing ros config parsing infrastructure.""" - -import pathlib - -from semantic_inference import Config - - -# TODO(nathan) test this to make sure list and dictionary behavior is correct -def load_ros_params(ns=""): - """Load all params.""" - - def _relative_to(ns, name): - if ns == "": - return name - - return str(pathlib.Path(name).relative_to(pathlib.Path(ns))) - - def _insert_nested(params, name, value): - if len(name) and name[0] == "/": - name = name[1:] - - if "/" not in name: - if name not in params: - params[name] = value - elif isinstance(value, list): - params[name] += value - elif isinstance(value, dict): - params[name].update(value) - else: - params[name] = value - - return - - parts = name.split("/") - curr, rest = parts[0], parts[1:] - if curr not in params: - params[curr] = {} - - _insert_nested(params[curr], "/".join(rest), value) - - # resolved = "" if ns == "" else rospy.resolve_name(ns) - # param_names = rospy.get_param_names() - params = {} - # for name in param_names: - # if resolved == "" or name.find(resolved) == 0: - # _insert_nested(params, _relative_to(resolved, name), rospy.get_param(name)) - - return params - - -def load_from_ros(cls, ns=""): - 
"""Populate a config from ros params.""" - assert issubclass(cls, Config), f"{cls} is not a config!" - - instance = cls() - # params = load_ros_params(ns) - # instance.update(params) - return instance diff --git a/semantic_inference_ros/semantic_inference_ros/ros_conversions.py b/semantic_inference_ros/semantic_inference_ros/ros_conversions.py index bd34d49..78199ce 100644 --- a/semantic_inference_ros/semantic_inference_ros/ros_conversions.py +++ b/semantic_inference_ros/semantic_inference_ros/ros_conversions.py @@ -71,6 +71,12 @@ def to_stamped_feature(header, feature): msg.feature = Conversions.to_feature(feature) return msg + @classmethod + def to_image_msg(cls, header, img, encoding="passthrough"): + msg = cls.bridge.cv2_to_imgmsg(img, encoding=encoding) + msg.header = header + return msg + @classmethod def to_feature_image(cls, header, results): """ diff --git a/semantic_inference_ros/semantic_inference_ros/ros_logging.py b/semantic_inference_ros/semantic_inference_ros/ros_logging.py index f36f81a..245e13c 100644 --- a/semantic_inference_ros/semantic_inference_ros/ros_logging.py +++ b/semantic_inference_ros/semantic_inference_ros/ros_logging.py @@ -51,8 +51,8 @@ def __init__(self, node, **kwargs): def emit(self, record): """Send message to ROS.""" - level = record.levelno if record.levelno in self.level_map else logging.CRITICAL - self._level_map[level](f"{record.name}: {record.msg}") + lno = record.levelno if record.levelno in self._level_map else logging.CRITICAL + self._level_map[lno](f"{record.name}: {record.msg}") def setup_ros_log_forwarding(node, level=logging.INFO):