Paper: LEO-RobotAgent: A General-purpose Robotic Agent for Language-driven Embodied Operator
LEO-RobotAgent is a general-purpose robotic agent framework based on Large Language Models (LLMs). Under this framework, LLMs can operate different types of robots in various scenarios to complete complex, previously unseen tasks, demonstrating strong generalization and robustness.
The LLM-based general-purpose robotic agent framework, LEO-RobotAgent, is shown in the figure above. Within this framework, the large model can autonomously think, plan, and act. We provide a modular, easily registrable collection of tools, enabling the LLM to flexibly invoke different tools according to the task at hand. The framework also provides a human-computer interaction mechanism, allowing the agent to collaborate with humans as a partner.
The LLM relies on the preset prompts and the user's task to output information, actions, and action parameters. The tool collection can cover various domains as needed; each tool is registered with basic information such as its enable status, name, corresponding function, and description. Observations return different feedback content depending on the tool. During the loop, the history is continuously accumulated and passed back to the LLM for subsequent steps.
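To make this loop concrete, the following is a simplified sketch of the think-act-observe cycle with a registrable tool collection. It is illustrative only and not the actual implementation in src/agent/src/api_agent.py; the helper names call_llm and parse_action and the tool-entry fields are assumptions.

```python
# Simplified sketch of the agent loop (illustrative; not the project's real code).
tools = {
    # Each registered tool carries an enable status, a function, and a description.
    "uav_fly": {"enabled": True,
                "func": lambda params: "flew to " + str(params),
                "description": "Fly the UAV to a target position."},
}

def run_agent(task, call_llm, parse_action, max_steps=10):
    history = []                                  # accumulated action/observation records
    for _ in range(max_steps):
        reply = call_llm(task, history)           # preset prompt + user task + history
        name, params, done = parse_action(reply)  # extract action, parameters, or final answer
        if done:
            return reply                          # final answer shown to the user
        tool = tools.get(name)
        if tool and tool["enabled"]:
            observation = tool["func"](params)    # tool feedback becomes the Observation
        else:
            observation = f"Unknown or disabled tool: {name}"
        history.append({"action": name, "params": params, "observation": observation})
```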
The figure above shows an application system designed around LEO-RobotAgent. We built this complete system based on ROS and Web technologies. Users can directly operate the visual interface to configure the available tools, converse and interact with the Agent, monitor topics, and so on. The system is easy to get started with and easy to extend in terms of tool registration and node management.
The demonstration video above presents four sets of experiments: basic feature verification, a real UAV experiment, a UAV urban search experiment, and a long-horizon task experiment with a wheeled robot equipped with a robotic arm.
The simulation and corresponding real-world experiments are shown above. An example of the Agent's operation process and output during a task can be found in this file.
Our framework has been verified for feasibility on UAVs, custom-made wheeled mobile robots (with robotic arms), and mechanical dogs. The project includes ready-made control nodes for each of them.
The maps provided in src/agent/world are based on gazebo_models_worlds_collection.
- Development Environment: Ubuntu 20.04 + ROS Noetic. The core framework works in other environments, but the robot-specific parts may need your own adaptation. The following installation steps may omit some minor libraries and are for reference only; supplements are welcome.
- First, download this repository to your workspace.
- Install Python dependencies (Python 3.8 confirmed to work):
pip install -r requirements.txt
- Install web_video_server and rosbridge:
sudo apt install ros-noetic-rosbridge-suite ros-noetic-web-video-server
- Install gazebo_models_worlds_collection.
- Configure the LLM API:
echo 'export OPENAI_API_KEY="your key"' >> ~/.bashrc
echo 'export OPENAI_BASE_URL="your url"' >> ~/.bashrc
source ~/.bashrc
Note: This project uses the Qwen3 series models, including Qwen-VL, so adaptation is only ensured for these models. If there are conflicts in the LLM output format, you can modify it in src/agent/src/api_agent.py.
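If you want to verify the key and base URL before starting the Agent, a quick check along these lines can help. This is a sketch using the openai Python package; the model name is an assumption and should match whichever Qwen3 model your endpoint serves.

```python
# Minimal connectivity check for the OpenAI-compatible endpoint (illustrative).
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url=os.environ["OPENAI_BASE_URL"],
)

resp = client.chat.completions.create(
    model="qwen-plus",  # assumption: replace with the Qwen3 model served by your endpoint
    messages=[{"role": "user", "content": "Reply with OK if you can read this."}],
)
print(resp.choices[0].message.content)
```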
Next are the dependencies required for the corresponding robots. Below are the robots already adapted to LEO-RobotAgent; configure as needed. Remember to run catkin_make and source your workspace at the end.
# Download
git clone https://github.com/PX4/PX4-Autopilot.git --recursive
# Finish remaining downloads
cd PX4-Autopilot/
git submodule update --init --recursive
# Execute script
cd ..
bash ./PX4-Autopilot/Tools/setup/ubuntu.sh
# If errors occur, execute:
bash ./PX4-Autopilot/Tools/setup/ubuntu.sh --fix-missing
# Environment variables: open ~/.bashrc with nano or gedit and append the following lines
sudo gedit ~/.bashrc
source ~/PX4-Autopilot/Tools/simulation/gazebo-classic/setup_gazebo.bash ~/PX4-Autopilot ~/PX4-Autopilot/build/px4_sitl_default
export ROS_PACKAGE_PATH=$ROS_PACKAGE_PATH:~/PX4-Autopilot
export ROS_PACKAGE_PATH=$ROS_PACKAGE_PATH:~/PX4-Autopilot/Tools/simulation/gazebo-classic/sitl_gazebo-classic
# Finally
source ~/.bashrc
- Mavros
sudo apt-get install ros-noetic-mavros ros-noetic-mavros-extras
- MoveIt
sudo apt install ros-noetic-moveit
sudo apt install ros-noetic-moveit-setup-assistant
- Plugins:
sudo apt install ros-noetic-gazebo-ros-pkgs ros-noetic-gazebo-plugins
sudo apt install ros-noetic-ros-control ros-noetic-ros-controllers
# Gripper plugin, can be installed in your workspace or globally
cd ~/catkin_ws/src
git clone https://github.com/JenniferBuehler/gazebo-pkgs.git
cd ..
rosdep install --from-paths src --ignore-src -r -y
catkin_make
- Unitree GO1 (other models also work): install the following packages yourself: unitree_guide, unitree_ros, unitree_ros_to_real.
Then execute the following to adapt the topics to the LEO-RobotAgent framework:
cp src/agent/utils/KeyBoard.cpp src/unitree_guide/unitree_guide/src/interface/KeyBoard.cpp

Our application system is Web-based; the interface is shown above. The System panel in the top left can start various preset terminal commands (including but not limited to roslaunch and rosrun), and you can add your own. LEO-RobotAgent is our core architecture. Each button actually opens a terminal (closing the corresponding terminal shuts down the node), which makes debugging output easy to follow. The Camera panel allows switching between and viewing Image-format topics.
The Tools panel lets you set the tools available to the Agent. You can check tools to enable them, or double-click a cell to edit its configuration, as in a spreadsheet. You can also add new tools via the button at the bottom (not visible in the image). Any changes must be saved by clicking Save, and the LEO-RobotAgent node must be restarted for them to take effect.
On the right is the chat interface. Enter a command to issue a task. You can also type during task execution to interrupt it, temporarily modify the task, or point out errors. After the current stage of the task is completed, the Agent outputs the final answer in a green bubble, and you can continue to issue tasks afterwards (memory is retained). Blue bubbles indicate tool-calling Actions, and yellow bubbles indicate Observation results.
Preset questions, tool configurations, and preset terminal commands are saved under src/agent/config and are automatically loaded every time the web interface is opened. You can make more detailed additions, deletions, and modifications there, or check which program file handles them and develop it further yourself.
- First, start the server: python3 src/agent/webui/server.py
- Next, open src/agent/webui/web_ui.html in a browser. Then start RosBridge and VideoServer (if you want the camera feed) from the System panel.
- Then, depending on the robot:
  - UAV:
    - Configure and save the needed tools in the Tools panel (uav_fly is necessary).
    - Sequentially start via the buttons: QGC, UAV sim, UAV fly (wait for gazebo to load fully), Vision, LEO-RobotAgent.
  - Wheeled Robot with Arm:
    - Configure and save the needed tools in the Tools panel (car_run and arm_grasp are necessary).
    - Sequentially start via the buttons: Car sim, Car ctrl (wait for gazebo to load fully), Arm ctrl, Vision, LEO-RobotAgent.
  - Mechanical Dog:
    - Configure and save the needed tools in the Tools panel (dog_run is necessary).
    - Sequentially start via the buttons: Dog sim, Dog joint (wait for gazebo to load fully), Dog ctrl, Vision, LEO-RobotAgent.
- Finally, input commands in the chat interface to issue tasks for automatic execution.
- The Vision node provides VLM and object detection as visual tools. The VLM implementation can be rewritten in vision.py to suit different model interfaces; object detection uses yolov8l-worldv2. You can choose and download models from Ultralytics and place them in src/agent/weights; a usage sketch is shown after this list.
- You can fill in uav, car, or dog in src/agent/config/vision_device.txt to adapt to the corresponding cameras and other topics.
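For reference, open-vocabulary detection with the yolov8l-worldv2 weights via the Ultralytics API typically looks like the sketch below. This is a standalone example, not the project's vision.py; the image path and class list are placeholders.

```python
# Open-vocabulary detection with YOLO-World via Ultralytics (standalone sketch).
from ultralytics import YOLO

model = YOLO("src/agent/weights/yolov8l-worldv2.pt")   # weights downloaded from Ultralytics
model.set_classes(["person", "red box", "bottle"])     # text prompts for the classes of interest

results = model.predict("test.jpg", conf=0.25)         # placeholder image path
for box in results[0].boxes:
    cls_name = results[0].names[int(box.cls)]
    print(cls_name, box.conf.item(), box.xyxy.tolist())
```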
If you want to develop new tools based on this project, here is an example of creating the simplest tool.
- First, define a new function add under AgentTools in src/agent/src/tools.py:
def add(self, nums):
    return nums['a'] + nums['b']
- Then add a tool in the Web Tools panel, fill in the corresponding fields, check it, and save, for example:
Name: add, Function: add, Description: Input a dictionary with keys a and b. Return: the result of a + b.
- It is now ready for use. Following this pattern, you can also implement complex algorithms in your own project and register them into tools.py, using ROS topics as the interface; see the sketch below.
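For instance, a tool that hands a goal to one of your own ROS nodes over a topic and waits for the result could look roughly like this. It is a hedged sketch: the topic names /my_algo/goal and /my_algo/result, the String message type, and the method name are hypothetical and must match your own node.

```python
# Hypothetical AgentTools method wrapping an external algorithm via ROS topics (sketch only).
import rospy
from std_msgs.msg import String

def run_my_algo(self, params):
    # Assumes a rospy node is already initialized by the Agent process.
    pub = rospy.Publisher('/my_algo/goal', String, queue_size=1, latch=True)
    pub.publish(String(data=str(params)))          # hand the goal to your node
    try:
        result = rospy.wait_for_message('/my_algo/result', String, timeout=30.0)
        return result.data                         # returned to the LLM as the Observation
    except rospy.ROSException:
        return "my_algo timed out"
```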
src/agent/src/api_agent.py contains the core code of this framework. The prompts within it can be modified to suit your own tasks. The Tools and Vision nodes also use the LLM and VLM and can be modified independently.
- Instead of using the Web buttons, you can also open multiple terminals manually and run these commands yourself.
- Vision Node
source ./devel/setup.bash && rosrun agent vision.py
- Agent Node
source ./devel/setup.bash && rosrun agent api_agent.py
- QGC Ground Station (in home directory)
./QGroundControl.AppImage
- Mavros + PX4 launch file
roslaunch px4 mavros_posix_sitl.launch
# Choose your own world
roslaunch px4 mavros_posix_sitl.launch world:=/path/to/your.world
# Takeoff/Land commands
commander takeoff
commander land
- UAV Control Node
source ./devel/setup.bash && rosrun agent fly.py
- Car Launch
source ./devel/setup.bash
# Without Arm
roslaunch agent gazebo_car.launch
# With Arm
roslaunch armcar_moveit_config demo_gazebo.launch
# No GUI
roslaunch armcar_moveit_config demo_gazebo.launch gazebo_gui:=false
- Car Nodes
# Car Control Node
source ./devel/setup.bash && rosrun agent car_ctrl.py
# Arm Control Node
source ./devel/setup.bash && rosrun agent arm_ctrl.py
- Dog Launch
source ./devel/setup.bash && roslaunch unitree_guide gazeboSim.launch
- Dog Nodes
# Dog Joint Control
./devel/lib/unitree_guide/junior_ctrl
# Dog Control Node
source ./devel/setup.bash && rosrun agent dog_ctrl.py

If you find this project useful in your research, please consider citing:
@article{chen2025leorobotagent,
title={LEO-RobotAgent: A General-purpose Robotic Agent for Language-driven Embodied Operator},
author={Chen, Lihuang and Luo, Xiangyu and Meng, Jun},
journal={arXiv preprint arXiv:2512.10605},
year={2025}
}





