Production-ready Linux Bridge for Official OSWorld Benchmark
The Linux implementation of AxonBridge, enabling AxonHub to control Ubuntu desktop environments for official OSWorld 369-task benchmarking.
AXONBRIDGE-Linux sits between AxonHub and the Ubuntu desktop in the official OSWorld benchmark pipeline:
Mac (AxonHub Brain)
↓ gRPC
Ubuntu VM (AXONBRIDGE-Linux)
↓ xdotool, X11, wmctrl
Ubuntu Desktop & Apps
↓ LibreOffice, GIMP, Chrome, etc.
Official OSWorld 369 Tasks
↓ xlang-ai/OSWorld evaluators
VERIFIED RESULTS ✅
- ✅ Keyboard injection (xdotool)
- ✅ Mouse clicks, movements, drags
- ✅ Modifier keys (Ctrl, Shift, Alt, Super)
- ✅ Special keys (Return, Escape, Arrows, F-keys)
- ✅ Text typing with natural delays
- ✅ Retry logic for reliability
- ✅ Screenshot capture (PNG format)
- ✅ JPEG encoding with quality control
- ✅ Multiple fallback methods (scrot, import, gnome-screenshot)
- ✅ Window-specific screenshots
- ✅ Performance optimized (80-120ms per full-screen PNG capture)
- ✅ Window list (all visible windows)
- ✅ Process list (running applications)
- ✅ Active window detection
- ✅ Window management (focus, close)
- ✅ Desktop/workspace info
- ✅ System information
- ✅ Comprehensive error handling
- ✅ Structured logging (tracing)
- ✅ Automatic retries
- ✅ Graceful degradation
- ✅ Unit tests
- ✅ Documentation
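The screenshot fallback order listed above (scrot, import, gnome-screenshot) can be sketched as a small selection routine. This is only an illustration in Python, not the Bridge's Rust code: `pick_screenshot_command` and `SCREENSHOT_TOOLS` are hypothetical names, and the argument layouts are the usual ones for each CLI rather than anything confirmed by this README.

```python
import shutil
from typing import Callable, List, Optional

# Candidate tools in the fallback order given in the feature list.
# The argv layouts are the common invocations of each CLI; how the Bridge
# itself calls them is an assumption.
SCREENSHOT_TOOLS = [
    ("scrot", lambda path: ["scrot", path]),
    ("import", lambda path: ["import", "-window", "root", path]),
    ("gnome-screenshot", lambda path: ["gnome-screenshot", "-f", path]),
]


def pick_screenshot_command(
    output_path: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[List[str]]:
    """Return the argv for the first installed tool, or None if none is found."""
    for tool, build_argv in SCREENSHOT_TOOLS:
        if which(tool) is not None:
            return build_argv(output_path)
    return None
```

The returned list can be passed to `subprocess.run` as-is. The `which` parameter exists only so the lookup can be replaced in tests.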
Ubuntu 22.04 LTS (recommended for OSWorld compatibility)
# System dependencies
sudo apt update
sudo apt install -y \
xdotool \
wmctrl \
scrot \
x11-utils \
imagemagick \
build-essential \
curl \
git
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env
# Clone repository
git clone https://github.com/TheMailmans/AXONBRIDGE
cd AXONBRIDGE/linux
# Build release version
cargo build --release
# Binary location
./target/release/axonbridge
# Run Bridge (listens on 0.0.0.0:50051)
./target/release/axonbridge
# Output:
# [INFO] AXONBRIDGE-Linux v1.0.0
# [INFO] Starting gRPC server on 0.0.0.0:50051
# [INFO] Ready to receive commands from AxonHub
import grpc
import agent_pb2
import agent_pb2_grpc
# Connect to Bridge (use your Ubuntu VM IP)
channel = grpc.insecure_channel('192.168.64.5:50051')
stub = agent_pb2_grpc.DesktopAgentStub(channel)
# Register
response = stub.RegisterAgent(agent_pb2.ConnectRequest())
print(f"Connected! Agent: {response.agent_id}")
# Test keyboard
stub.InjectKeyPress(agent_pb2.KeyPressRequest(
agent_id=response.agent_id,
key='space',
modifiers=['cmd']
))
# Test screenshot
screenshot = stub.CaptureScreenshot(agent_pb2.ScreenshotRequest(
agent_id=response.agent_id
))
print(f"Screenshot captured: {len(screenshot.image_data)} bytes")
# Test window list
windows = stub.GetWindowList(agent_pb2.GetWindowListRequest(
agent_id=response.agent_id
))
print(f"Windows: {list(windows.windows)}")
axonbridge-linux/
├── src/
│ ├── main.rs # gRPC server entry point
│ ├── input_injection_linux.rs # Keyboard & mouse control
│ ├── screenshot_linux.rs # Screen capture
│ ├── system_queries_linux.rs # Window/process queries
│ ├── grpc_service.rs # gRPC service implementation
│ └── config.rs # Configuration management
├── proto/
│ └── agent.proto # gRPC protocol definition
├── config/
│ └── bridge.toml # Configuration file
├── Cargo.toml # Rust dependencies
└── README.md
Service: DesktopAgent
Methods:
- RegisterAgent() - Register new agent connection
- InjectKeyPress() - Press keyboard key with modifiers
- InjectMouseClick() - Click mouse button
- InjectMouseMove() - Move mouse to coordinates
- CaptureScreenshot() - Capture screen image
- GetWindowList() - List all visible windows
- GetProcessList() - List running processes
- GetActiveWindow() - Get focused window title
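Calls to these methods can fail transiently, for example while the Bridge is still starting. The sketch below shows one way a Hub-side caller might wrap a call in retries. It is an illustration only: `call_with_retries` is a hypothetical helper, not part of the Bridge or the generated stubs, and the default of three retries is an assumption.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 3,
    delay_s: float = 0.05,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call fn(); if it raises, retry up to max_retries more times."""
    attempt = 0
    while True:
        try:
            return fn()
        except retry_on:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            attempt += 1
            time.sleep(delay_s)  # brief pause before the next attempt
```

With the generated stub, a call might look like `call_with_retries(lambda: stub.GetWindowList(request), retry_on=(grpc.RpcError,))`, where `request` is a previously built `GetWindowListRequest`.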
File: config/bridge.toml
[server]
host = "0.0.0.0"
port = 50051
[input]
key_delay_ms = 10
modifier_delay_ms = 50
max_retries = 3
[screenshot]
default_format = "png"
jpeg_quality = 80
capture_timeout_ms = 5000
[logging]
level = "info"
format = "json"
output = "stdout"
# Run all tests
cargo test
# Run specific module tests
cargo test input_injection
cargo test screenshot
cargo test system_queries
# Run with output
cargo test -- --nocapture
# Test Bridge with Hub
cd ~/Documents/Projects/ThinkBackHub
python3 test_bridge_connection.py
# Expected output:
# ✅ Connected to Bridge
# ✅ Keyboard injection works
# ✅ Screenshot capture works
# ✅ Window queries work
Benchmarks (Ubuntu 22.04, Intel i5, 8GB RAM):
| Operation | Latency | Notes |
|---|---|---|
| Key press | 10-15ms | Single key |
| Key combo | 50-70ms | With modifiers |
| Mouse click | 5-10ms | At current position |
| Mouse move | 10-15ms | To new position |
| Screenshot | 80-120ms | Full screen PNG |
| Window list | 30-50ms | All windows |
| Process list | 100-150ms | All processes |
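To reproduce figures like these on your own VM, a small timing helper is enough. The sketch below is an assumption about how one might measure latency from the client side, not the method used to produce the table. `measure_latency_ms` is a hypothetical name, and client-side numbers also include gRPC and network overhead.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency_ms(operation: Callable[[], object], runs: int = 20) -> Dict[str, float]:
    """Time `operation` over several runs and summarize the results in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }
```

For example, `measure_latency_ms(lambda: stub.CaptureScreenshot(request))` would time the screenshot round trip, with `stub` and `request` set up as in the quick-start client.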
# Check if port is in use
sudo lsof -i :50051
# Kill existing process
sudo killall axonbridge
# Check logs
journalctl -u axonbridge -f
# Verify X11 display
echo $DISPLAY
# Should output: :0 or :1
# Test xdotool manually
xdotool key space
# Check xdotool is installed
which xdotool
# Check available tools
which scrot
which import
which gnome-screenshot
# Install missing tools
sudo apt install scrot imagemagick
# Test screenshot manually
scrot test.png
# Verify wmctrl works
wmctrl -l
# Install if missing
sudo apt install wmctrl
# Check window manager
echo $XDG_CURRENT_DESKTOP
- Bridge listens on 0.0.0.0:50051 by default
- Production: Use firewall to restrict access to Hub IP only
- Development: Safe on isolated VM network
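Whether the port is actually reachable from a given machine can be checked with a plain TCP connect. This is a minimal sketch using only the Python standard library. `is_port_reachable` is a hypothetical helper, and a successful connect only shows that something is listening, not that it is the Bridge.

```python
import socket


def is_port_reachable(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Run from the Hub, `is_port_reachable("192.168.64.5", 50051)` should return True. Run from any other host after firewalling, it should return False.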
# Restrict to Hub IP only
sudo ufw allow from 192.168.64.1 to any port 50051
sudo ufw deny 50051
- All inputs are validated before execution
- Command injection protection
- Path traversal prevention
- Rate limiting (configurable)
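As an illustration of the validation and command-injection points above, the sketch below checks a key press against an allow-list and builds an argument list that never passes through a shell. It is a Python sketch under stated assumptions, not the Bridge's Rust implementation: `build_xdotool_key_argv`, `KEY_PATTERN`, and `ALLOWED_MODIFIERS` are hypothetical names, and the allowed character set is a guess at a reasonable policy.

```python
import re
from typing import List, Sequence

# Modifier names from the feature list; xdotool accepts these as aliases.
ALLOWED_MODIFIERS = {"ctrl", "shift", "alt", "super"}

# Keysym-like names only (e.g. "space", "Return", "F5", "Page_Up").
# Rejecting everything else keeps shell metacharacters and leading
# dashes out of the command line.
KEY_PATTERN = re.compile(r"[A-Za-z0-9_]{1,32}")


def build_xdotool_key_argv(key: str, modifiers: Sequence[str] = ()) -> List[str]:
    """Validate a key press and return an argv list for `xdotool key`."""
    if not KEY_PATTERN.fullmatch(key):
        raise ValueError(f"rejected key name: {key!r}")
    parts = []
    for modifier in modifiers:
        name = modifier.lower()
        if name not in ALLOWED_MODIFIERS:
            raise ValueError(f"rejected modifier: {modifier!r}")
        parts.append(name)
    parts.append(key)
    # xdotool expects combinations joined with '+', e.g. "ctrl+shift+t".
    return ["xdotool", "key", "+".join(parts)]
```

Passing the resulting list to `subprocess.run` without `shell=True` means the key name is handed to xdotool as a single argument and is never interpreted by a shell.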
Bridge uses structured logging with tracing:
# Set log level
export RUST_LOG=info
# Available levels: trace, debug, info, warn, error
# Debug mode
export RUST_LOG=debug
./target/release/axonbridge
# JSON output (for log aggregation)
export RUST_LOG=info
export RUST_LOG_FORMAT=json
./target/release/axonbridge
File: /etc/systemd/system/axonbridge.service
[Unit]
Description=AXONBRIDGE-Linux Desktop Agent
After=network.target graphical.target
[Service]
Type=simple
User=osworld
WorkingDirectory=/home/osworld/AXONBRIDGE/linux
ExecStart=/home/osworld/AXONBRIDGE/linux/target/release/axonbridge
Restart=always
RestartSec=5
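Environment="RUST_LOG=info"
# xdotool, wmctrl, and scrot need an X11 display; use the value that `echo $DISPLAY` prints in the desktop session (XAUTHORITY may also be required)
Environment="DISPLAY=:0"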
[Install]
WantedBy=multi-user.target
Setup:
# Create service file
sudo nano /etc/systemd/system/axonbridge.service
# (paste content above)
# Reload systemd
sudo systemctl daemon-reload
# Enable auto-start
sudo systemctl enable axonbridge
# Start service
sudo systemctl start axonbridge
# Check status
sudo systemctl status axonbridge
# View logs
sudo journalctl -u axonbridge -f
# Generate Rust docs
cargo doc --open
# Opens browser with full API documentation
This Bridge is specifically designed for official OSWorld 369-task benchmarking:
- ✅ LibreOffice (Writer, Calc, Impress)
- ✅ GIMP
- ✅ Google Chrome
- ✅ Thunderbird
- ✅ VLC Media Player
- ✅ VS Code
- ✅ File Manager (Nautilus)
- ✅ System Settings
1. AxonHub (Mac) receives OSWorld task
2. Hub LLM (Claude) analyzes BEFORE state
3. Hub sends commands to Bridge (Ubuntu)
4. Bridge executes on Ubuntu desktop
5. Hub captures AFTER state
6. OSWorld evaluator scores result
7. Repeat for all 369 tasks
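The seven steps above amount to a simple per-task loop. The sketch below shows that control flow with every external piece (state capture, LLM planning, Bridge execution, OSWorld evaluation) passed in as a callable. All names here are hypothetical, and the real Hub code is not shown in this README.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class TaskResult:
    task_id: str
    score: float


def run_benchmark(
    task_ids: Iterable[str],
    capture_state: Callable[[], bytes],
    plan_commands: Callable[[str, bytes], List[str]],
    execute: Callable[[str], None],
    evaluate: Callable[[str, bytes], float],
) -> List[TaskResult]:
    """Run each task once and collect the evaluator's score."""
    results = []
    for task_id in task_ids:                            # steps 1 and 7
        before = capture_state()                        # step 2: BEFORE state
        for command in plan_commands(task_id, before):  # step 3: Hub plans commands
            execute(command)                            # step 4: Bridge executes
        after = capture_state()                         # step 5: AFTER state
        results.append(TaskResult(task_id, evaluate(task_id, after)))  # step 6
    return results
```

Keeping the four collaborators as parameters lets the loop be exercised with stubs before it is pointed at a live Bridge and the OSWorld evaluators.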
- Goal: Run official OSWorld 369 tasks
- Platform: Ubuntu 22.04 LTS
- Evaluator: xlang-ai/OSWorld (official, unmodified)
- Submission: OSWorld VERIFIED leaderboard
MIT License - see LICENSE file
Contributions welcome! Please:
- Fork the repository
- Create feature branch
- Add tests for new features
- Submit pull request
- Issues: https://github.com/TheMailmans/AXONBRIDGE/issues
- OSWorld: https://github.com/xlang-ai/OSWorld
- OSWorld Team - xlang-ai for the benchmark framework
- xdotool - Jordan Sissel for X11 automation
- wmctrl - Tomas Styblo for window management
Built for Official OSWorld Benchmarking 🚀