Automoy is a modular, GPU-accelerated autonomous agent framework that blends LLM reasoning, visual parsing, and real-world OS-level control.
Originally based on Self-Operating Computer by OthersideAI, Automoy has evolved into a full-stack, pluggable system capable of UI automation, screen parsing, and soon, robotics.
Automoy turns screenshots and objectives into action.
- Full GPU acceleration for OCR and computer vision tasks
- OmniParser-based UI analysis
- Modular support for LLMs like Deepseek, Ollama, GPT-4, Mistral
- Web UI (God Mode) for real-time monitoring, task queue, and execution feedback
- Structured plugin and script support
- Installer for one-click setup on Windows
automoy-v2/
├── config/ # Config loader and runtime config.txt/py
│ ├── config.py
│ └── config.txt
│
├── core/
│ ├── main.py # Entrypoint
│ ├── operate.py # Main Automoy execution logic
│ ├── prompts.py # Prompt crafting
│ ├── lm_interface.py # Core LLM interface (used in legacy and current modules)
│ ├── exceptions.py # Custom exception handling
│ ├── lm_interfaces/
│ │ ├── main_interface.py
│ │ └── handlers/ # LLM input/output formatters
│ ├── utils/
│ │ ├── omniparser/ # OmniParser interface and sample output
│ │ ├── operating_system/
│ │ │ └── os_interface.py
│ │ ├── region/ # Region mapping and coordinate translation
│ │ │ ├── mapper.py
│ │ │ └── area_selector.py
│ │ ├── vmware/ # VMWare controller interface
│ │ └── web_scraping/ # Scraping and HTML UI parsing
│
├── dependencies/
│ └── OmniParser-master/ # OmniParser engine directory
│
├── evaluations/ # Benchmarking + test scripts
│
├── gui/
│ ├── gui.py # FastAPI frontend + control panel
│ ├── static/ # Web assets (JS, CSS)
│ └── templates/ # HTML templates
│
├── installer/
│ ├── setup.py # Launch installer
│ ├── venv_setup.py
│ ├── cuda_setup.py
│ ├── omniparser_setup.py
│ ├── vmware_setup.py
│ ├── requirements.txt
│ ├── downloads/
│ └── aria2c.exe # Download utility
│
├── scripts/ # Custom automation routines
├── README.md
- Screenshot Taken (manually or on loop)
- OmniParser parses UI elements from image
- LLM interprets context and decides next step
- Automoy acts (mouse clicks, keyboard input, etc.)
Automoy "sees" the desktop, "thinks" about what to do, and then "acts" on the system directly.
- 2GB VRAM GPU
- 8GB RAM
- CUDA 11.8+
- 16GB RAM
- RTX-class GPU (3060+)
Automoy requires CUDA to accelerate visual tasks. If CUDA is misconfigured, Automoy will not launch.
- Windows 10+ (official)
- Linux: community support coming soon
- macOS: Not supported
CUDA is NVIDIA-exclusive, but the following projects offer possible AMD compatibility:
- SCALE Toolkit – CUDA-on-AMD translation layer
- ZLUDA – Converts CUDA binaries to Intel/AMD-compatible code
These are not officially supported, but adventurous users may test and report results.
- [IN-P] Web-based control panel (God Mode)
- [IN-P] Script/plugin loader with hot-reload capability
- CUDA installer w/ PyTorch verification for GPU acceleration
- Multi-screen support for wider operational contexts
- OCR fallback pipeline to support CPU-only environments
- Real-time frame analysis loop (stream-based parsing for dynamic UI)
- OpenCL + integrated graphics support (non-CUDA fallback and broader hardware support)
- Portable Automoy OS under
os/(USB-bootable image for appliance deployment) - MOE (Mixture of Experts) agent logic and critical reasoning support
- Advanced geometric reasoning and spatial awareness via vision model extensions
- NVIDIA Isaac Sim integration (robotics interfaces modularized under
utils/) - Circuit simulator integration for electrical prototyping (e.g. Falstad, ngspice)
- Support for additional simulation tools (physics, CAD, electronics, or virtual environments)
- Adaptive manufacturing mode: Automoy as an orchestrator of physical/digital manufacturing workflows (via robotic arm + sensors)
- Clone the repo
- Navigate to
/installer - Run
setup.pyor usearia2c.exeto fetch dependencies - Launch
main.pyfromcore/
Automoy can be deployed in custom hardware:
- Touchscreens (5"–7" HDMI + USB)
- RGB lighting (WS2812 via ESP32 or Arduino)
- 3D-printed cases for standalone or rack-mounted setups
This makes Automoy a physical, interactive AI agent appliance.
- Based on original ideas from OthersideAI
- OmniParser integration by Microsoft
- Inspired by the vision of local-first, agentic computing
Built by Technologies-AI