Automoy V2

Automoy is a modular, GPU-accelerated autonomous agent framework that blends LLM reasoning, visual parsing, and real-world OS-level control.

Originally based on Self-Operating Computer by OthersideAI, Automoy has evolved into a full-stack, pluggable system capable of UI automation, screen parsing, and, soon, robotics.

Automoy turns screenshots and objectives into action.

✨ What's New in V2?

Full GPU acceleration for OCR and computer vision tasks
OmniParser-based UI analysis
Modular support for LLM backends such as DeepSeek, Ollama, GPT-4, and Mistral
Web UI (God Mode) for real-time monitoring, task queueing, and execution feedback
Structured plugin and script support
Installer for one-click setup on Windows
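
One way to picture the modular LLM support listed above is a small registry that maps a backend name to a callable. This is only a sketch under stated assumptions: `register_backend`, `query`, and the `echo` backend are hypothetical names made up for illustration and are not taken from Automoy's code.

```python
from typing import Callable, Dict

# A backend is modeled here as "prompt in, completion out" (an assumption).
Backend = Callable[[str], str]

_BACKENDS: Dict[str, Backend] = {}


def register_backend(name: str) -> Callable[[Backend], Backend]:
    """Decorator that stores a backend under a case-insensitive name."""
    def decorator(fn: Backend) -> Backend:
        _BACKENDS[name.lower()] = fn
        return fn
    return decorator


def query(backend_name: str, prompt: str) -> str:
    """Send the prompt to the named backend, or fail with a clear error."""
    try:
        backend = _BACKENDS[backend_name.lower()]
    except KeyError:
        known = ", ".join(sorted(_BACKENDS)) or "none"
        raise ValueError(
            f"unknown backend {backend_name!r} (registered: {known})"
        ) from None
    return backend(prompt)


# Hypothetical stand-in backend; a real one would call a model API.
@register_backend("echo")
def echo_backend(prompt: str) -> str:
    return f"echo: {prompt}"
```

With this shape, adding a new model means registering one more function, and the rest of the agent only ever calls `query`.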

🧠 Project Overview

automoy-v2/
├── config/               # Config loader and runtime config.txt/py
│   ├── config.py
│   └── config.txt
│
├── core/
│   ├── main.py           # Entrypoint
│   ├── operate.py        # Main Automoy execution logic
│   ├── prompts.py        # Prompt crafting
│   ├── lm_interface.py   # Core LLM interface (used in legacy and current modules)
│   ├── exceptions.py     # Custom exception handling
│   ├── lm_interfaces/
│   │   ├── main_interface.py
│   │   └── handlers/     # LLM input/output formatters
│   ├── utils/
│   │   ├── omniparser/   # OmniParser interface and sample output
│   │   ├── operating_system/
│   │   │   └── os_interface.py
│   │   ├── region/       # Region mapping and coordinate translation
│   │   │   ├── mapper.py
│   │   │   └── area_selector.py
│   │   ├── vmware/       # VMware controller interface
│   │   └── web_scraping/ # Scraping and HTML UI parsing
│
├── dependencies/
│   └── OmniParser-master/ # OmniParser engine directory
│
├── evaluations/          # Benchmarking + test scripts
│
├── gui/
│   ├── gui.py            # FastAPI frontend + control panel
│   ├── static/           # Web assets (JS, CSS)
│   └── templates/        # HTML templates
│
├── installer/
│   ├── setup.py          # Launch installer
│   ├── venv_setup.py
│   ├── cuda_setup.py
│   ├── omniparser_setup.py
│   ├── vmware_setup.py
│   ├── requirements.txt
│   ├── downloads/
│   └── aria2c.exe        # Download utility
│
├── scripts/              # Custom automation routines
└── README.md
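
The tree describes `region/` as handling region mapping and coordinate translation. A minimal sketch of what such a translation might look like follows, assuming the parser reports bounding boxes as normalized `(x_min, y_min, x_max, y_max)` ratios. That box format and the function name `bbox_center_to_pixels` are assumptions for illustration, not Automoy's actual `mapper.py` API.

```python
from typing import Tuple

# Assumed format: (x_min, y_min, x_max, y_max), each a ratio in [0, 1].
BBox = Tuple[float, float, float, float]


def bbox_center_to_pixels(
    bbox: BBox, screen_width: int, screen_height: int
) -> Tuple[int, int]:
    """Return the pixel coordinates of the center of a normalized box."""
    x_min, y_min, x_max, y_max = bbox
    if not (0.0 <= x_min <= x_max <= 1.0 and 0.0 <= y_min <= y_max <= 1.0):
        raise ValueError(f"bbox must be normalized and ordered, got {bbox!r}")

    center_x = (x_min + x_max) / 2 * screen_width
    center_y = (y_min + y_max) / 2 * screen_height

    # Clamp so a box touching the right or bottom edge maps to a valid pixel.
    pixel_x = min(int(center_x), screen_width - 1)
    pixel_y = min(int(center_y), screen_height - 1)
    return pixel_x, pixel_y
```

The resulting pixel pair is what a mouse-click routine would need as its target.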

🤖 How Automoy Works

A screenshot is taken (manually or on a loop)
OmniParser parses UI elements from the image
The LLM interprets the context and decides the next step
Automoy acts (mouse clicks, keyboard input, etc.)

Automoy "sees" the desktop, "thinks" about what to do, and then "acts" on the system directly.
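
The see, think, act cycle above can be sketched as a plain loop. This is an illustrative outline only: `UIElement`, `Action`, `run_agent`, and the four callables passed in are hypothetical names, and the real logic in `operate.py` may be organized differently.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class UIElement:
    label: str
    center: Tuple[int, int]  # (x, y) in pixels


@dataclass
class Action:
    kind: str                    # e.g. "click", "type", or "done"
    target: Optional[str] = None  # label of the element to act on
    text: Optional[str] = None    # text to type, if any


def run_agent(
    objective: str,
    capture: Callable[[], Any],
    parse: Callable[[Any], List[UIElement]],
    decide: Callable[[str, List[UIElement], List[Action]], Action],
    act: Callable[[Action, List[UIElement]], None],
    max_steps: int = 10,
) -> List[Action]:
    """Repeat capture -> parse -> decide -> act until done or out of steps."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()                           # "see"
        elements = parse(screenshot)                     # UI parsing
        action = decide(objective, elements, history)    # "think"
        history.append(action)
        if action.kind == "done":
            break
        act(action, elements)                            # "act"
    return history
```

Passing the four stages in as callables keeps the loop testable with stubs and lets each stage (screen capture, parser, LLM, OS control) be swapped independently. The `max_steps` cap is a safety choice in this sketch so a confused model cannot loop forever.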

📅 System Requirements

Minimum:

GPU with 2 GB VRAM
8 GB RAM
CUDA 11.8+

OS Support:

Windows 10+: officially supported
Linux: community support coming soon
macOS: not supported
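
As a rough illustration, the minimums above can be encoded as a pre-flight check. This sketch takes the measured values as arguments and does not probe hardware itself. The function and constant names are hypothetical, and it does not verify the Windows version.

```python
from typing import List, Tuple

# Minimums taken from the requirements listed above.
MIN_VRAM_GB = 2
MIN_RAM_GB = 8
MIN_CUDA = (11, 8)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '11.8' into (11, 8) for comparison."""
    return tuple(int(part) for part in version.split("."))


def unmet_requirements(
    vram_gb: float, ram_gb: float, cuda_version: str, os_name: str
) -> List[str]:
    """Return a human-readable list of requirements that are not met."""
    problems: List[str] = []
    if vram_gb < MIN_VRAM_GB:
        problems.append(f"need at least {MIN_VRAM_GB} GB VRAM, found {vram_gb}")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"need at least {MIN_RAM_GB} GB RAM, found {ram_gb}")
    if parse_version(cuda_version) < MIN_CUDA:
        problems.append(f"need CUDA 11.8 or newer, found {cuda_version}")
    # os_name is expected in the form returned by platform.system().
    if os_name != "Windows":
        problems.append(f"{os_name} is not officially supported")
    return problems
```

An empty list means every listed minimum is satisfied; for example, `unmet_requirements(4, 16, "12.1", "Windows")` returns `[]`.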

🛠️ AMD GPU Support (Experimental)

CUDA is NVIDIA-exclusive, but the following projects offer possible AMD compatibility:

SCALE Toolkit – compiles CUDA source code to run on AMD GPUs
ZLUDA – drop-in CUDA replacement for running unmodified CUDA applications on non-NVIDIA GPUs

These are not officially supported, but adventurous users may test and report results.

🚀 Roadmap

🔧 Installation

Clone the repo
Navigate to installer/
Run setup.py or use aria2c.exe to fetch dependencies
Launch main.py from core/

✨ Bonus: Touchscreen + RGB Support (Optional)

Automoy can be deployed on custom hardware:

Touchscreens (5"–7" HDMI + USB)
RGB lighting (WS2812 via ESP32 or Arduino)
3D-printed cases for standalone or rack-mounted setups

This makes Automoy a physical, interactive AI agent appliance.

📚 License & Acknowledgments

Based on original ideas from OthersideAI
OmniParser integration by Microsoft
Inspired by the vision of local-first, agentic computing

Built by Technologies-AI
