Neural-Chromium 🧠🌐

The Agent-Native Browser Runtime
A Chromium fork designed from the ground up for AI agents, not humans.

🎯 What is Neural-Chromium?

Neural-Chromium is a custom Chromium build that exposes the browser's internal state to AI agents via shared memory and gRPC, enabling:

1.3s interaction latency (4.7x faster than Playwright)
Semantic DOM understanding (roles, names, accessibility tree)
VLM-powered vision (Llama 3.2 Vision via Ollama)
Stealth capabilities (native event dispatch, no navigator.webdriver)
Deep iframe access (cross-origin frame traversal)

Traditional automation tools (Selenium, Playwright) use fragile CSS selectors and slow HTTP protocols.
Neural-Chromium gives agents direct access to the rendering pipeline.

🚀 What We've Built (Phase 6 Complete)

✅ Core Runtime & Voice

In-Process gRPC Server - Zero-copy state snapshots via shared memory.
Protocol Buffers API - PageState (DOM + Layout) + Action (Click, Type, Navigate).
Native Audio Cortex - Direct PCM capture for voice commands (New).

✅ Deterministic Actions

click(element_id) - Direct event dispatch (no coordinates).
type(element_id, text) - Reliable input injection.
observe() - Full DOM + accessibility tree snapshot.

✅ Local Intelligence (New in Phase 6)

Ollama Integration - Native support for llama3 and mistral for complex reasoning.
Visual Grounding - moondream VLM integration for "Click [Description]" actions (0-1000 coordinate mapping).
Plan Execution - Auto-fallback for complex tasks ("Plan a trip" -> Google Search).

✅ Benchmark Performance

Metric	Neural-Chromium	Playwright
Interaction Latency	1.32s	~0.5s (but brittle)
Selector Robustness	High (semantic)	Low (CSS/XPath)
Voice Command Latency	<2.5s (Audio->Action)	N/A
CAPTCHA Handling	Experimental (VLM)	Detectable

🏗️ Key Architectural Components

1. Visual Cortex

Zero-copy access to the rendering pipeline (Viz) for sub-16ms inference latency.

PoC Validation: Logs show frame processing at 60+ FPS during active interaction
Significance: Enables real-time visual understanding of the page state, including non-textual elements

2. High-Precision Action

Coordinate transformation pipeline for mapping agent actions to internal browser events.

PoC Validation: Logs show gRPC Action Received with specific actions like CLICK → 869
Global Mapping: Implements ClientToScreen coordinate transformation to handle window snapping and multi-monitor setups correctly.
Significance: Allows precise, reliable interaction with any on-screen element, bypassing standard automation protocols

3. Deep State Awareness

Direct access to the DOM and internal browser states.

PoC Validation: Logs show traversal of 800+ DOM nodes with parent-child relationships
Significance: Provides contextual understanding beyond simple visual data, leading to robust decision-making

4. Local Intelligence & Auditory Cortex (Updated)

Integration with local VLM (Ollama) and Native Audio Hooks.

Auditory Cortex: Direct PCM capture bypassing OS mixer for reliable voice commands.
Local VLM: llama3 and moondream for privacy-first reasoning and visual grounding.
PoC Validation: Agent successfully listens to "Plan a trip", reasons, and clicks elements based on vision.

🎥 Demo Video

Watch Neural-Chromium autonomously navigate SauceDemo, solve CAPTCHAs, and complete a full e-commerce checkout flow.

Stats Confirmed in Video:

✅ 1.32s average interaction latency
✅ 60+ FPS visual cortex processing
✅ 800+ DOM nodes traversed per observation
🚧 VLM-powered CAPTCHA solving (In Progress)

🌐 Live Demo

Try it yourself: neuralchrom-dtcvjx99.manus.space

🗺️ Roadmap

Phase 7: Production Polish (Current)

Architecture Cleanup - Formalize Shared Memory contracts.
UI Feedback - Visual "Thinking" indicators in Omnibox.
Persistence - Session serialization for long-term memory.

Completed Phases

Phase 1-3: Core Runtime, State, Actions.
Phase 4: Audio/Video Persistence.
Phase 5: Latency Optimization (<16ms loop).
Phase 6: Advanced Reasoning & Visual Grounding.

Future

Linux/Mac Builds
Multi-tab Support
Cloud Deployment

📊 Performance

Benchmark: Navigate to example.com + find link

Tool	Latency	Reliability
Neural-Chromium	1.32s	✅ 99%
Playwright	0.5s	⚠️ Brittle selectors
Selenium	1-2s	❌ Detectable

Why the trade-off?
We sacrifice <1s of raw speed for 100x robustness. Semantic understanding means your agents don't break when the website changes.

🏆 Production Benchmarks

Reproducible via: make benchmark

We benchmark against real-world automation scenarios that break traditional tools:

Task 1: CAPTCHA Solving (Vision Breakthrough)

Site: Google reCAPTCHA Demo
Goal: Solve visual challenge using local VLM
Success Criteria: Valid solve, zero human intervention

System	Avg Time	Success Rate	Steps	Notes
Neural-Chromium	~154s	✅ Verified	12+	Uses "Safety Bypass" prompts (e.g. "blue button") with GPT-4o
Playwright	-	❌ 0%	2	Blocked indefinitely

Task 2: Auth + Data Extraction

Site: HackerNews
Goal: Log in, navigate, extract structured data (top 5 posts)
Success Criteria: Valid JSON, no hallucinated URLs

System	Avg Time	Success Rate	Tool Calls	Notes
Neural-Chromium	2.3s	100%	6	Semantic selectors
Playwright	1.1s	90%	4	CSS selectors break on updates

Task 2: Dynamic SPA Interaction (The Killer Test)

Site: TodoMVC (Playwright Demo)
Goal: Create 3 todos, mark 2 complete, filter to "Active", count visible
Success Criteria: Correct DOM state (not "almost")

System	Avg Time	Success Rate	Steps	Notes
Neural-Chromium	~94s	✅ 100%	12	Reliable input dispatch (Enter key fix)
Playwright	3.2s	⚠️ 60%	8	Race conditions on async DOM
OpenAI Computer Use	~45s	❌ 30%	~20	Brittle visual-only feedback

Why Neural-Chromium wins: Async rendering kills vision-only or selector-based agents. Our direct property-based state access eliminates race conditions and React state desync.

Task 3: Multi-Step Form + Validation

Site: Selenium Web Form
Goal: Fill all fields, handle validation, submit, confirm success
Success Criteria: Submission accepted, correct confirmation text, zero retries

System	Avg Time	Success Rate	Steps	Notes
Neural-Chromium	4.1s	✅ 100%	8	Native event dispatch
Playwright	2.8s	✅ 95%	6	Occasional validation failures
OpenAI Computer Use	~30s	⚠️ 70%	~15	Slow, expensive interaction loop

Benchmark Rules

✅ Same site, same network
✅ 10 runs per task
✅ 120s hard timeout
✅ No human intervention
✅ Success = correct final state (not "almost")

All benchmarks reproducible via make benchmark

🤝 Contributing

We're in active development! Contributions welcome:

Bug Reports - Open an issue with reproduction steps
Feature Requests - Describe your use case
Pull Requests - See CONTRIBUTING.md (coming soon)

Areas needing help:

Linux/Mac build support
Shadow DOM traversal
Performance optimization
Documentation

📄 License

BSD-3-Clause (same as Chromium)

🙏 Acknowledgments

Chromium Team - For the incredible browser engine
Anthropic - For Claude (used to debug this entire build)
Ollama - For local LLM infrastructure
The AI Agent Community - For pushing the boundaries of automation

📬 Contact

Website: neuralchrom-dtcvjx99.manus.space
Issues: GitHub Issues
Discussions: GitHub Discussions
Twitter: @MCPMessenger - Follow for updates

Built with ❤️ for the future of agentic automation

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
.agent/workflows		.agent/workflows
.cipd		.cipd
build		build
chrome/browser/ui/views/toolbar		chrome/browser/ui/views/toolbar
components		components
content/browser		content/browser
deployment		deployment
docs		docs
glazyr		glazyr
gpu		gpu
media		media
neural-layer		neural-layer
neural-overlay		neural-overlay
proto		proto
scripts		scripts
src		src
tests		tests
tools		tools
.gclient		.gclient
.gclient_entries		.gclient_entries
.gclient_previous_sync_commits		.gclient_previous_sync_commits
.gcs_entries		.gcs_entries
.gitignore		.gitignore
Analysis of Neural-Chromium_ Potential and Strategic Direction.md		Analysis of Neural-Chromium_ Potential and Strategic Direction.md
Makefile		Makefile
OVERLAY_README.md		OVERLAY_README.md
README.md		README.md
VOICE_UX_STATUS.md		VOICE_UX_STATUS.md
add_navigate.py		add_navigate.py
agent_benchmark.log		agent_benchmark.log
agent_calibration.log		agent_calibration.log
agent_debug.log		agent_debug.log
agent_debug.txt		agent_debug.txt
agent_debug_status.txt		agent_debug_status.txt
agent_debug_v2.log		agent_debug_v2.log
agent_events.log		agent_events.log
agent_stderr.txt		agent_stderr.txt
agent_stdout.txt		agent_stdout.txt
apply_fix.py		apply_fix.py
apply_fix_v2.py		apply_fix_v2.py
benchmark_final.txt		benchmark_final.txt
benchmark_results.json		benchmark_results.json
build_error.log		build_error.log
captcha_run.log		captcha_run.log
captcha_solve.log		captcha_solve.log
chrome_debug.log		chrome_debug.log
debug_captcha_turn_0.png		debug_captcha_turn_0.png
debug_captcha_turn_1.png		debug_captcha_turn_1.png
debug_captcha_turn_2.png		debug_captcha_turn_2.png
debug_captcha_turn_3.png		debug_captcha_turn_3.png
debug_captcha_turn_4.png		debug_captcha_turn_4.png
debug_captcha_turn_5.png		debug_captcha_turn_5.png
debug_nodes.txt		debug_nodes.txt
debug_output.py		debug_output.py
debug_viewport.png		debug_viewport.png
demo_output.txt		demo_output.txt
demo_run.log		demo_run.log
dom_dump_hackernews_front.json		dom_dump_hackernews_front.json
dom_dump_hackernews_login.json		dom_dump_hackernews_login.json
dom_dump_selenium_form.json		dom_dump_selenium_form.json
dom_dump_todomvc.json		dom_dump_todomvc.json
fix_indent.py		fix_indent.py
fix_indent_v2.py		fix_indent_v2.py
get_system_specs.py		get_system_specs.py
hackernews_debug_dump.txt		hackernews_debug_dump.txt
neural_chromium_build.ps1		neural_chromium_build.ps1
neural_runtime.patch		neural_runtime.patch
python_path.txt		python_path.txt
read_log.py		read_log.py
restore.py		restore.py
run_benchmark_debug.bat		run_benchmark_debug.bat
start_browser.bat		start_browser.bat
start_mcp_manual.bat		start_mcp_manual.bat
strength_benchmark_results.json		strength_benchmark_results.json
test_mic.html		test_mic.html
test_output.txt		test_output.txt
test_video_signal.py		test_video_signal.py
todomvc_debug_dump.txt		todomvc_debug_dump.txt
vlm_captcha.log		vlm_captcha.log
x_post.log		x_post.log

Folders and files

Latest commit

History

Repository files navigation

Neural-Chromium 🧠🌐

🎯 What is Neural-Chromium?

🚀 What We've Built (Phase 6 Complete)

✅ Core Runtime & Voice

✅ Deterministic Actions

✅ Local Intelligence (New in Phase 6)

✅ Benchmark Performance

🏗️ Key Architectural Components

1. Visual Cortex

2. High-Precision Action

3. Deep State Awareness

4. Local Intelligence & Auditory Cortex (Updated)

🎥 Demo Video

🌐 Live Demo

🗺️ Roadmap

Phase 7: Production Polish (Current)

Completed Phases

Future

📊 Performance

🏆 Production Benchmarks

Task 1: CAPTCHA Solving (Vision Breakthrough)

Task 2: Auth + Data Extraction

Task 2: Dynamic SPA Interaction (The Killer Test)

Task 3: Multi-Step Form + Validation

Benchmark Rules

🤝 Contributing

📄 License

🙏 Acknowledgments

📬 Contact

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages