TinkerBlocks RPI is a comprehensive system that recognizes physical programming blocks and controls real robots through an interpreter pattern. The system captures images of blocks arranged on a grid, performs AI-powered OCR to read commands, maps them directly to a 16x10 grid, and executes the resulting program either in simulation or on actual hardware via API integration.
An educational tool for teaching programming concepts through physical blocks with real robot control:
- Capture: Camera captures image of arranged blocks
- Process: Image rotation and perspective transformation
- Recognize: AI-powered OCR extracts text/commands from blocks with direct grid mapping
- Execute: Interpreter runs commands either in simulation or controls actual ESP32 car hardware
- ESP32 Car API: Direct HTTP API integration for movement, rotation, and sensors
- Movement Control: Precise distance-based movement with gyroscope correction
- Sensor Integration: Ultrasonic distance, IR line detection, obstacle avoidance
- Drawing Control: Servo-controlled pen for physical drawing
- Real-time Feedback: Live sensor readings and execution status
- Mock Hardware: Complete simulation for development and testing
- Movement Tracking: Comprehensive logging of all robot actions
- Error Handling: Graceful degradation when hardware is unavailable
- Dual Mode: Switch between simulation and real hardware seamlessly
The project follows a clean architecture with three main modules plus hardware integration:
Foundational infrastructure and external interfaces:
- WebSocket server for real-time communication with CLI output
- Process controller for workflow management
- Car API Client: HTTP client for ESP32 robot communication
- Enhanced Logging System: Smart message routing with DEBUG/INFO/SUCCESS/WARNING/ERROR levels
- Centralized configuration and logging
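The smart message routing described above can be pictured as a small level-based router; a minimal sketch (class and method names are illustrative, not the project's actual API) in which everything reaches the console while only INFO-and-above is forwarded to WebSocket clients:

```python
# Illustrative sketch of level-based log routing. Lists stand in for
# stdout and connected WebSocket clients.
from enum import IntEnum


class Level(IntEnum):
    DEBUG = 10
    INFO = 20
    SUCCESS = 25
    WARNING = 30
    ERROR = 40


class LogRouter:
    def __init__(self, ws_threshold: Level = Level.INFO) -> None:
        self.ws_threshold = ws_threshold
        self.console: list[str] = []    # stand-in for CLI output
        self.websocket: list[str] = []  # stand-in for connected clients

    def log(self, level: Level, message: str) -> None:
        line = f"[{level.name}] {message}"
        self.console.append(line)        # everything goes to the console
        if level >= self.ws_threshold:   # only INFO+ is pushed to clients
            self.websocket.append(line)


router = LogRouter()
router.log(Level.DEBUG, "loading config")
router.log(Level.SUCCESS, "workflow finished")
print(router.websocket)  # only the SUCCESS line
```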
Computer vision and AI-powered image processing:
- Image capture and manipulation with timestamped file organization
- Grid detection with perspective transformation and rectification
- AI-powered OCR processing with direct grid mapping
- Comprehensive timing measurements and file tracking
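The perspective transformation step is a standard four-point homography; the sketch below shows the underlying math with numpy (corner coordinates are made up, and in practice OpenCV's getPerspectiveTransform/warpPerspective do this work):

```python
# Map four detected grid corners onto a flat rectified image.
import numpy as np


def perspective_transform(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Solve for the 3x3 homography H with dst ~ H @ src (4 point pairs)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The smallest singular vector of A gives H up to scale.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)


def warp_point(H: np.ndarray, x: float, y: float) -> tuple[float, float]:
    u, v, w = H @ np.array([x, y, 1.0])
    return u / w, v / w


# Hypothetical corners of the photographed grid, mapped to a 1600x1000
# rectified image (100 px per cell of the 16x10 grid).
corners = np.array([[120, 80], [1480, 110], [1500, 940], [90, 900]], float)
target = np.array([[0, 0], [1600, 0], [1600, 1000], [0, 1000]], float)
H = perspective_transform(corners, target)
print(warp_point(H, 120, 80))  # maps the first corner to ~(0, 0)
```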
Interpreter pattern implementation with hardware control:
- Command definitions and registry
- Execution state management with real robot control
- Hardware Interface: Abstraction for real vs. mock hardware
- Grid command interpretation with sensor integration
- Extensible command system
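The real-vs-mock hardware abstraction might look roughly like this sketch (the Protocol, class, and method names are hypothetical, not the project's API):

```python
# Commands talk to a Hardware interface; swapping the backend switches
# between simulation and the real ESP32 car.
from typing import Protocol


class Hardware(Protocol):
    def move(self, distance_cm: float) -> None: ...
    def turn(self, degrees: float) -> None: ...


class MockHardware:
    """Simulation backend: records actions instead of driving the car."""

    def __init__(self) -> None:
        self.actions: list[str] = []

    def move(self, distance_cm: float) -> None:
        self.actions.append(f"move {distance_cm}")

    def turn(self, degrees: float) -> None:
        self.actions.append(f"turn {degrees}")


class RealHardware:
    """Real backend: would POST to the ESP32 car API (URL illustrative)."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url

    def move(self, distance_cm: float) -> None:
        ...  # e.g. an HTTP request to f"{self.base_url}/move"

    def turn(self, degrees: float) -> None:
        ...  # e.g. an HTTP request to f"{self.base_url}/turn"


def run(hw: Hardware) -> None:
    hw.move(5)
    hw.turn(90)


mock = MockHardware()
run(mock)
print(mock.actions)  # ['move 5', 'turn 90']
```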
tinker-blocks-rpi/
├── src/ # Source code
│ ├── core/ # Core infrastructure
│ │ ├── tests/ # Core module tests
│ │ └── README.md # Core documentation
│ ├── vision/ # Computer vision & AI-powered OCR
│ │ ├── capture/ # Camera components
│ │ ├── grid/ # Grid detection & perspective transformation
│ │ ├── ocr/ # AI-powered OCR with unified interface
│ │ ├── tests/ # Vision module tests
│ │ └── README.md # Vision documentation
│ ├── engine/ # Command interpreter
│ │ ├── tests/ # Engine module tests
│ │ └── README.md # Engine documentation
│ ├── tests/ # End-to-end and integration tests
│ │ ├── test_e2e_workflows.py
│ │ ├── test_integration_websocket.py
│ │ ├── test_workflow_chaining.py
│ │ ├── test_error_scenarios.py
│ │ └── demo_*.py # Demo scripts
│ ├── main.py # Application entry point
│ └── conftest.py # Pytest configuration
├── assets/ # Image assets
├── output/ # Generated outputs (timestamped folders)
├── pyproject.toml # Poetry configuration
├── poetry.lock # Locked dependencies
├── .gitignore # Git ignore rules
└── README.md # This file
- Python 3.13+
- Poetry package manager
- Camera (local or remote Raspberry Pi)
- LLM API key (OpenAI or Anthropic) for AI-powered OCR
# Clone repository
git clone [repository-url]
cd tinker-blocks-rpi
# Install dependencies
poetry install
# Set up environment variables
# For OCR
export OPENAI_API_KEY="your-api-key"
export ANTHROPIC_API_KEY="your-api-key"
# For Car API
export CAR_API_URL="your-car-api-url" # from ESP32 car
# Activate environment
poetry shell

python src/main.py

The WebSocket server starts on ws://0.0.0.0:8765 with real-time console output.
Send JSON commands to the WebSocket server (with enhanced logging for debugging):
// Run complete pipeline with real hardware (OCR → Engine → Robot)
{"command": "run", "params": {"workflow": "full", "use_hardware": true}}
// Run complete pipeline in simulation mode
{"command": "run", "params": {"workflow": "full", "use_hardware": false}}
// Run OCR only with AI-powered processing
{"command": "run", "params": {"workflow": "ocr_grid"}}
// Run OCR with automatic engine execution on real hardware
{"command": "run", "params": {"workflow": "ocr_grid", "chain_engine": true, "use_hardware": true}}
// Run engine with custom grid on real hardware
{"command": "run", "params": {"workflow": "engine", "use_hardware": true, "grid": [["MOVE", "10"], ["TURN", "RIGHT"]]}}
// Test robot movement directly
{"command": "run", "params": {"workflow": "engine", "use_hardware": true, "grid": [["PEN_DOWN"], ["LOOP", "4"], ["", "MOVE", "5"], ["", "TURN", "RIGHT"], ["PEN_UP"]]}}
// Stop current process
{"command": "stop"}

Configure the robot connection in src/core/config.py:
# Car API settings
car_api_url: str = "http://192.168.1.100" # Your ESP32 IP
car_api_timeout: float = 15.0 # Request timeout

Each workflow run creates a timestamped folder with comprehensive output:
output/20250608_221634/
├── rotated_original.jpg # Original image after rotation
├── grid_overlay.jpg # Grid visualization
├── transformed_grid.jpg # Perspective-corrected image
└── grid_result.json # Complete grid data with metadata
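For scripting the server, control messages are plain JSON. A small helper (a sketch; only the "run" and "stop" commands shown above are assumed to exist) builds the payloads, which any WebSocket client library can deliver to ws://0.0.0.0:8765:

```python
# Build TinkerBlocks control messages as JSON strings.
import json


def make_command(command: str, **params) -> str:
    """Serialize a control message, attaching params only when given."""
    payload: dict = {"command": command}
    if params:
        payload["params"] = params
    return json.dumps(payload)


msg = make_command("run", workflow="full", use_hardware=False)
print(msg)  # {"command": "run", "params": {"workflow": "full", "use_hardware": false}}
print(make_command("stop"))  # {"command": "stop"}
```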
The project has comprehensive test coverage organized by module and test type:
Located within each module:
- src/core/tests/ - Core infrastructure tests
- src/vision/tests/ - Vision processing tests
- src/engine/tests/ - Engine interpreter tests
Located in src/tests/ directory:
- test_e2e_workflows.py - Complete workflow execution tests
- test_integration_websocket.py - WebSocket server integration
- test_workflow_chaining.py - Workflow data passing and chaining
- test_error_scenarios.py - Error handling and edge cases
# Run all tests (with enhanced logging for debugging)
poetry run pytest
# Run with coverage
poetry run pytest --cov=core --cov=vision --cov=engine --cov-report=html
# Run specific test categories
poetry run pytest src/core/tests/ # Core unit tests
poetry run pytest src/vision/tests/ # Vision unit tests
poetry run pytest src/engine/tests/ # Engine unit tests
poetry run pytest src/tests/ # E2E and integration tests
# Run specific test file with verbose debugging
poetry run pytest src/tests/test_e2e_workflows.py -v
# Run tests matching pattern
poetry run pytest -k "websocket" -v

The src/tests/ directory contains comprehensive demo scripts:
- demo_hardware_api.py - Hardware integration showcase with real vs. mock examples
- demo_engine_workflow.py - Demonstrates engine execution with a sample grid
- demo_param_handling.py - Tests WebSocket parameter handling
- Other utility scripts for manual testing
Run the hardware integration demo to see all features:
python src/tests/demo_hardware_api.py

This demonstrates:
- Mock vs. real hardware execution
- Movement tracking and logging
- Sensor-based programming (obstacle avoidance, line following)
- Error handling and graceful degradation
- API configuration examples
Edit src/core/config.py for system settings:
- Robot API: ESP32 car IP address and timeout settings
- Server Settings: WebSocket and camera server IPs and ports
- Grid Detection: Corner coordinates and dimensions
- AI Models: LLM model settings for OCR
- File Paths: Output and asset directories
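Grouped together, the settings above might be modeled like this sketch (a plain dataclass here for brevity, though the project lists Pydantic in its stack; all field names beyond the Car API ones are illustrative):

```python
# Illustrative configuration object mirroring the settings listed above.
from dataclasses import dataclass


@dataclass
class Config:
    # Robot API (shown earlier in the Car API settings)
    car_api_url: str = "http://192.168.1.100"
    car_api_timeout: float = 15.0
    # Server settings (illustrative names)
    websocket_host: str = "0.0.0.0"
    websocket_port: int = 8765
    # Grid detection (the grid is 16x10)
    grid_cols: int = 16
    grid_rows: int = 10
    # File paths (illustrative)
    output_dir: str = "output"


config = Config(car_api_url="http://192.168.1.50")
print(config.car_api_timeout)  # 15.0
```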
- ESP32 Car: Ensure your robot is connected to the same network
- IP Configuration: Update car_api_url with your robot's IP address
- Network Testing: Verify connectivity with ping or a browser test
- API Testing: Use the demo script to test hardware integration
For detailed information about each module:
- Core Module Documentation - Infrastructure and architecture
- Vision Module Documentation - AI-powered image processing pipeline
- Engine Module Documentation - Command interpreter system
- Python 3.13 - Core language with modern async features
- OpenCV - Computer vision and image processing
- LangChain - AI model integration for OCR
- OpenAI GPT-4V/Claude - Vision-capable AI models
- HTTP/REST API - ESP32 robot communication
- WebSockets - Real-time bidirectional communication with smart log routing
- asyncio - Asynchronous programming for concurrent operations
- requests - HTTP client for robot API integration
- Poetry - Dependency management and virtual environments
- Pydantic - Data validation and structured output
- pytest - Comprehensive testing framework
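As an illustration of the robot-communication layer, a car API client could be sketched as follows (standard library only; the endpoint path and payload shape are assumptions, not the real ESP32 firmware API):

```python
# Build (but do not send) HTTP requests to a car API base URL.
import json
import urllib.request


class CarClient:
    def __init__(self, base_url: str, timeout: float = 15.0) -> None:
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout

    def _request(self, path: str, payload: dict) -> urllib.request.Request:
        data = json.dumps(payload).encode()
        return urllib.request.Request(
            f"{self.base_url}{path}",
            data=data,
            headers={"Content-Type": "application/json"},
        )

    def move(self, distance_cm: float) -> urllib.request.Request:
        # urllib.request.urlopen(req, timeout=self.timeout) would send it
        return self._request("/move", {"distance": distance_cm})


req = CarClient("http://192.168.1.100/").move(5)
print(req.full_url)  # http://192.168.1.100/move
```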
The engine supports a comprehensive set of commands for programming:
- MOVE - Move forward or backward
  - MOVE → Move 999cm forward (effectively "move until obstacle")
  - MOVE | 5 → Move 5cm forward
  - MOVE | -3 → Move 3cm backward
- TURN - Rotate the car (requires LEFT, RIGHT, or degrees)
  - TURN | LEFT → Turn 90° left
  - TURN | RIGHT → Turn 90° right
  - TURN | 45 → Turn 45° right (positive = right, negative = left)
  - TURN | LEFT | 30 → Turn left by 30°
  - TURN | RIGHT | 45 → Turn right by 45°
- LOOP - Repeat nested commands a specified number of times
  - LOOP | 5 → Repeat 5 times
- WHILE - Repeat nested commands while a condition is true
  - WHILE | condition → Loop while condition is true
  - WHILE | TRUE → Infinite loop
- IF - Conditional execution
  - IF | condition → Execute nested commands if true
  - Can be followed by an ELSE block for the false case
- SET - Assign values to variables
  - SET | X | 5 → Set X to 5
  - SET | Y | X | + | 3 → Set Y to X + 3
  - SET | FLAG | TRUE → Set a boolean variable
  - SET | COUNTER | 0 → Variable names can be any alphabetic string
- PEN_ON - Start drawing path
- PEN_OFF - Stop drawing path
- WAIT - Pause execution
  - WAIT | 2 → Wait 2 seconds
  - WAIT | 0.5 → Wait 0.5 seconds
- Numbers: 5, 3.14, -2
- Booleans: TRUE, FALSE
- Variables: Any alphabetic string (e.g., X, COUNT, DISTANCE_VAR)
- Sensors: DISTANCE, OBSTACLE, BLACK_ON, BLACK_OFF
- Operators: +, -, *, /, <, >, =, !=, AND, OR, NOT
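A condition such as X | < | 5 can be evaluated against the current variables and sensor readings; the sketch below handles the single-token and binary cases (the real engine also handles AND/OR/NOT chains and live sensor polling, and these function names are illustrative):

```python
# Resolve a token to a value, then apply a comparison operator.
def resolve(token: str, env: dict, sensors: dict):
    if token in ("TRUE", "FALSE"):
        return token == "TRUE"
    if token in sensors:      # DISTANCE, OBSTACLE, ...
        return sensors[token]
    if token in env:          # user variables assigned via SET
        return env[token]
    return float(token)       # numeric literal


OPS = {
    "<": lambda a, b: a < b,
    ">": lambda a, b: a > b,
    "=": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
}


def eval_condition(tokens: list[str], env: dict, sensors: dict) -> bool:
    if len(tokens) == 1:      # bare boolean, variable, or sensor
        return bool(resolve(tokens[0], env, sensors))
    left, op, right = tokens
    return OPS[op](resolve(left, env, sensors),
                   resolve(right, env, sensors))


env = {"X": 3.0}
sensors = {"DISTANCE": 42.0, "OBSTACLE": False}
print(eval_condition(["X", "<", "5"], env, sensors))          # True
print(eval_condition(["DISTANCE", ">", "50"], env, sensors))  # False
```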
Commands are arranged on a 16x10 grid:
- Read left-to-right, top-to-bottom
- Arguments separated by | in cells
- Indentation (column > 0) creates nested blocks for loops/conditions
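The indentation rule can be sketched as a small recursive parser that groups rows by their first non-empty column (the data structures here are illustrative, not the engine's actual implementation):

```python
# Group grid rows into (tokens, children) nodes by indentation column.
def parse_block(rows: list[list[str]], start: int = 0, indent: int = 0):
    nodes, i = [], start
    while i < len(rows):
        row = rows[i]
        col = next((c for c, cell in enumerate(row) if cell), None)
        if col is None:           # blank row: skip it
            i += 1
            continue
        if col < indent:          # dedent: this block is finished
            break
        if col > indent:          # deeper rows nest under the last node
            children, i = parse_block(rows, i, col)
            nodes[-1][1].extend(children)
            continue
        nodes.append((row[col:], []))
        i += 1
    return nodes, i


grid = [
    ["LOOP", "4"],
    ["", "MOVE", "5"],
    ["", "TURN", "RIGHT"],
    ["PEN_UP"],
]
program, _ = parse_block(grid)
print(program[0][0])       # ['LOOP', '4']
print(len(program[0][1]))  # 2
```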
- Visual Output: Real-time execution visualization and robot tracking
- Block Designer: Tool for creating custom programming blocks
- Multi-robot Support: Control multiple robots simultaneously
- Web Interface: Browser-based control panel with live video feed
- Advanced Sensors: Camera vision, GPS, accelerometer integration
- Program Storage: Save, load, and share programming sequences
- Debugging Tools: Step-through execution, breakpoints, variable inspection
- Extended Math: More mathematical operations and functions
- Cloud Integration: Remote robot control and collaborative programming