Control your computer with hand gestures using ML-based gesture recognition
Features • Quick Start • Architecture • How It Works • Documentation
Ever wanted to control your computer without touching the mouse or keyboard? Hursor lets you do exactly that! It's a real-time hand gesture control system that uses machine learning to recognize your hand gestures and translate them into computer actions.
Just show your hand to the camera, and you can scroll pages, zoom in/out, move the cursor, and even click - all with simple finger gestures. No fancy hardware needed, just your webcam and some Python code!
I got tired of constantly switching between mouse and keyboard, especially when browsing or doing presentations. So I built Hursor to make computer interaction more intuitive and hands-free. The gateway-based mode switching prevents accidental mode changes (trust me, I learned this the hard way during development).
- Multi-Mode Control: Scroll, zoom, move cursor, and click - all with gestures
- ML-Powered: Uses a neural network trained on your own gestures for accuracy
- Super Fast: ~10ms inference time means it feels instant (30 FPS smooth!)
- Smart Mode Switching: Gateway system prevents accidental mode changes
- Zone-Based Control: Intuitive neutral zones make scrolling/zooming feel natural
- Visual Feedback: See what mode you're in and how confident the model is
Here's what each gesture does:
| Gesture | Fingers | Mode | What It Does |
|---|---|---|---|
| One | 1 (index only) | SCROLL | Scroll up/down based on finger position |
| Peace | 2 (index + middle) | ZOOM | Zoom in/out based on finger position |
| Three | 3 (index + middle + ring) | CURSOR | Move mouse cursor with your index finger |
| Fist | 0 (in cursor mode) | CLICK | Left mouse click (only works in cursor mode) |
- Python 3.8 or higher
- A webcam (built-in or external)
- Windows, macOS, or Linux
First, clone the repo and install the dependencies:
# Clone the repository
git clone https://github.com/xampos101/Hursor.git
cd Hursor
# Install dependencies
pip install -r requirements.txt
That's it! This installs MediaPipe, TensorFlow, OpenCV, and PyAutoGUI.
# Start the application
python gesture_velocity_control.py
Point your hand at the camera and start gesturing! Press Q or ESC to quit.
Pro tip: Make sure you have good lighting and keep your hand about 30-60cm from the camera for best results.
Here's the high-level flow of how Hursor processes your gestures:
flowchart LR
CAM[Camera Feed] --> MP[MediaPipe<br/>Hand Detection]
MP --> LM[21 Hand Landmarks]
LM --> PROC[Normalize &<br/>Flatten]
PROC --> ML[TFLite Model<br/>~10ms]
ML --> CTRL[State Machine<br/>Controller]
CTRL --> ACT[Actions<br/>Scroll/Zoom/Cursor/Click]
The camera captures your hand, MediaPipe extracts 21 landmark points, we normalize them, feed them to our ML model, and the state machine decides what action to take. All of this happens in real-time!
The cool part is the gateway-based state machine. You can only switch modes from IDLE, which prevents those annoying accidental mode switches:
stateDiagram-v2
[*] --> IDLE: Start
IDLE --> IDLE: Gesture 4 (IDLE)
IDLE --> SCROLL: Hold Gesture 1<br/>for 1.5s
IDLE --> ZOOM: Hold Gesture 2<br/>for 1.5s
IDLE --> CURSOR: Hold Gesture 3<br/>for 1.5s
SCROLL --> IDLE: Gesture 4 (IDLE)
ZOOM --> IDLE: Gesture 4 (IDLE)
CURSOR --> IDLE: Gesture 4 (IDLE)
CURSOR --> CURSOR: Gesture 0 (FIST) → Click
note right of IDLE
Gateway State
Only from IDLE can
you switch modes
end note
Before gestures reach the ML model, we process the raw landmarks:
flowchart LR
RAW[Raw Landmarks<br/>21 points × 3 coords] --> CENTER[Center around<br/>Wrist]
CENTER --> SCALE[Scale by<br/>Palm Size]
SCALE --> FLAT[Flatten to<br/>63 features]
FLAT --> ML[ML Model<br/>Classification]
ML --> GESTURE[Gesture ID<br/>0-4]
We center everything around the wrist and scale by palm size, so it works regardless of how far your hand is from the camera. Pretty neat, right?
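The preprocessing above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's actual code: the function name `normalize_landmarks` is made up here, and "palm size" is assumed to be the distance from the wrist (MediaPipe landmark 0) to the base of the middle finger (landmark 9). The real script may measure it differently.

```python
import math

WRIST = 0        # MediaPipe hand landmark index for the wrist
MIDDLE_MCP = 9   # MediaPipe index for the base of the middle finger


def normalize_landmarks(landmarks):
    """Center 21 (x, y, z) points on the wrist, scale by palm size, flatten to 63 floats."""
    if len(landmarks) != 21:
        raise ValueError("expected 21 landmarks")
    wx, wy, wz = landmarks[WRIST]
    centered = [(x - wx, y - wy, z - wz) for x, y, z in landmarks]
    # Assumption: palm size = wrist-to-middle-knuckle distance.
    palm = math.dist((0.0, 0.0, 0.0), centered[MIDDLE_MCP])
    if palm == 0:
        palm = 1.0  # degenerate input, avoid dividing by zero
    return [coord / palm for point in centered for coord in point]
```

Because every point is divided by the same palm length, the same hand pose held closer to or farther from the camera should produce (nearly) the same 63 features.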
Want to train your own model? Here's the process:
flowchart LR
COLLECT[Data Collection<br/>hursor_data_collector.py] --> CSV[CSV Dataset<br/>63 features + label]
CSV --> TRAIN[Train Model<br/>256→128→64→5]
TRAIN --> KERAS[Keras Model<br/>gesture_model.keras]
TRAIN --> TFLITE[TFLite Model<br/>gesture_model.tflite]
Collect your data, train the model, and it automatically converts to TFLite for fast inference!
Switching modes is simple but requires a bit of patience:
- Start in IDLE mode (show 4 fingers)
- Hold your desired gesture (1/2/3 fingers) for 1.5 seconds - you'll see a progress ring fill up
- To switch to another mode, go back to IDLE first (4 fingers), then hold the new gesture
This might seem like extra steps, but trust me, it prevents so many accidental switches! I tried direct switching first and it was chaos.
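Here is a rough sketch of how a gateway state machine like this could look. The class and method names (`GatewayStateMachine`, `update`, `progress`) are hypothetical; only the gesture IDs, the mode names, and the 1.5 second hold come from this README. Timestamps are passed in explicitly so the logic is easy to test.

```python
IDLE, SCROLL, ZOOM, CURSOR = "IDLE", "SCROLL", "ZOOM", "CURSOR"
GESTURE_TO_MODE = {1: SCROLL, 2: ZOOM, 3: CURSOR}
IDLE_GESTURE = 4
MODE_HOLD_TIME = 1.5  # seconds


class GatewayStateMachine:
    def __init__(self, hold_time=MODE_HOLD_TIME):
        self.mode = IDLE
        self.hold_time = hold_time
        self._candidate = None  # mode currently being held towards
        self._since = None      # timestamp when that hold started

    def update(self, gesture, now):
        """Feed one classified gesture (or None) and a timestamp; return the mode."""
        if gesture == IDLE_GESTURE:
            self.mode, self._candidate = IDLE, None
            return self.mode
        target = GESTURE_TO_MODE.get(gesture)
        if self.mode != IDLE or target is None:
            # Outside IDLE, mode gestures are ignored: no direct switching.
            self._candidate = None
            return self.mode
        if target != self._candidate:
            self._candidate, self._since = target, now  # start a new hold
        elif now - self._since >= self.hold_time:
            self.mode, self._candidate = target, None   # hold completed
        return self.mode

    def progress(self, now):
        """Fraction of the hold completed, e.g. for drawing the progress ring."""
        if self._candidate is None:
            return 0.0
        return min((now - self._since) / self.hold_time, 1.0)
```

In this sketch, showing two fingers while in SCROLL does nothing; you have to pass through IDLE first, which is the whole point of the gateway.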
Move your index finger up and down to scroll. The higher/lower you go, the faster it scrolls:
┌─────────────────────────┐
│ SCROLL UP               │ ← Move finger here
│ (Faster when higher)    │
├─────────────────────────┤
│ NEUTRAL ZONE            │ ← Rest here, no scrolling
│ (35% - 65% height)      │
├─────────────────────────┤
│ SCROLL DOWN             │ ← Move finger here
│ (Faster when lower)     │
└─────────────────────────┘
The neutral zone in the middle gives you a "resting" position where nothing happens. Super useful when you need to pause scrolling!
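To make the zone idea concrete, here is one way the finger height could be turned into a scroll step. The linear ramp from base speed to max speed is an assumption (the README only says "faster when higher/lower"), and `scroll_amount` is a name invented for this sketch. The constants mirror the defaults listed in the configuration section.

```python
NEUTRAL_ZONE_TOP = 0.35
NEUTRAL_ZONE_BOTTOM = 0.65
SCROLL_BASE_SPEED = 3
SCROLL_MAX_SPEED = 20


def scroll_amount(y):
    """Map normalized finger height y (0 = top of frame, 1 = bottom) to a signed step.

    Positive means scroll up, negative means scroll down, 0 means neutral zone.
    """
    y = min(max(y, 0.0), 1.0)
    if y < NEUTRAL_ZONE_TOP:
        depth = (NEUTRAL_ZONE_TOP - y) / NEUTRAL_ZONE_TOP
        direction = 1
    elif y > NEUTRAL_ZONE_BOTTOM:
        depth = (y - NEUTRAL_ZONE_BOTTOM) / (1.0 - NEUTRAL_ZONE_BOTTOM)
        direction = -1
    else:
        return 0
    # Assumed linear ramp: base speed at the zone edge, max speed at the frame edge.
    speed = SCROLL_BASE_SPEED + depth * (SCROLL_MAX_SPEED - SCROLL_BASE_SPEED)
    return direction * round(speed)
```

The sign convention matches `pyautogui.scroll`, where positive values scroll up, so the result could be passed to it directly.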
Similar to scroll mode, but for zooming:
┌─────────────────────────┐
│ ZOOM IN                 │ ← Fingers here
│ (Ctrl + Plus)           │
├─────────────────────────┤
│ NEUTRAL ZONE            │ ← No zooming here
│ (35% - 65% height)      │
├─────────────────────────┤
│ ZOOM OUT                │ ← Fingers here
│ (Ctrl + Minus)          │
└─────────────────────────┘
There's a cooldown between zooms (0.25s) to prevent zooming too fast. You can adjust this in the config if you want it faster/slower.
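A cooldown like this is easy to sketch with a small helper. The `Cooldown` class below is hypothetical, not taken from the Hursor source; the clock is injectable so the behavior can be checked without actually waiting.

```python
import time

ZOOM_COOLDOWN = 0.25  # seconds, same default as in the config


class Cooldown:
    def __init__(self, interval, clock=time.monotonic):
        self.interval = interval
        self._clock = clock
        self._last = None  # time of the last accepted action

    def ready(self):
        """Return True (and restart the timer) if the interval has elapsed."""
        now = self._clock()
        if self._last is not None and now - self._last < self.interval:
            return False
        self._last = now
        return True
```

In the main loop you would create `zoom_gate = Cooldown(ZOOM_COOLDOWN)` once and only send Ctrl + Plus or Ctrl + Minus on frames where `zoom_gate.ready()` returns True. The same helper would work for the scroll and click cooldowns.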
This is where it gets fun! Move your index finger around and the cursor follows. Make a fist to click:
- Move index finger → Cursor moves (with smoothing for stability)
- Make a fist → Left click (hold for 0.15s, needs to be stable)
The click detection has a stability check - your hand needs to stay relatively still for 5 frames before the click timer starts. This prevents accidental clicks when your hand shakes a bit.
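The stability check could be implemented roughly like this. It is a sketch under assumptions: `ClickDetector` is an invented name, "still" is taken to mean the hand stays within `CLICK_MAX_MOVEMENT` (in normalized frame units) of where the fist first appeared, and the hand position is a simple `(x, y)` tuple. The actual logic in gesture_velocity_control.py may differ.

```python
import math

CLICK_HOLD_TIME = 0.15
CLICK_STABILITY_FRAMES = 5
CLICK_MAX_MOVEMENT = 0.05
CLICK_COOLDOWN = 0.3


class ClickDetector:
    def __init__(self):
        self._anchor = None        # hand position when the fist first appeared
        self._stable = 0           # consecutive still fist frames
        self._timer_start = None   # when the hold timer started
        self._last_click = -math.inf

    def _reset(self):
        self._anchor, self._stable, self._timer_start = None, 0, None

    def update(self, is_fist, pos, now):
        """Return True on the frame where a click should fire."""
        if not is_fist:
            self._reset()
            return False
        if self._anchor is None or math.dist(pos, self._anchor) > CLICK_MAX_MOVEMENT:
            # New fist, or the hand moved too far: restart the stability count.
            self._anchor, self._stable, self._timer_start = pos, 1, None
            return False
        self._stable += 1
        if self._stable < CLICK_STABILITY_FRAMES:
            return False
        if self._timer_start is None:
            self._timer_start = now  # stable long enough, start the hold timer
        held = now - self._timer_start >= CLICK_HOLD_TIME
        cooled = now - self._last_click >= CLICK_COOLDOWN
        if held and cooled:
            self._last_click = now
            self._reset()
            return True
        return False
```

At 30 FPS this fires about 0.3 seconds after a steady fist appears: five frames to pass the stability check, then the 0.15 second hold.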
The model is pretty straightforward - a feedforward neural network:
Input: 63 features (21 landmarks × 3 coords)
↓
Dense(256) + BatchNorm + Dropout(0.3)
↓
Dense(128) + BatchNorm + Dropout(0.3)
↓
Dense(64) + BatchNorm + Dropout(0.2)
↓
Dense(5) + Softmax
↓
Output: 5 gesture probabilities
Nothing fancy, but it works really well! The BatchNorm and Dropout help with generalization.
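As a quick sanity check on the architecture, you can count its parameters by hand. The snippet assumes standard dense layers with biases and a batch normalization layer after each hidden layer that stores four values per unit (scale, offset, moving mean, moving variance), which is how Keras `BatchNormalization` works. Dropout has no parameters.

```python
LAYER_SIZES = [63, 256, 128, 64, 5]


def count_parameters(sizes):
    """Return (dense_params, batch_norm_params) for a fully connected stack."""
    # Each dense layer has n_in * n_out weights plus n_out biases.
    dense = sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))
    # BatchNorm follows each hidden layer: 4 values per unit.
    batch_norm = sum(4 * n for n in sizes[1:-1])
    return dense, batch_norm


dense, batch_norm = count_parameters(LAYER_SIZES)
total = dense + batch_norm
print(dense, batch_norm, total)          # 57861 1792 59653
print(round(total * 4 / 1024), "KiB")    # 233 KiB at 4 bytes per float32
```

Roughly 60k parameters, or about a quarter of a megabyte as float32, which fits with a model file well under 1 MB.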
Here's how the model performed during training:
As you can see, validation accuracy reached around 95% by the end. There's a bit of overfitting early on (training accuracy is higher than validation), but it eventually generalizes well. The model learns pretty quickly!
Here are the numbers:
| Metric | Value | Notes |
|---|---|---|
| Inference Speed (TFLite) | ~10ms | Super fast! This is why we use TFLite |
| Inference Speed (Keras) | ~100ms | Fallback option, still usable |
| Frame Rate | 30 FPS | Smooth as butter with TFLite |
| Model Size (TFLite) | <1 MB | Tiny! Easy to share |
| Confidence Threshold | 0.7 | Adjustable if you want stricter/looser detection |
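For reference, applying a confidence threshold to the model's softmax output might look like the following. `pick_gesture` is a hypothetical helper; the gesture IDs and the 0.7 default are the ones used throughout this README.

```python
CONFIDENCE_THRESHOLD = 0.7
GESTURE_NAMES = {0: "CLICK", 1: "SCROLL", 2: "ZOOM", 3: "CURSOR", 4: "IDLE"}


def pick_gesture(probabilities, threshold=CONFIDENCE_THRESHOLD):
    """Return (gesture_id, confidence), or (None, confidence) if below the threshold."""
    gesture_id = max(range(len(probabilities)), key=probabilities.__getitem__)
    confidence = probabilities[gesture_id]
    if confidence < threshold:
        return None, confidence  # too uncertain, treat as "no gesture"
    return gesture_id, confidence
```

For example, `pick_gesture([0.05, 0.8, 0.05, 0.05, 0.05])` returns gesture 1 (SCROLL), while a flat distribution such as `[0.3, 0.3, 0.2, 0.1, 0.1]` is rejected. Raising the threshold makes detection stricter; lowering it makes it looser.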
- MediaPipe: Does the heavy lifting for hand detection (21 landmark points)
- TensorFlow/Keras: For training the gesture classifier
- TensorFlow Lite: Optimized model for real-time inference
- OpenCV: Camera capture and visualization
- PyAutoGUI: Actually controlling the mouse/keyboard
Here's what's in the repo:
Hursor/
├── gesture_velocity_control.py   # Main app - run this!
├── hursor_data_collector.py      # Collect training data
├── hursor_train.py               # Train your model
├── gesture_model.tflite          # Optimized model (fast!)
├── gesture_model.keras           # Keras model (fallback)
├── hursor_dataset.csv            # Your training data
├── ARCHITECTURE.md               # Deep dive into the architecture
├── requirements.txt              # Python dependencies
└── README.md                     # This file
Most of the time you'll just run gesture_velocity_control.py. The other files are for training your own model.
Want to tweak the behavior? Edit the settings at the top of gesture_velocity_control.py:
# Zone settings - adjust the neutral zone size
NEUTRAL_ZONE_TOP = 0.35 # 35% from top
NEUTRAL_ZONE_BOTTOM = 0.65 # 65% from top
# Scroll settings - make it faster/slower
SCROLL_BASE_SPEED = 3 # Base scroll amount
SCROLL_MAX_SPEED = 20 # Maximum scroll amount
SCROLL_COOLDOWN = 0.03 # Cooldown between scrolls (seconds)
# Zoom settings
ZOOM_COOLDOWN = 0.25 # Cooldown between zooms (seconds)
# Cursor settings - adjust responsiveness
CURSOR_SMOOTHING = 0.4 # 0-1, higher = more responsive
CURSOR_FRAME_MARGIN = 0.1 # 10% margin on camera frame
# Click settings - make clicking easier/harder
CLICK_COOLDOWN = 0.3 # Prevent double clicks
CLICK_HOLD_TIME = 0.15 # Time fist must be held (seconds)
CLICK_STABILITY_FRAMES = 5 # Frames fist must be detected
CLICK_MAX_MOVEMENT = 0.05 # Max hand movement during click
# Mode switching
MODE_HOLD_TIME = 1.5 # Seconds to hold gesture for mode switch
CONFIDENCE_THRESHOLD = 0.7 # Minimum ML confidence to accept gesture
Play around with these values to find what feels best for you!
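To give a feel for what the two cursor settings do, here is a small sketch. Both function names are invented, and the math is an assumption: the frame margin is treated as a border that gets cropped off before mapping to the screen, and smoothing is a plain exponential filter where the value is the weight of the newest position (matching the "higher = more responsive" comment). If the app uses the opposite convention, swap `alpha` for `1 - alpha`.

```python
CURSOR_SMOOTHING = 0.4
CURSOR_FRAME_MARGIN = 0.1


def frame_to_screen(x, y, screen_w, screen_h, margin=CURSOR_FRAME_MARGIN):
    """Map a normalized fingertip position to pixels, ignoring the frame border."""
    span = 1.0 - 2.0 * margin
    u = min(max((x - margin) / span, 0.0), 1.0)
    v = min(max((y - margin) / span, 0.0), 1.0)
    return u * (screen_w - 1), v * (screen_h - 1)


def smooth(previous, target, alpha=CURSOR_SMOOTHING):
    """Exponential smoothing: move a fraction alpha of the way to the target."""
    px, py = previous
    tx, ty = target
    return px + alpha * (tx - px), py + alpha * (ty - py)
```

With the margin, you can reach the screen edges without pushing your hand to the very edge of the camera frame, where tracking tends to be less reliable.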
Want to train on your own gestures? Here's how:
Run the data collector:
python hursor_data_collector.py
- Press 0-4 to select which gesture you're recording (0=CLICK, 1=SCROLL, 2=ZOOM, 3=CURSOR, 4=IDLE)
- Press ENTER to auto-collect 30 samples
- Repeat for all 5 gestures
Tip: Collect samples from different angles and lighting conditions for better generalization!
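Each collected sample ends up as one CSV row of 63 features plus a label. A minimal sketch of writing such a row is shown below; the helper name `write_sample`, the label-last column order, and the six-decimal formatting are assumptions, so check hursor_dataset.csv for the real layout.

```python
import csv
import io


def write_sample(writer, features, label):
    """Append one row: 63 normalized features followed by the gesture label (0-4)."""
    if len(features) != 63:
        raise ValueError("expected 63 features")
    writer.writerow([f"{value:.6f}" for value in features] + [label])


# Demo with an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_sample(csv.writer(buffer), [0.0] * 63, 4)
```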
Once you have data, train the model:
python hursor_train.py
This will:
- Load your data from hursor_dataset.csv
- Split it into train/validation/test (70/15/15)
- Train the neural network with early stopping
- Save both Keras and TFLite models
- Show you accuracy metrics and a confusion matrix
The training script will automatically convert to TFLite, which is what the main app uses for speed.
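If you are curious what a 70/15/15 split looks like in code, here is a dependency-free sketch. It is not lifted from hursor_train.py: the function name and the fixed seed are made up, and the real script may well stratify by gesture, which this simple shuffle does not.

```python
import random


def split_dataset(rows, train=0.70, val=0.15, seed=42):
    """Shuffle rows and split them into train/validation/test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed keeps the split reproducible
    n_train = round(len(rows) * train)
    n_val = round(len(rows) * val)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]
```

With 200 samples this gives 140 for training, 30 for validation, and 30 held out for the final test metrics.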
The trained model (gesture_model.tflite) will be automatically loaded when you run gesture_velocity_control.py. That's it!
- Make sure you have good lighting (not too dark, not too bright)
- Keep your hand 30-60cm from the camera
- Hold fingers straight and spread them out
- Try to avoid cluttered backgrounds

- Increase CURSOR_SMOOTHING in the config (try 0.6-0.8)
- Make sure your hand position is stable
- Check that your camera is running at a good frame rate

- Hold the fist for at least 0.15 seconds (you'll see a progress ring)
- Make sure you're in Cursor mode (3 fingers) before making a fist
- Keep your hand relatively still during the click gesture
- Try increasing CLICK_STABILITY_FRAMES if clicks are too sensitive

- You need to hold the gesture for 1.5 seconds - this is intentional!
- Make sure you start from IDLE mode (4 fingers)
- Ensure your gesture is clearly visible to the camera
- Check that confidence threshold isn't too high (default 0.7)

- Make sure gesture_model.tflite or gesture_model.keras exists
- If missing, run python hursor_train.py to train a model
- Check file permissions - make sure Python can read the file
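To check which model file would be picked up, you can run a small lookup like this one. It is a sketch of the TFLite-first, Keras-fallback order described in this README; `find_model` is a hypothetical helper, not a function from the app.

```python
from pathlib import Path

# Preferred order: fast TFLite model first, Keras model as fallback.
MODEL_CANDIDATES = ("gesture_model.tflite", "gesture_model.keras")


def find_model(directory="."):
    """Return the path of the first model file that exists, or None."""
    for name in MODEL_CANDIDATES:
        path = Path(directory) / name
        if path.is_file():
            return path
    return None
```

If `find_model()` returns None when run from the repo folder, neither model file is present and you need to train one first.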
Want to dive deeper into the architecture? Check out ARCHITECTURE.md for detailed diagrams and explanations of how everything works together.
Found a bug? Have an idea for a new feature? Pull requests are welcome!
Feel free to open an issue if you run into problems or have suggestions. I'd love to hear how you're using Hursor!
This project is licensed under the MIT License - see the LICENSE file for details.
Basically, use it however you want, just don't blame me if something breaks.
Hursor wouldn't be possible without these amazing open-source projects:
- MediaPipe - For the hand landmark detection (Google)
- TensorFlow - ML framework (Google)
- OpenCV - Computer vision utilities
- PyAutoGUI - System control library
Thanks to all the contributors and maintainers of these projects!
Made with ❤️ by the Hursor Team
⭐ If you find this useful, consider giving it a star!
Got questions? Open an issue or check out the ARCHITECTURE.md for more details.
