Hursor - Multi-Mode Hand Gesture Control

Control your computer with hand gestures using ML-based gesture recognition

Features • Quick Start • Architecture • How It Works • Documentation


📖 What is Hursor?

Ever wanted to control your computer without touching the mouse or keyboard? Hursor lets you do exactly that! It's a real-time hand gesture control system that uses machine learning to recognize your hand gestures and translate them into computer actions.

Just show your hand to the camera, and you can scroll pages, zoom in/out, move the cursor, and even click - all with simple finger gestures. No fancy hardware needed, just your webcam and some Python code!

Why I Built This

I got tired of constantly switching between mouse and keyboard, especially when browsing or doing presentations. So I built Hursor to make computer interaction more intuitive and hands-free. The gateway-based mode switching prevents accidental mode changes (trust me, I learned this the hard way during development 😅).


✨ Key Features

  • 🎯 Multi-Mode Control: Scroll, zoom, move cursor, and click - all with gestures
  • 🤖 ML-Powered: Uses a neural network trained on your own gestures for accuracy
  • ⚡ Super Fast: ~10ms inference time means it feels instant (30 FPS smooth!)
  • 🔄 Smart Mode Switching: Gateway system prevents accidental mode changes
  • 📍 Zone-Based Control: Intuitive neutral zones make scrolling/zooming feel natural
  • 🎨 Visual Feedback: See what mode you're in and how confident the model is

🎯 Gesture Modes

Here's what each gesture does:

| Gesture | Fingers | Mode | What It Does |
|---|---|---|---|
| ☝️ One | 1 (index only) | SCROLL | Scroll up/down based on finger position |
| ✌️ Peace | 2 (index + middle) | ZOOM | Zoom in/out based on finger position |
| 🤟 Three | 3 (index + middle + ring) | CURSOR | Move mouse cursor with your index finger |
| ✊ Fist | 0 (in cursor mode) | CLICK | Left mouse click (only works in cursor mode) |

🚀 Getting Started

What You'll Need

  • Python 3.8 or higher
  • A webcam (built-in or external)
  • Windows, macOS, or Linux

Installation

First, clone the repo and install the dependencies:

# Clone the repository
git clone https://github.com/xampos101/Hursor.git
cd Hursor

# Install dependencies
pip install -r requirements.txt

That's it! This installs MediaPipe, TensorFlow, OpenCV, and PyAutoGUI.

Running Hursor

# Start the application
python gesture_velocity_control.py

Point your hand at the camera and start gesturing! Press Q or ESC to quit.

Pro tip: Make sure you have good lighting and keep your hand about 30-60cm from the camera for best results.


🏗️ How It Works Under the Hood

System Overview

Here's the high-level flow of how Hursor processes your gestures:

flowchart LR
    CAM[Camera Feed] --> MP[MediaPipe<br/>Hand Detection]
    MP --> LM[21 Hand Landmarks]
    LM --> PROC[Normalize &<br/>Flatten]
    PROC --> ML[TFLite Model<br/>~10ms]
    ML --> CTRL[State Machine<br/>Controller]
    CTRL --> ACT[Actions<br/>Scroll/Zoom/Cursor/Click]

The camera captures your hand, MediaPipe extracts 21 landmark points, we normalize them, feed them to our ML model, and the state machine decides what action to take. All of this happens in real-time!

Mode State Machine

The cool part is the gateway-based state machine. You can only switch modes from IDLE, which prevents those annoying accidental mode switches:

stateDiagram-v2
    [*] --> IDLE: Start
    
    IDLE --> IDLE: Gesture 4 (IDLE)
    IDLE --> SCROLL: Hold Gesture 1<br/>for 1.5s
    IDLE --> ZOOM: Hold Gesture 2<br/>for 1.5s
    IDLE --> CURSOR: Hold Gesture 3<br/>for 1.5s
    
    SCROLL --> IDLE: Gesture 4 (IDLE)
    ZOOM --> IDLE: Gesture 4 (IDLE)
    CURSOR --> IDLE: Gesture 4 (IDLE)
    CURSOR --> CURSOR: Gesture 0 (FIST) → Click
    
    note right of IDLE
        Gateway State
        Only from IDLE can
        you switch modes
    end note

Gesture Detection Pipeline

Before gestures reach the ML model, we process the raw landmarks:

flowchart LR
    RAW[Raw Landmarks<br/>21 points × 3 coords] --> CENTER[Center around<br/>Wrist]
    CENTER --> SCALE[Scale by<br/>Palm Size]
    SCALE --> FLAT[Flatten to<br/>63 features]
    FLAT --> ML[ML Model<br/>Classification]
    ML --> GESTURE[Gesture ID<br/>0-4]

We center everything around the wrist and scale by palm size, so it works regardless of how far your hand is from the camera. Pretty neat, right?
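Here's a rough sketch of what that normalization could look like in plain Python. It assumes "palm size" means the distance from the wrist (MediaPipe landmark 0) to the middle-finger knuckle (landmark 9); the actual code in gesture_velocity_control.py may measure the palm differently, and the function name is just for illustration.

```python
import math

WRIST = 0        # MediaPipe Hands index of the wrist landmark
MIDDLE_MCP = 9   # MediaPipe Hands index of the middle-finger knuckle


def normalize_landmarks(landmarks):
    """Turn 21 (x, y, z) tuples into 63 wrist-centred, palm-scaled floats."""
    if len(landmarks) != 21:
        raise ValueError("expected 21 landmarks")
    wx, wy, wz = landmarks[WRIST]
    # Step 1: centre everything around the wrist.
    centred = [(x - wx, y - wy, z - wz) for x, y, z in landmarks]
    # Step 2 (assumption): palm size = wrist-to-middle-knuckle distance.
    palm = math.sqrt(sum(c * c for c in centred[MIDDLE_MCP]))
    if palm == 0:
        raise ValueError("degenerate hand: palm size is zero")
    # Step 3: scale and flatten to x0, y0, z0, x1, y1, z1, ...
    return [c / palm for point in centred for c in point]
```

With this scheme, a hand that is moved or shrunk uniformly in the frame produces the same 63 features, which is the distance-invariance described above.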

Training Pipeline

Want to train your own model? Here's the process:

flowchart LR
    COLLECT[Data Collection<br/>hursor_data_collector.py] --> CSV[CSV Dataset<br/>63 features + label]
    CSV --> TRAIN[Train Model<br/>256→128→64→5]
    TRAIN --> KERAS[Keras Model<br/>gesture_model.keras]
    TRAIN --> TFLITE[TFLite Model<br/>gesture_model.tflite]

Collect your data, train the model, and it automatically converts to TFLite for fast inference!


💡 Using Hursor

Mode Switching

Switching modes is simple but requires a bit of patience:

  1. Start in IDLE mode (show 4 fingers)
  2. Hold your desired gesture (1/2/3 fingers) for 1.5 seconds - you'll see a progress ring fill up
  3. To switch to another mode, go back to IDLE first (4 fingers), then hold the new gesture

This might seem like extra steps, but trust me, it prevents so many accidental switches! I tried direct switching first and it was chaos 😄
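If you're curious how a gateway like this can be wired up, here's a minimal sketch. The class and method names are made up for illustration and are not taken from the real code; only the gesture IDs and the 1.5s hold time come from this README.

```python
IDLE, SCROLL, ZOOM, CURSOR = "IDLE", "SCROLL", "ZOOM", "CURSOR"
GESTURE_TO_MODE = {1: SCROLL, 2: ZOOM, 3: CURSOR}
IDLE_GESTURE = 4
MODE_HOLD_TIME = 1.5  # seconds, same default as the config


class ModeMachine:
    """Hypothetical gateway state machine: modes can only be entered from IDLE."""

    def __init__(self):
        self.mode = IDLE
        self._candidate = None  # gesture currently being held while in IDLE
        self._since = 0.0       # timestamp when that hold started

    def update(self, gesture, now):
        """Feed one classified frame; returns the mode after this frame."""
        if gesture == IDLE_GESTURE:
            # Gesture 4 always returns to the gateway state.
            self.mode, self._candidate = IDLE, None
        elif self.mode == IDLE and gesture in GESTURE_TO_MODE:
            if gesture != self._candidate:
                # New candidate: start timing the hold.
                self._candidate, self._since = gesture, now
            elif now - self._since >= MODE_HOLD_TIME:
                self.mode, self._candidate = GESTURE_TO_MODE[gesture], None
        else:
            # Active modes ignore other mode gestures;
            # in IDLE, anything else cancels the hold.
            self._candidate = None
        return self.mode
```

Showing 2 fingers while in SCROLL does nothing here; you have to pass through IDLE first, which is the whole point of the gateway.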

Scroll Mode (1 finger)

Move your index finger up and down to scroll. The higher/lower you go, the faster it scrolls:

┌─────────────────────────┐
│      SCROLL UP          │  ← Move finger here
│   (Faster when higher)  │
├─────────────────────────┤
│     NEUTRAL ZONE        │  ← Rest here, no scrolling
│   (35% - 65% height)    │
├─────────────────────────┤
│      SCROLL DOWN        │  ← Move finger here
│   (Faster when lower)   │
└─────────────────────────┘

The neutral zone in the middle gives you a "resting" position where nothing happens. Super useful when you need to pause scrolling!
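To make the zone idea concrete, here's a small sketch of how a finger height could be turned into a scroll amount. The linear ramp from base speed to max speed is my assumption for illustration (the real app may use a different curve), and scroll_amount is a hypothetical helper; the constants are the config defaults.

```python
NEUTRAL_ZONE_TOP = 0.35
NEUTRAL_ZONE_BOTTOM = 0.65
SCROLL_BASE_SPEED = 3
SCROLL_MAX_SPEED = 20


def scroll_amount(y):
    """y is the fingertip height: 0.0 = top of the frame, 1.0 = bottom.

    Returns scroll clicks: positive = up, negative = down, 0 = neutral zone.
    """
    y = min(max(y, 0.0), 1.0)
    if y < NEUTRAL_ZONE_TOP:
        # How far into the upper zone the finger is, from 0 to 1.
        depth = (NEUTRAL_ZONE_TOP - y) / NEUTRAL_ZONE_TOP
        sign = 1
    elif y > NEUTRAL_ZONE_BOTTOM:
        depth = (y - NEUTRAL_ZONE_BOTTOM) / (1.0 - NEUTRAL_ZONE_BOTTOM)
        sign = -1
    else:
        return 0
    # Assumed linear ramp between the base and maximum speed.
    speed = SCROLL_BASE_SPEED + depth * (SCROLL_MAX_SPEED - SCROLL_BASE_SPEED)
    return sign * round(speed)
```

The result has the same sign convention as pyautogui.scroll(), where positive values scroll up.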

Zoom Mode (2 fingers)

Similar to scroll mode, but for zooming:

┌─────────────────────────┐
│       ZOOM IN           │  ← Fingers here
│    (Ctrl + Plus)        │
├─────────────────────────┤
│     NEUTRAL ZONE        │  ← No zooming here
│   (35% - 65% height)    │
├─────────────────────────┤
│       ZOOM OUT          │  ← Fingers here
│    (Ctrl + Minus)       │
└─────────────────────────┘

There's a cooldown between zooms (0.25s) to prevent zooming too fast. You can adjust this in the config if you want it faster/slower.
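A cooldown like this is easy to sketch. The Cooldown class below is a hypothetical helper, not the app's actual code; the clock is injectable so it can be tested without sleeping.

```python
import time


class Cooldown:
    """Allows an action at most once every `interval` seconds."""

    def __init__(self, interval, clock=time.monotonic):
        self.interval = interval
        self._clock = clock
        self._last = None  # time of the last allowed action

    def ready(self):
        """Return True (and restart the timer) if enough time has passed."""
        now = self._clock()
        if self._last is None or now - self._last >= self.interval:
            self._last = now
            return True
        return False
```

In the main loop you would create something like Cooldown(0.25) once and only send the zoom shortcut when ready() returns True.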

Cursor Mode (3 fingers)

This is where it gets fun! Move your index finger around and the cursor follows. Make a fist to click:

  • Move index finger → Cursor moves (with smoothing for stability)
  • Make a fist → Left click (hold for 0.15s, needs to be stable)

The click detection has a stability check - your hand needs to stay relatively still for 5 frames before the click timer starts. This prevents accidental clicks when your hand shakes a bit.
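Here's one way the stability check plus hold timer could fit together. This is a sketch under assumptions: ClickDetector is a made-up name, and movement is measured from the position where the fist first appeared, which may not match how the real app tracks it. The constants are the config defaults.

```python
import math

FIST = 0
CLICK_HOLD_TIME = 0.15
CLICK_COOLDOWN = 0.3
CLICK_STABILITY_FRAMES = 5
CLICK_MAX_MOVEMENT = 0.05


class ClickDetector:
    def __init__(self):
        self._last_click = -math.inf
        self._reset()

    def _reset(self):
        self._anchor = None      # hand position where the stable run started
        self._frames = 0         # consecutive stable fist frames
        self._hold_start = None  # when the hold timer started

    def update(self, gesture, pos, now):
        """Feed one frame; returns True on the frame a click should fire."""
        if gesture != FIST:
            self._reset()
            return False
        if self._anchor is None or math.dist(pos, self._anchor) > CLICK_MAX_MOVEMENT:
            # First fist frame, or the hand drifted: restart the stable run here.
            self._anchor, self._frames, self._hold_start = pos, 1, None
            return False
        self._frames += 1
        if self._frames < CLICK_STABILITY_FRAMES:
            return False
        if self._hold_start is None:
            self._hold_start = now  # stable long enough: start the hold timer
        held = now - self._hold_start >= CLICK_HOLD_TIME
        if held and now - self._last_click >= CLICK_COOLDOWN:
            self._last_click = now
            self._reset()
            return True
        return False
```

A shaky hand keeps restarting the frame count, so the hold timer never starts and no click fires.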


🔧 Technical Stuff

ML Model Architecture

The model is pretty straightforward - a feedforward neural network:

Input: 63 features (21 landmarks × 3 coords)
  ↓
Dense(256) + BatchNorm + Dropout(0.3)
  ↓
Dense(128) + BatchNorm + Dropout(0.3)
  ↓
Dense(64) + BatchNorm + Dropout(0.2)
  ↓
Dense(5) + Softmax
  ↓
Output: 5 gesture probabilities

Nothing fancy, but it works really well! The BatchNorm and Dropout help with generalization.
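For a sense of scale, here's a quick parameter count for the layer sizes above. It assumes every Dense layer has a bias and that BatchNorm follows each hidden layer with the usual four values per unit; the real model's counts could differ slightly if the layers are configured differently.

```python
LAYER_SIZES = [63, 256, 128, 64, 5]  # input, three hidden layers, output


def count_parameters(sizes):
    """Count Dense weights (with bias) plus BatchNorm values on hidden layers."""
    dense = sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))
    # BatchNorm keeps 4 values per unit: gamma, beta, moving mean, moving variance.
    batch_norm = sum(4 * n for n in sizes[1:-1])
    return dense, batch_norm


dense, batch_norm = count_parameters(LAYER_SIZES)
total = dense + batch_norm
print(dense, batch_norm, total)        # 57861 1792 59653
print(round(total * 4 / 1024), "KiB")  # 233 KiB if stored as float32
```

Roughly 60k parameters is tiny by neural-network standards, which fits with the model file being well under 1 MB.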

Training Performance

Here's how the model performed during training:

[Figure: Training History - training over 100 epochs; validation accuracy reached ~95%]

As you can see, validation accuracy reached around 95% by the end. There's a bit of overfitting early on (training accuracy is higher than validation), but it eventually generalizes well. The model learns pretty quickly!

Performance Metrics

Here are the numbers:

| Metric | Value | Notes |
|---|---|---|
| Inference Speed (TFLite) | ~10ms | Super fast! This is why we use TFLite |
| Inference Speed (Keras) | ~100ms | Fallback option, still usable |
| Frame Rate | 30 FPS | Smooth as butter with TFLite |
| Model Size (TFLite) | <1 MB | Tiny! Easy to share |
| Confidence Threshold | 0.7 | Adjustable if you want stricter/looser detection |
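A quick back-of-the-envelope check on why the inference time matters for frame rate. This only looks at inference and ignores camera capture and MediaPipe time, so real numbers will be lower; max_fps is just an illustrative helper.

```python
def max_fps(inference_ms, other_ms=0.0):
    """Upper bound on frame rate if each frame costs inference_ms + other_ms."""
    return 1000.0 / (inference_ms + other_ms)


frame_budget_ms = 1000.0 / 30       # time available per frame at 30 FPS
print(round(frame_budget_ms, 1))    # 33.3
print(max_fps(10))                  # 100.0 -> TFLite fits easily in the budget
print(max_fps(100))                 # 10.0  -> Keras alone caps you at 10 FPS
```

So the ~10ms TFLite model leaves about 23ms per frame for everything else, while the ~100ms Keras fallback can't reach 30 FPS on its own.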

Tech Stack

  • MediaPipe: Does the heavy lifting for hand detection (21 landmark points)
  • TensorFlow/Keras: For training the gesture classifier
  • TensorFlow Lite: Optimized model for real-time inference
  • OpenCV: Camera capture and visualization
  • PyAutoGUI: Actually controlling the mouse/keyboard

📁 Project Structure

Here's what's in the repo:

Hursor/
├── gesture_velocity_control.py  # Main app - run this!
├── hursor_data_collector.py     # Collect training data
├── hursor_train.py              # Train your model
├── gesture_model.tflite         # Optimized model (fast!)
├── gesture_model.keras          # Keras model (fallback)
├── hursor_dataset.csv           # Your training data
├── ARCHITECTURE.md              # Deep dive into the architecture
├── requirements.txt             # Python dependencies
└── README.md                    # This file

Most of the time you'll just run gesture_velocity_control.py. The other files are for training your own model.


⚙️ Configuration

Want to tweak the behavior? Edit the settings at the top of gesture_velocity_control.py:

# Zone settings - adjust the neutral zone size
NEUTRAL_ZONE_TOP = 0.35      # 35% from top
NEUTRAL_ZONE_BOTTOM = 0.65   # 65% from top

# Scroll settings - make it faster/slower
SCROLL_BASE_SPEED = 3        # Base scroll amount
SCROLL_MAX_SPEED = 20        # Maximum scroll amount
SCROLL_COOLDOWN = 0.03       # Cooldown between scrolls (seconds)

# Zoom settings
ZOOM_COOLDOWN = 0.25         # Cooldown between zooms (seconds)

# Cursor settings - adjust responsiveness
CURSOR_SMOOTHING = 0.4       # 0-1, higher = more responsive
CURSOR_FRAME_MARGIN = 0.1    # 10% margin on camera frame

# Click settings - make clicking easier/harder
CLICK_COOLDOWN = 0.3         # Prevent double clicks
CLICK_HOLD_TIME = 0.15       # Time fist must be held (seconds)
CLICK_STABILITY_FRAMES = 5   # Frames fist must be detected
CLICK_MAX_MOVEMENT = 0.05    # Max hand movement during click

# Mode switching
MODE_HOLD_TIME = 1.5         # Seconds to hold gesture for mode switch
CONFIDENCE_THRESHOLD = 0.7   # Minimum ML confidence to accept gesture

Play around with these values to find what feels best for you!
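To get a feel for what the two cursor settings do, here's a sketch of one plausible interpretation: the frame margin crops the edges of the camera view, and the smoothing value is the fraction of the way the cursor moves toward the target each frame (which matches the "higher = more responsive" comment). Both helper functions are hypothetical and the real implementation may differ.

```python
CURSOR_SMOOTHING = 0.4
CURSOR_FRAME_MARGIN = 0.1


def to_screen(value, screen_size):
    """Map a 0..1 camera coordinate to pixels, ignoring the outer margin."""
    usable = 1.0 - 2 * CURSOR_FRAME_MARGIN
    t = (value - CURSOR_FRAME_MARGIN) / usable
    t = min(max(t, 0.0), 1.0)  # clamp so the cursor stays on screen
    return t * (screen_size - 1)


def smooth(previous, target):
    """Exponential smoothing: move a fraction of the way toward the target."""
    return previous + CURSOR_SMOOTHING * (target - previous)
```

With the margin at 0.1, you only need to move your finger across the middle 80% of the frame to reach every edge of the screen.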


🎓 Training Your Own Model

Want to train on your own gestures? Here's how:

Step 1: Collect Data

Run the data collector:

python hursor_data_collector.py

  • Press 0-4 to select which gesture you're recording (0=CLICK, 1=SCROLL, 2=ZOOM, 3=CURSOR, 4=IDLE)
  • Press ENTER to auto-collect 30 samples
  • Repeat for all 5 gestures

Tip: Collect samples from different angles and lighting conditions for better generalization!
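The dataset is a CSV with 63 features plus a label per row. Here's a sketch of writing rows in that shape; the column names (x0, y0, z0, ...) are my guess for illustration, so check hursor_dataset.csv for the real header.

```python
import csv
import io


def csv_header():
    """Hypothetical column names: x0, y0, z0, ..., x20, y20, z20, label."""
    cols = [f"{axis}{i}" for i in range(21) for axis in "xyz"]
    return cols + ["label"]


def write_samples(stream, samples):
    """samples: iterable of (features, label), features being 63 floats."""
    writer = csv.writer(stream)
    writer.writerow(csv_header())
    for features, label in samples:
        if len(features) != 63:
            raise ValueError("expected 63 features per sample")
        writer.writerow(list(features) + [label])


# Example: two fake samples written to an in-memory buffer.
buffer = io.StringIO()
write_samples(buffer, [([0.0] * 63, 4), ([0.5] * 63, 1)])
```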

Step 2: Train

Once you have data, train the model:

python hursor_train.py

This will:

  • Load your data from hursor_dataset.csv
  • Split it into train/validation/test (70/15/15)
  • Train the neural network with early stopping
  • Save both Keras and TFLite models
  • Show you accuracy metrics and a confusion matrix

The training script will automatically convert to TFLite, which is what the main app uses for speed.
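The 70/15/15 split is simple to reproduce. This sketch uses a plain seeded shuffle; the actual training script may stratify by label or use a library helper instead, and split_dataset is a made-up name.

```python
import random


def split_dataset(rows, seed=42):
    """Shuffle and split rows into 70% train, 15% validation, 15% test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded so the split is reproducible
    n_train = len(rows) * 70 // 100
    n_val = len(rows) * 15 // 100
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]      # whatever is left over
    return train, val, test
```

With 200 samples this gives 140 for training and 30 each for validation and test.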

Step 3: Use It!

The trained model (gesture_model.tflite) will be automatically loaded when you run gesture_velocity_control.py. That's it!


🐛 Troubleshooting

Finger detection is inaccurate

  • ✅ Make sure you have good lighting (not too dark, not too bright)
  • ✅ Keep your hand 30-60cm from the camera
  • ✅ Hold fingers straight and spread them out
  • ✅ Try to avoid cluttered backgrounds

Cursor mode feels jittery

  • ✅ Adjust CURSOR_SMOOTHING in the config: the config comment says higher values are more responsive, so try a lower value (0.2-0.3) for a steadier cursor
  • ✅ Make sure your hand position is stable
  • ✅ Check that your camera is running at a good frame rate

Click not working

  • ✅ Hold the fist for at least 0.15 seconds (you'll see a progress ring)
  • ✅ Make sure you're in Cursor mode (3 fingers) before making a fist
  • ✅ Keep your hand relatively still during the click gesture
  • ✅ Try increasing CLICK_STABILITY_FRAMES if clicks are too sensitive

Mode switching feels slow

  • ✅ You need to hold the gesture for 1.5 seconds - this is intentional!
  • ✅ Make sure you start from IDLE mode (4 fingers)
  • ✅ Ensure your gesture is clearly visible to the camera
  • ✅ Check that confidence threshold isn't too high (default 0.7)
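On that last point, here's roughly what a confidence threshold does to the model's output. pick_gesture is a hypothetical helper, but the idea is the same: frames where the best class scores below the threshold are treated as "no gesture", so a threshold that's too high means your hold timer never gets going.

```python
CONFIDENCE_THRESHOLD = 0.7


def pick_gesture(probabilities):
    """Return the most likely gesture ID, or None if the model is unsure."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    if probabilities[best] >= CONFIDENCE_THRESHOLD:
        return best
    return None


print(pick_gesture([0.05, 0.80, 0.05, 0.05, 0.05]))  # 1
print(pick_gesture([0.30, 0.30, 0.20, 0.10, 0.10]))  # None
```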

Model not loading

  • ✅ Make sure gesture_model.tflite or gesture_model.keras exists
  • ✅ If missing, run python hursor_train.py to train a model
  • ✅ Check file permissions - make sure Python can read the file
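If you want to check which model file would be picked up, a small script like this can help. It only mirrors the "TFLite first, Keras as fallback" order described in this README; pick_model is a hypothetical helper, not a function from the app.

```python
from pathlib import Path


def pick_model(directory="."):
    """Return (kind, path) for the first model file found, TFLite first."""
    base = Path(directory)
    candidates = (
        ("tflite", "gesture_model.tflite"),
        ("keras", "gesture_model.keras"),
    )
    for kind, name in candidates:
        path = base / name
        if path.is_file():
            return kind, path
    raise FileNotFoundError(
        "No gesture model found; run `python hursor_train.py` first."
    )
```

Run it from the repo root; a FileNotFoundError means neither model file is there yet.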

📚 More Documentation

Want to dive deeper into the architecture? Check out ARCHITECTURE.md for detailed diagrams and explanations of how everything works together.


🀝 Contributing

Found a bug? Have an idea for a new feature? Pull requests are welcome!

Feel free to open an issue if you run into problems or have suggestions. I'd love to hear how you're using Hursor!


📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Basically, use it however you want, just don't blame me if something breaks 😉


🙏 Credits

Hursor wouldn't be possible without these amazing open-source projects:

  • MediaPipe - For the hand landmark detection (Google)
  • TensorFlow - ML framework (Google)
  • OpenCV - Computer vision utilities
  • PyAutoGUI - System control library

Thanks to all the contributors and maintainers of these projects!


Made with ❤️ by the Hursor Team

⭐ If you find this useful, consider giving it a star!

Got questions? Open an issue or check out ARCHITECTURE.md for more details.
