
ISL ONNX Model Integration Guide

Overview

This document explains the integration of the custom-trained Indian Sign Language (ISL) ONNX model into the Vite/React application.

Model Details

  • File: public/isl_model.onnx (6.5 MB)
  • Input Shape: [1, 42] (21 hand landmarks × 2 coordinates: x, y)
  • Output: Scores for 42 ISL gesture classes (softmax is applied in post-processing to obtain a probability distribution; see the inference pipeline below)
  • Framework: ONNX Runtime Web

Data Preprocessing

Critical Math: Landmark Normalization

For each video frame, the model expects normalized hand landmark data:

  1. Extract Coordinates: Get x and y from all 21 MediaPipe hand landmarks
  2. Calculate Minimums: Find min_x and min_y for the current frame
  3. Normalize: Subtract minimums from each coordinate:
    normalized_x_i = x_i - min_x
    normalized_y_i = y_i - min_y
    
  4. Flatten: Convert to Float32Array of length 42: [x₀, y₀, x₁, y₁, ..., x₂₀, y₂₀]

Implementation

const processLandmarks = (landmarks: Array<{ x: number; y: number }>): Float32Array => {
    // Find min_x and min_y for the current frame
    const minX = Math.min(...landmarks.map(lm => lm.x));
    const minY = Math.min(...landmarks.map(lm => lm.y));
    
    // Normalize: subtract the minimums, then flatten to [x0, y0, x1, y1, ..., x20, y20]
    const normalized: number[] = [];
    for (const lm of landmarks) {
        normalized.push(lm.x - minX);
        normalized.push(lm.y - minY);
    }
    
    return new Float32Array(normalized);
};
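
For context, a sketch of how this helper is typically called from MediaPipe's results callback (hands is the Hands instance created at startup):

hands.onResults((results) => {
    // MediaPipe reports one landmark list per detected hand
    if (results.multiHandLandmarks?.length) {
        const input = processLandmarks(results.multiHandLandmarks[0]);
        // input is now ready to be wrapped in an ONNX tensor
        // (see the inference pipeline near the end of this document)
    }
});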

Dependencies Installed

npm install onnxruntime-web @mediapipe/hands @mediapipe/camera_utils

Packages

  • onnxruntime-web: ONNX Runtime for browser-based inference
  • @mediapipe/hands: Google MediaPipe Hands for hand landmark detection
  • @mediapipe/camera_utils: Camera utilities for MediaPipe
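
As a sketch, the corresponding imports typically look like this:

import * as ort from 'onnxruntime-web';
import { Hands, HAND_CONNECTIONS } from '@mediapipe/hands';
import { Camera } from '@mediapipe/camera_utils';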

Architecture

Component Flow

User clicks Start
    ↓
Load ONNX Model (once)
    ↓
Initialize MediaPipe Hands
    ↓
Start Camera (1280×720)
    ↓
Process frames at ~30 FPS:
    1. Capture video frame
    2. MediaPipe detects 21 hand landmarks
    3. Normalize landmarks (min-subtraction)
    4. Run ONNX inference
    5. Get prediction + confidence
    6. Update UI
    ↓
Display results in real-time
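
A sketch of the camera wiring that drives this loop (videoRef is a hypothetical ref to the video element; hands is the MediaPipe instance):

const camera = new Camera(videoRef.current!, {
    onFrame: async () => {
        // Feed each video frame to MediaPipe; onResults fires once per processed frame
        await hands.send({ image: videoRef.current! });
    },
    width: 1280,
    height: 720,
});
camera.start();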

Key Features Implemented

1. Model Loading

  • ONNX model loads on component mount
  • Camera starts only after model is ready
  • Uses WebAssembly execution provider for performance
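
A minimal loading sketch using the WebAssembly execution provider (the path matches the file noted in Model Details):

import * as ort from 'onnxruntime-web';

let session: ort.InferenceSession | null = null;

const loadModel = async (): Promise<void> => {
    // Files in public/ are served from the site root in Vite
    session = await ort.InferenceSession.create('/isl_model.onnx', {
        executionProviders: ['wasm'],
    });
};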

2. Real-time Processing

  • 30 FPS hand tracking
  • Live landmark visualization on canvas
  • Inference latency tracking (~50-150ms typical)
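
One way to track this latency is to time the inference call (a sketch; the feed name is read from session.inputNames rather than hard-coded):

const start = performance.now();
const results = await session.run({ [session.inputNames[0]]: inputTensor });
const latencyMs = performance.now() - start; // typically ~50-150 ms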

3. UI Updates

  • Large, clear prediction display
  • Confidence score visualization
  • FPS and latency metrics
  • Conversation history with auto-transcript

4. Hand Visualization

  • Green landmarks and connections drawn on canvas
  • 21 hand keypoints tracked
  • Palm and finger connections rendered
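
A sketch of the drawing routine, using HAND_CONNECTIONS from @mediapipe/hands (sizing and styling are illustrative):

import { HAND_CONNECTIONS, NormalizedLandmarkList } from '@mediapipe/hands';

const drawHand = (ctx: CanvasRenderingContext2D, landmarks: NormalizedLandmarkList) => {
    const { width, height } = ctx.canvas;
    ctx.strokeStyle = 'green';
    ctx.fillStyle = 'green';

    // Palm and finger connections
    for (const [from, to] of HAND_CONNECTIONS) {
        ctx.beginPath();
        ctx.moveTo(landmarks[from].x * width, landmarks[from].y * height);
        ctx.lineTo(landmarks[to].x * width, landmarks[to].y * height);
        ctx.stroke();
    }

    // The 21 keypoints
    for (const lm of landmarks) {
        ctx.beginPath();
        ctx.arc(lm.x * width, lm.y * height, 4, 0, 2 * Math.PI);
        ctx.fill();
    }
};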

ISL Gesture Classes (42 Total)

The model recognizes the following gestures:

const ISL_CLASSES = [
    'Hello', 'Thank you', 'Please', 'Help', 'Yes', 'No', 
    'Good morning', 'How are you', 'Sorry', 'Welcome',
    'Goodbye', 'I', 'You', 'We', 'They', 'What', 'When', 
    'Where', 'Why', 'How', 'Good', 'Bad', 'Happy', 'Sad',
    'Eat', 'Drink', 'Sleep', 'Work', 'Study', 'Play',
    'Family', 'Friend', 'Mother', 'Father', 'Brother', 'Sister',
    'Love', 'Like', 'Want', 'Need', 'Have', 'Go'
];

Note: Update this array to match your actual trained classes.

Performance Optimizations

1. Efficient Inference

  • Single hand tracking (maxNumHands: 1)
  • Model complexity: 1 (balanced)
  • Confidence thresholds: 0.5 (detection), 0.5 (tracking)
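
These settings map directly onto MediaPipe's options, roughly:

hands.setOptions({
    maxNumHands: 1,
    modelComplexity: 1,
    minDetectionConfidence: 0.5,
    minTrackingConfidence: 0.5,
});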

2. Canvas Rendering

  • Direct canvas manipulation for hand landmarks
  • No unnecessary re-renders
  • Optimized drawing with requestAnimationFrame

3. Memory Management

  • Proper cleanup on component unmount
  • Camera stream stopped when session ends
  • MediaPipe resources released
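
A cleanup sketch in React terms (camera, hands, and videoRef are the instances/refs created when the session starts; the structure is illustrative):

import { useEffect } from 'react';

useEffect(() => {
    // ... create the ONNX session, MediaPipe Hands, and Camera here ...
    return () => {
        // Stop the camera_utils frame loop
        camera?.stop();
        // Stop the underlying MediaStream tracks so the webcam is fully released
        const stream = videoRef.current?.srcObject as MediaStream | null;
        stream?.getTracks().forEach((track) => track.stop());
        // Release MediaPipe's internal resources
        hands.close();
    };
}, []);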

Usage

Starting a Session

  1. Click the green Play button
  2. Wait for the "Position hands in frame" message
  3. Show ISL gestures to the camera
  4. View real-time predictions in the right panel

Stopping a Session

  1. Click the red Pause button
  2. Camera and processing stop immediately
  3. All resources are cleaned up

Troubleshooting

Model Not Loading

  • Check browser console for errors
  • Ensure public/isl_model.onnx exists
  • Verify file is not corrupted (should be ~6.5 MB)

Camera Not Starting

  • Grant camera permissions in browser
  • Check if camera is already in use
  • Try refreshing the page

Low Accuracy

  • Ensure good lighting conditions
  • Position hand clearly in frame
  • Check if gestures match training data
  • Verify confidence threshold (default: 0.5)

Performance Issues

  • Close other browser tabs
  • Check CPU usage
  • Consider reducing video resolution
  • Ensure WebAssembly is enabled

Technical Notes

MediaPipe Hand Landmarks (21 points)

0: Wrist
1-4: Thumb (CMC, MCP, IP, Tip)
5-8: Index finger (MCP, PIP, DIP, Tip)
9-12: Middle finger (MCP, PIP, DIP, Tip)
13-16: Ring finger (MCP, PIP, DIP, Tip)
17-20: Pinky (MCP, PIP, DIP, Tip)
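
If these indices are needed in code, an illustrative constant map keeps them readable (names are hypothetical; values follow the MediaPipe indexing above):

const LANDMARK = {
    WRIST: 0,
    THUMB_TIP: 4,
    INDEX_TIP: 8,
    MIDDLE_TIP: 12,
    RING_TIP: 16,
    PINKY_TIP: 20,
} as const;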

ONNX Inference Pipeline

1. Create Float32Array[42] from normalized landmarks
2. Create ONNX Tensor with shape [1, 42]
3. Run session.run() with input tensor
4. Extract output probabilities
5. Apply softmax for confidence scores
6. Return argmax as prediction
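
A sketch of these six steps in one function (input/output names are read from the session rather than assumed; ISL_CLASSES is the array defined earlier):

import * as ort from 'onnxruntime-web';

const predict = async (
    session: ort.InferenceSession,
    normalized: Float32Array
): Promise<{ label: string; confidence: number }> => {
    // 1-2. Wrap the 42 normalized values in a [1, 42] tensor
    const inputTensor = new ort.Tensor('float32', normalized, [1, 42]);

    // 3. Run inference
    const results = await session.run({ [session.inputNames[0]]: inputTensor });

    // 4. Extract the raw output scores
    const scores = results[session.outputNames[0]].data as Float32Array;

    // 5. Softmax (numerically stabilized) for confidence scores
    const max = Math.max(...scores);
    const exps = Array.from(scores, (s) => Math.exp(s - max));
    const sum = exps.reduce((a, b) => a + b, 0);
    const probs = exps.map((e) => e / sum);

    // 6. Argmax as the prediction
    const best = probs.indexOf(Math.max(...probs));
    return { label: ISL_CLASSES[best], confidence: probs[best] };
};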

Future Enhancements

  • Add support for two-handed gestures
  • Implement gesture sequence recognition
  • Add custom gesture training interface
  • Optimize for mobile devices
  • Add offline mode support
  • Implement gesture smoothing/filtering

Credits

  • ONNX Runtime: Microsoft
  • MediaPipe: Google
  • Model Training: Custom ISL dataset

Last Updated: January 31, 2026
Version: 1.0.0