Merge pull request #109 from taorye/dev
improve documentation for gesture classification
Zepan authored Jan 22, 2025
2 parents fc20f08 + 58804a5 commit 3a9da9f
Showing 5 changed files with 478 additions and 42 deletions.
266 changes: 243 additions & 23 deletions docs/doc/en/vision/hand_gesture_classification.md
@@ -2,45 +2,265 @@
title: MaixCAM MaixPy Hand Gesture Classification Based on Hand Keypoint Detection
---


## Introduction

The `MaixCAM MaixPy Hand Gesture Classification Based on Hand Keypoint Detection` can classify various hand gestures.

The following describes how to preprocess features from [AI model estimated hand landmarks](./hand_landmarks.md), which are then classified using LinearSVC (Support Vector Machine). A detailed implementation is available in `MaixPy/projects/app_hand_gesture_classifier/LinearSVC.py`, and the usage example can be found in the app implementation in `MaixPy/projects/app_hand_gesture_classifier/main.py`.

**Users can add any distinguishable hand gestures for training.**

## Usage

### Preprocessing
Here’s the preprocessing for the raw output `hand_landmarks` from the AI model to derive usable features:

```python
import numpy as np

def preprocess(hand_landmarks, is_left=False, boundary=(1, 1, 1)):
    hand_landmarks = np.array(hand_landmarks).reshape((21, -1))
    vector = hand_landmarks[:, :2]                   # keep only (x, y)
    vector = vector[1:] - vector[0]                  # offsets of keypoints 1..20 relative to the wrist (keypoint 0)
    vector = vector.astype('float64') / boundary[:vector.shape[1]]  # normalize by image size
    if not is_left:  # mirror, so left and right hands share one feature space
        vector[:, 0] *= -1
    return vector
```
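
As a quick sanity check, the function can be exercised with synthetic landmark values; a minimal sketch (the numbers below are placeholders, not real detector output):

```python
# 21 keypoints, each with (x, y, z) in pixels, flattened as the detector returns them
fake_landmarks = np.random.randint(0, 224, size=21 * 3).tolist()

feat = preprocess(fake_landmarks, is_left=False, boundary=(320, 224, 1))
print(feat.shape)            # (20, 2): wrist-relative offsets of keypoints 1..20
print(feat.flatten().shape)  # (40,): the flattened form fed to the classifier
```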

### Import Modules
Add the app directory (`target_dir` below) to `sys.path` so that `LinearSVC` can be imported; alternatively, copy the `LinearSVC.py` implementation out of that directory into your own project:
```python
# To import LinearSVC
target_dir = '/maixapp/apps/hand_gesture_classifier/'
import sys
if target_dir not in sys.path:
    sys.path.insert(0, target_dir)

from LinearSVC import LinearSVC, LinearSVCManager
```

### Classifier (LinearSVC)

Introduction to the LinearSVC classifier's functions and usage.

#### Initialization, Loading, and Exporting
```python
# Initialize
clf = LinearSVC(C=1.0, learning_rate=0.01, max_iter=500)
# Load
clf = LinearSVC.load("/maixapp/apps/hand_gesture_classifier/clf_dump.npz")
# Export
clf.save("my_clf_dump.npz")
```
*Initialization Method Parameters*
1. C=1.0 (Regularization Parameter)
   - Controls the strength of the regularization in the SVM.
   - A larger C punishes misclassifications more strictly, potentially leading to overfitting.
   - A smaller C tolerates some misclassifications, improving generalization but potentially underfitting.
   - Default: 1.0, a balance between accuracy and generalization.

2. learning_rate=0.01 (Learning Rate)
   - Controls the step size of each weight update during gradient descent.
   - Too large a learning rate may cause the optimization to diverge, while too small a learning rate may lead to slow convergence.
   - Default: 0.01, a moderate value that approaches the optimal solution gradually.

3. max_iter=500 (Maximum Iterations)
   - Specifies the maximum number of optimization rounds during training.
   - More iterations give the model more chances to converge, but too many may cause overfitting.
   - A smaller max_iter may stop optimization prematurely, leading to underfitting.
   - Default: 1000, sufficient iterations to ensure convergence.
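
As an illustration of these trade-offs, two configurations can be trained on the same data and compared; a minimal sketch (assuming `X_train`/`y_train` are the features and labels loaded as in the Training section below):

```python
import numpy as np

for C, max_iter in ((0.1, 200), (10.0, 2000)):
    clf_try = LinearSVC(C=C, learning_rate=0.01, max_iter=max_iter)
    clf_try.fit(clf_try.scaler.fit_transform(X_train), y_train)          # train on normalized features
    acc = np.mean(clf_try.predict(clf_try.scaler.transform(X_train)) == y_train)
    print(f"C={C}, max_iter={max_iter}: training accuracy {acc:.3f}")
```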

*Loading and Exporting Method Parameters*
1. filename: str
   - The target file path; both relative and absolute paths are supported.
   - This is a required parameter with no default.

#### Training and Prediction (Classification)
After initializing the classifier, it needs to be trained before it can be used for classification.

If you load a previously trained classifier, it can directly be used for classification.

**Note: Every training session is a full retrain, meaning previous training results will be lost. It is recommended to export a backup of the classifier as needed.**

```python
npzfile = np.load("/maixapp/apps/hand_gesture_classifier/trainSets.npz") # Preload features and labels (name_classes index)
X_train = npzfile["X"] # Raw features
y_train = npzfile["y"] # Labels

clf.fit(clf.scaler.fit_transform(X_train), y_train) # Train SVM after feature normalization

# Evaluate on the training set
y_pred = clf.predict(clf.scaler.transform(X_train)) # Predict labels after feature normalization
recall_count = len(y_train)
right_count = np.sum(y_pred == y_train)
print(f"right/recall= {right_count}/{recall_count}, acc: {right_count/recall_count}")

# Prediction
X_test = X_train[:5]
feature_test = clf.scaler.transform(X_test) # Feature normalization
# y_pred = clf.predict(feature_test) # Predict labels
y_pred, y_conf = clf.predict_with_confidence(feature_test) # Predict labels with confidence
print(f"pred: {y_pred}, conf: {y_conf}")
# Corresponding class names:
# name_classes = ("one", "five", "fist", "ok", "heartSingle", "yearh", "three", "four", "six", "Iloveyou", "gun", "thumbUp", "nine", "pink")
```

Since every training session is a full retrain, you need to maintain the previously used features and their corresponding labels yourself in order to dynamically add or remove classes.
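
A minimal sketch of such manual bookkeeping (here `X_new`/`y_new` are hypothetical preprocessed samples and labels of an extra gesture, and the file names are illustrative):

```python
import numpy as np

# Keep the complete training set yourself
X_all = np.concatenate([X_train, X_new])
y_all = np.concatenate([y_train, y_new])

# Full retrain on the combined set, then back up both the classifier and the data
clf.fit(clf.scaler.fit_transform(X_all), y_all)
clf.save("my_clf_dump.npz")
np.savez("my_trainSets.npz", X=X_all, y=y_all)
```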

To simplify usage and reduce extra workload, the `Classifier Manager (LinearSVCManager)` has been encapsulated, as described in the next section.


### Classifier Manager (LinearSVCManager)

Introduction to the LinearSVCManager's functions and usage.

#### Initialization, Loading, and Exporting

Both initialization and loading must be given valid X and Y (corresponding features and labels).

Their lengths must be equal and correspond one-to-one, otherwise an error will occur.

```python
# Initialization / Loading: the constructor signature is
#   def __init__(self, clf: LinearSVC = LinearSVC(), X=None, Y=None, pretrained=False)

# Initialize with default LinearSVC parameters
clfm = LinearSVCManager(X=X_train, Y=y_train)
# Initialize with specific LinearSVC parameters
clfm = LinearSVCManager(LinearSVC(C=1.0, learning_rate=0.01, max_iter=100), X_train, y_train)

# Loading requires the loaded LinearSVC and setting pretrained=True to avoid unnecessary retraining
# Ensure X_train, y_train are the data previously used to train LinearSVC
clfm = LinearSVCManager(LinearSVC.load("/maixapp/apps/hand_gesture_classifier/clf_dump.npz"), X_train, y_train, pretrained=True)

# Export parameters using the save of the inner LinearSVC (clfm.clf)
clfm.clf.save("my_clf_dump.npz")
# Export the features and labels used for training
np.savez("trainSets.npz",
         X=X_train,
         y=y_train,
         )
```

#### Accessing Training Data Used

`clfm.samples` is a Python tuple:
1. `clfm.samples[0]` is `X`
2. `clfm.samples[1]` is `Y`

**Do not modify it directly; it is intended for read-only access. If you do change it, call `clfm.train()` to retrain the model.**
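
For read-only inspection, for example counting how many stored samples each class has, something like this works (a minimal sketch):

```python
import numpy as np

X_stored, y_stored = clfm.samples                    # read-only access to the stored training data
labels, counts = np.unique(y_stored, return_counts=True)
print(dict(zip(labels.tolist(), counts.tolist())))   # samples per class index
```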

#### Adding or Removing

**When adding, ensure `X_new` and `y_new` have the same length and that their shapes match the previous `X_train` and `y_train`.**

All of them are NumPy arrays, so you can check their shapes via the `shape` attribute.

```python
# Add new data
clfm.add(X_new, y_new)

# Remove data
mask_ge_4 = clfm.samples[1] >= 4 # Mask for class labels >= 4
indices_ge_4 = np.where(mask_ge_4)[0]
clfm.rm(indices_ge_4)
```

These operations mainly modify `clfm.samples`, and they also trigger a call to `clfm.train()` to retrain the model.

Depending on the size of the training data, this may take a few moments before the updated model can be used.
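
Because retraining time grows with the number of stored samples, it can be useful to measure it; a minimal sketch (assuming `X_new`/`y_new` are prepared as above):

```python
import time

t0 = time.time()
clfm.add(X_new, y_new)   # triggers a full retrain internally
print(f"retraining took {time.time() - t0:.2f}s for {len(clfm.samples[1])} samples")
```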


#### Prediction

```python
y_pred, y_conf = clfm.test(X_test) # Predict labels
```

This is equivalent to:

```python
clf = clfm.clf
feature_test = clf.scaler.transform(X_test) # Feature normalization
y_pred, y_conf = clf.predict_with_confidence(feature_test) # Predict labels with confidence
```

#### Example (Simplified Version of the Resulting Video)

Note:
- The missing `preprocess` implementation should be copied from the Preprocessing section above.
- The missing `LinearSVC`/`LinearSVCManager` import should be copied from the Import Modules section above.

With those added, the classification and prediction example can run as a single file:

```python
from maix import camera, display, image, nn, app
import numpy as np

# Add the preprocess() definition and the LinearSVC import here (see the sections above)

name_classes = ("one", "five", "fist", "ok", "heartSingle", "yearh", "three", "four", "six", "Iloveyou", "gun", "thumbUp", "nine", "pink") # Easy-to-understand class names
npzfile = np.load("/maixapp/apps/hand_gesture_classifier/trainSets.npz") # Preload features and labels (name_classes index)
X_train = npzfile["X"]
y_train = npzfile["y"]
clfm = LinearSVCManager(LinearSVC.load("/maixapp/apps/hand_gesture_classifier/clf_dump.npz"), X_train, y_train, pretrained=True) # Initialize LinearSVCManager with the preloaded classifier

detector = nn.HandLandmarks(model="/root/models/hand_landmarks.mud")
cam = camera.Camera(320, 224, detector.input_format())
disp = display.Display()

# Loading screen
img = cam.read()
img.draw_string(100, 112, "Loading...\nwait up to 10s", color = image.COLOR_GREEN)
disp.show(img)

while not app.need_exit():
    img = cam.read()
    objs = detector.detect(img, conf_th = 0.7, iou_th = 0.45, conf_th2 = 0.8)
    for obj in objs:
        hand_landmarks = preprocess(obj.points[8:8+21*3], obj.class_id == 0, (img.width(), img.height(), 1)) # Preprocessing
        features = np.array([hand_landmarks.flatten()])
        class_idx, pred_conf = clfm.test(features) # Get predicted class
        class_idx, pred_conf = class_idx[0], pred_conf[0] # test() handles multiple inputs/outputs; take the first element
        msg = f'{detector.labels[obj.class_id]}: {obj.score:.2f}\n{name_classes[class_idx]}({class_idx})={pred_conf*100:.2f}%'
        img.draw_string(obj.points[0], obj.points[1], msg, color = image.COLOR_RED if obj.class_id == 0 else image.COLOR_GREEN, scale = 1.4, thickness = 2)
        detector.draw_hand(img, obj.class_id, obj.points, 4, 10, box=True)
    disp.show(img)
```

The current `X_train` is based on the "14-Class Static Hand Gesture Dataset," which consists of 2850 samples, divided into 14 classes. The dataset can be downloaded from the provided [Baidu Netdisk link (Password: 6urr)](https://pan.baidu.com/s/1Sd-Ad88Wzp0qjGH6Ngah0g).


![](../../assets/handposex_14class.jpg)


## Demo Video

<video playsinline controls autoplay loop muted preload src="/static/video/hand_gesture_demo.mp4" type="video/mp4">
Classifier Result Video
</video>
The implementation of this app is located at `MaixPy/projects/app_hand_gesture_classifier/main.py`, with the main logic as follows:

1. Load the `14-class static hand gesture dataset`, whose samples are the `20` wrist-relative coordinate offsets produced by `hand keypoint detection`.
2. Initially train a `4`-class gesture classifier, **or directly load pre-trained `14`-class classifier parameters (switchable in the source code)**, to support gesture recognition.
3. Load the `hand keypoint detection` model to process the camera input and visualize the classifier's results on the screen.
4. Click `class14` in the upper right corner to add the remaining classes' samples and retrain for `14`-class gesture classification.
5. Click `class4` in the lower right corner to remove the samples added in the previous step and retrain to revert to `4`-class classification.
6. Click the small area between the buttons to display the duration of the last classifier training at the top of the screen.
7. Click any other large area to display the currently supported classes on the left side: green indicates supported, yellow indicates not supported.

1. The demo video shows the execution of step `4` **or the bold part of step `2`**, i.e. the `14-class` mode. It can recognize gestures `1-10` (mapped to their default English names), "OK", thumbs-up, finger heart (requires the back of the hand, which is hard to demonstrate in the video but can be verified), and pinky stretch, for a total of `14` gestures.
2. Then step `5` is executed to revert to the `4-class` mode, where only gestures `1`, `5`, `10` (fist), and "OK" are recognized, while the remaining gestures no longer produce correct results. Step `7` is also executed to show the current `4-class` mode: only the first 4 gestures are displayed in green, and the remaining 10 are shown in yellow.
3. Step `4` is executed again to restore the `14-class` mode, and the gestures that could not be recognized in the `4-class` mode are now correctly identified.
4. Finally, recognition with both hands is demonstrated, showing that gestures from both hands can be identified correctly at the same time.


## Others

**The demo video is captured from the preview window in the upper right corner of MaixVision, and it is consistent with the actual screen display.**

**For more detailed usage or secondary development, please refer to the analysis above and the source code itself, which includes comments.**

If you still have questions or need assistance, you can post on `maixhub` or send an `e-mail` to the company at `[email protected]`. **Please use the subject `[help][MaixPy] gesture classification: xxx`**.
2 changes: 1 addition & 1 deletion docs/doc/en/vision/hand_landmarks.md
@@ -1,5 +1,5 @@
---
tite: 3D Coordinate Detection of 21 Hand Keypoints with MaixPy MaixCAM
title: 3D Coordinate Detection of 21 Hand Keypoints with MaixPy MaixCAM
update:
- date: 2024-12-31
version: v1.0