@@ -0,0 +1,110 @@
# movement: Human Pose Estimation Format Support (Peter Nelson Subrata)

## Personal details

- **Full name:** Peter Nelson Subrata
- **Email:** peterns1609@gmail.com
- **GitHub username:** PewterZz
- **Zulip username:** Peter Nelson Subrata
- **Location & time-zone:** Jakarta, Indonesia — GMT+7
- **Code contribution:** https://github.com/neuroinformatics-unit/movement/pull/914
- **Proposal discussion link:** https://github.com/neuroinformatics-unit/gsoc/pull/97

## Project proposal

### Synopsis

movement currently supports animal pose estimation formats (DeepLabCut, SLEAP,
LightningPose, Anipose) but has no support for human-focused formats such as
MMPose, COCO keypoints, FreeMocap, motionBIDS, BVH, and C3D. Researchers studying
human movement in rehabilitation, sports science, or clinical gait analysis
cannot load their data into movement without writing custom preprocessing
scripts. This project implements loaders for at least three of these formats,
following the existing from_sleap_file/from_dlc_file pattern and producing
the standard xarray.Dataset output that works with all of movement's
kinematics and visualisation tools. The intended result is that movement
becomes a unified interface across pose estimation frameworks, covering human
as well as animal data.
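
A minimal sketch of the parsing step such a loader might perform, assuming
MMPose-style video predictions shaped as a list of per-frame dicts, each with
an instances list holding keypoints and keypoint_scores. The function name
mmpose_frames_to_arrays is hypothetical, the (time, space, keypoints,
individuals) layout is an assumption about what movement expects, and treating
the instance index as a stable identity is a simplification. None of this is
the actual movement or MMPose API.

```python
import numpy as np


def mmpose_frames_to_arrays(frames):
    """Convert per-frame predictions into dense position/confidence arrays.

    Assumed input: [{"frame_id": int, "instances": [{"keypoints": [[x, y], ...],
    "keypoint_scores": [s, ...]}, ...]}, ...]. Every instance is assumed to
    carry the same number of keypoints.
    """
    n_frames = len(frames)
    n_individuals = max((len(f["instances"]) for f in frames), default=0)
    n_keypoints = max(
        (len(inst["keypoints"]) for f in frames for inst in f["instances"]),
        default=0,
    )

    # Missing detections stay NaN so downstream code can tell them apart
    # from real coordinates.
    position = np.full((n_frames, 2, n_keypoints, n_individuals), np.nan)
    confidence = np.full((n_frames, n_keypoints, n_individuals), np.nan)

    for t, frame in enumerate(frames):
        # Simplification: the i-th instance in a frame is treated as the
        # i-th individual. Real data would need grouping by a track ID.
        for i, inst in enumerate(frame["instances"]):
            xy = np.asarray(inst["keypoints"], dtype=float)  # (n_keypoints, 2)
            position[t, :, :, i] = xy.T
            confidence[t, :, i] = inst["keypoint_scores"]
    return position, confidence


# Two frames, two keypoints; the second individual is missing in frame 1.
frames = [
    {"frame_id": 0, "instances": [
        {"keypoints": [[10.0, 20.0], [30.0, 40.0]], "keypoint_scores": [0.9, 0.8]},
        {"keypoints": [[50.0, 60.0], [70.0, 80.0]], "keypoint_scores": [0.7, 0.6]},
    ]},
    {"frame_id": 1, "instances": [
        {"keypoints": [[11.0, 21.0], [31.0, 41.0]], "keypoint_scores": [0.95, 0.85]},
    ]},
]
position, confidence = mmpose_frames_to_arrays(frames)
print(position.shape, confidence.shape)
```

Arrays in this layout could then be handed to the from_numpy builder that the
existing loaders use; the dimension order it expects should be checked against
the current movement documentation.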

### Implementation timeline

**Minimal deliverables:**
- MMPose JSON loader (COCO-17, Halpe-26, custom schemas, multi-individual)
- COCO keypoint annotation loader (visibility-to-confidence mapping)
- FreeMocap 3D loader (first 3D format in movement)
- motionBIDS loader (BIDS directory traversal, TSV + JSON sidecar)
- Unit and integration tests for each loader
- Gallery examples for each format
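
The visibility-to-confidence mapping for COCO annotations could look roughly
like the sketch below. COCO stores keypoints as a flat list of (x, y, v)
triplets, where v is 0 (not labelled), 1 (labelled but not visible), or 2
(labelled and visible). The specific confidence values chosen here (0.0, 0.5,
1.0), the NaN handling for unlabelled points, and the name
split_coco_keypoints are illustrative assumptions to be settled with the
mentors, not an agreed design.

```python
import math

# Hypothetical mapping from COCO visibility flag to a confidence score.
VISIBILITY_TO_CONFIDENCE = {0: 0.0, 1: 0.5, 2: 1.0}


def split_coco_keypoints(flat_keypoints):
    """Split a flat COCO [x1, y1, v1, x2, y2, v2, ...] list.

    Returns (xy, confidence): a list of (x, y) tuples and a list of floats.
    Unlabelled keypoints (v == 0) get NaN coordinates, because COCO stores
    them as x = y = 0, which would otherwise look like a real position.
    """
    if len(flat_keypoints) % 3 != 0:
        raise ValueError("COCO keypoints must come in (x, y, v) triplets")

    xy, confidence = [], []
    for k in range(0, len(flat_keypoints), 3):
        x, y, v = flat_keypoints[k:k + 3]
        if v == 0:
            xy.append((math.nan, math.nan))
        else:
            xy.append((float(x), float(y)))
        confidence.append(VISIBILITY_TO_CONFIDENCE[int(v)])
    return xy, confidence


# One visible, one occluded, and one unlabelled keypoint.
xy, confidence = split_coco_keypoints([100, 200, 2, 110, 210, 1, 0, 0, 0])
print(confidence)  # [1.0, 0.5, 0.0]
```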

**Stretch goals:**
- BVH and C3D motion capture loaders
- from_file() format detection utility
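
One possible shape for the format detection behind from_file() is sketched
below, using the file extension first and the top-level JSON structure second.
The function names, the returned labels, and the JSON heuristics (a COCO file
being a dict with images and annotations keys, an MMPose prediction file being
a list of dicts with instances or keypoint_scores) are assumptions for
illustration only. FreeMocap is left out because its output layout would need
to be confirmed first.

```python
import json
from pathlib import Path


def guess_json_format(data):
    """Guess the format of already-parsed JSON content, or return None."""
    if isinstance(data, dict) and {"images", "annotations"}.issubset(data):
        return "COCO"
    if isinstance(data, list) and data and isinstance(data[0], dict):
        first = data[0]
        if "instances" in first or "keypoint_scores" in first:
            return "MMPose"
    return None


def guess_pose_format(path):
    """Guess a pose file's format from its name, reading JSON if needed."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".bvh":
        return "BVH"
    if suffix == ".c3d":
        return "C3D"
    # motionBIDS recordings are stored as *_motion.tsv files.
    if suffix == ".tsv" and path.stem.endswith("_motion"):
        return "motionBIDS"
    if suffix == ".json":
        # Parsing the whole file is simple but may be slow for large inputs.
        with path.open() as f:
            return guess_json_format(json.load(f))
    return None


print(guess_pose_format("take_01.bvh"))  # BVH
print(guess_json_format({"images": [], "annotations": [], "categories": []}))  # COCO
```

Returning None for unknown inputs lets the caller raise a clear error or fall
back to an explicit source_software-style argument.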

**Weekly timeline** (12 weeks, ~30 hours/week):

| Weeks | Work |
|-------|------|
| Bonding | Read codebase, finalise MMPose loader from pre-proposal prototype, discuss any API decisions with mentor |
| 1–2 | MMPose loader: complete implementation, COCO-17/Halpe-26/COCO-133 schemas, multi-individual grouping, unit tests |
| 3–4 | COCO keypoint loader, visibility-to-confidence mapping, integration tests, MMPose gallery example |
| 5–6 | FreeMocap 3D loader, verify that the kinematics tools handle a 3D space coordinate, gallery example |
| 7–8 | motionBIDS loader, BIDS directory traversal, sidecar metadata extraction |
| 9–10 | BVH loader (stretch), forward kinematics via the bvhio library |
| 11 | C3D loader (stretch), from_file() format detection utility |
| 12 | Documentation, gallery examples, cleanup, final PR review |

### Communication plan

I will post weekly async updates on the movement GitHub discussions or Zulip,
covering what was completed and any blockers. All work will be submitted as
focused PRs, one loader per PR. I am available for video calls during European
morning hours.

## Personal statement

### Past experience

I have a background in computer vision engineering. One area I worked on was
video capture models for collecting gameplay data — writing pipelines that
process video frames, track objects and joints across frames, and feed that
data into downstream systems. That work gave me hands-on experience with
pose estimation tools including MMPose and MediaPipe, which are the backbone
of two of the formats this project targets (MMPose JSON and FreeMocap).

On the open source side, I have merged PRs to pytorch/ignite and sktime, an approved
PR to kornia, and an open fix in movement itself (PR #914, fixing a silent
crash in compute_time_derivative on single-frame data). I also built a
proof-of-concept MMPose loader on a fork branch that correctly parses
MMPose JSON predictions into the movement xarray.Dataset schema.

### Motivation: why this project?

I want to contribute to serious open source scientific software, and this
project is a strong fit. I care about building a common interface so that
researchers do not have to reinvent loading logic for every tool they use.
Extending that interface to human pose formats is a natural next step, and my
background with the relevant tools positions me well to do it.

### Match: why me?

The MMPose loader proof of concept is already working on my fork. I
understand the from_numpy builder pattern from reading the existing loaders
and have already submitted a fix to the kinematics module. My CV background means
I won't be learning the formats from scratch: I have used MMPose, the COCO
keypoint format, and MediaPipe in production work and understand their
quirks (schema mismatches, multi-individual grouping, 3D coordinate handling).

### Availability

I am fully available during the GSoC period, with no competing
commitments.

## GSoC

### GSoC experience

I expect structured mentorship, high-quality code reviews to learn from, and the experience of working on a frontier computer vision library.

### Other applications

Yes, I am also applying to Gemini CLI (Google) and MesaLLM.
My preference is Gemini CLI if there is a tie.