MammAlps PR Description #832

luciehmct · 2025-09-19T16:14:00Z

Summary

Adds the MammAlps evaluation suite, with three subtasks (animal, action, activity recognition) grounded in alpine camera-trap footage.
Provides a _cot.jsonl–based dataset_builder.py that produces Hugging Face–ready datasets (animalkingdom, mammalnet, or mammalps) with unified splits.
Uses shared utilities (mammalps_doc_to_visual, mammalps_doc_to_text, mammalps_doc_to_target, mammalps_process_results) and strict Jaccard scoring with mean aggregation for reproducible results.
Documents InternVL3 video evaluation defaults (OpenGVLab/InternVL3-8B, batch size 1, num_frame=32, use_temporal_context=True).

Details

Dataset specifics: MammAlps contains Swiss National Park wildlife clips annotated for species, fine-grained actions, and higher-level activities. The builder can output unified datasets with consistent directory structures and Hugging Face JSON records.
Evaluation flow:
- Each subtask config loads clips from luciehmct/mammalps.
- Predictions are parsed with the “Final answer: [...]” format.
- Per-example logs (prompt, response, parsed labels, ground truth, Jaccard score) are stored under results/<model>_<timestamp>/mammalps_<subtask>.jsonl.
- The global Jaccard metric (in lmms_eval/api/metrics.py) computes strict overlap before mean aggregation.
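
The parsing and scoring steps above can be sketched as follows. This is a minimal illustration, not the code in lmms_eval/api/metrics.py: the function names are hypothetical, "strict" is assumed to mean exact string match after lowercasing and trimming, and scoring two empty label sets as 1.0 is an assumed convention.

```python
import re

# Matches the bracketed list after "Final answer:"; the last match wins so that
# earlier mentions inside the model's reasoning are ignored (an assumption).
FINAL_ANSWER_RE = re.compile(r"Final answer:\s*\[(.*?)\]", re.IGNORECASE | re.DOTALL)


def parse_final_answer(response: str) -> set[str]:
    """Extract the predicted labels from a 'Final answer: [...]' response."""
    matches = FINAL_ANSWER_RE.findall(response)
    if not matches:
        return set()  # unparseable responses yield no labels
    items = (item.strip().strip("'\"").lower() for item in matches[-1].split(","))
    return {item for item in items if item}


def jaccard(pred: set[str], truth: set[str]) -> float:
    """Intersection over union of two label sets."""
    if not pred and not truth:
        return 1.0  # assumed convention for two empty sets
    return len(pred & truth) / len(pred | truth)


def mean_jaccard(pairs: list[tuple[set[str], set[str]]]) -> float:
    """Average the per-example Jaccard scores."""
    scores = [jaccard(pred, truth) for pred, truth in pairs]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, a response ending in `Final answer: ['Red deer', 'walking']` parses to the set {"red deer", "walking"}; scored against a ground truth of {"red deer"}, it gets 1/2.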
InternVL3 integration: Frames are timestamped when use_temporal_context=True, giving the model richer temporal cues.
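
To make the idea of timestamped frames concrete, here is a rough sketch of uniform sampling with a per-frame timestamp line. Everything in it is an assumption for illustration: the function names, the segment-midpoint sampling rule, and the `Frame N (t=...s): <image>` line format are hypothetical and may not match what the InternVL3 wrapper in this PR emits.

```python
def uniform_frame_indices(total_frames: int, num_frame: int) -> list[int]:
    """Pick up to num_frame indices, one from the middle of each equal segment."""
    if total_frames <= 0 or num_frame <= 0:
        return []
    n = min(num_frame, total_frames)  # never request more frames than exist
    return [int((i + 0.5) * total_frames / n) for i in range(n)]


def timestamped_frame_lines(total_frames: int, fps: float, num_frame: int = 32) -> list[str]:
    """Build one prompt line per sampled frame, tagged with its time in seconds."""
    lines = []
    for pos, idx in enumerate(uniform_frame_indices(total_frames, num_frame), start=1):
        # "<image>" is a placeholder token; the real prompt format is not shown in this PR text.
        lines.append(f"Frame {pos} (t={idx / fps:.2f}s): <image>")
    return lines
```

With a 320-frame clip at 30 fps and num_frame=32, this samples frames 5, 15, ..., 315, so the model sees how far apart in time consecutive frames are.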

Testing

# Action recognition
python -m lmms_eval \
  --model internvl3 \
  --model_args "pretrained=OpenGVLab/InternVL3-8B,modality=video,num_frame=32,use_temporal_context=True" \
  --tasks mammalps_action \
  --batch_size 1 \
  --output_path "$OUT_DIR"

# Run all three MammAlps subtasks together
python -m lmms_eval \
  --model internvl3 \
  --model_args "pretrained=OpenGVLab/InternVL3-8B,modality=video,num_frame=32,use_temporal_context=True" \
  --tasks mammalps \
  --batch_size 1 \
  --output_path "$OUT_DIR"

…n on alpine wildlife videos
- Implements animal, action, and activity recognition subtasks for the MammAlps dataset
- Includes dataset builder, YAML configs, and strict Jaccard metric evaluation
- Utilities for prompt/answer extraction, result processing, and HuggingFace video download
- See mammalps/README.md for details and usage instructions

kcz358 · 2025-09-26T02:16:36Z

lmms_eval/tasks/mammalps/utils.py

+        # Create model_used_date_time directory structure in results directory
+        import datetime
+
+        timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M")
+        model_name = "InternVL3-8B"  # Can be made configurable if needed
+
+        # Use results directory in the lmms-eval repository
+        results_base_dir = os.path.join(os.getcwd(), "results")
+        output_dir = os.path.join(results_base_dir, f"{model_name}_{timestamp}")
+
+        # Create directory if it doesn't exist
+        os.makedirs(output_dir, exist_ok=True)


This part is hardcoded
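
One possible way to address this, sketched under assumptions: pass the model name and base directory in as parameters instead of fixing them in the function body. The helper name `make_output_dir` and its signature are hypothetical, and how the model name reaches it (task config, environment variable, or CLI argument) is left open here.

```python
import datetime
import os
from typing import Optional


def make_output_dir(
    model_name: str,
    base_dir: str = "results",
    now: Optional[datetime.datetime] = None,
) -> str:
    """Create and return <base_dir>/<model>_<timestamp> for per-example logs."""
    now = now or datetime.datetime.now()
    timestamp = now.strftime("%Y%m%d_%H%M")
    # Keep only the last path component so "OpenGVLab/InternVL3-8B" does not
    # create a nested directory.
    short_name = model_name.rstrip("/").split("/")[-1]
    output_dir = os.path.join(base_dir, f"{short_name}_{timestamp}")
    os.makedirs(output_dir, exist_ok=True)
    return output_dir
```

Accepting `now` as an optional argument keeps the helper deterministic in tests, and the default `base_dir` reproduces the current layout under results/.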

kcz358

Hi, thank you for the PR! I think this PR includes the changes from your previous PR. Do you want to merge all the changes in one PR, or merge them separately?

luciehmct · 2025-10-06T15:46:08Z

Putting this PR on hold; will revisit and reopen later.

TashkovskaMatea and others added 13 commits July 5, 2025 20:59

Video loader with caching and download

139a2b4

Video loader with caching and download

fe24c04

Merge branch 'lemonade_eval' of github.com:amathislab/lmms-eval into …

6a3ef2d

…lemonade_eval

black and isort formating

99af1df

clean imports

34d435f

Video loader with caching and download

a9caecd

black and isort formating

87cf67a

clean imports

0391cd4

Merge branch 'main' of github.com:amathislab/lmms-eval into main

c8c8d3c

implement coderabbitai comments

060935d

download data in cache

50d70da

InternVL3: register model, timestamp for prompt, uniform and adaptive…

2743e65

… sampling

kcz358 reviewed Sep 26, 2025

luciehmct closed this Oct 6, 2025
