Real input sample testing #9
Pull request overview
Adds a real-world, end-to-end pipeline test and supporting utilities to evaluate KNN classification using a legacy GameraXML training set against a new manuscript page sample. This helps quantify generalization gaps and provides tooling to inspect prediction quality visually.
Changes:
- Adds an integration-style pytest module covering smoke, determinism, and (optional) 5-fold / LOO accuracy checks.
- Adds test-support helpers to ingest the sample page, run classification, and print summary reports.
- Adds a visualization script plus a CSV vocabulary fixture used for label sanity checks.
Reviewed changes
Copilot reviewed 5 out of 10 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| core/tests/test_real_input_knn.py | New end-to-end + CV/LOO tests for real sample input and legacy XML training DB |
| core/tests/sample_input/evaluate.py | Shared helpers to ingest/filter annotations, classify the sample page, and print a summary |
| core/tests/sample_input/visualize.py | Script to render annotation/prediction overlays for manual inspection |
| core/tests/sample_input/csv-square_notation_neume_level_newest.csv | Canonical vocabulary CSV fixture used for label sanity checks |
| core/tests/conftest.py | Registers the slow marker used by the new integration tests |
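For reference, registering a custom marker in `conftest.py` usually looks like the minimal sketch below. It relies on pytest's `pytest_configure` hook and `config.addinivalue_line`; the marker description string is an assumption and is not copied from this PR.

```python
# conftest.py (sketch): register the "slow" marker so pytest does not warn
# about an unknown mark. The description text below is an assumption.
def pytest_configure(config):
    config.addinivalue_line(
        "markers",
        "slow: long-running integration test (deselect with -m 'not slow')",
    )
```

Tests decorated with `@pytest.mark.slow` can then be deselected with `pytest -m "not slow"`.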
Comments suppressed due to low confidence (2)
core/tests/test_real_input_knn.py:94
`test_real_page_smoke` requests the `training_db` fixture but never uses it (and `classify_page()` reloads the XML training set internally). Keeping the unused fixture parameter forces an extra XML parse and feature extraction work; drop the fixture from the signature or refactor `classify_page` to accept a preloaded training set.
def test_real_page_smoke(page_glyphs, training_db, vocab):
    classified, classifier = classify_page()
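One way to act on the second option, sketched under assumptions: let the classification helper take an optional preloaded training set and load one only when none is given. Everything here is hypothetical, including `load_training_set`, the glyph shape, the 1-NN stand-in, and the name `classify_page_sketch`; the real signatures in `evaluate.py` are not shown in this thread.

```python
from typing import List, Optional, Sequence, Tuple

# (label, feature vector); a hypothetical glyph shape for illustration only.
Glyph = Tuple[str, Tuple[float, float]]

load_log: List[str] = []  # records loader calls, only to make the saving visible


def load_training_set() -> List[Glyph]:
    """Hypothetical stand-in for parsing the GameraXML training file."""
    load_log.append("loaded")
    return [("punctum", (0.0, 0.0)), ("clivis", (1.0, 1.0))]


def classify_page_sketch(
    page_features: Sequence[Tuple[float, float]],
    training_set: Optional[Sequence[Glyph]] = None,
) -> List[str]:
    """Label each feature vector with its nearest training glyph (1-NN)."""
    if training_set is None:
        # Fall back to loading only when the caller did not supply a set.
        training_set = load_training_set()

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(training_set, key=lambda g: sq_dist(g[1], feats))[0]
        for feats in page_features
    ]
```

With this shape, a test that already holds a `training_db` fixture can pass it through, and a test that does not need the fixture can drop it from its signature.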
core/tests/test_real_input_knn.py:234
The comment says “Index the full DB by id for O(1) … lookup”, but the implementation rebuilds `train` by scanning `training_db` for every held-out glyph (`[g for g in training_db if g.id != held_out.id]`), which is O(N) per iteration and will be noticeably slow for larger `IC_LOO_LIMIT`. Consider precomputing an `id -> index` map and using list slicing (`training_db[:i] + training_db[i+1:]`) or similar to avoid repeated full scans, and update the comment accordingly.
# Index the full DB by id for O(1) "everyone except this glyph" lookup.
correct = 0
for held_out in subset:
    train = [g for g in training_db if g.id != held_out.id]
    clf = InteractiveClassifier(k=1).fit(train)
    pred = clf.predict(held_out)
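The slicing approach suggested above could look like the following sketch. `Glyph` and `predict_1nn` are hypothetical stand-ins for the project's glyph type and `InteractiveClassifier`; only the construction of `train` is the point. Note that slicing still copies the list, so each iteration stays linear in the DB size, but it avoids a Python-level id comparison per glyph.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Glyph:  # hypothetical minimal glyph: id, label, feature vector
    id: str
    label: str
    features: Tuple[float, ...]


def predict_1nn(train: Sequence[Glyph], query: Glyph) -> str:
    """Return the label of the training glyph closest to `query`."""
    def sq_dist(g: Glyph) -> float:
        return sum((a - b) ** 2 for a, b in zip(g.features, query.features))
    return min(train, key=sq_dist).label


def loo_accuracy(training_db: List[Glyph], subset: Sequence[Glyph]) -> float:
    """Leave-one-out accuracy over a non-empty `subset` of `training_db`."""
    # Built once: id -> position in training_db (assumes ids are unique).
    index_of = {g.id: i for i, g in enumerate(training_db)}
    correct = 0
    for held_out in subset:
        i = index_of[held_out.id]
        # Everything except position i, without comparing ids one by one.
        train = training_db[:i] + training_db[i + 1:]
        if predict_1nn(train, held_out) == held_out.label:
            correct += 1
    return correct / len(subset)
```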
As a quick note, the training data was produced from a square notation manuscript; the manuscript page you used (the one featured in the PR) is in Hufnagel script, which will have contributed to the poor performance. The detection is very impressive, however!
Yes, I noticed that, and I am currently investigating the similarities and differences in the neume shapes. I will try to look into the Hufnagel script and manually label a bit of it.
I also posted this on the Neon issue, but here's a quick Hufnagel sample annotation with neume labels.
Update: Using the Hufnagel annotation sample from Kyrie, the new prediction for NZ-Wt shows satisfactory results with a 98-glyph training set.
Merge pull request #1 from DDMAL/ic_reimplementation
Description
Added an integration test for the pipeline using legacy, manually labeled GameraXML training data:
`/core/tests/fixtures/Interactive_Classifier_GameraXML_TrainingData.xml`
Using this training data, I evaluated the model against a new real-world input page provided by Kyrie:
`/core/tests/NZ-Wt MSR-03 109v`
Results & Observations:
The evaluation on the new page yielded poor accuracy, indicating that the legacy GameraXML training data does not generalize well to this new manuscript page. However, in LOO and n-fold testing on the training data alone, we got ~95% accuracy using k=1. To improve performance and establish a reliable baseline, we will need to manually label the new page.
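For readers unfamiliar with the two protocols mentioned above, here is a minimal, dependency-free sketch of n-fold cross-validation with a 1-nearest-neighbour classifier; LOO is the special case where the number of folds equals the number of samples. The function names and the sample shape are illustrative assumptions. This is not the project's `InteractiveClassifier`, and it does not reproduce the ~95% figure.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Tuple[float, ...], str]  # (feature vector, label)


def nearest_label(train: Sequence[Sample], features: Tuple[float, ...]) -> str:
    """1-NN: label of the training sample with the smallest squared distance."""
    return min(
        train,
        key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], features)),
    )[1]


def kfold_accuracy(samples: List[Sample], n_folds: int) -> float:
    """Pooled accuracy over interleaved folds (fold f = samples[f::n_folds]).

    Requires 2 <= n_folds <= len(samples) so every training split is non-empty.
    """
    correct = 0
    for f in range(n_folds):
        test = samples[f::n_folds]
        # Train on every sample that does not belong to fold f.
        train = [s for i, s in enumerate(samples) if i % n_folds != f]
        correct += sum(nearest_label(train, x) == y for x, y in test)
    return correct / len(samples)
```

Calling `kfold_accuracy(samples, 5)` gives the 5-fold figure, and `kfold_accuracy(samples, len(samples))` gives the LOO figure for the same data.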
Associated Scripts & Changes
- `test_real_input_knn.py`: Added the new pipeline/KNN test case using the real input.
- `evaluate.py`: Used to execute the evaluation pipeline.
- `visualize.py`: Used to analyze and visualize the poor classification results.
- `conftest.py`: Updated test fixtures and configurations to support the new fixture paths.
Next Steps
- Manually label the `/core/tests/NZ-Wt MSR-03 109v` page to provide accurate training data for the new manuscript type.