Real input sample testing by Yueqiao12Zhang · Pull Request #9 · DDMAL/Standalone-Interactive-Classifier

Yueqiao12Zhang · 2026-05-20T15:44:56Z

Description

Added an integration test for the pipeline that uses the legacy, manually labeled GameraXML training data:

/core/tests/fixtures/Interactive_Classifier_GameraXML_TrainingData.xml

Using this training data, I evaluated the model against a new real-world input page provided by Kyrie:

/core/tests/NZ-Wt MSR-03 109v

Results & Observations:
Evaluation on the new page yielded poor accuracy, indicating that the legacy GameraXML training data does not generalize well to this manuscript page. However, leave-one-out (LOO) and n-fold testing on the training data alone gave ~95% accuracy with k=1. To improve performance and establish a reliable baseline, we will need to manually label the new page.
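
For reference, an n-fold check with k=1 can be sketched roughly as below. This is a minimal, self-contained illustration and not the project's code: the `nearest_label` and `kfold_accuracy` helpers, the Euclidean distance, and the toy feature vectors and labels are all assumptions made for the example. The PR's tests run `InteractiveClassifier` on the GameraXML training data instead.

```python
import math
import random


def nearest_label(train, query):
    """Return the label of the training sample closest to `query` (k=1)."""
    best = min(train, key=lambda item: math.dist(item[0], query))
    return best[1]


def kfold_accuracy(samples, n_folds=5, seed=0):
    """Estimate 1-NN accuracy with n-fold cross-validation.

    `samples` is a list of (feature_vector, label) pairs.
    """
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    correct = 0
    for fold in range(n_folds):
        # Every n-th sample (offset by `fold`) is held out for testing.
        test = shuffled[fold::n_folds]
        train = [s for i, s in enumerate(shuffled) if i % n_folds != fold]
        correct += sum(nearest_label(train, x) == y for x, y in test)
    return correct / len(shuffled)


# Toy data (hypothetical): two well-separated clusters standing in for
# two neume classes.
toy = [((0.0 + i * 0.01, 0.0), "punctum") for i in range(10)]
toy += [((5.0 + i * 0.01, 5.0), "clivis") for i in range(10)]
accuracy = kfold_accuracy(toy)
```

A high score from this kind of check only says the training set is internally consistent; it does not predict accuracy on a page written in a different notation.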

Associated Scripts & Changes

test_real_input_knn.py: Added the new pipeline/KNN test case using the real input.
evaluate.py: Used to execute the evaluation pipeline.
visualize.py: Used to analyze and visualize the poor classification results.
conftest.py: Updated test fixtures and configurations to support the new fixture paths.

Next Steps

Manually label the /core/tests/NZ-Wt MSR-03 109v page to provide accurate training data for the new manuscript type.

Copilot

Pull request overview

Adds a real-world, end-to-end pipeline test and supporting utilities to evaluate KNN classification using a legacy GameraXML training set against a new manuscript page sample. This helps quantify generalization gaps and provides tooling to inspect prediction quality visually.

Changes:

Adds an integration-style pytest module covering smoke, determinism, and (optional) 5-fold / LOO accuracy checks.
Adds test-support helpers to ingest the sample page, run classification, and print summary reports.
Adds a visualization script plus a CSV vocabulary fixture used for label sanity checks.

Reviewed changes

Copilot reviewed 5 out of 10 changed files in this pull request and generated 5 comments.

File	Description
core/tests/test_real_input_knn.py	New end-to-end + CV/LOO tests for real sample input and legacy XML training DB
core/tests/sample_input/evaluate.py	Shared helpers to ingest/filter annotations, classify the sample page, and print a summary
core/tests/sample_input/visualize.py	Script to render annotation/prediction overlays for manual inspection
core/tests/sample_input/csv-square_notation_neume_level_newest.csv	Canonical vocabulary CSV fixture used for label sanity checks
core/tests/conftest.py	Registers the `slow` marker used by the new integration tests
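
Registering a custom marker in `conftest.py` typically looks like the sketch below. It relies on pytest's `pytest_configure` hook and `config.addinivalue_line`; the marker description text is an assumption, and the PR's actual `conftest.py` may differ.

```python
def pytest_configure(config):
    """Register the custom `slow` marker so pytest does not warn about it."""
    config.addinivalue_line(
        "markers",
        "slow: long-running integration test (deselect with -m 'not slow')",
    )
```

Tests decorated with `@pytest.mark.slow` can then be left out of a quick run with `pytest -m "not slow"`.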

Comments suppressed due to low confidence (2)

core/tests/test_real_input_knn.py:94

test_real_page_smoke requests the training_db fixture but never uses it (and classify_page() reloads the XML training set internally). Keeping the unused fixture parameter forces an extra XML parse and feature extraction work; drop the fixture from the signature or refactor classify_page to accept a preloaded training set.

def test_real_page_smoke(page_glyphs, training_db, vocab):
    classified, classifier = classify_page()

core/tests/test_real_input_knn.py:234

The comment says “Index the full DB by id for O(1) … lookup”, but the implementation rebuilds train by scanning training_db for every held-out glyph ([g for g in training_db if g.id != held_out.id]), which is O(N) per iteration and will be noticeably slow for larger IC_LOO_LIMIT. Consider precomputing an id -> index map and using list slicing (training_db[:i] + training_db[i+1:]) or similar to avoid repeated full scans, and update the comment accordingly.

    # Index the full DB by id for O(1) "everyone except this glyph" lookup.
    correct = 0
    for held_out in subset:
        train = [g for g in training_db if g.id != held_out.id]
        clf = InteractiveClassifier(k=1).fit(train)
        pred = clf.predict(held_out)
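
The slicing variant suggested in this comment can be sketched as follows. This is an illustration under assumptions, not the PR's code: samples are plain `(feature_vector, label)` pairs, and a Euclidean 1-nearest-neighbour rule stands in for `InteractiveClassifier(k=1)`. Note that slicing still copies the list, so each iteration remains linear in the training set size; it only avoids the per-element id comparison.

```python
import math


def loo_accuracy(samples):
    """Leave-one-out accuracy of a 1-nearest-neighbour rule.

    `samples` is a list of (feature_vector, label) pairs.
    """
    correct = 0
    for i, (query, label) in enumerate(samples):
        # Build "everything except sample i" by position, not by id lookup.
        train = samples[:i] + samples[i + 1:]
        nearest = min(train, key=lambda item: math.dist(item[0], query))
        correct += nearest[1] == label
    return correct / len(samples)


# Toy data (hypothetical): two tight pairs with different labels.
toy = [((0.0,), "a"), ((0.1,), "a"), ((5.0,), "b"), ((5.1,), "b")]
score = loo_accuracy(toy)
```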

kyrieb-ekat · 2026-05-20T15:52:20Z

As a quick note, the training data was produced from a square notation manuscript; the manuscript page you used/that is featured in the PR is in hufnagel script, which will have contributed to the poor performance. The detection is very impressive, however!

Yueqiao12Zhang · 2026-05-20T15:58:07Z

> As a quick note, the training data was produced from a square notation manuscript; the manuscript page you used/that is featured in the PR is in hufnagel script, which will have contributed to the poor performance. The detection is very impressive, however!

Yes, I noticed it, and I am currently investigating the similarities and differences in the neume shapes. I will look into the Hufnagel notation and manually label a bit of it.

kyrieb-ekat · 2026-05-20T16:55:26Z

Also put over on the Neon issue, but here's a quick Hufnagel sample annotation w/ neume labels
rapid_hufnagel_annotation.csv

Yueqiao12Zhang · 2026-05-21T15:45:47Z

Update: Using the Hufnagel annotation sample from Kyrie, the new predictions for the NZ-Wt page look satisfactory with a 98-glyph training set.

Yueqiao12Zhang added 7 commits May 20, 2026 11:30

refactor: extra pytest settings

4dbe104

feat: helper for evaluation

287fc45

feat: methods for visualization of test output

1978cee

feat: tests for real scenario input, recommended bash command is in the docstring.

26f9ab8

doc: image and glyph names for reference

12a54a3

doc: neume mapping names

d2cdf4a

test: annotated pic

bb70537

Yueqiao12Zhang requested a review from Copilot May 20, 2026 15:44

Yueqiao12Zhang self-assigned this May 20, 2026

Yueqiao12Zhang requested a review from kunfang98927 May 20, 2026 15:45

Copilot started reviewing on behalf of Yueqiao12Zhang May 20, 2026 15:45

Copilot AI reviewed May 20, 2026

Comment thread core/tests/test_real_input_knn.py

Comment thread core/tests/test_real_input_knn.py

Comment thread core/tests/test_real_input_knn.py Outdated

Comment thread core/tests/test_real_input_knn.py

Comment thread core/tests/sample_input/helpers/evaluate.py

refactor: small fixes suggested by copilot

d2622e4

Yueqiao12Zhang linked an issue May 20, 2026 that may be closed by this pull request

Add a minimal non-frontend test workflow for the Interactive Classifier #7

Open

Yueqiao12Zhang added 4 commits May 21, 2026 10:40

refactor: minor fix on file system

2ea9781

refactor: new hufnagel training set added, conversion scripts and converted files included.

48b5649

refactor: predicting the NZ-Wt page

d53c1b8

test: NZ-Wt result look fine on a small dataset

dc13a98

Yueqiao12Zhang and others added 5 commits May 21, 2026 12:20

refactor: enhance visualize to support classified and non-classified test input

20e91e7

tests: better visuals

a7fa979

refactor: implemented a new run_pipeline to test without a front-end

5ecd477

refactor: rename training file

3fe18ad

Merge pull request #11 from DDMAL/main

e9f741e

Merge pull request #1 from DDMAL/ic_reimplementation

Yueqiao12Zhang mentioned this pull request May 22, 2026

Adding feature to XML-io #12

Open

Yueqiao12Zhang wants to merge 17 commits into main from real-input-sample-testing

3 participants
