Custom Classification model behavior #38981

galvangoh · 2024-12-24T01:59:20Z

Package Name: azure-ai-documentintelligence
Package Version: 1.0.0b2
Operating System: Windows
Python Version: 3.10.9

Describe the bug
I trained the custom model via the Studio on 2 differently titled documents that share a very similar template. My application uses the label returned by the custom model to decide its logic flow later on. Sometimes document A gets classified as document B, and vice versa.

The documentation says, "Custom classification models are deep-learning-model types that combine layout and language features to accurately detect and identify documents...". I don't think "layout" here refers to the layout model, because the layout model extracts blocks of text, which the custom classification model does not do. Unless there is a way to compose prebuilt and custom models, how can I classify my documents more reliably? I'm happy to stay on the current version of the API and will only upgrade if there are improvements to the classification capability of the base neural model itself.

In the screenshot below, I show the 2 documents, each of which I want classified under its own label (see top right).

Questions:

As the documents are titled differently, does the custom model pick up the title as a feature to distinguish them?
Handwriting and stamps can appear in random locations in the document. Does the custom model pick these up as features during training?
What sort of training data, and how much of it, is sufficient for both documents to be classified correctly?

Expected behavior
Accurate classification of differently titled documents even though they share the same template.

Screenshots

xiangyan99 · 2024-12-30T17:53:38Z

Thanks for the feedback, we’ll investigate asap.

galvangoh · 2025-01-10T02:09:38Z

Hello there, are there any updates?

github-actions · 2025-01-11T01:13:09Z

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @bojunehsu @vkurpad.

YalinLi0312 · 2025-01-11T01:15:10Z

@bojunehsu can you help to explain the behavior?

bojunehsu · 2025-01-13T17:23:14Z

This is indeed a more challenging case for custom classifiers. Ideally, by providing 10-100+ diverse examples of each class (DELIVERY DOCKET, PICKING SLIP), the algorithm will learn to prioritize these words when making a classification decision. In practice, the model architecture is currently not optimized for this type of classification, where classes differ only by a few key words. It may still work given sufficient training examples, but for this specific scenario I would recommend running prebuilt-read and classifying on the presence of these 2 key phrases.
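
For readers who want to try the key-phrase approach suggested above, here is a minimal sketch. It is not from the service team: the function name, the label names, and the "earliest phrase wins" tie-break are all assumptions made for illustration. It operates on plain text only, so it can be exercised without calling the service.

```python
import re
from typing import Optional

# Hypothetical label -> key phrase mapping. The phrases are the two document
# titles mentioned in this issue; the label names are made up for this sketch.
KEY_PHRASES = {
    "delivery_docket": "DELIVERY DOCKET",
    "picking_slip": "PICKING SLIP",
}


def _normalize(text: str) -> str:
    """Upper-case and collapse whitespace so line breaks inside a title do not matter."""
    return re.sub(r"\s+", " ", text).strip().upper()


def classify_by_key_phrase(content: str) -> Optional[str]:
    """Return the label whose key phrase occurs earliest in the text, or None.

    Assumption: the title sits near the top of the page, so if both phrases
    appear, the earlier one is treated as the title.
    """
    normalized = _normalize(content)
    best_label: Optional[str] = None
    best_pos: Optional[int] = None
    for label, phrase in KEY_PHRASES.items():
        pos = normalized.find(phrase)
        if pos != -1 and (best_pos is None or pos < best_pos):
            best_label, best_pos = label, pos
    return best_label
```

In a real pipeline the `content` argument would presumably be the `content` string of the result returned by `DocumentIntelligenceClient.begin_analyze_document` with the `prebuilt-read` model; the exact call arguments differ between SDK versions, so check the reference for the version you are on. A `None` result is a reasonable signal to fall back to the custom classifier or to manual review.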

galvangoh · 2025-01-14T01:31:41Z

@bojunehsu thanks for the explanation

xiangyan99 added the Client (This issue points to a problem in the data-plane of the library.) and Document Intelligence labels and removed the needs-triage (Workflow: This is a new issue that needs to be triaged to the appropriate team.) label Dec 30, 2024

xiangyan99 assigned YalinLi0312 Dec 30, 2024

github-actions bot added the needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team label Dec 30, 2024

YalinLi0312 added the Service Attention Workflow: This issue is responsible by Azure service team. label Jan 11, 2025

galvangoh closed this as completed Jan 14, 2025

xiangyan99 added the issue-addressed Workflow: The Azure SDK team believes it to be addressed and ready to close. label Mar 18, 2025
