Allow Passing Label Studio DOM to Predict Endpoint or Provide Documentation for Matching XPaths #6937

Open
chrisdukeLlama opened this issue Jan 20, 2025 · 6 comments

Comments

@chrisdukeLlama

Description: I work with complex HTML documents, including intricate table structures. My ML backend predictions rely on static XPaths to highlight specific elements in Label Studio. However, the XPaths generated by my backend do not match the DOM that Label Studio renders, so the predictions fail to display properly.

Proposed Solutions:

  1. Option to Pass Label Studio DOM to Predict Endpoint:
    Allow the serialized Label Studio DOM to be sent to the predict endpoint as part of the request. This would enable the backend to generate XPaths that align with the Label Studio DOM.
    To minimize unnecessary data traffic, this functionality could be optional and enabled only for tasks requiring it.

or

  2. Support for Dynamic XPaths:
    Provide support for dynamic XPaths to bypass the necessity of having an exact DOM match.

or

  3. Provide Documentation for DOM Alignment:
    Alternatively, detailed documentation on how to structure or format custom DOMs in Python to match the Label Studio DOM would be immensely helpful. I’ve spent significant time trying to achieve this alignment but have faced challenges understanding the exact requirements.
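
To make the first option concrete, here is a minimal sketch of how a backend might use a serialized DOM when one is supplied. Everything in it is illustrative: the `dom` parameter stands in for the hypothetical optional field proposed above, `TextNodeIndexer`, `xpath_of_text` and `predict_region` are made-up helper names, and the returned dictionary is only loosely modelled on Label Studio's HTML region results, so the field names should be checked against the Label Studio documentation. The sketch uses only the Python standard library and assumes every non-void element has an explicit closing tag. The `<tbody>` that browsers insert into tables is used as one well-known example of how a rendered DOM can differ from the source HTML; it is not confirmed to be the cause of the mismatch described here.

```python
from html.parser import HTMLParser

# Elements without a closing tag; they must not stay on the open-element stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextNodeIndexer(HTMLParser):
    """Record a positional XPath for every text node in an HTML string."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.path = []        # XPath steps from the root to the open element
        self.counts = [{}]    # per open element: child name -> count so far
        self.text_nodes = []  # [xpath, text] pairs in document order
        self._last_was_text = False

    def _step(self, name):
        # 1-based position among same-named siblings of the current parent.
        seen = self.counts[-1]
        seen[name] = seen.get(name, 0) + 1
        return f"{name}[{seen[name]}]"

    def handle_starttag(self, tag, attrs):
        self._last_was_text = False
        step = self._step(tag)
        if tag not in VOID_TAGS:
            self.path.append(step)
            self.counts.append({})

    def handle_endtag(self, tag):
        self._last_was_text = False
        if tag not in VOID_TAGS and self.path:
            self.path.pop()
            self.counts.pop()

    def handle_comment(self, data):
        # A comment separates two text nodes in the DOM.
        self._last_was_text = False

    def handle_data(self, data):
        if self._last_was_text:
            # Adjacent chunks belong to the same text node.
            self.text_nodes[-1][1] += data
            return
        xpath = "/" + "/".join(self.path + [self._step("text()")])
        self.text_nodes.append([xpath, data])
        self._last_was_text = True


def xpath_of_text(html, needle):
    """Return (xpath, offset) of the first text node containing `needle`."""
    indexer = TextNodeIndexer()
    indexer.feed(html)
    indexer.close()
    for xpath, text in indexer.text_nodes:
        offset = text.find(needle)
        if offset >= 0:
            return xpath, offset
    return None


def predict_region(task_html, needle, label, dom=None):
    """Build one region, preferring the rendered DOM when it is supplied.

    `dom` is the hypothetical optional field from option 1.
    """
    found = xpath_of_text(dom if dom is not None else task_html, needle)
    if found is None:
        return None
    xpath, offset = found
    return {
        "type": "hypertextlabels",  # assumed result shape, verify before use
        "value": {
            "start": xpath, "startOffset": offset,
            "end": xpath, "endOffset": offset + len(needle),
            "hypertextlabels": [label],
        },
    }


source = "<table><tr><td>Total</td><td>42</td></tr></table>"
rendered = "<table><tbody><tr><td>Total</td><td>42</td></tr></tbody></table>"

from_source = predict_region(source, "42", "Amount")
from_dom = predict_region(source, "42", "Amount", dom=rendered)
print(from_source["value"]["start"])  # /table[1]/tr[1]/td[2]/text()[1]
print(from_dom["value"]["start"])     # /table[1]/tbody[1]/tr[1]/td[2]/text()[1]
```

The two printed paths point at the same cell but differ by one step, which is enough for a static XPath built from the source HTML to miss its target in the rendered page. With the DOM available to the backend, the same lookup code yields a path that matches what is displayed.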

Additional Note: Thank you for creating this fantastic software. It’s been incredibly helpful in streamlining my annotation workflows, and I look forward to seeing it evolve further!

@chrisdukeLlama
Author

@heidi-humansignal Dear Heidi :)

I’d really appreciate your help with my issue, as resolving it is critical for my ability to use Label Studio in my project—something I’d really love to do. I believe this could also benefit others working with complex HTML structures.

To give some context: I’ve made the DOM available via an automatic download process and used it to create static XPaths that are highlighted in Label Studio. While this workaround demonstrates that a solution is possible (surprise! :)), it’s obviously not a long-term fix for my project.

Any guidance or solution that’s less messy than my current approach would be greatly appreciated!

Thank you so much in advance!

@chrisdukeLlama
Author

I would also like to add that this would help not only with HTML data, but also with complex PDFs converted to HTML.

@heidi-humansignal
Collaborator

Hello,

Thank you for your feature request! We greatly appreciate your feedback and the opportunity to consider your suggestion. Your request will be evaluated and ranked alongside other roadmap items. If our product team opts to proceed with your idea, we will keep you updated throughout the process. Please understand that while we take all requests seriously, we cannot promise implementation or a specific timeframe.

Thank you,
Abu

Comment by Abubakar Saad

@chrisdukeLlama
Author

@heidi-humansignal

Hello Abu,

Thank you for considering this feature. I truly believe it could significantly improve HTML support in Label Studio. I really enjoy using Label Studio, but unfortunately, without this feature, it won’t be possible for me to continue my project.

I’m working with PDFs of 100+ pages, including very complex tables that have been converted to HTML. Matching the DOM by parsing it in Python just isn’t realistic in this case. If there’s anything I can do to help—whether testing or providing more examples—please let me know. I’m happy to contribute as much as I can, though I’m not yet at the level to implement this properly myself. Hopefully, I’ll get there!

Best regards, and keep up the great work!

@heidi-humansignal
Collaborator

Hi Chris!

Thanks for the additional context! I'll pass this along to the team and if we hear anything, we'll be sure to follow up here!

Thanks,

Tyler Conlee
Head of Support
HumanSignal

Comment by Tyler Conlee

@chrisdukeLlama
Author

@heidi-humansignal
Hi Tyler,
Thanks a lot for your work. I will still try to implement it myself, because I need it as soon as possible, but of course I would much prefer a proper implementation to my hack job!
Best regards
Chris
