Allow Passing Label Studio DOM to Predict Endpoint or Provide Documentation for Matching XPaths #6937
Comments
@heidi-humansignal Dear Heidi :) I’d really appreciate your help with my issue, as resolving it is critical for my ability to use Label Studio in my project, which is something I’d really love to do. I believe this could also benefit others working with complex HTML structures. To give some context: I’ve made the DOM available via an automatic download process and used it to create static XPaths that are highlighted in Label Studio. While this workaround demonstrates that a solution is possible (surprise! :)), it’s obviously not a long-term fix for my project. Any guidance or solution that’s less messy than my current approach would be greatly appreciated! Thank you so much in advance!
I would also like to add that this would be helpful not only with HTML data, but also with complex PDFs converted to HTML.
Hello, Thank you for your feature request! We greatly appreciate your feedback and the opportunity to consider your suggestion. Your request will be evaluated and ranked alongside other roadmap items. If our product team opts to proceed with your idea, we will keep you updated throughout the process. Please understand that while we take all requests seriously, we cannot promise implementation or a specific timeframe. Thank you,
Hello Abu, Thank you for considering this feature. I truly believe it could significantly improve HTML support in Label Studio. I really enjoy using Label Studio, but unfortunately, without this feature, it won’t be possible for me to continue my project. I’m working with PDFs of 100+ pages, including very complex tables that have been converted to HTML. Matching the DOM by parsing it in Python just isn’t realistic in this case. If there’s anything I can do to help, whether testing or providing more examples, please let me know. I’m happy to contribute as much as I can, though I’m not yet at the level to implement this properly myself. Hopefully, I’ll get there! Best regards, and keep up the great work!
Hi Chris! Thanks for the additional context! I'll pass this along to the team and if we hear anything, we'll be sure to follow up here! Thanks, Tyler Conlee
@heidi-humansignal
Description: I work with complex HTML documents, including intricate table structures. My ML backend predictions rely on static XPaths to highlight specific elements in Label Studio. However, the XPaths generated by my backend do not align with the Label Studio DOM, so the predictions fail to display properly.
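To illustrate the kind of mismatch described above, here is a minimal sketch using only the Python standard library. It assumes one possible cause, namely that browsers insert an implied `tbody` element when parsing HTML tables, so the DOM shown in the browser can contain nodes that the original markup does not. Whether this is the actual cause in a given project is an assumption, and `positional_xpath` is a hypothetical helper, not part of Label Studio.

```python
import xml.etree.ElementTree as ET


def positional_xpath(root, target):
    """Return an absolute positional XPath (e.g. /table[1]/tr[2]/td[1]) for `target`."""
    def walk(node, path):
        if node is target:
            return path
        counts = {}  # per-tag sibling counter, since XPath positions are 1-based per tag
        for child in node:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            found = walk(child, f"{path}/{child.tag}[{counts[child.tag]}]")
            if found is not None:
                return found
        return None

    return walk(root, f"/{root.tag}[1]")


# Markup as the backend sees it (no tbody in the source).
backend_html = "<table><tr><td>A</td></tr><tr><td>B</td></tr></table>"
# Markup as a browser would typically expose it (implied tbody inserted).
browser_html = "<table><tbody><tr><td>A</td></tr><tr><td>B</td></tr></tbody></table>"

backend_root = ET.fromstring(backend_html)
browser_root = ET.fromstring(browser_html)

# Pick the same logical cell ("B") in both trees.
backend_cell = backend_root.findall(".//td")[1]
browser_cell = browser_root.findall(".//td")[1]

backend_path = positional_xpath(backend_root, backend_cell)
browser_path = positional_xpath(browser_root, browser_cell)

print(backend_path)
print(browser_path)
```

The two printed paths differ for the same cell, which is why a static XPath computed from the source markup may not resolve in the rendered DOM.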
Proposed Solutions:
1. Allow the serialized Label Studio DOM to be sent to the predict endpoint as part of the request. This would enable the backend to generate XPaths that align with the Label Studio DOM. To minimize unnecessary data traffic, this functionality could be optional and enabled only for tasks requiring it.
or
2. Provide support for dynamic XPaths to remove the need for an exact DOM match.
or
3. Provide documentation for DOM alignment: alternatively, detailed documentation on how to structure or format custom DOMs in Python to match the Label Studio DOM would be immensely helpful. I’ve spent significant time trying to achieve this alignment but have faced challenges understanding the exact requirements.
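As a rough sketch of how the first proposal could look from the backend side: the code below assumes a hypothetical optional `serialized_dom` field on the task, which does not exist in Label Studio today, and falls back to the task's own markup when it is absent. The `data["html"]` key, the `predict` and `xpath_to_first_match` helpers, and the simplified region layout (`start`, `end`, `startOffset`, `endOffset`, `hypertextlabels`) reflect my understanding of HyperText regions and should be checked against the documentation. Fields such as `from_name` and `to_name` are omitted for brevity.

```python
import xml.etree.ElementTree as ET


def xpath_to_first_match(root, text):
    """Return a positional XPath to the first element whose direct text equals `text`."""
    def walk(node, path):
        if node.text == text:
            return path
        counts = {}  # per-tag sibling counter for 1-based XPath positions
        for child in node:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            found = walk(child, f"{path}/{child.tag}[{counts[child.tag]}]")
            if found is not None:
                return found
        return None

    return walk(root, f"/{root.tag}[1]")


def predict(task, target_text, label):
    """Build one simplified region for `target_text`, preferring the rendered DOM."""
    # HYPOTHETICAL: `serialized_dom` is the optional field proposed in this issue.
    # ASSUMPTION: the task markup lives under data["html"]; this depends on the config.
    markup = task.get("serialized_dom") or task["data"]["html"]
    root = ET.fromstring(markup)  # only works for well-formed markup
    path = xpath_to_first_match(root, target_text)
    if path is None:
        return []
    return [{
        "type": "hypertextlabels",
        "value": {
            "start": f"{path}/text()[1]",
            "end": f"{path}/text()[1]",
            "startOffset": 0,
            "endOffset": len(target_text),
            "hypertextlabels": [label],
        },
    }]


task_plain = {"data": {"html": "<table><tr><td>A</td></tr><tr><td>B</td></tr></table>"}}
task_with_dom = dict(
    task_plain,
    serialized_dom="<table><tbody><tr><td>A</td></tr><tr><td>B</td></tr></tbody></table>",
)

print(predict(task_plain, "B", "Cell")[0]["value"]["start"])
print(predict(task_with_dom, "B", "Cell")[0]["value"]["start"])
```

With the hypothetical field present, the XPath is computed against the same tree the browser renders, so no manual DOM alignment would be needed. Keeping the field optional matches the point about avoiding unnecessary data traffic.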
Additional Note: Thank you for creating this fantastic software. It’s been incredibly helpful in streamlining my annotation workflows, and I look forward to seeing it evolve further!