extracted excel data in text_as_html have negative values #3934

Open
pradyrk opened this issue Feb 22, 2025 · 2 comments
Labels
bug Something isn't working

Comments

pradyrk commented Feb 22, 2025

Describe the bug
I have an .xlsx file. After reading it with partition_xlsx and parsing text_as_html, I find negative values, whereas the Excel sheet doesn't contain any negative values.

To Reproduce
from unstructured.partition.xlsx import partition_xlsx

elements = partition_xlsx(filename="excelfilepath")
print(elements[0].metadata.text_as_html)

Expected behavior
Values should be positive, and the exact values from the sheet should be extracted.

Screenshots

Image

Image

Environment Info
Databricks cluster - 16.1 ML Runtime - Complete details - https://docs.databricks.com/aws/en/release-notes/runtime/16.1ml

Additional context
The file is sensitive, so I can't share it. Any direction toward resolving this would help.

@pradyrk pradyrk added the bug Something isn't working label Feb 22, 2025
pradyrk commented Feb 22, 2025

Note that the value 98 in the sheet was extracted as -97.90044. The decimal part is fine, but the negative sign is not acceptable.
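To narrow this down without sharing the file, it may help to list exactly which cells of text_as_html parse as negative numbers. The following is only a rough sketch using the standard library's HTML parser. `CellCollector` and `negative_html_cells` are hypothetical helper names, not part of unstructured, and the row/column positions are zero-based positions in the HTML table, which may or may not line up with the sheet's own coordinates.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate a cell that appears before any <tr>
                self.rows.append([])
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def negative_html_cells(html):
    """Return (row, col, value) for each cell whose text parses as a negative float."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    hits = []
    for r, row in enumerate(parser.rows):
        for c, text in enumerate(row):
            try:
                value = float(text.strip())
            except ValueError:
                continue  # not a plain number (labels, "1,234", etc.)
            if value < 0:
                hits.append((r, c, value))
    return hits
```

Calling `negative_html_cells(elements[0].metadata.text_as_html)` on the output from the reproduction snippet would then give a short list of positions to compare against the sheet.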

@aswinjoseroy

Maybe try reading the Excel file once with a regular library such as pandas to verify that the actual data doesn't differ from what's being displayed. Excel can be weird with its data.
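pandas is one way to run that check. As an alternative that needs no third-party library, the sketch below reads the values stored in the workbook directly. An .xlsx file is a zip archive of XML parts, and numeric cells keep their stored value in a `<v>` element regardless of how Excel formats them for display. One possible explanation worth ruling out (not confirmed for this file) is a custom number format such as `0;0`, which displays negative numbers without a minus sign. The helper names are hypothetical, and the default member path `xl/worksheets/sheet1.xml` is an assumption about where the first sheet lives; listing `ZipFile.namelist()` shows the actual parts.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML worksheet parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def raw_numeric_cells(xlsx_path, sheet_member="xl/worksheets/sheet1.xml"):
    """Yield (cell_ref, stored_value) for numeric cells in one worksheet part.

    sheet_member is an assumed path; check zipfile.ZipFile(path).namelist().
    """
    with zipfile.ZipFile(xlsx_path) as zf:
        root = ET.fromstring(zf.read(sheet_member))
    for cell in root.iterfind(".//m:c", NS):
        value = cell.find("m:v", NS)
        # A missing t attribute (or t="n") marks a number; skip strings,
        # booleans, errors, and cells without a stored value.
        if value is None or value.text is None or cell.get("t", "n") != "n":
            continue
        yield cell.get("r"), float(value.text)


def negative_stored_cells(xlsx_path, sheet_member="xl/worksheets/sheet1.xml"):
    """Return the numeric cells whose stored value is below zero."""
    return [(ref, v) for ref, v in raw_numeric_cells(xlsx_path, sheet_member) if v < 0]
```

If `negative_stored_cells("excelfilepath")` reports the same cells that come out negative in text_as_html, the sign is already in the workbook and the difference is in how Excel displays it. If it reports nothing, the sign is being introduced somewhere during extraction.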
