Describe the bug
After parsing a PDF, how can the parsing results be validated?
To Reproduce
The detection_class_prob key is not consistent: it is not available for all extracted elements.
Expected behavior
Let's say I am parsing a PDF that contains images, text, tables embedded as images, etc. I call partition_pdf() with the hi_res strategy. I expect the metadata of every element to include a detection_class_prob key giving the confidence score. However, the key is missing for some elements. For example, it is available for a Table element but not for an Image element, and it is likewise unavailable for some other element types. The expected behavior is to have this key for all elements.
Screenshots
Environment Info
please use 👍
unstructured version : 0.16.23
from unstructured.partition.pdf import partition_pdf

raw_pdf_elements = partition_pdf(
    filename="/content/data/Cocktails_Spirits.pdf",
    strategy="hi_res",
    infer_table_structure=True,  # Infer table structures from content
    extract_images_in_pdf=True,  # Extract images from the PDF
    extract_image_block_types=["Image", "Table"],  # Image and Table extraction
    extract_image_block_to_payload=True,  # Return images in the response
    output_format="application/json",  # JSON output format
    extract_image_block_output_dir="extracted_data_test",
)
Additional context
We should get a probability value for every element.
The detection_class_prob field is not always present because not all detection methods rely on probabilistic approaches. It could be that in those cases the field should still be available, e.g. with a value of 1.
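In the meantime, a caller can read the field defensively. The following is a minimal sketch, not part of the library: the helper name confidence_by_category is hypothetical, and the sample objects are stand-ins that only mimic the category and metadata attributes of the elements returned by partition_pdf(). It groups the confidence values by element category and substitutes a caller-chosen default wherever the field is absent.

```python
from collections import defaultdict
from types import SimpleNamespace


def confidence_by_category(elements, default=None):
    """Group detection_class_prob values by element category.

    Elements whose metadata has no detection_class_prob get `default`.
    (Hypothetical helper, not an unstructured API.)
    """
    grouped = defaultdict(list)
    for el in elements:
        # getattr with a fallback tolerates metadata objects lacking the field.
        prob = getattr(el.metadata, "detection_class_prob", None)
        grouped[el.category].append(default if prob is None else prob)
    return dict(grouped)


# Stand-in elements for illustration only; real ones come from partition_pdf().
sample = [
    SimpleNamespace(category="Table",
                    metadata=SimpleNamespace(detection_class_prob=0.93)),
    SimpleNamespace(category="Image", metadata=SimpleNamespace()),
]

result = confidence_by_category(sample)
with_default = confidence_by_category(sample, default=1.0)
```

Passing default=1.0 mirrors the suggestion above of treating non-probabilistic detections as fully confident; leaving it as None keeps the two cases distinguishable.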
@christinestraub, could you help decide whether we should reclassify this from a bug, and whether there is room for a change here at this time?