@dar326 commented Jul 28, 2025

I have encountered scenarios where the check for empty HTML content in the _process_html_content function in src/opendeepsearch/context_building/process_sources_pro.py fails. If the HTML content is not an empty string but consists solely of whitespace, it passes the check and continues to be processed. This causes a bug downstream: embeddings are requested for an empty document list, and reranking then fails on an invalid matrix multiplication between the query embeddings and the document embeddings.
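
To illustrate the kind of check described above, here is a minimal sketch. It is not the actual code from process_sources_pro.py; the helper names `is_blank_html` and `drop_blank_html` are hypothetical and exist only for this example. The idea is to treat whitespace-only strings the same as empty ones by stripping before testing.

```python
from typing import List, Optional


def is_blank_html(html: Optional[str]) -> bool:
    """Return True for None, an empty string, or whitespace-only content.

    Hypothetical helper for illustration; not part of the repository.
    """
    # str.strip() removes leading and trailing whitespace, so a string made
    # only of spaces, tabs, or newlines becomes "" and is treated as blank.
    return html is None or not html.strip()


def drop_blank_html(html_contents: List[Optional[str]]) -> List[str]:
    """Keep only entries that contain something other than whitespace."""
    return [html for html in html_contents if not is_blank_html(html)]


if __name__ == "__main__":
    pages = ["", "   \n\t", None, "<p>Some content</p>"]
    print(drop_blank_html(pages))
```

A comparison such as `html == ""` would let the second entry through, which matches the failure described above. A caller could also skip the embedding and reranking step entirely when the filtered list comes back empty.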
