Skip to content

Comments

fix: strip thinking tags before all JSON parse strategies#209

Open
haosenwang1018 wants to merge 1 commit intoHKUDS:mainfrom
haosenwang1018:fix/strip-think-tags-before-parse
Open

fix: strip thinking tags before all JSON parse strategies#209
haosenwang1018 wants to merge 1 commit intoHKUDS:mainfrom
haosenwang1018:fix/strip-think-tags-before-parse

Conversation

@haosenwang1018
Copy link

Closes #159

When using reasoning models (qwen2.5-think, deepseek-r1, etc.), the <think>...</think> tags were only stripped in _extract_all_json_candidates(), meaning the regex fallback strategy (_extract_fields_with_regex) still operated on the raw response including thinking content. This could cause it to extract content from the thinking section rather than the actual analysis.

This fix moves the think-tag stripping to the top of _robust_json_parse() so all downstream strategies work with clean model output.

@LarFii
Copy link
Collaborator

LarFii commented Feb 24, 2026

Thanks for the fix.

I do see one potential side effect to consider:

  • Potential data loss from global tag stripping
    The current implementation removes <think>...</think> / <thinking>...</thinking> across the entire response before all parsing strategies.
    If those tags appear as legitimate literal content inside the actual payload (e.g., in detailed_description), that content would be removed unintentionally.

Also a minor maintainability point:

  • Duplicated cleanup logic
    The same think-tag stripping still exists in _extract_all_json_candidates(), so cleanup now happens in two places. It may be better to centralize this in one function to avoid drift.

Suggested refinement

Instead of global removal, strip only leading reasoning blocks (prefix-only), which still fixes the fallback issue while avoiding accidental mutation of valid body content.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Analyzed data of tables not complete

2 participants