fix(ascend):update README #673
Open
lywbarca wants to merge 1 commit into UbiquitousLearning:main from lywbarca:ascend
+46
−6
🧩 Analysis chain
🌐 Web query:
In utf8cpp, what does utf8::find_invalid(begin, end) return for (1) an incomplete multi-byte sequence at buffer start and (2) an irrecoverably invalid leading byte?
💡 Result:
utf8::find_invalid(begin, end) returns an iterator pointing to the first invalid octet, or end if none is found. It scans from begin, calling utf8::internal::validate_next(result, end), and returns result as soon as the returned error code is not UTF8_OK. [1]
(1) Incomplete multi-byte sequence at buffer start: it returns begin, the iterator to the first byte of the incomplete sequence. validate_next is first invoked with result == begin, and any non-UTF8_OK code (including INCOMPLETE_SEQUENCE) makes find_invalid return the current iterator, which is the start of that sequence. [1][2]
(2) Irrecoverably invalid leading byte: it returns an iterator to that byte, the first position at which validate_next fails with a non-UTF8_OK error such as INVALID_LEAD. [1][2]
Citations:
🏁 Script executed:
# Check if the file exists and get its size
wc -l examples/qwen_ascend/main.cpp
Repository: UbiquitousLearning/mllm
Length of output: 101
🏁 Script executed:
Repository: UbiquitousLearning/mllm
Length of output: 604
🏁 Script executed:
Repository: UbiquitousLearning/mllm
Length of output: 451
🏁 Script executed:
Repository: UbiquitousLearning/mllm
Length of output: 981
Handle irrecoverable leading bytes to avoid stalled output and invalid tail flush.
When utf8::find_invalid detects an invalid byte at the buffer start, the helper returns empty without consuming it. An irrecoverably invalid leading byte (e.g., a malformed UTF-8 start octet) therefore blocks progress indefinitely: the next iteration finds the same byte, returns empty again, and never advances. Additionally, the final flush at lines 252-254 prints pending_text directly without validation, so invalid UTF-8 can reach the output even though the streaming loop uses the validated takeValidUtf8Prefix.
Discard an invalid leading byte once the buffer holds more than 4 bytes (the maximum UTF-8 sequence length), since no further input can make it valid, and apply takeValidUtf8Prefix to the tail flush to ensure the output is valid UTF-8.
Proposed fix
 std::string takeValidUtf8Prefix(std::string& pending_text) {
+  if (pending_text.empty()) {
+    return {};
+  }
   auto invalid = utf8::find_invalid(pending_text.begin(), pending_text.end());
   if (invalid == pending_text.begin()) {
-    return {};
+    // Keep short prefixes that may become valid with future bytes.
+    // UTF-8 max code point width is 4 bytes; longer invalid-at-begin likely means malformed lead.
+    if (pending_text.size() <= 4) {
+      return {};
+    }
+    pending_text.erase(0, 1);
+    return {};
   }
@@
-  if (!pending_text.empty()) {
-    fmt::print("{}", pending_text);
-  }
+  auto tail = takeValidUtf8Prefix(pending_text);
+  if (!tail.empty()) {
+    fmt::print("{}", tail);
+  }
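The size-based heuristic cannot tell an incomplete sequence from a malformed one until several bytes have accumulated. As a rough alternative sketch, the head of the buffer could be classified explicitly. The code below is hand-rolled and does not use utf8cpp. The names Utf8Head, expectedLength, classify, and takeValidPrefixSketch are hypothetical and are not part of this PR or of any library. Unlike the helper in the PR, this sketch is assumed to skip malformed bytes anywhere in the buffer and to take a final_flush flag for the tail.

```cpp
#include <cstddef>
#include <string>

// Hypothetical classification of the sequence starting at a given offset.
enum class Utf8Head { Valid, Incomplete, Malformed };

// Expected sequence length for a lead byte, or 0 if the byte can never start
// a well-formed sequence (continuation bytes, 0xC0, 0xC1, 0xF5..0xFF).
inline std::size_t expectedLength(unsigned char lead) {
  if (lead < 0x80) return 1;
  if (lead >= 0xC2 && lead <= 0xDF) return 2;
  if (lead >= 0xE0 && lead <= 0xEF) return 3;
  if (lead >= 0xF0 && lead <= 0xF4) return 4;
  return 0;
}

// Classifies the sequence at offset pos; on Valid, len holds its byte length.
inline Utf8Head classify(const std::string& s, std::size_t pos, std::size_t& len) {
  const auto byte = [&](std::size_t i) { return static_cast<unsigned char>(s[i]); };
  len = expectedLength(byte(pos));
  if (len == 0) return Utf8Head::Malformed;
  for (std::size_t i = 1; i < len; ++i) {
    if (pos + i >= s.size()) return Utf8Head::Incomplete;  // more bytes may still arrive
    unsigned char lo = 0x80, hi = 0xBF;
    if (i == 1) {  // tighter second-byte ranges reject overlongs, surrogates, > U+10FFFF
      const unsigned char lead = byte(pos);
      if (lead == 0xE0) lo = 0xA0;
      else if (lead == 0xED) hi = 0x9F;
      else if (lead == 0xF0) lo = 0x90;
      else if (lead == 0xF4) hi = 0x8F;
    }
    const unsigned char b = byte(pos + i);
    if (b < lo || b > hi) return Utf8Head::Malformed;
  }
  return Utf8Head::Valid;
}

// Removes and returns the valid text at the front of pending. Malformed bytes
// are dropped one at a time so the stream cannot stall; an incomplete tail is
// kept for the next chunk unless final_flush is true, in which case it is dropped.
inline std::string takeValidPrefixSketch(std::string& pending, bool final_flush = false) {
  std::string out;
  std::size_t pos = 0;
  while (pos < pending.size()) {
    std::size_t len = 0;
    const Utf8Head head = classify(pending, pos, len);
    if (head == Utf8Head::Valid) {
      out.append(pending, pos, len);
      pos += len;
    } else if (head == Utf8Head::Malformed) {
      pos += 1;  // skip the bad byte and keep scanning
    } else {     // Incomplete: wait for more input, or discard on the last flush
      if (final_flush) pos = pending.size();
      break;
    }
  }
  pending.erase(0, pos);
  return out;
}
```

With this shape, the streaming loop would call takeValidPrefixSketch(pending_text) after each decoded chunk and takeValidPrefixSketch(pending_text, true) once at the end. Whether silently dropping malformed bytes is acceptable, rather than emitting U+FFFD, is a design choice left to the maintainers.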