Skip to content

fix(ascend):update README#673

Open
lywbarca wants to merge 1 commit intoUbiquitousLearning:mainfrom
lywbarca:ascend
Open

fix(ascend):update README#673
lywbarca wants to merge 1 commit intoUbiquitousLearning:mainfrom
lywbarca:ascend

Conversation

@lywbarca
Copy link
Copy Markdown

@lywbarca lywbarca commented May 6, 2026

Please check Guidelines for Contributing.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for Ascend NPU backend with Qwen3 W8A8 quantized inference.
  • Improvements

    • Enhanced UTF-8 safe streaming output for more reliable text rendering.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 6, 2026

📝 Walkthrough

Walkthrough

This PR introduces UTF-8 safe streaming output for Qwen Ascend inference by adding prefill warmup, buffered token decoding, and UTF-8 validation logic. The tokenizer initializes with UTF-8 locale, and output is accumulated and printed only when valid UTF-8 sequences are complete. Documentation is updated to announce the feature.

Changes

UTF-8 Safe Streaming and Tokenizer Configuration

Layer / File(s) Summary
Locale and Tokenizer Setup
mllm/models/qwen_ascend/tokenization_qwen_ascend.hpp
QwenAscendTokenizer constructor now explicitly initializes the local environment with UTF-8 locale via preprocessor::initLocal("C.UTF-8").
UTF-8 Validation Utilities
examples/qwen_ascend/main.cpp (lines 6–38)
Anonymous namespace function takeValidUtf8Prefix extracts valid UTF-8 prefixes from partial byte sequences; include set expanded to support UTF-8 and string handling.
Buffered Generation and Output
examples/qwen_ascend/main.cpp (lines 223–255)
QA generation path adds prefill warmup step, decodes tokens into pending_text buffer, validates and prints only complete UTF-8 chunks after each generation step, then ensures final newline.
Documentation
README.md
Latest News announces Ascend NPU backend support with Qwen3 W8A8 inference; Qwen3-0.6B table entry for W8A8 converted to hyperlink pointing to model card.

Sequence Diagram

sequenceDiagram
    participant Gen as Token Generator
    participant Buf as UTF-8 Buffer
    participant Val as UTF-8 Validator
    participant Out as Output

    loop Generation Loop
        Gen->>Buf: Decode token → pending_text
        Buf->>Val: Check for valid UTF-8 prefix
        Val->>Buf: Return valid prefix + remainder
        Buf->>Out: Print valid UTF-8 chunk
        Buf->>Buf: Keep remainder for next iteration
    end
    Out->>Out: Print final newline
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • oreomaker
  • xumengwei
  • liang1232018
  • yirongjie

Poem

🐰 Streams of tokens, UTF-8 flow,
Buffered wisdom, safe and slow,
Qwen meets Ascend with locale care,
Valid prefixes float through air!
hops

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (3 warnings)

Check name Status Explanation Resolution
Title check ⚠️ Warning The title only mentions README update, but the changes significantly include UTF-8 streaming improvements in main.cpp and tokenization changes, making it incomplete and misleading. Update title to reflect all major changes, e.g., 'fix(ascend): add UTF-8 safe streaming and update README for Qwen3 W8A8 support'
Description check ⚠️ Warning The PR description contains only the template boilerplate with no actual description of changes, missing all details about the modifications made. Add a meaningful description explaining the UTF-8 streaming improvements, prefill warmup step, locale initialization changes, and README updates with relevant context.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/qwen_ascend/main.cpp`:
- Around line 19-35: The helper takeValidUtf8Prefix currently returns empty when
an invalid byte is at the buffer start and never consumes it; update
takeValidUtf8Prefix to detect irrecoverable leading bytes (e.g., a start octet
that would require >4 continuation bytes or otherwise cannot form a valid UTF‑8
sequence) and consume/discard one invalid byte so progress continues, while
still returning empty when nothing valid precedes it; also ensure the final
flush path uses takeValidUtf8Prefix on pending_text (instead of printing
pending_text directly) so only validated UTF‑8 is emitted. Reference: function
takeValidUtf8Prefix and the final tail-flush that prints pending_text.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1e278d7c-fbd2-4317-b44b-90eb3e3d5fd7

📥 Commits

Reviewing files that changed from the base of the PR and between 729ca4c and 3973c6a.

📒 Files selected for processing (3)
  • README.md
  • examples/qwen_ascend/main.cpp
  • mllm/models/qwen_ascend/tokenization_qwen_ascend.hpp

Comment on lines +19 to +35
std::string takeValidUtf8Prefix(std::string& pending_text) {
auto invalid = utf8::find_invalid(pending_text.begin(), pending_text.end());
if (invalid == pending_text.begin()) {
return {};
}

if (invalid == pending_text.end()) {
std::string ready_text;
ready_text.swap(pending_text);
return ready_text;
}

auto ready_bytes = static_cast<size_t>(std::distance(pending_text.begin(), invalid));
auto ready_text = pending_text.substr(0, ready_bytes);
pending_text.erase(0, ready_bytes);
return ready_text;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

In utf8cpp, what does utf8::find_invalid(begin, end) return for (1) an incomplete multi-byte sequence at buffer start and (2) an irrecoverably invalid leading byte?

💡 Result:

For utf8::find_invalid(begin, end), the function returns an iterator pointing to the first invalid octet; if none are found it returns end. It does this by scanning from start and calling utf8::internal::validate_next(result, end) and immediately returning result when the returned error code is not UTF8_OK. [1] (1) Incomplete multi-byte sequence at buffer start: It returns begin (i.e., the iterator to the first byte of the incomplete sequence). validate_next is invoked starting from result==start, and any non-UTF8_OK (including INCOMPLETE_SEQUENCE) causes find_invalid to return the current iterator (the start of that invalid sequence). [1][2] (2) Irrecoverably invalid leading byte: It returns an iterator pointing to that invalid leading byte (the first byte whose validate_next call fails with a non-UTF8_OK error such as INVALID_LEAD). Since find_invalid returns the iterator value at which validate_next fails, it is the position of the bad leading octet. [1][2]

Citations:


🏁 Script executed:

# Check if the file exists and get its size
wc -l examples/qwen_ascend/main.cpp

Repository: UbiquitousLearning/mllm

Length of output: 101


🏁 Script executed:

# View the takeValidUtf8Prefix helper (lines 19-35)
sed -n '19,35p' examples/qwen_ascend/main.cpp

Repository: UbiquitousLearning/mllm

Length of output: 604


🏁 Script executed:

# View the tail flush logic (lines 250-260 for context around 252-254)
sed -n '250,260p' examples/qwen_ascend/main.cpp

Repository: UbiquitousLearning/mllm

Length of output: 451


🏁 Script executed:

# Check how pending_text is used in the loop - search for the streaming context
sed -n '240,270p' examples/qwen_ascend/main.cpp

Repository: UbiquitousLearning/mllm

Length of output: 981


Handle irrecoverable leading bytes to avoid stalled output and invalid tail flush.

When utf8::find_invalid detects an invalid byte at the buffer start, the helper returns empty without consuming it. An irrecoverably invalid leading byte (e.g., a malformed UTF-8 start octet) will therefore block progress indefinitely—the next iteration finds the same byte, returns empty again, and loops without advancing. Additionally, the final flush at line 252-254 directly prints pending_text without validation, allowing invalid UTF-8 to reach output even though the streaming loop uses the validated takeValidUtf8Prefix.

Discard irrecoverable leading bytes that exceed the maximum UTF-8 sequence length (4 bytes), and apply takeValidUtf8Prefix to the tail flush to ensure output is valid UTF-8.

Proposed fix
 std::string takeValidUtf8Prefix(std::string& pending_text) {
+  if (pending_text.empty()) {
+    return {};
+  }
   auto invalid = utf8::find_invalid(pending_text.begin(), pending_text.end());
   if (invalid == pending_text.begin()) {
-    return {};
+    // Keep short prefixes that may become valid with future bytes.
+    // UTF-8 max code point width is 4 bytes; longer invalid-at-begin likely means malformed lead.
+    if (pending_text.size() <= 4) {
+      return {};
+    }
+    pending_text.erase(0, 1);
+    return {};
   }
@@
-      if (!pending_text.empty()) {
-        fmt::print("{}", pending_text);
-      }
+      auto tail = takeValidUtf8Prefix(pending_text);
+      if (!tail.empty()) {
+        fmt::print("{}", tail);
+      }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/qwen_ascend/main.cpp` around lines 19 - 35, The helper
takeValidUtf8Prefix currently returns empty when an invalid byte is at the
buffer start and never consumes it; update takeValidUtf8Prefix to detect
irrecoverable leading bytes (e.g., a start octet that would require >4
continuation bytes or otherwise cannot form a valid UTF‑8 sequence) and
consume/discard one invalid byte so progress continues, while still returning
empty when nothing valid precedes it; also ensure the final flush path uses
takeValidUtf8Prefix on pending_text (instead of printing pending_text directly)
so only validated UTF‑8 is emitted. Reference: function takeValidUtf8Prefix and
the final tail-flush that prints pending_text.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant