Add QNN-compatible ONNX export for non-streaming zipformer transducer. #2088

Merged
csukuangfj merged 1 commit into k2-fsa:master from csukuangfj:zipformer-export-qnn
Jun 11, 2026

@csukuangfj csukuangfj commented Jun 11, 2026

Collaborator

This example uses https://huggingface.co/reazon-research/reazonspeech-k2-v2.

Its PyTorch checkpoint, which was created from its ONNX model files, is available at https://huggingface.co/csukuangfj/reazonspeech-k2-v2/tree/main/checkpoint

  ./zipformer/export-onnx.py \
    --enable-int8-quantization 0 \
    --max-len 1000 \
    --keep-x-lens 0 \
    --use-int32-inputs 1 \
    --dynamic-axes 0 \
    --epoch 99 \
    --avg 1 \
    --use-averaged-model 0 \
    --exp-dir ./reazonspeech-k2-v2/checkpoint \
    --tokens ./reazonspeech-k2-v2/tokens.txt \
    \
    --num-encoder-layers 2,2,4,5,4,2 \
    --feedforward-dim 512,768,1536,2048,1536,768 \
    --encoder-dim 192,256,512,768,512,256 \
    --encoder-unmasked-dim 192,192,256,320,256,192

This produces 3 ONNX files (encoder, decoder, and joiner) that are suitable for conversion to QNN.
They may also work on other types of NPU, but only Qualcomm NPUs have been verified.
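
The last four options in the command above are comma-separated lists with one value per encoder stack, so they should all have the same number of entries (six here). The following is a minimal sketch of a sanity check to run before starting an export. The helper names `to_int_tuple` and `check_stack_args` are hypothetical and are not taken from export-onnx.py.

```python
from typing import Dict, Tuple


def to_int_tuple(s: str) -> Tuple[int, ...]:
    """Parse a comma-separated string such as "2,2,4,5,4,2" into ints."""
    return tuple(int(v) for v in s.split(","))


def check_stack_args(args: Dict[str, str]) -> int:
    """Return the common number of stacks, or raise if the lengths differ."""
    parsed = {name: to_int_tuple(value) for name, value in args.items()}
    lengths = {name: len(values) for name, values in parsed.items()}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"per-stack options disagree in length: {lengths}")
    return next(iter(lengths.values()))


# Values copied from the export command above.
stack_args = {
    "--num-encoder-layers": "2,2,4,5,4,2",
    "--feedforward-dim": "512,768,1536,2048,1536,768",
    "--encoder-dim": "192,256,512,768,512,256",
    "--encoder-unmasked-dim": "192,192,256,320,256,192",
}
num_stacks = check_stack_args(stack_args)
print(num_stacks)
```

A mismatch, such as a list with a missing entry, raises `ValueError` in this sketch before any model is built.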

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced ONNX export configuration with new options for sequence length, input data types, and quantization control
    • Added flexible encoder export variants with configurable input/output signatures
  • Chores

    • Adjusted ONNX Runtime logging verbosity for improved runtime control

coderabbitai Bot commented Jun 11, 2026

📥 Commits

Reviewing files that changed from the base of the PR and between 27e14b3 and 535b1d6.

📒 Files selected for processing (2)
  • egs/librispeech/ASR/zipformer/export-onnx.py
  • egs/librispeech/ASR/zipformer/onnx_pretrained.py

Walkthrough

This PR extends ONNX export configuration for the Zipformer ASR model by adding CLI flags for max sequence length, x_lens retention, input integer width, dynamic axes support, and int8 quantization control. The encoder gains a simplified single-input forward2 path, and export functions conditionally apply new parameters and batch size settings. Main wiring and logging configuration complete the changes.

Changes

ONNX Export Configuration and Runtime Logging

| Layer / File(s) | Summary |
| --- | --- |
| CLI arguments and encoder forward2 variant (egs/librispeech/ASR/zipformer/export-onnx.py) | New CLI arguments (--max-len, --keep-x-lens, --use-int32-inputs, --dynamic-axes, --enable-int8-quantization) are added to the argument parser. OnnxEncoder gains a forward2 method that provides a simplified single-input, single-output inference path for batch-size-1 traces by deriving x_lens internally and returning only encoder_out. |
| Encoder export with conditional forward/forward2 selection (egs/librispeech/ASR/zipformer/export-onnx.py) | export_encoder_model_onnx signature expands to accept max_len, dynamic_axes, use_int32_inputs, and keep_x_lens. The implementation conditionally selects between the original forward signature (with x_lens and encoder_out_lens outputs) and the new forward2 signature (single encoder_out output) based on keep_x_lens. x_lens dtype and dynamic_axes are controlled by the new parameters. |
| Decoder and Joiner export configuration updates (egs/librispeech/ASR/zipformer/export-onnx.py) | export_decoder_model_onnx and export_joiner_model_onnx signatures accept use_int32_inputs and dynamic_axes. Decoder dummy input y is conditionally created as torch.int32 or torch.int64. Both functions conditionally pass dynamic_axes to ONNX export. Joiner export batch size for dummy inputs is reduced from 11 to 1. |
| Main function parameter wiring and int8 quantization guard (egs/librispeech/ASR/zipformer/export-onnx.py) | main() wires new CLI parameters into the three export functions (encoder, decoder, joiner). An early return guard skips int8 quantization export logic when enable_int8_quantization is false. |
| ONNX Runtime logging severity configuration (egs/librispeech/ASR/zipformer/onnx_pretrained.py) | OnnxModel.__init__ now explicitly sets session_opts.log_severity_level = 3 to control ONNX Runtime logging verbosity. |
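
The conditional selection summarized above can be pictured as choosing different input names, output names, and (optionally) dynamic axes for `torch.onnx.export`. The sketch below only builds that keyword dictionary and does not call PyTorch. The function name `encoder_export_kwargs` is hypothetical, and the axis labels for `x` and `encoder_out` are assumptions; only the `"encoder_out_lens": {0: "N"}` entry appears in the reviewed diff.

```python
from typing import Any, Dict


def encoder_export_kwargs(keep_x_lens: int, dynamic_axes: int) -> Dict[str, Any]:
    """Build name-related keyword arguments for torch.onnx.export (sketch)."""
    if keep_x_lens:
        # Original signature: forward(x, x_lens) -> (encoder_out, encoder_out_lens)
        kwargs: Dict[str, Any] = {
            "input_names": ["x", "x_lens"],
            "output_names": ["encoder_out", "encoder_out_lens"],
        }
        axes = {
            "x": {0: "N", 1: "T"},  # assumed labels
            "x_lens": {0: "N"},  # assumed labels
            "encoder_out": {0: "N", 1: "T"},  # assumed labels
            "encoder_out_lens": {0: "N"},
        }
    else:
        # Simplified signature: forward2(x) -> encoder_out
        kwargs = {"input_names": ["x"], "output_names": ["encoder_out"]}
        axes = {"x": {0: "N", 1: "T"}, "encoder_out": {0: "N", 1: "T"}}

    # With dynamic axes disabled, the key is omitted so that the exported
    # graph keeps the static shapes of the dummy inputs.
    if dynamic_axes:
        kwargs["dynamic_axes"] = axes
    return kwargs


static_kwargs = encoder_export_kwargs(keep_x_lens=0, dynamic_axes=0)
print(static_kwargs)
```

With `--keep-x-lens 0 --dynamic-axes 0`, as in the command from the PR description, this sketch yields a single input, a single output, and no `dynamic_axes` entry.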

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • k2-fsa/icefall#2084: Adds similar --use-int32-inputs export control and int32/int64 casting logic for encoder and decoder integer inputs in a different ASR variant's export script.
  • k2-fsa/icefall#2086: Adjusts Zipformer ONNX export masking and input typing to use torch.int32 pathways, overlapping directly with this PR's int32 input handling.

Poem

🐰 Hops through exports with flags held high,
forward2 simplifies the path nearby,
Dynamic axes dance, int32 takes flight,
While logging whispers at proper height.
Batches shrink small, the config grows bright!

@csukuangfj csukuangfj merged commit 36e0420 into k2-fsa:master Jun 11, 2026
91 of 128 checks passed
@csukuangfj csukuangfj deleted the zipformer-export-qnn branch June 11, 2026 05:37

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces several command-line arguments to export-onnx.py to support exporting Zipformer models with static shapes, int32 inputs, optional dynamic axes, and optional int8 quantization. It also adds a forward2 helper method to handle exports without x_lens and updates the export functions for the encoder, decoder, and joiner. Feedback on these changes points out a typo in a help message, an incorrect docstring for forward2, a global side-effect risk from modifying class-level methods instead of instance-level methods, and a missing type annotation.

"--dynamic-axes",
type=int,
default=1,
help="1 to support dynamic axes. 0 to diable dynamic axes",

medium

There is a typo in the help message for --dynamic-axes: diable should be disable.

Suggested change
- help="1 to support dynamic axes. 0 to diable dynamic axes",
+ help="1 to support dynamic axes. 0 to disable dynamic axes",
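
For context, the flag follows the integer-as-boolean pattern used in the export command (`--dynamic-axes 0`, `--use-int32-inputs 1`). Below is a minimal, self-contained sketch of that pattern with the corrected help text. Only the `--dynamic-axes` definition is taken from the diff; the default and help string shown for `--use-int32-inputs` are assumptions made for illustration.

```python
import argparse


def get_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--dynamic-axes",
        type=int,
        default=1,
        help="1 to support dynamic axes. 0 to disable dynamic axes",
    )
    parser.add_argument(
        "--use-int32-inputs",
        type=int,
        default=0,  # assumed default, not confirmed by the diff
        help="1 to use int32 for integer inputs. 0 to use int64",  # assumed text
    )
    return parser


# Parse the two values used in the PR description's export command.
args = get_parser().parse_args(["--dynamic-axes", "0", "--use-int32-inputs", "1"])
print(args.dynamic_axes, args.use_int32_inputs)
```

Because the values are plain integers, downstream code can test them directly, for example `if args.dynamic_axes:`.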

Comment on lines +251 to +262
        """Please see the help information of Zipformer.forward

        Args:
          x:
            A 3-D tensor of shape (N, T, C)
          x_lens:
            A 1-D tensor of shape (N,). Its dtype is torch.int64
        Returns:
          Return a tuple containing:
            - encoder_out, A 3-D tensor of shape (N, T', joiner_dim)
            - encoder_out_lens, A 1-D tensor of shape (N,)
        """

medium

The docstring for forward2 is incorrect. It lists x_lens as an argument and claims to return a tuple of (encoder_out, encoder_out_lens), but the function only accepts x and returns encoder_out directly.

        """Please see the help information of Zipformer.forward

        Args:
          x:
            A 3-D tensor of shape (N, T, C)
        Returns:
          Return encoder_out, A 3-D tensor of shape (N, T', joiner_dim)
        """

"encoder_out_lens": {0: "N"},
}
else:
encoder_model.__class__.forward = encoder_model.__class__.forward2

medium

Modifying the class-level forward method (encoder_model.__class__.forward = ...) is a global side effect that affects all instances of OnnxEncoder (and any future uses of the class). Since encoder_model.forward2 is already a bound method on the instance, you can safely assign it directly to the instance's forward attribute instead.

Suggested change
- encoder_model.__class__.forward = encoder_model.__class__.forward2
+ encoder_model.forward = encoder_model.forward2
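
The difference between the two assignments can be reproduced without PyTorch. The sketch below uses a plain Python stand-in class (not the real OnnxEncoder) whose `__call__` dispatches to `self.forward`, which is an assumption about how the wrapped module is invoked during export.

```python
def make_encoder_class() -> type:
    """Return a fresh stand-in class so each experiment starts unpatched."""

    class Encoder:
        def forward(self, x, x_lens):
            return "forward(x, x_lens)"

        def forward2(self, x):
            return "forward2(x)"

        def __call__(self, *args):
            # Mimics a module call dispatching to self.forward.
            return self.forward(*args)

    return Encoder


# Class-level patch: every instance of the class is affected.
Encoder = make_encoder_class()
first, second = Encoder(), Encoder()
first.__class__.forward = first.__class__.forward2
class_level = (first(0), second(0))

# Instance-level patch: only the patched object changes.
Encoder = make_encoder_class()
first, second = Encoder(), Encoder()
first.forward = first.forward2
instance_level = (first(0), second(0, 1))

print(class_level)     # ('forward2(x)', 'forward2(x)')
print(instance_level)  # ('forward2(x)', 'forward(x, x_lens)')
```

In the class-level case the untouched `second` object also loses its two-argument `forward`, which is the side effect the review comment describes.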

Comment on lines +440 to +441
use_int32_inputs,
dynamic_axes: int,

medium

The parameter use_int32_inputs is missing a type annotation. Please add : int to keep it consistent with the other parameters in this function.

Suggested change
- use_int32_inputs,
- dynamic_axes: int,
+ use_int32_inputs: int,
+ dynamic_axes: int,
