Releases: quic/efficient-transformers
release/v1.20.0
Release Summary
New Features
- SpD, multiprojection heads (#306): Implemented post-attention hidden size projections to speculate tokens ahead of the base model.
- Adding compilation support for io_encrypt flag (#393): Added support for the Model-IP I/O encryption feature using qaic-exec (compile only).
- Multi model end to end test pipelines (#337): Introduced comprehensive test pipelines for multi-model scenarios.
- QNN Compilation path Support in QEFFBaseModel class (#374): Added support for QNN compilation path in the QEFFBaseModel class.
- CLI support for multimodal models (AutoModelForImageTextToText) (#287): Enabled CLI support for multimodal models loaded through AutoModelForImageTextToText.
- Support for Disaggregated serving (#365): Added support for disaggregated serving.
- Support for GGUF model execution (without quantized weights) (#368): Enabled GGUF model execution without quantized weights.
- SwiftKV backup Pending changes (#367): Implemented pending changes for SwiftKV backup.
- Upgrading transformers version to latest 4.50 (#331): Upgraded transformers to version 4.50.
- Added support for gradient checkpointing in the finetuning script (#338): Enabled gradient checkpointing in the finetuning script.
- Passing device type in torch GradScaler (#345): Added functionality to pass device type in torch GradScaler.
- Enabled FP8 models for replicate_kv_heads script (#353): Enabled FP8 models for the replicate_kv_heads script.
- QNN Compilation command changes (#327): Updated QNN compilation commands.
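
For readers new to speculative decoding (SpD), the sketch below shows the greedy acceptance step that usually follows speculation: tokens proposed ahead of the base model are kept only up to the first position where the base model disagrees. It is a minimal illustration under stated assumptions, not the implementation from #306; the function name `accept_speculated` and its arguments are hypothetical.

```python
from typing import List


def accept_speculated(speculated: List[int], verified: List[int]) -> List[int]:
    """Greedy acceptance for speculative decoding (illustrative sketch).

    speculated: token ids proposed ahead of the base model (hypothetical input).
    verified:   the base model's greedy token at each of those positions, plus
                one extra token, so len(verified) == len(speculated) + 1.
    """
    if len(verified) != len(speculated) + 1:
        raise ValueError("verified must hold exactly one more token than speculated")

    accepted: List[int] = []
    for proposed, target in zip(speculated, verified):
        if proposed != target:
            break  # first disagreement: stop accepting speculated tokens
        accepted.append(proposed)

    # The base model's own token at the first mismatch (or the extra token when
    # every proposal matched) is always emitted, so each step yields >= 1 token.
    accepted.append(verified[len(accepted)])
    return accepted


partial = accept_speculated([10, 11, 12], [10, 11, 99, 13])
full = accept_speculated([10, 11, 12], [10, 11, 12, 13])
```

Here `partial` keeps two speculated tokens and then the base model's correction, while `full` keeps all three proposals plus the extra token.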
Newly Onboarded Models
- Granite Vision (#359): Onboarded Granite Vision model.
- Granite MOE (#362): Onboarded Granite MOE model.
Bug Fixes
- Config dump issue when apps sdk not installed for qnn execution (#379): Fixed config dump issue when apps SDK is not installed for QNN execution.
- Removed device IDs from the test (#389): Removed device IDs from the test suite.
- Compilation issue of VLMs with mxint8(kv) precision (#336): Fixed compilation issue of VLMs with mxint8(kv) precision.
- Retrying downloads logic for stability (#370): Improved stability by updating retry logic for downloads.
- Mllama bug fix (#369): Fixed bug in Mllama.
- Update qeff wheel package name (#352): Updated the qeff wheel package name.
- Replicate biases too if they exist (e.g., Qwen) (#328): Fixed replication so that biases are also replicated when they exist (e.g., Qwen).
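
To illustrate why biases matter when KV heads are replicated, the following sketch repeats each head's block of a key/value projection and applies the same repetition to the bias vector. It is a pure-Python illustration under assumptions: the layout (rows grouped by head, `head_dim` rows per head) and the function name `replicate_kv_heads` are hypothetical, and this is not the code of the repository's replicate_kv_heads script.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def replicate_kv_heads(
    weight: Matrix,
    bias: Optional[List[float]],
    num_kv_heads: int,
    head_dim: int,
    repeat: int,
) -> Tuple[Matrix, Optional[List[float]]]:
    """Repeat every KV head `repeat` times, keeping weight and bias aligned.

    Assumes `weight` has num_kv_heads * head_dim rows, grouped by head, and
    that `bias` (if present) has one entry per row.
    """
    if len(weight) != num_kv_heads * head_dim:
        raise ValueError("weight rows must equal num_kv_heads * head_dim")
    if bias is not None and len(bias) != len(weight):
        raise ValueError("bias must have one entry per weight row")

    new_weight: Matrix = []
    new_bias: Optional[List[float]] = [] if bias is not None else None
    for head in range(num_kv_heads):
        start, stop = head * head_dim, (head + 1) * head_dim
        for _ in range(repeat):
            # Copy this head's rows, and its bias slice when a bias exists.
            new_weight.extend(list(row) for row in weight[start:stop])
            if bias is not None:
                new_bias.extend(bias[start:stop])
    return new_weight, new_bias


# Two KV heads with head_dim 2, each replicated twice.
w = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]]
b = [0.1, 0.2, 0.3, 0.4]
w2, b2 = replicate_kv_heads(w, b, num_kv_heads=2, head_dim=2, repeat=2)
```

Replicating only the weight would leave the bias with too few entries for the enlarged projection, which is the kind of mismatch a bias-aware replication avoids.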
Release V1.19.3
Added Features
- Vision Language Model
- Speech Sequence to Sequence Model
- Support for FP8 Execution
- Prompt-Lookup Decoding sample script.
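
As background for the Prompt-Lookup Decoding item, the sketch below shows the general idea of the technique: the most recent n-gram of the token sequence is searched for earlier in the same sequence, and the tokens that followed the earlier occurrence are proposed as draft tokens. It is a minimal pure-Python illustration, not the sample script shipped with the release; the function name `find_candidate_tokens` and its parameters are hypothetical.

```python
from typing import List


def find_candidate_tokens(
    tokens: List[int], max_ngram: int = 3, num_draft: int = 4
) -> List[int]:
    """Propose draft tokens by matching the sequence's trailing n-gram.

    Tries the longest n-gram first and prefers the most recent earlier match.
    Returns an empty list when no earlier occurrence is found.
    """
    for n in range(min(max_ngram, len(tokens) - 1), 0, -1):
        suffix = tokens[-n:]
        # Scan backwards over windows that end before the last token, so the
        # trailing n-gram never matches itself.
        for start in range(len(tokens) - n - 1, -1, -1):
            if tokens[start:start + n] == suffix:
                return tokens[start + n:start + n + num_draft]
    return []


# The trailing 3-gram [5, 6, 7] also appears at the start of the sequence,
# so the tokens that followed it there become the draft.
draft = find_candidate_tokens([5, 6, 7, 8, 9, 5, 6, 7], max_ngram=3, num_draft=2)
```

The draft tokens are only proposals; the model still has to verify them before they are accepted into the output.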