Releases · quic/efficient-transformers

New Features

SpD, multiprojection heads (#306): Implemented post-attention hidden size projections to speculate tokens ahead of the base model.
Adding compilation support for io_encrypt flag (#393): Added support for the Model-IP I/O encryption feature using qaic-exec (compile only).
Multi model end to end test pipelines (#337): Introduced comprehensive test pipelines for multi-model scenarios.
QNN Compilation path Support in QEFFBaseModel class (#374): Added support for QNN compilation path in the QEFFBaseModel class.
CLI support for multimodel (AutoModelForImageTexttoText) (#287): Enabled CLI support for multimodal models (AutoModelForImageTexttoText).
Support for Disaggregated serving (#365): Added support for disaggregated serving.
Support for GGUF model execution (without quantized weights) (#368): Enabled GGUF model execution without quantized weights.
SwiftKV backup Pending changes (#367): Implemented pending changes for SwiftKV backup.
Upgrading transformers version to latest 4.50 (#331): Upgraded transformers to version 4.50.
Added support for gradient checkpointing in the finetuning script (#338): Enabled gradient checkpointing in the finetuning script.
Passing device type in torch GradScaler (#345): Added functionality to pass device type in torch GradScaler.
Enabled FP8 models for replicate_kv_heads script (#353): Enabled FP8 models for the replicate_kv_heads script.
QNN Compilation command changes (#327): Updated QNN compilation commands.
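
The SpD entry (#306) describes extra projection heads that speculate tokens ahead of the base model. As a rough illustration of how speculated tokens are usually checked, here is a minimal sketch of greedy verification in generic speculative decoding. It is not taken from this repository: the function name `accept_speculated` and the token lists are hypothetical, and the sketch assumes the base model has already produced its own greedy choice for each speculated position.

```python
from typing import List


def accept_speculated(draft: List[int], target: List[int]) -> List[int]:
    """Greedy verification step of generic speculative decoding.

    draft  -- k tokens proposed ahead of the base model (hypothetical input,
              e.g. one token per extra projection head).
    target -- k + 1 tokens chosen by the base model at the same positions;
              target[i] is its choice given that draft[:i] was accepted.

    Returns the tokens to emit: the longest prefix of `draft` that the base
    model agrees with, followed by one token from the base model itself.
    """
    if len(target) != len(draft) + 1:
        raise ValueError("target must contain exactly one more token than draft")

    accepted: List[int] = []
    for i, token in enumerate(draft):
        if token != target[i]:
            break  # first disagreement: stop accepting speculated tokens
        accepted.append(token)

    # The base model always contributes one token: either the correction at
    # the first mismatch or a bonus token when every draft token matched.
    accepted.append(target[len(accepted)])
    return accepted


if __name__ == "__main__":
    print(accept_speculated([5, 7, 9], [5, 7, 2, 4]))
    print(accept_speculated([5, 7, 9], [5, 7, 9, 4]))
```

Because the base model supplies at least one token per step, the output matches what greedy decoding with the base model alone would produce; the speculation only changes how many tokens are emitted per verification pass.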

Newly Onboarded Models

Bug Fixes

Config dump issue when apps sdk not installed for qnn execution (#379): Fixed config dump issue when apps SDK is not installed for QNN execution.
Removed device IDs from the test (#389): Removed device IDs from the test suite.
Compilation issue of VLMs with mxint8(kv) precision (#336): Fixed compilation issue of VLMs with mxint8(kv) precision.
Retrying downloads logic for stability (#370): Improved stability by updating retry logic for downloads.
Mllama bug fix (#369): Fixed bug in Mllama.
Update qeff wheel package name (#352): Updated the qeff wheel package name.
Fix replicates biases too if they exist (e.g., Qwen) (#328): Biases are now replicated as well when they exist (e.g., Qwen).
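
To make the bias-replication fix (#328) concrete, the following is a minimal pure-Python sketch of what replicating KV heads together with their biases can look like. It is an assumption-laden illustration, not the repository's script: the function name `replicate_kv_heads`, the row-major layout (one block of `head_dim` rows per KV head), and the list-based matrices are all hypothetical.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def replicate_kv_heads(
    weight: Matrix,
    bias: Optional[List[float]],
    num_kv_heads: int,
    repeat: int,
) -> Tuple[Matrix, Optional[List[float]]]:
    """Repeat each KV head's projection rows (and bias entries) `repeat` times.

    Assumes `weight` has num_kv_heads * head_dim rows, grouped head by head,
    and that `bias`, when present, has one entry per weight row.
    """
    rows = len(weight)
    if num_kv_heads <= 0 or rows % num_kv_heads != 0:
        raise ValueError("weight rows must be divisible by num_kv_heads")
    if bias is not None and len(bias) != rows:
        raise ValueError("bias must have one entry per weight row")

    head_dim = rows // num_kv_heads
    new_weight: Matrix = []
    new_bias: Optional[List[float]] = [] if bias is not None else None

    for head in range(num_kv_heads):
        start, end = head * head_dim, (head + 1) * head_dim
        for _ in range(repeat):
            # Copy the rows so the replicas do not alias the originals.
            new_weight.extend(list(row) for row in weight[start:end])
            if bias is not None and new_bias is not None:
                # Replicating only the weights would leave the bias with the
                # wrong length; the bias slice must be repeated the same way.
                new_bias.extend(bias[start:end])

    return new_weight, new_bias


if __name__ == "__main__":
    w = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]  # 2 heads, head_dim 2
    b = [10.0, 20.0, 30.0, 40.0]
    print(replicate_kv_heads(w, b, num_kv_heads=2, repeat=2))
```

Models without projection biases pass `bias=None` and get `None` back, so the same helper covers both cases.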
