
Releases: quic/efficient-transformers

release/v1.20.0

08 May 18:15
4da283e

Release Summary

New Features

  • SpD (speculative decoding), multi-projection heads (#306): Implemented projections of the post-attention hidden states that speculate tokens ahead of the base model.
  • Compilation support for the io_encrypt flag (#393): Added support for the Model-IP I/O encryption feature using qaic-exec (compilation only).
  • Multi-model end-to-end test pipelines (#337): Introduced comprehensive test pipelines for multi-model scenarios.
  • QNN compilation path support in the QEFFBaseModel class (#374): Added support for the QNN compilation path in the QEFFBaseModel class.
  • CLI support for multimodal models (AutoModelForImageTextToText) (#287): Enabled CLI support for multimodal models.
  • Support for Disaggregated serving (#365): Added support for disaggregated serving.
  • Support for GGUF model execution (without quantized weights) (#368): Enabled GGUF model execution without quantized weights.
  • SwiftKV backup pending changes (#367): Implemented the pending changes for the SwiftKV backup.
  • Upgraded transformers to the latest version, 4.50 (#331): Upgraded the transformers dependency to version 4.50.
  • Added support for gradient checkpointing in the finetuning script (#338): Enabled gradient checkpointing in the finetuning script.
  • Passing the device type to torch GradScaler (#345): The device type is now passed to torch's GradScaler.
  • Enabled FP8 models for replicate_kv_heads script (#353): Enabled FP8 models for the replicate_kv_heads script.
  • QNN Compilation command changes (#327): Updated QNN compilation commands.

Newly Onboarded Models

  • Granite Vision (#359): Onboarded Granite Vision model.
  • Granite MOE (#362): Onboarded Granite MOE model.

Bug Fixes

  • Config dump issue when the Apps SDK is not installed for QNN execution (#379): Fixed the config dump issue that occurred when the Apps SDK is not installed for QNN execution.
  • Removed device IDs from the test (#389): Removed device IDs from the test suite.
  • Compilation issue of VLMs with mxint8(kv) precision (#336): Fixed compilation issue of VLMs with mxint8(kv) precision.
  • Download retry logic for stability (#370): Improved stability by updating the retry logic for downloads.
  • Mllama bug fix (#369): Fixed bug in Mllama.
  • Update qeff wheel package name (#352): Updated the qeff wheel package name.
  • Replicate biases too, if they exist (e.g., Qwen) (#328): Fixed replication so that biases are also replicated when the model has them (e.g., Qwen).

Release V1.19.3

28 Feb 16:40
2b17ebd

Added Features

  • Vision Language Model
  • Speech Sequence to Sequence Model
  • Support for FP8 Execution
  • Prompt-Lookup Decoding sample script
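Prompt-lookup decoding is a form of speculative decoding in which the draft tokens are copied from earlier in the input sequence rather than produced by a separate draft model. The function below is a minimal, library-independent sketch of the candidate-proposal step only, assuming token IDs are plain Python ints. The name `propose_prompt_lookup_candidates` and its parameters are hypothetical; this is not the API of the sample script shipped in this release.

```python
from typing import List


def propose_prompt_lookup_candidates(
    input_ids: List[int],
    max_ngram_size: int = 3,
    num_pred_tokens: int = 10,
) -> List[int]:
    """Propose draft tokens by matching the trailing n-gram earlier in input_ids.

    Hypothetical helper: tries the longest suffix n-gram first. If that n-gram
    also occurs earlier in the sequence, the tokens that followed the most
    recent earlier occurrence are returned as draft candidates.
    """
    n_total = len(input_ids)
    # Longest n-gram first; an n-gram needs at least one earlier position to match.
    for ngram_size in range(min(max_ngram_size, n_total - 1), 0, -1):
        suffix = input_ids[n_total - ngram_size:]
        # Scan earlier windows, most recent first, excluding the suffix itself.
        for start in range(n_total - ngram_size - 1, -1, -1):
            if input_ids[start:start + ngram_size] == suffix:
                follow_start = start + ngram_size
                candidates = input_ids[follow_start:follow_start + num_pred_tokens]
                if candidates:
                    return candidates
    # No match: fall back to ordinary one-token-at-a-time decoding.
    return []
```

For example, `propose_prompt_lookup_candidates([1, 2, 3, 4, 1, 2], num_pred_tokens=2)` finds the earlier occurrence of `[1, 2]` and returns `[3, 4]`. In a full decoder, the base model would then score these candidates in a single forward pass and keep the longest prefix that matches its own predictions.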