[release/v1.21.0]: Updating docs for 1.21.0#683
Conversation
Signed-off-by: Abukhoyer Shaik <abukhoye@qti.qualcomm.com>
| - **Mistral 3.1 (24B)**
|   - Executable via [`QEffAutoModelForCausalLM`](#QEffAutoModelForCausalLM)
|   - Production-ready deployment
| - **Olmo2**
|   - Executable via [`QEffAutoModelForCausalLM`](#QEffAutoModelForCausalLM)
|   - Full CausalLM support with optimizations
|   - Bug fixes included
We are missing a few more models: Olmo, Molmo, and Wav2Vec2. @abukhoy, please add these models as well.
| ## Key Features & Enhancements
|
| - **Framework Upgrades**: Transformers `4.55`, PyTorch `2.7.0+cpu`, Torchvision `0.22.0+cpu`
| - **Python Support**: Now requires Python `>=3.9`
Better to keep the Python support as 3.10.
| @@ -0,0 +1,84 @@
| # Diffuser Classes
Can we follow an approach similar to qeff_autoclasses.html? Add small examples and keep only the user-exposed classes. @quic-amitraj, can you suggest?
| ```
|
| (QEFFAutoModelForCTC)=
| ## `QEFFAutoModelForCTC`
Please add an example here.
|   - Enabled for GPT-OSS models, allowing for flexible deployment of large language models across different hardware configurations.
| * - `ONNX Sub-Functions <https://github.com/quic/efficient-transformers/pull/621>`_
|   - Feature enabling more efficient model compilation and execution on hardware.
| * - `Continuous Batching (VLMs) <https://github.com/quic/efficient-transformers/pull/610>`_
You can remove Continuous Batching (VLMs).
|   - Implements a blocked K/V cache layout so attention reads/processes the cache block-by-block, improving long-context decode performance.
| * - `Memory Profiling Tool <https://github.com/quic/efficient-transformers/pull/674>`_
|   - Adds scripts to profile memory during export/compile/infer (peak usage, cache footprint) for quicker diagnosis. Refer to the `sample scripts <https://github.com/quic/efficient-transformers/tree/main/scripts/memory_profiling>`_ for more details.
| * - `ONNX transform, memory & time optimizations <https://github.com/quic/efficient-transformers/pull/640>`_
Remove it; it's an optimization, not a standalone feature.
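(Aside, for readers of these notes: the blocked K/V cache idea quoted above can be sketched in a few lines of NumPy. Everything here is illustrative; the names and block sizes are made up, and this is not the actual PR #618 implementation.)

```python
import numpy as np

# Assumed semantics: the K/V cache is stored as fixed-size blocks and
# attention walks it block by block instead of as one contiguous buffer.
BLOCK, D = 4, 8  # tokens per cache block / head dim (made-up sizes)

def blocked_attention(q, k_blocks, v_blocks):
    """Attend one query over a cache stored as a list of (BLOCK, D) blocks."""
    scores = np.concatenate([q @ kb.T for kb in k_blocks])  # block-by-block reads
    w = np.exp(scores - scores.max())
    w /= w.sum()
    out, off = np.zeros(D), 0
    for vb in v_blocks:
        out += w[off:off + len(vb)] @ vb  # accumulate per block
        off += len(vb)
    return out

rng = np.random.default_rng(0)
k = rng.standard_normal((3 * BLOCK, D))
v = rng.standard_normal((3 * BLOCK, D))
q = rng.standard_normal(D)
k_blocks = [k[i:i + BLOCK] for i in range(0, len(k), BLOCK)]
v_blocks = [v[i:i + BLOCK] for i in range(0, len(v), BLOCK)]

# Reference: plain softmax attention over the contiguous cache.
s = q @ k.T
ref = (np.exp(s - s.max()) / np.exp(s - s.max()).sum()) @ v
assert np.allclose(blocked_attention(q, k_blocks, v_blocks), ref)
```

The point of the layout is that the decode loop never needs the cache as one contiguous tensor, which is what enables block-granular reads at long context lengths.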
|   - Extended to Vision Language Models with multi-image handling capabilities, optimizing throughput and latency by dynamically batching requests with varying image counts. Refer to the `sample script <https://github.com/quic/efficient-transformers/blob/main/examples/image_text_to_text/models/granite_vision/continuous_batching.py>`_ for more details.
| * - `BlockedKV attention in CausalLM <https://github.com/quic/efficient-transformers/pull/618>`_
|   - Implements a blocked K/V cache layout so attention reads/processes the cache block-by-block, improving long-context decode performance.
| * - `ONNX transform, memory & time optimizations <https://github.com/quic/efficient-transformers/pull/640>`_
|   - Adds periodic memory cleanup (e.g., to FP16ClipTransform / SplitTensorsTransform) during large-tensor processing, and avoids redundant external data loading when it is already present.
| * - Onboarding Guide
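(For intuition on the "periodic memory cleanup" line quoted above, a hypothetical sketch; the function, chunking, and cleanup interval are made up for illustration and are not the PR #640 code.)

```python
import gc

def transform_in_chunks(chunks, cleanup_every=8):
    """Process many tensors chunk by chunk, triggering cleanup periodically
    so peak memory stays bounded instead of intermediates piling up."""
    results = []
    for i, chunk in enumerate(chunks, 1):
        results.append([x / 2 for x in chunk])  # stand-in for a real transform
        if i % cleanup_every == 0:
            gc.collect()  # free dropped intermediates between chunks
    return results

assert transform_in_chunks([[2.0, 4.0], [6.0]], cleanup_every=1) == [[1.0, 2.0], [3.0]]
```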
| | Architecture | Model Family | Representative Models | [vLLM Support](https://quic.github.io/cloud-ai-sdk-pages/latest/Getting-Started/Installation/vLLM/vLLM/index.html) |
| |-------------------------|--------------------|-----------------------------------------------------------------------------|--------------|
| | **FalconForCausalLM**   | Falcon             | [tiiuae/falcon-40b](https://huggingface.co/tiiuae/falcon-40b)                 | ✔️ |
| | **MolmoForCausalLM**    | Molmo①             | [allenai/Molmo-7B-D-0924](https://huggingface.co/allenai/Molmo-7B-D-0924)     | ✕ |
Check with @quic-vargupt to confirm whether these models are supported on vLLM, and update the table accordingly.
Closing this PR as the changes were added as part of #718.

This PR was created to update the documentation for the 1.21.0 release.
Note: First Draft