
[release/v1.21.0]: Updating docs for 1.21.0#683

Closed
abukhoy wants to merge 15 commits into quic:main from abukhoy:release-1.21-docs


@abukhoy (Contributor) commented on Dec 22, 2025

This PR updates the documentation for the 1.21.0 release.

Note: This is a first draft.

Signed-off-by: Abukhoyer Shaik <abukhoye@qti.qualcomm.com>
Comment thread on `docs/source/release_docs.md` (outdated)

- **Mistral 3.1 (24B)**
- Executable via [`QEffAutoModelForCausalLM`](#QEffAutoModelForCausalLM)
- Production-ready deployment
Review comment (Contributor): Remove.

Comment thread on `docs/source/release_docs.md` (outdated)
- **Olmo2**
- Executable via [`QEffAutoModelForCausalLM`](#QEffAutoModelForCausalLM)
- Full CausalLM support with optimizations
- Bug fixes included
Review comment (Contributor): Remove.

Two comment threads on `docs/source/validate.md` (one outdated)
@quic-rishinr (Contributor):

We are missing a few more models: Olmo, Molmo, and Wav2Vec2. @abukhoy, please add these models as well.

@abukhoy marked this pull request as draft on December 23, 2025 05:31
Comment thread on `docs/source/release_docs.md` (outdated)
## Key Features & Enhancements

- **Framework Upgrades**: Transformers `4.55`, PyTorch `2.7.0+cpu`, Torchvision `0.22.0+cpu`
- **Python Support**: Now requires Python `>=3.9`
Review comment (Contributor): It would be better to keep Python support at 3.10.

@@ -0,0 +1,84 @@
# Diffuser Classes
Review comment (Contributor): Can we follow an approach similar to qeff_autoclasses.html, adding small examples and keeping only the user-exposed classes? @quic-amitraj, what do you suggest?

```

(QEFFAutoModelForCTC)=
## `QEFFAutoModelForCTC`
Review comment (Contributor): Please add an example here.

Comment thread on `docs/source/supported_features.rst` (outdated)
- Enabled for GPT-OSS models, allowing for flexible deployment of large language models across different hardware configurations.
* - `ONNX Sub-Functions <https://github.com/quic/efficient-transformers/pull/621>`_
- Feature enabling more efficient model compilation and execution on hardware.
* - `Continuous Batching (VLMs) <https://github.com/quic/efficient-transformers/pull/610>`_
Review comment (Contributor): You can remove Continuous Batching (VLMs).

Comment thread on `docs/source/supported_features.rst` (outdated)
- Implements a blocked K/V cache layout so attention reads/processes the cache block-by-block, improving long-context decode performance.
* - `Memory Profiling Tool <https://github.com/quic/efficient-transformers/pull/674>`_
- Adds scripts to profile memory during export/compile/infer (peak usage, cache footprint) for quicker diagnosis. Refer to the `sample scripts <https://github.com/quic/efficient-transformers/tree/main/scripts/memory_profiling>`_ for more **details**.
* - `ONNX transform, memory & time optimizations <https://github.com/quic/efficient-transformers/pull/640>`_
Review comment (Contributor): Remove it; it's an optimization, not a standalone feature.

Comment thread on `docs/source/supported_features.rst` (outdated)
- Adds scripts to profile memory during export/compile/infer (peak usage, cache footprint) for quicker diagnosis. Refer to the `sample scripts <https://github.com/quic/efficient-transformers/tree/main/scripts/memory_profiling>`_ for more **details**.
* - `ONNX transform, memory & time optimizations <https://github.com/quic/efficient-transformers/pull/640>`_
- Adds periodic memory cleanup (e.g., to FP16ClipTransform / SplitTensorsTransform) during large-tensor processing, and avoids redundant external-data loading when the data is already present.
* - Onboarding Guide
Review comment (Contributor): Remove.

Comment thread on `docs/source/supported_features.rst` (outdated)
- Extended to Vision Language Models with multi-image handling capabilities, optimizing throughput and latency by dynamically batching requests with varying image counts. Refer to the `sample script <https://github.com/quic/efficient-transformers/blob/main/examples/image_text_to_text/models/granite_vision/continuous_batching.py>`_ for more **details**.
* - `BlockedKV attention in CausalLM <https://github.com/quic/efficient-transformers/pull/618>`_
- Implements a blocked K/V cache layout so attention reads/processes the cache block-by-block, improving long-context decode performance.
* - `Memory Profiling Tool <https://github.com/quic/efficient-transformers/pull/674>`_
Review comment (Contributor): Remove.

Comment thread on `docs/source/validate.md`
| Architecture | Model Family | Representative Models | [vLLM Support](https://quic.github.io/cloud-ai-sdk-pages/latest/Getting-Started/Installation/vLLM/vLLM/index.html) |
|-------------------------|--------------------|--------------------------------------------------------------------------------------|--------------|
| **FalconForCausalLM** | Falcon** | [tiiuae/falcon-40b](https://huggingface.co/tiiuae/falcon-40b) | ✔️ |
| **MolmoForCausalLM** | Molmo① | [allenai/Molmo-7B-D-0924](https://huggingface.co/allenai/Molmo-7B-D-0924) | ✕ |
Review comment (Contributor): [Screenshot] Some stray characters appear in these entries. Please check all the newly added models.

Comment thread on `docs/source/validate.md` (same table rows as the previous thread)
Review comment (Contributor): Check with @quic-vargupt to confirm whether these models are supported on vLLM, and update the table accordingly.

abukhoy and others added 3 commits December 26, 2025 06:17
@quic-rishinr (Contributor):

Closing this PR, as the changes were added as part of #718.
