[release/v1.21.0]: Updating docs for 1.21.0#683
Conversation
Signed-off-by: Abukhoyer Shaik <abukhoye@qti.qualcomm.com>
| - **Mistral 3.1 (24B)**
|   - Executable via [`QEffAutoModelForCausalLM`](#QEffAutoModelForCausalLM)
|   - Production-ready deployment
| - **Olmo2**
|   - Executable via [`QEffAutoModelForCausalLM`](#QEffAutoModelForCausalLM)
|   - Full CausalLM support with optimizations
|   - Bug fixes included
We are missing a few more models: Olmo, Molmo, and Wav2Vec2. @abukhoy, please add these models as well.
| ## Key Features & Enhancements
|
| - **Framework Upgrades**: Transformers `4.55`, PyTorch `2.7.0+cpu`, Torchvision `0.22.0+cpu`
| - **Python Support**: Now requires Python `>=3.9`
Better to keep the Python support as 3.10.
| @@ -0,0 +1,84 @@
| # Diffuser Classes
Can we follow an approach similar to qeff_autoclasses.html? Add small examples and keep only the user-exposed classes. @quic-amitraj, can you suggest?
| ```
|
| (QEFFAutoModelForCTC)=
| ## `QEFFAutoModelForCTC`
Please add an example here.
|   - Enabled for GPT-OSS models, allowing for flexible deployment of large language models across different hardware configurations.
| * - `ONNX Sub-Functions <https://github.com/quic/efficient-transformers/pull/621>`_
|   - Feature enabling more efficient model compilation and execution on hardware.
| * - `Continuous Batching (VLMs) <https://github.com/quic/efficient-transformers/pull/610>`_
You can remove Continuous Batching (VLMs).
|   - Implements a blocked K/V cache layout so attention reads/processes the cache block-by-block, improving long-context decode performance.
| * - `Memory Profiling Tool <https://github.com/quic/efficient-transformers/pull/674>`_
|   - Adds scripts to profile memory during export/compile/infer (peak usage, cache footprint) for quicker diagnosis. Refer to the `sample scripts <https://github.com/quic/efficient-transformers/tree/main/scripts/memory_profiling>`_ for more details.
| * - `ONNX transform, memory & time optimizations <https://github.com/quic/efficient-transformers/pull/640>`_
Remove it; it's an optimization, not a standalone feature.
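(Aside, for readers of these notes: the blocked K/V cache idea quoted above can be sketched in a few lines of NumPy. Everything here is illustrative; the names and block sizes are made up, and this is not the actual PR #618 implementation.)

```python
import numpy as np

# Assumed semantics: the K/V cache is stored as fixed-size blocks and
# attention walks it block by block instead of as one contiguous buffer.
BLOCK, D = 4, 8  # tokens per cache block / head dim (made-up sizes)

def blocked_attention(q, k_blocks, v_blocks):
    """Attend one query over a cache stored as a list of (BLOCK, D) blocks."""
    scores = np.concatenate([q @ kb.T for kb in k_blocks])  # block-by-block reads
    w = np.exp(scores - scores.max())
    w /= w.sum()
    out, off = np.zeros(D), 0
    for vb in v_blocks:
        out += w[off:off + len(vb)] @ vb  # accumulate per block
        off += len(vb)
    return out

rng = np.random.default_rng(0)
k = rng.standard_normal((3 * BLOCK, D))
v = rng.standard_normal((3 * BLOCK, D))
q = rng.standard_normal(D)
k_blocks = [k[i:i + BLOCK] for i in range(0, len(k), BLOCK)]
v_blocks = [v[i:i + BLOCK] for i in range(0, len(v), BLOCK)]

# Reference: plain softmax attention over the contiguous cache.
s = q @ k.T
ref = (np.exp(s - s.max()) / np.exp(s - s.max()).sum()) @ v
assert np.allclose(blocked_attention(q, k_blocks, v_blocks), ref)
```

The point of the layout is that the decode loop never needs the cache as one contiguous tensor, which is what enables block-granular reads at long context lengths.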
|   - Extended to Vision Language Models with multi-image handling capabilities, optimizing throughput and latency by dynamically batching requests with varying image counts. Refer to the `sample script <https://github.com/quic/efficient-transformers/blob/main/examples/image_text_to_text/models/granite_vision/continuous_batching.py>`_ for more details.
| * - `BlockedKV attention in CausalLM <https://github.com/quic/efficient-transformers/pull/618>`_
|   - Implements a blocked K/V cache layout so attention reads/processes the cache block-by-block, improving long-context decode performance.
| * - `ONNX transform, memory & time optimizations <https://github.com/quic/efficient-transformers/pull/640>`_
|   - Adds periodic memory cleanup (e.g., to FP16ClipTransform / SplitTensorsTransform) during large-tensor processing, and avoids redundant external data loading when it is already present.
| * - Onboarding Guide
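(For intuition on the "periodic memory cleanup" line quoted above, a hypothetical sketch; the function, chunking, and cleanup interval are made up for illustration and are not the PR #640 code.)

```python
import gc

def transform_in_chunks(chunks, cleanup_every=8):
    """Process many tensors chunk by chunk, triggering cleanup periodically
    so peak memory stays bounded instead of intermediates piling up."""
    results = []
    for i, chunk in enumerate(chunks, 1):
        results.append([x / 2 for x in chunk])  # stand-in for a real transform
        if i % cleanup_every == 0:
            gc.collect()  # free dropped intermediates between chunks
    return results

assert transform_in_chunks([[2.0, 4.0], [6.0]], cleanup_every=1) == [[1.0, 2.0], [3.0]]
```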
| | Architecture | Model Family | Representative Models | [vLLM Support](https://quic.github.io/cloud-ai-sdk-pages/latest/Getting-Started/Installation/vLLM/vLLM/index.html) |
| |-------------------------|--------------------|-----------------------------------------------------------------------------|--------------|
| | **FalconForCausalLM**   | Falcon             | [tiiuae/falcon-40b](https://huggingface.co/tiiuae/falcon-40b)                 | ✔️ |
| | **MolmoForCausalLM**    | Molmo①             | [allenai/Molmo-7B-D-0924](https://huggingface.co/allenai/Molmo-7B-D-0924)     | ✕ |
Check with @quic-vargupt to confirm whether these models are supported on vLLM, and update the table accordingly.
Closing this PR as the changes were added as part of #718.

This PR was created to update the documentation for the 1.21.0 release.
Note: First Draft