Vulkan Optimizations and Fixes (#8959) by Nexesenex · Pull Request #299 · Nexesenex/croco.cpp

Nexesenex · 2024-08-14T19:24:15Z

Optimize Vulkan REPEAT performance
Use Vulkan GLSL fused multiply-add instruction where possible
Add GGML_VULKAN_PERF option to output performance data per operator
Rework and fix Vulkan descriptor set and descriptor pool handling
Fix float32 concat f16 shader validation error
Add Vulkan GROUP_NORM eps parameter
Fix validation error with transfer queue memory barrier flags
Remove trailing whitespaces
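
As a rough illustration of the fused multiply-add item above: an FMA evaluates `a * b + c` as one operation with a single rounding step, which is what GLSL's `fma()` built-in expresses in a shader. The sketch below shows the same idea on the CPU with `std::fma`. The helper name `fused_madd` and its vector-based signature are hypothetical and are not taken from this PR's shaders.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the PR): computes y[i] = a[i] * b[i] + c[i].
// std::fma performs the multiply and add with a single rounding step,
// the CPU-side analogue of GLSL's fma() built-in.
std::vector<float> fused_madd(const std::vector<float>& a,
                              const std::vector<float>& b,
                              const std::vector<float>& c) {
    // Assumption for this sketch: all three inputs have the same length.
    assert(a.size() == b.size() && b.size() == c.size());

    std::vector<float> y(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        y[i] = std::fma(a[i], b[i], c[i]);
    }
    return y;
}
```

Whether a given `a * b + c` expression in a shader actually benefits from an explicit `fma()` depends on the compiler and driver, so treat this only as a sketch of the pattern.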

Make sure no interleaved quants are being used for token embeddings also with `--pure` and/or `--custom-q`
Simplify

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

Nexesenex merged commit e71ce8f into Nexesenex:spacestream Aug 14, 2024

github-actions bot added the ggml and Vulkan labels Aug 14, 2024

Nexesenex merged 1 commit into Nexesenex:spacestream from ggml-org:master

Nexesenex commented Aug 14, 2024

2 participants
