Popular repositories
- Triton-XDNA (Public, forked from amd/Triton-XDNA): Triton-XDNA with native Windows support; NPU kernel compilation and LLM inference on AMD Ryzen AI (Strix Halo). Python, 3 stars.
- qwen-asr-rocm (Public): Qwen3-ASR-0.6B speech-to-text service with vLLM, Flash Attention 2 (AMD Triton), and a Wyoming STT proxy for Home Assistant. Python, 2 stars.
- FastFlowLM-Docker (Public): Wyoming Protocol Docker container for FastFlowLM on AMD Ryzen AI NPUs; Whisper ASR and LLM conversation. Shell, 1 star.
- bitsandbytes (Public, forked from bitsandbytes-foundation/bitsandbytes): Accessible large language models via k-bit quantization for PyTorch. Python.
- flash-attention (Public, forked from Dao-AILab/flash-attention): Fast and memory-efficient exact attention. Python.
- llm_assistant (Public): Home Assistant custom integration; OpenAI-compatible LLM conversation agent with MCP server support. Python.