Releases: A3S-Lab/Power
v0.4.2
Performance
- Release binary size reduced by ~48% (29MB → 15MB default, 5.2MB → 3.3MB picolm) via `opt-level = "z"` (optimize for size) combined with fat LTO, `codegen-units = 1`, `strip = true`, and `panic = "abort"`
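The settings above live in the Cargo release profile. A sketch of the equivalent `Cargo.toml` fragment, assembled from the values listed in these notes (the exact profile layout in the repo is assumed):

```toml
[profile.release]
opt-level = "z"     # optimize for size rather than speed
lto = "fat"         # whole-program link-time optimization
codegen-units = 1   # single codegen unit for maximum optimization
strip = true        # strip symbols from the binary
panic = "abort"     # drop unwinding machinery
```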
Bug Fixes
- Fix all clippy warnings (CI green):
  - Replace needless index loops with iterators in picolm, attention, norm
  - Fix excessive float precision in GELU constant
  - Add `Default` impl for `JsonGrammarSampler`
  - Use `?` operator in grammar sampler
- Fix `cargo fmt` formatting in main.rs
Features (from v0.4.1)
- Full CLI with `serve`, `models` (list/pull/rm/show), `chat`, `ps` subcommands
- `PowerConfig::load_from(path)` for `--config` flag support
v0.4.1
Features
- feat(cli): Full CLI with `serve`, `models` (list/pull/rm/show), `chat`, `ps` subcommands
- Backward-compatible: no subcommand = `serve` (same as before)
Performance
- perf: Release binary size reduced ~48% (29MB → 15MB default, 5.2MB → 3.3MB picolm)
  - Switch `opt-level` from `3` to `"z"` (optimize for size)
  - Combined with existing fat LTO, `codegen-units = 1`, `strip = true`, `panic = "abort"`
- perf(picolm): Pre-dequantized layer norms + gate/up dual matvec fusion (+12.4% decode speed)
Other
- docs: Updated README with picolm optimization status and v0.4.0 features
- Added `PowerConfig::load_from(path)` for `--config` flag support
- Made `reqwest` non-optional (needed by CLI HTTP client)
v0.4.0
Features
- Batch prefill — process prompt tokens in batch for faster time-to-first-token
- Grammar-constrained structured output — JSON schema enforcement during generation
- Tool/function calling — OpenAI-compatible tool_calls with auto-dispatch
- Speculative decoding — prompt-lookup draft for faster decode throughput
- AVX2 Q4_K/Q6_K kernels — SIMD-accelerated quantized dot products on x86_64
- Repeat/frequency/presence penalty — configurable repetition control
- Startup self-test — validates norm, f32_dot, q8_0_dot on load
- TEE hardening — AVX2 SIMD vec_dot kernels for secure enclaves
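The speculative-decoding entry above uses prompt-lookup drafting: instead of a separate draft model, the decoder searches the existing context for an earlier occurrence of the current token suffix and proposes the tokens that followed it. A minimal sketch of that idea (function name and signature are illustrative, not the crate's actual API):

```rust
/// Propose draft tokens by finding the most recent earlier occurrence of the
/// current `ngram`-token suffix in the context and copying what followed it.
/// Returns an empty draft when no match exists (caller falls back to normal decode).
fn prompt_lookup_draft(tokens: &[u32], ngram: usize, max_draft: usize) -> Vec<u32> {
    if tokens.len() <= ngram {
        return Vec::new();
    }
    let suffix = &tokens[tokens.len() - ngram..];
    // Scan right-to-left so the most recent match wins.
    for start in (0..tokens.len() - ngram).rev() {
        if &tokens[start..start + ngram] == suffix {
            let draft_start = start + ngram;
            let draft_end = (draft_start + max_draft).min(tokens.len());
            return tokens[draft_start..draft_end].to_vec();
        }
    }
    Vec::new()
}
```

At verification time the target model scores the whole draft in one batch and keeps the longest accepted prefix, which is where the decode-throughput win comes from.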
Performance
- NEON SIMD for attention softmax, RMSNorm, SiLU/add_residual in FFN
- Fused f16 KV attention — dot(q, k_f16) and accumulate(v_f16) without intermediate f32 buffer
- Zero-alloc sampler — pre-allocated probs/indices buffers, no heap allocation per token
- Zero-alloc hot path — pre-allocated ForwardBuffers for all decode operations
- Q4_K NEON kernel rewrite — register-based nibble extraction via NEON intrinsics
- Decode profiling instrumentation — per-stage timing breakdown (embed/attn/ffn/logit/sample)
- ~14 tok/s on Qwen 2.5 0.5B Q4_K_M (Apple Silicon, single-threaded decode)
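The zero-alloc sampler entry above boils down to a buffer-reuse pattern: allocate the probability scratch space once at construction, then overwrite it on every decode step. This is a sketch of the pattern only (type and method names are assumed, not the crate's actual API):

```rust
/// Greedy sampler that reuses a pre-allocated probability buffer, so the
/// per-token hot path performs no heap allocation after construction.
struct ScratchSampler {
    probs: Vec<f32>, // allocated once, overwritten every step
}

impl ScratchSampler {
    fn new(vocab_size: usize) -> Self {
        Self { probs: vec![0.0; vocab_size] }
    }

    /// Softmax into the scratch buffer, then argmax. No allocation per call.
    fn sample(&mut self, logits: &[f32]) -> usize {
        // Numerically stabilized softmax written into the reused buffer.
        let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
        let mut sum = 0.0f32;
        for (p, &l) in self.probs.iter_mut().zip(logits) {
            *p = (l - max).exp();
            sum += *p;
        }
        for p in self.probs.iter_mut() {
            *p /= sum;
        }
        // Greedy argmax over the normalized probabilities.
        let mut best = 0;
        for i in 1..logits.len() {
            if self.probs[i] > self.probs[best] {
                best = i;
            }
        }
        best
    }
}
```

A real sampler would also apply temperature, top-k/top-p, and the repetition penalties listed above, but the allocation discipline is the same: all scratch state lives in the struct, not on the per-token path.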
Fixes
- Resolve clippy warnings for picolm feature build
- Fix missing `#[cfg(feature = "picolm")]` on repeat penalty tests
v0.3.0
- chore: bump version to 0.3.0
- feat(picolm): production-ready pure Rust inference with true layer-streaming
- fix: correct Q4_K/Q5_K/Q6_K dequantization, add Qwen GPT-style tokenizer and attention bias
- test(picolm): add end-to-end integration tests with synthetic GGUF
- feat(picolm): implement pure-Rust LLM inference backend
- docs: add picolm layer-streaming technical deep-dive to README
- docs: rewrite README to lead with irreplaceable value proposition
- docs: add CI/CD badges and section to README, fix test race condition
v0.2.0
- fix: add #[serial] to test_power_home_default to prevent env var race
- fix: use rustls-tls for reqwest, remove OpenSSL dependency
- fix: allow deprecated Nonce::from_slice (generic-array transitive dep)
- fix: clippy warnings for hw-verify feature (imports, type complexity, inspect_err)
- fix(ci): drop --all-features (llamacpp needs C++ toolchain), use hf+hw-verify
- fix(ci): scope fmt to power crate, fix stub lib.rs
- fix(ci): setup-workspace stub, release profile optimization, fix homebrew heredoc
- fix(ci): use --lib for clippy, add cross-build matrix
- feat: v0.2.0 — picolm backend, HF pull, EPC routing, hw-verify, CI/CD
- docs: add Discord community link
- feat(server): graceful shutdown with SIGTERM support and audit log flush
- feat(verify): add client-side attestation verification SDK and CLI
- feat(backend): add embedding model support via HuggingFace format
- feat(api): add stream_options.include_usage and num_parallel passthrough
- feat(power): close gap analysis small items
- fix(api,tee,router): fix active_requests leak, lazy usage counters, rate limiter, redact all occurrences
- fix(auth,config,api): constant-time auth, config validation, lazy usage counters
- fix(api): return proper HTTP status codes on all error paths
- docs(readme): add model_signing_key to config reference, update test count to 671+
- fix: correct keep_alive=0 on cache hit, add active_requests getter, fix serial test flake
- feat(embeddings): add keep_alive to EmbeddingRequest and wire unload logic
- fix(autoload,chat,completions,llamacpp): unload after inference for keep_alive=0, wire config defaults
- feat(api): add keep_alive field and fix autoload integrity, SSE order, and reaper format lookup
- fix: audit streaming paths, real token counts, SSE order, and measurement validation
- feat: model management API, attestation health field, and privacy/TEE wiring
- feat(tee): wire in_memory_decrypt and suppress_token_metrics
- docs: update README for P2/P3 features
- feat(tee): implement P2/P3 security features
- feat: add Box + Power integration example with real model inference
- docs: use explicit remote URL in brew tap instructions
- feat(config): add HCL configuration file format support
- chore: bump a3s-updater to 0.2, add path dependency
- ci: add cargo-publish job to release workflow
- fix: clippy warnings (assertions_on_constants, missing Default, doc comment)
- style: cargo fmt cli/mod.rs
v0.1.5
Full Changelog: v0.1.4...v0.1.5
v0.1.4
Full Changelog: v0.1.2...v0.1.4
v0.1.2
What's New
- Ollama Registry Integration: Pull any model from `registry.ollama.ai` by name — primary resolution source with automatic template, system prompt, parameters, and license extraction
- 3-tier Model Resolution: Ollama Registry → built-in known_models.json → HuggingFace API fallback
- Vision Model Support: Multimodal projector auto-downloaded from Ollama registry for vision models (e.g. llava)
- 878 unit tests passing
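The 3-tier resolution above amounts to a chain of fallbacks where the first source to answer wins. A sketch of that control flow, with every function body a hypothetical stand-in (the real tiers are network and file lookups):

```rust
// Hypothetical stand-ins for the three resolution tiers described above.
fn from_ollama_registry(name: &str) -> Option<String> {
    (name == "llava").then(|| format!("registry.ollama.ai/library/{name}"))
}

fn from_known_models(name: &str) -> Option<String> {
    (name == "qwen2.5").then(|| format!("known_models.json:{name}"))
}

fn from_huggingface(_name: &str) -> Option<String> {
    None // last-resort API fallback, stubbed out here
}

/// First tier that resolves wins; later tiers are only consulted on a miss.
fn resolve_model(name: &str) -> Option<String> {
    from_ollama_registry(name)
        .or_else(|| from_known_models(name))
        .or_else(|| from_huggingface(name))
}
```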
Install
```shell
# Cargo
cargo install a3s-power

# macOS (Apple Silicon)
curl -LO https://github.com/A3S-Lab/Power/releases/download/v0.1.2/a3s-power-v0.1.2-aarch64-apple-darwin.tar.gz
tar xzf a3s-power-v0.1.2-aarch64-apple-darwin.tar.gz
sudo mv a3s-power /usr/local/bin/
```

v0.1.1 - 861 Unit Tests with 90%+ Coverage
What's Changed
Quality & Testing
- 861 unit tests (up from 558) with 90.11% region coverage
- 91.47% function coverage across 59 source files
- Comprehensive test coverage for all modules: API handlers, CLI commands, model management, backend, server
Coverage Highlights
- 14 modules at 100% coverage
- All API handlers tested (native + OpenAI)
- All CLI commands tested
- Model storage, registry, manifest fully covered
- Backend (llama.cpp, chat templates, tool parsing, JSON schema) fully covered
Publishing
- Published to crates.io
- Homebrew formula updated: `brew install a3s-lab/tap/a3s-power`
Install
```shell
# From crates.io
cargo install a3s-power

# From Homebrew (macOS)
brew tap a3s-lab/tap
brew install a3s-power
```

Full Changelog: v0.1.0...v0.1.1
a3s-power v0.1.0
Local model management and serving with OpenAI-compatible API
Key Features:
- Ollama-compatible CLI (run, pull, list, show, delete, serve, create, push, cp)
- OpenAI-compatible HTTP API
- Vision and tool calling support
- Blob management and model push
- Health endpoint and model auto-loading
- 291 tests passing