Releases: A3S-Lab/Power

v0.4.2

23 Feb 06:46

Performance

  • Release binary size reduced by ~48% (29MB → 15MB default, 5.2MB → 3.3MB picolm)
    • opt-level = "z" (optimize for size) combined with fat LTO, codegen-units = 1, strip, and panic = "abort"
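
The settings above map onto a Cargo release profile along these lines (a sketch reconstructed from the bullet; the repository's actual Cargo.toml may differ):

```toml
[profile.release]
opt-level = "z"    # optimize for size rather than speed
lto = "fat"        # whole-program link-time optimization
codegen-units = 1  # single codegen unit, better cross-module inlining
strip = true       # strip symbols from the final binary
panic = "abort"    # drop unwinding machinery
```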

Bug Fixes

  • Fix all clippy warnings (CI green):
    • Replace needless index loops with iterators in picolm, attention, norm
    • Fix excessive float precision in GELU constant
    • Add Default impl for JsonGrammarSampler
    • Use ? operator in grammar sampler
  • Fix cargo fmt formatting in main.rs
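
The "needless index loop" class of fix looks roughly like this (a hypothetical before/after illustrating the clippy lint, not Power's actual code):

```rust
// Hypothetical illustration of the `needless_range_loop` fix: an
// index-based loop rewritten with iterators. Same result, no manual
// indexing or bounds checks in the loop body.
fn scale_indexed(xs: &mut [f32], s: f32) {
    for i in 0..xs.len() {
        xs[i] *= s; // clippy: needless_range_loop
    }
}

fn scale_iter(xs: &mut [f32], s: f32) {
    for x in xs.iter_mut() {
        *x *= s; // idiomatic: iterate the slice directly
    }
}

fn main() {
    let mut a = [1.0f32, 2.0, 3.0];
    let mut b = [1.0f32, 2.0, 3.0];
    scale_indexed(&mut a, 2.0);
    scale_iter(&mut b, 2.0);
    assert_eq!(a, b);
}
```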

Features (from v0.4.1)

  • Full CLI with serve, models (list/pull/rm/show), chat, ps subcommands
  • PowerConfig::load_from(path) for --config flag support

v0.4.1

23 Feb 06:21

Features

  • feat(cli): Full CLI with serve, models (list/pull/rm/show), chat, ps subcommands
  • Backward-compatible: no subcommand = serve (same as before)
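
The "no subcommand = serve" dispatch can be sketched with plain std argument handling (illustrative only; the real CLI presumably uses a full argument parser, and these subcommand names are taken from the bullet above):

```rust
// Sketch of backward-compatible subcommand dispatch: a bare invocation
// with no subcommand falls through to `serve`, preserving old behavior.
fn dispatch(args: &[&str]) -> &'static str {
    match args.first().copied() {
        None => "serve", // bare `power` behaves as before
        Some("serve") => "serve",
        Some("models") => "models",
        Some("chat") => "chat",
        Some("ps") => "ps",
        Some(_) => "unknown",
    }
}

fn main() {
    assert_eq!(dispatch(&[]), "serve"); // backward compatible
    assert_eq!(dispatch(&["chat"]), "chat");
}
```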

Performance

  • perf: Release binary size reduced by ~48% (29MB → 15MB default, 5.2MB → 3.3MB picolm)
    • Switch opt-level from 3 to "z" (optimize for size)
    • Combined with existing LTO fat, codegen-units = 1, strip = true, panic = "abort"
  • perf(picolm): Pre-dequantized layer norms + gate/up dual matvec fusion (+12.4% decode speed)
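
The gate/up fusion exploits the fact that in a SwiGLU-style FFN both projections read the same input vector; computing them in one pass reuses every input load for two dot products. A minimal sketch (illustrative, not Power's actual kernel):

```rust
// Sketch of gate/up dual-matvec fusion: both weight matrices multiply
// the same input `x`, so one fused pass computes both projections and
// reuses each x[j] load twice, improving cache locality.
fn dual_matvec(gate_w: &[Vec<f32>], up_w: &[Vec<f32>], x: &[f32]) -> (Vec<f32>, Vec<f32>) {
    let mut gate = Vec::with_capacity(gate_w.len());
    let mut up = Vec::with_capacity(up_w.len());
    for (gw, uw) in gate_w.iter().zip(up_w.iter()) {
        let (mut g, mut u) = (0.0f32, 0.0f32);
        for ((gj, uj), xj) in gw.iter().zip(uw.iter()).zip(x.iter()) {
            g += gj * xj; // gate projection
            u += uj * xj; // up projection reuses the same x load
        }
        gate.push(g);
        up.push(u);
    }
    (gate, up)
}

fn main() {
    let gw = vec![vec![1.0f32, 0.0], vec![0.0, 1.0]];
    let uw = vec![vec![2.0f32, 0.0], vec![0.0, 2.0]];
    let (g, u) = dual_matvec(&gw, &uw, &[3.0, 4.0]);
    assert_eq!(g, vec![3.0f32, 4.0]);
    assert_eq!(u, vec![6.0f32, 8.0]);
}
```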

Other

  • docs: Updated README with picolm optimization status and v0.4.0 features
  • Added PowerConfig::load_from(path) for --config flag support
  • Made reqwest non-optional (needed by CLI HTTP client)

v0.4.0

22 Feb 07:45

Features

  • Batch prefill — process prompt tokens in batch for faster time-to-first-token
  • Grammar-constrained structured output — JSON schema enforcement during generation
  • Tool/function calling — OpenAI-compatible tool_calls with auto-dispatch
  • Speculative decoding — prompt-lookup draft for faster decode throughput
  • AVX2 Q4_K/Q6_K kernels — SIMD-accelerated quantized dot products on x86_64
  • Repeat/frequency/presence penalty — configurable repetition control
  • Startup self-test — validates norm, f32_dot, q8_0_dot on load
  • TEE hardening — AVX2 SIMD vec_dot kernels for secure enclaves
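
Prompt-lookup drafting works by finding the most recent earlier occurrence of the last n generated tokens inside the context and proposing the tokens that followed it as a speculative draft. A minimal sketch of the idea (illustrative, not Power's implementation):

```rust
// Sketch of prompt-lookup drafting: locate the trailing n-gram of
// `context` earlier in `context`, then propose the tokens that followed
// that occurrence as a draft for the verifier model to accept or reject.
fn prompt_lookup_draft(context: &[u32], ngram: usize, max_draft: usize) -> Vec<u32> {
    if context.len() <= ngram {
        return Vec::new();
    }
    let tail = &context[context.len() - ngram..];
    // Scan right-to-left so the most recent earlier match wins.
    for start in (0..context.len() - ngram).rev() {
        if &context[start..start + ngram] == tail {
            let from = start + ngram;
            let to = (from + max_draft).min(context.len());
            return context[from..to].to_vec();
        }
    }
    Vec::new() // no repeated n-gram: fall back to normal decoding
}

fn main() {
    // The trailing trigram [1, 2, 3] occurred earlier, followed by [9, 8].
    let ctx = [1, 2, 3, 9, 8, 1, 2, 3];
    assert_eq!(prompt_lookup_draft(&ctx, 3, 2), vec![9, 8]);
}
```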

Performance

  • NEON SIMD for attention softmax, RMSNorm, SiLU/add_residual in FFN
  • Fused f16 KV attention — dot(q, k_f16) and accumulate(v_f16) without intermediate f32 buffer
  • Zero-alloc sampler — pre-allocated probs/indices buffers, no heap allocation per token
  • Zero-alloc hot path — pre-allocated ForwardBuffers for all decode operations
  • Q4_K NEON kernel rewrite — register-based nibble extraction via NEON intrinsics
  • Decode profiling instrumentation — per-stage timing breakdown (embed/attn/ffn/logit/sample)
  • ~14 tok/s on Qwen 2.5 0.5B Q4_K_M (Apple Silicon, single-threaded decode)
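
The zero-alloc sampler idea is that all working buffers are sized once at startup and reused per token, so the decode hot path never touches the allocator. A sketch of the pattern for top-k selection (illustrative, not Power's sampler):

```rust
// Sketch of a zero-alloc top-k selection for sampling: the index buffer
// is allocated once and reset in place every token, so the per-token
// path performs no heap allocation.
struct SamplerBuffers {
    indices: Vec<usize>, // allocated once, reused per decoded token
}

impl SamplerBuffers {
    fn new(vocab: usize) -> Self {
        Self { indices: (0..vocab).collect() }
    }

    /// Returns the indices of the k largest logits (unordered). Assumes k >= 1.
    fn top_k(&mut self, logits: &[f32], k: usize) -> &[usize] {
        for (i, slot) in self.indices.iter_mut().enumerate() {
            *slot = i; // reset the reused buffer in place
        }
        let k = k.min(self.indices.len());
        // Partition in place: the k largest-logit indices end up in front.
        self.indices
            .select_nth_unstable_by(k - 1, |&a, &b| logits[b].total_cmp(&logits[a]));
        &self.indices[..k]
    }
}

fn main() {
    let mut s = SamplerBuffers::new(4);
    let mut top = s.top_k(&[0.1, 2.0, -1.0, 0.5], 2).to_vec();
    top.sort_unstable();
    assert_eq!(top, vec![1, 3]);
}
```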

Fixes

  • Resolve clippy warnings for picolm feature build
  • Fix missing #[cfg(feature = "picolm")] on repeat penalty tests

v0.3.0

21 Feb 20:52

  • chore: bump version to 0.3.0
  • feat(picolm): production-ready pure Rust inference with true layer-streaming
  • fix: correct Q4_K/Q5_K/Q6_K dequantization, add Qwen GPT-style tokenizer and attention bias
  • test(picolm): add end-to-end integration tests with synthetic GGUF
  • feat(picolm): implement pure-Rust LLM inference backend
  • docs: add picolm layer-streaming technical deep-dive to README
  • docs: rewrite README to lead with irreplaceable value proposition
  • docs: add CI/CD badges and section to README, fix test race condition

v0.2.0

21 Feb 14:41

  • fix: add #[serial] to test_power_home_default to prevent env var race
  • fix: use rustls-tls for reqwest, remove OpenSSL dependency
  • fix: allow deprecated Nonce::from_slice (generic-array transitive dep)
  • fix: clippy warnings for hw-verify feature (imports, type complexity, inspect_err)
  • fix(ci): drop --all-features (llamacpp needs C++ toolchain), use hf+hw-verify
  • fix(ci): scope fmt to power crate, fix stub lib.rs
  • fix(ci): setup-workspace stub, release profile optimization, fix homebrew heredoc
  • fix(ci): use --lib for clippy, add cross-build matrix
  • feat: v0.2.0 — picolm backend, HF pull, EPC routing, hw-verify, CI/CD
  • docs: add Discord community link
  • feat(server): graceful shutdown with SIGTERM support and audit log flush
  • feat(verify): add client-side attestation verification SDK and CLI
  • feat(backend): add embedding model support via HuggingFace format
  • feat(api): add stream_options.include_usage and num_parallel passthrough
  • feat(power): close gap analysis small items
  • fix(api,tee,router): fix active_requests leak, lazy usage counters, rate limiter, redact all occurrences
  • fix(auth,config,api): constant-time auth, config validation, lazy usage counters
  • fix(api): return proper HTTP status codes on all error paths
  • docs(readme): add model_signing_key to config reference, update test count to 671+
  • fix: correct keep_alive=0 on cache hit, add active_requests getter, fix serial test flake
  • feat(embeddings): add keep_alive to EmbeddingRequest and wire unload logic
  • fix(autoload,chat,completions,llamacpp): unload after inference for keep_alive=0, wire config defaults
  • feat(api): add keep_alive field and fix autoload integrity, SSE order, and reaper format lookup
  • fix: audit streaming paths, real token counts, SSE order, and measurement validation
  • feat: model management API, attestation health field, and privacy/TEE wiring
  • feat(tee): wire in_memory_decrypt and suppress_token_metrics
  • docs: update README for P2/P3 features
  • feat(tee): implement P2/P3 security features
  • feat: add Box + Power integration example with real model inference
  • docs: use explicit remote URL in brew tap instructions
  • feat(config): add HCL configuration file format support
  • chore: bump a3s-updater to 0.2, add path dependency
  • ci: add cargo-publish job to release workflow
  • fix: clippy warnings (assertions_on_constants, missing Default, doc comment)
  • style: cargo fmt cli/mod.rs
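
The constant-time auth fix above refers to comparing secrets without early exit; a minimal sketch of the technique (illustrative; production code typically uses a vetted crate such as `subtle` rather than hand-rolling this):

```rust
// Sketch of a constant-time byte comparison for API-key checks. A naive
// `==` on byte slices can short-circuit at the first mismatch, leaking
// match position via timing; this version always scans the full input.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false; // length is usually not secret
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y; // accumulate differences without branching
    }
    diff == 0
}

fn main() {
    assert!(ct_eq(b"secret-token", b"secret-token"));
    assert!(!ct_eq(b"secret-token", b"secret-tokex"));
    assert!(!ct_eq(b"short", b"longer-token"));
}
```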

v0.1.5

11 Feb 14:33

Full Changelog: v0.1.4...v0.1.5

v0.1.4

11 Feb 12:59

Full Changelog: v0.1.2...v0.1.4

v0.1.2

11 Feb 07:32

What's New

  • Ollama Registry Integration: Pull any model from registry.ollama.ai by name — primary resolution source with automatic template, system prompt, parameters, and license extraction
  • 3-tier Model Resolution: Ollama Registry → built-in known_models.json → HuggingFace API fallback
  • Vision Model Support: Multimodal projector auto-downloaded from Ollama registry for vision models (e.g. llava)
  • 878 unit tests passing
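
The 3-tier resolution order reduces to trying each source and falling through on a miss. A sketch of the chain (illustrative; the real resolvers query registry.ollama.ai, the bundled known_models.json, and the HuggingFace API):

```rust
// Sketch of 3-tier model resolution: Ollama Registry first, then the
// built-in known-models table, then HuggingFace as the final fallback.
// Each tier returns Some(manifest) on a hit.
fn resolve<F1, F2, F3>(name: &str, ollama: F1, known: F2, hf: F3) -> Option<String>
where
    F1: Fn(&str) -> Option<String>,
    F2: Fn(&str) -> Option<String>,
    F3: Fn(&str) -> Option<String>,
{
    ollama(name)
        .or_else(|| known(name)) // tier 2: built-in known_models.json
        .or_else(|| hf(name))    // tier 3: HuggingFace API fallback
}

fn main() {
    // Ollama misses, the built-in table hits: tier 2 wins.
    let got = resolve(
        "llava",
        |_| None,
        |n| Some(format!("known:{n}")),
        |n| Some(format!("hf:{n}")),
    );
    assert_eq!(got.as_deref(), Some("known:llava"));
}
```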

Install

# Cargo
cargo install a3s-power

# macOS (Apple Silicon)
curl -LO https://github.com/A3S-Lab/Power/releases/download/v0.1.2/a3s-power-v0.1.2-aarch64-apple-darwin.tar.gz
tar xzf a3s-power-v0.1.2-aarch64-apple-darwin.tar.gz
sudo mv a3s-power /usr/local/bin/

v0.1.1 - 861 Unit Tests with 90%+ Coverage

10 Feb 20:12

What's Changed

Quality & Testing

  • 861 unit tests (up from 558) with 90.11% region coverage
  • 91.47% function coverage across 59 source files
  • Comprehensive test coverage for all modules: API handlers, CLI commands, model management, backend, server

Coverage Highlights

  • 14 modules at 100% coverage
  • All API handlers tested (native + OpenAI)
  • All CLI commands tested
  • Model storage, registry, manifest fully covered
  • Backend (llama.cpp, chat templates, tool parsing, JSON schema) fully covered

Publishing

  • Published to crates.io
  • Homebrew formula updated: brew install a3s-lab/tap/a3s-power

Install

# From crates.io
cargo install a3s-power

# From Homebrew (macOS)
brew tap a3s-lab/tap
brew install a3s-power

Full Changelog: v0.1.0...v0.1.1

a3s-power v0.1.0

09 Feb 08:26

Local model management and serving with OpenAI-compatible API

Key Features:

  • Ollama-compatible CLI (run, pull, list, show, delete, serve, create, push, cp)
  • OpenAI-compatible HTTP API
  • Vision and tool calling support
  • Blob management and model push
  • Health endpoint and model auto-loading
  • 291 tests passing