embedrs

Unified embedding for Rust -- 6 cloud providers + local inference through one interface. Opinionated defaults backed by 8-model benchmark data.

Design philosophy

If we build it, it must be great -- every default backed by data.

embedrs::local()? -- all-MiniLM-L6-v2 (23MB, free, no API key)
embedrs::cloud(key) -- OpenAI text-embedding-3-small (best discrimination, cheapest cloud)
Both produce the same EmbedResult -- write code once, switch backends in one line

Defaults chosen by 8-dimension benchmark across 8 models. See examples/embedding_models/ for the full reproducible experiment.

Quick Start

// cloud -- one key, done
let client = embedrs::cloud("sk-...");
let result = client.embed(vec!["hello world".into()]).await?;
println!("dimensions: {}", result.embeddings[0].len());

// local -- zero config, free, 23MB model downloaded on first use
let client = embedrs::local()?;
let result = client.embed(vec!["hello world".into()]).await?;

Installation

[dependencies]
embedrs = "0.3"

# or, instead of the line above, enable local inference (adds ~23MB model download on first use)
embedrs = { version = "0.3", features = ["local"] }

Feature Flags

Feature	Default	Description
(none)	yes	Core embedding client, all 6 cloud providers
`local`	no	Local inference via candle (all-MiniLM-L6-v2, 23MB)
`cost-tracking`	no	Estimated cost per request via `tiktoken` pricing data
`tracing`	no	Structured logging via the `tracing` crate

[dependencies]
# pick one of the lines below (a key may appear only once per TOML table)

# cloud only
embedrs = "0.3"

# cloud + local inference
embedrs = { version = "0.3", features = ["local"] }

# with cost tracking
embedrs = { version = "0.3", features = ["cost-tracking"] }

# cloud + local inference + tracing
embedrs = { version = "0.3", features = ["local", "tracing"] }

Benchmark Results

8 evaluation dimensions, 184 unique texts. Full methodology and reproduction instructions are in examples/embedding_models/; run it with cargo run --example embedding_models --features local --release.

Metric	MiniLM-L6	MiniLM-L12	BGE-small	GTE-small	OpenAI	Gemini	Cohere	Voyage
Size	23MB	133MB	133MB	67MB	cloud	cloud	cloud	cloud
Spearman ρ	0.81	0.84	0.71	0.75	0.91	0.94	0.91	0.89
Discrimination	0.52	0.52	0.29	0.14	0.58	0.30	0.46	0.45
Retrieval	100%	100%	89%	100%	100%	89%	100%	89%
EN ρ	0.92	0.94	0.92	0.90	0.91	0.91	0.89	0.88
ZH ρ	0.65	0.74	0.45	0.40	0.88	0.99	0.93	0.89
JA ρ	0.60	0.90	0.20	0.50	0.90	1.00	1.00	0.90
Cross-lingual	0.25	0.26	0.66	0.81	0.71	0.84	0.68	0.85
Robustness	0.89	0.90	0.94	0.97	0.88	0.94	0.89	0.95
Cluster sep.	8.73x	4.38x	1.29x	1.09x	2.55x	1.11x	1.41x	1.30x
Cost	$0	$0	$0	$0	$0.02/1M tokens	free tier	$0.10/1M tokens	$0.06/1M tokens
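The Spearman ρ rows are rank correlations, presumably between each model's cosine scores and reference similarity scores for the same text pairs. The sketch below is only the textbook definition (Pearson correlation computed on ranks, with tied values sharing their average rank), written in plain Rust; `ranks`, `pearson`, and `spearman` are hypothetical names, and the authoritative pipeline, including how reference scores are produced, is in examples/embedding_models/.

```rust
/// Convert scores to 1-based ranks; tied values share the average of their ranks.
fn ranks(values: &[f64]) -> Vec<f64> {
    let mut order: Vec<usize> = (0..values.len()).collect();
    // assumes no NaN scores (partial_cmp would return None)
    order.sort_by(|&a, &b| values[a].partial_cmp(&values[b]).unwrap());
    let mut out = vec![0.0; values.len()];
    let mut i = 0;
    while i < order.len() {
        // extend j over the run of values tied with position i
        let mut j = i;
        while j + 1 < order.len() && values[order[j + 1]] == values[order[i]] {
            j += 1;
        }
        // sorted positions i..=j hold ranks i+1..=j+1; assign their mean
        let avg = (i + j) as f64 / 2.0 + 1.0;
        for &idx in &order[i..=j] {
            out[idx] = avg;
        }
        i = j + 1;
    }
    out
}

/// Pearson correlation coefficient of two equal-length samples.
fn pearson(x: &[f64], y: &[f64]) -> f64 {
    let n = x.len() as f64;
    let mx = x.iter().sum::<f64>() / n;
    let my = y.iter().sum::<f64>() / n;
    let (mut cov, mut vx, mut vy) = (0.0, 0.0, 0.0);
    for (a, b) in x.iter().zip(y) {
        cov += (a - mx) * (b - my);
        vx += (a - mx).powi(2);
        vy += (b - my).powi(2);
    }
    cov / (vx.sqrt() * vy.sqrt())
}

/// Spearman's rho: Pearson correlation of the rank vectors.
fn spearman(x: &[f64], y: &[f64]) -> f64 {
    pearson(&ranks(x), &ranks(y))
}

fn main() {
    // reference similarity ratings vs. a model's cosine scores for 5 text pairs
    let reference = [1.0, 2.0, 3.0, 4.0, 5.0];
    let model = [0.10, 0.35, 0.30, 0.80, 0.90];
    // one adjacent pair is swapped in rank order, so rho = 0.9
    println!("spearman = {:.2}", spearman(&reference, &model));
}
```

Because only ranks matter, a model whose scores are a monotonic but nonlinear function of the reference still reaches ρ = 1.0.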

Why MiniLM-L6 for local

23MB -- the only model small enough to bundle inside an application (the others are 67-133MB)
Best clustering separation at 8.73x (2nd place is 4.38x)
100% retrieval accuracy, EN ρ=0.92
12-layer models are 3-6x larger but gain at most 0.02 in EN ρ; MiniLM-L12 does improve Japanese (0.90 vs 0.60) and Chinese (0.74 vs 0.65)
Known weakness: poor on Chinese/Japanese (ρ=0.60-0.65) and cross-lingual (0.25)

Why OpenAI for cloud

Best discrimination gap at 0.58 (dissimilar texts avg cosine = 0.09, closest to zero)
100% retrieval accuracy, MRR=1.0
Balanced multilingual: EN=0.91, ZH=0.88, JA=0.90 -- no weak language
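Cheapest paid cloud option at $0.02/1M tokens (Gemini offers a free tier)
Gemini has higher ρ (0.94) but weaker discrimination (0.30) and lower retrieval accuracy (89%)
Cohere matches OpenAI on ρ (0.91) and retrieval (100%) but discriminates less well (0.46) and costs 5x more ($0.10/1M tokens)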

Providers

Provider	Constructor	Default Model	Max Batch Size
OpenAI	`Client::openai(key)`	`text-embedding-3-small`	2048
Cohere	`Client::cohere(key)`	`embed-v4.0`	96
Google Gemini	`Client::gemini(key)`	`gemini-embedding-001`	100
Voyage AI	`Client::voyage(key)`	`voyage-3-large`	128
Jina AI	`Client::jina(key)`	`jina-embeddings-v3`	2048
Mistral	`Client::mistral(key)`	`mistral-embed`	512
Local	`Client::local(name)?`	`all-MiniLM-L6-v2`	256

The model id is free-form: pass any current id with .embed(...).model("..."). Notable models as of 2026-06:

Voyage -- voyage-3-large (default), voyage-4-large, voyage-4, voyage-4-lite, voyage-3.5, voyage-3.5-lite, voyage-code-3
OpenAI -- text-embedding-3-small (default), text-embedding-3-large
Cohere -- embed-v4.0 (default, multimodal)
Gemini -- gemini-embedding-001 (default), gemini-embedding-2 (multimodal, supports output_dimensionality)
Jina -- jina-embeddings-v3 (default). jina-embeddings-v4 has been released, but its cloud API response schema has not yet been verified as compatible with this crate's single-vector format. Try it via .model("jina-embeddings-v4") and please file an issue if you hit a deserialization error.
Mistral -- mistral-embed (default), codestral-embed-2505 (specialized for code)

Each cloud provider also has a *_compatible constructor for proxies or API-compatible services:

use embedrs::Client;

// OpenAI-compatible (Azure, proxies, etc.)
let client = Client::openai_compatible("sk-...", "https://your-proxy.com/v1");

// Cohere-compatible
let client = Client::cohere_compatible("key", "https://proxy.example.com/v2");

// Gemini-compatible
let client = Client::gemini_compatible("key", "https://proxy.example.com/v1beta");

// Voyage-compatible
let client = Client::voyage_compatible("key", "https://proxy.example.com/v1");

// Mistral-compatible
let client = Client::mistral_compatible("key", "https://proxy.example.com/v1");

// Jina-compatible
let client = Client::jina_compatible("key", "https://proxy.example.com/v1");

Batch Embedding

Embed thousands of texts concurrently. Texts are automatically chunked based on the provider's maximum batch size:

let client = embedrs::cloud("sk-...");

let texts: Vec<String> = (0..5000).map(|i| format!("document {i}")).collect();

let result = client.embed_batch(texts)
    .concurrency(5)       // max concurrent API requests (default: 5)
    .chunk_size(512)      // texts per request (default: provider max)
    .model("text-embedding-3-large")
    .await?;

println!("total embeddings: {}", result.embeddings.len());
println!("total tokens: {}", result.usage.total_tokens);
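The number of requests a batch turns into follows directly from the chunk size. This sketch assumes chunks are filled sequentially and that a chunk_size above the provider maximum is clamped to it (an assumption; the text above only says the default is the provider max). `plan_chunks` and `rounds` are hypothetical helpers. For the 5,000-text example with OpenAI's 2048-text limit, the default gives 3 requests, while chunk_size(512) gives 10, which at concurrency(5) need at least 2 rounds.

```rust
/// Hypothetical helper: split texts into per-request chunks.
/// Assumes a user-supplied chunk_size is clamped to the provider maximum.
fn plan_chunks(texts: &[String], chunk_size: Option<usize>, provider_max: usize) -> Vec<&[String]> {
    // max(1) guards against chunks(0), which would panic
    let size = chunk_size.unwrap_or(provider_max).min(provider_max).max(1);
    texts.chunks(size).collect()
}

/// Minimum number of rounds when at most `concurrency` requests are in flight.
fn rounds(num_requests: usize, concurrency: usize) -> usize {
    let c = concurrency.max(1);
    (num_requests + c - 1) / c
}

fn main() {
    let texts: Vec<String> = (0..5000).map(|i| format!("document {i}")).collect();

    // default: OpenAI's 2048-text limit -> chunks of 2048, 2048, 904
    let default_plan = plan_chunks(&texts, None, 2048);
    // chunk_size(512) -> 9 full chunks plus one of 392
    let small_plan = plan_chunks(&texts, Some(512), 2048);

    println!("default: {} requests", default_plan.len());
    println!(
        "chunk_size(512): {} requests in at least {} rounds",
        small_plan.len(),
        rounds(small_plan.len(), 5)
    );
}
```

Smaller chunks trade more HTTP round trips for smaller individual requests, which can help when requests run close to a provider's per-request size limits.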

Similarity Functions

use embedrs::{cosine_similarity, dot_product, euclidean_distance};

let a = vec![1.0, 0.0, 0.0];
let b = vec![0.0, 1.0, 0.0];

let cos = cosine_similarity(&a, &b);    // 0.0 (orthogonal)
let dot = dot_product(&a, &b);          // 0.0
let dist = euclidean_distance(&a, &b);  // 1.414...
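All three are standard vector formulas. The plain-Rust sketch below spells out the math; `dot`, `cosine`, and `euclidean` are hypothetical names for illustration, not embedrs's source. Note that OpenAI documents its embeddings as normalized to length 1, so for those vectors cosine similarity equals the dot product and ranking by either gives the same order.

```rust
/// Dot product: sum of element-wise products.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Cosine similarity: dot product divided by the product of the L2 norms.
/// Returns NaN if either vector is all zeros.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let norm_a = dot(a, a).sqrt();
    let norm_b = dot(b, b).sqrt();
    dot(a, b) / (norm_a * norm_b)
}

/// Euclidean distance: L2 norm of the difference vector.
fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
}

fn main() {
    let a = [1.0, 0.0, 0.0];
    let b = [0.0, 1.0, 0.0];
    // orthogonal unit vectors: cosine 0, dot 0, distance sqrt(2)
    println!("cos={} dot={} dist={}", cosine(&a, &b), dot(&a, &b), euclidean(&a, &b));
}
```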

Input Type

Some providers use input type hints to optimize embeddings for specific use cases:

use embedrs::InputType;

// for indexing documents
let result = client.embed(docs)
    .input_type(InputType::SearchDocument)
    .await?;

// for search queries
let result = client.embed(queries)
    .input_type(InputType::SearchQuery)
    .await?;

Available variants: SearchDocument, SearchQuery, Classification, Clustering.

Dimensions

Request reduced-dimension embeddings where the provider supports it:

let result = client.embed(vec!["hello".into()])
    .model("text-embedding-3-large")
    .dimensions(256)
    .await?;

assert_eq!(result.embeddings[0].len(), 256);
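OpenAI recommends the `dimensions` parameter, but its embeddings guide also describes shortening an already-returned text-embedding-3 vector by truncating it and re-normalizing to unit length. A minimal sketch of that client-side step, with a hypothetical helper name and assuming you hold a full-length Vec<f32>. This only makes sense for models trained to tolerate truncation, such as the text-embedding-3 family; truncating other models' output is not expected to preserve quality.

```rust
/// Hypothetical helper: keep the first `dims` components and rescale to unit L2 norm.
fn truncate_and_normalize(embedding: &[f32], dims: usize) -> Vec<f32> {
    let mut v: Vec<f32> = embedding.iter().take(dims).copied().collect();
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    // leave an all-zero vector unchanged instead of dividing by zero
    if norm > 0.0 {
        for x in &mut v {
            *x /= norm;
        }
    }
    v
}

fn main() {
    // stand-in for a full 3072-dimension text-embedding-3-large vector
    let full: Vec<f32> = (0..3072).map(|i| ((i % 7) as f32) - 3.0).collect();
    let short = truncate_and_normalize(&full, 256);
    let norm: f32 = short.iter().map(|x| x * x).sum::<f32>().sqrt();
    println!("len = {}, norm = {norm:.3}", short.len());
}
```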

Backoff and Timeout

use std::time::Duration;
use embedrs::{BackoffConfig, Client};

let client = Client::openai("sk-...")
    .with_retry_backoff(BackoffConfig::default())  // 500ms base, 30s cap, 3 retries
    .with_timeout(Duration::from_secs(120));        // overall timeout (default: 60s)

// per-request override
let result = client.embed(vec!["hello".into()])
    .retry_backoff(BackoffConfig {
        base_delay: Duration::from_millis(200),
        max_delay: Duration::from_secs(10),
        jitter: true,
        max_http_retries: 5,
    })
    .timeout(Duration::from_secs(30))
    .await?;

Without backoff configured, HTTP 429/503 errors fail immediately.
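To reason about total wait time, this sketch assumes a conventional doubling schedule (delay = base_delay × 2^attempt, capped at max_delay) with the documented defaults of 500ms and 30s. The exact formula and jitter strategy embedrs uses are not documented here, so `backoff_delay` is a hypothetical illustration, and jitter is omitted to keep it dependency-free.

```rust
use std::time::Duration;

/// Hypothetical helper: delay before retry `attempt` (0-based), doubling from
/// `base` and capped at `max`. No jitter is applied.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // saturating arithmetic avoids overflow for large attempt numbers
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(max)
}

fn main() {
    let base = Duration::from_millis(500);
    let max = Duration::from_secs(30);
    // three retries, matching the documented default retry count
    for attempt in 0..3 {
        println!("retry {}: wait {:?}", attempt + 1, backoff_delay(attempt, base, max));
    }
}
```

With full jitter, a common variant, each wait would instead be drawn uniformly from [0, delay], which keeps many clients from retrying in lockstep.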

Client Defaults

Set defaults once, override per-request:

use std::time::Duration;
use embedrs::{BackoffConfig, Client, InputType};

let client = Client::openai("sk-...")
    .with_model("text-embedding-3-large")
    .with_dimensions(256)
    .with_input_type(InputType::SearchDocument)
    .with_retry_backoff(BackoffConfig::default())
    .with_timeout(Duration::from_secs(120));

// all requests use the defaults above
let a = client.embed(vec!["doc 1".into()]).await?;
let b = client.embed(vec!["doc 2".into()]).await?;

// override for a specific request
let c = client.embed(vec!["query".into()])
    .model("text-embedding-3-small")
    .input_type(InputType::SearchQuery)
    .await?;
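The override rule (a per-request setting wins, otherwise the client default applies) maps naturally onto `Option::or`. The sketch below uses a hypothetical `Settings` struct, not embedrs's internal types, and assumes each setting is merged independently; under that assumption, request `c` above uses text-embedding-3-small but keeps the client's 256 dimensions.

```rust
/// Hypothetical stand-in for client-level or request-level settings.
#[derive(Clone, Debug, Default, PartialEq)]
struct Settings {
    model: Option<String>,
    dimensions: Option<u32>,
}

/// Merge field by field: a request value wins, else fall back to the client default.
fn resolve(request: &Settings, client: &Settings) -> Settings {
    Settings {
        model: request.model.clone().or_else(|| client.model.clone()),
        dimensions: request.dimensions.or(client.dimensions),
    }
}

fn main() {
    let client = Settings {
        model: Some("text-embedding-3-large".into()),
        dimensions: Some(256),
    };
    // the request overrides only the model
    let request = Settings {
        model: Some("text-embedding-3-small".into()),
        ..Default::default()
    };
    println!("{:?}", resolve(&request, &client));
}
```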

Provider Fallback

Chain fallback providers for automatic failover when the primary provider is unavailable:

let client = embedrs::Client::openai("sk-...")
    .with_fallback(embedrs::Client::cohere("cohere-key"));
// if OpenAI fails, automatically tries Cohere
let result = client.embed(vec!["hello".into()]).await?;

Multiple fallbacks are tried in order:

let client = embedrs::Client::openai("sk-...")
    .with_fallback(embedrs::Client::cohere("cohere-key"))
    .with_fallback(embedrs::Client::voyage("voyage-key"));
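Conceptually, a fallback chain tries each provider in order and returns the first success. The sketch below models that with synchronous closures standing in for async provider calls; `with_fallback` is a hypothetical helper, and which errors embedrs treats as triggering a fallback is not specified here.

```rust
/// Hypothetical helper: call providers in order, returning the first Ok
/// or, if every provider fails, the last error.
fn with_fallback<T, E>(providers: &[&dyn Fn() -> Result<T, E>]) -> Result<T, E> {
    let mut last_err = None;
    for call in providers {
        match call() {
            Ok(value) => return Ok(value),
            Err(e) => last_err = Some(e),
        }
    }
    Err(last_err.expect("fallback chain needs at least one provider"))
}

fn main() {
    let openai = || Err::<&str, String>("openai: 503".to_string());
    let cohere = || Ok::<&str, String>("cohere");
    let voyage = || Ok::<&str, String>("voyage");
    // OpenAI fails, so Cohere answers and Voyage is never called
    println!("{:?}", with_fallback(&[&openai, &cohere, &voyage]));
}
```

Different providers produce embeddings in different vector spaces, often with different dimensionality, so vectors returned by a fallback provider should not be compared with, or stored alongside, vectors from the primary.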

Cost Tracking

Enable the cost-tracking feature to get estimated cost per request:

embedrs = { version = "0.3", features = ["cost-tracking"] }

let result = client.embed(vec!["hello".into()]).await?;
if let Some(cost) = result.usage.cost {
    println!("estimated cost: ${cost:.6}");
}

Cost estimation uses tiktoken pricing data; result.usage.cost is None for models without pricing information.
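The estimate itself is simple arithmetic: tokens / 1,000,000 × price per million tokens. A sketch with a hypothetical `estimate_cost` helper, using the per-1M prices from the benchmark table above (which may differ from current provider pricing and from the tiktoken data embedrs actually uses):

```rust
/// Hypothetical helper: estimated USD cost for `tokens` at a per-1M-token price.
fn estimate_cost(tokens: u64, usd_per_million: f64) -> f64 {
    tokens as f64 / 1_000_000.0 * usd_per_million
}

fn main() {
    // e.g. 5,000 documents averaging 200 tokens each = 1,000,000 tokens
    let tokens = 5_000 * 200;
    // per-1M prices as listed in the benchmark table; check current provider pricing
    for (provider, price) in [("OpenAI", 0.02), ("Voyage", 0.06), ("Cohere", 0.10)] {
        println!("{provider}: ${:.2}", estimate_cost(tokens, price));
    }
}
```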

Error Handling

All fallible operations return embedrs::Result<T>. Match on Error variants for fine-grained control:

use embedrs::Error;

# async fn run(client: &embedrs::Client) {
match client.embed(vec!["hello".into()]).await {
    Ok(result) => println!("got {} embeddings", result.embeddings.len()),
    Err(Error::Api { status: 429, .. }) => eprintln!("rate limited"),
    Err(Error::Api { status, message }) => eprintln!("API error {status}: {message}"),
    Err(Error::Timeout(duration)) => eprintln!("timed out after {duration:?}"),
    Err(Error::Http(e)) => eprintln!("network error: {e}"),
    Err(e) => eprintln!("other error: {e}"),
}
# }

Why embedrs?

Aspect	embedrs	fastembed-rs	Raw reqwest
Cloud providers	6 built-in (OpenAI, Cohere, Gemini, Voyage, Jina, Mistral)	None	Manual per provider
Local inference	candle-based, 23MB default model	ONNX Runtime, multiple models	N/A
Unified interface	Same `EmbedResult` for cloud and local	Local only	N/A
Batch auto-chunking	Automatic by provider limits + concurrency	Manual	Manual
Provider fallback	Built-in `.with_fallback()` chain	N/A	Manual
Data-driven defaults	8-dimension benchmark across 8 models (`examples/embedding_models/`)	No published benchmark	N/A
Backoff & timeout	Built-in, opt-in exponential backoff on 429/503	N/A	Manual

fastembed-rs is a solid choice if you only need local inference with ONNX Runtime and don't need cloud providers. embedrs is designed for applications that need both cloud and local through a single API, with opinionated defaults and production features like fallback and backoff.

Ecosystem

tiktoken · @goliapkg/tiktoken-wasm · instructors · chunkedrs · embedrs

License

MIT
