Embeddable Meta Search Engine
Utility layer: aggregate search results from multiple engines with ranking and deduplication
Features • Quick Start • SDKs • Architecture • API Reference • Development
A3S Search is an embeddable meta search engine library inspired by SearXNG. It aggregates search results from multiple search engines, deduplicates them, and ranks them using a consensus-based scoring algorithm.
use a3s_search::{Search, SearchQuery, engines::{DuckDuckGo, Wikipedia}};
#[tokio::main]
async fn main() -> anyhow::Result<()> {
// Create a new search instance
let mut search = Search::new();
// Add search engines
search.add_engine(DuckDuckGo::new());
search.add_engine(Wikipedia::new());
// Perform a search
let query = SearchQuery::new("rust programming");
let results = search.search(query).await?;
// Display results
for result in results.items().iter().take(10) {
println!("{}: {}", result.title, result.url);
println!(" Engines: {:?}, Score: {:.2}", result.engines, result.score);
}
Ok(())
}

- Multi-Engine Search: Aggregate results from multiple search engines in parallel
- Result Deduplication: Merge duplicate results based on normalized URLs
- Consensus Ranking: Results found by multiple engines rank higher
- Configurable Weights: Adjust engine influence on final rankings
- Async-First: Built on Tokio for high-performance concurrent searches
- Timeout Handling: Per-engine timeout with graceful degradation
- Extensible: Easy to add custom search engines via the `Engine` trait
- Dynamic Proxy Pool: IP rotation with a pluggable `ProxyProvider` trait and auto-refresh
- Health Monitor: Automatic engine suspension after repeated failures, with configurable recovery
- HCL Configuration: Load engine and health settings from HCL config files
- Headless Browser: Optional Chrome/Chromium integration for JS-rendered engines (feature-gated)
- Auto-Install Chrome: Automatically detects or downloads Chrome for Testing when no browser is found
- PageFetcher Abstraction: Pluggable page fetching via `HttpFetcher`, `PooledHttpFetcher`, or `BrowserFetcher`
- CLI Tool: Command-line interface for quick searches
- Native SDKs: TypeScript (NAPI) and Python (PyO3) bindings with async support and dynamic proxy pool management
Homebrew (macOS):
brew tap a3s-lab/tap https://github.com/A3S-Lab/homebrew-tap
brew install a3s-search

Cargo:

cargo install a3s-search

# Basic search (uses DuckDuckGo and Wikipedia by default)
a3s-search "Rust programming"
# Search with specific engines
a3s-search "Rust programming" -e ddg,wiki,sogou
# Search with Google (Chrome auto-installed if needed)
a3s-search "Rust programming" -e g,ddg
# Search with Chinese headless engines
a3s-search "Rust 编程" -e baidu,bing_cn
# Limit results
a3s-search "Rust programming" -l 5
# JSON output
a3s-search "Rust programming" -f json
# Compact output (tab-separated)
a3s-search "Rust programming" -f compact
# Use proxy
a3s-search "Rust programming" -p http://127.0.0.1:8080
# SOCKS5 proxy
a3s-search "Rust programming" -p socks5://127.0.0.1:1080
# Verbose mode
a3s-search "Rust programming" -v
# List available engines
a3s-search engines

| Shortcut | Engine | Description |
|---|---|---|
| `ddg` | DuckDuckGo | Privacy-focused search |
| `brave` | Brave | Brave Search |
| `bing` | Bing | Bing International |
| `wiki` | Wikipedia | Wikipedia API |
| `sogou` | Sogou | 搜狗搜索 |
| `360` | 360 Search | 360搜索 |
| `g` | Google | Google Search (Chrome auto-installed) |
| `baidu` | Baidu | 百度搜索 (Chrome auto-installed) |
| `bing_cn` | Bing China | 必应中国 (Chrome auto-installed) |
| Engine | Shortcut | Description |
|---|---|---|
| DuckDuckGo | `ddg` | Privacy-focused search |
| Brave | `brave` | Brave Search |
| Bing | `bing` | Bing International |
| Wikipedia | `wiki` | Wikipedia API |
| Google | `g` | Google Search (headless browser) |
| Engine | Shortcut | Description |
|---|---|---|
| Sogou | `sogou` | 搜狗搜索 |
| So360 | `360` | 360搜索 |
| Baidu | `baidu` | 百度搜索 (headless browser) |
| Bing China | `bing_cn` | 必应中国 (headless browser) |
When using headless engines (g, baidu, bing_cn), Chrome/Chromium is required. A3S Search handles this automatically:
- Detect: checks the `CHROME` env var, PATH commands, and well-known install paths
- Cache: looks for a previously downloaded Chrome in `~/.a3s/chromium/`
- Download: if not found, downloads Chrome for Testing from Google's official CDN
Supported platforms: macOS (arm64, x64), Linux (x64), and Windows (x64, x86).
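The detect → cache → download order above amounts to a simple fallback chain. A standalone sketch (names like `resolve_chrome` are illustrative, not the crate's API):

```rust
// Illustrative fallback chain for locating a Chrome binary.
// The three Options stand in for the real environment/filesystem checks.
fn resolve_chrome(
    env_override: Option<&str>, // CHROME env var
    path_hit: Option<&str>,     // chrome/chromium found on PATH or a known path
    cached: Option<&str>,       // previously downloaded copy in ~/.a3s/chromium/
) -> String {
    env_override
        .or(path_hit)
        .or(cached)
        .map(str::to_string)
        // Nothing found anywhere: fall through to the downloader.
        .unwrap_or_else(|| "download: Chrome for Testing -> ~/.a3s/chromium/".to_string())
}

fn main() {
    // The CHROME env var wins over everything else.
    assert_eq!(
        resolve_chrome(Some("/usr/bin/chromium"), Some("/opt/chrome"), None),
        "/usr/bin/chromium"
    );
    // A cached copy is used before downloading.
    assert!(resolve_chrome(None, None, Some("~/.a3s/chromium/145/chrome")).starts_with("~/.a3s"));
    // Nothing found: download.
    assert!(resolve_chrome(None, None, None).starts_with("download:"));
}
```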
# First run: Chrome is auto-downloaded if not installed
a3s-search "Rust programming" -e g
# Fetching Chrome for Testing version info...
# Downloading Chrome for Testing v145.0.7632.46 (mac-arm64)...
# Downloaded 150.2 MB, extracting...
# Chrome for Testing v145.0.7632.46 installed successfully!
# Subsequent runs: uses cached Chrome instantly
a3s-search "Rust programming" -e g
# Or set CHROME env var to use a specific binary
CHROME=/usr/bin/chromium a3s-search "query" -e g

Native bindings for TypeScript and Python, powered by NAPI-RS and PyO3. No subprocess spawning: direct FFI calls to the Rust library.
cd sdk/node
npm install && npm run build

import { A3SSearch } from '@a3s-lab/search';
const search = new A3SSearch();
// Simple search (uses DuckDuckGo + Wikipedia by default)
const response = await search.search('rust programming');
// With options
const response = await search.search('rust programming', {
engines: ['ddg', 'wiki', 'brave', 'bing'],
limit: 5,
timeout: 15,
proxy: 'http://127.0.0.1:8080',
});
// Dynamic proxy pool (IP rotation)
await search.setProxyPool([
'http://10.0.0.1:8080',
'http://10.0.0.2:8080',
'socks5://10.0.0.3:1080',
]);
const response = await search.search('rust programming');
// Toggle proxy pool at runtime
search.setProxyPoolEnabled(false); // direct connection
search.setProxyPoolEnabled(true); // re-enable rotation
for (const r of response.results) {
console.log(`${r.title}: ${r.url} (score: ${r.score})`);
}
console.log(`${response.count} results in ${response.durationMs}ms`);

cd sdk/python
maturin develop

from a3s_search import A3SSearch
search = A3SSearch()
# Simple search (uses DuckDuckGo + Wikipedia by default)
response = await search.search("rust programming")
# With options
response = await search.search("rust programming",
engines=["ddg", "wiki", "brave", "bing"],
limit=5,
timeout=15,
proxy="http://127.0.0.1:8080",
)
# Dynamic proxy pool (IP rotation)
await search.set_proxy_pool([
"http://10.0.0.1:8080",
"http://10.0.0.2:8080",
"socks5://10.0.0.3:1080",
])
response = await search.search("rust programming")
# Toggle proxy pool at runtime
search.set_proxy_pool_enabled(False) # direct connection
search.set_proxy_pool_enabled(True) # re-enable rotation
for r in response.results:
print(f"{r.title}: {r.url} (score: {r.score})")
print(f"{response.count} results in {response.duration_ms}ms")

Both SDKs support all engines (HTTP and headless):
| Shortcut | Aliases | Engine | Type |
|---|---|---|---|
| `ddg` | `duckduckgo` | DuckDuckGo | HTTP |
| `brave` | - | Brave Search | HTTP |
| `bing` | - | Bing International | HTTP |
| `wiki` | `wikipedia` | Wikipedia API | HTTP |
| `sogou` | - | Sogou (搜狗) | HTTP |
| `360` | `so360` | 360 Search (360搜索) | HTTP |
| `g` | `google` | Google Search | Headless |
| `baidu` | - | Baidu (百度) | Headless |
| `bing_cn` | - | Bing China (必应中国) | Headless |
Headless engines require Chrome. Pre-download it after install:
Python:
# Option 1: CLI command (added to PATH on install)
a3s-search-setup
# Option 2: Python module
python -m a3s_search.ensure_chrome
# Option 3: In code (async)
from a3s_search import ensure_chrome
path = await ensure_chrome()

Node.js:
# Runs automatically on npm install via postinstall script
# Or manually:
node -e "require('@a3s-lab/search').ensureChrome().then(console.log)"

// In code
import { ensureChrome } from '@a3s-lab/search';
const path = await ensureChrome();
console.log(`Chrome at: ${path}`);

# Node.js (49 tests)
cd sdk/node && npm test
# Python (54 tests)
cd sdk/python && pytest

298 library + 31 CLI + 103 SDK = 401 total tests with 91.15% Rust line coverage:
| Module | Lines | Coverage | Functions | Coverage |
|---|---|---|---|---|
| engine.rs | 116 | 100.00% | 17 | 100.00% |
| error.rs | 52 | 100.00% | 10 | 100.00% |
| query.rs | 114 | 100.00% | 20 | 100.00% |
| result.rs | 194 | 100.00% | 35 | 100.00% |
| aggregator.rs | 292 | 100.00% | 30 | 100.00% |
| search.rs | 337 | 99.41% | 58 | 100.00% |
| proxy.rs | 410 | 99.02% | 91 | 96.70% |
| engines/duckduckgo.rs | 236 | 97.46% | 27 | 81.48% |
| engines/bing_china.rs | 164 | 96.95% | 18 | 77.78% |
| engines/baidu.rs | 146 | 96.58% | 17 | 76.47% |
| engines/google.rs | 180 | 96.11% | 19 | 73.68% |
| engines/brave.rs | 140 | 95.71% | 20 | 75.00% |
| engines/so360.rs | 132 | 95.45% | 18 | 77.78% |
| engines/sogou.rs | 131 | 95.42% | 17 | 76.47% |
| fetcher_http.rs | 29 | 93.10% | 7 | 85.71% |
| fetcher.rs | 73 | 93.15% | 10 | 100.00% |
| engines/wikipedia.rs | 153 | 90.85% | 26 | 88.46% |
| browser.rs | 244 | 68.85% | 42 | 61.90% |
| browser_setup.rs | 406 | 58.13% | 65 | 49.23% |
| TOTAL | 3549 | 91.15% | 547 | 84.10% |
Note: browser.rs and browser_setup.rs have lower coverage because BrowserPool::acquire_browser(), BrowserFetcher::fetch(), and download_chrome() require a running Chrome process or network access. Integration tests verify real browser functionality but are #[ignore] by default.
SDK tests (49 Node.js + 54 Python = 103 tests) cover error classes, type contracts, input validation, engine validation, and integration with all 5 HTTP engines.
Run coverage report:
# Default (19 modules, 267 tests, 91.15% coverage)
just test-cov
# Without headless (14 modules)
just test-cov --no-default-features
# Detailed file-by-file table
just cov-table
# HTML report (opens in browser)
just cov-html

# Default build (9 engines, 244+ lib tests)
cargo test -p a3s-search --lib
# Without headless (6 engines)
cargo test -p a3s-search --no-default-features --lib
# Integration tests (requires network + Chrome for Google)
cargo test -p a3s-search -- --ignored
# With progress display (via justfile)
just test
# SDK tests (requires native build first)
cd sdk/node && npm test # 49 tests (vitest)
cd sdk/python && pytest    # 54 tests (pytest)

A3S Search is a meta search engine that aggregates results from multiple search engines, deduplicates them, and ranks them using a consensus-based algorithm. It supports both HTTP-based engines and JavaScript-rendered engines via headless browsers.
                        A3S Search System

  Rust API (Core) ── Python SDK (PyO3) ── Node.js SDK (NAPI-RS)
         │
         ▼
  Search Orchestrator
    • Query parsing & validation
    • Engine selection & filtering
    • Parallel execution (tokio::join_all)
    • Timeout handling (per-engine)
    • Health monitoring (auto-suspend failed engines)
         │
         ▼
  Engine Layer
    • HTTP engines:     DuckDuckGo, Brave, Bing, Wikipedia, Sogou, 360
    • Headless engines: Google, Baidu, Bing China
    • Custom engines:   user-defined via the Engine trait
         │
         ▼
  PageFetcher Layer
    • HttpFetcher:       reqwest, single proxy
    • PooledHttpFetcher: ProxyPool, round-robin IP rotation
    • BrowserFetcher:    Lightpanda / Chrome via CDP
         │
         ▼
  Browser Pool
    • Shared browser process (Lightpanda/Chrome)
    • Tab concurrency control (semaphore)
    • Auto-download & cache (~/.a3s/)
    • CDP connection management
         │
         ▼
  Aggregator
    • URL normalization & deduplication
    • Consensus-based scoring
    • Result merging & ranking
    • Suggestions & answers extraction
         │
         ▼
  SearchResults
    • Ranked results (by score)
    • Engine attribution
    • Metadata (duration, count, errors)
- Query Processing: Parses and validates search queries
- Engine Selection: Filters engines based on query requirements
- Parallel Execution: Executes all engines concurrently using `tokio::join_all`
- Timeout Management: Per-engine timeout with graceful degradation
- Health Monitoring: Tracks engine failures and auto-suspends unhealthy engines
- HTTP Engines: Direct HTTP requests (DuckDuckGo, Brave, Bing, Wikipedia, Sogou, 360)
- Headless Engines: JavaScript rendering via browser (Google, Baidu, BingChina)
- Custom Engines: User-defined engines via the `Engine` trait
- HttpFetcher: Simple HTTP client with optional proxy
- PooledHttpFetcher: Proxy pool with round-robin IP rotation
- BrowserFetcher: Headless browser rendering (Lightpanda/Chrome)
- Shared Process: Single browser instance shared across all headless engines
- Tab Concurrency: Semaphore-based tab limit (default: 4)
- Auto-Setup: Automatic browser detection and download
- CDP Protocol: Chrome DevTools Protocol for page control
- Deduplication: Normalizes URLs and merges duplicate results
- Scoring: Consensus-based ranking algorithm
- Merging: Combines results from multiple engines
- Extraction: Pulls out suggestions and instant answers
                     Search Execution Flow

1. User Query
   └─► "rust programming"
       engines: ["ddg", "brave", "google"], limit: 10, timeout: 15s
       │
       ▼
2. Query Validation & Parsing
   └─► SearchQuery {
         query: "rust programming",
         categories: [General],
         language: "en",
         safesearch: Moderate,
         page: 1
       }
       │
       ▼
3. Engine Selection & Filtering
   └─► Selected engines:
         • DuckDuckGo (HTTP, weight: 1.0)
         • Brave (HTTP, weight: 1.0)
         • Google (Headless, weight: 1.0)
       │
       ▼
4. Parallel Engine Execution (tokio::join_all, timeout: 15s each)
   ├─► DuckDuckGo: HTTP GET → parse HTML → results [10]
   ├─► Brave:      HTTP GET → parse HTML → results [10]
   ├─► Google:     browser render → parse HTML → results [10]
   └─► Health Monitor: track failures
       │
       ▼
5. Result Aggregation
   └─► Collect all results (30 total)
         • DuckDuckGo: 10 results
         • Brave: 10 results
         • Google: 10 results
       │
       ▼
6. URL Normalization & Deduplication
   ├─► Normalize URLs:
   │     • remove tracking params
   │     • lowercase domain
   │     • remove www prefix
   │     • normalize path
   ├─► Merge duplicates:
   │     • same URL from multiple engines
   │     • combine engine lists
   │     • merge positions
   └─► Result: 18 unique results
       │
       ▼
7. Consensus-Based Scoring
   └─► For each result:
         score = Σ (weight / position) for each engine
         weight = engine_weight × num_engines_found

       Example:
         Result A found by DuckDuckGo (#1) and Brave (#2):
           score = (1.0 × 2 / 1) + (1.0 × 2 / 2) = 2.0 + 1.0 = 3.0
         Result B found only by Google (#1):
           score = (1.0 × 1 / 1) = 1.0
         → Result A ranks higher (consensus bonus)
       │
       ▼
8. Sorting & Limiting
   └─► Sort by score (descending), apply limit (10 results)
       │
       ▼
9. SearchResults
   └─► {
         results: [10 ranked results],
         count: 10,
         duration_ms: 1234,
         errors: [],
         suggestions: ["rust tutorial", "rust book"],
         answers: []
       }
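The URL normalization used for deduplication (step 6) can be sketched roughly as follows. This is a naive standalone illustration: `normalize_url` is hypothetical, and the crate's actual rules may differ.

```rust
// Naive URL normalization sketch: lowercase, drop a leading "www.",
// strip "utm_" tracking parameters, and trim a trailing slash.
fn normalize_url(url: &str) -> String {
    let (base, query) = match url.split_once('?') {
        Some((b, q)) => (b, Some(q)),
        None => (url, None),
    };
    // Lowercase (naively, including the path) and drop a leading "www.".
    let base = base.to_lowercase().replacen("://www.", "://", 1);
    let base = base.trim_end_matches('/').to_string();
    // Drop common tracking parameters, keep the rest.
    let kept: Vec<&str> = query
        .map(|q| q.split('&').filter(|p| !p.starts_with("utm_")).collect())
        .unwrap_or_default();
    if kept.is_empty() { base } else { format!("{}?{}", base, kept.join("&")) }
}

fn main() {
    // Tracking params and the "www." prefix disappear, so the two engines'
    // variants of the same page collapse into one key.
    assert_eq!(
        normalize_url("https://www.Example.com/page?utm_source=x"),
        "https://example.com/page"
    );
    assert_eq!(normalize_url("HTTP://example.com/"), "http://example.com");
    // Non-tracking parameters survive.
    assert_eq!(normalize_url("https://example.com/a?q=rust"), "https://example.com/a?q=rust");
}
```

Results whose normalized URLs collide are then merged: their engine lists are combined and their per-engine positions kept for scoring.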
                Headless Engine Execution (Google)

1. Engine Initialization
   └─► Check if a browser is needed
         • Engine: Google (requires JS rendering)
         • Browser: Lightpanda (default) or Chrome
       │
       ▼
2. Browser Detection & Setup
   ├─► Lightpanda (default):
   │     check LIGHTPANDA env var
   │     → check PATH for the lightpanda command
   │     → check cache: ~/.a3s/lightpanda/<tag>/
   │     → if not found: download from GitHub releases
   │         (e.g. macOS arm64: lightpanda-darwin-aarch64.tar.gz from
   │          github.com/lightpanda-io/browser/releases, extracted to
   │          ~/.a3s/lightpanda/nightly/, then chmod +x lightpanda)
   └─► Chrome (fallback, browser="chrome"):
         check CHROME env var
         → check PATH for chrome/chromium commands
         → check known paths (/Applications/Google Chrome.app, etc.)
         → check cache: ~/.a3s/chromium/<version>/
         → if not found: download Chrome for Testing
             (e.g. mac-arm64: chrome-mac-arm64.zip, ~150 MB, from
              googlechromelabs.github.io/chrome-for-testing, extracted to
              ~/.a3s/chromium/131.0.6778.85/)
         → return the chrome executable path
       │
       ▼
3. Browser Pool Initialization
   └─► BrowserPool::new(config)
         • backend: Lightpanda (default)
         • max_tabs: 4
         • headless: true
         • proxy_url: None
       │
       ▼
4. Browser Launch (lazy, on first request)
   ├─► Lightpanda:
   │     find a free port (OS-assigned)
   │     → spawn: lightpanda serve --host 127.0.0.1 --port <port>
   │     → wait for the CDP server to be ready (TCP connect)
   │     → connect via WebSocket: ws://127.0.0.1:<port>
   └─► Chrome:
         launch with args:
           --headless=new
           --disable-gpu
           --no-sandbox
           --disable-blink-features=AutomationControlled
           --user-agent=<realistic UA>
         → connect via CDP (chromiumoxide)
       │
       ▼
5. Page Rendering
   └─► acquire a tab permit (semaphore, max 4 concurrent)
       → create a new tab: browser.new_page(url)
       → navigate to the search URL:
           https://www.google.com/search?q=rust+programming
       → wait for page load (strategy: Load; alternatives: NetworkIdle,
         Selector, Delay)
       → extract the rendered HTML: page.content() → full HTML string
       → close the tab: page.close()
       → release the tab permit
       │
       ▼
6. HTML Parsing
   └─► parse with scraper (HTML5 parser)
       → extract results:
           • selector: div.g (Google result container)
           • title: h3
           • URL: a[href]
           • snippet: div.VwiC3b
       → handle errors: CAPTCHA detection, rate limiting, empty results
       │
       ▼
7. Return Results
   └─► Vec<SearchResult> [10 results]
The scoring algorithm is based on SearXNG's approach with consensus weighting:
score = Σ (weight / position) for each engine
weight = engine_weight × num_engines_found
Key factors:
- Engine Weight: Configurable per-engine multiplier (default: 1.0)
- Consensus: Results found by multiple engines score higher
- Position: Earlier positions in individual engines score higher
Example Calculation:

Query: "rust programming"
Engines: DuckDuckGo (weight: 1.0), Brave (weight: 1.0), Google (weight: 1.0)

Result A: "The Rust Programming Language"
  • Found by DuckDuckGo at position 1
  • Found by Brave at position 1
  • Found by Google at position 2
  • num_engines_found = 3

  score = (1.0 × 3 / 1) + (1.0 × 3 / 1) + (1.0 × 3 / 2)
        = 3.0 + 3.0 + 1.5
        = 7.5

Result B: "Rust Tutorial"
  • Found by DuckDuckGo at position 3
  • Found by Google at position 5
  • num_engines_found = 2

  score = (1.0 × 2 / 3) + (1.0 × 2 / 5)
        = 0.67 + 0.4
        = 1.07

→ Result A ranks higher (consensus + better positions)
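This calculation is easy to reproduce in a few lines of Rust. A standalone sketch (`consensus_score` is illustrative, not part of the crate's API):

```rust
// Consensus scoring sketch:
//   score = sum over engines of (engine_weight * num_engines_found / position)
// `hits` lists (engine weight, 1-based position) for each engine that
// returned the result.
fn consensus_score(hits: &[(f64, u32)]) -> f64 {
    let n = hits.len() as f64; // number of engines that found the result
    hits.iter().map(|(w, pos)| w * n / *pos as f64).sum()
}

fn main() {
    // Result A: three engines at positions 1, 1, 2 -> 3.0 + 3.0 + 1.5 = 7.5
    assert!((consensus_score(&[(1.0, 1), (1.0, 1), (1.0, 2)]) - 7.5).abs() < 1e-9);
    // Result B: two engines at positions 3 and 5 -> 2/3 + 2/5 ≈ 1.07
    let b = consensus_score(&[(1.0, 3), (1.0, 5)]);
    assert!((b - (2.0 / 3.0 + 2.0 / 5.0)).abs() < 1e-9);
}
```

Because `num_engines_found` multiplies every term, each extra engine that agrees on a result boosts all of that result's contributions at once, which is what gives consensus its outsized effect.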
                      Proxy Pool System

1. ProxyPool (Arc<ProxyPool>)
   ├─► Configuration:
   │     • strategy: RoundRobin | Random
   │     • enabled: AtomicBool (thread-safe toggle)
   │     • proxies: RwLock<Vec<ProxyConfig>>
   ├─► Static mode:
   │     ProxyPool::with_proxies([
   │       "http://10.0.0.1:8080",
   │       "socks5://10.0.0.2:1080",
   │     ])
   ├─► Dynamic mode:
   │     ProxyPool::with_provider(MyProxyProvider)
   │       • implements the ProxyProvider trait
   │       • fetch_proxies() → Vec<ProxyConfig>
   │       • refresh_interval() → Duration
   └─► Auto-refresh:
         spawn_auto_refresh(pool)
           • background task
           • periodic refresh
           • updates the pool atomically
       │
       ▼
2. PooledHttpFetcher
   ├─► Per-request rotation:
   │     get_proxy() → ProxyConfig
   │     create_client(proxy) → reqwest::Client
   │     fetch(url) → HTML
   ├─► RoundRobin strategy:
   │     request 1 → proxy A, request 2 → proxy B,
   │     request 3 → proxy C, request 4 → proxy A (cycle)
   └─► Random strategy: each request picks a random proxy
       │
       ▼
3. Runtime Control
   └─► Enable/disable (thread-safe):
         pool.set_enabled(false)  // direct connection
         pool.set_enabled(true)   // resume rotation
       No restart required: an AtomicBool check on each request makes the
       toggle take effect instantly.
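The round-robin rotation and runtime toggle described above can be sketched in a few lines. This is a minimal standalone illustration; `RoundRobinPool` is hypothetical, not the crate's `ProxyPool`:

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

// Thread-safe round-robin proxy selection with a runtime on/off toggle.
struct RoundRobinPool {
    proxies: Vec<String>,
    next: AtomicUsize,
    enabled: AtomicBool,
}

impl RoundRobinPool {
    fn new(proxies: Vec<String>) -> Self {
        Self { proxies, next: AtomicUsize::new(0), enabled: AtomicBool::new(true) }
    }

    // Returns None when the pool is disabled (direct connection) or empty.
    fn get_proxy(&self) -> Option<&str> {
        if !self.enabled.load(Ordering::Relaxed) || self.proxies.is_empty() {
            return None;
        }
        let i = self.next.fetch_add(1, Ordering::Relaxed) % self.proxies.len();
        Some(&self.proxies[i])
    }
}

fn main() {
    let pool = RoundRobinPool::new(vec!["A".into(), "B".into(), "C".into()]);
    // RoundRobin: A, B, C, then back to A.
    assert_eq!(pool.get_proxy(), Some("A"));
    assert_eq!(pool.get_proxy(), Some("B"));
    assert_eq!(pool.get_proxy(), Some("C"));
    assert_eq!(pool.get_proxy(), Some("A"));
    // Disabling takes effect on the very next request; no restart needed.
    pool.enabled.store(false, Ordering::Relaxed);
    assert_eq!(pool.get_proxy(), None);
}
```

Note that `get_proxy` takes `&self`: atomics let concurrent requests rotate through the list without a mutex, which mirrors why the real pool's toggle can be flipped from any thread.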
                    Health Monitor System

1. Configuration
   └─► HealthConfig {
         max_failures: 5,         // suspend after 5 consecutive failures
         suspend_duration: 120s   // suspend for 2 minutes
       }
       │
       ▼
2. Failure Tracking (per engine)
   ├─► Success: reset the counter to 0
   ├─► Failure: increment the counter
   │     • network error
   │     • timeout
   │     • parse error
   │     • CAPTCHA
   └─► Threshold reached (failures >= max_failures):
         suspend the engine and record a suspend_until timestamp
       │
       ▼
3. Engine Suspension
   ├─► A suspended engine is:
   │     • skipped in search execution
   │     • not counted in results
   │     • logged as suspended
   └─► Auto-recovery:
         suspend_until is checked on each search; once
         current_time > suspend_until, the engine is re-enabled and its
         failure counter reset
       │
       ▼
4. Example Timeline
   T=0s    Google search succeeds  (failures: 0)
   T=10s   Google search fails     (failures: 1)
   T=20s   Google search fails     (failures: 2)
   T=30s   Google search fails     (failures: 3)
   T=40s   Google search fails     (failures: 4)
   T=50s   Google search fails     (failures: 5) → SUSPENDED until T=170s
   T=60s   Google skipped (suspended)
   T=170s  Google re-enabled (auto-recovery)
   T=180s  Google search succeeds  (failures: 0)
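The failure counting and auto-recovery in the timeline above can be sketched as follows. This is a standalone illustration with hypothetical types (`EngineHealth` is not the crate's API); the crate's internals may differ:

```rust
// Per-engine failure tracking with time-based suspension.
struct EngineHealth {
    failures: u32,
    suspended_until: Option<u64>, // timestamp in seconds
}

impl EngineHealth {
    fn new() -> Self {
        Self { failures: 0, suspended_until: None }
    }

    // Record a search outcome at time `now`.
    fn record(&mut self, ok: bool, now: u64, max_failures: u32, suspend_secs: u64) {
        if ok {
            self.failures = 0; // any success resets the streak
        } else {
            self.failures += 1;
            if self.failures >= max_failures {
                self.suspended_until = Some(now + suspend_secs);
            }
        }
    }

    // Checked before each search; recovery happens lazily here.
    fn is_available(&mut self, now: u64) -> bool {
        match self.suspended_until {
            Some(until) if now < until => false, // still suspended: skip engine
            Some(_) => {
                // Suspension elapsed: auto-recover and reset the counter.
                self.suspended_until = None;
                self.failures = 0;
                true
            }
            None => true,
        }
    }
}

fn main() {
    let mut health = EngineHealth::new();
    health.record(true, 0, 5, 120);
    for t in [10u64, 20, 30, 40, 50] {
        health.record(false, t, 5, 120); // five consecutive failures
    }
    assert_eq!(health.suspended_until, Some(170)); // suspended until T=170s
    assert!(!health.is_available(60));             // T=60s: skipped
    assert!(health.is_available(170));             // T=170s: re-enabled
    assert_eq!(health.failures, 0);
}
```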
PageFetcher (trait)
├── HttpFetcher        (reqwest, plain HTTP, single proxy)
├── PooledHttpFetcher  (reqwest, proxy pool rotation)
└── BrowserFetcher     (chromiumoxide, headless browser)
    └── BrowserPool    (shared process, tab semaphore)
        ├── Lightpanda (default, 59 MB, <100 ms startup)
        └── Chrome     (fallback, 200 MB, 1-2 s startup)
Add to your Cargo.toml:
[dependencies]
a3s-search = "0.8"
tokio = { version = "1", features = ["full"] }
# To disable headless browser support:
# a3s-search = { version = "0.8", default-features = false }

use a3s_search::{Search, SearchQuery, engines::DuckDuckGo};
let mut search = Search::new();
search.add_engine(DuckDuckGo::new());
let query = SearchQuery::new("rust async");
let results = search.search(query).await?;
println!("Found {} results", results.count);

use a3s_search::{Search, SearchQuery, engines::{Sogou, So360}};
let mut search = Search::new();
search.add_engine(Sogou::new()); // 搜狗
search.add_engine(So360::new()); // 360搜索
let query = SearchQuery::new("Rust 编程语言");
let results = search.search(query).await?;

use a3s_search::{SearchQuery, EngineCategory, SafeSearch, TimeRange};
let query = SearchQuery::new("rust tutorial")
.with_categories(vec![EngineCategory::General])
.with_language("en-US")
.with_safesearch(SafeSearch::Moderate)
.with_page(1)
.with_time_range(TimeRange::Month);

use a3s_search::{Search, EngineConfig, engines::Wikipedia};
// Wikipedia results will have 1.5x weight
let wiki = Wikipedia::new().with_config(EngineConfig {
name: "Wikipedia".to_string(),
shortcut: "wiki".to_string(),
weight: 1.5,
..Default::default()
});
let mut search = Search::new();
search.add_engine(wiki);

use std::sync::Arc;
use a3s_search::{Search, SearchQuery, PooledHttpFetcher, PageFetcher};
use a3s_search::engines::{DuckDuckGo, DuckDuckGoParser};
use a3s_search::proxy::{ProxyPool, ProxyConfig, ProxyProtocol, ProxyStrategy};
// Create a proxy pool with multiple proxies
let pool = Arc::new(ProxyPool::with_proxies(vec![
ProxyConfig::new("proxy1.example.com", 8080),
ProxyConfig::new("proxy2.example.com", 8080)
.with_protocol(ProxyProtocol::Socks5),
ProxyConfig::new("proxy3.example.com", 8080)
.with_auth("username", "password"),
]).with_strategy(ProxyStrategy::RoundRobin));
// PooledHttpFetcher rotates proxies per request
let fetcher: Arc<dyn PageFetcher> = Arc::new(PooledHttpFetcher::new(Arc::clone(&pool)));
let mut search = Search::new();
search.add_engine(DuckDuckGo::with_fetcher(DuckDuckGoParser, fetcher));
let query = SearchQuery::new("rust programming");
let results = search.search(query).await?;
// Toggle proxy pool at runtime (thread-safe via AtomicBool)
pool.set_enabled(false); // direct connection
pool.set_enabled(true);  // re-enable rotation

use std::sync::Arc;
use a3s_search::proxy::{ProxyPool, ProxyConfig, ProxyProvider, spawn_auto_refresh};
use async_trait::async_trait;
use std::time::Duration;
// Implement custom proxy provider (e.g., from API, Redis, database)
struct MyProxyProvider {
api_url: String,
}
#[async_trait]
impl ProxyProvider for MyProxyProvider {
async fn fetch_proxies(&self) -> a3s_search::Result<Vec<ProxyConfig>> {
// Fetch proxies from your API; the format is up to you
Ok(vec![
ProxyConfig::new("dynamic-proxy.example.com", 8080),
])
}
fn refresh_interval(&self) -> Duration {
Duration::from_secs(60) // Refresh every minute
}
}
// Use with auto-refresh background task
let pool = Arc::new(ProxyPool::with_provider(
MyProxyProvider { api_url: "https://api.example.com/proxies".into() }
));
let _refresh_handle = spawn_auto_refresh(Arc::clone(&pool));
// Pool now auto-refreshes every 60 seconds

use a3s_search::{Engine, EngineConfig, EngineCategory, SearchQuery, SearchResult, Result};
use async_trait::async_trait;
struct MySearchEngine {
config: EngineConfig,
}
impl MySearchEngine {
fn new() -> Self {
Self {
config: EngineConfig {
name: "MyEngine".to_string(),
shortcut: "my".to_string(),
categories: vec![EngineCategory::General],
weight: 1.0,
timeout: 5,
enabled: true,
paging: false,
safesearch: false,
},
}
}
}
#[async_trait]
impl Engine for MySearchEngine {
fn config(&self) -> &EngineConfig {
&self.config
}
async fn search(&self, query: &SearchQuery) -> Result<Vec<SearchResult>> {
// Implement your search logic here
Ok(vec![
SearchResult::new(
"https://example.com",
"Example Result",
"This is an example search result"
)
])
}
}

| Method | Description |
|---|---|
| `new()` | Create a new search instance |
| `with_health_config(config)` | Create with health monitoring |
| `add_engine(engine)` | Add a search engine |
| `set_timeout(duration)` | Set the default search timeout |
| `engine_count()` | Get the number of configured engines |
| `search(query)` | Perform a search |
| Method | Description |
|---|---|
| `new(query)` | Create a new query |
| `with_categories(cats)` | Set target categories |
| `with_language(lang)` | Set language/locale |
| `with_safesearch(level)` | Set safe search level |
| `with_page(page)` | Set page number |
| `with_time_range(range)` | Set time range filter |
| `with_engines(engines)` | Limit to specific engines |
| Field | Type | Description |
|---|---|---|
| `url` | `String` | Result URL |
| `title` | `String` | Result title |
| `content` | `String` | Result snippet |
| `result_type` | `ResultType` | Type of result |
| `engines` | `HashSet<String>` | Engines that found this result |
| `positions` | `Vec<u32>` | Positions in each engine |
| `score` | `f64` | Calculated ranking score |
| `thumbnail` | `Option<String>` | Thumbnail URL |
| `published_date` | `Option<String>` | Publication date |
| Method | Description |
|---|---|
| `items()` | Get the result slice |
| `suggestions()` | Get query suggestions |
| `answers()` | Get direct answers |
| `count` | Number of results |
| `duration_ms` | Search duration in ms |
#[async_trait]
pub trait Engine: Send + Sync {
/// Returns the engine configuration
fn config(&self) -> &EngineConfig;
/// Performs a search and returns results
async fn search(&self, query: &SearchQuery) -> Result<Vec<SearchResult>>;
/// Returns the engine name
fn name(&self) -> &str { &self.config().name }
/// Returns the engine shortcut
fn shortcut(&self) -> &str { &self.config().shortcut }
/// Returns the engine weight
fn weight(&self) -> f64 { self.config().weight }
/// Returns whether the engine is enabled
fn is_enabled(&self) -> bool { self.config().enabled }
}

| Field | Type | Default | Description |
|---|---|---|---|
| `name` | `String` | - | Display name |
| `shortcut` | `String` | - | Short identifier |
| `categories` | `Vec<EngineCategory>` | `[General]` | Categories |
| `weight` | `f64` | `1.0` | Ranking weight |
| `timeout` | `u64` | `5` | Timeout in seconds |
| `enabled` | `bool` | `true` | Is enabled |
| `paging` | `bool` | `false` | Supports pagination |
| `safesearch` | `bool` | `false` | Supports safe search |
| Method | Description |
|---|---|
| `new()` | Create an empty proxy pool (disabled) |
| `with_proxies(proxies)` | Create with a static proxy list |
| `with_provider(provider)` | Create with a dynamic provider |
| `with_strategy(strategy)` | Set the selection strategy |
| `set_enabled(bool)` | Enable/disable the proxy pool (thread-safe, `&self`) |
| `is_enabled()` | Check if enabled |
| `refresh()` | Refresh proxies from the provider |
| `get_proxy()` | Get the next proxy (based on strategy) |
| `add_proxy(proxy)` | Add a proxy to the pool |
| `remove_proxy(host, port)` | Remove a proxy |
| `len()` | Number of proxies in the pool |
| `create_client(user_agent)` | Create an HTTP client with a proxy |
| Method | Description |
|---|---|
| `new(pool)` | Create with `Arc<ProxyPool>`; rotates the proxy per request |
| `with_timeout(duration)` | Set the request timeout (default: 30s) |

pub fn spawn_auto_refresh(pool: Arc<ProxyPool>) -> tokio::task::JoinHandle<()>

Spawns a background task that periodically calls pool.refresh() based on the provider's refresh_interval(). Returns a handle that can be aborted to stop refreshing.
| Field/Method | Description |
|---|---|
| `HealthConfig { max_failures, suspend_duration }` | Configure the failure threshold and suspension time |
| `Search::with_health_config(config)` | Create a search with health monitoring |
Engines are automatically suspended after max_failures consecutive failures and re-enabled after suspend_duration.
| Method | Description |
|---|---|
| `SearchConfig::load(path)` | Load config from an `.hcl` file |
| `SearchConfig::parse(content)` | Parse an HCL string |
| `health_config()` | Get `HealthConfig` from the config |
| `enabled_engines()` | Get the list of enabled engine shortcuts |
Example HCL config:
timeout = 10
health {
max_failures = 5
suspend_seconds = 120
}
engine "ddg" {
enabled = true
weight = 1.0
}
engine "bing" {
enabled = true
weight = 1.2
}

| Method | Description |
|---|---|
| `new(host, port)` | Create an HTTP proxy config |
| `with_protocol(protocol)` | Set the protocol (Http/Https/Socks5) |
| `with_auth(user, pass)` | Set authentication |
| `url()` | Get the proxy URL string |
| Variant | Description |
|---|---|
| `RoundRobin` | Rotate through proxies sequentially |
| `Random` | Select a random proxy each time |
| Dependency | Install | Purpose |
|---|---|---|
| `cargo-llvm-cov` | `cargo install cargo-llvm-cov` | Code coverage (optional) |
| `lcov` | `brew install lcov` / `apt install lcov` | Coverage report formatting (optional) |
| Chrome/Chromium | Auto-installed | For headless browser engines (auto-downloaded if not found) |
```bash
# Build (default, 8 engines including headless)
just build

# Build without headless browser support (5 engines)
just build --no-default-features

# Build release
just release

# Test (with colored progress display)
just test             # All tests with pretty output
just test-raw         # Raw cargo output
just test-v           # Verbose output (--nocapture)
just test-one TEST    # Run specific test

# Test subsets
just test-engine      # Engine module tests
just test-query       # Query module tests
just test-result      # Result module tests
just test-search      # Search module tests
just test-aggregator  # Aggregator module tests
just test-proxy       # Proxy module tests
just test-error       # Error module tests

# Coverage (requires cargo-llvm-cov)
just test-cov         # Pretty coverage with progress
just cov              # Terminal coverage report
just cov-html         # HTML report (opens in browser)
just cov-table        # File-by-file table
just cov-ci           # Generate lcov.info for CI
just cov-module proxy # Coverage for specific module

# Format & Lint
just fmt              # Format code
just fmt-check        # Check formatting
just lint             # Clippy lint
just ci               # Full CI checks (fmt + lint + test)

# Utilities
just check            # Fast compile check
just watch            # Watch and rebuild
just doc              # Generate and open docs
just clean            # Clean build artifacts
just update           # Update dependencies
```

See RELEASE.md for detailed release instructions.
Quick release:
```bash
# Check GitHub secrets are configured
./scripts/check-secrets.sh

# Release new version (runs tests, commits, tags, pushes)
./scripts/release.sh 0.9.0

# Monitor CI/CD progress
gh run watch --repo A3S-Lab/Search
```

The release workflow automatically:
- Runs CI checks (fmt, clippy, tests)
- Publishes to crates.io
- Publishes Python SDK to PyPI (7 platforms)
- Publishes Node.js SDK to npm (7 platforms)
- Updates Homebrew formula
- Creates GitHub Release with CLI binaries
```text
search/
├── Cargo.toml
├── justfile
├── README.md
├── RELEASE.md                 # Release guide
├── .github/
│   ├── setup-workspace.sh     # CI workspace restructuring
│   └── workflows/
│       ├── ci.yml             # Push/PR checks
│       ├── release.yml        # Tag-triggered release
│       ├── publish-node.yml   # Node SDK publishing
│       └── publish-python.yml # Python SDK publishing
├── scripts/
│   ├── release.sh             # Automated release script
│   └── check-secrets.sh       # Check GitHub secrets
├── examples/
│   ├── basic_search.rs        # Basic usage example
│   └── chinese_search.rs      # Chinese engines example
├── tests/
│   └── integration.rs         # Integration tests (network-dependent)
├── sdk/
│   ├── node/                  # TypeScript SDK (NAPI-RS)
│   │   ├── Cargo.toml         # Rust cdylib crate
│   │   ├── src/               # Rust NAPI bindings
│   │   ├── lib/               # TypeScript wrappers
│   │   ├── tests/             # vitest tests (49 tests)
│   │   └── package.json
│   └── python/                # Python SDK (PyO3)
│       ├── Cargo.toml         # Rust cdylib crate
│       ├── src/               # Rust PyO3 bindings
│       ├── a3s_search/        # Python wrappers
│       ├── tests/             # pytest tests (54 tests)
│       └── pyproject.toml
└── src/
    ├── main.rs                # CLI entry point
    ├── lib.rs                 # Library entry point
    ├── engine.rs              # Engine trait and config
    ├── error.rs               # Error types
    ├── query.rs               # SearchQuery
    ├── result.rs              # SearchResult, SearchResults
    ├── aggregator.rs          # Result aggregation and ranking
    ├── search.rs              # Search orchestrator with HealthMonitor
    ├── config.rs              # HCL configuration loading
    ├── health.rs              # HealthMonitor, HealthConfig
    ├── proxy.rs               # ProxyPool, ProxyProvider, spawn_auto_refresh
    ├── fetcher.rs             # PageFetcher trait, WaitStrategy
    ├── fetcher_http.rs        # HttpFetcher + PooledHttpFetcher
    ├── html_engine.rs         # HtmlEngine<P> generic engine framework
    ├── browser.rs             # BrowserPool, BrowserFetcher (headless browser)
    ├── browser_setup.rs       # Chrome auto-detection and download
    └── engines/
        ├── mod.rs             # Engine exports
        ├── duckduckgo.rs      # DuckDuckGo
        ├── brave.rs           # Brave Search
        ├── bing.rs            # Bing International
        ├── google.rs          # Google (headless browser)
        ├── wikipedia.rs       # Wikipedia
        ├── baidu.rs           # Baidu (百度, headless browser)
        ├── bing_china.rs      # Bing China (必应中国, headless browser)
        ├── sogou.rs           # Sogou (搜狗)
        └── so360.rs           # 360 Search (360搜索)
```
A3S Search is a utility component of the A3S ecosystem.
```text
┌──────────────────────────────────────────────────────┐
│                    A3S Ecosystem                     │
│                                                      │
│  Infrastructure:   a3s-box (MicroVM sandbox)         │
│                        │                             │
│  Application:      a3s-code (AI coding agent)        │
│                      /        \                      │
│  Utilities:   a3s-lane  a3s-context  a3s-search      │
│               (queue)   (memory)     (search)        │
│                                          ▲           │
│                                          │           │
│                                    You are here      │
└──────────────────────────────────────────────────────┘
```
Standalone Usage: a3s-search works independently for any meta search needs:
- AI agents needing web search capabilities
- Privacy-focused search aggregation
- Research tools requiring multi-source results
- Any application needing unified search across engines
- Engine trait abstraction
- Result deduplication by URL
- Consensus-based ranking algorithm
- Parallel async search execution
- Per-engine timeout handling
- 9 built-in engines (5 international + 4 Chinese)
- Bing International engine (HTTP, no headless required)
- Headless browser support for JS-rendered engines (Google, Baidu, Bing China; enabled by default)
- PageFetcher abstraction (HttpFetcher + PooledHttpFetcher + BrowserFetcher)
- BrowserPool with tab concurrency control
- Dynamic proxy pool with pluggable `ProxyProvider` trait and `spawn_auto_refresh`
- `PooledHttpFetcher` for per-request proxy IP rotation
- Runtime proxy pool toggle via `AtomicBool` (`set_enabled(&self)`)
- Health monitoring with automatic engine suspension and recovery
- HCL configuration file loading for engines and health settings
- CLI tool with Homebrew distribution
- Automatic Chrome detection and download (Chrome for Testing)
- Proxy support for all engines via `-p` flag (HTTP/HTTPS/SOCKS5)
- UTF-8 safe content truncation for CJK/emoji
- Native SDKs: TypeScript (NAPI-RS) and Python (PyO3) with dynamic proxy pool management
- SDK proxy pool: `setProxyPool()`, `setProxyPoolEnabled()`, per-request `proxyPool` option
- Enable headless feature in Python and Node SDKs (all 9 engines available)
- `ensure_chrome()` / `ensure_chrome_sync()` bindings for Python and Node SDKs
- Python post-install: `a3s-search-setup` CLI + `python -m a3s_search.ensure_chrome`
- Node post-install: automatic Chrome download on `npm install`
Join us on Discord for questions, discussions, and updates.
MIT