A3S-Lab/Search

A3S Search

Embeddable Meta Search Engine

Utility layer — aggregate search results from multiple engines with ranking and deduplication

Features • Quick Start • SDKs • Architecture • API Reference • Development


Overview

A3S Search is an embeddable meta search engine library inspired by SearXNG. It aggregates search results from multiple search engines, deduplicates them, and ranks them using a consensus-based scoring algorithm.

Basic Usage

use a3s_search::{Search, SearchQuery, engines::{DuckDuckGo, Wikipedia}};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Create a new search instance
    let mut search = Search::new();

    // Add search engines
    search.add_engine(DuckDuckGo::new());
    search.add_engine(Wikipedia::new());

    // Perform a search
    let query = SearchQuery::new("rust programming");
    let results = search.search(query).await?;

    // Display results
    for result in results.items().iter().take(10) {
        println!("{}: {}", result.title, result.url);
        println!("  Engines: {:?}, Score: {:.2}", result.engines, result.score);
    }

    Ok(())
}

Features

  • Multi-Engine Search: Aggregate results from multiple search engines in parallel
  • Result Deduplication: Merge duplicate results based on normalized URLs
  • Consensus Ranking: Results found by multiple engines rank higher
  • Configurable Weights: Adjust engine influence on final rankings
  • Async-First: Built on Tokio for high-performance concurrent searches
  • Timeout Handling: Per-engine timeout with graceful degradation
  • Extensible: Easy to add custom search engines via the Engine trait
  • Dynamic Proxy Pool: IP rotation with pluggable ProxyProvider trait and auto-refresh
  • Health Monitor: Automatic engine suspension after repeated failures with configurable recovery
  • HCL Configuration: Load engine and health settings from HCL config files
  • Headless Browser: Optional Chrome/Chromium integration for JS-rendered engines (feature-gated)
  • Auto-Install Chrome: Automatically detects or downloads Chrome for Testing when no browser is found
  • PageFetcher Abstraction: Pluggable page fetching — HttpFetcher, PooledHttpFetcher, or BrowserFetcher
  • CLI Tool: Command-line interface for quick searches
  • Native SDKs: TypeScript (NAPI) and Python (PyO3) bindings with async support and dynamic proxy pool management

CLI Usage

Installation

Homebrew (macOS):

brew tap a3s-lab/tap https://github.com/A3S-Lab/homebrew-tap
brew install a3s-search

Cargo:

cargo install a3s-search

Commands

# Basic search (uses DuckDuckGo and Wikipedia by default)
a3s-search "Rust programming"

# Search with specific engines
a3s-search "Rust programming" -e ddg,wiki,sogou

# Search with Google (Chrome auto-installed if needed)
a3s-search "Rust programming" -e g,ddg

# Search with Chinese headless engines
a3s-search "Rust 编程" -e baidu,bing_cn

# Limit results
a3s-search "Rust programming" -l 5

# JSON output
a3s-search "Rust programming" -f json

# Compact output (tab-separated)
a3s-search "Rust programming" -f compact

# Use proxy
a3s-search "Rust programming" -p http://127.0.0.1:8080

# SOCKS5 proxy
a3s-search "Rust programming" -p socks5://127.0.0.1:1080

# Verbose mode
a3s-search "Rust programming" -v

# List available engines
a3s-search engines

Available Engines

Shortcut  Engine       Description
ddg       DuckDuckGo   Privacy-focused search
brave     Brave        Brave Search
bing      Bing         Bing International
wiki      Wikipedia    Wikipedia API
sogou     Sogou        Sogou Search (搜狗搜索)
360       360 Search   360 Search (360搜索)
g         Google       Google Search (Chrome auto-installed)
baidu     Baidu        Baidu Search (百度搜索, Chrome auto-installed)
bing_cn   Bing China   Bing China (必应中国, Chrome auto-installed)

Supported Search Engines

International Engines

Engine      Shortcut  Description
DuckDuckGo  ddg       Privacy-focused search
Brave       brave     Brave Search
Bing        bing      Bing International
Wikipedia   wiki      Wikipedia API
Google      g         Google Search (headless browser)

Chinese Engines (中国搜索引擎)

Engine      Shortcut  Description
Sogou       sogou     Sogou Search (搜狗搜索)
So360       360       360 Search (360搜索)
Baidu       baidu     Baidu Search (百度搜索, headless browser)
Bing China  bing_cn   Bing China (必应中国, headless browser)

Automatic Chrome Setup

When using headless engines (g, baidu, bing_cn), Chrome/Chromium is required. A3S Search handles this automatically:

  1. Detect β€” Checks CHROME env var, PATH commands, and well-known install paths
  2. Cache β€” Looks for a previously downloaded Chrome in ~/.a3s/chromium/
  3. Download β€” If not found, downloads Chrome for Testing from Google's official CDN

Supported platforms: macOS (arm64, x64), Linux (x64), and Windows (x64, x86).

# First run: Chrome is auto-downloaded if not installed
a3s-search "Rust programming" -e g
# Fetching Chrome for Testing version info...
# Downloading Chrome for Testing v145.0.7632.46 (mac-arm64)...
# Downloaded 150.2 MB, extracting...
# Chrome for Testing v145.0.7632.46 installed successfully!

# Subsequent runs: uses cached Chrome instantly
a3s-search "Rust programming" -e g

# Or set CHROME env var to use a specific binary
CHROME=/usr/bin/chromium a3s-search "query" -e g

SDKs

Native bindings for TypeScript and Python, powered by NAPI-RS and PyO3. No subprocess spawning — direct FFI calls to the Rust library.

TypeScript (Node.js)

cd sdk/node
npm install && npm run build

import { A3SSearch } from '@a3s-lab/search';

const search = new A3SSearch();

// Simple search (uses DuckDuckGo + Wikipedia by default)
let response = await search.search('rust programming');

// With options
response = await search.search('rust programming', {
  engines: ['ddg', 'wiki', 'brave', 'bing'],
  limit: 5,
  timeout: 15,
  proxy: 'http://127.0.0.1:8080',
});

// Dynamic proxy pool (IP rotation)
await search.setProxyPool([
  'http://10.0.0.1:8080',
  'http://10.0.0.2:8080',
  'socks5://10.0.0.3:1080',
]);
response = await search.search('rust programming');

// Toggle proxy pool at runtime
search.setProxyPoolEnabled(false);  // direct connection
search.setProxyPoolEnabled(true);   // re-enable rotation

for (const r of response.results) {
  console.log(`${r.title}: ${r.url} (score: ${r.score})`);
}
console.log(`${response.count} results in ${response.durationMs}ms`);

Python

cd sdk/python
maturin develop

from a3s_search import A3SSearch

search = A3SSearch()

# Simple search (uses DuckDuckGo + Wikipedia by default)
response = await search.search("rust programming")

# With options
response = await search.search("rust programming",
    engines=["ddg", "wiki", "brave", "bing"],
    limit=5,
    timeout=15,
    proxy="http://127.0.0.1:8080",
)

# Dynamic proxy pool (IP rotation)
await search.set_proxy_pool([
    "http://10.0.0.1:8080",
    "http://10.0.0.2:8080",
    "socks5://10.0.0.3:1080",
])
response = await search.search("rust programming")

# Toggle proxy pool at runtime
search.set_proxy_pool_enabled(False)  # direct connection
search.set_proxy_pool_enabled(True)   # re-enable rotation

for r in response.results:
    print(f"{r.title}: {r.url} (score: {r.score})")
print(f"{response.count} results in {response.duration_ms}ms")

SDK Available Engines

Both SDKs support all engines (HTTP and headless):

Shortcut  Aliases     Engine                 Type
ddg       duckduckgo  DuckDuckGo             HTTP
brave     —           Brave Search           HTTP
bing      —           Bing International     HTTP
wiki      wikipedia   Wikipedia API          HTTP
sogou     —           Sogou (搜狗)           HTTP
360       so360       360 Search (360搜索)   HTTP
g         google      Google Search          Headless
baidu     —           Baidu (百度)           Headless
bing_cn   —           Bing China (必应中国)  Headless

Chrome Setup for SDK

Headless engines require Chrome. Pre-download it after install:

Python:

# Option 1: CLI command (added to PATH on install)
a3s-search-setup

# Option 2: Python module
python -m a3s_search.ensure_chrome

# Option 3: In code (async)
from a3s_search import ensure_chrome
path = await ensure_chrome()

Node.js:

# Runs automatically on npm install via postinstall script
# Or manually:
node -e "require('@a3s-lab/search').ensureChrome().then(console.log)"

// In code
import { ensureChrome } from '@a3s-lab/search';
const path = await ensureChrome();
console.log(`Chrome at: ${path}`);

SDK Tests

# Node.js (49 tests)
cd sdk/node && npm test

# Python (54 tests)
cd sdk/python && pytest

Quality Metrics

Test Coverage

267 library + 31 CLI + 103 SDK = 401 total tests with 91.15% Rust line coverage:

Module                  Lines  Coverage  Functions  Coverage
engine.rs                 116   100.00%         17   100.00%
error.rs                   52   100.00%         10   100.00%
query.rs                  114   100.00%         20   100.00%
result.rs                 194   100.00%         35   100.00%
aggregator.rs             292   100.00%         30   100.00%
search.rs                 337    99.41%         58   100.00%
proxy.rs                  410    99.02%         91    96.70%
engines/duckduckgo.rs     236    97.46%         27    81.48%
engines/bing_china.rs     164    96.95%         18    77.78%
engines/baidu.rs          146    96.58%         17    76.47%
engines/google.rs         180    96.11%         19    73.68%
engines/brave.rs          140    95.71%         20    75.00%
engines/so360.rs          132    95.45%         18    77.78%
engines/sogou.rs          131    95.42%         17    76.47%
fetcher_http.rs            29    93.10%          7    85.71%
fetcher.rs                 73    93.15%         10   100.00%
engines/wikipedia.rs      153    90.85%         26    88.46%
browser.rs                244    68.85%         42    61.90%
browser_setup.rs          406    58.13%         65    49.23%
TOTAL                    3549    91.15%        547    84.10%

Note: browser.rs and browser_setup.rs have lower coverage because BrowserPool::acquire_browser(), BrowserFetcher::fetch(), and download_chrome() require a running Chrome process or network access. Integration tests verify real browser functionality but are #[ignore] by default.

SDK tests (49 Node.js + 54 Python = 103 tests) cover error classes, type contracts, input validation, engine validation, and integration with all 5 HTTP engines.

Run coverage report:

# Default (19 modules, 267 tests, 91.15% coverage)
just test-cov

# Without headless (14 modules)
just test-cov --no-default-features

# Detailed file-by-file table
just cov-table

# HTML report (opens in browser)
just cov-html

Running Tests

# Default build (9 engines, 244+ lib tests)
cargo test -p a3s-search --lib

# Without headless (6 engines)
cargo test -p a3s-search --no-default-features --lib

# Integration tests (requires network + Chrome for Google)
cargo test -p a3s-search -- --ignored

# With progress display (via justfile)
just test

# SDK tests (requires native build first)
cd sdk/node && npm test       # 49 tests (vitest)
cd sdk/python && pytest       # 54 tests (pytest)

Architecture

System Overview

A3S Search is a meta search engine that aggregates results from multiple search engines, deduplicates them, and ranks them using a consensus-based algorithm. It supports both HTTP-based engines and JavaScript-rendered engines via headless browsers.

┌─────────────────────────────────────────────────────────────────┐
│                        A3S Search System                        │
│                                                                 │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐       │
│  │   Rust API   │    │  Python SDK  │    │  Node.js SDK │       │
│  │   (Core)     │◄───│   (PyO3)     │    │   (NAPI-RS)  │       │
│  └──────┬───────┘    └──────────────┘    └──────────────┘       │
│         │                                                       │
│         ▼                                                       │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                    Search Orchestrator                   │   │
│  │  • Query parsing & validation                            │   │
│  │  • Engine selection & filtering                          │   │
│  │  • Parallel execution (tokio::join_all)                  │   │
│  │  • Timeout handling (per-engine)                         │   │
│  │  • Health monitoring (auto-suspend failed engines)       │   │
│  └──────────────────────────────────────────────────────────┘   │
│         │                                                       │
│         ▼                                                       │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                    Engine Layer                          │   │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐    │   │
│  │  │ HTTP Engines │  │   Headless   │  │   Custom     │    │   │
│  │  │              │  │   Engines    │  │   Engines    │    │   │
│  │  │ • DuckDuckGo │  │ • Google     │  │ • User-      │    │   │
│  │  │ • Brave      │  │ • Baidu      │  │   defined    │    │   │
│  │  │ • Bing       │  │ • BingChina  │  │   (trait)    │    │   │
│  │  │ • Wikipedia  │  │              │  │              │    │   │
│  │  │ • Sogou      │  │              │  │              │    │   │
│  │  │ • 360        │  │              │  │              │    │   │
│  │  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘    │   │
│  └─────────┼─────────────────┼─────────────────┼────────────┘   │
│            │                 │                 │                │
│            ▼                 ▼                 ▼                │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                  PageFetcher Layer                       │   │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐    │   │
│  │  │ HttpFetcher  │  │PooledHttp    │  │Browser       │    │   │
│  │  │              │  │Fetcher       │  │Fetcher       │    │   │
│  │  │ • reqwest    │  │ • ProxyPool  │  │ • Lightpanda │    │   │
│  │  │ • single     │  │ • Round-robin│  │ • Chrome     │    │   │
│  │  │   proxy      │  │ • IP rotation│  │ • CDP        │    │   │
│  │  └──────────────┘  └──────────────┘  └──────┬───────┘    │   │
│  └─────────────────────────────────────────────┼────────────┘   │
│                                                │                │
│                                                ▼                │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                  Browser Pool                            │   │
│  │  • Shared browser process (Lightpanda/Chrome)            │   │
│  │  • Tab concurrency control (semaphore)                   │   │
│  │  • Auto-download & cache (~/.a3s/)                       │   │
│  │  • CDP connection management                             │   │
│  └──────────────────────────────────────────────────────────┘   │
│         │                                                       │
│         ▼                                                       │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                    Aggregator                            │   │
│  │  • URL normalization & deduplication                     │   │
│  │  • Consensus-based scoring                               │   │
│  │  • Result merging & ranking                              │   │
│  │  • Suggestions & answers extraction                      │   │
│  └──────────────────────────────────────────────────────────┘   │
│         │                                                       │
│         ▼                                                       │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                  SearchResults                           │   │
│  │  • Ranked results (by score)                             │   │
│  │  • Engine attribution                                    │   │
│  │  • Metadata (duration, count, errors)                    │   │
│  └──────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘

Core Components

1. Search Orchestrator

  • Query Processing: Parses and validates search queries
  • Engine Selection: Filters engines based on query requirements
  • Parallel Execution: Executes all engines concurrently using tokio::join_all
  • Timeout Management: Per-engine timeout with graceful degradation
  • Health Monitoring: Tracks engine failures and auto-suspends unhealthy engines
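The per-engine timeout with graceful degradation can be sketched without the library. The real orchestrator is async on Tokio; this std-only sketch with hypothetical engine closures just illustrates the idea that a slow engine is skipped while fast ones still contribute:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Run each "engine" in its own thread, then wait at most `timeout` for
/// each reply. Engines that miss the deadline become errors, not failures
/// of the whole search.
fn search_all(
    engines: Vec<(&'static str, Box<dyn Fn() -> Vec<String> + Send>)>,
    timeout: Duration,
) -> (Vec<String>, Vec<String>) {
    let mut results = Vec::new();
    let mut errors = Vec::new();
    // Spawn everything first so the engines run concurrently.
    let receivers: Vec<_> = engines
        .into_iter()
        .map(|(name, engine)| {
            let (tx, rx) = mpsc::channel();
            thread::spawn(move || {
                let _ = tx.send(engine());
            });
            (name, rx)
        })
        .collect();
    for (name, rx) in receivers {
        match rx.recv_timeout(timeout) {
            Ok(items) => results.extend(items),
            Err(_) => errors.push(format!("{name}: timed out")),
        }
    }
    (results, errors)
}

fn main() {
    let fast: Box<dyn Fn() -> Vec<String> + Send> =
        Box::new(|| vec!["fast result".to_string()]);
    let slow: Box<dyn Fn() -> Vec<String> + Send> = Box::new(|| {
        thread::sleep(Duration::from_secs(5));
        vec!["slow result".to_string()]
    });
    let (results, errors) =
        search_all(vec![("fast", fast), ("slow", slow)], Duration::from_millis(300));
    println!("results: {results:?}");
    println!("errors: {errors:?}");
}
```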

2. Engine Layer

  • HTTP Engines: Direct HTTP requests (DuckDuckGo, Brave, Bing, Wikipedia, Sogou, 360)
  • Headless Engines: JavaScript rendering via browser (Google, Baidu, BingChina)
  • Custom Engines: User-defined engines via Engine trait
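A custom engine is just another implementor of the trait. The library's actual `Engine` trait is async and returns rich `SearchResult` values; this synchronous stand-in (all names here are illustrative, not the real API) shows the shape of the extension point:

```rust
/// Simplified stand-in for the engine trait: a name, a ranking weight,
/// and a search method returning (title, url) pairs.
trait Engine {
    fn name(&self) -> &str;
    fn weight(&self) -> f64 {
        1.0 // default weight, matching the README's default
    }
    fn search(&self, query: &str) -> Vec<(String, String)>;
}

/// A hypothetical user-defined engine over an internal docs site.
struct MyDocsEngine;

impl Engine for MyDocsEngine {
    fn name(&self) -> &str {
        "my_docs"
    }
    fn search(&self, query: &str) -> Vec<(String, String)> {
        vec![(
            format!("Docs for {query}"),
            format!("https://docs.example.com/?q={query}"),
        )]
    }
}

fn main() {
    // Engines are held behind trait objects, so built-in and custom
    // implementations mix freely in one list.
    let engines: Vec<Box<dyn Engine>> = vec![Box::new(MyDocsEngine)];
    for e in &engines {
        for (title, url) in e.search("rust") {
            println!("[{}] {title}: {url}", e.name());
        }
    }
}
```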

3. PageFetcher Layer

  • HttpFetcher: Simple HTTP client with optional proxy
  • PooledHttpFetcher: Proxy pool with round-robin IP rotation
  • BrowserFetcher: Headless browser rendering (Lightpanda/Chrome)

4. Browser Pool

  • Shared Process: Single browser instance shared across all headless engines
  • Tab Concurrency: Semaphore-based tab limit (default: 4)
  • Auto-Setup: Automatic browser detection and download
  • CDP Protocol: Chrome DevTools Protocol for page control

5. Aggregator

  • Deduplication: Normalizes URLs and merges duplicate results
  • Scoring: Consensus-based ranking algorithm
  • Merging: Combines results from multiple engines
  • Extraction: Pulls out suggestions and instant answers
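The deduplication step can be sketched as a normalize-then-merge pass. The normalization rules below (lowercase host, drop `www.`, strip `utm_*` parameters, trim a trailing slash) are a plausible subset; the library's real rules may differ:

```rust
use std::collections::HashMap;

/// Rough URL normalization for dedup purposes (illustrative rules only).
fn normalize_url(url: &str) -> String {
    let (scheme, rest) = url.split_once("://").unwrap_or(("https", url));
    let (host_path, query) = match rest.split_once('?') {
        Some((hp, q)) => (hp, Some(q)),
        None => (rest, None),
    };
    let (host, path) = host_path.split_once('/').unwrap_or((host_path, ""));
    let host = host.to_lowercase();
    let host = host.strip_prefix("www.").unwrap_or(&host).to_string();
    // Drop common tracking parameters, keep the rest.
    let kept: Vec<&str> = query
        .map(|q| q.split('&').filter(|p| !p.starts_with("utm_")).collect())
        .unwrap_or_default();
    let mut out = format!("{scheme}://{host}/{}", path.trim_end_matches('/'));
    if !kept.is_empty() {
        out.push('?');
        out.push_str(&kept.join("&"));
    }
    out
}

/// Merge (engine, url) hits whose URLs normalize to the same key,
/// combining the engine lists for attribution.
fn dedupe(results: Vec<(String, String)>) -> HashMap<String, Vec<String>> {
    let mut merged: HashMap<String, Vec<String>> = HashMap::new();
    for (engine, url) in results {
        merged.entry(normalize_url(&url)).or_default().push(engine);
    }
    merged
}

fn main() {
    let merged = dedupe(vec![
        ("ddg".into(), "https://www.Example.com/page/?utm_source=x".into()),
        ("brave".into(), "https://example.com/page".into()),
    ]);
    // Both variants collapse to one entry with two engines attributed.
    println!("{merged:?}");
}
```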

Search Workflow

┌─────────────────────────────────────────────────────────────────┐
│                      Search Execution Flow                      │
└─────────────────────────────────────────────────────────────────┘

1. User Query
   │
   ├─► "rust programming"
   │   engines: ["ddg", "brave", "google"]
   │   limit: 10
   │   timeout: 15s
   │
   ▼
2. Query Validation & Parsing
   │
   ├─► SearchQuery {
   │     query: "rust programming",
   │     categories: [General],
   │     language: "en",
   │     safesearch: Moderate,
   │     page: 1
   │   }
   │
   ▼
3. Engine Selection & Filtering
   │
   ├─► Selected Engines:
   │   • DuckDuckGo (HTTP, weight: 1.0)
   │   • Brave (HTTP, weight: 1.0)
   │   • Google (Headless, weight: 1.0)
   │
   ▼
4. Parallel Engine Execution (tokio::join_all)
   │
   ├─────────────────┬─────────────────┬─────────────────┐
   │                 │                 │                 │
   ▼                 ▼                 ▼                 ▼
┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│DuckDuckGo│    │  Brave   │    │  Google  │    │  Health  │
│          │    │          │    │          │    │ Monitor  │
│ HTTP GET │    │ HTTP GET │    │ Browser  │    │          │
│  ↓       │    │  ↓       │    │  Render  │    │ Track    │
│ Parse    │    │ Parse    │    │  ↓       │    │ Failures │
│ HTML     │    │ HTML     │    │ Parse    │    │          │
│  ↓       │    │  ↓       │    │ HTML     │    │          │
│ Results  │    │ Results  │    │  ↓       │    │          │
│ [10]     │    │ [10]     │    │ Results  │    │          │
│          │    │          │    │ [10]     │    │          │
└────┬─────┘    └────┬─────┘    └────┬─────┘    └──────────┘
     │               │               │
     │               │               │ (timeout: 15s each)
     │               │               │
     └───────────────┴───────────────┘
                     │
                     ▼
5. Result Aggregation
   │
   ├─► Collect all results (30 total)
   │   • DuckDuckGo: 10 results
   │   • Brave: 10 results
   │   • Google: 10 results
   │
   ▼
6. URL Normalization & Deduplication
   │
   ├─► Normalize URLs:
   │   • Remove tracking params
   │   • Lowercase domain
   │   • Remove www prefix
   │   • Normalize path
   │
   ├─► Merge duplicates:
   │   • Same URL from multiple engines
   │   • Combine engine lists
   │   • Merge positions
   │
   ├─► Result: 18 unique results
   │
   ▼
7. Consensus-Based Scoring
   │
   ├─► For each result:
   │   score = Σ (weight / position) for each engine
   │   weight = engine_weight × num_engines_found
   │
   ├─► Example:
   │   Result A found by DuckDuckGo (#1) and Brave (#2):
   │   score = (1.0 × 2 / 1) + (1.0 × 2 / 2) = 2.0 + 1.0 = 3.0
   │
   │   Result B found only by Google (#1):
   │   score = (1.0 × 1 / 1) = 1.0
   │
   │   → Result A ranks higher (consensus bonus)
   │
   ▼
8. Sorting & Limiting
   │
   ├─► Sort by score (descending)
   ├─► Apply limit (10 results)
   │
   ▼
9. SearchResults
   │
   └─► {
         results: [10 ranked results],
         count: 10,
         duration_ms: 1234,
         errors: [],
         suggestions: ["rust tutorial", "rust book"],
         answers: []
       }

Headless Browser Workflow

┌─────────────────────────────────────────────────────────────────┐
│              Headless Engine Execution (Google)                 │
└─────────────────────────────────────────────────────────────────┘

1. Engine Initialization
   │
   ├─► Check if browser needed
   │   • Engine: Google (requires JS rendering)
   │   • Browser: Lightpanda (default) or Chrome
   │
   ▼
2. Browser Detection & Setup
   │
   ├─► Lightpanda (default):
   │   ├─ Check LIGHTPANDA env var
   │   ├─ Check PATH for lightpanda command
   │   ├─ Check cache: ~/.a3s/lightpanda/<tag>/
   │   └─ If not found: Download from GitHub releases
   │       ├─ Platform: macOS arm64
   │       ├─ URL: github.com/lightpanda-io/browser/releases
   │       ├─ Download: lightpanda-darwin-aarch64.tar.gz
   │       ├─ Extract to: ~/.a3s/lightpanda/nightly/
   │       └─ Set executable: chmod +x lightpanda
   │
   ├─► Chrome (fallback, browser="chrome"):
   │   ├─ Check CHROME env var
   │   ├─ Check PATH for chrome/chromium commands
   │   ├─ Check known paths (/Applications/Google Chrome.app, etc.)
   │   ├─ Check cache: ~/.a3s/chromium/<version>/
   │   └─ If not found: Download Chrome for Testing
   │       ├─ Platform: mac-arm64
   │       ├─ URL: googlechromelabs.github.io/chrome-for-testing
   │       ├─ Download: chrome-mac-arm64.zip (150MB)
   │       ├─ Extract to: ~/.a3s/chromium/131.0.6778.85/
   │       └─ Return: chrome executable path
   │
   ▼
3. Browser Pool Initialization
   │
   ├─► BrowserPool::new(config)
   │   • backend: Lightpanda (default)
   │   • max_tabs: 4
   │   • headless: true
   │   • proxy_url: None
   │
   ▼
4. Browser Launch (Lazy, on first request)
   │
   ├─► Lightpanda:
   │   ├─ Find free port (OS-assigned)
   │   ├─ Spawn: lightpanda serve --host 127.0.0.1 --port <port>
   │   ├─ Wait for CDP server ready (TCP connect)
   │   └─ Connect via WebSocket: ws://127.0.0.1:<port>
   │
   ├─► Chrome:
   │   ├─ Launch with args:
   │   │   --headless=new
   │   │   --disable-gpu
   │   │   --no-sandbox
   │   │   --disable-blink-features=AutomationControlled
   │   │   --user-agent=<realistic UA>
   │   └─ Connect via CDP (chromiumoxide)
   │
   ▼
5. Page Rendering
   │
   ├─► Acquire tab permit (semaphore, max 4 concurrent)
   │
   ├─► Create new tab
   │   • browser.new_page(url)
   │
   ├─► Navigate to search URL
   │   • https://www.google.com/search?q=rust+programming
   │
   ├─► Wait for page load
   │   • Strategy: Load (default)
   │   • Alternatives: NetworkIdle, Selector, Delay
   │
   ├─► Extract rendered HTML
   │   • page.content() → full HTML string
   │
   ├─► Close tab
   │   • page.close()
   │
   └─► Release tab permit
   │
   ▼
6. HTML Parsing
   │
   ├─► Parse with scraper (HTML5 parser)
   │
   ├─► Extract results:
   │   • Selector: div.g (Google result container)
   │   • Title: h3
   │   • URL: a[href]
   │   • Snippet: div.VwiC3b
   │
   ├─► Handle errors:
   │   • CAPTCHA detection
   │   • Rate limiting
   │   • Empty results
   │
   ▼
7. Return Results
   │
   └─► Vec<SearchResult> [10 results]

Ranking Algorithm

The scoring algorithm is based on SearXNG's approach with consensus weighting:

score = Σ (weight / position) for each engine
weight = engine_weight × num_engines_found

Key factors:

  1. Engine Weight: Configurable per-engine multiplier (default: 1.0)
  2. Consensus: Results found by multiple engines score higher
  3. Position: Earlier positions in individual engines score higher

Example Calculation:

Query: "rust programming"
Engines: DuckDuckGo (weight: 1.0), Brave (weight: 1.0), Google (weight: 1.0)

Result A: "The Rust Programming Language"
  • Found by DuckDuckGo at position 1
  • Found by Brave at position 1
  • Found by Google at position 2
  • num_engines_found = 3

  score = (1.0 × 3 / 1) + (1.0 × 3 / 1) + (1.0 × 3 / 2)
        = 3.0 + 3.0 + 1.5
        = 7.5

Result B: "Rust Tutorial"
  • Found by DuckDuckGo at position 3
  • Found by Google at position 5
  • num_engines_found = 2

  score = (1.0 × 2 / 3) + (1.0 × 2 / 5)
        = 0.67 + 0.4
        = 1.07

→ Result A ranks higher (consensus + better positions)
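Both worked examples can be checked with a direct transcription of the formula (a sketch with simplified types, not the library's internal representation):

```rust
/// score = Σ (engine_weight × num_engines_found / position)
/// `hits` holds one (engine_weight, position) pair per engine that
/// returned the result; positions are 1-based.
fn consensus_score(hits: &[(f64, usize)]) -> f64 {
    let n = hits.len() as f64; // num_engines_found
    hits.iter()
        .map(|&(weight, position)| weight * n / position as f64)
        .sum()
}

fn main() {
    // Result A: DuckDuckGo #1, Brave #1, Google #2 → 3.0 + 3.0 + 1.5 = 7.5
    let a = consensus_score(&[(1.0, 1), (1.0, 1), (1.0, 2)]);
    // Result B: DuckDuckGo #3, Google #5 → 0.67 + 0.4 ≈ 1.07
    let b = consensus_score(&[(1.0, 3), (1.0, 5)]);
    println!("A = {a}, B = {b}");
    assert!(a > b); // consensus + better positions win
}
```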

Proxy Pool Architecture

┌─────────────────────────────────────────────────────────────────┐
│                      Proxy Pool System                          │
└─────────────────────────────────────────────────────────────────┘

1. ProxyPool (Arc<ProxyPool>)
   │
   ├─► Configuration:
   │   • Strategy: RoundRobin | Random
   │   • Enabled: AtomicBool (thread-safe toggle)
   │   • Proxies: RwLock<Vec<ProxyConfig>>
   │
   ├─► Static Mode:
   │   ProxyPool::with_proxies([
   │     "http://10.0.0.1:8080",
   │     "socks5://10.0.0.2:1080",
   │   ])
   │
   ├─► Dynamic Mode:
   │   ProxyPool::with_provider(MyProxyProvider)
   │   • Implements ProxyProvider trait
   │   • fetch_proxies() → Vec<ProxyConfig>
   │   • refresh_interval() → Duration
   │
   └─► Auto-Refresh:
       spawn_auto_refresh(pool)
       • Background task
       • Periodic refresh
       • Updates pool atomically
   │
   ▼
2. PooledHttpFetcher
   │
   ├─► Per-Request Rotation:
   │   • get_proxy() → ProxyConfig
   │   • create_client(proxy) → reqwest::Client
   │   • fetch(url) → HTML
   │
   ├─► Strategy: RoundRobin
   │   Request 1 → Proxy A
   │   Request 2 → Proxy B
   │   Request 3 → Proxy C
   │   Request 4 → Proxy A (cycle)
   │
   └─► Strategy: Random
       Request 1 → Proxy B
       Request 2 → Proxy A
       Request 3 → Proxy B
       Request 4 → Proxy C
   │
   ▼
3. Runtime Control
   │
   ├─► Enable/Disable (thread-safe):
   │   pool.set_enabled(false)  // Direct connection
   │   pool.set_enabled(true)   // Resume rotation
   │
   └─► No restart required
       • AtomicBool check on each request
       • Instant toggle
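The rotation-plus-toggle behavior can be sketched in a few lines of std atomics. This is a minimal sketch mirroring the README's names, not the exact API; the real `ProxyPool` also supports a Random strategy, a `ProxyProvider` trait, and auto-refresh:

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

struct ProxyPool {
    proxies: Vec<String>,
    next: AtomicUsize,
    enabled: AtomicBool,
}

impl ProxyPool {
    fn new(proxies: Vec<String>) -> Self {
        Self {
            proxies,
            next: AtomicUsize::new(0),
            enabled: AtomicBool::new(true),
        }
    }

    /// Thread-safe toggle checked on every request — no restart needed.
    fn set_enabled(&self, on: bool) {
        self.enabled.store(on, Ordering::Relaxed);
    }

    /// Round-robin: each call hands out the next proxy in the list.
    /// `None` means "connect directly" (pool disabled or empty).
    fn get_proxy(&self) -> Option<&str> {
        if !self.enabled.load(Ordering::Relaxed) || self.proxies.is_empty() {
            return None;
        }
        let i = self.next.fetch_add(1, Ordering::Relaxed) % self.proxies.len();
        Some(self.proxies[i].as_str())
    }
}

fn main() {
    let pool = ProxyPool::new(vec![
        "http://10.0.0.1:8080".into(),
        "http://10.0.0.2:8080".into(),
    ]);
    println!("{:?}", pool.get_proxy()); // Some("http://10.0.0.1:8080")
    println!("{:?}", pool.get_proxy()); // Some("http://10.0.0.2:8080")
    println!("{:?}", pool.get_proxy()); // cycles back to the first proxy
    pool.set_enabled(false);
    println!("{:?}", pool.get_proxy()); // None → direct connection
}
```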

Health Monitoring

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    Health Monitor System                        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

1. Configuration
   β”‚
   β”œβ”€β–Ί HealthConfig {
   β”‚     max_failures: 5,        // Suspend after 5 consecutive failures
   β”‚     suspend_duration: 120s  // Suspend for 2 minutes
   β”‚   }
   β”‚
   β–Ό
2. Failure Tracking (per engine)
   β”‚
   β”œβ”€β–Ί Success: Reset counter to 0
   β”‚
   β”œβ”€β–Ί Failure: Increment counter
   β”‚   β€’ Network error
   β”‚   β€’ Timeout
   β”‚   β€’ Parse error
   β”‚   β€’ CAPTCHA
   β”‚
   β”œβ”€β–Ί Threshold Reached:
   β”‚   failures >= max_failures
   β”‚   β†’ Suspend engine
   β”‚   β†’ Record suspend_until timestamp
   β”‚
   β–Ό
3. Engine Suspension
   β”‚
   β”œβ”€β–Ί Suspended Engine:
   β”‚   β€’ Skipped in search execution
   β”‚   β€’ Not counted in results
   β”‚   β€’ Logged as suspended
   β”‚
   β”œβ”€β–Ί Auto-Recovery:
   β”‚   β€’ Check suspend_until on each search
   β”‚   β€’ If current_time > suspend_until:
   β”‚     β†’ Re-enable engine
   β”‚     β†’ Reset failure counter
   β”‚
   β–Ό
4. Example Timeline
   β”‚
   β”œβ”€β–Ί T=0s:  Google search succeeds (failures: 0)
   β”œβ”€β–Ί T=10s: Google search fails (failures: 1)
   β”œβ”€β–Ί T=20s: Google search fails (failures: 2)
   β”œβ”€β–Ί T=30s: Google search fails (failures: 3)
   β”œβ”€β–Ί T=40s: Google search fails (failures: 4)
   β”œβ”€β–Ί T=50s: Google search fails (failures: 5)
   β”‚           β†’ SUSPENDED until T=170s
   β”œβ”€β–Ί T=60s: Google skipped (suspended)
   β”œβ”€β–Ί T=170s: Google re-enabled (auto-recovery)
   └─► T=180s: Google search succeeds (failures: 0)
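The timeline above follows from a simple counter-plus-deadline policy. A minimal std-only sketch of that policy (illustrative; not the crate's actual `HealthMonitor` code, which tracks this per engine):

```rust
use std::time::{Duration, Instant};

/// Illustrative sketch of the suspension policy; names are hypothetical.
struct EngineHealth {
    failures: u32,
    max_failures: u32,
    suspend_duration: Duration,
    suspended_until: Option<Instant>,
}

impl EngineHealth {
    fn new(max_failures: u32, suspend_duration: Duration) -> Self {
        Self { failures: 0, max_failures, suspend_duration, suspended_until: None }
    }

    /// Any success resets the consecutive-failure counter.
    fn record_success(&mut self) {
        self.failures = 0;
    }

    /// Network error, timeout, parse error, CAPTCHA: all count the same.
    fn record_failure(&mut self) {
        self.failures += 1;
        if self.failures >= self.max_failures {
            self.suspended_until = Some(Instant::now() + self.suspend_duration);
        }
    }

    /// Checked before each search; an expired suspension re-enables the
    /// engine and resets the counter (auto-recovery).
    fn is_available(&mut self) -> bool {
        match self.suspended_until {
            Some(until) if Instant::now() < until => false,
            Some(_) => {
                self.suspended_until = None;
                self.failures = 0;
                true
            }
            None => true,
        }
    }
}

fn main() {
    // Mirror the timeline: five consecutive failures trigger suspension.
    let mut health = EngineHealth::new(5, Duration::from_secs(120));
    health.record_success();
    for _ in 0..5 {
        health.record_failure();
    }
    assert!(!health.is_available()); // suspended, as at T=50s..T=170s
}
```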

Component Diagram

PageFetcher (trait)
  β”œβ”€β”€ HttpFetcher        (reqwest, plain HTTP, single proxy)
  β”œβ”€β”€ PooledHttpFetcher  (reqwest, proxy pool rotation)
  └── BrowserFetcher     (chromiumoxide, headless browser)
        └── BrowserPool (shared process, tab semaphore)
              β”œβ”€β”€ Lightpanda (default, 59MB, <100ms startup)
              └── Chrome (fallback, 200MB, 1-2s startup)
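The layering above works because engines hold the fetcher as a trait object, so the transport (plain HTTP, pooled proxies, headless browser) can be swapped without touching engine code. A minimal std-only sketch of that pattern (illustrative; the crate's real `PageFetcher` trait is async and returns `Result`, and these type names are hypothetical):

```rust
/// Simplified stand-in for the fetcher abstraction.
trait Fetcher {
    fn fetch(&self, url: &str) -> String;
}

/// Stand-in for HttpFetcher: direct connection.
struct Direct;
impl Fetcher for Direct {
    fn fetch(&self, url: &str) -> String {
        format!("direct:{url}")
    }
}

/// Stand-in for PooledHttpFetcher: routes through a named proxy.
struct ViaProxy(&'static str);
impl Fetcher for ViaProxy {
    fn fetch(&self, url: &str) -> String {
        format!("{}:{url}", self.0)
    }
}

fn main() {
    // Engines would hold a `Box<dyn Fetcher>` (or `Arc<dyn PageFetcher>`
    // in the real crate), so the transport is swappable at construction.
    let fetchers: Vec<Box<dyn Fetcher>> =
        vec![Box::new(Direct), Box::new(ViaProxy("proxy-a"))];
    for f in &fetchers {
        println!("{}", f.fetch("https://example.com"));
    }
}
```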

Quick Start

Installation

Add to your Cargo.toml:

```toml
[dependencies]
a3s-search = "0.8"
tokio = { version = "1", features = ["full"] }

# To disable headless browser support:
# a3s-search = { version = "0.8", default-features = false }
```

Basic Search

```rust
use a3s_search::{Search, SearchQuery, engines::DuckDuckGo};

let mut search = Search::new();
search.add_engine(DuckDuckGo::new());

let query = SearchQuery::new("rust async");
let results = search.search(query).await?;

println!("Found {} results", results.count);
```

Chinese Search (δΈ­ζ–‡ζœη΄’)

```rust
use a3s_search::{Search, SearchQuery, engines::{Sogou, So360}};

let mut search = Search::new();
search.add_engine(Sogou::new());      // ζœη‹—
search.add_engine(So360::new());      // 360搜紒

let query = SearchQuery::new("Rust 编程语言");
let results = search.search(query).await?;
```

Query Options

```rust
use a3s_search::{SearchQuery, EngineCategory, SafeSearch, TimeRange};

let query = SearchQuery::new("rust tutorial")
    .with_categories(vec![EngineCategory::General])
    .with_language("en-US")
    .with_safesearch(SafeSearch::Moderate)
    .with_page(1)
    .with_time_range(TimeRange::Month);
```

Custom Engine Weights

```rust
use a3s_search::{Search, EngineConfig, engines::Wikipedia};

// Wikipedia results will have 1.5x weight
let wiki = Wikipedia::new().with_config(EngineConfig {
    name: "Wikipedia".to_string(),
    shortcut: "wiki".to_string(),
    weight: 1.5,
    ..Default::default()
});

let mut search = Search::new();
search.add_engine(wiki);
```

Using Proxy Pool (Anti-Crawler Protection)

```rust
use std::sync::Arc;
use a3s_search::{Search, SearchQuery, PooledHttpFetcher, PageFetcher};
use a3s_search::engines::{DuckDuckGo, DuckDuckGoParser};
use a3s_search::proxy::{ProxyPool, ProxyConfig, ProxyProtocol, ProxyStrategy};

// Create a proxy pool with multiple proxies
let pool = Arc::new(ProxyPool::with_proxies(vec![
    ProxyConfig::new("proxy1.example.com", 8080),
    ProxyConfig::new("proxy2.example.com", 8080)
        .with_protocol(ProxyProtocol::Socks5),
    ProxyConfig::new("proxy3.example.com", 8080)
        .with_auth("username", "password"),
]).with_strategy(ProxyStrategy::RoundRobin));

// PooledHttpFetcher rotates proxies per request
let fetcher: Arc<dyn PageFetcher> = Arc::new(PooledHttpFetcher::new(Arc::clone(&pool)));

let mut search = Search::new();
search.add_engine(DuckDuckGo::with_fetcher(DuckDuckGoParser, fetcher));

let query = SearchQuery::new("rust programming");
let results = search.search(query).await?;

// Toggle proxy pool at runtime (thread-safe via AtomicBool)
pool.set_enabled(false);  // direct connection
pool.set_enabled(true);   // re-enable rotation
```

Dynamic Proxy Provider

```rust
use std::sync::Arc;
use a3s_search::proxy::{ProxyPool, ProxyConfig, ProxyProvider, spawn_auto_refresh};
use async_trait::async_trait;
use std::time::Duration;

// Implement a custom proxy provider (e.g., from an API, Redis, or a database)
struct MyProxyProvider {
    api_url: String,
}

#[async_trait]
impl ProxyProvider for MyProxyProvider {
    async fn fetch_proxies(&self) -> a3s_search::Result<Vec<ProxyConfig>> {
        // Fetch proxies from your API β€” format is up to you
        Ok(vec![
            ProxyConfig::new("dynamic-proxy.example.com", 8080),
        ])
    }

    fn refresh_interval(&self) -> Duration {
        Duration::from_secs(60) // Refresh every minute
    }
}

// Use with the auto-refresh background task
let pool = Arc::new(ProxyPool::with_provider(
    MyProxyProvider { api_url: "https://api.example.com/proxies".into() }
));
let _refresh_handle = spawn_auto_refresh(Arc::clone(&pool));
// The pool now auto-refreshes every 60 seconds
```

Implementing Custom Engines

```rust
use a3s_search::{Engine, EngineConfig, EngineCategory, SearchQuery, SearchResult, Result};
use async_trait::async_trait;

struct MySearchEngine {
    config: EngineConfig,
}

impl MySearchEngine {
    fn new() -> Self {
        Self {
            config: EngineConfig {
                name: "MyEngine".to_string(),
                shortcut: "my".to_string(),
                categories: vec![EngineCategory::General],
                weight: 1.0,
                timeout: 5,
                enabled: true,
                paging: false,
                safesearch: false,
            },
        }
    }
}

#[async_trait]
impl Engine for MySearchEngine {
    fn config(&self) -> &EngineConfig {
        &self.config
    }

    async fn search(&self, query: &SearchQuery) -> Result<Vec<SearchResult>> {
        // Implement your search logic here
        Ok(vec![
            SearchResult::new(
                "https://example.com",
                "Example Result",
                "This is an example search result"
            )
        ])
    }
}
```

API Reference

Search

| Method | Description |
|--------|-------------|
| `new()` | Create a new search instance |
| `with_health_config(config)` | Create with health monitoring |
| `add_engine(engine)` | Add a search engine |
| `set_timeout(duration)` | Set default search timeout |
| `engine_count()` | Get number of configured engines |
| `search(query)` | Perform a search |

SearchQuery

| Method | Description |
|--------|-------------|
| `new(query)` | Create a new query |
| `with_categories(cats)` | Set target categories |
| `with_language(lang)` | Set language/locale |
| `with_safesearch(level)` | Set safe search level |
| `with_page(page)` | Set page number |
| `with_time_range(range)` | Set time range filter |
| `with_engines(engines)` | Limit to specific engines |

SearchResult

| Field | Type | Description |
|-------|------|-------------|
| `url` | `String` | Result URL |
| `title` | `String` | Result title |
| `content` | `String` | Result snippet |
| `result_type` | `ResultType` | Type of result |
| `engines` | `HashSet<String>` | Engines that found this |
| `positions` | `Vec<u32>` | Positions in each engine |
| `score` | `f64` | Calculated ranking score |
| `thumbnail` | `Option<String>` | Thumbnail URL |
| `published_date` | `Option<String>` | Publication date |
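The `engines`, `positions`, and `score` fields come out of the aggregation step: duplicate URLs are merged after normalization, then each merged result is scored by consensus. A simplified sketch of merge-then-score, with a stand-in normalizer and an assumed `weight / position` contribution per engine (the crate's exact normalization and formula may differ):

```rust
use std::collections::HashMap;

/// One (engine weight, 1-based position) observation of a URL.
type Hit = (f64, u32);

/// Simplified URL normalization, a stand-in for the crate's version.
fn normalize(url: &str) -> String {
    url.trim_end_matches('/').to_lowercase()
}

/// Merge duplicate URLs, then score each by consensus: every engine that
/// found the result contributes weight / position, so agreement between
/// engines and high placement both raise the score.
fn aggregate(raw: Vec<(&str, Hit)>) -> HashMap<String, f64> {
    let mut merged: HashMap<String, Vec<Hit>> = HashMap::new();
    for (url, hit) in raw {
        merged.entry(normalize(url)).or_default().push(hit);
    }
    merged
        .into_iter()
        .map(|(url, hits)| {
            let score: f64 = hits.iter().map(|&(w, p)| w / p as f64).sum();
            (url, score)
        })
        .collect()
}

fn main() {
    let scores = aggregate(vec![
        ("https://Example.com/", (1.0, 1)), // engine A, position 1
        ("https://example.com", (1.5, 2)),  // engine B (weight 1.5), position 2
        ("https://other.com", (1.0, 1)),    // only engine A
    ]);
    // The two example.com hits merged into one entry...
    assert_eq!(scores.len(), 2);
    // ...whose consensus score (1.0 + 0.75) beats the single-engine result.
    assert!(scores["https://example.com"] > scores["https://other.com"]);
}
```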

SearchResults

| Method | Description |
|--------|-------------|
| `items()` | Get result slice |
| `suggestions()` | Get query suggestions |
| `answers()` | Get direct answers |
| `count` | Number of results |
| `duration_ms` | Search duration in ms |

Engine Trait

```rust
#[async_trait]
pub trait Engine: Send + Sync {
    /// Returns the engine configuration
    fn config(&self) -> &EngineConfig;

    /// Performs a search and returns results
    async fn search(&self, query: &SearchQuery) -> Result<Vec<SearchResult>>;

    /// Returns the engine name
    fn name(&self) -> &str { &self.config().name }

    /// Returns the engine shortcut
    fn shortcut(&self) -> &str { &self.config().shortcut }

    /// Returns the engine weight
    fn weight(&self) -> f64 { self.config().weight }

    /// Returns whether the engine is enabled
    fn is_enabled(&self) -> bool { self.config().enabled }
}
```

EngineConfig

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `name` | `String` | - | Display name |
| `shortcut` | `String` | - | Short identifier |
| `categories` | `Vec<EngineCategory>` | `[General]` | Categories |
| `weight` | `f64` | `1.0` | Ranking weight |
| `timeout` | `u64` | `5` | Timeout in seconds |
| `enabled` | `bool` | `true` | Is enabled |
| `paging` | `bool` | `false` | Supports pagination |
| `safesearch` | `bool` | `false` | Supports safe search |

ProxyPool

| Method | Description |
|--------|-------------|
| `new()` | Create empty proxy pool (disabled) |
| `with_proxies(proxies)` | Create with static proxy list |
| `with_provider(provider)` | Create with dynamic provider |
| `with_strategy(strategy)` | Set selection strategy |
| `set_enabled(bool)` | Enable/disable proxy pool (thread-safe, `&self`) |
| `is_enabled()` | Check if enabled |
| `refresh()` | Refresh proxies from provider |
| `get_proxy()` | Get next proxy (based on strategy) |
| `add_proxy(proxy)` | Add a proxy to pool |
| `remove_proxy(host, port)` | Remove a proxy |
| `len()` | Number of proxies in pool |
| `create_client(user_agent)` | Create HTTP client with proxy |

PooledHttpFetcher

| Method | Description |
|--------|-------------|
| `new(pool)` | Create with `Arc<ProxyPool>`; rotates the proxy on each request |
| `with_timeout(duration)` | Set request timeout (default: 30s) |

spawn_auto_refresh

```rust
pub fn spawn_auto_refresh(pool: Arc<ProxyPool>) -> tokio::task::JoinHandle<()>
```

Spawns a background task that periodically calls pool.refresh() based on the provider's refresh_interval(). Returns a handle that can be aborted to stop refreshing.

HealthMonitor / HealthConfig

| Field/Method | Description |
|--------------|-------------|
| `HealthConfig { max_failures, suspend_duration }` | Configure failure threshold and suspension time |
| `Search::with_health_config(config)` | Create search with health monitoring |

Engines are automatically suspended after `max_failures` consecutive failures and re-enabled after `suspend_duration` elapses.

SearchConfig (HCL)

| Method | Description |
|--------|-------------|
| `SearchConfig::load(path)` | Load config from `.hcl` file |
| `SearchConfig::parse(content)` | Parse HCL string |
| `health_config()` | Get `HealthConfig` from config |
| `enabled_engines()` | Get list of enabled engine shortcuts |

Example HCL config:

```hcl
timeout = 10

health {
  max_failures    = 5
  suspend_seconds = 120
}

engine "ddg" {
  enabled = true
  weight  = 1.0
}

engine "bing" {
  enabled = true
  weight  = 1.2
}
```

ProxyConfig

| Method | Description |
|--------|-------------|
| `new(host, port)` | Create HTTP proxy config |
| `with_protocol(protocol)` | Set protocol (Http/Https/Socks5) |
| `with_auth(user, pass)` | Set authentication |
| `url()` | Get proxy URL string |

ProxyStrategy

| Variant | Description |
|---------|-------------|
| `RoundRobin` | Rotate through proxies sequentially |
| `Random` | Select a random proxy each time |

Development

Dependencies

| Dependency | Install | Purpose |
|------------|---------|---------|
| `cargo-llvm-cov` | `cargo install cargo-llvm-cov` | Code coverage (optional) |
| `lcov` | `brew install lcov` / `apt install lcov` | Coverage report formatting (optional) |
| Chrome/Chromium | Auto-installed | Headless browser engines (auto-downloaded if not found) |

Build Commands

```bash
# Build (default, all 9 engines including headless)
just build

# Build without headless browser support (6 engines)
just build --no-default-features

# Build release
just release

# Test (with colored progress display)
just test                    # All tests with pretty output
just test-raw                # Raw cargo output
just test-v                  # Verbose output (--nocapture)
just test-one TEST           # Run specific test

# Test subsets
just test-engine             # Engine module tests
just test-query              # Query module tests
just test-result             # Result module tests
just test-search             # Search module tests
just test-aggregator         # Aggregator module tests
just test-proxy              # Proxy module tests
just test-error              # Error module tests

# Coverage (requires cargo-llvm-cov)
just test-cov                # Pretty coverage with progress
just cov                     # Terminal coverage report
just cov-html                # HTML report (opens in browser)
just cov-table               # File-by-file table
just cov-ci                  # Generate lcov.info for CI
just cov-module proxy        # Coverage for specific module

# Format & Lint
just fmt                     # Format code
just fmt-check               # Check formatting
just lint                    # Clippy lint
just ci                      # Full CI checks (fmt + lint + test)

# Utilities
just check                   # Fast compile check
just watch                   # Watch and rebuild
just doc                     # Generate and open docs
just clean                   # Clean build artifacts
just update                  # Update dependencies
```

Releasing

See RELEASE.md for detailed release instructions.

Quick release:

```bash
# Check GitHub secrets are configured
./scripts/check-secrets.sh

# Release new version (runs tests, commits, tags, pushes)
./scripts/release.sh 0.9.0

# Monitor CI/CD progress
gh run watch --repo A3S-Lab/Search
```

The release workflow automatically:

  • βœ… Runs CI checks (fmt, clippy, tests)
  • πŸ“¦ Publishes to crates.io
  • 🐍 Publishes Python SDK to PyPI (7 platforms)
  • πŸ“¦ Publishes Node.js SDK to npm (7 platforms)
  • 🍺 Updates Homebrew formula
  • πŸŽ‰ Creates GitHub Release with CLI binaries

Project Structure

search/
β”œβ”€β”€ Cargo.toml
β”œβ”€β”€ justfile
β”œβ”€β”€ README.md
β”œβ”€β”€ RELEASE.md               # Release guide
β”œβ”€β”€ .github/
β”‚   β”œβ”€β”€ setup-workspace.sh   # CI workspace restructuring
β”‚   └── workflows/
β”‚       β”œβ”€β”€ ci.yml           # Push/PR checks
β”‚       β”œβ”€β”€ release.yml      # Tag-triggered release
β”‚       β”œβ”€β”€ publish-node.yml # Node SDK publishing
β”‚       └── publish-python.yml # Python SDK publishing
β”œβ”€β”€ scripts/
β”‚   β”œβ”€β”€ release.sh           # Automated release script
β”‚   └── check-secrets.sh     # Check GitHub secrets
β”œβ”€β”€ examples/
β”‚   β”œβ”€β”€ basic_search.rs      # Basic usage example
β”‚   └── chinese_search.rs    # Chinese engines example
β”œβ”€β”€ tests/
β”‚   └── integration.rs       # Integration tests (network-dependent)
β”œβ”€β”€ sdk/
β”‚   β”œβ”€β”€ node/                # TypeScript SDK (NAPI-RS)
β”‚   β”‚   β”œβ”€β”€ Cargo.toml       # Rust cdylib crate
β”‚   β”‚   β”œβ”€β”€ src/             # Rust NAPI bindings
β”‚   β”‚   β”œβ”€β”€ lib/             # TypeScript wrappers
β”‚   β”‚   β”œβ”€β”€ tests/           # vitest tests (49 tests)
β”‚   β”‚   └── package.json
β”‚   └── python/              # Python SDK (PyO3)
β”‚       β”œβ”€β”€ Cargo.toml       # Rust cdylib crate
β”‚       β”œβ”€β”€ src/             # Rust PyO3 bindings
β”‚       β”œβ”€β”€ a3s_search/      # Python wrappers
β”‚       β”œβ”€β”€ tests/           # pytest tests (54 tests)
β”‚       └── pyproject.toml
└── src/
    β”œβ”€β”€ main.rs              # CLI entry point
    β”œβ”€β”€ lib.rs               # Library entry point
    β”œβ”€β”€ engine.rs            # Engine trait and config
    β”œβ”€β”€ error.rs             # Error types
    β”œβ”€β”€ query.rs             # SearchQuery
    β”œβ”€β”€ result.rs            # SearchResult, SearchResults
    β”œβ”€β”€ aggregator.rs        # Result aggregation and ranking
    β”œβ”€β”€ search.rs            # Search orchestrator with HealthMonitor
    β”œβ”€β”€ config.rs            # HCL configuration loading
    β”œβ”€β”€ health.rs            # HealthMonitor, HealthConfig
    β”œβ”€β”€ proxy.rs             # ProxyPool, ProxyProvider, spawn_auto_refresh
    β”œβ”€β”€ fetcher.rs           # PageFetcher trait, WaitStrategy
    β”œβ”€β”€ fetcher_http.rs      # HttpFetcher + PooledHttpFetcher
    β”œβ”€β”€ html_engine.rs       # HtmlEngine<P> generic engine framework
    β”œβ”€β”€ browser.rs           # BrowserPool, BrowserFetcher (headless browser)
    β”œβ”€β”€ browser_setup.rs     # Chrome auto-detection and download
    └── engines/
        β”œβ”€β”€ mod.rs           # Engine exports
        β”œβ”€β”€ duckduckgo.rs    # DuckDuckGo
        β”œβ”€β”€ brave.rs         # Brave Search
        β”œβ”€β”€ bing.rs          # Bing International
        β”œβ”€β”€ google.rs        # Google (headless browser)
        β”œβ”€β”€ wikipedia.rs     # Wikipedia
        β”œβ”€β”€ baidu.rs         # Baidu (η™ΎεΊ¦, headless browser)
        β”œβ”€β”€ bing_china.rs    # Bing China (εΏ…εΊ”δΈ­ε›½, headless browser)
        β”œβ”€β”€ sogou.rs         # Sogou (ζœη‹—)
        └── so360.rs         # 360 Search (360搜紒)

A3S Ecosystem

A3S Search is a utility component of the A3S ecosystem.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    A3S Ecosystem                     β”‚
β”‚                                                      β”‚
β”‚  Infrastructure:  a3s-box     (MicroVM sandbox)     β”‚
β”‚                      β”‚                               β”‚
β”‚  Application:     a3s-code    (AI coding agent)     β”‚
β”‚                    /   \                             β”‚
β”‚  Utilities:   a3s-lane  a3s-context  a3s-search    β”‚
β”‚               (queue)   (memory)     (search)       β”‚
β”‚                                          β–²          β”‚
β”‚                                          β”‚          β”‚
β”‚                                    You are here     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Standalone Usage: a3s-search works independently for any meta search needs:

  • AI agents needing web search capabilities
  • Privacy-focused search aggregation
  • Research tools requiring multi-source results
  • Any application needing unified search across engines

Roadmap

Phase 1: Core βœ… (Complete)

  • Engine trait abstraction
  • Result deduplication by URL
  • Consensus-based ranking algorithm
  • Parallel async search execution
  • Per-engine timeout handling
  • 9 built-in engines (5 international + 4 Chinese)
  • Bing International engine (HTTP, no headless required)
  • Headless browser support for JS-rendered engines (Google, Baidu, Bing China β€” enabled by default)
  • PageFetcher abstraction (HttpFetcher + PooledHttpFetcher + BrowserFetcher)
  • BrowserPool with tab concurrency control
  • Dynamic proxy pool with pluggable ProxyProvider trait and spawn_auto_refresh
  • PooledHttpFetcher for per-request proxy IP rotation
  • Runtime proxy pool toggle via AtomicBool (set_enabled(&self))
  • Health monitoring with automatic engine suspension and recovery
  • HCL configuration file loading for engines and health settings
  • CLI tool with Homebrew distribution
  • Automatic Chrome detection and download (Chrome for Testing)
  • Proxy support for all engines via -p flag (HTTP/HTTPS/SOCKS5)
  • UTF-8 safe content truncation for CJK/emoji
  • Native SDKs: TypeScript (NAPI-RS) and Python (PyO3) with dynamic proxy pool management
  • SDK proxy pool: setProxyPool(), setProxyPoolEnabled(), per-request proxyPool option

Phase 2: SDK Headless Support βœ… (v0.8.0)

  • Enable headless feature in Python and Node SDKs (all 9 engines available)
  • ensure_chrome() / ensure_chrome_sync() bindings for Python and Node SDKs
  • Python post-install: a3s-search-setup CLI + python -m a3s_search.ensure_chrome
  • Node post-install: automatic Chrome download on npm install

Community

Join us on Discord for questions, discussions, and updates.

License

MIT
