Used by 18k+ projects • ~300k weekly downloads
Lunr Languages is an extension for Lunr.js that enables fast, multilingual full-text search across dozens of languages — in the browser or Node.js.
Originally built for classic search, it is now widely used as a lightweight retrieval layer in AI systems, including:
- Retrieval-Augmented Generation (RAG)
- Hybrid search (keyword + vector)
- Local-first / edge AI apps
- Static site search and documentation search
⭐ If this project saves you time or powers something important, consider starring it or supporting its maintenance.
German
French
Spanish
Italian
Dutch
Danish
Portuguese
Finnish
Romanian
Hungarian
Russian
Norwegian
Swedish
Turkish
Japanese
Thai
Arabic
Chinese (see note 1 below)
Vietnamese
Sanskrit
Kannada
Telugu
Hindi
Tamil
Korean
Armenian
Hebrew
Greek
Note 1: Chinese tokenization uses Intl.Segmenter with CJK bigrams by default, which works in modern browsers and Node.js without native dependencies. In Node.js, if @node-rs/jieba is installed, Lunr Languages uses it automatically for higher-quality Jieba segmentation. Browsers must support Intl.Segmenter; there is no frontend fallback.
Modern AI systems don’t replace search — they depend on it.
Before an LLM can generate an answer, it needs relevant context. That’s where Lunr Languages fits:
Filter thousands of documents down to a small candidate set before embedding or reranking.
Tokenization, stemming, and stopwords for 30+ languages — still a hard problem in AI pipelines.
Runs entirely in the browser or Node.js. No vector DB required.
Perfect for:
- in-browser AI assistants
- local knowledge bases
- on-device search
User query
→ Lunr (keyword search, multilingual)
→ top 100–500 documents
→ embeddings / reranker
→ LLM generates answer
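The flow above can be sketched in a few lines. This is a minimal illustration, not part of Lunr Languages: `keywordCandidates` and `answer` are hypothetical helper names, and `rerank` and `generate` stand in for whatever embedding/reranking step and LLM call you use. The only assumption about the index is that it exposes a Lunr-style `search(query)` returning hits sorted by descending score, each with a `ref`.

```javascript
// Stage 1: keyword pre-filter.
// `index` is assumed to expose a Lunr-style search(query) that returns
// [{ ref, score, ... }] sorted by descending score.
function keywordCandidates(index, query, limit = 200) {
  return index
    .search(query)
    .slice(0, limit)
    .map((hit) => hit.ref);
}

// Stages 2 and 3: `rerank` and `generate` are hypothetical callbacks
// supplied by the caller (for example an embedding reranker and an LLM client).
async function answer(index, query, { rerank, generate, limit = 200, topK = 5 }) {
  const candidates = keywordCandidates(index, query, limit);
  if (candidates.length === 0) {
    // Nothing matched lexically: let the caller decide how to respond.
    return generate(query, []);
  }
  const ranked = await rerank(query, candidates);
  return generate(query, ranked.slice(0, topK));
}
```

Keeping the keyword stage as a separate function makes it easy to tune `limit` independently of the more expensive reranking step.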
Compared with language-agnostic tokenization, Lunr Languages improves recall and precision, especially for:
- non-English content
- inflected languages
- mixed-language datasets
npm install lunr-languages

const lunr = require('lunr');
require('lunr-languages/lunr.stemmer.support')(lunr);
require('lunr-languages/lunr.de')(lunr);

// Build a German index; matches in the title are weighted above body matches.
const idx = lunr(function () {
  this.use(lunr.de);
  this.field('title', { boost: 10 });
  this.field('body');
  // Lunr uses the `id` field as the document reference by default.
  this.add({ id: 1, title: 'Dokument', body: 'Beispieltext' });
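});

// Multi-language index. Each non-English language file must be loaded
// before lunr.multi (lunr.de is already loaded above).
require('lunr-languages/lunr.ru')(lunr);
require('lunr-languages/lunr.multi')(lunr);

const multiIdx = lunr(function () {
  this.use(lunr.multiLanguage('en', 'ru', 'de'));
  this.field('title');
  this.field('body');
});

Chinese support is designed to work without mandatory native binaries: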
- In browsers, lunr.zh uses Intl.Segmenter plus CJK bigrams. If Intl.Segmenter is unavailable, it logs an error and throws because there is no bundled browser fallback.
- In Node.js, lunr.zh first tries to load @node-rs/jieba. If it is installed, it is used for better Chinese segmentation. If it is not installed, Lunr Languages logs an informational message and falls back to Intl.Segmenter plus CJK bigrams.
- If neither @node-rs/jieba nor Intl.Segmenter is available in Node.js, Chinese tokenization logs an error and throws.
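The selection order described in this list can be pictured roughly as follows. This is an illustrative sketch only, not the library's source: `pickChineseSegmenter` is a hypothetical name, the returned strings are made up for the example, and the real code also logs messages, which is omitted here.

```javascript
// Decide which Chinese segmentation strategy is available in this runtime.
// Hypothetical helper; it mirrors the documented order, not the actual implementation.
function pickChineseSegmenter() {
  try {
    // Optional dependency: only resolvable if `npm install @node-rs/jieba` was run.
    // In a browser `require` does not exist, so this throws and we fall through.
    require('@node-rs/jieba');
    return 'jieba';
  } catch (err) {
    // Not installed or not loadable: try the built-in segmenter instead.
  }
  if (typeof Intl !== 'undefined' && typeof Intl.Segmenter === 'function') {
    return 'intl-segmenter';
  }
  throw new Error('Chinese tokenization needs @node-rs/jieba or Intl.Segmenter');
}
```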
The Intl.Segmenter fallback avoids native package supply-chain risk and works well for lightweight search, but it is not identical to Jieba. Bigrams improve recall for common two-character search terms such as 车主 and 学姐, while Jieba generally provides better precision and ranking for serious Chinese search.
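To make the bigram idea concrete, here is a small sketch of combining Intl.Segmenter word segments with overlapping two-character bigrams. It is an assumption-laden illustration rather than the tokenizer Lunr Languages ships: `cjkBigrams` and `tokenizeChinese` are hypothetical names, and only Han characters are treated as CJK here.

```javascript
// Matches a single Han character (no `g` flag, so test() keeps no state).
const HAN = /\p{Script=Han}/u;

// Emit every overlapping pair of adjacent Han characters.
function cjkBigrams(text) {
  const chars = Array.from(text); // iterate by code point, not UTF-16 unit
  const bigrams = [];
  for (let i = 0; i + 1 < chars.length; i++) {
    if (HAN.test(chars[i]) && HAN.test(chars[i + 1])) {
      bigrams.push(chars[i] + chars[i + 1]);
    }
  }
  return bigrams;
}

// Union of word-like segments from Intl.Segmenter and the bigrams above.
function tokenizeChinese(text) {
  const segmenter = new Intl.Segmenter('zh', { granularity: 'word' });
  const words = [];
  for (const { segment, isWordLike } of segmenter.segment(text)) {
    if (isWordLike) words.push(segment);
  }
  return [...new Set([...words, ...cjkBigrams(text)])];
}
```

Because the bigrams are generated independently of the segmenter, a query such as 车主 still matches even when the segmenter splits the surrounding sentence differently.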
To opt into Jieba tokenization in Node.js:
npm install @node-rs/jieba

Lunr Languages is commonly used as:
- Pre-filter for vector search
- Fallback when embeddings fail
- Client-side retrieval for AI apps
- Static / documentation search
👉 In practice, hybrid search (keyword + vector) tends to outperform either approach on its own.
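One common way to merge a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). RRF is a general technique, not something Lunr Languages provides; the sketch below assumes each ranking is an array of document ids in best-first order, and `reciprocalRankFusion` and the constant `k = 60` are illustrative choices.

```javascript
// Merge several best-first rankings of document ids into one ranking.
// Each document scores 1 / (k + rank) per list it appears in (rank starts at 1).
function reciprocalRankFusion(rankings, k = 60) {
  const scores = new Map();
  for (const ranking of rankings) {
    ranking.forEach((id, i) => {
      scores.set(id, (scores.get(id) || 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Example: keyword hits (e.g. refs from a Lunr search) fused with vector hits.
const fused = reciprocalRankFusion([
  ['a', 'b', 'c'], // keyword ranking
  ['b', 'c', 'd'], // vector ranking
]);
// fused is ['b', 'c', 'a', 'd']: documents found by both methods rise to the top.
```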
To provide high-quality search across languages, Lunr Languages applies these processing steps:
- Tokenization — language-aware splitting (including Japanese, Chinese, etc.)
- Stemming — matches different word forms
- Stopword filtering — removes noise
- Trimming — normalizes tokens
These steps improve both classic search and AI retrieval pipelines.
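As a rough picture of what such a pipeline does to a piece of text, here is a toy version in plain JavaScript. Everything in it is a simplification for illustration: the stopword list, the suffix-stripping `stem` function, and the order of the steps are assumptions, and the real language packs use proper per-language stemmers and word lists.

```javascript
// Toy English stopword list (the real lists are much longer and per-language).
const STOPWORDS = new Set(['the', 'a', 'and', 'of']);

// Tokenization: lowercase and split on whitespace.
const tokenize = (text) => text.toLowerCase().split(/\s+/).filter(Boolean);

// Trimming: strip leading/trailing characters that are not letters or digits.
const trim = (token) => token.replace(/^[^\p{L}\p{N}]+|[^\p{L}\p{N}]+$/gu, '');

// Stemming: naive suffix stripping, standing in for a real stemmer.
function stem(token) {
  if (token.length > 5 && token.endsWith('ing')) return token.slice(0, -3);
  if (token.length > 3 && token.endsWith('s')) return token.slice(0, -1);
  return token;
}

// Full pipeline: tokenize, trim, drop stopwords and empty tokens, then stem.
function normalize(text) {
  return tokenize(text)
    .map((token) => trim(token))
    .filter((token) => token !== '' && !STOPWORDS.has(token))
    .map((token) => stem(token));
}

console.log(normalize('The cats and the jumping dogs!'));
// → [ 'cat', 'jump', 'dog' ]
```

The same normalization has to run on both documents and queries, which is why the language plugin is registered on the index itself.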
Use Lunr Languages when you need:
- fast, deterministic keyword matching
- multilingual normalization
- offline / browser-based search
- low-cost retrieval
Combine with embeddings for:
- semantic similarity
- fuzzy concept matching
Want to add a new language?
See CONTRIBUTING.md
Maintained as an open-source project for over a decade.
If your company relies on this in production:
- consider sponsoring
- or contributing improvements
It helps keep the ecosystem stable.
Even in an AI-first world, retrieval is the bottleneck.
Lunr Languages ensures the right content reaches your models — fast, locally, and across languages.