MihaiValentin/lunr-languages
Lunr Languages — Multilingual Search for AI, RAG & Local-First Apps

Used by 18k+ projects • ~300k weekly downloads

Lunr Languages is an extension for Lunr.js that enables fast, multilingual full-text search across dozens of languages — in the browser or Node.js.

Originally built for classic search, it is now widely used as a lightweight retrieval layer in AI systems, including:

  • Retrieval-Augmented Generation (RAG)
  • Hybrid search (keyword + vector)
  • Local-first / edge AI apps
  • Static site search and documentation search

⭐ If this project saves you time or powers something important, consider starring it or supporting its maintenance.


Supported Languages

  • German
  • French
  • Spanish
  • Italian
  • Dutch
  • Danish
  • Portuguese
  • Finnish
  • Romanian
  • Hungarian
  • Russian
  • Norwegian
  • Swedish
  • Turkish
  • Japanese
  • Thai
  • Arabic
  • Chinese [1]
  • Vietnamese
  • Sanskrit
  • Kannada
  • Telugu
  • Hindi
  • Tamil
  • Korean
  • Armenian
  • Hebrew
  • Greek

Contribute a new language


[1] Chinese tokenization uses Intl.Segmenter with CJK bigrams by default, which works in modern browsers and Node.js without native dependencies. In Node.js, if @node-rs/jieba is installed, Lunr Languages uses it automatically for higher-quality Jieba segmentation. Browsers must support Intl.Segmenter; there is no frontend fallback.


Why Lunr Languages in an AI world?

Modern AI systems don’t replace search — they depend on it.

Before an LLM can generate an answer, it needs relevant context. That’s where Lunr Languages fits:

🔎 Fast and consistent lexical retrieval

Filter thousands of documents down to a small candidate set before embedding or reranking.

🌍 Multilingual support out of the box

Tokenization, stemming, and stopwords for 30+ languages — still a hard problem in AI pipelines.

⚡ Zero infrastructure

Runs entirely in the browser or Node.js. No vector DB required.

🔒 Privacy-friendly / offline-ready

Perfect for:

  • in-browser AI assistants
  • local knowledge bases
  • on-device search

Example: Hybrid Search (Keyword + AI)

User query
→ Lunr (keyword search, multilingual)
→ top 100–500 documents
→ embeddings / reranker
→ LLM generates answer

Lunr Languages improves recall and precision, especially for:

  • non-English content
  • inflected languages
  • mixed-language datasets

Installation

npm install lunr-languages

Usage

Basic example (German)

const lunr = require('lunr');

// lunr.stemmer.support is required by all language plugins
require('lunr-languages/lunr.stemmer.support')(lunr);
require('lunr-languages/lunr.de')(lunr);

const idx = lunr(function () {
  // apply the German tokenizer, stemmer and stopword filter
  this.use(lunr.de);

  this.field('title', { boost: 10 });
  this.field('body');

  this.add({ title: 'Dokument', body: 'Beispieltext' });
});

Multi-language indexing

// every language passed to lunr.multiLanguage must be loaded first
require('lunr-languages/lunr.stemmer.support')(lunr);
require('lunr-languages/lunr.ru')(lunr);
require('lunr-languages/lunr.de')(lunr);
require('lunr-languages/lunr.multi')(lunr);

const idx = lunr(function () {
  // combines the English, Russian and German pipelines
  this.use(lunr.multiLanguage('en', 'ru', 'de'));

  this.field('title');
  this.field('body');
});

Chinese Tokenization

Chinese support is designed to work without mandatory native binaries:

  • In browsers, lunr.zh uses Intl.Segmenter plus CJK bigrams. If Intl.Segmenter is unavailable, tokenization logs an error and throws; there is no bundled browser fallback.
  • In Node.js, lunr.zh first tries to load @node-rs/jieba. If it is installed, it is used for higher-quality Chinese segmentation; if not, Lunr Languages logs an informational message and falls back to Intl.Segmenter plus CJK bigrams.
  • If neither @node-rs/jieba nor Intl.Segmenter is available in Node.js, Chinese tokenization logs an error and throws.

The Intl.Segmenter fallback avoids native package supply-chain risk and works well for lightweight search, but it is not identical to Jieba. Bigrams improve recall for common two-character search terms such as 车主 and 学姐, while Jieba generally provides better precision and ranking for serious Chinese search.

To opt into Jieba tokenization in Node.js:

npm install @node-rs/jieba
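
To see why bigrams help recall, here is a minimal sketch of the bigram idea in plain JavaScript. `cjkBigrams` is a hypothetical helper for illustration, not the library's internal code: a run of CJK characters is expanded into overlapping two-character tokens, so a two-character query like 车主 matches a document containing 车主服务.

```javascript
// Expand a run of CJK characters into overlapping two-character tokens.
function cjkBigrams(run) {
  const chars = Array.from(run); // code-point-safe split
  if (chars.length <= 1) return chars;
  const tokens = [];
  for (let i = 0; i + 1 < chars.length; i++) {
    tokens.push(chars[i] + chars[i + 1]);
  }
  return tokens;
}

cjkBigrams('车主服务'); // → ['车主', '主服', '服务']
```

Because both the indexed text and the query are expanded the same way, the query bigram 车主 hits the first token of 车主服务 without any dictionary; Jieba's dictionary-based segmentation instead emits whole words, which is what gives it better precision.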

Where this fits in modern architectures

Lunr Languages is commonly used as:

  • Pre-filter for vector search
  • Fallback when embeddings fail
  • Client-side retrieval for AI apps
  • Static / documentation search

👉 In practice, hybrid search (keyword + vector) performs best.


How it works

To provide high-quality search across languages, Lunr Languages applies four text-processing steps:

  • Tokenization — language-aware splitting (including Japanese, Chinese, etc.)
  • Stemming — matches different word forms
  • Stopword filtering — removes noise
  • Trimming — normalizes tokens

These steps improve both classic search and AI retrieval pipelines.
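
As a toy illustration of how the four stages compose (hypothetical helpers for demonstration, not lunr-languages internals — the real plugins register Snowball stemmers and per-language stopword lists in Lunr's pipeline):

```javascript
// Tiny stand-ins for each stage, applied in pipeline order.
const STOPWORDS = new Set(['the', 'a', 'of']);

const tokenize = text => text.split(/\s+/);                      // split into tokens
const trim = token => token.toLowerCase().replace(/^\W+|\W+$/g, ''); // normalize
const notStopword = token => !STOPWORDS.has(token);              // drop noise words
const stem = token => token.replace(/(ing|s)$/, '');             // naive English stemmer

const tokens = tokenize('The cars, racing!')
  .map(trim)
  .filter(notStopword)
  .map(stem);
// → ['car', 'rac']
```

After stemming, 'cars' and 'racing' index under the same forms a query for 'car' or 'race'-like terms would reduce to, which is how different word forms end up matching each other.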


When to use Lunr Languages vs vector search

Use Lunr Languages when you need:

  • fast, deterministic keyword matching
  • multilingual normalization
  • offline / browser-based search
  • low-cost retrieval

Combine with embeddings for:

  • semantic similarity
  • fuzzy concept matching

Contributing

Want to add a new language?

See CONTRIBUTING.md


Support / Sponsorship

Maintained as an open-source project for over a decade.

If your company relies on this in production:

  • consider sponsoring
  • or contributing improvements

It helps keep the ecosystem stable.


Final note

Even in an AI-first world, retrieval is the bottleneck.

Lunr Languages ensures the right content reaches your models — fast, locally, and across languages.
