Releases: PyThaiNLP/pythainlp

PyThaiNLP v5.3.4 Released!

02 Apr 18:41
3152a36

All about bug fixes.

Install/upgrade:

pip install -U pythainlp

What's changed

  • fix: apply เอ็ด rule for ones=1 in hundreds and above by @phoneee in #1386
  • fix: build WSD Trie after populating dictionary, not before by @phoneee in #1388
  • fix: guard han_solo list_cut[-1] and narrow longest.py exception catch by @phoneee in #1382
  • fix: handle empty strings and empty list in word_detokenize by @phoneee in #1374
  • fix: remove misleading second arg from assertTrue/assertFalse calls by @phoneee in #1381
  • fix: use full version string in _check_version for <= and < operators by @phoneee in #1379
  • docs: convert :Example: RST code blocks to proper Python doctest format by @Copilot in #1392
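The เอ็ด rule mentioned in #1386 can be illustrated with a small, self-contained sketch. This is not PyThaiNLP's implementation; the function name `thai_number_words` and its 0 to 999,999 range are assumptions made for illustration. It shows the reading convention in which a final digit 1 is read as "เอ็ด" whenever a higher digit precedes it, including in the hundreds and above.

```python
# Hypothetical sketch of Thai number reading (not the PyThaiNLP code).
DIGITS = ["ศูนย์", "หนึ่ง", "สอง", "สาม", "สี่", "ห้า", "หก", "เจ็ด", "แปด", "เก้า"]
PLACES = ["", "สิบ", "ร้อย", "พัน", "หมื่น", "แสน"]


def thai_number_words(n: int) -> str:
    """Spell out an integer from 0 to 999,999 in Thai words."""
    if not 0 <= n <= 999_999:
        raise ValueError("this sketch only handles 0..999999")
    if n == 0:
        return DIGITS[0]
    digits = [int(ch) for ch in str(n)]
    parts = []
    for i, d in enumerate(digits):
        place = len(digits) - 1 - i  # 0 = ones, 1 = tens, 2 = hundreds, ...
        if d == 0:
            continue
        if place == 0 and d == 1 and len(digits) > 1:
            parts.append("เอ็ด")  # final 1 after any higher digit
        elif place == 1 and d == 1:
            parts.append("สิบ")  # 1 in the tens place is read without "หนึ่ง"
        elif place == 1 and d == 2:
            parts.append("ยี่สิบ")  # 2 in the tens place is read "ยี่"
        else:
            parts.append(DIGITS[d] + PLACES[place])
    return "".join(parts)


print(thai_number_words(11))   # สิบเอ็ด
print(thai_number_words(101))  # หนึ่งร้อยเอ็ด
```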

Full Changelog: v5.3.3...v5.3.4

PyThaiNLP v5.3.3 Released!

26 Mar 11:51
01b0a86

Security fixes and thai2rom_onnx bug fixes.

Install/upgrade:

pip install -U pythainlp

What's changed

Added

  • EntitySpan TypedDict to allow type checking of tagged entity members (#1363).

    Migration notes:

    # Before (plain dict)
    from pythainlp.tag.thai_nner import get_top_level_entities
    entities = [
        {"text": ["ห้า"], "span": [7, 9], "entity_type": "cardinal"},
        {"text": ["ห้า", "โมง"], "span": [7, 11], "entity_type": "time"},
        {"text": ["โมง"], "span": [9, 11], "entity_type": "unit"},
    ]
    top_entities = get_top_level_entities(entities)
    
    # After (TypedDict)
    from pythainlp.tag.named_entity import EntitySpan
    from pythainlp.tag.thai_nner import get_top_level_entities
    entities = [
        EntitySpan(text=["ห้า"], span=[7, 9], entity_type="cardinal"),
        EntitySpan(text=["ห้า", "โมง"], span=[7, 11], entity_type="time"),
        EntitySpan(text=["โมง"], span=[9, 11], entity_type="unit"),
    ]
    top_entities = get_top_level_entities(entities)
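Both forms in the migration notes can coexist because calling a TypedDict class builds an ordinary dict at runtime; the class only adds information for static type checkers. The sketch below uses a stand-in class, `EntitySpanSketch`, whose field types are assumptions inferred from the example above rather than the actual definition in PyThaiNLP.

```python
from typing import TypedDict


class EntitySpanSketch(TypedDict):
    """Hypothetical stand-in for EntitySpan; field types are assumed."""

    text: list[str]
    span: list[int]
    entity_type: str


typed = EntitySpanSketch(text=["ห้า"], span=[7, 9], entity_type="cardinal")
plain = {"text": ["ห้า"], "span": [7, 9], "entity_type": "cardinal"}

# At runtime the two values are indistinguishable plain dicts.
print(type(typed) is dict)  # True
print(typed == plain)       # True
```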

Fixed

  • thai2rom_onnx: fix ONNX encoder model and fix inference bugs (#1349)
  • wordnet: fix AttributeError (#1354)

Security

  • Replace os.path.join with safe_path_join throughout the codebase
    to prevent path manipulation vulnerabilities (CWE-22) (#1369)
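To illustrate the class of bug that CWE-22 describes, here is a minimal sketch of a containment-checked join. It is not the `safe_path_join` shipped in PyThaiNLP, whose signature and behavior are not shown in these notes; `safe_join_sketch` is a hypothetical name, and the check via `os.path.realpath` plus `os.path.commonpath` is only one possible approach.

```python
import os


def safe_join_sketch(base: str, *parts: str) -> str:
    """Join parts onto base, refusing results that escape base (hypothetical)."""
    base_real = os.path.realpath(base)
    # A plain os.path.join would happily accept ".." or an absolute part.
    candidate = os.path.realpath(os.path.join(base_real, *parts))
    if os.path.commonpath([base_real, candidate]) != base_real:
        raise ValueError(f"path escapes base directory: {candidate!r}")
    return candidate
```

With this check, `safe_join_sketch("data", "..", "secret.txt")` raises `ValueError`, whereas `os.path.join` would return a path outside `data`.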

Full Changelog: v5.3.2...v5.3.3

PyThaiNLP v5.3.2 Released!

19 Mar 16:19
6ddcc19

This release focuses on security improvements related to path traversal and on renaming functions to conform to PEP 8 and follow NLTK conventions. The old function names are still accessible, but migrating to the new names is recommended, as the old names will be removed in a future version.

Install/upgrade:

pip install -U pythainlp

What's changed

Added

  • pythainlp.chunk module: canonical home for chunking/phrase-structure parsing, following the NLTK nltk.chunk naming convention.

Deprecated

The following names are deprecated and will be removed in 6.0 (#1339):

  • pythainlp.util.isthaichar(): use pythainlp.util.is_thai_char().
  • pythainlp.util.isthai(): use pythainlp.util.is_thai().
  • pythainlp.util.countthai(): use pythainlp.util.count_thai().
  • pythainlp.tag.crfchunk.CRFchunk: use pythainlp.chunk.CRFChunkParser.
  • pythainlp.tag.chunk_parse(): use pythainlp.chunk.chunk_parse().
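A common way to keep an old name working during a deprecation period is a thin wrapper that warns and then delegates to the new name. The sketch below shows that general pattern only; the function bodies are simplified stand-ins (a bare Thai Unicode block check), not the PyThaiNLP implementations, and the warning text is an assumption.

```python
import warnings


def is_thai_char(ch: str) -> bool:
    """Simplified stand-in: True if ch is one character in U+0E00..U+0E7F."""
    return len(ch) == 1 and "\u0e00" <= ch <= "\u0e7f"


def isthaichar(ch: str) -> bool:
    """Deprecated alias kept for backward compatibility (hypothetical body)."""
    warnings.warn(
        "isthaichar() is deprecated; use is_thai_char() instead",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not this wrapper
    )
    return is_thai_char(ch)
```

Running a test suite with `python -W error::DeprecationWarning` is one way to find remaining calls to old names before they are removed.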

Security

  • Prevent path traversal: validate that paths stay within their expected base directory (#1342)

Full Changelog: v5.3.1...v5.3.2

PyThaiNLP v5.3.1 Released!

14 Mar 07:08
77ebb17

This release focuses on security issues related to corpus file loading.

Install/upgrade:

pip install -U pythainlp

What's changed

Security

Full Changelog: v5.3.0...v5.3.1

PyThaiNLP v5.3.0 Released!

10 Mar 06:27
ef81afc

This release modernizes the codebase: it cuts peak import memory by 62x through lazy loading (thanks to @what-in-the-nim), improves support for offline and read-only environments, and achieves typed-package status through improved type annotations.

Install/upgrade:

pip install -U pythainlp
  • This is the final minor update for the 5.x series. The upcoming 6.0 release will be a major milestone and is expected to introduce breaking changes.
  • Minimum required Python version is now 3.9. The library ensures compatibility across Python 3.9–3.14.
  • Many updates were AI-assisted; see pull requests for specific prompts and implementation details.
  • Lazy-loaded word lists: Users may notice a slight "cold start" delay during the first function call while word lists initialize; subsequent runs will perform at full speed.
  • Documentation: https://pythainlp.github.io/docs/5.3
  • Report bug: https://github.com/PyThaiNLP/pythainlp/issues

What's changed

Added

  • Tapsai et al. 2020 soundex (#1175)
  • Thai profanity detection (#1183)
  • Qwen3-0.6B language model (#1217)
  • Thai-NNER integration with top-level entity filtering (#1221)
  • pythainlp.braille module for Thai braille conversion (#1287)
  • BLEU, ROUGE, WER, and CER metrics to pythainlp.benchmarks (#1295)
  • Attaparse engine to dependency parser (dependency_parsing, engine="attaparse") (#1303)
  • pythainlp.is_offline_mode() helper function; use PYTHAINLP_OFFLINE=1 to disable automatic corpus downloads (#1306)
  • Thai consonant cluster detection (check_khuap_klam) (#1308)
  • pythainlp.is_read_only_mode() helper function; use PYTHAINLP_READ_ONLY=1 to prevent all write operations (#1317)

Changed

  • Optimized for performance (#1182, #1237, #1320)
  • Lazy load dictionaries to reduce memory usage (#1186)
  • Migrate configurations to pyproject.toml (#1188, #1190, #1226, #1239)
  • Update type hints; use Python 3.9 features (#1189, #1190, #1232, #1262, #1263, #1264, #1274, etc.)
  • Make package zip-safe (#1212)
  • Ensure thread-safety for tokenizers (#1213)
  • Replace TNC word frequency dataset with Phupha filtered by ORST words (#1284)
  • Reorganize "noauto" test suite by dependency groups (torch, tensorflow, onnx, cython, network) (#1290)
  • get_corpus_path() now respects PYTHAINLP_OFFLINE env var (follows HF_HUB_OFFLINE convention from Hugging Face): raises FileNotFoundError if the corpus is not cached locally when the var is set; auto-downloads otherwise (#1306)
  • Callers raise FileNotFoundError with download instructions when a corpus path cannot be resolved (#1306)
  • Migrate build backend to hatchling (#1311)
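The offline behavior described for get_corpus_path() above can be sketched as follows. The helper names, the dict-based cache, and the set of values treated as "on" are assumptions for illustration; only the PYTHAINLP_OFFLINE variable name and the FileNotFoundError behavior come from these notes.

```python
import os
from typing import Mapping, Optional

_TRUTHY = {"1", "true", "yes", "on"}  # assumed; the notes only mention "1"


def offline_mode_sketch(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when PYTHAINLP_OFFLINE is set to a truthy value."""
    env = os.environ if environ is None else environ
    return env.get("PYTHAINLP_OFFLINE", "").strip().lower() in _TRUTHY


def resolve_corpus_sketch(name: str, cache: dict, environ=None) -> str:
    """Return a cached corpus path, downloading only when not offline."""
    if name in cache:
        return cache[name]
    if offline_mode_sketch(environ):
        raise FileNotFoundError(
            f"corpus {name!r} is not cached locally; download it while online"
        )
    cache[name] = f"/downloaded/{name}"  # placeholder for a real download
    return cache[name]
```

Passing the environment as a parameter keeps the sketch easy to test without mutating `os.environ`.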

Deprecated

  • PYTHAINLP_DATA_DIR env var; use PYTHAINLP_DATA instead (follows the NLTK_DATA convention from NLTK). PYTHAINLP_DATA_DIR will be removed in a future version (#1306)
  • PYTHAINLP_READ_MODE env var; use PYTHAINLP_READ_ONLY instead. PYTHAINLP_READ_MODE will be removed in a future version (#1317)

Removed

  • Duplicated entries in Volubilis dictionary (#1200)
  • Star imports (#1207)
  • requests dependency (#1211)
  • pythainlp.util.is_native_thai (deprecated since v5.0); use pythainlp.morpheme.is_native_thai instead (#1315)

Fixed

  • royin romanization: Consonant cluster boundary (#1172)
  • check_marttra(): Final consonant classification (#1173)
  • Base dependencies (#1185)
  • tltk transliteration: Kho Khon alphabet issue (#1187)
  • Fix tone_detector and sound_syllable bugs (#1197)
  • normalize(): Remove spaces before tone marks and non-base characters (#1222)
  • Suppress Gensim duplicate-word warnings when loading word2vec binary files (#1316)
  • db.json: created lazily only when a corpus is first downloaded (#1317)
  • newmm tokenization: Exponential-time explosion when text has many ambiguous breaking points (#1319)
  • Trie: Reduce memory usage and speed up TCC boundary lookups (#1323)

Security

  • Prevent path traversal and symlink attacks in archive extraction (#1225)
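As a rough illustration of the path traversal half of this fix, the sketch below lists zip members whose names would resolve outside the destination directory. It is an assumption-laden example, not the code from #1225: the function name is hypothetical, it only handles zip archives, and it does not cover symlink entries.

```python
import zipfile
from pathlib import Path


def escaping_members_sketch(archive: zipfile.ZipFile, dest: str) -> list:
    """Return member names that would land outside dest (hypothetical check)."""
    dest_path = Path(dest).resolve()
    bad = []
    for name in archive.namelist():
        target = (dest_path / name).resolve()
        # Path.is_relative_to requires Python 3.9 or newer.
        if not target.is_relative_to(dest_path):
            bad.append(name)
    return bad
```

A caller could refuse to extract the archive when this list is non-empty.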

Full Changelog: v5.2.0...v5.3.0

PyThaiNLP v5.2.0 Released!

20 Dec 12:54

We released PyThaiNLP v5.2.0! This version adds more functions, improves performance, enhances documentation, and fixes some bugs.

Install: pip install pythainlp
Upgrade: pip install -U pythainlp

See PyThaiNLP 5.2 Change Log: #1080

What is new?

  • Add pythainlp.translate.word_translate #1102
  • Update Dockerfile #1049
  • Add Words Spelling Correction using Char2Vec #1075
  • Add Thailand Ancient Currency Converter #1113
  • Add B-K/umt5-thai-g2p-v2-0.5k #1140
  • Add budoux to word tokenizer engine #1161
  • Deprecate pythainlp.cls; use pythainlp.classify instead 9a2c157

Remove

  • Remove conceptnet #1103

Deprecation and other API changes

  • pythainlp.cls: use pythainlp.classify instead
  • pythainlp.corpus.thai_synonym: use pythainlp.corpus.thai_synonyms instead
  • pythainlp.util.maiyamok: use pythainlp.util.expand_maiyamok instead

Full Changelog: v5.1.2...v5.2.0

Contributors

Thanks to all the contributors. (Image made with contributors-img)

We build Thai NLP.

PyThaiNLP

PyThaiNLP v5.2.0-beta1

10 Dec 17:11

Pre-release

Schedule

  • First Beta release: 10 December 2025
  • Production release: 20 December 2025

PyThaiNLP 5.2 Change Log #1080

Docs: https://pythainlp.org/dev-docs/

Full Changelog: v5.1.2...v5.2.0-beta1

PyThaiNLP v5.1.2 Released!

09 May 09:02
4b7644b

PyThaiNLP v5.1.2 is a bug fix release of PyThaiNLP v5.1.

Install: pip install pythainlp
Upgrade: pip install -U pythainlp

See PyThaiNLP 5.1 Change Log: #900.

What's Changed

  • Update romanize docs and keep space #1110

Full Changelog: v5.1.1...v5.1.2

PyThaiNLP v5.1.1 Released!

31 Mar 11:58
ae8018c

PyThaiNLP v5.1.1 is a bug fix release of PyThaiNLP v5.1.

Install: pip install pythainlp
Upgrade: pip install -U pythainlp

See PyThaiNLP 5.1 Change Log: #900.

What's Changed

  • Refactor thai_consonants_all to use set in syllable.py #1087 by @allrob23
  • ThaiTransliterator: Select 1D CPU int64 tensor device #1089 by @jkingd0n

Full Changelog: v5.1.0...v5.1.1

PyThaiNLP v5.1.0 Released!

25 Feb 12:13
d88d971

We released PyThaiNLP v5.1.0! This version adds features such as the Thai Discourse Treebank (TDTB) and Thai solar-to-lunar date conversion, and fixes several bugs.

Install: pip install pythainlp
Upgrade: pip install -U pythainlp

See PyThaiNLP 5.1 Change Log: #900

What is new?

New features

  • Add Thai Discourse Treebank postag #910
  • Add Thai Universal Dependency Treebank postag #916
  • Add Thai G2P v2 Grapheme-to-Phoneme model #923
  • Add support for list of strings as input to sent_tokenize() #927
  • Add pythainlp.tools.safe_print to handle UnicodeEncodeError on console #969
  • Add Thai solar date to Thai lunar date conversion #998
  • Add Thai pangram text #1045
  • Add pythainlp.llm #1043

Bug fixes

  • Fix collate() to consider tone marks in ordering #926
  • Fix maiyamok() expanding the wrong word #962
  • Fix nlpo3.load_dict() never printing an error message on failure #979

Remove

  • Remove clause_tokenize #1024

Deprecation and other API changes

  • 5.1
    • pythainlp.util.is_native_thai: use pythainlp.morpheme.is_native_thai instead
  • 5.2
    • pythainlp.cls: use pythainlp.classify instead
    • pythainlp.corpus.thai_synonym: use pythainlp.corpus.thai_synonyms instead
    • pythainlp.util.maiyamok: use pythainlp.util.expand_maiyamok instead

Improve

  • Add more Thai political parties to the Thai dictionary 2252dee
  • Fix inconsistency in newmm-safe engine by copilot #1063
  • Update warn_deprecation to get deprecated and removal versions #1028
  • Remove unnecessary enumerate in expand_maiyamok #1029
  • Add SPDX FileType #1032
  • Fix bug in Longest Matching tokenizer to preprocess spaces consistently #1062
  • Add codemeta.json file to root directory #1053

Full Changelog: v5.0.0...v5.1.0
