Releases: PyThaiNLP/pythainlp
PyThaiNLP v5.3.4 Released!
All about bug fixes.
Install/upgrade:
pip install -U pythainlp
- Documentation: https://pythainlp.github.io/docs/5.3
- Report bug: https://github.com/PyThaiNLP/pythainlp/issues
What's changed
- fix: apply เอ็ด rule for ones=1 in hundreds and above by @phoneee in #1386
- fix: build WSD Trie after populating dictionary, not before by @phoneee in #1388
- fix: guard han_solo list_cut[-1] and narrow longest.py exception catch by @phoneee in #1382
- fix: handle empty strings and empty list in word_detokenize by @phoneee in #1374
- fix: remove misleading second arg from assertTrue/assertFalse calls by @phoneee in #1381
- fix: use full version string in _check_version for <= and < operators by @phoneee in #1379
- docs: convert :Example: RST code blocks to proper Python doctest format by @Copilot in #1392
New contributors
Full Changelog: v5.3.3...v5.3.4
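The เอ็ด fix (#1386) concerns how a trailing 1 is read in Thai: when the ones digit is 1 and the number has more than one digit, it is read as "เอ็ด" rather than "หนึ่ง" (for example 11, 21, 101). The following is a minimal sketch of that rule for integers below one million. `thai_number_text` is a hypothetical helper written for illustration, not the library's implementation.

```python
DIGITS = ["ศูนย์", "หนึ่ง", "สอง", "สาม", "สี่", "ห้า", "หก", "เจ็ด", "แปด", "เก้า"]
UNITS = ["", "สิบ", "ร้อย", "พัน", "หมื่น", "แสน"]


def thai_number_text(n: int) -> str:
    """Spell out 0 <= n < 1,000,000 in Thai (illustrative sketch only)."""
    if not 0 <= n < 1_000_000:
        raise ValueError("this sketch only handles 0..999999")
    if n == 0:
        return DIGITS[0]
    digits = str(n)
    length = len(digits)
    parts = []
    for i, ch in enumerate(digits):
        d = int(ch)
        pos = length - i - 1  # 0 = ones, 1 = tens, 2 = hundreds, ...
        if d == 0:
            continue
        if pos == 0 and d == 1 and length > 1:
            # The เอ็ด rule: a trailing 1 in any multi-digit number.
            parts.append("เอ็ด")
        elif pos == 1 and d == 1:
            parts.append("สิบ")  # 10 is "สิบ", not "หนึ่งสิบ"
        elif pos == 1 and d == 2:
            parts.append("ยี่สิบ")  # 20 uses the special form "ยี่"
        else:
            parts.append(DIGITS[d] + UNITS[pos])
    return "".join(parts)
```

With this sketch, `thai_number_text(101)` is the text for 100 followed by "เอ็ด", which is the case the fix addresses for hundreds and above.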
PyThaiNLP v5.3.3 Released!
Security fixes and thai2rom_onnx bug fixes.
Install/upgrade:
pip install -U pythainlp
- Documentation: https://pythainlp.github.io/docs/5.3
- Report bug: https://github.com/PyThaiNLP/pythainlp/issues
What's changed
Added
- EntitySpan TypedDict to allow type checking of tagged entity members (#1363).

Migration notes:

    # Before (plain dict)
    from pythainlp.tag.thai_nner import get_top_level_entities

    entities = [
        {"text": ["ห้า"], "span": [7, 9], "entity_type": "cardinal"},
        {"text": ["ห้า", "โมง"], "span": [7, 11], "entity_type": "time"},
        {"text": ["โมง"], "span": [9, 11], "entity_type": "unit"},
    ]
    top_entities = get_top_level_entities(entities)

    # After (TypedDict)
    from pythainlp.tag.named_entity import EntitySpan
    from pythainlp.tag.thai_nner import get_top_level_entities

    entities = [
        EntitySpan(text=["ห้า"], span=[7, 9], entity_type="cardinal"),
        EntitySpan(text=["ห้า", "โมง"], span=[7, 11], entity_type="time"),
        EntitySpan(text=["โมง"], span=[9, 11], entity_type="unit"),
    ]
    top_entities = get_top_level_entities(entities)
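To see why a TypedDict helps here, the sketch below defines a stand-in with the three keys used in the migration notes. The field types are an assumption inferred from that example, and `span_length` is a hypothetical helper; the real definition lives in `pythainlp.tag.named_entity` and may differ.

```python
from typing import TypedDict


class EntitySpan(TypedDict):
    """Stand-in for illustration; field types are assumed from the example."""

    text: list[str]
    span: list[int]
    entity_type: str


def span_length(entity: EntitySpan) -> int:
    """Hypothetical helper: width of the span covered by the entity."""
    start, end = entity["span"]
    return end - start


entity = EntitySpan(text=["ห้า", "โมง"], span=[7, 11], entity_type="time")

# At runtime a TypedDict instance is an ordinary dict, so existing
# dict-based code keeps working; the benefit is for static checkers,
# which can now flag a misspelled key or a wrongly typed value.
print(type(entity) is dict, span_length(entity))
```

Because the runtime object is still a plain dict, the migration should mostly be a matter of annotations rather than behavior.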
Fixed
- thai2rom_onnx: fix ONNX encoder model and fix inference bugs (#1349)
- wordnet: fix AttributeError (#1354)
Security
- Replace os.path.join with safe_path_join throughout the codebase to prevent path manipulation vulnerabilities (CWE-22) (#1369)
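The general idea behind a safe join can be sketched as follows: resolve the joined path, then reject it if it falls outside the base directory. `safe_join_sketch` is a hypothetical function written for illustration; it is not the library's `safe_path_join`, whose signature and behavior may differ.

```python
import os


def safe_join_sketch(base: str, *parts: str) -> str:
    """Join parts onto base, refusing results that escape base (CWE-22)."""
    base_real = os.path.realpath(base)
    # realpath collapses ".." segments and resolves symlinks before the check.
    candidate = os.path.realpath(os.path.join(base_real, *parts))
    if os.path.commonpath([base_real, candidate]) != base_real:
        raise ValueError(f"path escapes base directory: {candidate!r}")
    return candidate
```

A plain `os.path.join(base, "../secret")` would return a path outside `base` without complaint, and an absolute component would silently replace `base` altogether; the check above rejects both cases.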
Full Changelog: v5.3.2...v5.3.3
PyThaiNLP v5.3.2 Released!
This release focuses on security improvements related to path traversal and on renaming functions to conform with PEP 8 and follow NLTK conventions. The old function names are still accessible, but migrating to the new names is recommended because the old names will be removed in a future version.
Install/upgrade:
pip install -U pythainlp
- Documentation: https://pythainlp.github.io/docs/5.3
- Report bug: https://github.com/PyThaiNLP/pythainlp/issues
What's changed
Added
- pythainlp.chunk module: canonical home for chunking/phrase-structure parsing, following the NLTK nltk.chunk naming convention.
Deprecated
The following names are deprecated and will be removed in 6.0 (#1339):
- pythainlp.util.isthaichar(): use pythainlp.util.is_thai_char().
- pythainlp.util.isthai(): use pythainlp.util.is_thai().
- pythainlp.util.countthai(): use pythainlp.util.count_thai().
- pythainlp.tag.crfchunk.CRFchunk: use pythainlp.chunk.CRFChunkParser.
- pythainlp.tag.chunk_parse(): use pythainlp.chunk.chunk_parse().
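A common way to keep an old name working during a deprecation period is a thin wrapper that warns and delegates. The sketch below shows that pattern with stand-in functions. The Unicode-range check (U+0E00 to U+0E7F) is an assumption made for the example, not necessarily how the library's `is_thai_char()` is implemented.

```python
import warnings


def is_thai_char(ch: str) -> bool:
    """Stand-in: True if ch is one character in the Thai Unicode block."""
    return len(ch) == 1 and "\u0e00" <= ch <= "\u0e7f"


def isthaichar(ch: str) -> bool:
    """Deprecated alias kept for backward compatibility (sketch)."""
    warnings.warn(
        "isthaichar() is deprecated; use is_thai_char() instead.",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not this wrapper
    )
    return is_thai_char(ch)
```

Running a test suite with `python -W error::DeprecationWarning` is one way to surface any remaining calls to the old names before they are removed.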
Security
- Prevent path traversal: validate that paths stay within their expected base directory (#1342)
Full Changelog: v5.3.1...v5.3.2
PyThaiNLP v5.3.1 Released!
This release focuses on security issues related to corpus file loading.
Install/upgrade:
pip install -U pythainlp
- Documentation: https://pythainlp.github.io/docs/5.3
- Report bug: https://github.com/PyThaiNLP/pythainlp/issues
What's changed
Security
- thai2fit: Use JSON model instead of pickle by @wannaphong in #1325
- Defensive corpus loading: validate fields before processing by @bact in #1327
- w2p: Use npz model instead of pickle by @wannaphong in #1328
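The motivation for these changes is that unpickling untrusted data can execute arbitrary code, whereas JSON parsing only ever produces basic Python types. A minimal sketch of defensive loading, assuming a hypothetical corpus record with `name`, `version`, and `filename` fields (these field names are illustrative and not taken from the library):

```python
import json

# Hypothetical schema for illustration: field name -> required type.
REQUIRED_FIELDS = {"name": str, "version": str, "filename": str}


def load_corpus_entry(raw: str) -> dict:
    """Parse a JSON record and validate its fields before any use."""
    data = json.loads(raw)  # yields only dict/list/str/number/bool/None
    if not isinstance(data, dict):
        raise ValueError("corpus entry must be a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"missing or invalid field: {field!r}")
    return data
```

Validating before processing means a malformed or tampered file fails early with a clear error instead of propagating unexpected values into later code.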
Full Changelog: v5.3.0...v5.3.1
PyThaiNLP v5.3.0 Released!
This release modernizes the codebase: it delivers a 62x reduction in peak import memory through lazy loading (thanks to @what-in-the-nim), improves support for offline and read-only environments, and achieves typed-package status through improved type annotations.
Install/upgrade:
pip install -U pythainlp
- This is the final minor update for the 5.x series. The upcoming 6.0 release will be a major milestone and is expected to introduce breaking changes.
- Minimum required Python version is now 3.9. The library ensures compatibility across Python 3.9–3.14.
- Many updates were AI-assisted; see pull requests for specific prompts and implementation details.
- Lazy-loaded word lists: Users may notice a slight "cold start" delay during the first function call while word lists initialize; subsequent runs will perform at full speed.
- Documentation: https://pythainlp.github.io/docs/5.3
- Report bug: https://github.com/PyThaiNLP/pythainlp/issues
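The "cold start" behavior of lazy-loaded word lists can be illustrated with a cached loader: nothing is built at import time, the first call pays the loading cost, and later calls reuse the result. This is a sketch of the general technique only. The `thai_words` function and its tiny word set are placeholders, and the library's actual mechanism may differ.

```python
from functools import lru_cache

stats = {"loads": 0}  # counts how many times the data is actually built


@lru_cache(maxsize=None)
def thai_words() -> frozenset:
    """Build the word list on first use; later calls hit the cache."""
    stats["loads"] += 1
    # Placeholder data; a real loader would read a corpus file here.
    return frozenset({"แมว", "สุนัข", "ปลา"})
```

Importing a module written this way costs almost no memory; the word set is only materialized when some function first asks for it.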
What's changed
Added
- Tapsai et al. 2020 soundex (#1175)
- Thai profanity detection (#1183)
- Qwen3-0.6B language model (#1217)
- Thai-NNER integration with top-level entity filtering (#1221)
- pythainlp.braille module for Thai braille conversion (#1287)
- BLEU, ROUGE, WER, and CER metrics to pythainlp.benchmarks (#1295)
- Attaparse engine to dependency parser (dependency_parsing, engine="attaparse") (#1303)
- pythainlp.is_offline_mode() helper function; use PYTHAINLP_OFFLINE=1 to disable automatic corpus downloads (#1306)
- Thai consonant cluster detection (check_khuap_klam) (#1308)
- pythainlp.is_read_only_mode() helper function; use PYTHAINLP_READ_ONLY=1 to prevent all write operations (#1317)
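Of the metrics listed above, WER and CER are both edit-distance ratios: the number of substitutions, insertions, and deletions needed to turn the hypothesis into the reference, divided by the reference length. The sketch below shows that standard definition in plain Python. The function names are illustrative and are not the `pythainlp.benchmarks` API.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence, hyp: Sequence) -> float:
    """Edit distance normalized by reference length."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# CER: pass strings (compared code point by code point).
# WER: pass lists of word tokens.
```

For Thai, WER depends on the word tokenizer used to produce the token lists, and a code-point-level CER counts vowels and tone marks as separate characters.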
Changed
- Optimized for performance (#1182, #1237, #1320)
- Lazy load dictionaries to reduce memory usage (#1186)
- Migrate configurations to pyproject.toml (#1188, #1190, #1226, #1239)
- Update type hints; use Python 3.9 features (#1189, #1190, #1232, #1262, #1263, #1264, #1274, etc.)
- Make package zip-safe (#1212)
- Ensure thread-safety for tokenizers (#1213)
- Replace TNC word frequency dataset with Phupha filtered by ORST words (#1284)
- Reorganize "noauto" test suite by dependency groups (torch, tensorflow, onnx, cython, network) (#1290)
- get_corpus_path() now respects PYTHAINLP_OFFLINE env var (follows HF_HUB_OFFLINE convention from Hugging Face): raises FileNotFoundError if the corpus is not cached locally when the var is set; auto-downloads otherwise (#1306)
- Callers raise FileNotFoundError with download instructions when a corpus path cannot be resolved (#1306)
- Migrate build backend to hatchling (#1311)
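The offline behavior described for get_corpus_path() can be sketched as a small resolver: return the cached file if present, raise FileNotFoundError when offline mode is set, and download otherwise. Everything in this sketch is hypothetical (the helper names, the `download` callback, and treating only the value "1" as "set"); it models the described behavior rather than the library's code.

```python
import os
from pathlib import Path
from typing import Callable


def offline_mode_enabled() -> bool:
    # Assumption: only the documented value "1" enables offline mode here.
    return os.environ.get("PYTHAINLP_OFFLINE") == "1"


def resolve_corpus_path(
    name: str,
    cache_dir: Path,
    download: Callable[[str, Path], None],
) -> Path:
    """Return the local path of a corpus, downloading it only if allowed."""
    path = cache_dir / name
    if path.exists():
        return path  # already cached: no network needed in either mode
    if offline_mode_enabled():
        raise FileNotFoundError(
            f"Corpus {name!r} is not cached in {cache_dir} and "
            "PYTHAINLP_OFFLINE is set; download it first or unset the variable."
        )
    download(name, path)  # online mode: fetch, then return the cached path
    return path
```

Failing fast with a descriptive error suits air-gapped or CI environments, where a silent download attempt would otherwise hang or time out.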
Deprecated
- PYTHAINLP_DATA_DIR env var; use PYTHAINLP_DATA instead (follows NLTK_DATA convention from NLTK). PYTHAINLP_DATA_DIR will be removed in a future version (#1306)
- PYTHAINLP_READ_MODE env var; use PYTHAINLP_READ_ONLY instead. PYTHAINLP_READ_MODE will be removed in a future version (#1317)
Removed
- Duplicated entries in Volubilis dictionary (#1200)
- Star imports (#1207)
- requests dependency (#1211)
- pythainlp.util.is_native_thai (deprecated since v5.0); use pythainlp.morpheme.is_native_thai instead (#1315)
Fixed
- royin romanization: Consonant cluster boundary (#1172)
- check_marttra(): Final consonant classification (#1173)
- Base dependencies (#1185)
- tltk transliteration: Kho Khon alphabet issue (#1187)
- tone_detector and sound_syllable bugs (#1197)
- normalize(): Remove spaces before tone marks and non-base characters (#1222)
- Suppress Gensim duplicate-word warnings when loading word2vec binary files (#1316)
- db.json: created lazily only when a corpus is first downloaded (#1317)
- newmm tokenization: Exponential-time explosion when text has many ambiguous breaking points (#1319)
- Trie: Reduced memory usage and faster TCC boundary lookups (#1323)
Security
- Prevent path traversal and symlink attacks in archive extraction (#1225)
New contributors
- @Copilot made their first contribution in #1172
- @what-in-the-nim made their first contribution in #1185
Full Changelog: v5.2.0...v5.3.0
PyThaiNLP v5.2.0 Released!
We released PyThaiNLP v5.2.0! This version adds more functions, improves performance, enhances documentation, and fixes some bugs.
Install: pip install pythainlp
Upgrade: pip install -U pythainlp
- Documentation: https://pythainlp.github.io/docs/5.2
- Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 5.2 Change Log: #1080
What is new?
- Add pythainlp.translate.word_translate #1102
- Update Dockerfile #1049
- Add Words Spelling Correction using Char2Vec #1075
- Add Thailand Ancient Currency Converter #1113
- Add B-K/umt5-thai-g2p-v2-0.5k #1140
- Add budoux to word tokenizer engine #1161
- Deprecate pythainlp.cls; use pythainlp.classify instead 9a2c157
Remove
- Remove conceptnet #1103
Deprecation and other API changes
- pythainlp.cls: use pythainlp.classify instead
- pythainlp.corpus.thai_synonym: use pythainlp.corpus.thai_synonyms instead
- pythainlp.util.maiyamok: use pythainlp.util.expand_maiyamok instead
Full Changelog: v5.1.2...v5.2.0
Contributors
Thanks to all the contributors. (Image made with contributors-img)
We build Thai NLP.
PyThaiNLP
PyThaiNLP v5.2.0-beta1
Schedule
- First Beta release: 10 December 2025
- Production release: 20 December 2025
PyThaiNLP 5.2 Change Log #1080
Docs: https://pythainlp.org/dev-docs/
What's Changed
- Bump h5py from 3.12.1 to 3.13.0 by @dependabot[bot] in #1082
- Bump transformers from 4.48.2 to 4.49.0 by @dependabot[bot] in #1081
- PyThaiNLP v5.1.0 Released! by @wannaphong in #1079
- Bump symspellpy from 6.7.8 to 6.9.0 by @dependabot[bot] in #1083
- Bump tensorflow from 2.18.0 to 2.18.1 by @dependabot[bot] in #1085
- Refactor syllable.py to use set for thai_consonants_all by @allrob23 in #1087
- ThaiTransliterator: Select 1D CPU int64 tensor device by @jkingd0n in #1089
- Bump transformers from 4.49.0 to 4.50.3 by @dependabot[bot] in #1092
- Bump transformers from 4.50.3 to 4.51.0 by @dependabot[bot] in #1096
- Bump transformers from 4.51.0 to 4.51.3 by @dependabot[bot] in #1098
- Bump pyicu from 2.14 to 2.15 by @dependabot[bot] in #1094
- Bump pyicu from 2.15 to 2.15.1 by @dependabot[bot] in #1099
- Remove conceptnet by @wannaphong in #1103
- Fixed #1105 by @wannaphong in #1106
- Bump pyicu from 2.15.1 to 2.15.2 by @dependabot[bot] in #1107
- Add pythainlp.translate.word_translate by @wannaphong in #1102
- Update romanize docs and keep space by @wannaphong in #1110
- Add convert_currency by @wannaphong in #1114
- Bump transformers from 4.51.3 to 4.52.3 by @dependabot[bot] in #1115
- Remove spylls by @wannaphong in #1117
- Update docker and fairseq by @wannaphong in #1116
- Bump panphon from 0.21.2 to 0.22.1 by @dependabot[bot] in #1119
- Bump transformers from 4.52.3 to 4.52.4 by @dependabot[bot] in #1118
- Bump panphon from 0.22.1 to 0.22.2 by @dependabot[bot] in #1122
- Update numpy requirement from <2,>=1.26.0 to >=1.26.0,<3 by @dependabot[bot] in #1121
- Bump transformers from 4.52.4 to 4.53.0 by @dependabot[bot] in #1123
- Bump transformers from 4.53.0 to 4.53.1 by @dependabot[bot] in #1125
- Bump transformers from 4.53.1 to 4.53.2 by @dependabot[bot] in #1128
- Bump actions/first-interaction from 1 to 2 by @dependabot[bot] in #1129
- Fix Docker build failure, add docker compose file for convenience by @tassa-yoniso-manasi-karoto in #1132
- Bump actions/first-interaction from 2 to 3 by @dependabot[bot] in #1139
- Add TypeError to digitconv.py by @wannaphong in #1127
- Bump actions/checkout from 4 to 5 by @dependabot[bot] in #1138
- Add B-K/umt5-thai-g2p-v2-0.5k by @wannaphong in #1140
- Bump sentencepiece from 0.2.0 to 0.2.1 by @dependabot[bot] in #1142
- Bump transformers from 4.53.2 to 4.55.2 by @dependabot[bot] in #1141
- Add pythainlp.util.analyze_thai_text by @wannaphong in #1149
- Bump actions/setup-python from 5 to 6 by @dependabot[bot] in #1146
- Bump actions/stale from 9 to 10 by @dependabot[bot] in #1145
- Bump transformers from 4.55.2 to 4.57.0 by @dependabot[bot] in #1152
- Bump github/codeql-action from 3 to 4 by @dependabot[bot] in #1153
- Add get_hf_hub and make_safe_directory_name by @wannaphong in #1156
- fix the connectivity of cli commands by @kartik912 in #1154
- Add get_words_spell_suggestion by @wannaphong in #1157
- Bump transformers from 4.57.0 to 4.57.1 by @dependabot[bot] in #1158
- Add budoux by @wannaphong in #1161
- Bump actions/upload-artifact from 4 to 5 by @dependabot[bot] in #1163
- Bump actions/download-artifact from 4 to 6 by @dependabot[bot] in #1162
- Bump transformers from 4.57.1 to 4.57.3 by @dependabot[bot] in #1165
New Contributors
- @allrob23 made their first contribution in #1087
- @jkingd0n made their first contribution in #1089
- @tassa-yoniso-manasi-karoto made their first contribution in #1132
- @kartik912 made their first contribution in #1154
Full Changelog: v5.1.2...v5.2.0-beta1
PyThaiNLP v5.1.2 Released!
PyThaiNLP v5.1.2 is a bug fix release of PyThaiNLP v5.1.
Install: pip install pythainlp
Upgrade: pip install -U pythainlp
- Documentation: https://pythainlp.github.io/docs/5.1
- Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 5.1 Change Log: #900.
What's Changed
- Update romanize docs and keep space #1110
Full Changelog: v5.1.1...v5.1.2
PyThaiNLP v5.1.1 Released!
PyThaiNLP v5.1.1 is a bug fix release of PyThaiNLP v5.1.
Install: pip install pythainlp
Upgrade: pip install -U pythainlp
- Documentation: https://pythainlp.github.io/docs/5.1
- Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 5.1 Change Log: #900.
What's Changed
- Refactor thai_consonants_all to use set in syllable.py #1087 by @allrob23
- ThaiTransliterator: Select 1D CPU int64 tensor device #1089 by @jkingd0n
Full Changelog: v5.1.0...v5.1.1
PyThaiNLP v5.1.0 Released!
We released PyThaiNLP v5.1.0! This version adds new features, such as Thai Discourse Treebank (TDTB) POS tagging and Thai solar-to-lunar date conversion, and fixes several bugs.
Install: pip install pythainlp
Upgrade: pip install -U pythainlp
- Documentation: https://pythainlp.github.io/docs/5.1
- Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 5.1 Change Log: #900
What is new?
New features
- Add Thai Discourse Treebank postag #910
- Add Thai Universal Dependency Treebank postag #916
- Add Thai G2P v2 Grapheme-to-Phoneme model #923
- Add support for list of strings as input to sent_tokenize() #927
- Add pythainlp.tools.safe_print to handle UnicodeEncodeError on console #969
- Add Thai solar date to Thai lunar date conversion #998
- Add Thai pangram text #1045
- Add pythainlp.llm #1043
Bug fixes
- Fix collate() to consider tonemark in ordering #926
- Fix maiyamok() expanding the wrong word #962
- Fix nlpo3.load_dict() never printing an error message on failure #979
Remove
- Remove clause_tokenize #1024
Deprecation and other API changes
- 5.1
  - pythainlp.util.is_native_thai: use pythainlp.morpheme.is_native_thai instead
- 5.2
  - pythainlp.cls: use pythainlp.classify instead
  - pythainlp.corpus.thai_synonym: use pythainlp.corpus.thai_synonyms instead
  - pythainlp.util.maiyamok: use pythainlp.util.expand_maiyamok instead
Improve
- Add more Thai political parties to the Thai dictionary 2252dee
- Fix inconsistency in newmm-safe engine by copilot #1063
- Update warn_deprecation to get deprecated and removal versions #1028
- Remove unnecessary enumerate in expand_maiyamok #1029
- Add SPDX FileType #1032
- Fix bug in Longest Matching tokenizer to preprocess spaces consistently #1062
- Add codemeta.json file to root directory #1053
Full Changelog: v5.0.0...v5.1.0