LLMs perform poorly on arithmetic tasks, requiring excessive reasoning tokens to achieve good performance. We propose BitTokens,
a novel encoding strategy that represents any number as a single token using its IEEE 754 binary floating-point representation. This single-token number encoding allows language models to solve arithmetic tasks both effectively and efficiently.
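To make the idea of an IEEE 754 bit representation concrete, here is a minimal sketch that converts a Python float into its bit pattern and back using only the standard library. It assumes 64-bit double precision; the helper names `float_to_bits` and `bits_to_float` are hypothetical and are not taken from bittoken_embedding.py, whose actual encoding may differ.

```python
import struct


def float_to_bits(x: float) -> list[int]:
    """Return the 64 IEEE 754 double-precision bits of x, most significant bit first."""
    # Pack as a big-endian double, then reinterpret the same 8 bytes as an unsigned integer.
    (as_int,) = struct.unpack(">Q", struct.pack(">d", x))
    return [(as_int >> shift) & 1 for shift in range(63, -1, -1)]


def bits_to_float(bits: list[int]) -> float:
    """Inverse of float_to_bits: rebuild the float from its 64 bits."""
    as_int = 0
    for bit in bits:
        as_int = (as_int << 1) | bit
    (x,) = struct.unpack(">d", struct.pack(">Q", as_int))
    return x


bits = float_to_bits(6.5)
# Double-precision layout: 1 sign bit, 11 exponent bits, 52 mantissa bits.
sign, exponent, mantissa = bits[0], bits[1:12], bits[12:]
print(sign, "".join(map(str, exponent)), "".join(map(str, mantissa[:8])) + "...")
```

Because the bit pattern has a fixed length, every number maps to a vector of the same size regardless of how many decimal digits it has, which is what makes a single-token encoding possible.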

To get started, check out our interactive Jupyter notebook.
A more detailed implementation of BitTokens can be found in the bittoken_embedding.py file.
Tip
We recommend the fast package manager uv for dependency management, but you may use any other package manager. For that case we provide an additional requirements.txt file; replace uv run with python in the commands below.
- Download and install the fast package manager uv.
# Download and install uv with python version >=3.13
curl -LsSf https://astral.sh/uv/install.sh | sh
- Sync uv environment
# Installs python 3.13, torch 2.11, and other dependencies
uv sync
Note
At the time of writing, there exists no official pre-built wheel for FlashAttention with torch=2.11 and python=3.13. We use this approach instead.
Tip
Sometimes FlashAttention causes trouble when installing. If you run into an error, please refer to the official install guide.
uv pip install "flash_attn-2.8.3+cu12torch2.11cxx11abiTRUE-cp313-cp313-linux_x86_64.whl" # replace with `flash-attn==2.8.3 --no-build-isolation` once official wheel available
uv pip install git+https://github.com/KellerJordan/Muon
- Create an .env file and define the following variables:
PROJECT_PATH=... # Absolute path to the 'BitTokens/' folder
DATA_PATH=... # Absolute path to data folder
# [Optional] If you want to use the eval_scripts
OPENROUTER_API_KEY=...
- For convenience, load the .env file to execute the next commands.
source .env
To reproduce the manuscript-style training commands below, download the exact synthetic number-problem dataset used by the paper. Set DATA_PATH to the directory where the files should be placed, then run:
uv run --with huggingface_hub hf download KreitnerL/BitTokens-dataset --repo-type dataset --local-dir "$DATA_PATH"
Dataset page: https://huggingface.co/datasets/KreitnerL/BitTokens-dataset
The dataset contains all synthetic number-problem CSV files referenced by the BitToken configs and the FoNE, xVal, significant-digit, token-digit, and base-10 baseline configs. It includes the standard arithmetic tasks plus the hard tasks: Exponentiation, Mean, and Std. It also includes the binary-uniform curriculum files used by BitTokens where referenced by the configs.
The hosted dataset has 37 CSV files: 14 train CSVs, 14 validation CSVs, and 9 test CSVs. It intentionally does not include FineWeb-derived .txt files; those should be downloaded from the public FineWeb dataset instead.
The hosted CSV files keep only the columns required for training and evaluation: prompt, text_prompt, answer, difficulty, and difficulty_sd.
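After downloading, it can be useful to confirm that a CSV file carries the columns listed above. The following is a minimal sketch using only the standard library; the function name `missing_columns` is hypothetical, and the sample row is made up for illustration rather than taken from the dataset.

```python
import csv
import io

# Column names as listed in this README.
EXPECTED_COLUMNS = ["prompt", "text_prompt", "answer", "difficulty", "difficulty_sd"]


def missing_columns(csv_file) -> list[str]:
    """Return the expected column names that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []  # fieldnames is None for an empty file
    return [name for name in EXPECTED_COLUMNS if name not in header]


# Made-up in-memory sample; replace with open(path, newline="") for a real file.
sample = io.StringIO(
    "prompt,text_prompt,answer,difficulty,difficulty_sd\n"
    "1+2,What is 1+2?,3,0.1,0.0\n"
)
print(missing_columns(sample))
```

An empty list means every expected column is present.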
The multitask configs mix the synthetic number-problem data with text data. Download FineWeb from its original public Hugging Face dataset rather than from this repo:
uv run --with huggingface_hub hf download HuggingFaceFW/fineweb \
--repo-type dataset \
--include "sample/10BT/*.parquet" \
--local-dir "$DATA_PATH"
Decode the downloaded parquet files to text files:
uv run $PROJECT_PATH/data_generation/decode_fineweb.py \
--folder_dir "$DATA_PATH/sample/10BT/" \
--save_path "$DATA_PATH/"
The training configs expect the FineWeb text files at $DATA_PATH/000_00000_train.txt and $DATA_PATH/val_text.txt. If your decoded files have different names, create those train/validation text files under $DATA_PATH before launching training.
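Before launching training, you may want to verify that both expected text files are in place. This is a small sketch of such a check; the function name `missing_fineweb_files` is hypothetical and not part of the repository, and the fallback to the current directory when DATA_PATH is unset is an assumption made so the snippet runs anywhere.

```python
import os
from pathlib import Path

# File names the training configs expect, as stated in this README.
EXPECTED_TEXT_FILES = ["000_00000_train.txt", "val_text.txt"]


def missing_fineweb_files(data_path: str) -> list[str]:
    """Return the expected FineWeb text files that do not exist under data_path."""
    root = Path(data_path)
    return [name for name in EXPECTED_TEXT_FILES if not (root / name).is_file()]


# Falls back to the current directory if DATA_PATH is not set (assumption).
data_path = os.environ.get("DATA_PATH", ".")
missing = missing_fineweb_files(data_path)
if missing:
    print(f"Missing under {data_path}: {', '.join(missing)}")
else:
    print("All expected FineWeb text files are present.")
```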
You can also generate fresh number problems locally. This is useful for development, but it will not produce the exact same examples used in the paper, so training results may differ.
- Generate the number problems for each task for each phase:
# Decimal version (used for all base-10 baselines and for testing)
uv run $PROJECT_PATH/data_generation/data_generation_v2.py --save_dir $DATA_PATH
# Binary version (used for BitToken training)
uv run $PROJECT_PATH/data_generation/data_generation_v2.py --save_dir $DATA_PATH --significant_digits_distribution binary_uniform
- Download and decode FineWeb as described above if you want to run the mixed numeric/text multitask configs.
To recreate a BitToken model in a multiTask setting similar to the manuscript, run:
uv run $PROJECT_PATH/train.py --load_config_from $PROJECT_PATH/configs/config_bittoken_multiTask.py --tqdm --verbose --deterministic --seed 999
Note
The first run has a longer startup time because it tokenizes the entire dataset first and stores it in a cache directory under $DATA_PATH/.
This has been tested on an NVIDIA DGX A100 80GB GPU. The results are stored in the folder $PROJECT_PATH/trained.
If you find our work useful, please cite our ICML 2026 paper:
@inproceedings{
kreitner2026bittokens,
title={Efficient numeracy in language models through single-token number embeddings},
author={Linus Kreitner and Paul Hager and Jonathan Mengedoht and Georgios Kaissis and Daniel Rueckert and Martin J. Menten},
booktitle={Forty-third International Conference on Machine Learning},
year={2026},
url={https://openreview.net/forum?id=Bh4Ubk80M8}
}