# Argument Reference

_Auto-generated — do not edit by hand._

## DistillArguments

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `--distill` | `bool` | `False` | Enable training with knowledge distillation. |
| `--teacher_model` | `str` | `None` | The name or path of the teacher model to use for distillation. |
| `--criterion` | `str` | `"logits_loss"` | Distillation loss criterion. Currently only `logits_loss` is supported. |

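As a usage sketch (the `main.py` entry point and the teacher checkpoint below are placeholders, not part of this reference), enabling distillation could look like:

```bash
# Hypothetical invocation: distill from a larger teacher into the student.
# Replace main.py with your project's actual training entry point.
python main.py \
  --distill True \
  --teacher_model meta-llama/Llama-2-13b-hf \
  --criterion logits_loss
```
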
## DataArguments

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `--dataset_config` | `str` | `"configs/dataset/blend.yaml"` | Path to a dataset blend YAML config file. |
| `--train_samples` | `int` | `20000` | Number of training samples to draw from the blend. |
| `--eval_samples` | `int` | `2000` | Number of evaluation samples to draw from the blend. |
| `--dataset_seed` | `int` | `42` | Random seed for dataset shuffling. |
| `--dataset_cache_dir` | `str` | `".dataset_cache/tokenized"` | Directory for caching tokenized datasets. |
| `--shuffle` | `bool` | `True` | Whether to shuffle dataset sources with a streaming (reservoir-style) shuffle; see `--shuffle_buffer`. |
| `--shuffle_buffer` | `int` | `10000` | Buffer size for the streaming shuffle. |
| `--num_proc` | `int` | `16` | Number of CPU workers for tokenization. |

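For example (again with a hypothetical `main.py` entry point), a smaller, reproducible blend for a quick run might be configured as:

```bash
# Hypothetical invocation: draw a reduced, seeded sample from the blend.
python main.py \
  --dataset_config configs/dataset/blend.yaml \
  --train_samples 5000 \
  --eval_samples 500 \
  --dataset_seed 42 \
  --shuffle_buffer 10000 \
  --num_proc 8
```
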
## ModelArguments

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `--model_name_or_path` | `str` | `"meta-llama/Llama-2-7b-hf"` | Hugging Face model ID or local path of the base model to load. |
| `--model_max_length` | `int` | `4096` | Maximum sequence length. Sequences will be right-padded (and possibly truncated). |

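A minimal sketch, assuming the same hypothetical entry point; the model ID is the documented default, and the shorter length is an example override:

```bash
# Hypothetical invocation: load the base model with a reduced context window.
python main.py \
  --model_name_or_path meta-llama/Llama-2-7b-hf \
  --model_max_length 2048
```
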
## QuantizeArguments

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `--recipe` | `str` | `None` | Path to a quantization recipe YAML file (built-in or custom). Built-in recipes can be specified by relative path, e.g. `general/ptq/nvfp4_default-fp8_kv`. |
| `--calib_size` | `int` | `512` | Calibration set size for quantization. The calibration dataset is used to set up the quantization scale parameters. |
| `--calib_batch_size` | `int` | `1` | Batch size for calibration data during quantization. |
| `--compress` | `bool` | `False` | Whether to compress the model weights after quantization for QLoRA. Useful for reducing model size. |
| `--quantize_output_dir` | `str` | `"quantized_model"` | Directory to save the quantized model checkpoint. |

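As an illustrative PTQ run (entry-point name is a placeholder; the recipe path is the built-in example cited above):

```bash
# Hypothetical invocation: post-training quantization with a built-in recipe.
python main.py \
  --recipe general/ptq/nvfp4_default-fp8_kv \
  --calib_size 512 \
  --calib_batch_size 1 \
  --quantize_output_dir quantized_model
```
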
## TrainingArguments

Extends [HuggingFace TrainingArguments](https://huggingface.co/docs/transformers/main_classes/trainer#transformers.TrainingArguments). Only additional/overridden arguments are shown below.

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `--cache_dir` | `str` | `None` | Directory for caching models and tokenizers downloaded from the Hugging Face Hub. |
| `--lora` | `bool` | `False` | Whether to add a LoRA (Low-Rank Adaptation) adapter before training. Required when using real quantization, since quantized weights are frozen during training. |

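Since real (compressed) quantization freezes the base weights, a QLoRA-style run combines `--compress` with `--lora`; a sketch, again with a placeholder entry point:

```bash
# Hypothetical invocation: quantize, compress weights, and train a LoRA adapter.
python main.py \
  --recipe general/ptq/nvfp4_default-fp8_kv \
  --compress True \
  --lora True
```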