34 changes: 31 additions & 3 deletions docs/guides/fine_tuning.md
@@ -141,8 +141,8 @@ from llmengine import FineTune

response = FineTune.create(
    model="llama-2-7b",
    training_file="file-AbCDeLdN2Ty4M2m",
    validation_file="file-ezSRpgtKQyItI26",
)

print(response.json())
@@ -152,7 +152,35 @@ See the [Model Zoo](../../model_zoo) to see which models have fine-tuning support.

See [Integrations](../integrations.md) to see how to track fine-tuning metrics.

## Monitoring the fine-tune

Once the fine-tune is launched, you can also [get the status of your fine-tune](../../api/python_client/#llmengine.fine_tuning.FineTune.get).
You can also [list events that your fine-tune produces](../../api/python_client/#llmengine.fine_tuning.FineTune.get_events).
```python
from llmengine import FineTune

fine_tune_id = "ft-cabcdefghi1234567890"
fine_tune = FineTune.get(fine_tune_id)
print(fine_tune.status) # BatchJobStatus.RUNNING
print(fine_tune.fine_tuned_model)  # "llama-2-7b.700101-000000"

fine_tune_events = FineTune.get_events(fine_tune_id)
for event in fine_tune_events.events:
    print(event)
# Prints something like:
# timestamp=1697590000.0 message="{'loss': 12.345, 'learning_rate': 0.0, 'epoch': 0.97}" level='info'
# timestamp=1697590000.0 message="{'eval_loss': 23.456, 'eval_runtime': 19.876, 'eval_samples_per_second': 4.9, 'eval_steps_per_second': 4.9, 'epoch': 0.97}" level='info'
# timestamp=1697590020.0 message="{'train_runtime': 421.234, 'train_samples_per_second': 2.042, 'train_steps_per_second': 0.042, 'total_flos': 123.45, 'train_loss': 34.567, 'epoch': 0.97}" level='info'
```
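The `message` fields in these events are repr-style Python dicts of training metrics. If you want to aggregate or plot them, a minimal parsing sketch (an illustration, not part of the llmengine API; it assumes the messages keep this repr-style format):

```python
import ast

def parse_metric_events(messages):
    """Parse event messages that are repr-style metric dicts.

    Messages that do not parse as dicts (e.g. plain log lines or
    error messages) are skipped.
    """
    metrics = []
    for message in messages:
        try:
            parsed = ast.literal_eval(message)
        except (ValueError, SyntaxError):
            continue  # not a metric dict
        if isinstance(parsed, dict):
            metrics.append(parsed)
    return metrics

messages = [
    "{'loss': 12.345, 'learning_rate': 0.0, 'epoch': 0.97}",
    "Unable to read training or validation dataset",
]
print(parse_metric_events(messages))  # [{'loss': 12.345, 'learning_rate': 0.0, 'epoch': 0.97}]
```

In practice you would feed it something like `[e.message for e in FineTune.get_events(fine_tune_id).events]`, assuming each event exposes its message as an attribute as in the printed output above.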

The status of your fine-tune gives a high-level overview of its progress.
The events give more detail, such as the training and validation loss at each epoch,
as well as any errors that occurred. If you encounter errors with your fine-tune,
the events are a good place to start debugging. For example, if you see `Unable to read training or validation dataset`,
> **Member:** Do we have a separate error for malformed files, e.g. files that don't have `prompt,response` headers? Would it make sense to put these in a table?

> **Member:** Discussed offline: `prompt,response` header validation would happen earlier, at `FineTune.create` time.
you may need to make your files accessible to LLM Engine. If you see `Invalid value received for lora parameter 'lora_alpha'!`,
you should [check that your hyperparameters are valid](../../api/python_client/#llmengine.fine_tuning.FineTune.create).
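If you are scripting around fine-tunes, you can poll the status until it reaches a terminal state. A minimal sketch under assumptions: the terminal status names (`SUCCESS`/`FAILURE`/`CANCELLED`) are placeholders for the real `BatchJobStatus` values, and `get_status` is a stand-in for something like `lambda: FineTune.get(fine_tune_id).status` (an enum status may need `.value` before comparison):

```python
import time

# Assumed terminal state names; check the real BatchJobStatus values.
TERMINAL_STATES = {"SUCCESS", "FAILURE", "CANCELLED"}

def wait_for_fine_tune(get_status, poll_interval=30.0, timeout=3600.0):
    """Poll a status callable until it returns a terminal state or times out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = str(get_status())
        if status in TERMINAL_STATES:
            return status
        time.sleep(poll_interval)
    raise TimeoutError("fine-tune did not reach a terminal state in time")

# Canned status sequence standing in for real FineTune.get calls:
statuses = iter(["PENDING", "RUNNING", "SUCCESS"])
print(wait_for_fine_tune(lambda: next(statuses), poll_interval=0.0))  # SUCCESS
```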

## Making inference calls to your fine-tune

4 changes: 2 additions & 2 deletions docs/model_zoo.md
@@ -18,8 +18,8 @@ Scale hosts the following models in the LLM Engine Model Zoo:
| `mpt-7b` | ✅ | | deepspeed, text-generation-inference, vllm |
| `mpt-7b-instruct` | ✅ | ✅ | deepspeed, text-generation-inference, vllm |
| `flan-t5-xxl` | ✅ | | deepspeed, text-generation-inference |
| `mistral-7b` | ✅ | | vllm |
| `mistral-7b-instruct` | ✅ | | vllm |

## Usage
