Merged
3 changes: 1 addition & 2 deletions .github/workflows/quality.yml
@@ -62,8 +62,7 @@ jobs:
run: nohup ollama serve &
- name: Pull models
run: |
ollama pull granite4:micro
ollama pull granite4:micro-h
ollama pull granite4.1:3b
- name: Run Tests
Contributor

granite4:micro-h was dropped from the pull list. Confirm no ollama-marked integration tests still use IBM_GRANITE_4_HYBRID_MICRO, or they'll cold-start/fail in CI.

Member Author

Yup, I think I confirmed that in a nightly run.
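
One way to make the check requested above repeatable is a small scan of the test tree for the dropped constant. This is only a sketch: the helper name `find_references` is hypothetical, the `test` directory is taken from the `pytest` invocation in this workflow, and the scan does not know about pytest markers, so any hit still needs a manual look to see whether the test is Ollama-marked.

```python
from pathlib import Path


def find_references(root: str, needle: str, pattern: str = "*.py") -> list[tuple[str, int, str]]:
    """Return (path, line number, stripped line) for each line under root containing needle."""
    base = Path(root)
    if not base.is_dir():
        return []  # nothing to scan
    hits: list[tuple[str, int, str]] = []
    for path in sorted(base.rglob(pattern)):
        try:
            text = path.read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError):
            continue  # skip unreadable entries rather than abort the scan
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Assumed layout: tests live under ./test, as in the workflow's pytest command.
    for path, lineno, line in find_references("test", "IBM_GRANITE_4_HYBRID_MICRO"):
        print(f"{path}:{lineno}: {line}")
```

An empty result suggests no test module still references the hybrid model by that constant; it does not catch tests that spell out the `granite4:micro-h` tag as a string, which would need a second call with that needle.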

id: tests
run: uv run -m pytest -v --junit-xml=/tmp/pytest-results.xml test
5 changes: 2 additions & 3 deletions CONTRIBUTING.md
@@ -373,8 +373,7 @@ models must be pulled locally before running the tests that need them.

**CI (unit + integration tests):**

- `granite4:micro` — default model for `start_session()` and most examples
- `granite4:micro-h` — hybrid variant used by conftest fixtures
- `granite4.1:3b` — default model for `start_session()` and most examples

**Examples (`docs/examples/`):**

@@ -399,7 +398,7 @@ models must be pulled locally before running the tests that need them.
Pull everything:

```bash
for m in granite4:micro granite4:micro-h deepseek-r1:8b \
for m in granite4.1:3b deepseek-r1:8b \
granite3-guardian:2b granite3.2-vision granite3.3:8b granite4:latest \
llama3.2 llama3.2:3b \
qwen2.5vl:7b granite4:small-h llama3.2:1b llama3:8b llava mistral:7b \
4 changes: 2 additions & 2 deletions cli/alora/README_TEMPLATE.jinja
@@ -85,9 +85,9 @@ def {{ intrinsic_name }}({{ arglist }}, ctx: Context, backend: Backend | Adapter

if __name__ == "__main__":
from mellea.backends.huggingface import LocalHFBackend
from mellea.backends.model_ids import IBM_GRANITE_4_MICRO_3B
from mellea.backends.model_ids import IBM_GRANITE_4_1_3B
from mellea.stdlib.context import ChatContext
backend = LocalHFBackend(IBM_GRANITE_4_MICRO_3B)
backend = LocalHFBackend(IBM_GRANITE_4_1_3B)
result, ctx = {{ intrinsic_name }}({{ example_call_kwargs }}, ctx=ChatContext(), backend=backend)
print(result.value)
```
2 changes: 1 addition & 1 deletion cli/eval/runner.py
@@ -150,7 +150,7 @@ def create_session(
else:
model_id = model
else:
model_id = mellea.model_ids.IBM_GRANITE_4_MICRO_3B
model_id = mellea.model_ids.IBM_GRANITE_4_1_3B

try:
backend_lower = backend.lower()
4 changes: 2 additions & 2 deletions docs/alora.md
@@ -37,7 +37,7 @@ Use the `m alora train` command to fine-tune a LoRA or aLoRA adapter requirement

```bash
m alora train path/to/data.jsonl \
--basemodel ibm-granite/granite-4.0-micro \
--basemodel ibm-granite/granite-4.1-3b \
--outfile ./checkpoints/alora_adapter \
--adapter alora \
--device auto \
@@ -48,7 +48,7 @@ m alora train path/to/data.jsonl \
--grad-accum 4
```

> **Note on Model Selection**: Only non-hybrid models (e.g., `granite-4.0-micro`) are
> **Note on Model Selection**: Only non-hybrid models (e.g., `granite-4.1-3b`) are
> currently supported for LoRA or aLoRA training.
> Mamba/Transformers hybrid models like `granite-4.0-h-micro` will produce low-quality
> results with Mellea's current hard-coded settings for parameter-efficient fine tuning.
2 changes: 1 addition & 1 deletion docs/docs/concepts/context-and-sessions.md
@@ -129,7 +129,7 @@ from mellea.backends.ollama import OllamaModelBackend
from mellea.stdlib.context import SimpleContext

backend = OllamaModelBackend(
"granite4:micro",
"granite4.1:3b",
model_options={"temperature": 0.2},
)
m = MelleaSession(backend, SimpleContext())
2 changes: 1 addition & 1 deletion docs/docs/examples/data-extraction-pipeline.md
@@ -20,7 +20,7 @@ runtime exactly what shape the result must have.
## Prerequisites

- [Quick Start](../getting-started/quickstart) complete
- Ollama running locally with `granite4:micro` pulled
- Ollama running locally with `granite4.1:3b` pulled

## The full example

2 changes: 1 addition & 1 deletion docs/docs/examples/index.md
@@ -128,4 +128,4 @@ uv run docs/examples/<folder>/<file>.py

**Default backend:** `start_session()` with no arguments connects to a local
[Ollama](https://ollama.ai) instance running **IBM Granite 4 Micro**
(`granite4:micro`). Make sure Ollama is running before you execute any example.
Contributor

Same branding mismatch.

(`granite4.1:3b`). Make sure Ollama is running before you execute any example.
2 changes: 1 addition & 1 deletion docs/docs/examples/legacy-code-integration.md
@@ -24,7 +24,7 @@ class or instance so you can pass it directly to session methods like `m.act()`,

- [Quick Start](../getting-started/quickstart) complete
- [MObjects and mify](../concepts/mobjects-and-mify) concept page (recommended background)
- Ollama running locally with `granite4:micro` pulled
- Ollama running locally with `granite4.1:3b` pulled

## The full example

2 changes: 1 addition & 1 deletion docs/docs/examples/resilient-rag-fallback.md
@@ -23,7 +23,7 @@ the survivors to a grounded `m.instruct()` call.
- [Quick Start](../getting-started/quickstart) complete
- `faiss-cpu` and `sentence-transformers` installed, **or** run via `uv run`
which installs them automatically from the inline script block
- Ollama running locally with `granite4:micro` pulled (or a Mistral model — see
- Ollama running locally with `granite4.1:3b` pulled (or a Mistral model — see
the session setup section below)

Install dependencies manually if you are not using `uv run`:
2 changes: 1 addition & 1 deletion docs/docs/examples/traced-generation-loop.md
@@ -25,7 +25,7 @@ calls.
## Prerequisites

- [Quick Start](../getting-started/quickstart) complete
- Ollama running locally with `granite4:micro` pulled
- Ollama running locally with `granite4.1:3b` pulled
- (Optional) [Jaeger](https://www.jaegertracing.io/) running locally for span
visualisation — see the Jaeger section below

2 changes: 1 addition & 1 deletion docs/docs/getting-started/installation.md
@@ -57,5 +57,5 @@ The default session connects to [Ollama](https://ollama.ai) running locally.
Install Ollama and pull the default model before running any examples:

```bash
ollama pull granite4:micro
ollama pull granite4.1:3b
```
4 changes: 2 additions & 2 deletions docs/docs/getting-started/quickstart.md
@@ -10,7 +10,7 @@ description: "Run your first generative program in minutes."
## Hello world

By default, `start_session()` connects to Ollama and uses **IBM Granite 4 Micro**
(`granite4:micro`). Make sure Ollama is running before you run this:
Contributor

Branding mismatch. granite4.1:3b is Granite 4.1, not "Granite 4 Micro" — that name belonged to 4.0-micro. This line and line 85 should be updated, e.g. to IBM Granite 4.1 3B.
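
The mismatch flagged here recurs in several docs pages, so a quick text check can help find the remaining spots. The sketch below is illustrative only: `find_stale_branding` is a hypothetical helper, and the `window` parameter is an assumption that covers prose where the display name and the model tag wrap onto adjacent lines, as they do in this hunk.

```python
def find_stale_branding(
    text: str,
    stale_name: str = "Granite 4 Micro",
    new_tag: str = "granite4.1:3b",
    window: int = 1,
) -> list[int]:
    """Return 1-based line numbers where stale_name sits on or just before a line with new_tag."""
    lines = text.splitlines()
    flagged: list[int] = []
    for i, line in enumerate(lines):
        if stale_name not in line:
            continue
        # Look at this line plus the next `window` lines for the new model tag.
        nearby = lines[i : i + window + 1]
        if any(new_tag in candidate for candidate in nearby):
            flagged.append(i + 1)
    return flagged
```

Running it over each Markdown file's contents and reviewing the flagged line numbers by hand is the intended use; the check is a plain substring match, so it will miss rewordings of the old name.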

(`granite4.1:3b`). Make sure Ollama is running before you run this:

```python
import mellea
@@ -191,7 +191,7 @@ HuggingFace, and WatsonX are also supported. See

## Troubleshooting

**`granite4:micro` not found** — run `ollama pull granite4:micro` before starting.
**`granite4.1:3b` not found** — run `ollama pull granite4.1:3b` before starting.

**Python 3.13 `outlines` install failure** — `outlines` requires a Rust compiler.
Either install [Rust](https://www.rust-lang.org/tools/install) or pin Python to 3.12.
2 changes: 1 addition & 1 deletion docs/docs/guide/CONTRIBUTING.md
@@ -140,7 +140,7 @@ Or a section-level callout if multiple blocks share the caveat:
All code — fenced blocks AND inline backtick references — must match current source:

- Import paths, class names, method names exact.
- Model IDs current (e.g., `ibm-granite/granite-4.0-micro`).
- Model IDs current (e.g., `ibm-granite/granite-4.1-3b`).
- Inline prose fragments consistent with adjacent code blocks.

If the source itself has inconsistencies, document as-is and note in the glossary.
4 changes: 2 additions & 2 deletions docs/docs/how-to/backends-and-configuration.md
@@ -13,7 +13,7 @@ configure the backend when you create a session.

## Default backend

`start_session()` defaults to **Ollama** with **IBM Granite 4 Micro** (`granite4:micro`).
`start_session()` defaults to **Ollama** with **IBM Granite 4 Micro** (`granite4.1:3b`).
No API keys needed — just have Ollama running:
Contributor

Same branding mismatch: points to granite4.1:3b but still says "IBM Granite 4 Micro".


```python
@@ -142,7 +142,7 @@ Run models locally using HuggingFace transformers:
from mellea import MelleaSession
from mellea.backends.huggingface import LocalHFBackend

backend = LocalHFBackend(model_id="ibm-granite/granite-4.0-micro")
backend = LocalHFBackend(model_id="ibm-granite/granite-4.1-3b")
m = MelleaSession(backend=backend)
```

2 changes: 1 addition & 1 deletion docs/docs/how-to/handling-exceptions.md
@@ -262,7 +262,7 @@ from mellea.backends import model_ids
from mellea.stdlib.sampling import RejectionSamplingStrategy

def instruct_with_fallback(text: str) -> str:
m_fast = MelleaSession(OllamaModelBackend(model_ids.IBM_GRANITE_4_MICRO_3B))
m_fast = MelleaSession(OllamaModelBackend(model_ids.IBM_GRANITE_4_1_3B))
result = m_fast.instruct(
text,
strategy=RejectionSamplingStrategy(loop_budget=3),
6 changes: 3 additions & 3 deletions docs/docs/how-to/m-decompose.md
@@ -25,7 +25,7 @@ m decompose run --input-file task.txt --out-dir ./output/
> **Note:** The output directory must already exist — the command will error if it
> does not. On first run with Ollama, the default model will be downloaded
> automatically (~15 GB for the full model). Use `--model-id` with a smaller model
> (e.g. `granite4:micro`) to avoid the large download.
> (e.g. `granite4.1:3b`) to avoid the large download.

This produces a subdirectory under `./output/` (one per task job):

@@ -59,7 +59,7 @@ python output/m_decomp_result/m_decomp_result.py

## Backend options

`m decompose` defaults to Ollama with `granite4:micro`. Pass `--backend` and
`m decompose` defaults to Ollama with `granite4.1:3b`. Pass `--backend` and
`--model-id` to use a different inference engine:

```bash
@@ -86,7 +86,7 @@ from cli.decompose.pipeline import DecompBackend, decompose

result = decompose(
task_prompt="Write a short blog post about morning exercise.",
model_id="granite4:micro",
model_id="granite4.1:3b",
backend=DecompBackend.ollama,
)

8 changes: 4 additions & 4 deletions docs/docs/how-to/unit-test-generative-code.md
@@ -45,7 +45,7 @@ import pytest
from mellea import MelleaSession
from mellea.backends.ollama import OllamaModelBackend

_MODEL_ID = "granite4:micro"
_MODEL_ID = "granite4.1:3b"


@pytest.fixture(scope="module")
@@ -358,8 +358,8 @@ from mellea.stdlib.components.unit_test_eval import TestBasedEval

test_evals = TestBasedEval.from_json_file("tests/eval_data/email_writer.json")

judge_session = start_session(backend_name="ollama", model_id="granite4:micro")
generation_session = start_session(backend_name="ollama", model_id="granite4:micro")
judge_session = start_session(backend_name="ollama", model_id="granite4.1:3b")
generation_session = start_session(backend_name="ollama", model_id="granite4.1:3b")

for eval_case in test_evals:
for idx, input_text in enumerate(eval_case.inputs):
@@ -380,7 +380,7 @@ for eval_case in test_evals:
> **Note:** `TestBasedEval` calls the judge model once per input. For large
> evaluation sets, consider batching or running evaluations asynchronously.
> **CLI alternative:** The same evaluation can be run without writing Python:
> `m eval run tests/eval_data/email_writer.json --backend ollama --model granite4:micro`
> `m eval run tests/eval_data/email_writer.json --backend ollama --model granite4.1:3b`
> See `m eval run --help` for full options.

## CI strategy
2 changes: 1 addition & 1 deletion docs/docs/how-to/use-images-and-vision.md
@@ -10,7 +10,7 @@ Mellea supports multimodal input: pass images alongside your text prompt to any
**Prerequisites:** `pip install mellea pillow`, a vision-capable model downloaded and
running.

> **Backend note:** The default Ollama model (`granite4:micro`) does not support image
> **Backend note:** The default Ollama model (`granite4.1:3b`) does not support image
> input. You must switch to a vision-capable model such as `granite3.2-vision` or
> `llava`. Not all backends support vision — see backend notes below.

2 changes: 1 addition & 1 deletion docs/docs/integrations/langchain.md
@@ -53,7 +53,7 @@ instance, so any tool that follows the LangChain `BaseTool` interface works with
further configuration.

> **Backend note:** Tool calling requires a backend and model that support function
> calling (e.g., Ollama with `granite4:micro`, OpenAI with `gpt-4o`). The default
> calling (e.g., Ollama with `granite4.1:3b`, OpenAI with `gpt-4o`). The default
> Ollama setup supports this.

## Seeding a session with LangChain message history
18 changes: 9 additions & 9 deletions docs/docs/integrations/ollama.md
@@ -34,7 +34,7 @@ background service.
## Default setup

`start_session()` connects to Ollama on `localhost:11434` and uses
**IBM Granite 4 Micro** (`granite4:micro`) by default. On first run, Mellea
**IBM Granite 4 Micro** (`granite4.1:3b`) by default. On first run, Mellea
automatically pulls the model if it is not already downloaded:
Contributor

Same branding mismatch.


```python
@@ -47,7 +47,7 @@ print(str(email))
# Output will vary — LLM responses depend on model and temperature.
```

> **Note:** The first run pulls `granite4:micro` (~2 GB). Subsequent runs start
> **Note:** The first run pulls `granite4.1:3b` (~2 GB). Subsequent runs start
> immediately from the local cache.

## Switching models
@@ -75,7 +75,7 @@ m = start_session(model_id=model_ids.IBM_GRANITE_3_3_8B)
Pull models before using them (or let Mellea pull on first use):

```bash
ollama pull granite4:micro
ollama pull granite4.1:3b
ollama pull llama3.2:3b
ollama pull mistral:7b
```
@@ -84,8 +84,8 @@ ollama pull mistral:7b

| `model_ids` constant | Ollama name | Notes |
| -------------------- | ----------- | ----- |
| `IBM_GRANITE_4_MICRO_3B` | `granite4:micro` | Default. Fast, low memory (~2 GB). |
| `IBM_GRANITE_4_HYBRID_MICRO` | `granite4:micro-h` | Hybrid variant with extended thinking. |
| `IBM_GRANITE_4_1_3B` | `granite4.1:3b` | Default. Fast, low memory (~2 GB). |
| `IBM_GRANITE_4_1_8B` | `granite4.1:8b` | Higher quality, ~5 GB. |
| `IBM_GRANITE_3_3_8B` | `granite3.3:8b` | Higher quality, ~5 GB. |
| `IBM_GRANITE_3_3_VISION_2B` | `ibm/granite3.3-vision:2b` | Vision model for image inputs. |
| `META_LLAMA_3_2_3B` | `llama3.2:3b` | Compact Llama model. |
@@ -131,7 +131,7 @@ from mellea.backends.ollama import OllamaModelBackend

m = MelleaSession(
OllamaModelBackend(
model_id="granite4:micro",
model_id="granite4.1:3b",
base_url="http://my-gpu-server:11434",
)
)
@@ -152,7 +152,7 @@ from mellea.backends.ollama import OllamaModelBackend

m = MelleaSession(
OllamaModelBackend(
model_id=model_ids.IBM_GRANITE_4_MICRO_3B,
model_id=model_ids.IBM_GRANITE_4_1_3B,
model_options={
ModelOption.TEMPERATURE: 0.1,
ModelOption.SEED: 42,
@@ -193,7 +193,7 @@ print(str(response))
```

> **Backend note:** Vision requires a model that supports image inputs. The default
> `granite4:micro` is text-only. Pull a vision model explicitly before using images:
> `granite4.1:3b` is text-only. Pull a vision model explicitly before using images:
> `ollama pull ibm/granite3.3-vision:2b`.

## Ollama's OpenAI-compatible endpoint
@@ -236,7 +236,7 @@ let Mellea pull it automatically on first use.

Ollama loads the model into memory on the first request. Subsequent requests in the
same session are much faster. On machines with less than 8 GB RAM, consider using
`granite4:micro` or `llama3.2:1b`.
`granite4.1:3b` or `llama3.2:1b`.

### Intel Mac torch errors

2 changes: 1 addition & 1 deletion docs/docs/integrations/smolagents.md
@@ -46,7 +46,7 @@ if result.tool_calls:
description and parameter types are preserved exactly.

> **Backend note:** Tool calling requires a backend and model that support function
> calling (e.g., Ollama with `granite4:micro`, OpenAI with `gpt-4o`). The default
> calling (e.g., Ollama with `granite4.1:3b`, OpenAI with `gpt-4o`). The default
> Ollama setup supports this.
>
> **Full example:** [`docs/examples/tools/smolagents_example.py`](https://github.com/generative-computing/mellea/blob/main/docs/examples/tools/smolagents_example.py)
2 changes: 1 addition & 1 deletion docs/docs/observability/logging.md
@@ -90,7 +90,7 @@ With structured JSON output enabled, the same `SUCCESS` record looks like:
"thread_id": 6179762176,
"session_id": "550e8400-e29b-41d4-a716-446655440000",
"backend": "OllamaModelBackend",
"model_id": "granite4:micro",
"model_id": "granite4.1:3b",
"strategy": "RejectionSamplingStrategy",
"loop_budget": 3
}
2 changes: 1 addition & 1 deletion docs/docs/observability/metrics.md
@@ -459,7 +459,7 @@ from mellea.telemetry import create_counter, create_histogram, create_up_down_co

# Monotonically increasing values
requests = create_counter("myapp.requests", unit="1", description="Total requests")
requests.add(1, {"backend": "ollama", "model": "granite4:micro"})
requests.add(1, {"backend": "ollama", "model": "granite4.1:3b"})

# Value distributions
latency = create_histogram("myapp.latency", unit="ms", description="Request latency")
2 changes: 1 addition & 1 deletion docs/docs/observability/tracing.md
@@ -158,7 +158,7 @@ session_context (mellea.application)
│ │ [mellea.backend=OllamaModelBackend]
│ ├── chat (mellea.backend)
│ │ [gen_ai.system=ollama]
│ │ [gen_ai.request.model=granite4:micro]
│ │ [gen_ai.request.model=granite4.1:3b]
│ │ [gen_ai.usage.input_tokens=150]
│ │ [gen_ai.usage.output_tokens=42]
│ └── requirement_validation (mellea.application)
6 changes: 3 additions & 3 deletions docs/docs/troubleshooting/common-errors.md
@@ -6,16 +6,16 @@ description: "Common errors, diagnostic steps, and fixes for Mellea programs."

## Installation

### `granite4:micro` not found
### `granite4.1:3b` not found

```text
Error: model "granite4:micro" not found
Error: model "granite4.1:3b" not found
```

Pull the model before running:

```bash
ollama pull granite4:micro
ollama pull granite4.1:3b
```

### Python 3.13: `outlines` install failure
2 changes: 1 addition & 1 deletion docs/docs/troubleshooting/faq.md
@@ -38,7 +38,7 @@ m = MelleaSession(
)
```

## How do I use a model other than `granite4:micro`?
## How do I use a model other than `granite4.1:3b`?

Pass the `model_id` parameter to `start_session()`:
