generative-computing · avinash2692 · Apr 30, 2026 · Apr 29, 2026
@@ -62,8 +62,7 @@ jobs:
         run: nohup ollama serve &
       - name: Pull models
         run: |
-          ollama pull granite4:micro
-          ollama pull granite4:micro-h
+          ollama pull granite4.1:3b
       - name: Run Tests
         id: tests
         run: uv run -m pytest -v --junit-xml=/tmp/pytest-results.xml test

@@ -373,8 +373,7 @@ models must be pulled locally before running the tests that need them.
 
 **CI (unit + integration tests):**
 
-- `granite4:micro` — default model for `start_session()` and most examples
-- `granite4:micro-h` — hybrid variant used by conftest fixtures
+- `granite4.1:3b` — default model for `start_session()` and most examples
 
 **Examples (`docs/examples/`):**
 
@@ -399,7 +398,7 @@ models must be pulled locally before running the tests that need them.
 Pull everything:
 
 ```bash
-for m in granite4:micro granite4:micro-h deepseek-r1:8b \
+for m in granite4.1:3b deepseek-r1:8b \
   granite3-guardian:2b granite3.2-vision granite3.3:8b granite4:latest \
   llama3.2 llama3.2:3b \
   qwen2.5vl:7b granite4:small-h llama3.2:1b llama3:8b llava mistral:7b \

@@ -85,9 +85,9 @@ def {{ intrinsic_name }}({{ arglist }}, ctx: Context, backend: Backend | Adapter
 
 if __name__ == "__main__":
     from mellea.backends.huggingface import LocalHFBackend
-    from mellea.backends.model_ids import IBM_GRANITE_4_MICRO_3B
+    from mellea.backends.model_ids import IBM_GRANITE_4_1_3B
     from mellea.stdlib.context import ChatContext
-    backend = LocalHFBackend(IBM_GRANITE_4_MICRO_3B)
+    backend = LocalHFBackend(IBM_GRANITE_4_1_3B)
     result, ctx = {{ intrinsic_name }}({{ example_call_kwargs }}, ctx=ChatContext(), backend=backend)
     print(result.value)
 ```
@@ -150,7 +150,7 @@ def create_session(
         else:
             model_id = model
     else:
-        model_id = mellea.model_ids.IBM_GRANITE_4_MICRO_3B
+        model_id = mellea.model_ids.IBM_GRANITE_4_1_3B
 
     try:
         backend_lower = backend.lower()

@@ -37,7 +37,7 @@ Use the `m alora train` command to fine-tune a LoRA or aLoRA adapter requirement
 
 ```bash
 m alora train path/to/data.jsonl \
-  --basemodel ibm-granite/granite-4.0-micro \
+  --basemodel ibm-granite/granite-4.1-3b \
   --outfile ./checkpoints/alora_adapter \
   --adapter alora \
   --device auto \
@@ -48,7 +48,7 @@ m alora train path/to/data.jsonl \
   --grad-accum 4
 ```
 
-> **Note on Model Selection**: Only non-hybrid models (e.g., `granite-4.0-micro`) are 
+> **Note on Model Selection**: Only non-hybrid models (e.g., `granite-4.1-3b`) are 
 > currently supported for LoRA or aLoRA training.
 > Mamba/Transformers hybrid models like `granite-4.0-h-micro` will produce low-quality 
 > results with Mellea's current hard-coded settings for parameter-efficient fine tuning.

@@ -129,7 +129,7 @@ from mellea.backends.ollama import OllamaModelBackend
 from mellea.stdlib.context import SimpleContext
 
 backend = OllamaModelBackend(
-    "granite4:micro",
+    "granite4.1:3b",
     model_options={"temperature": 0.2},
 )
 m = MelleaSession(backend, SimpleContext())

@@ -20,7 +20,7 @@ runtime exactly what shape the result must have.
 ## Prerequisites
 
 - [Quick Start](../getting-started/quickstart) complete
-- Ollama running locally with `granite4:micro` pulled
+- Ollama running locally with `granite4.1:3b` pulled
 
 ## The full example
 

@@ -128,4 +128,4 @@ uv run docs/examples/<folder>/<file>.py
 
 **Default backend:** `start_session()` with no arguments connects to a local
 [Ollama](https://ollama.ai) instance running **IBM Granite 4 Micro**
-(`granite4:micro`). Make sure Ollama is running before you execute any example.
+(`granite4.1:3b`). Make sure Ollama is running before you execute any example.
@@ -24,7 +24,7 @@ class or instance so you can pass it directly to session methods like `m.act()`,
 
 - [Quick Start](../getting-started/quickstart) complete
 - [MObjects and mify](../concepts/mobjects-and-mify) concept page (recommended background)
-- Ollama running locally with `granite4:micro` pulled
+- Ollama running locally with `granite4.1:3b` pulled
 
 ## The full example
 

@@ -23,7 +23,7 @@ the survivors to a grounded `m.instruct()` call.
 - [Quick Start](../getting-started/quickstart) complete
 - `faiss-cpu` and `sentence-transformers` installed, **or** run via `uv run`
   which installs them automatically from the inline script block
-- Ollama running locally with `granite4:micro` pulled (or a Mistral model — see
+- Ollama running locally with `granite4.1:3b` pulled (or a Mistral model — see
   the session setup section below)
 
 Install dependencies manually if you are not using `uv run`:

@@ -25,7 +25,7 @@ calls.
 ## Prerequisites
 
 - [Quick Start](../getting-started/quickstart) complete
-- Ollama running locally with `granite4:micro` pulled
+- Ollama running locally with `granite4.1:3b` pulled
 - (Optional) [Jaeger](https://www.jaegertracing.io/) running locally for span
   visualisation — see the Jaeger section below
 

@@ -57,5 +57,5 @@ The default session connects to [Ollama](https://ollama.ai) running locally.
 Install Ollama and pull the default model before running any examples:
 
 ```bash
-ollama pull granite4:micro
+ollama pull granite4.1:3b
 ```
@@ -10,7 +10,7 @@ description: "Run your first generative program in minutes."
 ## Hello world
 
 By default, `start_session()` connects to Ollama and uses **IBM Granite 4 Micro**
-(`granite4:micro`). Make sure Ollama is running before you run this:
+(`granite4.1:3b`). Make sure Ollama is running before you run this:
 
 ```python
 import mellea
@@ -191,7 +191,7 @@ HuggingFace, and WatsonX are also supported. See
 
 ## Troubleshooting
 
-**`granite4:micro` not found** — run `ollama pull granite4:micro` before starting.
+**`granite4.1:3b` not found** — run `ollama pull granite4.1:3b` before starting.
 
 **Python 3.13 `outlines` install failure** — `outlines` requires a Rust compiler.
 Either install [Rust](https://www.rust-lang.org/tools/install) or pin Python to 3.12.

@@ -140,7 +140,7 @@ Or a section-level callout if multiple blocks share the caveat:
 All code — fenced blocks AND inline backtick references — must match current source:
 
 - Import paths, class names, method names exact.
-- Model IDs current (e.g., `ibm-granite/granite-4.0-micro`).
+- Model IDs current (e.g., `ibm-granite/granite-4.1-3b`).
 - Inline prose fragments consistent with adjacent code blocks.
 
 If the source itself has inconsistencies, document as-is and note in the glossary.

@@ -13,7 +13,7 @@ configure the backend when you create a session.
 
 ## Default backend
 
-`start_session()` defaults to **Ollama** with **IBM Granite 4 Micro** (`granite4:micro`).
+`start_session()` defaults to **Ollama** with **IBM Granite 4 Micro** (`granite4.1:3b`).
 No API keys needed — just have Ollama running:
 
 ```python
@@ -142,7 +142,7 @@ Run models locally using HuggingFace transformers:
 from mellea import MelleaSession
 from mellea.backends.huggingface import LocalHFBackend
 
-backend = LocalHFBackend(model_id="ibm-granite/granite-4.0-micro")
+backend = LocalHFBackend(model_id="ibm-granite/granite-4.1-3b")
 m = MelleaSession(backend=backend)
 ```
 

@@ -262,7 +262,7 @@ from mellea.backends import model_ids
 from mellea.stdlib.sampling import RejectionSamplingStrategy
 
 def instruct_with_fallback(text: str) -> str:
-    m_fast = MelleaSession(OllamaModelBackend(model_ids.IBM_GRANITE_4_MICRO_3B))
+    m_fast = MelleaSession(OllamaModelBackend(model_ids.IBM_GRANITE_4_1_3B))
     result = m_fast.instruct(
         text,
         strategy=RejectionSamplingStrategy(loop_budget=3),

@@ -25,7 +25,7 @@ m decompose run --input-file task.txt --out-dir ./output/
 > **Note:** The output directory must already exist — the command will error if it
 > does not. On first run with Ollama, the default model will be downloaded
 > automatically (~15 GB for the full model). Use `--model-id` with a smaller model
-> (e.g. `granite4:micro`) to avoid the large download.
+> (e.g. `granite4.1:3b`) to avoid the large download.
 
 This produces a subdirectory under `./output/` (one per task job):
 
@@ -59,7 +59,7 @@ python output/m_decomp_result/m_decomp_result.py
 
 ## Backend options
 
-`m decompose` defaults to Ollama with `granite4:micro`. Pass `--backend` and
+`m decompose` defaults to Ollama with `granite4.1:3b`. Pass `--backend` and
 `--model-id` to use a different inference engine:
 
 ```bash
@@ -86,7 +86,7 @@ from cli.decompose.pipeline import DecompBackend, decompose
 
 result = decompose(
     task_prompt="Write a short blog post about morning exercise.",
-    model_id="granite4:micro",
+    model_id="granite4.1:3b",
     backend=DecompBackend.ollama,
 )
 

@@ -45,7 +45,7 @@ import pytest
 from mellea import MelleaSession
 from mellea.backends.ollama import OllamaModelBackend
 
-_MODEL_ID = "granite4:micro"
+_MODEL_ID = "granite4.1:3b"
 
 
 @pytest.fixture(scope="module")
@@ -358,8 +358,8 @@ from mellea.stdlib.components.unit_test_eval import TestBasedEval
 
 test_evals = TestBasedEval.from_json_file("tests/eval_data/email_writer.json")
 
-judge_session = start_session(backend_name="ollama", model_id="granite4:micro")
-generation_session = start_session(backend_name="ollama", model_id="granite4:micro")
+judge_session = start_session(backend_name="ollama", model_id="granite4.1:3b")
+generation_session = start_session(backend_name="ollama", model_id="granite4.1:3b")
 
 for eval_case in test_evals:
     for idx, input_text in enumerate(eval_case.inputs):
@@ -380,7 +380,7 @@ for eval_case in test_evals:
 > **Note:** `TestBasedEval` calls the judge model once per input. For large
 > evaluation sets, consider batching or running evaluations asynchronously.
 > **CLI alternative:** The same evaluation can be run without writing Python:
-> `m eval run tests/eval_data/email_writer.json --backend ollama --model granite4:micro`
+> `m eval run tests/eval_data/email_writer.json --backend ollama --model granite4.1:3b`
 > See `m eval run --help` for full options.
 
 ## CI strategy

@@ -10,7 +10,7 @@ Mellea supports multimodal input: pass images alongside your text prompt to any
 **Prerequisites:** `pip install mellea pillow`, a vision-capable model downloaded and
 running.
 
-> **Backend note:** The default Ollama model (`granite4:micro`) does not support image
+> **Backend note:** The default Ollama model (`granite4.1:3b`) does not support image
 > input. You must switch to a vision-capable model such as `granite3.2-vision` or
 > `llava`. Not all backends support vision — see backend notes below.
 

@@ -53,7 +53,7 @@ instance, so any tool that follows the LangChain `BaseTool` interface works with
 further configuration.
 
 > **Backend note:** Tool calling requires a backend and model that support function
-> calling (e.g., Ollama with `granite4:micro`, OpenAI with `gpt-4o`). The default
+> calling (e.g., Ollama with `granite4.1:3b`, OpenAI with `gpt-4o`). The default
 > Ollama setup supports this.
 
 ## Seeding a session with LangChain message history

@@ -34,7 +34,7 @@ background service.
 ## Default setup
 
 `start_session()` connects to Ollama on `localhost:11434` and uses
-**IBM Granite 4 Micro** (`granite4:micro`) by default. On first run, Mellea
+**IBM Granite 4 Micro** (`granite4.1:3b`) by default. On first run, Mellea
 automatically pulls the model if it is not already downloaded:
 
 ```python
@@ -47,7 +47,7 @@ print(str(email))
 # Output will vary — LLM responses depend on model and temperature.
 ```
 
-> **Note:** The first run pulls `granite4:micro` (~2 GB). Subsequent runs start
+> **Note:** The first run pulls `granite4.1:3b` (~2 GB). Subsequent runs start
 > immediately from the local cache.
 
 ## Switching models
@@ -75,7 +75,7 @@ m = start_session(model_id=model_ids.IBM_GRANITE_3_3_8B)
 Pull models before using them (or let Mellea pull on first use):
 
 ```bash
-ollama pull granite4:micro
+ollama pull granite4.1:3b
 ollama pull llama3.2:3b
 ollama pull mistral:7b
 ```
@@ -84,8 +84,8 @@ ollama pull mistral:7b
 
 | `model_ids` constant | Ollama name | Notes |
 | -------------------- | ----------- | ----- |
-| `IBM_GRANITE_4_MICRO_3B` | `granite4:micro` | Default. Fast, low memory (~2 GB). |
-| `IBM_GRANITE_4_HYBRID_MICRO` | `granite4:micro-h` | Hybrid variant with extended thinking. |
+| `IBM_GRANITE_4_1_3B` | `granite4.1:3b` | Default. Fast, low memory (~2 GB). |
+| `IBM_GRANITE_4_1_8B` | `granite4.1:8b` | Higher quality, ~5 GB. |
 | `IBM_GRANITE_3_3_8B` | `granite3.3:8b` | Higher quality, ~5 GB. |
 | `IBM_GRANITE_3_3_VISION_2B` | `ibm/granite3.3-vision:2b` | Vision model for image inputs. |
 | `META_LLAMA_3_2_3B` | `llama3.2:3b` | Compact Llama model. |
@@ -131,7 +131,7 @@ from mellea.backends.ollama import OllamaModelBackend
 
 m = MelleaSession(
     OllamaModelBackend(
-        model_id="granite4:micro",
+        model_id="granite4.1:3b",
         base_url="http://my-gpu-server:11434",
     )
 )
@@ -152,7 +152,7 @@ from mellea.backends.ollama import OllamaModelBackend
 
 m = MelleaSession(
     OllamaModelBackend(
-        model_id=model_ids.IBM_GRANITE_4_MICRO_3B,
+        model_id=model_ids.IBM_GRANITE_4_1_3B,
         model_options={
             ModelOption.TEMPERATURE: 0.1,
             ModelOption.SEED: 42,
@@ -193,7 +193,7 @@ print(str(response))
 ```
 
 > **Backend note:** Vision requires a model that supports image inputs. The default
-> `granite4:micro` is text-only. Pull a vision model explicitly before using images:
+> `granite4.1:3b` is text-only. Pull a vision model explicitly before using images:
 > `ollama pull ibm/granite3.3-vision:2b`.
 
 ## Ollama's OpenAI-compatible endpoint
@@ -236,7 +236,7 @@ let Mellea pull it automatically on first use.
 
 Ollama loads the model into memory on the first request. Subsequent requests in the
 same session are much faster. On machines with less than 8 GB RAM, consider using
-`granite4:micro` or `llama3.2:1b`.
+`granite4.1:3b` or `llama3.2:1b`.
 
 ### Intel Mac torch errors
 

@@ -46,7 +46,7 @@ if result.tool_calls:
 description and parameter types are preserved exactly.
 
 > **Backend note:** Tool calling requires a backend and model that support function
-> calling (e.g., Ollama with `granite4:micro`, OpenAI with `gpt-4o`). The default
+> calling (e.g., Ollama with `granite4.1:3b`, OpenAI with `gpt-4o`). The default
 > Ollama setup supports this.
 >
 > **Full example:** [`docs/examples/tools/smolagents_example.py`](https://github.com/generative-computing/mellea/blob/main/docs/examples/tools/smolagents_example.py)

@@ -90,7 +90,7 @@ With structured JSON output enabled, the same `SUCCESS` record looks like:
   "thread_id": 6179762176,
   "session_id": "550e8400-e29b-41d4-a716-446655440000",
   "backend": "OllamaModelBackend",
-  "model_id": "granite4:micro",
+  "model_id": "granite4.1:3b",
   "strategy": "RejectionSamplingStrategy",
   "loop_budget": 3
 }

@@ -459,7 +459,7 @@ from mellea.telemetry import create_counter, create_histogram, create_up_down_co
 
 # Monotonically increasing values
 requests = create_counter("myapp.requests", unit="1", description="Total requests")
-requests.add(1, {"backend": "ollama", "model": "granite4:micro"})
+requests.add(1, {"backend": "ollama", "model": "granite4.1:3b"})
 
 # Value distributions
 latency = create_histogram("myapp.latency", unit="ms", description="Request latency")

@@ -158,7 +158,7 @@ session_context           (mellea.application)
 │   │                     [mellea.backend=OllamaModelBackend]
 │   ├── chat              (mellea.backend)
 │   │                     [gen_ai.system=ollama]
-│   │                     [gen_ai.request.model=granite4:micro]
+│   │                     [gen_ai.request.model=granite4.1:3b]
 │   │                     [gen_ai.usage.input_tokens=150]
 │   │                     [gen_ai.usage.output_tokens=42]
 │   └── requirement_validation  (mellea.application)

@@ -6,16 +6,16 @@ description: "Common errors, diagnostic steps, and fixes for Mellea programs."
 
 ## Installation
 
-### `granite4:micro` not found
+### `granite4.1:3b` not found
 
 ```text
-Error: model "granite4:micro" not found
+Error: model "granite4.1:3b" not found
 ```
 
 Pull the model before running:
 
 ```bash
-ollama pull granite4:micro
+ollama pull granite4.1:3b
 ```
 
 ### Python 3.13: `outlines` install failure

@@ -38,7 +38,7 @@ m = MelleaSession(
 )
 ```
 
-## How do I use a model other than `granite4:micro`?
+## How do I use a model other than `granite4.1:3b`?
 
 Pass the `model_id` parameter to `start_session()`: