test: fix flaky ollama tests, remove stale xfails, add diagnostic logging by planetf1 · Pull Request #598 · generative-computing/mellea

planetf1 · 2026-03-06T10:12:30Z

Summary

Remove xfail from test_generate_from_raw_with_format: the test passes consistently, and the xfail was masking real failures
Remove xfail from test_multiple_async_funcs (watsonx/litellm bug resolved)
Add CONTEXT_WINDOW: 2048 to both generate_from_raw tests to reduce memory pressure on Ollama
Strengthen assertions in both tests (assert all(r.value for r in results) with diagnostic message)
Add pytest.mark.timeout(150) to test_generate_from_raw to bound worst-case flake
Validate all 4 results (not just index 0) in test_generate_from_raw_with_format
Increase MAX_NEW_TOKENS from 2**8 to 2**10 in format tests (ollama/openai-ollama)
Add FancyLogger.warning diagnostic when generate_from_raw catches an exception
Mark researcher.py example as slow; add markers to query_clarification.py
Update slow marker description to ">1 minute"
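
The strengthened assertion could look roughly like the sketch below. Only the `assert all(r.value for r in results)` pattern with a diagnostic message comes from the summary; `FakeResult` and `check_results` are hypothetical names used for illustration, and the real tests operate on mellea's own result objects rather than this stand-in.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeResult:
    """Hypothetical stand-in for a backend result that exposes `.value`."""
    value: Optional[str]


def check_results(results):
    """Assert every result has a non-empty value and name the empty ones."""
    # Collect the indices of empty results so a failure is easy to diagnose.
    empty = [i for i, r in enumerate(results) if not r.value]
    assert all(r.value for r in results), (
        f"{len(empty)} of {len(results)} results were empty at indices {empty}"
    )
```

Checking every result, rather than only index 0, means a single empty response in a batch fails the test and the message points at the affected positions.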

Notes

There is one remaining known issue with Ollama under sustained load: empty-body responses that are not exceptions and therefore not caught by the new logging. A separate issue will track that investigation.
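
A minimal sketch of why exception logging alone misses this case. It assumes a per-request loop that logs a warning and records `None` when a call raises; `fetch_all` and `send` are hypothetical names, and the standard `logging` module stands in for FancyLogger. The actual `generate_from_raw` code may be structured differently.

```python
import logging

# Stand-in for FancyLogger; the real project uses its own logger.
logger = logging.getLogger("generate_from_raw_sketch")


def fetch_all(prompts, send):
    """Call `send` once per prompt; log a warning and record None on failure."""
    results = []
    for prompt in prompts:
        try:
            results.append(send(prompt))
        except Exception as exc:
            # Only raised exceptions reach this branch and get logged.
            logger.warning("request for %r failed: %s", prompt, exc)
            results.append(None)
    return results
```

In this sketch, a `send` that returns an empty string never enters the `except` branch, so nothing is logged and the empty value passes straight through to the caller. That is the gap the note describes.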

Test plan

uv run pytest test/backends/test_ollama.py passes without xfail noise
uv run pytest test/backends/test_litellm_watsonx.py passes (or fails for a real reason)
Full test/ suite shows no regressions

…puting#565)

# Misc PR

## Type of PR
- [ ] Bug Fix
- [ ] New Feature
- [ ] Documentation
- [x] Other

## Description
- [x] Link to Issue: Fixes generative-computing#565

Removed `--cov-report=term` from the `[tool.pytest.ini_options]` configuration in `pyproject.toml` to prevent test runs from dumping large code coverage tables to the terminal. Test coverage is still generated and written to `htmlcov/` and `coverage.json`.

### Testing
- [ ] Tests added to the respective file if code was changed
- [ ] New code has 100% coverage if code was added
- [ ] Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

Pop exception from chunks list (like we do for the None sentinel) so _process doesn't receive it. Guard chat_response access in ollama post_processing with .get() for when no valid chunks arrived. Signed-off-by: 0xCUB3 <skula@mit.edu>

mergify · 2026-03-06T10:13:05Z

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:
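
To check a title locally before opening a PR, the rule's pattern can be tried with Python's `re` module, as in the sketch below. The pattern is copied from the rule above; `is_conventional` is a hypothetical helper name, and Mergify's own matching semantics may differ in detail from Python's.

```python
import re

# Pattern copied from the merge protection rule.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)"
    r"(?:\(.+\))?:"
)


def is_conventional(title: str) -> bool:
    """Return True when the title starts with a conventional-commit prefix."""
    return CONVENTIONAL_TITLE.match(title) is not None
```

This PR's title starts with `test:`, so it matches. A title such as `Fix flaky tests` would not, because the type is capitalized and the colon is missing.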

planetf1 · 2026-03-06T10:15:41Z

Closing in favour of a clean branch rebased on upstream/main

planetf1 and others added 11 commits March 4, 2026 09:12

test: isolate astream_incremental tests from CI

e6bd942

Fixes generative-computing#562

test: add deterministic mock tests for astream incremental logic

69a5889

Introduces `test_astream_mock.py` to test `ModelOutputThunk`'s async queue incremental streaming logic deterministically without relying on highly-variable LLM backends.

fix: safe _meta access in post_processing for all backends

16a34e9

Signed-off-by: 0xCUB3 <skula@mit.edu>

fix: skip Exception and None chunks in astream before _process

837a25a

Signed-off-by: 0xCUB3 <skula@mit.edu>

fix: revert hf_output to bracket access where isinstance guard proves…

e9d2644

… key exists Signed-off-by: 0xCUB3 <skula@mit.edu>

test: add regression tests for astream exception handling

00eb95b

Unit tests that verify exceptions in the async queue are cleanly propagated without reaching _process, and that _post_process still runs for telemetry cleanup.

test: clean up asyncio markers and inline import

feadd25

fix: issues with tests (alora example, rag intrinsics, mistral tool use)

3b59bba
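
The astream commits above describe keeping exceptions and the `None` sentinel away from `_process` while still running `_post_process`. The sketch below illustrates that idea with a plain `asyncio.Queue`. It is not mellea's implementation: `drain`, `process`, and `post_process` are hypothetical names, and the real `ModelOutputThunk` logic is not shown in this PR page.

```python
import asyncio


async def drain(queue, process, post_process):
    """Drain a chunk queue, stopping at a None sentinel or an Exception item."""
    chunks, error = [], None
    while True:
        item = await queue.get()
        if item is None:
            # Sentinel: the stream finished normally.
            break
        if isinstance(item, Exception):
            # Failure: remember it, but never hand it to process().
            error = item
            break
        chunks.append(item)
    try:
        if error is not None:
            raise error
        return process(chunks)
    finally:
        # Runs on both success and failure, e.g. for telemetry cleanup.
        post_process()
```

Because the queue is filled by the test itself, a test built on this shape is deterministic and needs no LLM backend, which is the motivation given for `test_astream_mock.py`.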

planetf1 requested review from a team, jakelorocco and nrfulton as code owners March 6, 2026 10:12

planetf1 mentioned this pull request Mar 6, 2026

Ollama returns empty-body responses under sustained load in generate_from_raw #599

Open

5 tasks

planetf1 closed this Mar 6, 2026

planetf1 deleted the fix/test-flaky-ollama-remove-xfails branch March 6, 2026 10:16

planetf1 wants to merge 11 commits into generative-computing:main from planetf1:fix/test-flaky-ollama-remove-xfails
