test: fix flaky ollama tests, remove stale xfails, add diagnostic logging#598

Closed
planetf1 wants to merge 11 commits into
generative-computing:mainfrom
planetf1:fix/test-flaky-ollama-remove-xfails

Conversation

Contributor

@planetf1 planetf1 commented Mar 6, 2026

Summary

  • Remove xfail from test_generate_from_raw_with_format — consistently passing; xfail was masking real failures
  • Remove xfail from test_multiple_async_funcs (watsonx/litellm bug resolved)
  • Add CONTEXT_WINDOW: 2048 to both generate_from_raw tests to reduce memory pressure on Ollama
  • Strengthen assertions in both tests (assert all(r.value for r in results) with diagnostic message)
  • Add pytest.mark.timeout(150) to test_generate_from_raw to bound worst-case flake
  • Validate all 4 results (not just index 0) in test_generate_from_raw_with_format
  • Increase MAX_NEW_TOKENS from 2**8 to 2**10 in format tests (ollama/openai-ollama)
  • Add FancyLogger.warning diagnostic when generate_from_raw catches an exception
  • Mark researcher.py example as slow; add markers to query_clarification.py
  • Update slow marker description to ">1 minute"
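
The strengthened assertion in the bullets above can be sketched roughly as follows. `FakeResult` and `check_all_have_values` are hypothetical stand-ins introduced here for illustration, not Mellea APIs; the only parts taken from the summary are the `all(r.value for r in results)` check and the idea of attaching a diagnostic message that covers every result rather than just index 0.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FakeResult:
    """Hypothetical stand-in for a backend result object with a `.value` field."""

    value: Optional[str]


def check_all_have_values(results: Sequence[FakeResult]) -> None:
    """Assert that every result is non-empty, not only the first one."""
    # Record which positions are empty so a failure says exactly where to look.
    empty = [i for i, r in enumerate(results) if not r.value]
    assert all(r.value for r in results), (
        f"{len(empty)} of {len(results)} results had no value (indices {empty})"
    )
```

In a real test the same check would sit after the backend call, so that a flake reports which of the results came back empty instead of failing on an opaque index lookup.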

Notes

There is one remaining known issue with Ollama under sustained load: empty-body responses that are not exceptions and therefore not caught by the new logging. A separate issue will track that investigation.
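
A minimal sketch of why such responses slip past exception-based logging is shown here. It uses the standard `logging` module in place of `FancyLogger`, and `generate_all` and `silent_empties` are hypothetical helpers, not the actual `generate_from_raw` implementation.

```python
import logging
from typing import Callable, List, Tuple

logger = logging.getLogger("ollama_sketch")  # stand-in for FancyLogger


def generate_all(calls: List[Callable[[], str]]) -> Tuple[List[str], List[int]]:
    """Run each call; on an exception, log a warning and record an empty output."""
    outputs: List[str] = []
    failed: List[int] = []
    for i, call in enumerate(calls):
        try:
            outputs.append(call())
        except Exception as exc:
            # Only this path is visible to the diagnostic warning.
            logger.warning("request %d failed: %r", i, exc)
            failed.append(i)
            outputs.append("")
    return outputs, failed


def silent_empties(outputs: List[str], failed: List[int]) -> List[int]:
    """Indices that came back empty without raising, so nothing was logged."""
    return [i for i, out in enumerate(outputs) if not out and i not in failed]
```

A call that returns an empty string never enters the `except` branch, so it can only be caught by inspecting the outputs afterwards.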

Test plan

  • uv run pytest test/backends/test_ollama.py passes without xfail noise
  • uv run pytest test/backends/test_litellm_watsonx.py passes (or fails for a real reason)
  • Full test/ suite shows no regressions

planetf1 and others added 11 commits March 4, 2026 09:12
…puting#565)

# Misc PR

## Type of PR

- [ ] Bug Fix
- [ ] New Feature
- [ ] Documentation
- [x] Other

## Description
- [x] Link to Issue: Fixes generative-computing#565

Removed `--cov-report=term` from the `[tool.pytest.ini_options]` configuration in `pyproject.toml` to prevent test runs from dumping large code coverage tables to the terminal. Test coverage is still generated and written to `htmlcov/` and `coverage.json`.

### Testing
- [ ] Tests added to the respective file if code was changed
- [ ] New code has 100% coverage if code was added
- [ ] Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

Introduces `test_astream_mock.py` to test `ModelOutputThunk`'s async queue incremental streaming logic deterministically without relying on highly-variable LLM backends.

Pop exception from chunks list (like we do for the None sentinel) so
_process doesn't receive it. Guard chat_response access in ollama
post_processing with .get() for when no valid chunks arrived.

Signed-off-by: 0xCUB3 <skula@mit.edu>
… key exists

Signed-off-by: 0xCUB3 <skula@mit.edu>
Unit tests that verify exceptions in the async queue are cleanly
propagated without reaching _process, and that _post_process still
runs for telemetry cleanup.
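
The behaviour these commits describe can be sketched roughly as below. This is an illustration under assumptions, not the `ModelOutputThunk` code: `drain` and `consume` are hypothetical names, `process` and `post_process` stand in for `_process` and `_post_process`, and skipping `process` entirely on error is an assumption of this sketch.

```python
import asyncio
from typing import Any, Callable, List, Optional, Tuple


async def drain(queue: asyncio.Queue) -> Tuple[List[Any], Optional[Exception]]:
    """Collect chunks until the None sentinel or an exception arrives."""
    chunks: List[Any] = []
    while True:
        item = await queue.get()
        chunks.append(item)
        if item is None or isinstance(item, Exception):
            break
    # Pop the terminator (sentinel or exception) so processing never sees it.
    last = chunks.pop()
    error = last if isinstance(last, Exception) else None
    return chunks, error


async def consume(
    queue: asyncio.Queue,
    process: Callable[[List[Any]], Any],
    post_process: Callable[[], None],
) -> Any:
    """Process the chunks, always run cleanup, then re-raise any queued error."""
    chunks, error = await drain(queue)
    result = process(chunks) if error is None else None
    post_process()  # cleanup runs on both the success and the error path
    if error is not None:
        raise error
    return result
```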
…ging

- Remove xfail from test_generate_from_raw_with_format (consistently passing)
- Remove xfail from test_multiple_async_funcs (watsonx litellm bug resolved)
- Add CONTEXT_WINDOW: 2048 and stronger assertions to generate_from_raw tests
- Add pytest.mark.timeout(150) to test_generate_from_raw
- Increase MAX_NEW_TOKENS to 2**10 in format tests
- Add FancyLogger warning when generate_from_raw catches an exception
- Mark researcher example as slow; add markers to query_clarification
- Update slow marker description in pyproject.toml
@planetf1 planetf1 requested review from a team, jakelorocco and nrfulton as code owners March 6, 2026 10:12

mergify Bot commented Mar 6, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:
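
As a quick local check, the title rule above can be reproduced in Python. The pattern is copied from the rule; `is_conventional` is a hypothetical helper, and treating `~=` as an ordinary regular-expression search is an assumption here.

```python
import re

# Pattern copied verbatim from the merge-protection rule.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)"
    r"(?:\(.+\))?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix."""
    # The pattern is anchored with ^, so search only matches at the start.
    return CONVENTIONAL_TITLE.search(title) is not None
```

This PR's title starts with `test:`, which is why the rule is reported as passing.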

Contributor Author

planetf1 commented Mar 6, 2026

Closing in favour of a clean branch rebased on upstream/main

@planetf1 planetf1 closed this Mar 6, 2026
@planetf1 planetf1 deleted the fix/test-flaky-ollama-remove-xfails branch March 6, 2026 10:16
3 participants