fix: change hallucination detection tests to accept differences in explanation #1059

Merged
jakelorocco merged 3 commits into generative-computing:main from jakelorocco:fix/hallucination-detection
May 26, 2026

Conversation

@jakelorocco jakelorocco commented May 11, 2026

Misc PR

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

Made the tests more permissive about the output they accept. Also removed one unnecessary warning from the OpenAI backend.

The new versions of the tests pass locally on both an Apple Silicon Mac and a Linux machine:

test/stdlib/components/intrinsic/test_rag.py::test_hallucination_detection PASSED                                                              [ 50%]
test/stdlib/components/intrinsic/test_rag.py::test_hallucination_detection_resolve PASSED                                                      [100%]

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation when the rest of the PR is populated)

Attribution

  • AI coding assistants used

…planation

Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
@github-actions github-actions Bot added the bug Something isn't working label May 11, 2026
@github-actions

The PR description has been updated. Please fill out the template for your PR to be reviewed.

@jakelorocco jakelorocco marked this pull request as ready for review May 12, 2026 12:08
@jakelorocco jakelorocco requested a review from a team as a code owner May 12, 2026 12:08
@jakelorocco jakelorocco linked an issue May 12, 2026 that may be closed by this pull request
@jakelorocco jakelorocco added the enhancement New feature or request label May 20, 2026
@jakelorocco jakelorocco changed the title fix: change hallucination detection tests to accept differences in explanation fix: change hallucination detection tests to accept differences in explanation May 20, 2026
@planetf1 planetf1 left a comment

Leaving a few minor notes from a panel review — nothing blocking.

mellea/backends/openai.py (removed block): The removed guard was the only explicit signal when an ALoraRequirement reaches standard generation via a non-generate caller. The upstream warning at lines 500–505 covers the normal fallback path, but a short comment would make the new contract clear and prevent the guard being re-introduced later:

messages.extend(self.formatter.to_chat_messages([action]))
# ALoraRequirement may arrive here when no adapter is registered;
# _generate is responsible for logging a warning in that case.
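The contract described in that comment can be sketched in isolation. This is a minimal illustration only: `AdapterRequirement`, `generate`, and `generate_standard` are hypothetical stand-ins, not the actual classes or methods in mellea/backends/openai.py, and the string return values merely mark which path was taken.

```python
import logging

logger = logging.getLogger("fallback_sketch")


class AdapterRequirement:
    """Hypothetical stand-in for a requirement that prefers an adapter."""

    def __init__(self, name: str) -> None:
        self.name = name


def generate_standard(action: AdapterRequirement) -> str:
    # An AdapterRequirement may arrive here when no adapter is registered;
    # generate() is responsible for logging a warning in that case, so this
    # function deliberately has no guard of its own.
    return f"standard:{action.name}"


def generate(action: AdapterRequirement, adapters: set[str]) -> str:
    """Use the adapter path when possible, otherwise warn and fall back."""
    if action.name in adapters:
        return f"adapter:{action.name}"
    logger.warning(
        "no adapter registered for %r; using standard generation", action.name
    )
    return generate_standard(action)


fallback = generate(AdapterRequirement("check"), adapters=set())
direct = generate(AdapterRequirement("check"), adapters={"check"})
```

Keeping the warning in one place (the outer entry point, in this sketch) means the inner function never has to decide whether the fallback was intended.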


# Specifically don't check the explanation due to mentioned differences.
# assert result["explanation"] == expected["explanation"]

Contributor

The commented-out assert references result["explanation"] and expected["explanation"], but inside the loop those are lists, not dicts — the per-element variables are r and e. If anyone uncomments this to re-enable the check it'll raise TypeError rather than AssertionError, which is a confusing failure to debug. Either fix the identifiers or remove the line:

# To re-enable: assert r["explanation"] == e["explanation"]
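The difference between the two failure modes is easy to reproduce. The sketch below assumes, as the comment above states, that `result` and `expected` are lists of per-item dicts; the sample data and the helper `failure_kind` are invented for illustration.

```python
# Assumed shape: both values are lists of per-item dicts.
result = [{"explanation": "supported by the document"}]
expected = [{"explanation": "grounded in the document"}]


def failure_kind(check) -> str:
    """Run a check and report the name of the exception it raises, if any."""
    try:
        check()
    except Exception as exc:
        return type(exc).__name__
    return "passed"


def check_with_outer_names() -> None:
    for r, e in zip(result, expected):
        # Bug: indexes the lists themselves with a string key.
        assert result["explanation"] == expected["explanation"]


def check_with_loop_names() -> None:
    for r, e in zip(result, expected):
        # Correct: compares the per-item dicts bound by the loop.
        assert r["explanation"] == e["explanation"]


outer = failure_kind(check_with_outer_names)  # "TypeError"
inner = failure_kind(check_with_loop_names)   # "AssertionError"
```

With the outer names the test fails before any comparison happens, so the error message says nothing about which explanation differed.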

Contributor Author

Fixed it. Thank you.

assert r["response_end"] == e["response_end"]
assert r["response_text"] == e["response_text"]
assert r["faithfulness"] == e["faithfulness"]

Contributor

explanation is dropped from the comparison entirely. The cross-platform variation justifies skipping an exact-string match, but for a qualitative test it's worth keeping a minimal shape check so a regression that returns an empty or None explanation doesn't pass silently:

assert isinstance(r["explanation"], str) and r["explanation"].strip()
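As a rough sketch of how the exact-match fields and the shape check could sit together, the helper below compares the three fields named in the diff and only validates the form of `explanation`. The helper names and the sample record are assumptions made for illustration; they are not taken from test_rag.py.

```python
EXACT_FIELDS = ("response_end", "response_text", "faithfulness")


def assert_record_matches(r: dict, e: dict) -> None:
    """Compare stable fields exactly; only shape-check the explanation."""
    for key in EXACT_FIELDS:
        assert r[key] == e[key], f"{key}: {r[key]!r} != {e[key]!r}"
    # Wording may differ across platforms, but it must be a non-empty string.
    assert isinstance(r["explanation"], str) and r["explanation"].strip(), (
        "explanation must be a non-empty string"
    )


def passes(r: dict, e: dict) -> bool:
    try:
        assert_record_matches(r, e)
    except AssertionError:
        return False
    return True


# Invented sample record, for illustration only.
expected = {
    "response_end": 42,
    "response_text": "The sky is blue.",
    "faithfulness": "faithful",
    "explanation": "The document states that the sky is blue.",
}
reworded = dict(expected, explanation="Supported by the first document.")
blank = dict(expected, explanation="   ")
missing = dict(expected, explanation=None)
```

Here `reworded` passes while `blank` and `missing` fail, which is the regression the shape check is meant to catch.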

Contributor Author

Good idea. Made this change.

@planetf1 planetf1 left a comment

Changes look good. The openai.py removal correctly resolves a contradiction where the upstream code promised a fallback but the inner guard raised instead, and the test relaxation is a reasonable cross-platform fix that preserves length-mismatch detection. Minor notes left inline.
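Length-mismatch detection matters because `zip` stops at the shorter input without complaint. A minimal sketch of the pattern, with a hypothetical helper name and toy data rather than the real test's records:

```python
def compare_pairs(result: list, expected: list) -> int:
    """Compare element-wise, refusing inputs of different lengths."""
    assert len(result) == len(expected), (
        f"expected {len(expected)} items, got {len(result)}"
    )
    checked = 0
    for r, e in zip(result, expected):
        assert r == e
        checked += 1
    return checked


# Without the length assert, zip silently drops the unmatched third item.
silently_checked = len(list(zip([1, 2, 3], [1, 2])))

try:
    compare_pairs([1, 2, 3], [1, 2])
    caught = False
except AssertionError:
    caught = True
```

On Python 3.10 and later, `zip(result, expected, strict=True)` is an alternative that raises `ValueError` on a length mismatch.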

Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
@jakelorocco jakelorocco added this pull request to the merge queue May 26, 2026
Merged via the queue into generative-computing:main with commit d37dc13 May 26, 2026
9 checks passed
@jakelorocco jakelorocco deleted the fix/hallucination-detection branch May 26, 2026 15:35
akihikokuroda pushed a commit to akihikokuroda/mellea that referenced this pull request May 27, 2026
…planation (generative-computing#1059)

* fix: change hallucination detection tests to accept differences in explanation

Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>

* fix: remove unnecessary warning from openai standard generation

Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>

* fix: pr comments

Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>

---------

Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>

Labels

bug (Something isn't working), enhancement (New feature or request)

Development

Successfully merging this pull request may close these issues.

fix: hallucination detection tests

2 participants