fix: change hallucination detection tests to accept differences in explanation by jakelorocco · Pull Request #1059 · generative-computing/mellea

jakelorocco · 2026-05-11T19:35:05Z

Misc PR

Type of PR

Bug Fix
New Feature
Documentation
Other

Description

Link to Issue: Fixes #1006 (fix: hallucination detection tests)

Made the tests more permissive about the output they accept. Also fixed one unnecessary warning from the OpenAI backend.

The new versions of the tests pass locally on both an Apple Silicon Mac and a Linux machine:

test/stdlib/components/intrinsic/test_rag.py::test_hallucination_detection PASSED                                                              [ 50%]
test/stdlib/components/intrinsic/test_rag.py::test_hallucination_detection_resolve PASSED                                                      [100%]

Testing

Tests added to the respective file if code was changed
New code has 100% coverage if code was added
Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

Attribution

AI coding assistants used


github-actions · 2026-05-11T19:35:23Z

The PR description has been updated. Please fill out the template for your PR to be reviewed.

planetf1

Leaving a few minor notes from a panel review — nothing blocking.

mellea/backends/openai.py (removed block): The removed guard was the only explicit signal when an ALoraRequirement reaches standard generation via a non-generate caller. The upstream warning at lines 500–505 covers the normal fallback path, but a short comment would make the new contract clear and prevent the guard being re-introduced later:

messages.extend(self.formatter.to_chat_messages([action]))
# ALoraRequirement may arrive here when no adapter is registered;
# _generate is responsible for logging a warning in that case.

planetf1 · 2026-05-26T11:35:01Z

+
+        # Specifically don't check the explanation due to mentioned differences.
+        # assert result["explanation"] == expected["explanation"]
+


The commented-out assert references result["explanation"] and expected["explanation"], but inside the loop those are lists, not dicts — the per-element variables are r and e. If anyone uncomments this to re-enable the check it'll raise TypeError rather than AssertionError, which is a confusing failure to debug. Either fix the identifiers or remove the line:

# To re-enable: assert r["explanation"] == e["explanation"]

Fixed it. Thank you.
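
To illustrate the failure mode described in this thread, here is a minimal sketch. The `result` and `expected` lists are hypothetical stand-ins for the test's data, and `index_list_by_key` is a helper invented for this example, not code from the PR. It shows that indexing the list itself with a string key raises `TypeError`, while the per-element variables `r` and `e` index cleanly.

```python
# Hypothetical stand-ins for the test's lists of per-sentence records.
result = [{"explanation": "Supported by the document."}]
expected = [{"explanation": "The document supports this."}]


def index_list_by_key(records):
    """Return the exception type raised by records["explanation"], or None."""
    try:
        records["explanation"]  # a list cannot be indexed with a string key
    except Exception as exc:
        return type(exc)
    return None


# What the commented-out assert would have hit if re-enabled as written.
list_error = index_list_by_key(result)

# Per-element access, as in the reviewer's suggested replacement.
per_element = [
    (r["explanation"], e["explanation"]) for r, e in zip(result, expected)
]
```

Because the exception is raised while evaluating the expression, pytest would report it as an error in the test body rather than as a failed comparison.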

planetf1 · 2026-05-26T11:35:01Z

+        assert r["response_end"] == e["response_end"]
+        assert r["response_text"] == e["response_text"]
+        assert r["faithfulness"] == e["faithfulness"]
+


explanation is dropped from the comparison entirely. The cross-platform variation justifies skipping an exact-string match, but for a qualitative test it's worth keeping a minimal shape check so a regression that returns an empty or None explanation doesn't pass silently:

assert isinstance(r["explanation"], str) and r["explanation"].strip()

Good idea. Made this change.
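
The relaxed comparison discussed in this thread can be sketched roughly as follows. This is an assumption-laden illustration rather than the code merged in the PR: `compare_records` and the sample records are hypothetical, and only the three fields visible in the diff above are compared exactly.

```python
# Fields compared exactly; taken from the diff hunk shown in the review.
EXACT_KEYS = ("response_end", "response_text", "faithfulness")


def compare_records(result, expected):
    """Compare records exactly, except for the wording of the explanation."""
    # zip() alone would silently truncate, so check the lengths first.
    assert len(result) == len(expected), "record count differs"
    for r, e in zip(result, expected):
        for key in EXACT_KEYS:
            assert r[key] == e[key], f"{key} differs"
        # Shape check only: the explanation must be a non-empty string.
        assert isinstance(r["explanation"], str) and r["explanation"].strip()


# Hypothetical sample data; the values are invented for illustration.
sample_expected = [
    {
        "response_end": 31,
        "response_text": "Purple bumble fish are yellow.",
        "faithfulness": "unfaithful",
        "explanation": "The document does not mention the color yellow.",
    }
]
# Same fields, different explanation wording: this should still pass.
sample_result = [dict(sample_expected[0], explanation="Yellow is never mentioned.")]

compare_records(sample_result, sample_expected)
```

With this shape, a platform-dependent change in wording passes, while an empty, whitespace-only, or `None` explanation still fails the test.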

planetf1

Changes look good. The openai.py removal correctly resolves a contradiction where the upstream code promised a fallback but the inner guard raised instead, and the test relaxation is a reasonable cross-platform fix that preserves length-mismatch detection. Minor notes left inline.


…planation (generative-computing#1059)

* fix: change hallucination detection tests to accept differences in explanation
  Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
* fix: remove unnecessary warning from openai standard generation
  Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
* fix: pr comments
  Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
---------
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>

jakelorocco added 2 commits May 11, 2026 15:33

fix: change hallucination detection tests to accept differences in ex…

231ba22

…planation Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>

fix: remove unnecessary warning from openai standard generation

75b27eb

Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>

github-actions Bot added the bug Something isn't working label May 11, 2026

jakelorocco marked this pull request as ready for review May 12, 2026 12:08

jakelorocco requested a review from a team as a code owner May 12, 2026 12:08

jakelorocco requested review from AngeloDanducci and nrfulton May 12, 2026 12:08

jakelorocco linked an issue May 12, 2026 that may be closed by this pull request

fix: hallucination detection tests #1006

Closed

jakelorocco added the enhancement New feature or request label May 20, 2026

jakelorocco changed the title ~~fix: change hallucination detection tests to accept differences in explanation~~ fix: change hallucination detection tests to accept differences in explanation May 20, 2026

planetf1 reviewed May 26, 2026

View reviewed changes

planetf1 approved these changes May 26, 2026

View reviewed changes

fix: pr comments

cc8226d

Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>

jakelorocco added this pull request to the merge queue May 26, 2026

Merged via the queue into generative-computing:main with commit d37dc13 May 26, 2026
9 checks passed

jakelorocco deleted the fix/hallucination-detection branch May 26, 2026 15:35

jakelorocco merged 3 commits into generative-computing:main from jakelorocco:fix/hallucination-detection
