fix: change hallucination detection tests to accept differences in explanation #1059
Conversation
…planation Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
|
The PR description has been updated. Please fill out the template for your PR to be reviewed. |
planetf1
left a comment
There was a problem hiding this comment.
Leaving a few minor notes from a panel review — nothing blocking.
mellea/backends/openai.py (removed block): The removed guard was the only explicit signal when an ALoraRequirement reaches standard generation via a non-generate caller. The upstream warning at lines 500–505 covers the normal fallback path, but a short comment would make the new contract clear and prevent the guard being re-introduced later:
messages.extend(self.formatter.to_chat_messages([action]))
# ALoraRequirement may arrive here when no adapter is registered;
# _generate is responsible for logging a warning in that case.|
|
||
| # Specifically don't check the explanation due to mentioned differences. | ||
| # assert result["explanation"] == expected["explanation"] | ||
|
|
There was a problem hiding this comment.
The commented-out assert references result["explanation"] and expected["explanation"], but inside the loop those are lists, not dicts — the per-element variables are r and e. If anyone uncomments this to re-enable the check it'll raise TypeError rather than AssertionError, which is a confusing failure to debug. Either fix the identifiers or remove the line:
# To re-enable: assert r["explanation"] == e["explanation"]There was a problem hiding this comment.
Fixed it. Thank you.
| assert r["response_end"] == e["response_end"] | ||
| assert r["response_text"] == e["response_text"] | ||
| assert r["faithfulness"] == e["faithfulness"] | ||
|
|
There was a problem hiding this comment.
explanation is dropped from the comparison entirely. The cross-platform variation justifies skipping an exact-string match, but for a qualitative test it's worth keeping a minimal shape check so a regression that returns an empty or None explanation doesn't pass silently:
assert isinstance(r["explanation"], str) and r["explanation"].strip()There was a problem hiding this comment.
Good idea. Made this change.
planetf1
left a comment
There was a problem hiding this comment.
Changes look good — the openai.py removal correctly resolves a contradiction where the upstream code promised a fallback but the inner guard was raising instead, and the test relaxation is a reasonable cross-platform fix with preserving length-mismatch detection. Minor notes left inline.
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
d37dc13
…planation (generative-computing#1059) * fix: change hallucination detection tests to accept differences in explanation Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com> * fix: remove unnecessary warning from openai standard generation Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com> * fix: pr comments Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com> --------- Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Misc PR
Type of PR
Description
Made the test more permissive in accepting output. Also fixed one unnecessary warning from the open ai backend.
New versions of the tests pass both locally on an apple silicon mac and on a linux machine:
Testing
Attribution