LCORE-956: Bump llama stack to 0.3.0 #866
Walkthrough

Adds Conversations v3 endpoints; integrates conversation lifecycle (normalize/convert IDs, persist) with the Responses API for query and streaming flows; introduces ToolCallSummary and ToolResultSummary, moves RAGChunk to utils, enriches SSE end-events with quotas and referenced documents; bumps llama-stack to 0.3.0 and updates configs/tests/OpenAPI.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Client
    participant App as Lightspeed App
    participant Llama as Llama Stack (Responses API)
    participant DB as Local DB
    Client->>App: POST /v1/query or /v1/streaming_query (may include conversation_id)
    alt no conversation_id
        App->>Llama: POST /conversations.create
        Llama-->>App: { id: "conv_<hex>" }
        App->>DB: store mapping (normalized id)
        DB-->>App: ack
    else conversation_id provided
        App->>App: normalize / to_llama_stack_conversation_id
    end
    App->>Llama: Call Responses API (include "conversation": llama_conv_id)
    alt streaming
        Llama-->>App: streaming chunks (tokens, tool events...)
        App->>Client: SSE chunks (START with normalized id, token/tool events...)
        Note over App: compute referenced_documents & available_quotas before end
        App->>Client: SSE end-event (token usage, available_quotas, referenced_documents)
    else non-streaming
        Llama-->>App: full response (tool_calls, tool_results, references)
        App->>Client: JSON response (tool_calls, tool_results, referenced_documents, token usage, quotas)
    end
    App->>DB: persist transcript (includes tool_results)
    DB-->>App: ack
```
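The normalize / convert step in the diagram can be sketched as follows. The helper names `normalize_conversation_id` and `to_llama_stack_conversation_id` come from the PR's `src/utils/suid.py`; the bodies here are assumptions illustrating the `conv_<hex>` vs bare-hex formats, not the actual implementation:

```python
def normalize_conversation_id(conv_id: str) -> str:
    """Strip the llama-stack 'conv_' prefix, returning the bare hex ID (assumed body)."""
    return conv_id.removeprefix("conv_")


def to_llama_stack_conversation_id(conv_id: str) -> str:
    """Re-add the 'conv_' prefix expected by the llama-stack Conversations API (assumed body)."""
    return conv_id if conv_id.startswith("conv_") else f"conv_{conv_id}"
```

The app stores the normalized form in its DB and converts back to the prefixed form when calling the Responses API with `"conversation": llama_conv_id`.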
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45–75 minutes
Pre-merge checks: ✅ 3 checks passed
5099065 to 3d4e2c5
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/models/responses.py (1)
417-451: Update `model_config` example to match the new `ToolCallSummary` and `ToolResultSummary` structure.

The example includes `rag_chunks` (which is no longer a field) and uses the old `tool_calls` format (`tool_name`, `arguments`, `result`) instead of the new `ToolCallSummary` structure (`id`, `name`, `args`, `type`). The example should also include `tool_results`.

```diff
 model_config = {
     "json_schema_extra": {
         "examples": [
             {
                 "conversation_id": "123e4567-e89b-12d3-a456-426614174000",
                 "response": "Operator Lifecycle Manager (OLM) helps users install...",
-                "rag_chunks": [
-                    {
-                        "content": "OLM is a component of the Operator Framework toolkit...",
-                        "source": "kubernetes-docs/operators.md",
-                        "score": 0.95,
-                    }
-                ],
                 "tool_calls": [
                     {
-                        "tool_name": "knowledge_search",
-                        "arguments": {"query": "operator lifecycle manager"},
-                        "result": {"chunks_found": 5},
+                        "id": "call_123",
+                        "name": "knowledge_search",
+                        "args": {"query": "operator lifecycle manager"},
+                        "type": "tool_call",
+                    }
+                ],
+                "tool_results": [
+                    {
+                        "id": "call_123",
+                        "status": "success",
+                        "content": {"chunks_found": 5},
+                        "type": "tool_result",
+                        "round": 1,
                     }
                 ],
                 "referenced_documents": [
```

src/utils/suid.py (1)
19-44: Align `check_suid` type handling with its docstring (bytes vs str)

The extended `check_suid` logic for `conv_...` and plain hex conversation IDs looks good, and the new `normalize_conversation_id` / `to_llama_stack_conversation_id` helpers are straightforward and correct for the `conv_<hex>` vs normalized formats.

However, the docstring now advertises:

- `suid (str | bytes)`, and
- acceptance of a "byte representation",

while the function is annotated as `check_suid(suid: str) -> bool` and, in practice, treats non-`str` inputs as invalid (a `bytes` value falls through to `uuid.UUID(suid)` and is rejected with `TypeError`, returning `False`).

To avoid confusion for callers, consider either:

- Updating the signature and implementation to genuinely support bytes, e.g.:

```diff
-def check_suid(suid: str) -> bool:
+def check_suid(suid: str | bytes) -> bool:
@@
     try:
+        if isinstance(suid, bytes):
+            try:
+                suid = suid.decode("ascii")
+            except UnicodeDecodeError:
+                return False
         # Accept llama-stack conversation IDs (conv_<hex> format)
-        if isinstance(suid, str) and suid.startswith("conv_"):
+        if suid.startswith("conv_"):
@@
-        if isinstance(suid, str) and len(suid) >= 32:
+        if len(suid) >= 32:
@@
-        uuid.UUID(suid)
+        uuid.UUID(str(suid))
```

- Or, if bytes are not needed, simplifying the docstring to mention only `str` inputs.

Either way, keeping type hints and documentation consistent will make this utility safer to reuse.
Also applies to: 45-98, 101-145
🧹 Nitpick comments (29)
tests/e2e/features/info.feature (1)
19-19: Exact llama-stack version assertion may be too brittle

The step now hard-codes `0.3.0rc3+rhai0`. If the underlying stack moves to another rc/build of 0.3.0, this e2e will fail even though the service is healthy and still on the intended major/minor version.

Consider relaxing this to:

- Match just the semantic version prefix (e.g., `0.3.0`), or
- Delegate the exact expected value to a shared constant/env so it's updated in one place when bumping the stack.
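The prefix-match relaxation could be as small as the following hypothetical helper (not part of the PR); it accepts any rc/build suffix of the expected version while still rejecting a different patch level:

```python
import re


def version_matches(reported: str, expected_prefix: str = "0.3.0") -> bool:
    """True when `reported` starts with the expected version, optionally
    followed by a non-digit suffix such as 'rc3' or '+rhai0' (assumed policy)."""
    return re.match(rf"^{re.escape(expected_prefix)}(\D|$)", reported) is not None
```

Requiring a non-digit (or end of string) after the prefix prevents `0.3.0` from accidentally matching `0.3.01`.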
Also please confirm that the `/info` endpoint actually returns `0.3.0rc3+rhai0` in your target deployment before merging.

pyproject.toml (1)
64-70: Track technical debt for type checking exclusions.

Adding 4 more files to pyright exclusions increases the untyped surface area. The comment indicates these are deprecated v1 endpoints; consider adding a TODO with a tracking issue to either fix the type errors or remove these endpoints when v1 is fully deprecated.
src/utils/llama_stack_version.py (1)
43-60: Update docstring to reflect new behavior.

The docstring states `version_info` "must be parseable by `semver.Version.parse`", but the implementation now silently skips the check on parse failure rather than raising an exception. Update the docstring to document this behavior accurately.

src/utils/types.py (2)
113-124: Consider using Literal types for status field.

The `status` field accepts any string, but based on usage it appears to only be "success" or "failure". Consider using `Literal["success", "failure"]` for better type safety.

```diff
+from typing import Literal
+
 class ToolResultSummary(BaseModel):
     """Model representing a result from a tool call (for tool_results list)."""

     id: str = Field(
         description="ID of the tool call/result, matches the corresponding tool call 'id'"
     )
-    status: str = Field(
+    status: Literal["success", "failure"] = Field(
         ..., description="Status of the tool execution (e.g., 'success')"
     )
```
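For illustration, a stdlib-only sketch of the constraint the `Literal` type would enforce. Pydantic itself performs this validation once the annotation is in place; `validate_status` below is a hypothetical stand-in, not part of the codebase:

```python
from typing import Literal, get_args

ToolStatus = Literal["success", "failure"]


def validate_status(value: str) -> str:
    """Reject any status outside the Literal's allowed values (illustrative only)."""
    if value not in get_args(ToolStatus):
        raise ValueError(f"invalid status: {value!r}")
    return value
```

With the pydantic field typed this way, constructing a `ToolResultSummary` with `status="ok"` would raise a `ValidationError` instead of silently storing an unexpected string.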
163-171: Address the TODO comment about the `round` attribute.

The comment `# clarify meaning of this attribute` on line 169 indicates uncertainty about the `round` field's semantics. Either document its intended meaning or track this as a follow-up issue.

Do you want me to open an issue to track clarifying the `round` attribute's meaning?
28-41: Remove commented-out code instead of leaving it in the codebase.

Dead code should be removed rather than commented out. Version control preserves history if these definitions are needed later.

```diff
-# class ToolCall(BaseModel):
-#     """Model representing a tool call made during response generation."""
-
-#     tool_name: str = Field(description="Name of the tool called")
-#     arguments: dict[str, Any] = Field(description="Arguments passed to the tool")
-#     result: dict[str, Any] | None = Field(None, description="Result from the tool")
-
-
-# class ToolResult(BaseModel):
-#     """Model representing a tool result."""
-
-#     tool_name: str = Field(description="Name of the tool")
-#     result: dict[str, Any] = Field(description="Result from the tool")
-
```

src/app/endpoints/streaming_query_v2.py (3)
76-76: Enable or remove the commented-out `PromptTooLongResponse` entry.

The 413 response is commented out but `PromptTooLongResponse` was added in this PR. If prompt-too-long errors can occur in this endpoint, enable the response; otherwise, remove the comment.

```diff
-    # 413: PromptTooLongResponse.openapi_response(),
+    413: PromptTooLongResponse.openapi_response(),
```

If enabling, also add the import:

```python
from models.responses import (
    ...
    PromptTooLongResponse,
    ...
)
```
296-296: Fix typo: double closing parenthesis in comment.

```diff
-    # Perform cleanup tasks (database and cache operations))
+    # Perform cleanup tasks (database and cache operations)
```
457-461: Remove commented-out debug code.

Debug logging statements should be removed rather than left commented in production code.

```diff
     response_stream = cast(AsyncIterator[OpenAIResponseObjectStream], response)
-    # async for chunk in response_stream:
-    #     logger.error("Chunk: %s", chunk.model_dump_json())
-    # Return the normalized conversation_id (already normalized above)
-    # The response_generator will emit it in the start event
+    # Return the normalized conversation_id (already normalized above).
+    # The response_generator will emit it in the start event.
     return response_stream, conversation_id
```

src/utils/token_counter.py (1)
72-83: Input-token comment vs. implementation mismatch (system vs. user role)

The comment says this block uses the same logic as `metrics.utils.update_llm_token_count_from_turn`, but here the `system_prompt` is wrapped as `RawMessage(role="system", ...)` while the metrics helper uses `role="user"`. This may slightly change tokenization and contradicts the "same logic" claim. Consider either aligning the role for true parity or updating the comment to reflect the intentional difference.

tests/unit/utils/test_endpoints.py (1)
261-803: Deprecated agent/temp-agent tests are now skipped

Marking these legacy `get_agent` / `get_temp_agent` async tests with `@pytest.mark.skip(reason="Deprecated API test")` is a pragmatic way to avoid failures against removed/changed APIs while preserving the scenarios for reference. You may eventually want to replace these with coverage for the new v2/v3 flows or group them under a custom marker (e.g. `@pytest.mark.deprecated_api`) to make later cleanup easier, but the current change is fine.

tests/e2e/features/conversations.feature (1)
173-181: Non-existent DELETE now returns 200 with explanatory payload

The DELETE `/conversations/{conversation_id}` scenario for a non-existent ID now expects a 200 with `{"success": true, "response": "Conversation cannot be deleted"}` (ignoring `conversation_id`). This matches the new idempotent delete behavior but is a semantic shift from the earlier 404. Please ensure client logic and API docs are updated to rely on the response body/message for "not found" vs. "deleted" distinctions rather than the status code alone.

docs/conversations_api.md (1)
14-31: Fix markdownlint issues: list indentation and fenced code languages

The doc reads well, but the markdownlint findings are valid and may break CI:

- MD007: nested list items in the TOC (lines ~16–27) are indented with three spaces instead of two.
- MD040: several fenced code blocks lack an explicit language (e.g., the `conv_...` example, SSE snippets, error snippets).

You can address both with a small style-only patch: re-indent the nested TOC entries (`Llama Stack Format`, `Normalized Format`, `ID Conversion Utilities`) to two spaces, and tag the unlabeled fences with `text` — this covers the `conv_0d21ba731f21f798dc9680125d5d6f493e4a7ab79f25670e` example, the SSE snippet (`data: {"event": "start", "data": {"conversation_id": "0d21ba7..."}}`), and the error snippet (`Error: Conversation not found (HTTP 404)`). Apply the same `text` (or another appropriate language like `bash`, `json`, or `sql`) to the remaining unlabeled fenced blocks.

Also applies to: 58-80, 234-245, 369-379, 469-471

tests/unit/models/responses/test_successful_responses.py (1)

41-45: Strengthening QueryResponse and StreamingQueryResponse tests

The switch to `ToolCallSummary` / `ToolResultSummary` and the new `TestStreamingQueryResponse` look consistent with the updated response models:

- `test_constructor_minimal` correctly expects `tool_calls` and `tool_results` to default to `None`.
- `TestStreamingQueryResponse` validates the SSE `text/event-stream` OpenAPI shape and example presence exactly as implemented.

One small improvement: in `test_constructor_full`, you already construct `tool_results` and pass them into `QueryResponse`, but never assert they are preserved. Adding a single check would tighten coverage on the new field:

```diff
@@
-    assert response.conversation_id == "conv-123"
-    assert response.tool_calls == tool_calls
+    assert response.conversation_id == "conv-123"
+    assert response.tool_calls == tool_calls
+    assert response.tool_results == tool_results
     assert response.referenced_documents == referenced_docs
```

Also applies to: 262-317, 968-996
tests/integration/endpoints/test_query_v2_integration.py (1)
84-90: Integration tests correctly exercise ID normalization, tool calls, and transcript handling
- Mocking `conversations.create` to return `conv_` + 48 hex chars and asserting `response.conversation_id == "a" * 48` gives good end-to-end coverage of the new normalization flow.
- The updated tool-call tests now operate on structured `results` and assert `tool_calls[0].name` / aggregated `tool_names`, matching the new ToolCallSummary shape.
- In the transcript behavior test, introducing `mocker` and patching `store_transcript` avoids real file I/O while still verifying that conversations are persisted regardless of transcript configuration.

You might optionally also assert `len(response.conversation_id) == 48` in the normalization checks to make the invariant explicit, but the current tests are already solid.

Also applies to: 166-170, 346-355, 375-378, 508-510, 1198-1227
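The optional invariant check is tiny; spelled out with the values from the mocked `conversations.create` described above (names here are illustrative, not lifted from the test file):

```python
# The mock returns "conv_" plus 48 hex characters; after normalization the
# endpoint is expected to expose exactly the 48-char hex payload.
raw_id = "conv_" + "a" * 48
conversation_id = raw_id.removeprefix("conv_")

assert conversation_id == "a" * 48
assert len(conversation_id) == 48  # the explicit invariant suggested above
```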
tests/unit/app/endpoints/test_query_v2.py (1)
108-146: Unit tests comprehensively cover query v2 behavior; tighten one referenced-docs test

Overall, these unit tests give strong coverage of the new query v2 behavior:

- `retrieve_response_*` tests now mock `conversations.create` and assert that conversation IDs are normalized (e.g., `"conv_abc123def456"` → `"abc123def456"`), exercise RAG/MCP tool preparation, shields/guardrails wiring, and token-usage extraction.
- Output parsing tests correctly handle content parts with `annotations` lists and ensure tool calls are surfaced as `ToolCallSummary` instances.
- The new `test_retrieve_response_parses_referenced_documents` nicely validates URL citations, file citations, and `file_search_call` results feeding into `referenced_docs`.

One small consistency improvement: in `test_retrieve_response_parses_referenced_documents` you don't mock `conversations.create`, so the returned `conv_id` depends on AsyncMock's default behavior and is unused. For clarity and future robustness (if the helper ever starts validating or persisting that ID), you could align it with the other tests:

```diff
@@
     mock_client = mocker.AsyncMock()
@@
     mock_client.responses.create = mocker.AsyncMock(return_value=response_obj)
+    # Ensure deterministic conversation_id normalization
+    mock_conversation = mocker.Mock()
+    mock_conversation.id = "conv_abc123def456"
+    mock_client.conversations.create = mocker.AsyncMock(return_value=mock_conversation)
@@
-    _summary, _conv_id, referenced_docs, _token_usage = await retrieve_response(
+    _summary, conv_id, referenced_docs, _token_usage = await retrieve_response(
         mock_client, "model-docs", qr, token="tkn", provider_id="test-provider"
     )
+
+    assert conv_id == "abc123def456"
```

This keeps the conversation-ID behavior under test consistent with the rest of the suite without changing the core assertions around referenced documents.
Also applies to: 151-199, 209-265, 268-399, 400-444, 468-475, 535-579, 582-629, 676-775, 779-860
src/app/endpoints/query.py (2)
197-201: Consider removing commented-out code.

The `# toolgroups=None,` comment on line 201 and the commented `# documents=documents` on line 754 appear to be temporary placeholders. If these parameters are intentionally removed from the API call, the comments should be cleaned up. If they represent pending work, consider adding a TODO with a ticket reference.
742-757: Clarify status of commented-out code blocks.

Multiple code sections are commented out with no clear explanation:

- Lines 742-749: `documents` construction
- Line 754: `# documents=documents`
- Line 756: `# toolgroups=toolgroups`

If this represents intentional API changes in llama-stack 0.3.0, consider adding brief comments explaining why. If temporary, add TODOs with ticket references.
tests/unit/app/endpoints/test_streaming_query.py (1)
46-144: Mock classes provide backward compatibility for deprecated Agent API tests.

These mock classes (`TextDelta`, `ToolCallDelta`, `TurnResponseEvent`, etc.) emulate the Agent API types that no longer exist in llama-stack-client 0.3.x. The docstrings clearly indicate their purpose for backward compatibility.

Consider consolidating these mocks into a shared test utilities module if they're needed across multiple test files, to reduce duplication.
src/utils/endpoints.py (2)
313-316: Clarify intent of commented-out agent retrieval code.

Multiple sections of agent retrieval and session management code are commented out with `...` ellipsis placeholders:

- Lines 313-315: Agent retrieval
- Lines 335-342: Orphan agent deletion and session retrieval

If this represents a temporary state during the llama-stack 0.3.0 migration, add a TODO comment with a ticket reference. If these features are intentionally removed, the dead code should be cleaned up.

```diff
     if conversation_id:
         with suppress(ValueError):
-            # agent_response = await client.agents.retrieve(agent_id=conversation_id)
-            # existing_agent_id = agent_response.agent_id
-            ...
+            # TODO(LCORE-XXX): Agent retrieval disabled during llama-stack 0.3.0 migration
+            pass
```

Also applies to: 335-342
323-329: Shield parameters commented out with type:ignore.

The `input_shields` and `output_shields` parameters are commented out with `# type: ignore[call-arg]` annotations. If shields are intentionally disabled in llama-stack 0.3.0, add a clear comment explaining why. The `enable_session_persistence` parameter has a similar pattern.

src/app/endpoints/conversations_v3.py (1)
47-48: Logger name should use `__name__` pattern; router tag inconsistent with filename.

Per coding guidelines, use `logger = logging.getLogger(__name__)` for module logging. Additionally, the router tag `"conversations_v1"` is inconsistent with the filename `conversations_v3.py`, which may cause confusion in API documentation.

```diff
-logger = logging.getLogger("app.endpoints.handlers")
-router = APIRouter(tags=["conversations_v1"])
+logger = logging.getLogger(__name__)
+router = APIRouter(tags=["conversations_v3"])
```

tests/unit/app/endpoints/test_query.py (1)
609-609: Consider tracking or removing deprecated tests.

Multiple tests are marked as skipped with "Deprecated API test". While this is acceptable during migration, consider:
- Creating an issue to track removal of these deprecated tests
- Converting them to test the new Responses API equivalents
- Removing them if the functionality is covered elsewhere
Leaving many skipped tests long-term can obscure coverage metrics and create maintenance burden.
Would you like me to open an issue to track the cleanup of these deprecated tests?
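One lightweight option in the meantime, echoing the custom-marker idea from the skipped-test comment earlier in this review: define a single reusable marker so deprecated tests stay greppable and can be deleted in one sweep. The marker name is hypothetical; register it under pytest's `markers` config to avoid unknown-marker warnings:

```python
import pytest

# Hypothetical shared marker; one place to change the reason or lift the skip.
deprecated_api = pytest.mark.skip(reason="Deprecated API test")


@deprecated_api
def test_get_agent_legacy_behavior():
    """Legacy Agent API scenario, kept for reference but never executed."""
```

Grepping for `deprecated_api` then finds every candidate for removal once the migration lands.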
src/app/endpoints/streaming_query.py (2)
363-378: Improve error logging message in `stream_http_error`.

`logger.exception(error)` logs the error object as the message. While this works, it's more conventional to provide a descriptive message string:

```diff
 async def stream_http_error(error: AbstractErrorResponse) -> AsyncGenerator[str, None]:
     ...
     logger.error("Error while obtaining answer for user question")
-    logger.exception(error)
+    logger.exception("Streaming error: %s", type(error).__name__)
     yield format_stream_data({"event": "error", "data": {**error.detail.model_dump()}})
```
1110-1116: Commented-out parameters need clarification or cleanup.

The `documents` parameter has a TODO reference (LCORE-881), but the `toolgroups` parameter is also commented out without explanation. Either add a tracking reference or remove if intentionally disabled.

```diff
     response = await agent.create_turn(
         messages=[UserMessage(role="user", content=query_request.query).model_dump()],
         session_id=session_id,
-        # documents=documents,
+        # TODO: LCORE-881 - documents temporarily disabled
+        # documents=documents,
         stream=True,
-        # toolgroups=toolgroups,
+        # TODO: Add ticket reference for toolgroups disabled status
+        # toolgroups=toolgroups,
     )
```

src/app/endpoints/query_v2.py (1)
390-390: Consider reducing log verbosity for response object.

Logging the entire `response` object at INFO level could produce large log entries and impact performance. Consider using DEBUG level or logging only summary information:

```diff
-    logger.info("Response: %s", response)
+    logger.debug("Response object: %s", response)
```

docs/openapi.json (3)
3431-3442: Minor docs polish: consistent error examples and typos
- Ensure example labels/messages are consistent (e.g., “cannot be deleted” vs “can not be deleted” elsewhere).
- Typo “attatchment” appears in validation examples; prefer “attachment”.
No schema change needed; search/replace in example texts is sufficient.
Also applies to: 3451-3470, 3479-3492, 3497-3511, 3526-3539, 3542-3551
6990-7014: ToolCallSummary: consider making `args` optional-but-documented and align example usage

`args` is optional in the schema but not shown in examples anywhere else. Add a minimal example or clarify the default (empty object) to improve client codegen hints. Also ensure all endpoint examples use `name`/`args`/`id` consistently.
1517-1525: SSE schema representation — optional enhancement

OpenAPI can't fully model SSE; consider linking to an explicit event schema (start/token/turn_complete/end) in the description for stronger client expectations.
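To make those client expectations concrete, here is a sketch of the wire format such an event schema would describe. The framing helper mirrors the `format_stream_data` name seen in `streaming_query.py`, but its body and the exact event payloads are assumptions drawn from the snippets earlier in this review:

```python
import json


def format_stream_data(payload: dict) -> str:
    """Assumed SSE framing: one 'data:' line with a JSON body, blank-line terminated."""
    return f"data: {json.dumps(payload)}\n\n"


# The event sequence described in the review: start -> token(s) -> end.
events = [
    {"event": "start", "data": {"conversation_id": "0d21ba7..."}},
    {"event": "token", "data": {"token": "Hello"}},
    {"event": "end", "data": {"available_quotas": {}, "referenced_documents": []}},
]
wire = "".join(format_stream_data(e) for e in events)
```

Documenting this shape (even as prose linked from the OpenAPI description) lets generated clients parse each `data:` line as JSON and dispatch on the `event` field.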
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (40)
- docs/conversations_api.md (1 hunks)
- docs/openapi.json (11 hunks)
- pyproject.toml (2 hunks)
- run.yaml (1 hunks)
- src/app/endpoints/conversations_v2.py (0 hunks)
- src/app/endpoints/conversations_v3.py (1 hunks)
- src/app/endpoints/query.py (9 hunks)
- src/app/endpoints/query_v2.py (16 hunks)
- src/app/endpoints/streaming_query.py (19 hunks)
- src/app/endpoints/streaming_query_v2.py (14 hunks)
- src/app/routers.py (2 hunks)
- src/constants.py (1 hunks)
- src/metrics/utils.py (1 hunks)
- src/models/requests.py (1 hunks)
- src/models/responses.py (7 hunks)
- src/utils/endpoints.py (5 hunks)
- src/utils/llama_stack_version.py (2 hunks)
- src/utils/suid.py (1 hunks)
- src/utils/token_counter.py (1 hunks)
- src/utils/transcripts.py (1 hunks)
- src/utils/types.py (3 hunks)
- test.containerfile (1 hunks)
- tests/configuration/minimal-stack.yaml (1 hunks)
- tests/e2e/configs/run-ci.yaml (1 hunks)
- tests/e2e/features/conversations.feature (1 hunks)
- tests/e2e/features/info.feature (1 hunks)
- tests/integration/endpoints/test_query_v2_integration.py (8 hunks)
- tests/integration/test_openapi_json.py (4 hunks)
- tests/unit/app/endpoints/test_conversations_v2.py (1 hunks)
- tests/unit/app/endpoints/test_query.py (30 hunks)
- tests/unit/app/endpoints/test_query_v2.py (27 hunks)
- tests/unit/app/endpoints/test_streaming_query.py (28 hunks)
- tests/unit/app/endpoints/test_streaming_query_v2.py (13 hunks)
- tests/unit/app/test_routers.py (3 hunks)
- tests/unit/models/responses/test_error_responses.py (3 hunks)
- tests/unit/models/responses/test_query_response.py (3 hunks)
- tests/unit/models/responses/test_rag_chunk.py (1 hunks)
- tests/unit/models/responses/test_successful_responses.py (6 hunks)
- tests/unit/utils/test_endpoints.py (11 hunks)
- tests/unit/utils/test_transcripts.py (3 hunks)
💤 Files with no reviewable changes (1)
- src/app/endpoints/conversations_v2.py
🧰 Additional context used
📓 Path-based instructions (9)
tests/e2e/features/**/*.feature
📄 CodeRabbit inference engine (CLAUDE.md)
Use behave (BDD) framework with Gherkin feature files for end-to-end tests
Files:
- tests/e2e/features/conversations.feature
- tests/e2e/features/info.feature
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., `from auth import get_auth_dependency`)
All modules must start with descriptive docstrings explaining their purpose
Use `logger = logging.getLogger(__name__)` pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (`str | int`) and `Optional[Type]` or `Type | None`
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use `async def` for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: `Configuration` for config classes, `Error`/`Exception` for exceptions, `Resolver` for strategy patterns, `Interface` for abstract base classes
Abstract classes must use ABC with `@abstractmethod` decorators
Include complete type annotations for all class attributes in Python classes
Use `import logging` and module logger pattern with standard log levels: debug, info, warning, error
Files:
- src/utils/suid.py
- src/utils/llama_stack_version.py
- src/metrics/utils.py
- src/utils/transcripts.py
- src/utils/token_counter.py
- src/app/endpoints/conversations_v3.py
- src/constants.py
- src/utils/types.py
- src/utils/endpoints.py
- src/app/routers.py
- src/models/responses.py
- src/models/requests.py
- src/app/endpoints/query_v2.py
- src/app/endpoints/streaming_query.py
- src/app/endpoints/streaming_query_v2.py
- src/app/endpoints/query.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests; do not use unittest framework
Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage
Files:
- tests/unit/app/endpoints/test_conversations_v2.py
- tests/unit/utils/test_endpoints.py
- tests/unit/models/responses/test_successful_responses.py
- tests/unit/models/responses/test_error_responses.py
- tests/unit/app/endpoints/test_streaming_query.py
- tests/unit/models/responses/test_rag_chunk.py
- tests/unit/app/endpoints/test_query_v2.py
- tests/unit/app/endpoints/test_streaming_query_v2.py
- tests/unit/app/test_routers.py
- tests/integration/test_openapi_json.py
- tests/unit/utils/test_transcripts.py
- tests/integration/endpoints/test_query_v2_integration.py
- tests/unit/models/responses/test_query_response.py
- tests/unit/app/endpoints/test_query.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use `pytest-mock` with AsyncMock objects for mocking in tests
Files:
- tests/unit/app/endpoints/test_conversations_v2.py
- tests/unit/utils/test_endpoints.py
- tests/unit/models/responses/test_successful_responses.py
- tests/unit/models/responses/test_error_responses.py
- tests/unit/app/endpoints/test_streaming_query.py
- tests/unit/models/responses/test_rag_chunk.py
- tests/unit/app/endpoints/test_query_v2.py
- tests/unit/app/endpoints/test_streaming_query_v2.py
- tests/unit/app/test_routers.py
- tests/integration/test_openapi_json.py
- tests/unit/utils/test_transcripts.py
- tests/integration/endpoints/test_query_v2_integration.py
- tests/unit/models/responses/test_query_response.py
- tests/unit/app/endpoints/test_query.py
pyproject.toml
📄 CodeRabbit inference engine (CLAUDE.md)
pyproject.toml: Configure pylint with `source-roots = "src"`
Exclude `src/auth/k8s.py` from pyright type checking
Files:
pyproject.toml
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Files:
- src/app/endpoints/conversations_v3.py
- src/app/endpoints/query_v2.py
- src/app/endpoints/streaming_query.py
- src/app/endpoints/streaming_query_v2.py
- src/app/endpoints/query.py
src/**/{client,app/endpoints/**}.py
📄 CodeRabbit inference engine (CLAUDE.md)
Handle `APIConnectionError` from Llama Stack in integration code
Files:
- src/app/endpoints/conversations_v3.py
- src/app/endpoints/query_v2.py
- src/app/endpoints/streaming_query.py
- src/app/endpoints/streaming_query_v2.py
- src/app/endpoints/query.py
src/**/constants.py
📄 CodeRabbit inference engine (CLAUDE.md)
Define shared constants in central `constants.py` file with descriptive comments
Files:
src/constants.py
src/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/models/**/*.py: Use `@field_validator` and `@model_validator` for custom validation in Pydantic models
Pydantic configuration classes must extend `ConfigurationBase`; data models must extend `BaseModel`
Files:
- src/models/responses.py
- src/models/requests.py
🧠 Learnings (7)
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to pyproject.toml : Exclude `src/auth/k8s.py` from pyright type checking
Applied to files:
pyproject.toml
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to tests/**/*.py : Use `pytest-mock` with AsyncMock objects for mocking in tests
Applied to files:
- tests/unit/utils/test_endpoints.py
- tests/unit/app/endpoints/test_streaming_query.py
- tests/unit/app/endpoints/test_query.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to tests/{unit,integration}/**/*.py : Use pytest for all unit and integration tests; do not use unittest framework
Applied to files:
tests/unit/utils/test_endpoints.py
📚 Learning: 2025-08-06T06:02:21.060Z
Learnt from: eranco74
Repo: lightspeed-core/lightspeed-stack PR: 348
File: src/utils/endpoints.py:91-94
Timestamp: 2025-08-06T06:02:21.060Z
Learning: The direct assignment to `agent._agent_id` in `src/utils/endpoints.py` is a necessary workaround for the missing agent rehydration feature in the LLS client SDK. This allows preserving conversation IDs when handling existing agents.
Applied to files:
- tests/unit/utils/test_endpoints.py
- src/utils/endpoints.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/**/{client,app/endpoints/**}.py : Handle `APIConnectionError` from Llama Stack in integration code
Applied to files:
- src/metrics/utils.py
- src/app/endpoints/streaming_query.py
- tests/unit/app/endpoints/test_query.py
- src/app/endpoints/query.py
📚 Learning: 2025-08-18T10:56:55.349Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:0-0
Timestamp: 2025-08-18T10:56:55.349Z
Learning: The lightspeed-stack project intentionally uses a "generic image" approach, bundling many dependencies directly in the base runtime image to work for everyone, rather than using lean base images with optional dependency groups.
Applied to files:
test.containerfile
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/app/endpoints/**/*.py : Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Applied to files:
tests/unit/app/endpoints/test_query.py
🧬 Code graph analysis (16)
tests/unit/app/endpoints/test_conversations_v2.py (4)
- tests/unit/app/endpoints/test_tools.py (1): `mock_configuration` (28-47)
- src/cache/postgres_cache.py (1): `delete` (312-347)
- src/cache/sqlite_cache.py (1): `delete` (310-343)
- src/app/endpoints/conversations_v2.py (1): `delete_conversation_endpoint_handler` (145-168)
tests/unit/models/responses/test_successful_responses.py (2)
src/models/responses.py (5)
StreamingQueryResponse(454-519)openapi_response(47-60)openapi_response(458-480)openapi_response(858-882)openapi_response(1187-1210)src/utils/types.py (2)
ToolCallSummary(102-110)ToolResultSummary(113-124)
src/utils/endpoints.py (3)
src/models/responses.py (2)
model(1624-1628)NotFoundResponse(1431-1486)src/utils/types.py (2)
GraniteToolParser(60-99)get_parser(82-99)src/utils/suid.py (1)
get_suid(6-16)
tests/unit/models/responses/test_error_responses.py (1)
src/models/responses.py (7)
PromptTooLongResponse(1489-1518)AbstractErrorResponse(1157-1210)DetailModel(1150-1154)openapi_response(47-60)openapi_response(458-480)openapi_response(858-882)openapi_response(1187-1210)
tests/unit/app/endpoints/test_streaming_query.py (3)
src/models/responses.py (2)
ReferencedDocument(328-338)model(1624-1628)src/models/requests.py (1)
QueryRequest(73-233)src/app/endpoints/streaming_query.py (2)
streaming_query_endpoint_handler(961-995)stream_end_event(148-197)
tests/unit/models/responses/test_rag_chunk.py (1)
src/utils/types.py (1)
RAGChunk(127-132)
tests/unit/app/endpoints/test_query_v2.py (2)
src/app/endpoints/streaming_query.py (1)
retrieve_response(998-1119)src/app/endpoints/streaming_query_v2.py (1)
retrieve_response(364-461)
tests/unit/app/endpoints/test_streaming_query_v2.py (4)
src/app/endpoints/streaming_query_v2.py (1)
streaming_query_endpoint_handler_v2(327-361)tests/unit/app/endpoints/test_query.py (1)
dummy_request(59-68)tests/unit/app/endpoints/test_query_v2.py (1)
dummy_request(31-34)src/models/responses.py (1)
model(1624-1628)
src/app/routers.py (1)
tests/unit/app/test_routers.py (1)
include_router(36-51)
tests/unit/utils/test_transcripts.py (1)
src/utils/types.py (3)
ToolCallSummary(102-110)ToolResultSummary(113-124)TurnSummary(135-220)
src/models/responses.py (1)
src/utils/types.py (2)
ToolCallSummary(102-110)ToolResultSummary(113-124)
src/app/endpoints/query_v2.py (4)
src/utils/suid.py (2)
normalize_conversation_id(101-122)to_llama_stack_conversation_id(125-145)src/utils/responses.py (1)
extract_text_from_response_output_item(6-56)src/utils/types.py (2)
ToolCallSummary(102-110)ToolResultSummary(113-124)src/models/responses.py (2)
conversation(1383-1392)ReferencedDocument(328-338)
src/app/endpoints/streaming_query.py (2)
src/app/endpoints/query.py (1)
parse_referenced_documents(605-632)src/utils/types.py (1)
content_to_str(20-39)
tests/unit/models/responses/test_query_response.py (2)
src/models/responses.py (2)
QueryResponse(341-451)ReferencedDocument(328-338)src/utils/types.py (2)
ToolCallSummary(102-110)ToolResultSummary(113-124)
tests/unit/app/endpoints/test_query.py (2)
tests/unit/app/endpoints/test_streaming_query.py (2)
ToolExecutionStep(128-133)ToolResponse(136-141)src/utils/types.py (1)
TurnSummary(135-220)
src/app/endpoints/query.py (2)
src/models/responses.py (7)
PromptTooLongResponse(1489-1518)openapi_response(47-60)openapi_response(458-480)openapi_response(858-882)openapi_response(1187-1210)QuotaExceededResponse(1563-1643)model(1624-1628)src/utils/types.py (2)
TurnSummary(135-220)content_to_str(20-39)
🪛 markdownlint-cli2 (0.18.1)
docs/conversations_api.md
16-16: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
17-17: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
18-18: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
20-20: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
21-21: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
22-22: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
24-24: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
25-25: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
26-26: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
27-27: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
58-58: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
63-63: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
77-77: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
234-234: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
369-369: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
469-469: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-pr
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: e2e_tests (ci)
11d5075 to d4bda49 Compare
bfa516e to 0a41992 Compare
Implements the function parse_referenced_documents_from_responses_api, which inspects the Responses API output for:
- file_search_call objects (filename and attributes)
- annotations within message content (type, url, title)
- two types of annotations: url_citation and file_citation
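Under the assumption that output items arrive as plain dicts (the real SDK uses typed objects, and the `link` attribute key is a guess), the parsing logic from that commit can be sketched as:

```python
def parse_referenced_documents_from_responses_api(output: list[dict]) -> list[dict]:
    """Collect referenced documents from Responses API output items.

    Looks at file_search_call results (filename and attributes) and at
    url_citation / file_citation annotations inside message content.
    Field names follow the commit description; exact SDK shapes may differ.
    """
    documents: list[dict] = []
    for item in output:
        if item.get("type") == "file_search_call":
            for result in item.get("results", []):
                documents.append({
                    "doc_title": result.get("filename"),
                    "doc_url": result.get("attributes", {}).get("link"),
                })
        elif item.get("type") == "message":
            for content in item.get("content", []):
                for annotation in content.get("annotations", []):
                    if annotation.get("type") == "url_citation":
                        documents.append({
                            "doc_title": annotation.get("title"),
                            "doc_url": annotation.get("url"),
                        })
                    elif annotation.get("type") == "file_citation":
                        documents.append({
                            "doc_title": annotation.get("filename"),
                            "doc_url": None,
                        })
    return documents
```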
69c53b9 to ff31c6a Compare
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/openapi.json (1)
6364-6370: QueryResponse description drift (mentions rag_chunks; omits tool_results)
Public surface removed rag_chunks and added tool_results. Update description for accuracy.
- "description": "Model representing LLM response to a query.\n\nAttributes:\n    conversation_id: The optional conversation ID (UUID).\n    response: The response.\n    rag_chunks: List of RAG chunks used to generate the response.\n    referenced_documents: The URLs and titles for the documents used to generate the response.\n    tool_calls: List of tool calls made during response generation.\n    truncated: Whether conversation history was truncated.\n    input_tokens: Number of tokens sent to LLM.\n    output_tokens: Number of tokens received from LLM.\n    available_quotas: Quota available as measured by all configured quota limiters.",
+ "description": "Model representing LLM response to a query.\n\nAttributes:\n    conversation_id: The optional conversation ID (UUID).\n    response: The response.\n    referenced_documents: Documents referenced in generating the response.\n    tool_calls: List of tool calls made during response generation.\n    tool_results: List of tool call results.\n    truncated: Whether conversation history was truncated.\n    input_tokens: Number of tokens sent to LLM.\n    output_tokens: Number of tokens received from LLM.\n    available_quotas: Quota available as measured by all configured quota limiters.",
♻️ Duplicate comments (4)
src/models/responses.py (1)
425-426: Fix line length violation
The URL string exceeds the 100-character line limit (102 characters). This was flagged in a previous review but remains unresolved.
Apply this diff:
{
-    "doc_url": "https://docs.openshift.com/container-platform/4.15/"
-    "operators/understanding/olm/olm-understanding-olm.html",
+    "doc_url": (
+        "https://docs.openshift.com/container-platform/4.15/"
+        "operators/understanding/olm/olm-understanding-olm.html"
+    ),
     "doc_title": "Operator Lifecycle Manager concepts and resources",
},

docs/openapi.json (3)
1499-1524: Streaming: fix operationId and SSE example (null boolean, field placement)
- Keep v1-only operationId.
- SSE example: truncated must be boolean; available_quotas should be inside data; add empty tool arrays for parity.
- "operationId": "streaming_query_endpoint_handler_v2_v1_streaming_query_post",
+ "operationId": "streaming_query_endpoint_handler_v1_streaming_query_post",
@@
- "example": "data: {\"event\": \"start\", \"data\": {\"conversation_id\": \"123e4567-e89b-12d3-a456-426614174000\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 0, \"token\": \"No Violation\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 1, \"token\": \"\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 2, \"token\": \"Hello\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 3, \"token\": \"!\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 4, \"token\": \" How\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 5, \"token\": \" can\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 6, \"token\": \" I\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 7, \"token\": \" assist\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 8, \"token\": \" you\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 9, \"token\": \" today\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 10, \"token\": \"?\"}}\n\ndata: {\"event\": \"turn_complete\", \"data\": {\"token\": \"Hello! How can I assist you today?\"}}\n\ndata: {\"event\": \"end\", \"data\": {\"referenced_documents\": [], \"truncated\": null, \"input_tokens\": 11, \"output_tokens\": 19}, \"available_quotas\": {}}\n\n"
+ "example": "data: {\"event\": \"start\", \"data\": {\"conversation_id\": \"123e4567-e89b-12d3-a456-426614174000\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 0, \"token\": \"Hello\"}}\n\ndata: {\"event\": \"turn_complete\", \"data\": {\"token\": \"Hello! How can I assist you today?\"}}\n\ndata: {\"event\": \"end\", \"data\": {\"referenced_documents\": [], \"truncated\": false, \"input_tokens\": 11, \"output_tokens\": 19, \"available_quotas\": {}, \"tool_calls\": [], \"tool_results\": []}}\n\n"

Optional: declare standard SSE headers for client guidance.
  "text/event-stream": {
    "schema": { "type": "string", "format": "text/event-stream" },
+   "headers": {
+     "Cache-Control": { "schema": { "type": "string" }, "description": "no-cache" },
+     "Connection": { "schema": { "type": "string" }, "description": "keep-alive" }
+   },
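The corrected end-event shape from that suggestion can be sketched as a small serializer. Field names follow the review's example, not a guaranteed project API:

```python
import json


def format_end_event(
    referenced_documents: list,
    truncated: bool,
    input_tokens: int,
    output_tokens: int,
    available_quotas: dict,
) -> str:
    """Serialize the SSE 'end' event with quotas inside the data payload."""
    data = {
        "referenced_documents": referenced_documents,
        "truncated": truncated,  # boolean, never null
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "available_quotas": available_quotas,  # inside data, not a sibling key
        "tool_calls": [],
        "tool_results": [],
    }
    return f"data: {json.dumps({'event': 'end', 'data': data})}\n\n"
```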
1189-1194: operationId leaks impl version; keep v1-only name
Use a stable v1 identifier to avoid SDK/codegen churn.
- "operationId": "query_endpoint_handler_v2_v1_query_post",
+ "operationId": "query_endpoint_handler_v1_query_post",
7091-7115: ToolResultSummary.content lacks type; make it an explicit any-JSON union
Without a type, validation/codegen is ambiguous. Define a permissive union.
- "content": {
-   "title": "Content",
-   "description": "Content/result returned from the tool"
- },
+ "content": {
+   "anyOf": [
+     { "type": "string" },
+     { "type": "number" },
+     { "type": "boolean" },
+     { "type": "object", "additionalProperties": true },
+     { "type": "array", "items": {} },
+     { "type": "null" }
+   ],
+   "title": "Content",
+   "description": "Content/result returned from the tool"
+ },
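That permissive union amounts to "any JSON value". A hedged Python-side equivalent of the check, for illustration only:

```python
def is_json_value(value: object) -> bool:
    """Return True if value fits the permissive any-JSON union
    (string, number, boolean, object, array, or null) proposed for
    ToolResultSummary.content."""
    if value is None or isinstance(value, (str, int, float, bool)):
        return True
    if isinstance(value, dict):
        return all(
            isinstance(key, str) and is_json_value(item)
            for key, item in value.items()
        )
    if isinstance(value, list):
        return all(is_json_value(item) for item in value)
    return False
```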
🧹 Nitpick comments (1)
docs/openapi.json (1)
1516-1524: SSE content-type OK; add headers for correctness and interop
Advertise standard SSE headers to prevent intermediate caching/connection close surprises.
  "text/event-stream": {
    "schema": { "type": "string", "format": "text/event-stream" },
+   "headers": {
+     "Cache-Control": { "schema": { "type": "string" }, "description": "no-cache" },
+     "Connection": { "schema": { "type": "string" }, "description": "keep-alive" }
+   },
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
docs/openapi.json (17 hunks)
src/models/responses.py (6 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., `from auth import get_auth_dependency`)
All modules must start with descriptive docstrings explaining their purpose
Use the `logger = logging.getLogger(__name__)` pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (`str | int`) and `Optional[Type]` or `Type | None`
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use `async def` for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: `Configuration` for config classes, `Error`/`Exception` for exceptions, `Resolver` for strategy patterns, `Interface` for abstract base classes
Abstract classes must use ABC with `@abstractmethod` decorators
Include complete type annotations for all class attributes in Python classes
Use `import logging` and the module logger pattern with standard log levels: debug, info, warning, error
Files:
src/models/responses.py
src/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/models/**/*.py: Use `@field_validator` and `@model_validator` for custom validation in Pydantic models
Pydantic configuration classes must extend `ConfigurationBase`; data models must extend `BaseModel`
Files:
src/models/responses.py
🧬 Code graph analysis (1)
src/models/responses.py (1)
src/utils/types.py (2)
ToolCallSummary (102-110)
ToolResultSummary (113-124)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build-pr
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: E2E: server mode / azure
- GitHub Check: E2E: server mode / ci
tisnik
left a comment
LGTM, a lot of work done there
  from llama_stack.models.llama.llama3.tokenizer import Tokenizer
  from llama_stack_client import APIConnectionError, APIStatusError
- from llama_stack_client.types.agents.turn import Turn
+ from llama_stack_client.types.alpha.agents.turn import Turn
"nice" package name, sounds like it is super stable ;)
Description
Bump to llama stack version 0.3.0. Adds query and conversation endpoints using responses API that are compatible with OLS endpoints.
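The conversation-ID handling this PR adds (llama-stack 0.3.0 Conversations IDs look like `conv_<hex>`, while clients see a plain ID) can be sketched as follows. The helper names mirror `normalize_conversation_id` / `to_llama_stack_conversation_id` from `src/utils/suid.py`, but the bodies are illustrative assumptions, not the actual implementation:

```python
CONVERSATION_PREFIX = "conv_"


def normalize_conversation_id(llama_stack_id: str) -> str:
    """Strip the llama-stack 'conv_' prefix to get the client-facing ID."""
    if llama_stack_id.startswith(CONVERSATION_PREFIX):
        return llama_stack_id[len(CONVERSATION_PREFIX):]
    return llama_stack_id


def to_llama_stack_conversation_id(conversation_id: str) -> str:
    """Re-attach the 'conv_' prefix expected by the Conversations API."""
    if conversation_id.startswith(CONVERSATION_PREFIX):
        return conversation_id
    return CONVERSATION_PREFIX + conversation_id
```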
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Chores