LCORE-471: Remove data collector service #377
Conversation
Walkthrough

This PR removes the Data Collector feature across code, configuration, and docs; simplifies startup to always run the web service; updates configuration schemas; and expands the OpenAPI spec with a new conversations endpoint and database configuration (sqlite/postgres). Numerous tests and documentation sections related to the Data Collector are deleted.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI as CLI
    participant App as lightspeed_stack.py
    participant Web as Uvicorn/Web Service
    CLI->>App: main(args)
    App->>App: parse args (no --data-collector)
    alt dump configuration
        App-->>CLI: write config and exit
    else start service
        App->>Web: start_uvicorn()
        Web-->>CLI: service running
    end
```
```mermaid
sequenceDiagram
    participant Client as Client
    participant API as Web Service
    participant DB as Database (sqlite/postgres)
    Client->>API: GET /v1/conversations
    API->>DB: Query user conversations
    DB-->>API: Conversations list
    API-->>Client: 200 ConversationsListResponse
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~40 minutes
Actionable comments posted: 4
🔭 Outside diff range comments (7)
docs/testing.md (1)
68-71: Align integration test coverage threshold in docs with Makefile

The docs in docs/testing.md (lines 68–71) state that the integration-test coverage threshold is 60%, but the Makefile actually sets --cov-fail-under=10 for test-integration. Update one or the other to keep them in sync.

• File: docs/testing.md (around line 69)
• Makefile (line 21) sets the integration threshold to 10%

Proposed doc fix:

```diff
-Currently code coverage threshold for integration tests is set to 60%. This value is specified directly in Makefile, because the coverage threshold is different from threshold required for unit tests.
+Currently code coverage threshold for integration tests is set to 10%. This value is specified directly in Makefile, because the coverage threshold is different from threshold required for unit tests.
```

.github/workflows/e2e_tests.yaml (1)
85-93: lightspeed-stack.yaml keys inconsistent with repository/docs (feedback_disabled vs feedback_enabled)

Other configs/docs use feedback_enabled/transcripts_enabled. Here the workflow writes feedback_disabled/transcripts_disabled, which may not be recognized by the config model. Apply this diff to align with the rest of the repo:

```diff
 user_data_collection:
-  feedback_disabled: false
-  feedback_storage: "/tmp/data/feedback"
-  transcripts_disabled: false
-  transcripts_storage: "/tmp/data/transcripts"
+  feedback_enabled: true
+  feedback_storage: "/tmp/data/feedback"
+  transcripts_enabled: true
+  transcripts_storage: "/tmp/data/transcripts"
 authentication:
   module: "noop"
```

src/lightspeed_stack.py (2)
61-64: Avoid logging full configuration objects (risk of leaking secrets like API keys)

logger.info("Configuration: %s", configuration.configuration) likely prints sensitive fields (e.g., llama_stack.api_key). Reduce detail or redact. Apply a minimal hardening:

```diff
-    logger.info("Configuration: %s", configuration.configuration)
-    logger.info(
-        "Llama stack configuration: %s", configuration.llama_stack_configuration
-    )
+    # Avoid dumping full configuration at INFO level to prevent secret leakage
+    logger.debug("Configuration loaded.")
+    logger.debug("Llama stack configuration loaded.")
```

If you have a redaction-safe serializer, prefer logging that instead.
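A redaction-safe serializer along these lines could look as follows (a hypothetical helper, not part of this PR; the key list and function name are illustrative):

```python
from typing import Any

# Key names treated as sensitive; extend to match your config surface.
SENSITIVE_KEYS = {"api_key", "password", "token", "secret"}

def redact(obj: Any) -> Any:
    """Recursively replace values of sensitive-looking keys before logging."""
    if isinstance(obj, dict):
        return {
            k: "***REDACTED***" if k.lower() in SENSITIVE_KEYS else redact(v)
            for k, v in obj.items()
        }
    if isinstance(obj, list):
        return [redact(item) for item in obj]
    return obj

config = {"llama_stack": {"url": "http://localhost:8321", "api_key": "sk-123"}}
print(redact(config))
# {'llama_stack': {'url': 'http://localhost:8321', 'api_key': '***REDACTED***'}}
```

The original dict is left untouched; only the logged copy is scrubbed.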
66-68: Wrap client initialization with error handling

Network or config errors will currently bubble up unhandled. Fail fast with a clear message and non-zero exit.

```diff
-    asyncio.run(
-        AsyncLlamaStackClientHolder().load(configuration.configuration.llama_stack)
-    )
+    try:
+        asyncio.run(
+            AsyncLlamaStackClientHolder().load(configuration.configuration.llama_stack)
+        )
+    except Exception as exc:
+        logger.error("Failed to initialize Llama Stack client: %s", exc, exc_info=True)
+        raise SystemExit(1)
```

src/models/config.py (1)
241-246: Avoid assert for runtime validation in jwk_configuration property

Using assert may raise AssertionError and get optimized away. Raise a ValueError to keep error semantics consistent.

```diff
 def jwk_configuration(self) -> JwkConfiguration:
     """Return JWK configuration if the module is JWK token."""
     if self.module != constants.AUTH_MOD_JWK_TOKEN:
         raise ValueError(
             "JWK configuration is only available for JWK token authentication module"
         )
-    assert self.jwk_config is not None, "JWK configuration should not be None"
-    return self.jwk_config
+    if self.jwk_config is None:
+        raise ValueError("JWK configuration must be provided for JWK token module")
+    return self.jwk_config
```

docs/config.puml (1)
12-22: Add DatabaseConfiguration and DB classes to diagram (docs out of sync with code and OpenAPI)Configuration now includes a database field with SQLite/PostgreSQL options. The diagram should reflect:
- Class DatabaseConfiguration with sqlite/postgres.
- Classes SQLiteDatabaseConfiguration and PostgreSQLDatabaseConfiguration.
- Composition relationships.
Proposed additions:
```diff
 class "Configuration" as src.models.config.Configuration {
   authentication : Optional[AuthenticationConfiguration]
   customization : Optional[Customization]
+  database
   inference
   llama_stack
   mcp_servers : list[ModelContextProtocolServer]
   name : str
   service
   user_data_collection
   dump(filename: str) -> None
 }
@@
+class "DatabaseConfiguration" as src.models.config.DatabaseConfiguration {
+  sqlite : Optional[SQLiteDatabaseConfiguration]
+  postgres : Optional[PostgreSQLDatabaseConfiguration]
+  check_database_configuration() -> Self
+  db_type
+  config
+}
+class "SQLiteDatabaseConfiguration" as src.models.config.SQLiteDatabaseConfiguration {
+  db_path : str
+}
+class "PostgreSQLDatabaseConfiguration" as src.models.config.PostgreSQLDatabaseConfiguration {
+  host : str
+  port : int
+  db : str
+  user : str
+  password : str
+  namespace : Optional[str]
+  ssl_mode : str
+  gss_encmode : str
+  ca_cert_path : Optional[FilePath]
+  check_postgres_configuration() -> Self
+}
@@
+src.models.config.DatabaseConfiguration --* src.models.config.Configuration : database
+src.models.config.SQLiteDatabaseConfiguration --* src.models.config.DatabaseConfiguration : sqlite
+src.models.config.PostgreSQLDatabaseConfiguration --* src.models.config.DatabaseConfiguration : postgres
```

Also applies to: 71-77, 79-85
docs/openapi.json (1)
41-55: Invalid OpenAPI response format – extraneous name/version fields

The spec validator fails because only description, headers, content, and links are allowed at the response object level. Move your sample fields under the media type's example (or examples) property. Please update docs/openapi.json at the following locations:
- Lines 41–55 (/v1/info GET response)
- Lines 67–99
- Lines 124–137
- Lines 235–275
- Lines 388–405
- Lines 446–463
- Lines 518–521
- Lines 555–577
- Lines 599–609
- Lines 665–674
Apply a patch similar to:
```diff
 "responses": {
   "200": {
     "description": "Successful Response",
     "content": {
-      "application/json": {
-        "schema": { "$ref": "#/components/schemas/InfoResponse" },
-        "name": "Service name",
-        "version": "Service version"
-      }
+      "application/json": {
+        "schema": { "$ref": "#/components/schemas/InfoResponse" },
+        "example": {
+          "name": "Service name",
+          "version": "Service version"
+        }
+      }
     }
   }
 }
```

After making these changes, rerun your OpenAPI validation to confirm there are no more unevaluated-properties errors.
🧹 Nitpick comments (11)
README.md (1)
45-45: Add a short migration note for Data Collector

Consider adding a "Where did Data Collector go?" note linking to the new repo and minimal migration steps. This will help users who have old configs/scripts.
.github/workflows/e2e_tests.yaml (1)
4-4: Security: pull_request_target runs with secrets

This workflow uses pull_request_target and provides OPENAI_API_KEY as a job-level secret. Even with persist-credentials off, secrets are accessible to steps that operate on PR-provided code/content (e.g., docker compose). Consider moving to pull_request with a hardened approach (or workflow_run from a trusted branch) and using a scoped token or ephemeral secret provisioning.
docs/getting_started.md (1)
267-270: Auth block addition LGTM; add note on precedence with service.auth_enabled

Good addition. Please add a one-liner clarifying how service.auth_enabled interacts with the new top-level authentication (which one takes precedence), to avoid confusion.

lightspeed-stack.yaml (1)
23-26: Top-level authentication block added; clarify/deprecate overlapping auth flags

This is good. Given service.auth_enabled still exists, please document or enforce precedence (e.g., deprecate service.auth_enabled in favor of authentication.module) to avoid dual-source-of-truth confusion.

docs/deployment_guide.md (3)
572-575: Auth config snippet updated — add brief usage note

Consider adding one sentence under this snippet explaining what "noop" does and how to switch providers.
885-887: Auth config snippet updated — keep examples consistent

Looks good. Repeat a short note about auth provider selection here for consistency.
1652-1654: Auth block addition LGTM (container flow)

Consistent with other examples. A single centralized "Authentication" subsection referenced by these snippets could reduce duplication.
src/lightspeed_stack.py (1)
56-59: Wire the --verbose flag to logging level

Verbose currently doesn't affect log level. Set root and handler levels to DEBUG when requested.

```diff
     parser = create_argument_parser()
     args = parser.parse_args()
+    if args.verbose:
+        logging.getLogger().setLevel(logging.DEBUG)
+        for h in logging.getLogger().handlers:
+            h.setLevel(logging.DEBUG)
+
     configuration.load_configuration(args.config_file)
```

src/models/config.py (3)
49-55: Tighten PostgreSQL port validation messages

The logic is correct, but the messages can be clearer and aligned with the checks.

```diff
-        if self.port <= 0:
-            raise ValueError("Port value should not be negative")
-        if self.port > 65535:
-            raise ValueError("Port value should be less than 65536")
+        if self.port <= 0:
+            raise ValueError("Port must be greater than 0")
+        if self.port > 65535:
+            raise ValueError("Port must be <= 65535")
```
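The suggested check can be exercised in isolation (a standalone sketch; `validate_port` is an illustrative name, not a function from this PR):

```python
def validate_port(port: int) -> int:
    """Validate a TCP port number, raising ValueError with a precise message."""
    if port <= 0:
        raise ValueError("Port must be greater than 0")
    if port > 65535:
        raise ValueError("Port must be <= 65535")
    return port

print(validate_port(5432))  # 5432
```

Both boundary violations now produce messages that name the actual limit being enforced.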
73-75: Make default SQLite path portable across platforms

Hardcoding /tmp may not be portable (e.g., Windows) and could be restricted in some environments. Use tempfile.gettempdir().

```diff
+import tempfile
@@
-        sqlite_file_name = "/tmp/lightspeed-stack.db"
-        self.sqlite = SQLiteDatabaseConfiguration(db_path=sqlite_file_name)
+        tmp_dir = Path(tempfile.gettempdir())
+        sqlite_file_name = tmp_dir / "lightspeed-stack.db"
+        self.sqlite = SQLiteDatabaseConfiguration(db_path=str(sqlite_file_name))
```
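A quick standalone illustration of the suggested approach (not code from the PR itself):

```python
import tempfile
from pathlib import Path

# Build the default DB path from the platform's temp directory rather than
# hardcoding /tmp, which does not exist on Windows.
db_path = Path(tempfile.gettempdir()) / "lightspeed-stack.db"
print(db_path.name)  # lightspeed-stack.db
```

On Linux this typically resolves to /tmp/lightspeed-stack.db, so existing behavior is preserved there while other platforms get a valid location.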
152-160: Nit: unify "Llama" casing in error messages

Maintain consistent casing ("Llama") across messages.

```diff
-            "LLama stack URL is not specified and library client mode is not specified"
+            "Llama stack URL is not specified and library client mode is not specified"
@@
-            "LLama stack URL is not specified and library client mode is not enabled"
+            "Llama stack URL is not specified and library client mode is not enabled"
@@
-            "LLama stack library client mode is enabled but a configuration file path is not specified"  # noqa: E501
+            "Llama stack library client mode is enabled but a configuration file path is not specified"  # noqa: E501
```

Also applies to: 166-168
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
- .github/workflows/e2e_tests.yaml (1 hunks)
- Makefile (1 hunks)
- README.md (1 hunks)
- docs/config.puml (2 hunks)
- docs/deployment_guide.md (4 hunks)
- docs/getting_started.md (1 hunks)
- docs/openapi.json (13 hunks)
- docs/openapi.md (0 hunks)
- docs/output.md (0 hunks)
- docs/testing.md (1 hunks)
- lightspeed-stack.yaml (1 hunks)
- src/constants.py (0 hunks)
- src/lightspeed_stack.py (1 hunks)
- src/models/config.py (1 hunks)
- src/runners/data_collector.py (0 hunks)
- src/services/data_collector.py (0 hunks)
- tests/unit/models/test_config.py (0 hunks)
- tests/unit/runners/test_data_collector_runner.py (0 hunks)
- tests/unit/services/test_data_collector.py (0 hunks)
💤 Files with no reviewable changes (8)
- src/runners/data_collector.py
- tests/unit/runners/test_data_collector_runner.py
- src/constants.py
- docs/openapi.md
- src/services/data_collector.py
- docs/output.md
- tests/unit/models/test_config.py
- tests/unit/services/test_data_collector.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-pr
- GitHub Check: e2e_tests
🔇 Additional comments (8)
docs/testing.md (1)
108-110: Unit test tree update for runners looks correct

Removal of the data collector runner tests is reflected here. Matches the PR objective.
Makefile (1)
11-11: Removed run-data-collector target

Target removal aligns with the feature deprecation. README Make targets are consistent.
docs/deployment_guide.md (1)
1497-1499: Auth block addition LGTM

Matches the new configuration surface.
src/lightspeed_stack.py (1)
50-50: All data-collector references successfully removed

Verified across the repository with:

```shell
rg -n --hidden --glob '!*venv*' 'data_collector|start_data_collector|--data-collector'
```

No matches found; no further action required.
src/models/config.py (2)
42-44: PostgreSQL default constants verified

The constants POSTGRES_DEFAULT_SSL_MODE (line 53) and POSTGRES_DEFAULT_GSS_ENCMODE (line 55) are present in src/constants.py. No further action required.
1-305: No Data Collector references detected
A repository-wide search for common Data Collector identifiers (data_collector, DataCollector, ingress_server_url, ingress_server_auth_token, ingress_content_service_name, collection_interval, cleanup_after_send, connection_timeout, DATA_COLLECTOR) returned zero matches. All references have been removed.

docs/config.puml (1)
29-34: LGTM: InferenceConfiguration method surfaced

The addition of check_default_model_and_provider in the diagram matches the model. No issues.
docs/openapi.json (1)
825-832: LGTM: Configuration now includes database with default SQLite

This aligns with code defaults and the DatabaseConfiguration schema.
```json
"/v1/conversations": {
  "get": {
    "tags": [
      "conversations"
    ],
    "summary": "Get Conversations List Endpoint Handler",
    "description": "Handle request to retrieve all conversations for the authenticated user.",
    "operationId": "get_conversations_list_endpoint_handler_v1_conversations_get",
    "responses": {
      "200": {
        "description": "Successful Response",
        "content": {
          "application/json": {
            "schema": {
              "$ref": "#/components/schemas/ConversationsListResponse"
            }
          }
        },
        "conversations": [
          {
            "conversation_id": "123e4567-e89b-12d3-a456-426614174000",
            "created_at": "2024-01-01T00:00:00Z",
            "last_message_at": "2024-01-01T00:05:00Z",
            "last_used_model": "gemini/gemini-1.5-flash",
            "last_used_provider": "gemini",
            "message_count": 5
          },
          {
            "conversation_id": "456e7890-e12b-34d5-a678-901234567890",
            "created_at": "2024-01-01T01:00:00Z",
            "last_message_at": "2024-01-01T01:02:00Z",
            "last_used_model": "gemini/gemini-2.0-flash",
            "last_used_provider": "gemini",
            "message_count": 2
          }
        ]
      },
      "503": {
        "description": "Service Unavailable",
        "detail": {
          "response": "Unable to connect to Llama Stack",
          "cause": "Connection error."
        }
      }
    }
  }
},
```
🛠️ Refactor suggestion
Add 400/403 responses to /v1/conversations (endpoint states “authenticated user”)
The description says it returns conversations for the authenticated user, but only 200/503 are documented. Include 400 (missing/invalid credentials) and 403 (unauthorized) like other endpoints.
```diff
 "/v1/conversations": {
   "get": {
     "tags": ["conversations"],
     "summary": "Get Conversations List Endpoint Handler",
     "description": "Handle request to retrieve all conversations for the authenticated user.",
     "operationId": "get_conversations_list_endpoint_handler_v1_conversations_get",
     "responses": {
       "200": { ... },
+      "400": {
+        "description": "Missing or invalid credentials provided by client",
+        "content": {
+          "application/json": {
+            "schema": { "$ref": "#/components/schemas/UnauthorizedResponse" }
+          }
+        }
+      },
+      "403": {
+        "description": "User is not authorized",
+        "content": {
+          "application/json": {
+            "schema": { "$ref": "#/components/schemas/ForbiddenResponse" }
+          }
+        }
+      },
       "503": { ... }
     }
   }
 },
```
````json
"ConversationDetails": {
  "properties": {
    "conversation_id": {
      "type": "string",
      "title": "Conversation Id"
    },
    "created_at": {
      "anyOf": [
        { "type": "string" },
        { "type": "null" }
      ],
      "title": "Created At"
    },
    "last_message_at": {
      "anyOf": [
        { "type": "string" },
        { "type": "null" }
      ],
      "title": "Last Message At"
    },
    "message_count": {
      "anyOf": [
        { "type": "integer" },
        { "type": "null" }
      ],
      "title": "Message Count"
    },
    "last_used_model": {
      "anyOf": [
        { "type": "string" },
        { "type": "null" }
      ],
      "title": "Last Used Model"
    },
    "last_used_provider": {
      "anyOf": [
        { "type": "string" },
        { "type": "null" }
      ],
      "title": "Last Used Provider"
    }
  },
  "type": "object",
  "required": [
    "conversation_id"
  ],
  "title": "ConversationDetails",
  "description": "Model representing the details of a user conversation.\n\nAttributes:\n    conversation_id: The conversation ID (UUID).\n    created_at: When the conversation was created.\n    last_message_at: When the last message was sent.\n    message_count: Number of user messages in the conversation.\n    model: The model used for the conversation.\n\nExample:\n    ```python\n    conversation = ConversationSummary(\n        conversation_id=\"123e4567-e89b-12d3-a456-426614174000\"\n        created_at=\"2024-01-01T00:00:00Z\",\n        last_message_at=\"2024-01-01T00:05:00Z\",\n        message_count=5,\n        model=\"gemini/gemini-2.0-flash\"\n    )\n    ```"
},
````
Fix ConversationDetails description: use last_used_model/last_used_provider; update examples
The schema fields are last_used_model and last_used_provider, but the description refers to model and examples omit last_used_provider.
````diff
 "ConversationDetails": {
   "properties": { ... },
   "type": "object",
   "required": ["conversation_id"],
   "title": "ConversationDetails",
-  "description": "Model representing the details of a user conversation.\n\nAttributes:\n    conversation_id: The conversation ID (UUID).\n    created_at: When the conversation was created.\n    last_message_at: When the last message was sent.\n    message_count: Number of user messages in the conversation.\n    model: The model used for the conversation.\n\nExample:\n    ```python\n    conversation = ConversationSummary(\n        conversation_id=\"123e4567-e89b-12d3-a456-426614174000\"\n        created_at=\"2024-01-01T00:00:00Z\",\n        last_message_at=\"2024-01-01T00:05:00Z\",\n        message_count=5,\n        model=\"gemini/gemini-2.0-flash\"\n    )\n    ```"
+  "description": "Model representing the details of a user conversation.\n\nAttributes:\n    conversation_id: The conversation ID (UUID).\n    created_at: When the conversation was created.\n    last_message_at: When the last message was sent.\n    message_count: Number of user messages in the conversation.\n    last_used_model: The last model used in the conversation.\n    last_used_provider: The provider of the last model.\n\nExample:\n    ```python\n    conversation = ConversationDetails(\n        conversation_id=\"123e4567-e89b-12d3-a456-426614174000\",\n        created_at=\"2024-01-01T00:00:00Z\",\n        last_message_at=\"2024-01-01T00:05:00Z\",\n        message_count=5,\n        last_used_model=\"gemini/gemini-2.0-flash\",\n        last_used_provider=\"gemini\"\n    )\n    ```"
 },
````
````json
"ConversationsListResponse": {
  "properties": {
    "conversations": {
      "items": {
        "$ref": "#/components/schemas/ConversationDetails"
      },
      "type": "array",
      "title": "Conversations"
    }
  },
  "type": "object",
  "required": [
    "conversations"
  ],
  "title": "ConversationsListResponse",
  "description": "Model representing a response for listing conversations of a user.\n\nAttributes:\n    conversations: List of conversation details associated with the user.\n\nExample:\n    ```python\n    conversations_list = ConversationsListResponse(\n        conversations=[\n            ConversationDetails(\n                conversation_id=\"123e4567-e89b-12d3-a456-426614174000\",\n                created_at=\"2024-01-01T00:00:00Z\",\n                last_message_at=\"2024-01-01T00:05:00Z\",\n                message_count=5,\n                model=\"gemini/gemini-2.0-flash\"\n            ),\n            ConversationDetails(\n                conversation_id=\"456e7890-e12b-34d5-a678-901234567890\"\n                created_at=\"2024-01-01T01:00:00Z\",\n                message_count=2,\n                model=\"gemini/gemini-2.5-flash\"\n            )\n        ]\n    )\n    ```",
  "examples": [
    {
      "conversations": [
        {
          "conversation_id": "123e4567-e89b-12d3-a456-426614174000",
          "created_at": "2024-01-01T00:00:00Z",
          "last_message_at": "2024-01-01T00:05:00Z",
          "message_count": 5,
          "model": "gemini/gemini-2.0-flash"
        },
        {
          "conversation_id": "456e7890-e12b-34d5-a678-901234567890",
          "created_at": "2024-01-01T01:00:00Z",
          "message_count": 2,
          "model": "gemini/gemini-2.5-flash"
        }
      ]
    }
  ]
},
````
Fix ConversationsListResponse examples: replace model with last_used_model and include last_used_provider
Examples use a non-existent model field.
````diff
 "ConversationsListResponse": {
   "properties": { ... },
   "type": "object",
   "required": ["conversations"],
   "title": "ConversationsListResponse",
-  "description": "Model representing a response for listing conversations of a user.\n\nAttributes:\n    conversations: List of conversation details associated with the user.\n\nExample:\n    ```python\n    conversations_list = ConversationsListResponse(\n        conversations=[\n            ConversationDetails(\n                conversation_id=\"123e4567-e89b-12d3-a456-426614174000\",\n                created_at=\"2024-01-01T00:00:00Z\",\n                last_message_at=\"2024-01-01T00:05:00Z\",\n                message_count=5,\n                model=\"gemini/gemini-2.0-flash\"\n            ),\n            ConversationDetails(\n                conversation_id=\"456e7890-e12b-34d5-a678-901234567890\"\n                created_at=\"2024-01-01T01:00:00Z\",\n                message_count=2,\n                model=\"gemini/gemini-2.5-flash\"\n            )\n        ]\n    )\n    ```",
+  "description": "Model representing a response for listing conversations of a user.\n\nAttributes:\n    conversations: List of conversation details associated with the user.\n\nExample:\n    ```python\n    conversations_list = ConversationsListResponse(\n        conversations=[\n            ConversationDetails(\n                conversation_id=\"123e4567-e89b-12d3-a456-426614174000\",\n                created_at=\"2024-01-01T00:00:00Z\",\n                last_message_at=\"2024-01-01T00:05:00Z\",\n                message_count=5,\n                last_used_model=\"gemini/gemini-2.0-flash\",\n                last_used_provider=\"gemini\"\n            ),\n            ConversationDetails(\n                conversation_id=\"456e7890-e12b-34d5-a678-901234567890\",\n                created_at=\"2024-01-01T01:00:00Z\",\n                message_count=2,\n                last_used_model=\"gemini/gemini-2.5-flash\",\n                last_used_provider=\"gemini\"\n            )\n        ]\n    )\n    ```",
   "examples": [
     {
       "conversations": [
         {
           "conversation_id": "123e4567-e89b-12d3-a456-426614174000",
           "created_at": "2024-01-01T00:00:00Z",
           "last_message_at": "2024-01-01T00:05:00Z",
-          "message_count": 5,
-          "model": "gemini/gemini-2.0-flash"
+          "message_count": 5,
+          "last_used_model": "gemini/gemini-2.0-flash",
+          "last_used_provider": "gemini"
         },
         {
           "conversation_id": "456e7890-e12b-34d5-a678-901234567890",
           "created_at": "2024-01-01T01:00:00Z",
-          "message_count": 2,
-          "model": "gemini/gemini-2.5-flash"
+          "message_count": 2,
+          "last_used_model": "gemini/gemini-2.5-flash",
+          "last_used_provider": "gemini"
         }
       ]
     }
   ]
 }
````
```diff
 from typing import Optional

-from pydantic import BaseModel, model_validator, FilePath, AnyHttpUrl, PositiveInt
+from pydantic import BaseModel, model_validator, FilePath, AnyHttpUrl
```
🛠️ Refactor suggestion
Use Field(default_factory=...) to avoid shared default instances and mutable defaults
Several fields instantiate BaseModel instances or lists as defaults. In pydantic, prefer Field(default_factory=...) to prevent shared state across instances.
Apply:
```diff
-from pydantic import BaseModel, model_validator, FilePath, AnyHttpUrl
+from pydantic import BaseModel, model_validator, FilePath, AnyHttpUrl, Field
@@
-    tls_config: TLSConfiguration = TLSConfiguration()
+    tls_config: TLSConfiguration = Field(default_factory=TLSConfiguration)
@@
 class JwkConfiguration(BaseModel):
@@
-    jwt_configuration: JwtConfiguration = JwtConfiguration()
+    jwt_configuration: JwtConfiguration = Field(default_factory=JwtConfiguration)
@@
 class Configuration(BaseModel):
@@
-    database: DatabaseConfiguration = DatabaseConfiguration()
-    mcp_servers: list[ModelContextProtocolServer] = []
-    authentication: Optional[AuthenticationConfiguration] = (
-        AuthenticationConfiguration()
-    )
+    database: DatabaseConfiguration = Field(default_factory=DatabaseConfiguration)
+    mcp_servers: list[ModelContextProtocolServer] = Field(default_factory=list)
+    authentication: Optional[AuthenticationConfiguration] = Field(
+        default_factory=AuthenticationConfiguration
+    )
     customization: Optional[Customization] = None
-    inference: InferenceConfiguration = InferenceConfiguration()
+    inference: InferenceConfiguration = Field(default_factory=InferenceConfiguration)
```
+ inference: InferenceConfiguration = Field(default_factory=InferenceConfiguration)Also applies to: 108-109, 199-208, 289-300
There is a bigger change in
Force-pushed from f961d3f to a9c1097.
Description
Complete removal of the data collector service and all its references from the codebase. The data collector functionality has been separated into its own service/repository.
Type of change
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Chores
Documentation
Tests