diff --git a/README.md b/README.md
index 89c0784ec..779c4ac31 100644
--- a/README.md
+++ b/README.md
@@ -80,6 +80,7 @@ The service includes comprehensive user data collection capabilities for various
 * [REST API](#rest-api)
 * [Sequence diagrams](#sequence-diagrams)
 * [Query endpoint REST API handler](#query-endpoint-rest-api-handler)
+* [Streaming query endpoint REST API handler](#streaming-query-endpoint-rest-api-handler)
@@ -900,3 +901,7 @@ For complete integration setup, deployment options, and configuration details, s
 ### Query endpoint REST API handler
 
 ![Query endpoint](docs/query_endpoint.svg)
+
+### Streaming query endpoint REST API handler
+
+![Streaming query endpoint](docs/streaming_query_endpoint.svg)
diff --git a/docs/streaming_query_endpoint.puml b/docs/streaming_query_endpoint.puml
new file mode 100644
index 000000000..316895def
--- /dev/null
+++ b/docs/streaming_query_endpoint.puml
@@ -0,0 +1,43 @@
+@startuml
+
+participant Client
+participant Endpoint as "Streaming query endpoint handler"
+participant Auth
+participant LlamaStack as "Llama Stack Client"
+participant EventHandler as "Stream build event"
+participant SSE as "SSE Response Stream"
+
+Client->>Endpoint: HTTP POST /stream_query
+Endpoint->>Auth: Validate auth, user, conversation access
+Auth-->>Endpoint: Access granted
+Endpoint->>LlamaStack: Call retrieve_response(model, query)
+LlamaStack-->>Endpoint: AsyncIterator[AgentTurnResponseStreamChunk]
+
+Endpoint->>SSE: stream_start_event(conversation_id)
+SSE-->>Client: SSE: start
+
+loop For each chunk from LlamaStack
+    Endpoint->>EventHandler: stream_build_event(chunk, chunk_id, metadata)
+    alt Chunk Type: turn_start
+        EventHandler->>SSE: emit turn_start event
+    else Chunk Type: inference
+        EventHandler->>SSE: emit inference (token) event
+    else Chunk Type: tool_execution
+        EventHandler->>SSE: emit tool_call + tool_result events
+    else Chunk Type: shield
+        EventHandler->>SSE: emit shield validation event
+    else Chunk Type: turn_complete
+        EventHandler->>SSE: emit turn_complete event
+    else Error
+        EventHandler->>SSE: emit error event
+    end
+    SSE-->>Client: SSE event(s)
+end
+
+Endpoint->>SSE: stream_end_event(metadata, summary, token_usage)
+SSE-->>Client: SSE: end (with metadata)
+
+Endpoint->>Endpoint: Conditionally persist transcript & cache
+Endpoint-->>Client: Close stream
+
+@enduml
diff --git a/docs/streaming_query_endpoint.svg b/docs/streaming_query_endpoint.svg
new file mode 100644
index 000000000..ed35f7439
--- /dev/null
+++ b/docs/streaming_query_endpoint.svg
@@ -0,0 +1,135 @@
+[SVG markup not reproduced: 135 lines of rendered diagram output generated from docs/streaming_query_endpoint.puml; its text labels duplicate the PlantUML source above.]
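The per-chunk dispatch inside the diagram's loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual handler: the dict-shaped chunks, the simplified `stream_build_event` signature, and the chunk-type-to-event mapping are all hypothetical (the real handler consumes `AgentTurnResponseStreamChunk` objects from the Llama Stack client and carries per-turn metadata).

```python
import json

# Chunk-type -> SSE event name, mirroring the alt branches in the diagram.
# The branch names come from the diagram; the mapping itself is illustrative.
_EVENT_NAMES = {
    "turn_start": "turn_start",
    "inference": "token",
    "tool_execution": "tool_call",
    "shield": "shield",
    "turn_complete": "turn_complete",
}


def stream_build_event(chunk: dict, chunk_id: int) -> str:
    """Format one chunk as a Server-Sent Event frame (sketch only).

    Unknown chunk types fall through to the diagram's Error branch.
    """
    event = _EVENT_NAMES.get(chunk.get("event_type"), "error")
    payload = {"id": chunk_id, "data": chunk.get("data")}
    # An SSE frame is an "event:" line, a "data:" line, and a blank line.
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"
```

In an ASGI framework the endpoint would `yield` these frames from an async generator, bracketed by the `start` and `end` events shown in the diagram.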