UN-2807 [FEAT] Add user_data variable support for Prompt Studio by jaags-dev · Pull Request #1544 · Zipstack/unstract

jaags-dev · 2025-09-19T07:17:10Z

What

Add user_data variable support for Prompt Studio to enable JSON data usage in prompts with dot notation
Implement user_data field validation and processing for API deployments
Support nested JSON object access in variable replacement (e.g., {{user_data.name}}, {{user_data.address.city}})
Enable user_data variables in both direct API deployments and exported Prompt Studio tools

Why

Users need ability to pass dynamic JSON data to prompts during API deployment execution
Current variable system only supports static and dynamic variables, missing support for user-provided JSON data
Prompt Studio tools exported as containers need access to the same user_data functionality as direct API calls
Enhances flexibility for users to create more dynamic and data-driven prompt templates

How

Enhanced ExecutionRequestSerializer with user_data JSONField including JSON validation
Added USER_DATA variable type and regex pattern matching in prompt service constants
Implemented dot notation parsing for nested JSON traversal in variable replacement engine
Updated workflow execution pipeline to pass user_data through metadata files to tool containers
Modified tool structure to extract user_data from metadata and forward to prompt service
Added user_data to workflow execution context while excluding from incompatible method parameters
Fixed dumb-init issue in runner container execution for improved tool container reliability

Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

No, this PR should not break any existing features because:

user_data field is optional (required=False, allow_null=True) in API serializer
Added to EXECUTION_EXCLUDED_PARAMS to prevent passing to incompatible methods
Variable replacement maintains backward compatibility with existing static/dynamic variables
New USER_DATA variable type is additive and doesn't modify existing variable processing logic
Metadata file changes are additive (only adds user_data key if provided)
Tool structure changes are backward compatible (handles missing user_data gracefully)

Database Migrations

No database schema changes required
All changes are at the application/service layer

Env Config

No new environment variables required
Uses existing workflow execution and metadata infrastructure

Relevant Docs

Added comprehensive implementation documentation for user_data variable system
Variable replacement logic documented in prompt service helpers

Related Issues or PRs

Dependencies Versions

Notes on Testing

Tested user_data JSON validation in API serializer
Verified dot notation variable replacement for nested objects (user_data.address.city)
Confirmed exported Prompt Studio tools receive user_data through metadata
Validated backward compatibility with existing static/dynamic variables
Tested error handling for invalid JSON and missing object keys
Verified workflow execution pipeline passes user_data correctly

Screenshots

N/A - Backend API feature

Checklist

I have read and understood the Contribution Guidelines.

…ides

coderabbitai · 2025-09-19T07:17:18Z

Summary by CodeRabbit

New Features
- Added support for an optional user_data payload across API deployments and workflow execution.
- Enabled {{user_data.*}} variable substitution in prompts for dynamic, user-specific content.
- Propagates user_data into file metadata and tool execution payloads.
Refactor/Style
- Minor log and message formatting improvements.
Chores
- Upgraded unstract-sdk dependencies to 0.77.2 across services.
- Updated Structure Tool image and metadata to version 0.0.87.

Walkthrough

Threads an optional user_data payload from API requests through deployment and workflow paths into file metadata and prompt variable substitution. Adds constants/regex/enums, serializer and service parameters, updates storage metadata handling, and bumps unstract-sdk versions; plus minor formatting tweaks.

Changes

Cohort / File(s)	Summary of changes
API v2: input & forwarding `backend/api_v2/api_deployment_views.py`, `backend/api_v2/constants.py`, `backend/api_v2/deployment_helper.py`, `backend/api_v2/serializers.py`	Accept optional `user_data` in requests; add `ApiExecution.USER_DATA`; pass `user_data` into `DeploymentHelper.execute_workflow`; add serializer JSONField and validator.
Workflow manager: propagate user_data `backend/workflow_manager/workflow_v2/workflow_helper.py`, `backend/workflow_manager/workflow_v2/dto.py`, `backend/workflow_manager/workflow_v2/file_execution_tasks.py`, `backend/workflow_manager/endpoint_v2/source.py`	Add `user_data` parameter to public helpers (`process_input_files`, `run_workflow`, `execute_workflow_async`), include `user_data` in `FileData`, forward to `SourceConnector.add_file_to_volume` and async task kwargs; exclude `"user_data"` from execution params.
Workflow execution lib: metadata API `unstract/workflow-execution/src/unstract/workflow_execution/constants.py`, `unstract/workflow-execution/src/unstract/workflow_execution/execution_file_handler.py`	Add `MetaDataKey.USER_DATA`; `ExecutionFileHandler.add_metadata_to_volume` now accepts `user_data` and writes `USER_DATA` metadata (replaces previous llm_profile_id handling).
Prompt service: variable substitution `prompt-service/src/unstract/prompt_service/constants.py`, `prompt-service/src/unstract/prompt_service/controllers/answer_prompt.py`, `prompt-service/src/unstract/prompt_service/helpers/variable_replacement.py`, `prompt-service/src/unstract/prompt_service/services/variable_replacement.py`	Add `USER_DATA` constants/enum/regex; controllers extract/pass `user_data`; detect `USER_DATA` variables first; new `replace_user_data_variable` method and `USER_DATA` branch in replacement flow; services accept/propagate `user_data`.
Tools: structure tool wiring `tools/structure/src/constants.py`, `tools/structure/src/main.py`	Add `SettingsKeys.USER_DATA`; include `user_data` from exec metadata in StructureTool payload.
Dependency bumps `backend/pyproject.toml`, `platform-service/pyproject.toml`, `prompt-service/pyproject.toml`, `tools/structure/requirements.txt`	Update `unstract-sdk` version constraints to ~=0.77.2 (various extras).
Minor formatting/refactor `backend/prompt_studio/prompt_studio_registry_v2/prompt_studio_registry_helper.py`, `backend/tool_instance_v2/views.py`, `backend/workflow_manager/workflow_v2/models/execution.py`, `backend/workflow_manager/workflow_v2/execution.py`	Small string/log formatting changes and a call-site refactor to use class staticmethod; behavior unchanged.
Version/metadata updates `backend/sample.env`, `tools/structure/src/config/properties.json`	Bump structure tool image and `toolVersion` from 0.0.86 → 0.0.87.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant API as API v2
  participant DH as DeploymentHelper
  participant WH as WorkflowHelper
  participant Q as Async Queue
  participant FT as FileExecutionTasks
  participant SC as SourceConnector
  participant EFH as ExecutionFileHandler

  Client->>API: POST /execute { files, user_data? }
  API->>DH: execute_workflow(..., user_data)
  DH->>WH: execute_workflow_async(..., user_data)
  WH-->>Q: enqueue task with user_data
  Q-->>FT: run batch with FileData(user_data)
  FT->>SC: add_file_to_volume(..., user_data)
  SC->>EFH: add_metadata_to_volume(..., user_data)
  EFH-->>SC: write metadata (USER_DATA if provided)
  SC-->>FT: ok
  FT-->>Client: results via API callback/response

sequenceDiagram
  autonumber
  participant Client
  participant PS as Prompt Service
  participant VRS as VariableReplacementService
  participant VRH as VariableReplacementHelper

  Client->>PS: Request answer_prompt { ..., user_data? }
  PS->>VRS: replace_variables_in_prompt(..., user_data)
  VRS->>VRS: detect variable types (USER_DATA first)
  alt USER_DATA variable
    VRS->>VRH: replace_user_data_variable(prompt, "{{user_data.path}}", user_data)
    VRH-->>VRS: prompt text with substitution
  else DYNAMIC/STATIC
    VRS->>VRH: existing replacement paths
  end
  VRS-->>PS: replaced prompt
  PS-->>Client: response

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description Check	⚠️ Warning	The PR description follows the repository template and provides detailed entries for What, Why, How, compatibility, migrations, env config, docs, testing notes, and screenshots. However, the Related Issues or PRs and Dependencies Versions sections are present but contain no content, which is incomplete. All other sections are filled with appropriate details, but these two required fields must be addressed before merging.	Please update the Related Issues or PRs section to either list relevant issue numbers or explicitly state None, and fill the Dependencies Versions section with the current SDK dependencies or indicate None if no new dependencies were added. Ensuring these two sections are populated will satisfy the template requirements.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title succinctly describes the main change by referencing the feature and issue key, clearly stating that it adds user_data variable support for Prompt Studio. It avoids extraneous details or vague terms, making it easy for reviewers to understand the PR’s purpose at a glance.
Docstring Coverage	✅ Passed	No functions found in the changes. Docstring coverage check skipped.

✨ Finishing touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feature/user-data-variable-support

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

prompt-service/src/unstract/prompt_service/controllers/answer_prompt.py (1)
656-659: Bug: possible UnboundLocalError for challenge_llm (and context)

If enable_challenge is true but no challenge plugin is available (or fails early), challenge_llm is undefined in finally. Same for context when retrieval didn’t run. Initialize both.

Apply this diff:
@@
-    for output in prompts:  # type:ignore
+    challenge_llm = None
+    context: list[str] = []
+    for output in prompts:  # type:ignore
@@
-        try:
+        try:
             answer = "NA"
@@
-                answer, context = RetrievalService.perform_retrieval(
+                answer, context = RetrievalService.perform_retrieval(
@@
-        finally:
-            challenge_metrics = (
-                {f"{challenge_llm.get_usage_reason()}_llm": challenge_llm.get_metrics()}
-                if enable_challenge and challenge_llm
-                else {}
-            )
+        finally:
+            challenge_metrics = {}
+            if enable_challenge and challenge_llm is not None:
+                challenge_metrics = {
+                    f"{challenge_llm.get_usage_reason()}_llm": challenge_llm.get_metrics()
+                }
backend/api_v2/api_deployment_views.py (1)

78-91: Critical: file-history/execution dedupe ignores user_data — cached API results can be returned for different user_data

Verified: EXECUTION_EXCLUDED_PARAMS includes "user_data" (backend/workflow_manager/workflow_v2/workflow_helper.py) and file-history keys/lookups are content-only (file_execution_tasks.py calls FileHistoryHelper.get_file_history(..., cache_key=content_hash) and file_history_helper.create_file_history persists cache_key=file_hash.file_hash).
Action: either include a stable hash of user_data in the file-history/cache key (combine content-hash + hash(json.dumps(user_data, sort_keys=True))) or disable caching when user_data is present.
Update points: backend/workflow_manager/workflow_v2/workflow_helper.py (EXECUTION_EXCLUDED_PARAMS / filtered kwargs), backend/workflow_manager/workflow_v2/file_execution_tasks.py (get_file_history call), backend/workflow_manager/workflow_v2/file_history_helper.py (create_file_history / persisted cache_key).

backend/api_v2/serializers.py (1)

229-259: Harden user_data validation and prevent cache leakage when user_data is present

Apply the serializer hardening from your diff: import json, add MAX_USER_DATA_BYTES (e.g. 65536), require dict, ensure JSON-serializable and reject oversized payloads — backend/api_v2/serializers.py (validate_user_data).

Keep the validate() change that sets data["use_file_history"] = False when user_data is present — this prevents reuse of FileHistory-based results (FileHistoryHelper / file_history records).

Critical: api-results caching is independent of FileHistory. ResultCacheUtils uses key "api_results:{workflow_id}:{execution_id}" (backend/workflow_manager/endpoint_v2/result_cache_utils.py) and DOES NOT include user_data — this can leak results if execution_id is reused. Fix (pick one):

Skip reading/writing api_results when workflow_execution.user_data is present, or

Include a stable hash of user_data in the api_results cache key.
Minimal, targeted change: wrap the call to ResultCacheUtils.update_api_results/get_api_results in backend/workflow_manager/workflow_v2/file_execution_tasks.py::_build_final_result (where update_api_results is invoked ~lines 1133–1137) with a guard that skips caching when workflow_execution.user_data exists.

🧹 Nitpick comments (20)

backend/prompt_studio/prompt_studio_registry_v2/prompt_studio_registry_helper.py (2)
123-131: Prefer parameterized logging; drop trailing space and include traceback for debuggability.

Use logger’s lazy formatting and capture the stack trace from the except block. Current f-string also has a trailing space after the exception message.

Apply:
-            logger.warning(
-                f"Error while fetching for prompt registry ID {prompt_registry_id}: {e} "
-            )
+            logger.warning(
+                "Error fetching tool for prompt registry ID %s",
+                prompt_registry_id,
+                exc_info=True,
+            )
215-220: Chain the original IntegrityError and use logger.exception with context.

Preserve the root cause via exception chaining and let logging capture the traceback. Also add tool_id for quick correlation.

Apply:
-            logger.error(
-                f"Integrity Error - Error occurred while exporting custom tool : {error}"
-            )
-            raise ToolSaveError
+            logger.exception(
+                "IntegrityError while exporting custom tool %s",
+                custom_tool.tool_id,
+            )
+            raise ToolSaveError from error
backend/workflow_manager/workflow_v2/execution.py (1)

54-56: Code style improvement: Use class name for static method calls.

Calling static methods using the class name is more common and can be called as either a class method or an instance method. The change from self.convert_tool_instance_model_to_data_class() to WorkflowExecutionServiceHelper.convert_tool_instance_model_to_data_class() improves code clarity by making it explicit that this is a static method call.

backend/workflow_manager/workflow_v2/models/execution.py (1)

121-125: Prefer keeping feature PRs focused (cosmetic change unrelated to user_data).

This drive-by formatting tweak is harmless, but consider moving non-functional comment cleanups to a separate “lint/comments” PR to reduce noise in feature diffs.

tools/structure/src/constants.py (1)

83-83: USER_DATA constant addition — OK; consider de-duplicating nearby keys.
Addition is consistent with the platform keys. Noting this file contains duplicate constants (e.g., NAME, OUTPUTS, TOOL_ID defined twice). Consider consolidating to avoid future confusion.
tools/structure/src/main.py (1)
223-224: Normalize user_data to a dict to avoid None/invalid types reaching downstream.
If the exec metadata contains user_data=None or a non-object, downstream substitution may break. Normalize defensively.

Apply this diff:
-        user_data = self.get_exec_metadata.get(SettingsKeys.USER_DATA, {})
-        payload["user_data"] = user_data
+        user_data = self.get_exec_metadata.get(SettingsKeys.USER_DATA)
+        if not isinstance(user_data, dict):
+            user_data = {}  # Normalize None/non-dicts
+        payload["user_data"] = user_data
Optional: enforce a size cap at the API layer; tool-side normalization here is sufficient.
backend/tool_instance_v2/views.py (1)

121-123: Minor: centralize error message in the exception class

Raising DuplicateData with a composed string is fine, but consider moving the message into the exception to satisfy TRY003 and keep messages DRY.

backend/api_v2/api_deployment_views.py (1)

73-74: Bound user_data size before propagation

Even with serializer validation, consider guarding here (or in serializer) with a conservative max payload size to prevent oversized prompt expansions and metadata bloat.

backend/api_v2/serializers.py (1)

213-215: Doc clarity: show nested example and limitations

Add a nested example (e.g., {{user_data.address.city}}) and clarify that array indexing isn’t supported in dot-notation.
prompt-service/src/unstract/prompt_service/controllers/answer_prompt.py (1)
56-57: Guard user_data before use

Validate type and cap size here (in addition to API layer) to prevent oversized expansions if this endpoint is called directly.

Apply this minimal guard:
@@
-    user_data: dict[str, Any] = payload.get(PSKeys.USER_DATA, {})
+    user_data_raw = payload.get(PSKeys.USER_DATA, {})
+    if not isinstance(user_data_raw, dict):
+        user_data: dict[str, Any] = {}
+    else:
+        try:
+            import json
+            b = len(json.dumps(user_data_raw, separators=(",", ":"), ensure_ascii=False).encode("utf-8"))
+            user_data = user_data_raw if b <= 65536 else {}
+        except Exception:
+            user_data = {}
backend/workflow_manager/endpoint_v2/source.py (1)
924-925: Add user_data param: looks good; please document it in the docstring.

Signature change is fine. Add a brief docstring entry so downstream readers know it’s persisted to METADATA.json.
         Args:
             workflow_file_execution: WorkflowFileExecution model
             tags (list[str]): Tag names associated with the workflow execution.
             file_hash: FileHash model
             llm_profile_id (str, optional): LLM profile ID for overriding tool settings.
+            user_data (dict[str, Any] | None): Per-request user metadata to persist in METADATA.json.
backend/workflow_manager/workflow_v2/file_execution_tasks.py (1)

754-755: Forwarding user_data into volume metadata: LGTM; consider size guard upstream.

Looks correct. As a future hardening, consider enforcing a max size for user_data to avoid bloating per-file METADATA.json.
prompt-service/src/unstract/prompt_service/constants.py (1)
171-176: Remove unused noqa and keep the URL regex multiline.

Ruff flags the E501 noqa as unused; safe to drop.
     DYNAMIC_VARIABLE_URL_REGEX = (
         r"(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|$([^\s()<>]+|(\([^\s()<>]+$))*\))+(?:$([^\s()<>]+|(\([^\s()<>]+$))*\)|[^\s`!()\[\]{};:'\".,<>?«»"
-        "'']))"
-    )  # noqa: E501
+        "'']))"
+    )
Optional: if USER_DATA detection runs on pre-tokenized variables, you may tighten the pattern with start/end anchors to avoid accidental matches in free text.
unstract/workflow-execution/src/unstract/workflow_execution/execution_file_handler.py (2)
105-106: Signature addition: LGTM; update docstring to mention user_data.

The param is present but undocumented in the Parameters section.
         Parameters:
             input_file_path (str): The path of the input file.
             file_execution_id (str): Unique execution id for the file.
             source_hash (str): The hash value of the source/input file.
             tags (list[str]): Tag names associated with the workflow execution.
             llm_profile_id (str, optional): LLM profile ID for overriding tool settings.
+            user_data (dict[str, Any] | None): User-provided JSON metadata to store under MetaDataKey.USER_DATA.
145-148: Make user_data write robust to non-JSON types (defensive).

Serializer paths should provide JSON, but a defensive guard prevents unexpected TypeError during json_dump.
-        # Add user_data to metadata if provided
-        if user_data:
-            content[MetaDataKey.USER_DATA] = user_data
+        # Add user_data to metadata if provided (validate JSON-serializability)
+        if user_data is not None:
+            try:
+                json.dumps(user_data)  # validate
+                content[MetaDataKey.USER_DATA] = user_data
+            except TypeError:
+                logger.warning(
+                    "user_data contains non-JSON-serializable types; coercing with default=str"
+                )
+                content[MetaDataKey.USER_DATA] = json.loads(
+                    json.dumps(user_data, default=str)
+                )
prompt-service/src/unstract/prompt_service/services/variable_replacement.py (2)
37-37: Consider using explicit T | None type annotations

The static analysis hints suggest using explicit T | None instead of implicit Optional. While functionally equivalent, the modern type annotation style improves code clarity.

Apply this diff to use explicit union types:
-        user_data: dict[str, Any] = None,
+        user_data: dict[str, Any] | None = None,
-        prompt_text: str, variable_map: dict[str, Any], user_data: dict[str, Any] = None
+        prompt_text: str, variable_map: dict[str, Any], user_data: dict[str, Any] | None = None
Also applies to: 101-101

124-130: Consider handling the case where user_data is None for USER_DATA variables

When a USER_DATA variable is detected but user_data is None, the variable remains unreplaced in the prompt. This could lead to prompts with visible placeholder variables like {{user_data.name}}.

Consider logging a warning or replacing with an empty string when USER_DATA variables are found but no user_data is provided:
             elif variable_type == VariableType.USER_DATA:
                 if user_data:
                     prompt_text = VariableReplacementHelper.replace_user_data_variable(
                         prompt=prompt_text,
                         variable=variable_data,
                         user_data=user_data,
                     )
+                else:
+                    app.logger.warning(
+                        f"USER_DATA variable '{variable}' found but no user_data provided. "
+                        "Variable will remain unreplaced."
+                    )
prompt-service/src/unstract/prompt_service/helpers/variable_replacement.py (2)
64-75: Potential for regex compilation optimization

The regex patterns are compiled on every call to identify_variable_type. Since these patterns are constant, consider compiling them once at module level or using lru_cache for better performance, especially when processing prompts with many variables.

Consider caching the compiled patterns:
+from functools import lru_cache
+
+@lru_cache(maxsize=2)
+def _get_compiled_patterns():
+    return {
+        'user_data': re.compile(VariableConstants.USER_DATA_VARIABLE_REGEX),
+        'dynamic': re.compile(VariableConstants.DYNAMIC_VARIABLE_URL_REGEX)
+    }
+
 @staticmethod
 def identify_variable_type(variable: str) -> VariableType:
     variable_type: VariableType

-    # Check for user_data variable type first
-    user_data_pattern = re.compile(VariableConstants.USER_DATA_VARIABLE_REGEX)
-    if re.findall(user_data_pattern, variable):
+    patterns = _get_compiled_patterns()
+    
+    # Check for user_data variable type first
+    if patterns['user_data'].findall(variable):
         variable_type = VariableType.USER_DATA
     else:
         # Check for dynamic variable type
-        dynamic_pattern = re.compile(VariableConstants.DYNAMIC_VARIABLE_URL_REGEX)
-        if re.findall(dynamic_pattern, variable):
+        if patterns['dynamic'].findall(variable):
             variable_type = VariableType.DYNAMIC
         else:
             variable_type = VariableType.STATIC
     return variable_type
132-140: Enhance error handling for nested path traversal

The current implementation catches KeyError and TypeError but doesn't distinguish between missing keys and attempting to access properties on non-dict values (e.g., trying to access user_data.name.city when name is a string).

Consider providing more specific error messages:
     # Navigate through the nested dictionary
     try:
         value = user_data
         for part in path_parts:
-            value = value[part]
-    except (KeyError, TypeError):
+            if not isinstance(value, dict):
+                app.logger.warning(
+                    f"Cannot traverse path {path_str}: '{part}' is not accessible on non-dict type {type(value).__name__}"
+                )
+                return prompt
+            value = value[part]
+    except KeyError as e:
         app.logger.warning(
-            f"Path {path_str} not found in user_data. Unable to replace variable {variable}"
+            f"Key '{e.args[0]}' not found in path {path_str}. Unable to replace variable {variable}"
         )
         return prompt
backend/workflow_manager/workflow_v2/workflow_helper.py (1)

689-702: user_data validated as JSON object in API — add size/schema limits if required

backend/api_v2/serializers.py defines user_data = JSONField(required=False, allow_null=True) and a validate_user_data that ensures the value is a dict. No explicit size limits or JSON schema validation were found; add payload size limits or schema validation here if you need to constrain shape/size.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 923502f and 7352c17.

📒 Files selected for processing (24)

backend/api_v2/api_deployment_views.py (2 hunks)
backend/api_v2/constants.py (1 hunks)
backend/api_v2/deployment_helper.py (2 hunks)
backend/api_v2/serializers.py (3 hunks)
backend/prompt_studio/prompt_studio_registry_v2/prompt_studio_registry_helper.py (2 hunks)
backend/pyproject.toml (1 hunks)
backend/tool_instance_v2/views.py (1 hunks)
backend/workflow_manager/endpoint_v2/source.py (2 hunks)
backend/workflow_manager/workflow_v2/dto.py (1 hunks)
backend/workflow_manager/workflow_v2/execution.py (1 hunks)
backend/workflow_manager/workflow_v2/file_execution_tasks.py (2 hunks)
backend/workflow_manager/workflow_v2/models/execution.py (1 hunks)
backend/workflow_manager/workflow_v2/workflow_helper.py (8 hunks)
platform-service/pyproject.toml (1 hunks)
prompt-service/pyproject.toml (1 hunks)
prompt-service/src/unstract/prompt_service/constants.py (3 hunks)
prompt-service/src/unstract/prompt_service/controllers/answer_prompt.py (2 hunks)
prompt-service/src/unstract/prompt_service/helpers/variable_replacement.py (2 hunks)
prompt-service/src/unstract/prompt_service/services/variable_replacement.py (4 hunks)
tools/structure/requirements.txt (1 hunks)
tools/structure/src/constants.py (1 hunks)
tools/structure/src/main.py (1 hunks)
unstract/workflow-execution/src/unstract/workflow_execution/constants.py (1 hunks)
unstract/workflow-execution/src/unstract/workflow_execution/execution_file_handler.py (2 hunks)

🧰 Additional context used

🪛 Ruff (0.12.2)

backend/api_v2/serializers.py

256-256: Avoid specifying long messages outside the exception class

(TRY003)

prompt-service/src/unstract/prompt_service/services/variable_replacement.py

37-37: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

101-101: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

prompt-service/src/unstract/prompt_service/constants.py

174-174: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

🔇 Additional comments (11)

backend/workflow_manager/workflow_v2/models/execution.py (1)

121-125: LGTM — confirm no migration created

Comment-only change; no functional impact. Verification here failed: manage.py not found. Run from the repository root and paste output:

python manage.py makemigrations --check --dry-run

(Or run via your environment: e.g. poetry run python manage.py ..., pipenv run python ..., or docker-compose run --rm web python ....)

prompt-service/pyproject.toml (1)

20-20: Azure extra bumped to 0.77.2 — OK.
Matches the cross-repo version; no issues spotted here.

backend/pyproject.toml (1)

40-40: Backend SDK to 0.77.2 — OK.
Version parity maintained with other services.

platform-service/pyproject.toml (1)

18-18: Platform-service SDK to 0.77.2 — OK.
Multi-cloud extras preserved; looks good.

backend/workflow_manager/endpoint_v2/source.py (1)

966-967: Pass-through to metadata: LGTM.

The user_data forwarding is correct and preserves backward compatibility with llm_profile_id.

backend/api_v2/deployment_helper.py (1)

238-239: Async propagation: LGTM.

user_data is correctly forwarded to execute_workflow_async.

prompt-service/src/unstract/prompt_service/constants.py (1)

70-71: New USER_DATA constants/enums: LGTM.

Also applies to: 151-152

prompt-service/src/unstract/prompt_service/helpers/variable_replacement.py (1)

102-152: Good implementation of dot-notation traversal

The implementation correctly handles nested dictionary access using dot notation (e.g., user_data.address.city) and properly integrates with existing helper methods for value formatting and replacement. The error handling ensures the prompt remains unchanged when paths are invalid.

backend/workflow_manager/workflow_v2/workflow_helper.py (3)

67-68: Constant naming convention

The constant name EXECUTION_EXCLUDED_PARAMS follows Python naming conventions correctly. Good addition of "user_data" to the exclusion list to maintain separation between workflow execution parameters and user-provided data.

154-154: Consistent user_data propagation throughout the workflow

The user_data parameter is correctly threaded through all the workflow execution paths:

From initial entry points (process_input_files, run_workflow, execute_workflow_async)

Through FileData objects for batch processing

To the Celery task execution

Finally to the prompt service

This ensures user context is available throughout the processing pipeline.

Also applies to: 206-206, 273-273, 325-325, 444-444, 484-484, 702-702

669-679: Clean parameter filtering for workflow execution

Good implementation of filtering out excluded parameters before passing to create_workflow_execution. This maintains backward compatibility while preventing user_data from being persisted in execution records.

harini-venkataraman

LGTM on the API deployment side of things. But, there are some changes needed for the user_data support from Prompt Studio, during the prompt development phase. Please add it as a TODO to handle in subsequent PRs. It is critical, because only then this support would be from the Prompt Studio side.

Ref - https://zipstack.slack.com/archives/C058WQU0R6E/p1757684853923349?thread_ts=1757594159.803519&cid=C058WQU0R6E

jaags-dev · 2025-09-19T07:43:59Z

LGTM on the API deployment side of things. But, there are some changes needed for the user_data support from Prompt Studio, during the prompt development phase. Please add it as a TODO to handle in subsequent PRs. It is critical, because only then this support would be from the Prompt Studio side.

True, We need to introduce this in Prompt Studio too. I'll add a TODO or ticket to keep track of this.

…ack/unstract into feature/user-data-variable-support

github-actions · 2025-09-19T08:22:42Z

filepath	function	$$\textcolor{#23d18b}{\tt{passed}}$$	SUBTOTAL
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$	$$\textcolor{#23d18b}{\tt{test\_logs}}$$	$$\textcolor{#23d18b}{\tt{1}}$$	$$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$	$$\textcolor{#23d18b}{\tt{test\_cleanup}}$$	$$\textcolor{#23d18b}{\tt{1}}$$	$$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$	$$\textcolor{#23d18b}{\tt{test\_cleanup\_skip}}$$	$$\textcolor{#23d18b}{\tt{1}}$$	$$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$	$$\textcolor{#23d18b}{\tt{test\_client\_init}}$$	$$\textcolor{#23d18b}{\tt{1}}$$	$$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$	$$\textcolor{#23d18b}{\tt{test\_get\_image\_exists}}$$	$$\textcolor{#23d18b}{\tt{1}}$$	$$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$	$$\textcolor{#23d18b}{\tt{test\_get\_image}}$$	$$\textcolor{#23d18b}{\tt{1}}$$	$$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$	$$\textcolor{#23d18b}{\tt{test\_get\_container\_run\_config}}$$	$$\textcolor{#23d18b}{\tt{1}}$$	$$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$	$$\textcolor{#23d18b}{\tt{test\_get\_container\_run\_config\_without\_mount}}$$	$$\textcolor{#23d18b}{\tt{1}}$$	$$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$	$$\textcolor{#23d18b}{\tt{test\_run\_container}}$$	$$\textcolor{#23d18b}{\tt{1}}$$	$$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$	$$\textcolor{#23d18b}{\tt{test\_get\_image\_for\_sidecar}}$$	$$\textcolor{#23d18b}{\tt{1}}$$	$$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$	$$\textcolor{#23d18b}{\tt{test\_sidecar\_container}}$$	$$\textcolor{#23d18b}{\tt{1}}$$	$$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{TOTAL}}$$		$$\textcolor{#23d18b}{\tt{11}}$$	$$\textcolor{#23d18b}{\tt{11}}$$

sonarqubecloud · 2025-09-19T08:23:35Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

prompt-service/src/unstract/prompt_service/services/variable_replacement.py (2)
37-37: Fix type hint to follow modern Python standards.

The preferred way to denote an optional argument in Python 3.10 and up is Type | None rather than implicit optional types.

Apply this diff to use the modern type hint syntax:
-        user_data: dict[str, Any] = None,
+        user_data: dict[str, Any] | None = None,
101-101: Fix type hint to follow modern Python standards.

The preferred way to denote an optional argument in Python 3.10 and up is Type | None rather than implicit optional types.

Apply this diff to use the modern type hint syntax:
-        prompt_text: str, variable_map: dict[str, Any], user_data: dict[str, Any] = None
+        prompt_text: str, variable_map: dict[str, Any], user_data: dict[str, Any] | None = None
prompt-service/src/unstract/prompt_service/helpers/variable_replacement.py (1)
138-139: Consider using logging.exception for better error context.

The static analysis tool suggests using logging.exception instead of logging.error when handling exceptions, as it automatically includes the exception traceback.

Apply this diff to improve error logging:
-            app.logger.error(error_msg)
-            raise ValueError(error_msg) from e
+            app.logger.exception(error_msg)
+            raise ValueError(error_msg) from e

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 7352c17 and 21cd69c.

⛔ Files ignored due to path filters (3)

backend/uv.lock is excluded by !**/*.lock
platform-service/uv.lock is excluded by !**/*.lock
prompt-service/uv.lock is excluded by !**/*.lock

📒 Files selected for processing (3)

backend/api_v2/deployment_helper.py (3 hunks)
prompt-service/src/unstract/prompt_service/helpers/variable_replacement.py (2 hunks)
prompt-service/src/unstract/prompt_service/services/variable_replacement.py (4 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

backend/api_v2/deployment_helper.py

🧰 Additional context used

🪛 Ruff (0.12.2)

prompt-service/src/unstract/prompt_service/helpers/variable_replacement.py

138-138: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

prompt-service/src/unstract/prompt_service/services/variable_replacement.py

37-37: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

101-101: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: build

🔇 Additional comments (4)

prompt-service/src/unstract/prompt_service/services/variable_replacement.py (2)

67-81: LGTM! User data parameter correctly threaded through both code paths.

The implementation correctly propagates user_data through both the normal execution path (when variable_map exists) and the fallback path (for structured tools and APIs without a variable map). This ensures consistent behavior regardless of the execution context.

117-129: LGTM! Clean implementation of user data variable handling.

The logic correctly:

Converts the previous if-elif chain to handle the new USER_DATA variable type

Only processes user data variables when user_data is provided (guards against None)

Maintains the existing flow for STATIC and DYNAMIC variables

The elif structure ensures only one variable type is processed per iteration, which is appropriate behavior.

prompt-service/src/unstract/prompt_service/helpers/variable_replacement.py (2)

64-75: LGTM! Proper variable type identification priority.

The reordered logic correctly prioritizes USER_DATA detection first, then DYNAMIC, then defaults to STATIC. This ordering ensures that user data variables with dot notation (e.g., {{user_data.name}}) are properly identified before being misclassified as dynamic variables.

102-152: LGTM! Robust user data variable replacement implementation.

The implementation provides comprehensive error handling and follows the established patterns:

Validates user_data presence and variable format

Implements proper dot-notation traversal for nested JSON access

Uses existing helper methods (handle_json_and_str_types, replace_generic_string_value)

Provides clear error messages for debugging

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

backend/sample.env (1)

81-84: Consider pinning by digest to harden supply chain.

Tags can be retagged; prefer content-addressable refs (image@sha256:...) in STRUCTURE_TOOL_IMAGE_URL and keep TAG only for display. Non-blocker.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 21cd69c and 747f3db.

📒 Files selected for processing (2)

backend/sample.env (1 hunks)
tools/structure/src/config/properties.json (1 hunks)

✅ Files skipped from review due to trivial changes (1)

tools/structure/src/config/properties.json

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: build

🔇 Additional comments (1)

backend/sample.env (1)

81-84: Keep STRUCTURE_TOOL_IMAGE_URL and STRUCTURE_TOOL_IMAGE_TAG in lockstep — manual verification required

Automated checks here failed (rg returned line-prefixed values and a follow-up script errored "conditional binary operator expected"). Confirm backend/sample.env has identical tag values and that the image/tag exists and matches any properties.json .toolVersion.

Verify locally with these commands:

Extract & compare tags:
tag_url=$(sed -nE 's/^STRUCTURE_TOOL_IMAGE_URL\s*=\s*".:([0-9]+.[0-9]+.[0-9]+)"./\1/p' backend/sample.env)
tag_var=$(sed -nE 's/^STRUCTURE_TOOL_IMAGE_TAG\s*=\s*"([0-9]+.[0-9]+.[0-9]+)".*/\1/p' backend/sample.env)
echo "url:$tag_url var:$tag_var"; [ "$tag_url" = "$tag_var" ] && echo "MATCH" || echo "MISMATCH: $tag_url vs $tag_var"

Check properties.json (if present):
find . -name properties.json -exec sh -c 'jq -r .toolVersion {} 2>/dev/null || echo "{}: no toolVersion"' ;

Confirm Docker Hub tag exists:
curl -sf -o /dev/null "https://hub.docker.com/v2/repositories/unstract/tool-structure/tags/${tag_var:-0.0.87}/" && echo "docker tag exists" || echo "docker tag missing"

If any step reports a mismatch or missing tag, update backend/sample.env (or properties.json) so both sources match and point to a published Docker tag.

…rnal APIs (#1494) * UN-2470 [MISC] Remove Django dependency from Celery workers This commit introduces a new worker architecture that decouples Celery workers from Django where possible, enabling support for gevent/eventlet pool types and reducing worker startup overhead. Key changes: - Created separate worker modules (api-deployment, callback, file_processing, general) - Added internal API endpoints for worker communication - Implemented Django-free task execution where appropriate - Added shared utilities and client facades - Updated container configurations for new worker architecture 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix pre-commit issues: file permissions and ruff errors Setup the docker for new workers - Add executable permissions to worker entrypoint files - Fix import order in namespace package __init__.py - Remove unused variable api_status in general worker - Address ruff E402 and F841 errors 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * refactoreed, Dockerfiles,fixes * flexibility on celery run commands * added debug logs * handled filehistory for API * cleanup * cleanup * cloud plugin structure * minor changes in import plugin * added notification and logger workers under new worker module * add docker compatibility for new workers * handled docker issues * log consumer worker fixes * added scheduler worker * minor env changes * cleanup the logs * minor changes in logs * resolved scheduler worker issues * cleanup and refactor * ensuring backward compatibbility to existing wokers * added configuration internal apis and cache utils * optimization * Fix API client singleton pattern to share HTTP sessions - Fix flawed singleton implementation that was trying to share BaseAPIClient instances - Now properly shares HTTP sessions between specialized clients - Eliminates 6x BaseAPIClient initialization by reusing the same underlying session - Should reduce API deployment orchestration time by ~135ms (from 6 clients to 1 session) - Added debug logging to verify singleton pattern activation * cleanup and structuring * cleanup in callback * file system connectors issue * celery env values changes * optional gossip * variables for sync, mingle and gossip * Fix for file type check * Task pipeline issue resolving * api deployement failed response handled * Task pipline fixes * updated file history cleanup with active file execution * pipline status update and workflow ui page execution * cleanup and resolvinf conflicts * remove unstract-core from conenctoprs * Commit uv.lock changes * uv locks updates * resolve migration issues * defer connector-metadtda * Fix connector migration for production scale - Add encryption key handling with defer() to prevent decryption failures - Add final cleanup step to fix duplicate connector names - Optimize for large datasets with batch processing and bulk operations - Ensure unique constraint in migration 0004 can be created successfully 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * hitl fixes * minor fixes on hitl * api_hub related changes * dockerfile fixes * api client cache fixes with actual response class * fix: tags and llm_profile_id * optimized clear cache * cleanup * enhanced logs * added more handling on is file dir and added loggers * cleanup the runplatform script * internal apis are excempting from csrf * sonal cloud issues * sona-cloud issues * resolving sonar cloud issues * resolving sonar cloud issues * Delta: added Batch size fix in workers * comments addressed * celery configurational changes for new workers * fiixes in callback regaurding the pipline type check * change internal url registry logic * gitignore changes * gitignore changes * addressng pr cmmnets and cleanup the codes * adding missed profiles for v2 * sonal cloud blocker issues resolved * imlement otel * Commit uv.lock changes * handle execution time and some cleanup * adding user_data in metadata Pr: #1544 * scheduler backward compatibitlity * replace user_data with custom_data * Commit uv.lock changes * celery worker command issue resolved * enhance package imports in connectors by changing to lazy imports * Update runner.py by removing the otel from it Update runner.py by removing the otel from it Signed-off-by: ali <117142933+muhammad-ali-e@users.noreply.github.com> * added delta changes * handle erro to destination db * resolve tool instances id validation and hitl queu name in API * handled direct execution from workflow page to worker and logs * handle cost logs * Update health.py Signed-off-by: Ritwik G <100672805+ritwik-g@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor log changes * introducing log consumer scheduler to bulk create, and socket .emit from worker for ws * Commit uv.lock changes * time limit or timeout celery config cleanup * implemented redis client class in worker * pipline status enum mismatch * notification worker fixes * resolve uv lock conflicts * workflow log fixes * ws channel name issue resolved. and handling redis down in status tracker, and removing redis keys * default TTL changed for unified logs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: ali <117142933+muhammad-ali-e@users.noreply.github.com> Signed-off-by: Ritwik G <100672805+ritwik-g@users.noreply.github.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Ritwik G <100672805+ritwik-g@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Add user_data parameter for API deployment variable replacement overr…

7352c17

…ides

jaags-dev requested review from Deepak-Kesavan, harini-venkataraman, jaseemjaskp, muhammad-ali-e and vishnuszipstack September 19, 2025 07:18

Deepak-Kesavan approved these changes Sep 19, 2025

View reviewed changes

coderabbitai Bot reviewed Sep 19, 2025

View reviewed changes

harini-venkataraman approved these changes Sep 19, 2025

View reviewed changes

jaags-dev and others added 5 commits September 19, 2025 13:34

error handling fixes

8f9c91e

Commit uv.lock changes

db88951

added doc string

98c61b5

Merge branch 'feature/user-data-variable-support' of github.com:Zipst…

21cd69c

…ack/unstract into feature/user-data-variable-support

updated sample.env

747f3db

coderabbitai Bot reviewed Sep 19, 2025

View reviewed changes

kirtimanmishrazipstack merged commit d40ea54 into main Sep 19, 2025
7 of 8 checks passed

kirtimanmishrazipstack deleted the feature/user-data-variable-support branch September 19, 2025 08:28

coderabbitai Bot reviewed Sep 19, 2025

View reviewed changes

muhammad-ali-e added a commit that referenced this pull request Sep 22, 2025

adding user_data in metadata Pr: #1544

c4a2a2f

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

UN-2807 [FEAT] Add user_data variable support for Prompt Studio#1544

UN-2807 [FEAT] Add user_data variable support for Prompt Studio#1544
kirtimanmishrazipstack merged 6 commits into
mainfrom
feature/user-data-variable-support

jaags-dev commented Sep 19, 2025 •

edited

Loading

Uh oh!

coderabbitai Bot commented Sep 19, 2025 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

harini-venkataraman left a comment •

edited

Loading

Uh oh!

jaags-dev commented Sep 19, 2025

Uh oh!

github-actions Bot commented Sep 19, 2025

Uh oh!

sonarqubecloud Bot commented Sep 19, 2025

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

jaags-dev commented Sep 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Why

How

Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

Database Migrations

Env Config

Relevant Docs

Related Issues or PRs

Dependencies Versions

Notes on Testing

Screenshots

Checklist

Uh oh!

coderabbitai Bot commented Sep 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Pre-merge checks and finishing touches

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

harini-venkataraman left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jaags-dev commented Sep 19, 2025

Uh oh!

github-actions Bot commented Sep 19, 2025

Uh oh!

sonarqubecloud Bot commented Sep 19, 2025

Quality Gate passed

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

jaags-dev commented Sep 19, 2025 •

edited

Loading

coderabbitai Bot commented Sep 19, 2025 •

edited

Loading

harini-venkataraman left a comment •

edited

Loading