Conversation

@Ferko-dts

@Ferko-dts Ferko-dts commented Nov 13, 2025

Changes 🏗️

  • Added an optional image parameter to AIStructuredResponseGeneratorBlock in llm.py.
  • Integrated Tesseract OCR inside the relevant method to enable text extraction from images.
  • Updated Dockerfile to install Tesseract OCR for proper functionality of the new feature.

Reason for changes:
These changes allow the AIStructuredResponseGeneratorBlock to optionally process images using OCR, enabling structured responses from image content. The Dockerfile update ensures that the necessary OCR engine is available in all deployment environments.


Note

Adds optional image input with OCR preprocessing to LLM blocks and updates the Docker image to include Tesseract/pytesseract.

  • Backend — LLM Blocks:
    • AIStructuredResponseGeneratorBlock:
      • Add image input (MediaFileType) and preprocess via Tesseract OCR (supports URL, base64 data URI, local path), appending extracted text to prompt or noting errors.
    • AITextGeneratorBlock:
      • Add image input; delegate calls to AIStructuredResponseGeneratorBlock and return plain text; minor schema/description cleanups; maintains prompt tracking.
  • Infrastructure:
    • backend/Dockerfile: install tesseract-ocr, libtesseract-dev, tesseract-ocr-eng in build/runtime stages; add pytesseract via Poetry.
  • API:
    • v1.py: add imports and a commented example endpoint for testing AI text generation (no active route changes).

Written by Cursor Bugbot for commit 3a1f6c6. This will update automatically on new commits. Configure here.

@Ferko-dts Ferko-dts requested a review from a team as a code owner November 13, 2025 23:15
@Ferko-dts Ferko-dts requested review from Pwuts and Swiftyos and removed request for a team November 13, 2025 23:15
@github-project-automation github-project-automation bot moved this to 🆕 Needs initial review in AutoGPT development kanban Nov 13, 2025
@coderabbitai

coderabbitai bot commented Nov 13, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

@CLAassistant

CLAassistant commented Nov 13, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ ntindle
❌ Ferko


Ferko does not appear to be a GitHub user. You need a GitHub account to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

@github-actions github-actions bot added the platform/backend and platform/blocks labels Nov 13, 2025
@github-actions
Contributor

This PR targets the master branch but does not come from dev or a hotfix/* branch.

Automatically setting the base branch to dev.

@qodo-code-review

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 Security concerns

Input validation:
The OCR flow accepts arbitrary URLs and local paths without validation. This can enable SSRF (fetching internal network resources) or reading unintended local files if paths are accepted from untrusted input. Mitigate by restricting URL schemes/hosts, disallowing local file paths from untrusted contexts, enforcing size limits, and using timeouts plus content-type checks. Also avoid logging full OCR text if it may contain sensitive data.
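A minimal sketch of the URL restriction suggested above, using only the standard library. The names `ALLOWED_SCHEMES`, `resolve_host`, `is_public_address` and `validate_image_url` are hypothetical and not part of the PR; the HTTPS-only policy is an assumption. The check rejects hosts that resolve to loopback, private or link-local addresses.

```python
import ipaddress
import socket
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"https"}  # assumption: only HTTPS image URLs are accepted


def resolve_host(host: str) -> list[str]:
    """Return every IP address the host name resolves to."""
    infos = socket.getaddrinfo(host, None)
    return [info[4][0] for info in infos]


def is_public_address(address: str) -> bool:
    """True only for globally routable addresses (not loopback, private, link-local)."""
    return ipaddress.ip_address(address).is_global


def validate_image_url(url: str, resolver=resolve_host) -> str:
    """Raise ValueError unless the URL uses an allowed scheme and a public host."""
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES:
        raise ValueError("unsupported URL scheme")
    if not parsed.hostname:
        raise ValueError("URL has no host")
    addresses = resolver(parsed.hostname)
    if not addresses or not all(is_public_address(a) for a in addresses):
        raise ValueError("host resolves to a non-public address")
    return url
```

The `resolver` parameter exists so the check can be exercised without network access. A check like this does not by itself cover redirects or DNS answers that change between validation and download, so it is best treated as one layer rather than a complete defence.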

⚡ Recommended focus areas for review

Error Handling

OCR path swallows all exceptions and appends raw error strings into the user prompt, which can leak internal details and pollute model inputs. Consider more granular handling, limiting message content, and avoiding mutation of input directly.

if input_data.image:
    try:
        # Handle different image input formats
        if input_data.image.startswith('http'):
            # URL image
            response = requests.get(input_data.image)
            image = Image.open(io.BytesIO(response.content))
        elif input_data.image.startswith('data:image'):
            # Base64 image
            base64_data = re.sub('^data:image/.+;base64,', '', input_data.image)
            image_data = base64.b64decode(base64_data)
            image = Image.open(io.BytesIO(image_data))
        else:
            # Local file path
            image = Image.open(input_data.image)

        # Perform OCR
        ocr_text = pytesseract.image_to_string(image)
        logger.debug(f"OCR extracted text: {ocr_text}")

        # Append OCR text to prompt if text was extracted
        if ocr_text.strip():
            if input_data.prompt:
                input_data.prompt += f"\n\nExtracted text from image:\n{ocr_text}"
            else:
                input_data.prompt = f"Extracted text from image:\n{ocr_text}"

    except Exception as e:
        logger.error(f"Error processing image with OCR: {str(e)}")
        if input_data.prompt:
            input_data.prompt += f"\n\nError processing image: {str(e)}"
        else:
            input_data.prompt = f"Error processing image: {str(e)}"
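
One way to address the points above is sketched here: the prompt is built as a new string instead of mutating `input_data`, only a narrow set of exceptions is caught, and the model receives a fixed note rather than the raw error text. `merge_ocr_text`, `prompt_with_image`, `OCR_FAILURE_NOTE` and the `extract_text` callable are hypothetical names; the assumption is that the failures of interest surface as `OSError` or `ValueError`.

```python
import logging

logger = logging.getLogger(__name__)

OCR_FAILURE_NOTE = "The attached image could not be processed."  # generic, no internals


def merge_ocr_text(prompt: str, ocr_text: str) -> str:
    """Return a new prompt with OCR text appended; the input string is left untouched."""
    text = ocr_text.strip()
    if not text:
        return prompt
    section = f"Extracted text from image:\n{text}"
    return f"{prompt}\n\n{section}" if prompt else section


def prompt_with_image(prompt: str, extract_text) -> str:
    """Run the (hypothetical) extract_text callable and build the final prompt."""
    try:
        ocr_text = extract_text()
    except (OSError, ValueError) as exc:
        # Log only the exception type; the model sees a fixed, generic note.
        logger.warning("OCR failed: %s", type(exc).__name__)
        return f"{prompt}\n\n{OCR_FAILURE_NOTE}" if prompt else OCR_FAILURE_NOTE
    return merge_ocr_text(prompt, ocr_text)
```

Any exception outside the caught set propagates to the caller, which keeps unexpected failures visible instead of silently folding them into the prompt.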

Network Robustness

Image download via requests.get lacks timeouts, status checks, and content-type validation. Add timeout, response.raise_for_status(), and validate image size/type to prevent hangs and misuse.

if input_data.image.startswith('http'):
    # URL image
    response = requests.get(input_data.image)
    image = Image.open(io.BytesIO(response.content))
elif input_data.image.startswith('data:image'):
    # Base64 image
    base64_data = re.sub('^data:image/.+;base64,', '', input_data.image)
    image_data = base64.b64decode(base64_data)
    image = Image.open(io.BytesIO(image_data))
else:
    # Local file path
    image = Image.open(input_data.image)

# Perform OCR
ocr_text = pytesseract.image_to_string(image)
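
The hardening described above could look roughly like the following sketch. The 5 MiB cap, the allowed content types and the helper names (`check_content_type`, `read_limited`, `download_image_bytes`) are assumptions chosen for illustration, not values from the PR.

```python
from typing import Iterable

MAX_IMAGE_BYTES = 5 * 1024 * 1024  # assumption: 5 MiB cap
ALLOWED_CONTENT_TYPES = {"image/png", "image/jpeg", "image/webp"}  # assumption


def check_content_type(header_value: str) -> None:
    """Reject responses whose Content-Type is not an allowed image type."""
    media_type = header_value.split(";", 1)[0].strip().lower()
    if media_type not in ALLOWED_CONTENT_TYPES:
        raise ValueError("unexpected content type")


def read_limited(chunks: Iterable[bytes], max_bytes: int = MAX_IMAGE_BYTES) -> bytes:
    """Concatenate chunks, aborting as soon as the size cap is exceeded."""
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        if len(buffer) > max_bytes:
            raise ValueError("image exceeds size limit")
    return bytes(buffer)


def download_image_bytes(url: str, timeout: float = 10.0) -> bytes:
    """Fetch image bytes with a timeout, status check, type check and size cap."""
    import requests  # imported lazily so the helpers above work without it

    with requests.get(url, timeout=timeout, stream=True) as response:
        response.raise_for_status()
        check_content_type(response.headers.get("Content-Type", ""))
        return read_limited(response.iter_content(chunk_size=64 * 1024))
```

Note that the `timeout` argument of `requests.get` bounds the connection attempt and each read, not the total transfer time, which is why the byte cap is enforced while streaming.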

Build Hygiene

Adding pytesseract via poetry add at build time introduces nondeterminism; prefer pinning in pyproject.toml. Also verify that installing tesseract twice (builder and runtime) is necessary and not redundant.

@netlify

netlify bot commented Nov 13, 2025

Deploy Preview for auto-gpt-docs ready!

Name Link
🔨 Latest commit 3a1f6c6
🔍 Latest deploy log https://app.netlify.com/projects/auto-gpt-docs/deploys/692f088b8575670008aa3144
😎 Deploy Preview https://deploy-preview-11379--auto-gpt-docs.netlify.app

@AutoGPT-Agent

Thank you for your PR adding OCR capabilities to the AIStructuredResponseGeneratorBlock! Here's some feedback to help get this PR ready for merging:

  1. Your PR description is missing the required checklist. Please add the template checklist and complete it, especially the sections about test plans since this is a significant feature addition.

  2. Your PR title needs to follow our conventional commit format. It should start with a type like feat: and include an appropriate scope. Based on the changes, something like feat(backend): add image OCR capabilities to LLM block would be more appropriate.

  3. The implementation looks good overall, but consider adding some basic error handling for cases where Tesseract might not be available or for handling different image formats.

  4. I notice there's some commented-out test endpoint code in v1.py - please either complete and uncomment this for testing or remove it if it's not needed for the PR.

  5. There's also a commented-out section in docker-compose.yml - please clarify if this is needed or should be removed.

Please address these items and we'll be happy to review again.

@deepsource-io

deepsource-io bot commented Nov 13, 2025

Here's the code health analysis summary for commits 6590fcb..3a1f6c6. View details on DeepSource ↗.

Analysis Summary

Analyzer | Status | Summary | Link
JavaScript | ✅ Success | | View Check ↗
Python | ✅ Success | ❗ 17 occurrences introduced, 🎯 4 occurrences resolved | View Check ↗


@AutoGPT-Agent

Thanks for your PR adding OCR capabilities to process images in prompts! Here are some items that need to be addressed before this can be merged:

  1. Missing Checklist: Please include the complete checklist from the PR template. Since this makes material code changes, the checklist is required to ensure proper testing has been done.

  2. PR Title Format: Your PR title needs to follow the conventional commit format. It should be structured like: feat(platform/blocks): Add image processing using OCR - starting with a type (feat, fix, etc.) and including the relevant scope.

  3. Testing: Please ensure you've thoroughly tested this new functionality, especially with different image formats (URL, local file, base64) and include your test plan in the checklist.

  4. Docker Compose Comments: There are commented out lines in the docker-compose.yml file changes. Please either remove these comments or explain why they're being preserved.

  5. Commented Route: There's a large commented-out endpoint in v1.py. If this is intended for testing only and not for the final PR, please remove it.

Your implementation of OCR functionality looks promising, but we need to ensure it meets all our PR requirements before merging. Let me know if you need any clarification on these items!

from typing import Any, Iterable, List, Literal, NamedTuple, Optional


import pytesseract

Bug: pytesseract is unconditionally imported in llm.py but is missing from pyproject.toml, leading to ModuleNotFoundError at startup.
Severity: CRITICAL | Confidence: 1.00

🔍 Detailed Analysis

The application will crash at startup with a ModuleNotFoundError: No module named 'pytesseract' because pytesseract is imported unconditionally in llm.py at line 13, but it is not declared as a permanent dependency in pyproject.toml. The poetry add pytesseract --no-ansi || true command in the Dockerfile is an unreliable installation method that does not guarantee the dependency is always present, especially in non-Docker environments.

💡 Suggested Fix

Add pytesseract as a formal dependency to pyproject.toml. Remove the unreliable poetry add pytesseract --no-ansi || true from the Dockerfile, allowing Poetry to manage dependencies correctly.
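As a complement to declaring the dependency (not a replacement for it), the import could be deferred so that a missing package produces a clear error at the point of use instead of a crash at module import. This is only a sketch; `ocr_available` and `extract_text` are hypothetical helper names.

```python
import importlib.util


def ocr_available() -> bool:
    """True when the pytesseract package can be imported in this environment."""
    return importlib.util.find_spec("pytesseract") is not None


def extract_text(image) -> str:
    """Run OCR on an already opened image, or raise a clear error if OCR is unavailable."""
    if not ocr_available():
        raise RuntimeError("OCR support is not installed (pytesseract missing)")
    import pytesseract  # deferred so importing this module never fails

    return pytesseract.image_to_string(image)
```

With this shape, blocks that never receive an image keep working in environments where the OCR package is absent.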

Location: autogpt_platform/backend/backend/blocks/llm.py#L13

Reference_id: 2669854

@github-actions github-actions bot added the conflicts label (automatically applied to PRs with merge conflicts) Nov 14, 2025
@github-actions
Contributor

This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

@ntindle ntindle self-requested a review December 1, 2025 19:41
@netlify

netlify bot commented Dec 2, 2025

Deploy Preview for auto-gpt-docs-dev ready!

Name Link
🔨 Latest commit 3a1f6c6
🔍 Latest deploy log https://app.netlify.com/projects/auto-gpt-docs-dev/deploys/692f088c36c6490008a5701f
😎 Deploy Preview https://deploy-preview-11379--auto-gpt-docs-dev.netlify.app

@github-actions github-actions bot removed the conflicts label (automatically applied to PRs with merge conflicts) Dec 2, 2025
@github-actions
Contributor

github-actions bot commented Dec 2, 2025

Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly.

@AutoGPT-Agent

Thank you for your contribution to add OCR image processing capability! Here's some feedback to help get your PR ready for merging:

Required Changes

  1. PR Title Format

    • Please update your PR title to follow the conventional commit format, such as: feat(platform/blocks): add image processing with OCR
  2. Missing Checklist

    • The PR template requires a completed checklist section. Please update your PR description to include the checklist from the template and check off all applicable items.
  3. Code Cleanup

    • Remove the commented-out test endpoint in backend/server/routers/v1.py (lines 1375-1426)
    • Remove the commented-out extra_hosts in docker-compose.yml

Other Recommendations

  1. Security Considerations

    • Consider adding validation for image URLs and size limits to prevent potential security issues
    • Document any security considerations for processing external images
  2. Documentation

    • Add brief documentation about how to use this new feature with examples
    • Update your PR description to explain why this feature is useful and how it enhances the platform
  3. Testing Plan

    • Include a specific test plan for this feature in your PR checklist

Once these items are addressed, your PR will be ready for another review.

@AutoGPT-Agent

Thank you for your contribution to add image processing with OCR functionality! I have a few items that need to be addressed before this can be merged:

  1. The PR title needs to follow the conventional commit format with type and scope. Please update it to something like: feat(platform/blocks): Add images to the prompt using OCR

  2. Your PR description is missing the required checklist. Please update it to include:

    • The checklist items for code changes
    • Your test plan for these changes
  3. Security considerations:

    • The code adds functionality to process images from URLs and files
    • Consider adding validation for image sizes and types to prevent potential issues
    • Add error handling for malformed or malicious image inputs
  4. Code suggestions:

    • Consider adding timeout handling for image downloads from URLs
    • Add documentation comments for the new image parameter

The implementation looks solid overall, but we need the proper checklist and testing confirmation before proceeding with the merge. Please update your PR with these items, and we'll be happy to review it again.

@AutoGPT-Agent

Thank you for your PR adding OCR capabilities to the platform! This is a valuable feature that will allow the LLM blocks to process image content. I have a few points that need to be addressed before this can be merged:

  1. Missing Checklist: Your PR description is missing the required checklist from the template. Please add the checklist and ensure all items are checked off appropriately for your code changes.

  2. PR Title Format: The title needs to follow our conventional commit format. Please update it to something like: feat(platform/blocks): add image OCR processing to LLM blocks

  3. Cleanup: There's a commented-out test endpoint in backend/server/routers/v1.py that should be removed before merging unless it's intended to be uncommented in a future PR.

  4. Security Considerations: Since this PR is labeled with "Possible security concern", could you add a note about how you're handling security aspects of processing external images? The error handling looks good, but it would be helpful to explain any other security considerations.

  5. Test Plan: Please add a test plan to your PR description to demonstrate how you've tested the OCR functionality with different image types (URL, base64, local).

The implementation looks well done overall! Once these items are addressed, we can proceed with the review process.

@AutoGPT-Agent

Thank you for your PR adding OCR functionality to process images in LLM blocks. Here are some items to address before this can be merged:

  1. Missing Checklist: Your PR description is missing the required checklist. Please update your description to include the checklist from the template, particularly the sections related to testing since this is a significant functional addition.

  2. PR Title Format: Your title needs to follow the conventional commit format with a type and scope. For example: feat(platform/blocks): Adding images to the prompt using OCR

  3. Testing: Since you're adding new functionality, please include a test plan in the PR description detailing how you've verified the OCR functionality works as expected.

  4. Docker Configuration: You've added Tesseract OCR dependencies to the Dockerfile, which is appropriate. However, there's a commented-out section in docker-compose.yml that should either be uncommented if needed or removed if not required.

  5. API Endpoint: There's a commented-out test endpoint in routers/v1.py. Is this intended for testing only? If so, it should probably be removed before merging, or at least documented as to why it's being kept in a commented state.

The implementation itself looks solid, adding proper OCR functionality with error handling. Once you address these issues, the PR can be reconsidered for merging.

@AutoGPT-Agent

Thank you for implementing OCR functionality for image processing in LLM blocks. This is a valuable addition to our platform. Before we can merge this PR, there are two issues that need to be addressed:

  1. Missing Checklist: The PR description is missing the required checklist for code changes. Since you're adding new functionality, please update the description to include the standard checklist with all items checked off, including your test plan.

  2. PR Title Format: The title needs to follow our conventional commit format. Please update it to something like: feat(platform/blocks): Add images to the prompt using OCR

The code implementation looks good, and I appreciate the thorough integration of Tesseract OCR with proper error handling and the Dockerfile updates to ensure the dependencies are available in all environments.

@AutoGPT-Agent

Thank you for this PR adding OCR capability to the LLM blocks! This is a valuable feature that will enable processing of image content. I have a few issues that need to be addressed before this can be merged:

Required Changes

  1. Please update your PR title to follow the conventional commit format. It should be something like:
    feat(platform/blocks): add OCR image processing to LLM blocks

  2. The PR description is missing the required checklist section. Please add the complete checklist and mark off all the relevant items.

Additional Feedback

  1. The image processing functionality accepts URLs and local paths without validation. Consider adding validation to prevent potential security issues:

    • URL validation to prevent SSRF attacks
    • Path validation to prevent path traversal
    • Size/format limits to prevent DoS attacks
  2. The commented-out test endpoint in v1.py should probably be removed before merging unless there's a specific reason to keep it.

  3. The error handling in the OCR section could be improved to provide more specific error messages.
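
For the path-traversal point in the feedback above, a minimal sketch is to resolve the user-supplied path against an allowed base directory and reject anything that lands outside it. `resolve_inside` and the example directory are hypothetical; `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Resolve user_path under base_dir, rejecting anything that escapes it."""
    base = Path(base_dir).resolve()
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError("path escapes the allowed directory")
    return candidate


# Usage: a relative path stays inside the base, "../" sequences are rejected.
safe = resolve_inside("/tmp/uploads", "scans/invoice.png")
```

Absolute inputs such as `/etc/passwd` are also rejected, because joining an absolute path replaces the base and the result then fails the containment check.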

Once you update the PR title and add the completed checklist, we can proceed with the review process.

@AutoGPT-Agent

Thank you for your contribution to add OCR image processing capability to the platform. Here's what needs to be addressed before this PR can be merged:

Required Changes

  1. Missing PR Checklist: Please add the required checklist from the PR template and fill it out completely.

  2. PR Title Format: Update your PR title to follow the conventional commit format. For example: feat(platform/blocks): Adding images to the prompt using OCR

  3. Security Concerns:

    • The image processing implementation should include validation of input URLs and file paths to prevent security vulnerabilities
    • Consider adding size limits for downloaded images to prevent DoS attacks
  4. Code Cleanup:

    • Remove the commented-out test endpoint in v1.py (lines 1478-1529)
    • Also remove the commented extra_hosts in docker-compose.yml
  5. Additional Testing Needed:

    • Add explicit test plans for the new OCR functionality
    • Include examples of how this feature should be used
  6. Input Validation:

    • Add appropriate error handling for cases where OCR might fail
    • Consider adding validation for the image parameter to ensure it contains valid image data
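
As an illustration of validating that the image parameter really carries image data, the sketch below parses a base64 data URI strictly and checks the decoded bytes against well-known file signatures. The accepted formats and the names `DATA_URI_PATTERN`, `MAGIC_PREFIXES` and `decode_image_data_uri` are assumptions made for this example.

```python
import base64
import binascii
import re

# Assumption: only PNG, JPEG and GIF data URIs are accepted.
DATA_URI_PATTERN = re.compile(
    r"^data:image/(png|jpeg|gif);base64,(?P<payload>[A-Za-z0-9+/=]+)$"
)

MAGIC_PREFIXES = (
    b"\x89PNG\r\n\x1a\n",  # PNG signature
    b"\xff\xd8\xff",  # JPEG start-of-image marker
    b"GIF87a",
    b"GIF89a",
)


def decode_image_data_uri(value: str) -> bytes:
    """Decode a base64 image data URI, raising ValueError on anything malformed."""
    match = DATA_URI_PATTERN.match(value)
    if match is None:
        raise ValueError("not a supported image data URI")
    try:
        raw = base64.b64decode(match.group("payload"), validate=True)
    except binascii.Error as exc:
        raise ValueError("invalid base64 payload") from exc
    if not raw.startswith(MAGIC_PREFIXES):
        raise ValueError("payload is not a recognised image")
    return raw
```

A signature check only confirms the leading bytes, so the image library can still reject a truncated or corrupt file later; that failure needs its own handling.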

The concept of adding OCR capability is great, but we need to ensure it's implemented securely and according to project standards.

@cursor cursor bot left a comment

This PR is being reviewed by Cursor Bugbot

)
response = await self.llm_call(object_input_data, credentials)
yield "response", response
print(structured_input)

Bug: Debug print statement left in production code

A debug print(structured_input) statement remains in the AITextGeneratorBlock.run method. This outputs the full input data, which may include sensitive prompt content, to stdout on every execution, cluttering logs and posing a security risk in production.

if input_data.prompt:
    input_data.prompt += f"\n\nExtracted text from image:\n{ocr_text}"
else:
    input_data.prompt = f"Extracted text from image:\n{ocr_text}"

Bug: OCR text corrupted by Jinja2 template processing

OCR-extracted text is appended to the prompt before fmt.format_string() is called. Since format_string uses Jinja2 templating, any Jinja-like patterns in the OCR text may be misinterpreted as template syntax, potentially causing TemplateError or corrupting the prompt content.
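
The ordering problem can be shown with a small self-contained sketch. `render_template` below is a toy stand-in for a template renderer (it is not the project's `fmt.format_string` and does not use Jinja2); it fills `{{ name }}` placeholders and fails on unknown names. `build_prompt` is a hypothetical helper that renders the trusted template first and appends the OCR text afterwards, so the OCR text is never parsed as template syntax.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(template: str, values: dict) -> str:
    """Toy stand-in for a template renderer: fills {{ name }}, fails on unknown names."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(name)
        return str(values[name])

    return PLACEHOLDER.sub(substitute, template)


def build_prompt(template: str, values: dict, ocr_text: str) -> str:
    """Render the trusted template first, then append OCR text verbatim."""
    rendered = render_template(template, values)
    if not ocr_text.strip():
        return rendered
    return f"{rendered}\n\nExtracted text from image:\n{ocr_text}"
```

With this ordering, OCR output such as `Total: {{ amount }}` reaches the model unchanged, whereas appending it before rendering would make the toy renderer raise on the unknown `amount` placeholder.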

    image = Image.open(io.BytesIO(image_data))
else:
    # Local file path
    image = Image.open(input_data.image)

Bug: HTTP request missing timeout can hang indefinitely

The new image processing in AIStructuredResponseGeneratorBlock uses raw requests.get and Image.open. This bypasses existing security utilities, creating potential SSRF and arbitrary file read vulnerabilities. The requests.get call also lacks a timeout and is synchronous, which can hang the application and block the event loop.
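
Reusing the project's existing request utilities is presumably the preferred fix; their API is not shown in this thread, so the sketch below only illustrates the generic event-loop point. It moves a blocking call to a worker thread with `asyncio.to_thread` (Python 3.9+) and bounds the wait with `asyncio.wait_for`. `blocking_fetch` is a hypothetical stand-in that sleeps instead of touching the network.

```python
import asyncio
import time


def blocking_fetch(url: str) -> bytes:
    """Stand-in for a synchronous download; sleeps instead of touching the network."""
    time.sleep(0.05)
    return b"image-bytes-for:" + url.encode()


async def fetch_without_blocking(url: str, timeout: float = 5.0) -> bytes:
    """Run the blocking call in a worker thread and bound the total wait."""
    return await asyncio.wait_for(asyncio.to_thread(blocking_fetch, url), timeout)


# Usage: other coroutines keep running while the worker thread is busy.
result = asyncio.run(fetch_without_blocking("https://example.com/a.png"))
```

If the timeout fires, the awaiting coroutine is released but the worker thread keeps running until the blocking call returns, so the blocking call should carry its own timeout as well.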

    LlmModel,
    LLMResponse,
    llm_call,
)

Bug: Unused imports from commented out test endpoint

Several imports from backend.blocks.llm were added (TEST_CREDENTIALS, TEST_CREDENTIALS_INPUT, AIBlockBase, AICredentials, AICredentialsField, LlmModel, LLMResponse, llm_call, and AITextGeneratorBlock) but are only referenced in the commented-out test endpoint at the end of the file. These appear to be development artifacts that weren't cleaned up before committing.

@ntindle
Member

ntindle commented Dec 2, 2025

@cursor can you run the linter :)

@ntindle
Member

ntindle commented Dec 9, 2025

Hey hey! I'm going to have to close this soon if the CLA isn't signed, because it's not mergeable otherwise :) If you're worried about the time needed to update it, don't worry too much, as I (or another team member) can handle a lot of that.

Projects

Status: 🆕 Needs initial review

4 participants