#10006 Adding images to the prompt using OCR #11379
|
Ferko seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
|
This PR targets the Automatically setting the base branch to |
PR Reviewer Guide 🔍 Here are some key observations to aid the review process:
|
✅ Deploy Preview for auto-gpt-docs ready!
|
Thank you for your PR adding OCR capabilities to the AIStructuredResponseGeneratorBlock! Here's some feedback to help get this PR ready for merging:
Please address these items and we'll be happy to review again. |
|
Thanks for your PR adding OCR capabilities to process images in prompts! Here are some items that need to be addressed before this can be merged:
Your implementation of OCR functionality looks promising, but we need to ensure it meets all our PR requirements before merging. Let me know if you need any clarification on these items! |
from typing import Any, Iterable, List, Literal, NamedTuple, Optional

import pytesseract
Bug: pytesseract is unconditionally imported in llm.py but is missing from pyproject.toml, leading to ModuleNotFoundError at startup.
Severity: CRITICAL | Confidence: 1.00
🔍 Detailed Analysis
The application will crash at startup with a ModuleNotFoundError: No module named 'pytesseract' because pytesseract is imported unconditionally in llm.py at line 13, but it is not declared as a permanent dependency in pyproject.toml. The poetry add pytesseract --no-ansi || true command in the Dockerfile is an unreliable installation method that does not guarantee the dependency is always present, especially in non-Docker environments.
💡 Suggested Fix
Add pytesseract as a formal dependency to pyproject.toml. Remove the unreliable poetry add pytesseract --no-ansi || true from the Dockerfile, allowing Poetry to manage dependencies correctly.
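As a complement to declaring the dependency, the import can also be deferred so that a missing package disables OCR instead of breaking module import. The following is a minimal sketch, not the PR's code: the helper names `ocr_backend_available` and `extract_text` are hypothetical, and only `pytesseract.image_to_string` is assumed from the real library.

```python
import importlib
import importlib.util
from typing import Any, Optional


def ocr_backend_available(module_name: str = "pytesseract") -> bool:
    """Return True if the OCR binding is importable, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def extract_text(image: Any, module_name: str = "pytesseract") -> Optional[str]:
    """Run OCR when the binding is installed; otherwise return None.

    The import happens inside the function, so a missing package degrades
    the feature rather than raising ModuleNotFoundError at startup.
    """
    if not ocr_backend_available(module_name):
        return None
    ocr = importlib.import_module(module_name)
    # image_to_string is pytesseract's text-extraction entry point.
    return ocr.image_to_string(image)
```

A caller would treat a `None` result as "OCR unavailable" and continue with the original prompt. This does not replace the suggested fix: the package still belongs in pyproject.toml if the feature is meant to work everywhere.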
Location: autogpt_platform/backend/backend/blocks/llm.py#L13
|
This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request. |
✅ Deploy Preview for auto-gpt-docs-dev ready!
|
Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly. |
…t-for-aitextgenerator
|
Thank you for your contribution to add OCR image processing capability! Here's some feedback to help get your PR ready for merging: Required Changes
Other Recommendations
Once these items are addressed, your PR will be ready for another review. |
|
Thank you for your contribution to add image processing with OCR functionality! I have a few items that need to be addressed before this can be merged:
The implementation looks solid overall, but we need the proper checklist and testing confirmation before proceeding with the merge. Please update your PR with these items, and we'll be happy to review it again. |
|
Thank you for your PR adding OCR capabilities to the platform! This is a valuable feature that will allow the LLM blocks to process image content. I have a few points that need to be addressed before this can be merged:
The implementation looks well done overall! Once these items are addressed, we can proceed with the review process. |
|
Thank you for your PR adding OCR functionality to process images in LLM blocks. Here are some items to address before this can be merged:
The implementation itself looks solid, adding proper OCR functionality with error handling. Once you address these issues, the PR can be reconsidered for merging. |
|
Thank you for implementing OCR functionality for image processing in LLM blocks. This is a valuable addition to our platform. Before we can merge this PR, there are two issues that need to be addressed:
The code implementation looks good, and I appreciate the thorough integration of Tesseract OCR with proper error handling and the Dockerfile updates to ensure the dependencies are available in all environments. |
|
Thank you for this PR adding OCR capability to the LLM blocks! This is a valuable feature that will enable processing of image content. I have a few issues that need to be addressed before this can be merged: Required Changes
Additional Feedback
Once you update the PR title and add the completed checklist, we can proceed with the review process. |
|
Thank you for your contribution to add OCR image processing capability to the platform. Here's what needs to be addressed before this PR can be merged: Required Changes
The concept of adding OCR capability is great, but we need to ensure it's implemented securely and according to project standards. |
This PR is being reviewed by Cursor Bugbot
)
response = await self.llm_call(object_input_data, credentials)
yield "response", response
print(structured_input)
Bug: Debug print statement left in production code
A debug print(structured_input) statement remains in the AITextGeneratorBlock.run method. This outputs the full input data, which may include sensitive prompt content, to stdout on every execution, cluttering logs and posing a security risk in production.
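One way to keep the diagnostic without writing full prompts to stdout is to route it through the `logging` module at DEBUG level and truncate long values first. This is a sketch under assumptions: `summarize_for_log` and `log_structured_input` are hypothetical helpers, and truncation only limits exposure, it is not real redaction.

```python
import logging

logger = logging.getLogger(__name__)


def summarize_for_log(payload: dict, max_chars: int = 80) -> dict:
    """Return a copy for debug logs in which long values are truncated."""
    summary = {}
    for key, value in payload.items():
        text = value if isinstance(value, str) else repr(value)
        summary[key] = text if len(text) <= max_chars else text[:max_chars] + "..."
    return summary


def log_structured_input(structured_input: dict) -> None:
    # DEBUG records are normally filtered out in production unless the
    # logger is explicitly configured to emit them.
    logger.debug("structured_input=%s", summarize_for_log(structured_input))
```

Simply deleting the `print` call is the smaller change; the logging variant is only worth it if the value is still useful when debugging.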
if input_data.prompt:
    input_data.prompt += f"\n\nExtracted text from image:\n{ocr_text}"
else:
    input_data.prompt = f"Extracted text from image:\n{ocr_text}"
Bug: OCR text corrupted by Jinja2 template processing
OCR-extracted text is appended to the prompt before fmt.format_string() is called. Since format_string uses Jinja2 templating, any Jinja-like patterns in the OCR text may be misinterpreted as template syntax, potentially causing TemplateError or corrupting the prompt content.
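The ordering problem can be illustrated by rendering the user's template first and appending the OCR text afterwards, so the extracted text never passes through the template engine. The sketch below is stdlib-only: `toy_render` is a hypothetical stand-in for the project's `format_string` (it only handles `{{ name }}` placeholders), and `build_prompt` is likewise an illustrative name.

```python
import re
from typing import Callable, Mapping, Optional

_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def toy_render(template: str, values: Mapping[str, str]) -> str:
    """Stand-in renderer: substitutes {{ name }}; unknown names raise KeyError."""
    return _PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


def build_prompt(
    template: str,
    values: Mapping[str, str],
    ocr_text: Optional[str],
    render: Callable[[str, Mapping[str, str]], str] = toy_render,
) -> str:
    """Render the template first, then append OCR text verbatim."""
    rendered = render(template, values) if template else ""
    if not ocr_text:
        return rendered
    block = f"Extracted text from image:\n{ocr_text}"
    return f"{rendered}\n\n{block}" if rendered else block


# OCR output that happens to look like template syntax survives unchanged.
print(build_prompt("Hello {{ name }}", {"name": "Ada"}, "Total: {{ amount }}"))
```

Had the OCR text been appended before rendering, the stand-in renderer would have raised `KeyError` on `amount`, which mirrors the template error the review comment warns about.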
    image = Image.open(io.BytesIO(image_data))
else:
    # Local file path
    image = Image.open(input_data.image)
Bug: HTTP request missing timeout can hang indefinitely
The new image processing in AIStructuredResponseGeneratorBlock uses raw requests.get and Image.open. This bypasses existing security utilities, creating potential SSRF and arbitrary file read vulnerabilities. The requests.get call also lacks a timeout and is synchronous, which can hang the application and block the event loop.
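The three concerns raised here (no timeout, a blocking call inside async code, and SSRF) can be sketched with the standard library alone. This is an illustration, not the project's existing security utilities, which the review implies should be used instead; `is_public_http_url` and `fetch_image_bytes` are hypothetical names, and the check does not cover redirects or DNS rebinding.

```python
import asyncio
import ipaddress
import socket
import urllib.request
from urllib.parse import urlsplit


def is_public_http_url(url: str) -> bool:
    """Accept only http(s) URLs whose host resolves to global addresses."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parts.hostname, None)
    except socket.gaierror:
        return False
    for info in infos:
        address = info[4][0].split("%", 1)[0]  # drop any IPv6 zone id
        if not ipaddress.ip_address(address).is_global:
            return False  # loopback, private, link-local, etc.
    return True


def _fetch(url: str, timeout_s: float) -> bytes:
    # The timeout bounds how long a stalled server can hold the call.
    # Caveat: urlopen follows redirects, which this sketch does not re-check.
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return response.read()


async def fetch_image_bytes(url: str, timeout_s: float = 10.0) -> bytes:
    """Validate the URL, then fetch in a worker thread so the event loop stays free."""
    if not is_public_http_url(url):
        raise ValueError(f"refusing to fetch non-public URL: {url!r}")
    return await asyncio.to_thread(_fetch, url, timeout_s)
```

With the `requests` library the equivalent timeout fix is passing `timeout=` to `requests.get`; local file paths would need a separate allow-list check that this sketch does not attempt.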
    LlmModel,
    LLMResponse,
    llm_call,
)
Bug: Unused imports from commented out test endpoint
Several imports from backend.blocks.llm were added (TEST_CREDENTIALS, TEST_CREDENTIALS_INPUT, AIBlockBase, AICredentials, AICredentialsField, LlmModel, LLMResponse, llm_call, and AITextGeneratorBlock) but are only referenced in the commented-out test endpoint at the end of the file. These appear to be development artifacts that weren't cleaned up before committing.
|
@cursor can you run the linter :) |
|
Hey hey! I'm going to have to close this soon if the CLA isn't signed, because it's not mergeable otherwise :) -- if you're worried about time to update it, don't worry too much, as I (or another team member) can handle a lot of that |
Changes 🏗️
- AIStructuredResponseGeneratorBlock in LLM.py.
- Tesseract OCR inside the relevant method to enable text extraction from images.
- Dockerfile to install Tesseract OCR for proper functionality of the new feature.

Reason for changes:
These changes allow the AIStructuredResponseGeneratorBlock to optionally process images using OCR, enabling structured responses from image content. The Dockerfile update ensures that the necessary OCR engine is available in all deployment environments.

Note
Adds optional image input with OCR preprocessing to LLM blocks and updates the Docker image to include Tesseract/pytesseract.
- AIStructuredResponseGeneratorBlock: image input (MediaFileType) and preprocess via Tesseract OCR (supports URL, base64 data URI, local path), appending extracted text to prompt or noting errors.
- AITextGeneratorBlock: image input; delegate calls to AIStructuredResponseGeneratorBlock and return plain text; minor schema/description cleanups; maintains prompt tracking.
- backend/Dockerfile: install tesseract-ocr, libtesseract-dev, tesseract-ocr-eng in build/runtime stages; add pytesseract via Poetry.
- v1.py: add imports and a commented example endpoint for testing AI text generation (no active route changes).
Written by Cursor Bugbot for commit 3a1f6c6.
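The summary mentions three accepted image sources: URL, base64 data URI, and local path. A rough sketch of how such a value might be classified and a data URI decoded is shown below; the function names are hypothetical and this is not taken from the PR's implementation.

```python
import base64
from urllib.parse import urlsplit


def classify_image_source(value: str) -> str:
    """Return 'data_uri', 'url', or 'path' for an image reference string."""
    if value.startswith("data:"):
        return "data_uri"
    if urlsplit(value).scheme in ("http", "https"):
        return "url"
    return "path"


def decode_data_uri(value: str) -> bytes:
    """Decode the payload of a base64 data URI such as data:image/png;base64,...."""
    header, sep, payload = value.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URI")
    # validate=True rejects characters outside the base64 alphabet.
    return base64.b64decode(payload, validate=True)
```

The decoded bytes could then be wrapped in `io.BytesIO` and passed to `Image.open`, as the diff fragment in the review does for downloaded image data.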