<h3>
  <table>
    <tr>
      <td><b>5-minute read</b></td>
      <td style={{ paddingLeft: '40px' }}><b>Level: Advanced</b></td>
      <td style={{ paddingLeft: '40px' }}><b>Last Updated: October 2025</b></td>
    </tr>
</table>
</h3>

# GPT Integration for Image Recognition

This feature recognizes and interprets the contents of images shared by end users. It is useful for a variety of use cases, such as:

- Grading worksheets or assignments
- Reading and digitizing handwritten notes
- Checking skin conditions, wounds, or other health-related visuals
- Reviewing the condition of water, crops, or surroundings

With GPT's image-processing abilities, Glific makes it easy to automate tasks that involve understanding images.

**To get the best results:**

- Use clear and well-lit images
- Avoid blurry or low-quality photos
- Make sure the main content is visible and not blocked

---

## How to Use in a Flow

#### Step 1: Collect Image Input from the End User

- Add a `Send Message` node instructing users to send an image as their response.
- In the `Wait for response` node, select `Has image` as the message response type, and give it a `Result Name`. In the screenshot below, `image` is used as the result name.

<img width="500" height="383" alt="Screenshot 2025-09-29 at 1 29 38 AM" src="https://github.com/user-attachments/assets/4d685fb3-b126-4490-bb92-23fd9f30ed0b" />

#### Step 2: Add a Webhook Node to Process the Image

- Select `function` from the webhook type dropdown.
- In the function field, enter `parse_via_gpt_vision`; this function name is pre-defined.
- Give the webhook a result name; you can use any name. In the screenshot example, it is named `gptvision`.

<img width="493" height="408" alt="Screenshot 2025-09-29 at 1 46 28 AM" src="https://github.com/user-attachments/assets/1162f959-f923-4f5d-9214-3337456e8bda" />

<img width="504" height="553" alt="Screenshot 2025-09-29 at 1 46 58 AM" src="https://github.com/user-attachments/assets/b398e16c-5a6b-48ac-8c16-680c30728690" />

#### Step 3: Add Parameters in the Function Body

- Click on `Function Body` and pass the parameters as shown in the screenshot below.

<img width="493" height="343" alt="Screenshot 2025-09-29 at 1 51 22 AM" src="https://github.com/user-attachments/assets/cfafbb40-04da-4321-9cf8-d457ff08835e" />

- `url`: pass the flow variable that holds the image response from the user (in the given example, `image` is the result name).
- `prompt`: pass the prompt, i.e. the instructions you want the AI model to follow when processing the image.
- `model`: pass the GPT model to use, e.g. `gpt-4o`.
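
Putting the three parameters together, the function body is a small JSON object. The values below are illustrative placeholders, assuming the flow variable for a result named `image` is referenced as `@results.image`; substitute your own result name, prompt, and model:

```json
{
  "url": "@results.image",
  "prompt": "Read the handwritten notes in this image and return them as plain text.",
  "model": "gpt-4o"
}
```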

#### Step 4: Display the Text Response

- Create a `Send Message` node.
- Use `@results.webhook_result_name.response` to show the text response (in the given example, `gptvision` is the webhook result name, so the variable is `@results.gptvision.response`).

<img width="494" height="378" alt="Screenshot 2025-09-29 at 1 54 03 AM" src="https://github.com/user-attachments/assets/fa53b6af-a5f2-41cc-b058-91a6991ed611" />
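
For example, the Send Message node body might read as follows (using the result names from this walkthrough; the surrounding wording is just a suggestion):

```text
Here is what we found in your image:

@results.gptvision.response
```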

---

## Sample Flow

Try this [Sample Flow](https://drive.google.com/file/d/1Czi9Bh7YIWGeZwpveXPu-xTGMml-h_R9/view?usp=sharing) to test the GPT Vision integration.
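
Behind the scenes, `parse_via_gpt_vision` pairs your prompt with the image URL and sends both to the chosen GPT model. The sketch below is an illustrative approximation, not Glific's actual implementation: the helper name `build_vision_payload` and the example values are hypothetical, but the payload shape follows the standard Chat Completions format for vision-capable models:

```python
def build_vision_payload(url: str, prompt: str, model: str) -> dict:
    """Assemble a Chat Completions request body that pairs a text prompt
    with an image URL, the shape vision-capable GPT models expect.

    Note: this is a conceptual sketch, not Glific's internal code.
    """
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": url}},
                ],
            }
        ],
    }


# Hypothetical example values mirroring the walkthrough above
payload = build_vision_payload(
    url="https://example.com/worksheet.jpg",
    prompt="Grade this worksheet and list any mistakes.",
    model="gpt-4o",
)
```

The model's text reply to this request is what the flow then exposes as `@results.gptvision.response`.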