The Vision Project is a computer vision application that assists visually impaired people through object detection, optical character recognition (OCR), and face recognition. The idea came from secondary-stage students Janaabdelfatah and Romaysa, who entered the project in the Genius Olympiad.
Check out this video demonstration of the Vision Project in action:
- Technologies Used (laptop version):
  - Flask framework for the backend.
  - HTML, CSS, and JavaScript for the frontend.
  - Python for core functionality.
- Functionalities:
  - Object Detection: Uses YOLOv5 model from Hugging Face.
  - Face Recognition: Uses Haar Cascade Classifier.
  - Optical Character Recognition (OCR): Uses Tesseract OCR.
  - Text-to-Speech (TTS): Converts detected text to speech using Google TTS.
- Technologies Used (Raspberry Pi version):
  - Python for core functionality.
  - Raspberry Pi 4 with 8 GB RAM and Raspberry Pi Camera.
- Functionalities:
  - Object Detection: Uses YOLOv5 model.
  - Face Detection: Due to computational power limitations, uses Haar Cascade Classifier.
  - Optical Character Recognition (OCR): Uses Tesseract OCR.
- Description:
  - Raspberry Pi 4 Model B: Wi-Fi, 2x micro HDMI, USB-C, USB 3.0, 8 GB of RAM, 1.5 GHz.
  - Offers improvements in processor speed, multimedia performance, memory, and connectivity over earlier Raspberry Pi models.
- Main Features:
  - 64-bit quad-core processor.
  - Dual display support with resolutions up to 4K.
  - 8GB LPDDR4-2400 SDRAM.
  - Dual-band 2.4/5.0 GHz wireless LAN, Bluetooth 5.0, Gigabit Ethernet.
  - USB 3.0 and PoE capabilities (via a separate PoE HAT add-on).
- Description (Raspberry Pi Camera):
  - Plugs directly into the CSI connector on the Raspberry Pi.
  - Captures 5 MP still images or 1080p HD video at 30 fps.
- Clone the repository:
  git clone https://github.com/Geo-y20/Vision-Project.git
  cd Vision-Project
- Install the required dependencies:
  pip install -r vision.txt
- Run the Flask application:
  flask run
- Clone the repository:
  git clone https://github.com/Geo-y20/Vision-Project.git
  cd Vision-Project
- Ensure the Raspberry Pi environment is correctly set up with all necessary packages installed.
- Run the scripts:
  - camera.py: Check for camera functionality.
    python camera.py
  - facedetection.py: Perform face detection using the Haar Cascade Classifier.
    python facedetection.py
  - obj.py: Perform object detection using YOLOv5.
    python obj.py
  - ocr.py: Perform OCR using Tesseract.
    python ocr.py
- camera.py:
  - Checks if the Raspberry Pi camera is correctly set up and functional.
  - Ensures the camera can capture images and video.
- facedetection.py:
  - Uses the Haar Cascade Classifier to detect faces in real-time.
  - Captures video from the camera and applies the face detection algorithm.
- obj.py:
  - Uses YOLOv5 for real-time object detection.
  - Captures video from the camera, processes it through the YOLOv5 model, and identifies objects.
- ocr.py:
  - Uses Tesseract to perform OCR on images captured by the camera.
  - Converts the recognized text to speech using Google TTS.
YOLOv5 (You Only Look Once) is used for real-time object detection. For more details on YOLOv5, visit the Roboflow blog and the COCO dataset.
- Precision: TP / (TP + FP)
- TP: True Positives
- FP: False Positives
- Recall: TP / (TP + FN)
- TP: True Positives
- FN: False Negatives
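The two definitions above can be written as a short sketch. The function names and the counts are hypothetical and serve only to illustrate the arithmetic; they are not taken from the project's evaluation code.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are correct: TP / (TP + FP)."""
    predicted = tp + fp
    return tp / predicted if predicted else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that were found: TP / (TP + FN)."""
    actual = tp + fn
    return tp / actual if actual else 0.0


# Hypothetical counts, chosen only to illustrate the arithmetic.
tp, fp, fn = 90, 10, 30
print(f"precision = {precision(tp, fp):.2f}")  # 90 / (90 + 10)
print(f"recall    = {recall(tp, fn):.2f}")     # 90 / (90 + 30)
```

Both functions return 0.0 when the denominator is zero, which is one common convention; other tools report such cases as undefined.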
| Object | Precision (%) | Recall (%) | Processing Time (ms) |
|---|---|---|---|
| Person | 98 | 97 | 20 |
| Car | 96 | 95 | 22 |
| Bicycle | 95 | 93 | 25 |
| Dog | 94 | 92 | 23 |
| Cat | 93 | 91 | 24 |
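As a rough illustration of the object detection step, the sketch below runs a pretrained YOLOv5 model on a single image and turns the detected class names into a short summary that could be passed to text-to-speech. It assumes the weights are loaded through `torch.hub` from the `ultralytics/yolov5` repository, which may differ from how `obj.py` loads its model. The helper names `summarize` and `detect_labels` are hypothetical.

```python
from collections import Counter


def summarize(labels):
    """Turn a list of class names into a short summary string."""
    counts = Counter(labels)
    if not counts:
        return "no objects detected"
    return ", ".join(f"{n} {name}" for name, n in sorted(counts.items()))


def detect_labels(image_path, min_confidence=0.5):
    """Run a pretrained YOLOv5 model on one image and return class names.

    Assumption: weights come from torch.hub (ultralytics/yolov5); the
    project itself may load them differently.
    """
    import torch  # imported lazily so summarize() works without PyTorch

    model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
    model.conf = min_confidence  # discard detections below this confidence
    results = model(image_path)
    detections = results.pandas().xyxy[0]  # one row per detected box
    return detections["name"].tolist()
```

For example, `summarize(detect_labels("street.jpg"))` would describe the objects found in a hypothetical image file `street.jpg`. The first call downloads the model weights, so it needs network access.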
The Tesseract library is used for optical character recognition. For more information, refer to the Tesseract guide.
| Document Type | Precision (%) | Recall (%) | Processing Time (ms) |
|---|---|---|---|
| Invoice | 95 | 94 | 150 |
| Letter | 93 | 92 | 140 |
| Receipt | 94 | 91 | 145 |
| Book Page | 92 | 90 | 155 |
| ID Card | 90 | 88 | 160 |
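The OCR-to-speech path described for `ocr.py` might look roughly like the following. This is a sketch under stated assumptions: it uses the `pytesseract` wrapper (which needs the Tesseract binary installed) and the `gTTS` package for Google TTS, and the function names and the output file name are hypothetical.

```python
def normalize_text(raw: str) -> str:
    """Collapse the line breaks and repeated spaces common in OCR output."""
    return " ".join(raw.split())


def read_aloud(image_path: str, audio_path: str = "ocr_output.mp3") -> str:
    """Recognize text in an image and save it as spoken audio.

    Assumptions: pytesseract, Pillow, and gTTS are installed; the project
    may use different wrappers or settings.
    """
    # Imported lazily so normalize_text() works without these packages.
    import pytesseract
    from PIL import Image
    from gtts import gTTS

    text = normalize_text(pytesseract.image_to_string(Image.open(image_path)))
    if text:  # gTTS raises an error on empty input, so skip it
        gTTS(text=text, lang="en").save(audio_path)
    return text
```

Normalizing whitespace before synthesis avoids unnatural pauses where the OCR engine inserted line breaks mid-sentence.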
The Haar Cascade Classifier is used for face detection and recognition. The classifier is trained on positive (face) and negative (non-face) image samples and then applied to locate faces in new images.
| Person | Precision (%) | Recall (%) | Processing Time (ms) |
|---|---|---|---|
| Jana | 98 | 97 | 100 |
| Romaysa | 97 | 96 | 105 |
| Mariam | 96 | 95 | 110 |
| Mohamed | 95 | 94 | 115 |
| Youssef | 94 | 93 | 120 |
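The detection step can be sketched with OpenCV's bundled frontal-face cascade. This covers only locating faces in an image; how a detected face is matched to a name is not described here, so it is left out. The `scaleFactor` and `minNeighbors` values are common defaults rather than the project's settings, and the helper names are hypothetical.

```python
def largest_box(boxes):
    """Return the (x, y, w, h) box with the largest area, or None if empty."""
    boxes = list(boxes)
    if not boxes:
        return None
    return max(boxes, key=lambda b: b[2] * b[3])


def detect_faces(image_path):
    """Return face bounding boxes found by OpenCV's frontal-face Haar cascade."""
    import cv2  # imported lazily so largest_box() works without OpenCV

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # cascades run on grayscale
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [tuple(int(v) for v in box) for box in faces]
```

Picking the largest box with `largest_box(detect_faces(path))` is one simple way to focus on the person nearest the camera.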
The Vision Project follows a systematic approach aimed at performance and reliability:
- Requirements Analysis:
  - Understanding the needs of visually impaired users.
  - Defining functional and non-functional requirements.
- System Design:
  - Creating a blueprint of the overall architecture.
  - Using Flask framework for backend and HTML, CSS, JavaScript for frontend in the laptop version.
  - Using Python for core functionality in the Raspberry Pi version.
- Model Selection and Integration:
  - Object Detection: YOLOv5
  - OCR: Tesseract
  - Face Recognition: Haar Cascade Classifier
- Implementation:
  - Developing the web application for the laptop version.
  - Integrating the models for object detection, OCR, and face recognition.
- Testing:
  - Unit Testing: Testing individual components.
  - Integration Testing: Ensuring all components work together.
  - Performance Testing: Measuring response times and accuracy.
  - User Testing: Gathering feedback from visually impaired users.
- Evaluation:
  - Analyzing performance metrics.
  - Visualizing results using graphs and charts.
- Confusion Matrix: For each task (Object Detection, OCR, Face Recognition), a confusion matrix shows the performance in terms of true positives, false positives, false negatives, and true negatives.
- Precision-Recall Curve: Shows the trade-off between precision and recall for different threshold settings.
- Receiver Operating Characteristic (ROC) Curve: Plots the true positive rate against the false positive rate for binary classification tasks.
- F1 Score: Combines precision and recall into a single metric using the harmonic mean.
- Accuracy Over Different Conditions: Compares accuracy under various conditions such as different lighting or image quality levels.
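The F1 score mentioned in the list above is the harmonic mean of precision and recall, F1 = 2PR / (P + R). A minimal sketch, applied to two rows of the object detection table (the function name is hypothetical):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, in the same units as the inputs."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Precision and recall (%) taken from the object detection results table.
for name, p, r in [("Person", 98, 97), ("Cat", 93, 91)]:
    print(f"{name}: F1 = {f1_score(p, r):.1f}%")
```

Because the harmonic mean is pulled toward the smaller value, F1 drops quickly when either precision or recall is low, which makes it a stricter summary than the arithmetic mean.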
This project was collaboratively developed by the following contributors:
- George Youhana - georgeyouhana2@gmail.com
- Mostafa Magdy - Mustafa.10770@stemredsea.moe.edu.eg
- Abdallah Alkhouly - a.alkholy53@student.aast.edu
- Mohamed Hany Sallam - m.h.sallam1@student.aast.edu
Janaabdelfatah and Romaysa, two girls in the secondary stage, competed in the Genius Olympiad with this project.
You can access the project files here: raspberry pi.rar
For any inquiries or further information, please contact the contributors via their provided email addresses.