A complete prototype of a vision-based personal memory assistant that:
- Captures images periodically via webcam (simulating wearable glasses)
- Detects objects using YOLOv8 for scene understanding
- Stores memories with timestamps and metadata in SQLite
- Processes queries using the ChatGPT API for natural language understanding
- Provides a web interface using Streamlit for easy interaction
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│     Webcam      │    │     YOLO v8     │    │   ChatGPT API   │
│     Capture     │───▶│    Detection    │───▶│  NLP Processing │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         ▼                      ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    SQLite DB    │    │  Scene Analysis │    │  Query Analysis │
│     Storage     │    │    & Context    │    │    & Response   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         └──────────────────────┼──────────────────────┘
                                ▼
                   ┌─────────────────────────┐
                   │    Streamlit Web UI     │
                   │     Search & Browse     │
                   └─────────────────────────┘
MemoryAssistant/
├── app.py # Main Streamlit application
├── requirements.txt # Python dependencies
├── setup.py # Automated setup script
├── test_setup.py # System verification tests
├── QUICKSTART.md # Quick start guide
├── README.md # Comprehensive documentation
├── env_example.txt # Environment template
├── config/ # Configuration management
│ ├── __init__.py
│ └── settings.py # App settings & env vars
├── capture/ # Image capture system
│ ├── __init__.py
│ └── camera.py # Webcam integration
├── vision/ # Computer vision processing
│ ├── __init__.py
│ └── detector.py # YOLO object detection
├── memory/ # Memory storage & retrieval
│ ├── __init__.py
│ └── database.py # SQLite database operations
├── api/ # External API integration
│ ├── __init__.py
│ └── openai_client.py # ChatGPT API wrapper
└── data/ # Data storage (auto-created)
    ├── images/ # Captured images
    ├── database/ # SQLite database
    └── embeddings/ # Vector embeddings
- ✅ Webcam integration with OpenCV
- ✅ Periodic capture scheduling
- ✅ Active hours configuration (8 AM - 10 PM)
- ✅ Image quality optimization
- ✅ Error handling and logging
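
The capture scheduling and active-hours items above can be illustrated with a small, camera-free sketch. It only decides *when* the next capture should run; the 8 AM to 10 PM window comes from the list above, while the 60-second interval and the function names `is_active` and `next_capture_time` are assumptions made for illustration, not the project's actual code.

```python
from datetime import datetime, timedelta

ACTIVE_START_HOUR = 8   # 8 AM, from the feature list
ACTIVE_END_HOUR = 22    # 10 PM, from the feature list


def is_active(now: datetime) -> bool:
    """Return True when `now` falls inside the active capture window."""
    return ACTIVE_START_HOUR <= now.hour < ACTIVE_END_HOUR


def next_capture_time(now: datetime, interval_s: int = 60) -> datetime:
    """Return when the next capture should run (hypothetical helper).

    Inside the window the next capture is `interval_s` seconds away
    (60 is an assumed default). Outside it, capture resumes when the
    window next opens.
    """
    candidate = now + timedelta(seconds=interval_s)
    if is_active(candidate):
        return candidate
    opening = candidate.replace(
        hour=ACTIVE_START_HOUR, minute=0, second=0, microsecond=0
    )
    if candidate.hour >= ACTIVE_END_HOUR:
        # The window has already closed today, so wait until tomorrow morning.
        opening += timedelta(days=1)
    return opening
```

A capture loop could sleep until `next_capture_time(datetime.now())` and then grab a frame, which keeps the scheduling logic testable without a webcam attached.
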
- ✅ YOLOv8 object detection
- ✅ Scene description generation
- ✅ Context analysis (indoor/outdoor, activity type)
- ✅ Object counting and classification
- ✅ Confidence threshold filtering
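
As a rough illustration of the confidence filtering, object counting and scene description items above, the sketch below works on plain `(label, confidence)` tuples. That tuple shape, the 0.5 threshold and the function names are assumptions; the real detector would first convert YOLOv8 results into some comparable structure.

```python
from collections import Counter
from typing import List, Tuple

# Assumed shape of one detection: (class label, confidence in [0, 1]).
Detection = Tuple[str, float]


def filter_detections(
    detections: List[Detection], threshold: float = 0.5
) -> List[Detection]:
    """Keep only detections at or above the confidence threshold."""
    return [(label, conf) for label, conf in detections if conf >= threshold]


def count_objects(detections: List[Detection]) -> Counter:
    """Count how many times each class label appears."""
    return Counter(label for label, _ in detections)


def describe_scene(detections: List[Detection], threshold: float = 0.5) -> str:
    """Build a short text description from the filtered detections."""
    counts = count_objects(filter_detections(detections, threshold))
    if not counts:
        return "No objects detected"
    parts = [f"{n} {label}" for label, n in sorted(counts.items())]
    return "Scene contains: " + ", ".join(parts)
```

A description built this way is plain text, so it can be stored alongside the image and searched later without re-running detection.
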
- ✅ SQLite database with structured schema
- ✅ JSON storage for complex metadata
- ✅ Timestamp indexing for fast queries
- ✅ Query logging for analytics
- ✅ Statistics and reporting
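
The storage items above (structured schema, JSON metadata, timestamp index, query logging) might look roughly like the following sketch using Python's built-in `sqlite3` module. The table names, column names and helper functions are hypothetical and are not taken from the project's `database.py`.

```python
import json
import sqlite3
from datetime import datetime


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (assumed) tables and the timestamp index."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS memories (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               timestamp TEXT NOT NULL,            -- ISO 8601 string
               image_path TEXT NOT NULL,
               description TEXT,
               metadata TEXT NOT NULL DEFAULT '{}' -- JSON blob
           )"""
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_memories_timestamp ON memories (timestamp)"
    )
    conn.execute(
        """CREATE TABLE IF NOT EXISTS query_log (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               timestamp TEXT NOT NULL,
               query TEXT NOT NULL,
               result_count INTEGER NOT NULL
           )"""
    )
    conn.commit()


def add_memory(conn, timestamp: datetime, image_path: str,
               description: str, metadata: dict) -> int:
    """Insert one memory and return its row id."""
    cur = conn.execute(
        "INSERT INTO memories (timestamp, image_path, description, metadata) "
        "VALUES (?, ?, ?, ?)",
        (timestamp.isoformat(), image_path, description, json.dumps(metadata)),
    )
    conn.commit()
    return cur.lastrowid


def memories_between(conn, start: datetime, end: datetime) -> list:
    """Return memories with start <= timestamp < end, oldest first."""
    rows = conn.execute(
        "SELECT timestamp, image_path, description, metadata FROM memories "
        "WHERE timestamp >= ? AND timestamp < ? ORDER BY timestamp",
        (start.isoformat(), end.isoformat()),
    ).fetchall()
    return [
        {"timestamp": ts, "image_path": path, "description": desc,
         "metadata": json.loads(meta)}
        for ts, path, desc, meta in rows
    ]


def log_query(conn, query: str, result_count: int) -> None:
    """Record a user query for later analytics."""
    conn.execute(
        "INSERT INTO query_log (timestamp, query, result_count) VALUES (?, ?, ?)",
        (datetime.now().isoformat(), query, result_count),
    )
    conn.commit()
```

Storing timestamps as ISO 8601 text keeps them sortable as strings, so the index on `timestamp` can serve range queries such as "memories from today".
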
- ✅ ChatGPT API integration
- ✅ Query intent analysis
- ✅ Entity extraction (objects, time, location)
- ✅ Enhanced scene descriptions
- ✅ AI-powered search responses
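
One way to picture the intent analysis and entity extraction items above is a function that asks the model for a JSON reply and validates it defensively. In this sketch the model call is injected as a plain callable (`complete`), so no specific OpenAI client API is assumed; the prompt wording, the JSON keys and the function names are all illustrative assumptions.

```python
import json
from typing import Callable, Dict

# Assumed prompt; the project's real prompt is not shown in this document.
PROMPT_TEMPLATE = (
    "Extract the search intent from the user's question. "
    'Reply with JSON only, using the keys "intent", "objects", '
    '"time" and "location".\n'
    "Question: {query}"
)


def _default_analysis() -> Dict[str, object]:
    """Fallback values used when the model omits a key or replies badly."""
    return {"intent": "search", "objects": [], "time": None, "location": None}


def analyze_query(query: str, complete: Callable[[str], str]) -> Dict[str, object]:
    """Turn a natural language question into structured search fields.

    `complete` is any function that sends a prompt to a language model
    and returns its text reply (for example, a thin wrapper around the
    ChatGPT API).
    """
    raw = complete(PROMPT_TEMPLATE.format(query=query))
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        parsed = {}
    if not isinstance(parsed, dict):
        parsed = {}
    result = _default_analysis()
    # Copy over only the keys we expect, ignoring anything extra.
    result.update({key: parsed[key] for key in result if key in parsed})
    return result
```

Injecting the model call also makes the parsing logic easy to test offline with a stub that returns a canned reply.
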
- ✅ Streamlit-based UI
- ✅ Real-time image capture
- ✅ Natural language search
- ✅ Memory browsing and statistics
- ✅ Responsive design with tabs
| Component | Technology | Purpose |
|---|---|---|
| Frontend | Streamlit | Web interface |
| Backend | Python 3.8+ | Application logic |
| Vision | YOLOv8 + OpenCV | Object detection |
| NLP | ChatGPT API | Query understanding |
| Database | SQLite | Memory storage |
| Image Processing | OpenCV | Camera operations |
| Configuration | python-dotenv | Environment management |
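
To make the configuration row concrete, here is a minimal sketch of reading settings from environment variables with the standard library. In the real project python-dotenv would presumably populate the environment from `.env` first. The variable names other than the API key, the default values and the database file name are assumptions for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Application settings (field names are hypothetical)."""
    openai_api_key: str
    capture_interval_s: int
    confidence_threshold: float
    database_path: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment-like mapping.

    Missing variables fall back to assumed defaults so the app can
    still start in a demo setup.
    """
    return Settings(
        openai_api_key=env.get("OPENAI_API_KEY", ""),
        capture_interval_s=int(env.get("CAPTURE_INTERVAL_SECONDS", "60")),
        confidence_threshold=float(env.get("CONFIDENCE_THRESHOLD", "0.5")),
        # Directory comes from the project tree; the file name is assumed.
        database_path=env.get("DATABASE_PATH", "data/database/memories.db"),
    )
```

Passing the mapping as a parameter means tests can supply a plain dictionary instead of modifying the process environment.
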
- "When did I last see my keys?"
- "Show me when I was working at my desk"
- "Find memories from today"
- "When was I in the kitchen?"
- "Show me outdoor scenes"
- ✅ Natural language query processing
- ✅ Object-based memory search
- ✅ Time-based filtering
- ✅ AI-generated responses
- ✅ Image gallery with metadata
- ✅ Real-time statistics
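
The object-based and time-based search features above can be sketched over in-memory records, independent of the database layer. Each memory is assumed to be a dictionary with a `timestamp` (a `datetime`) and an `objects` list; both the record shape and the helper names are hypothetical.

```python
from datetime import date, datetime
from typing import Dict, List, Optional

# Assumed record shape: {"timestamp": datetime, "objects": [str, ...]}
Memory = Dict[str, object]


def last_seen(memories: List[Memory], obj: str) -> Optional[datetime]:
    """Answer "When did I last see my <obj>?" from stored memories."""
    times = [m["timestamp"] for m in memories if obj in m["objects"]]
    return max(times) if times else None


def on_day(memories: List[Memory], day: date) -> List[Memory]:
    """Answer "Find memories from <day>" by matching the calendar date."""
    return [m for m in memories if m["timestamp"].date() == day]
```

For example, a query like "When did I last see my keys?" would reduce to `last_seen(memories, "keys")` once the object name has been extracted from the question.
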
# 1. Run automated setup
python setup.py
# 2. Add OpenAI API key to .env file
# 3. Start the application
streamlit run app.py
- ✅ Complete end-to-end pipeline
- ✅ Real-time object detection
- ✅ Natural language query processing
- ✅ Scalable database design
- ✅ User-friendly web interface
- ✅ Working prototype
- ✅ Interactive web interface
- ✅ Natural language queries
- ✅ Memory recall functionality
- ✅ Comprehensive documentation
- Hybrid Approach Works: Combining local vision processing with cloud NLP provides good performance and cost balance
- Modular Design: Clean separation of concerns makes the system maintainable and extensible
- User Experience: Natural language queries make the system intuitive and accessible
- Performance Optimization: Running detection locally avoids a network round trip for every image, so cloud calls are needed only for NLP
- Rapid Prototyping: Streamlit enables quick iteration and demo development
We've successfully built a working prototype of a vision-based personal memory assistant that demonstrates:
- Real-time image capture and processing
- Natural language memory queries
- AI-powered scene understanding
- Interactive web interface
- Scalable data storage
The system is demo-ready and provides a solid foundation for future enhancements. The hybrid approach, with YOLO running locally and the ChatGPT API handling language processing in the cloud, performed well in the prototype while keeping costs reasonable.
Ready for demonstration! 🚀