[Enhancement]: Voice-to-Text Transcription Module — Implementing FireForm's Core "Talk to Fill" Capability

---
name: 🚀 Feature Request
about: Suggest an idea or a new capability for FireForm.
title: "[FEAT]: <Short Description>"
labels: enhancement
assignees: ''

---

## 📝 Description
FireForm's foundational design describes voice memos as the **primary input method** for first responders, but no voice-to-text transcription code exists in the codebase. This enhancement proposes adding a Whisper-based transcription module to complete this core capability.

Evidence that voice input is part of FireForm's design:
| Source | Reference |
|--------|-----------|
| **README.md** | *"A firefighter records a single **voice memo** or fills out one 'master' text field"* — voice listed first |
| **Hackathon submission** | *"Our goal is to use AI to allow firefighters to complete paperwork simply by **talking out loud**"* |
| **Hackathon project tags** | `llm`, `Paperwork`, **`Voice-to-Text`**, `FireForm` |
| **GSoC listing** | *"report incidents using natural language, either **voice** or text"* |
| **backend.py line 37** | LLM prompt says *"information extracted from **transcribed voice recordings**"* |
| **backend.py line 11** | Parameter is named `transcript_text` — designed for voice transcription |
Zero audio/speech/transcription code exists in the repo or the original prototype (rte2025).

## 💡 Rationale
First responders work **in the field** — at fire scenes, in trucks, at emergency sites. Typing isn't practical when wearing firefighting gloves, driving an emergency vehicle, or exhausted after a 12-hour shift.

Voice input isn't a "nice to have" — it's the **only realistic input method** for the target users. The entire existing pipeline (`transcript_text` → LLM prompt → JSON → PDF) already assumes voice transcription exists upstream, but the actual transcription module was never built.

## 🛠️ Proposed Solution
Add a `VoiceManager` module using **OpenAI Whisper** (open-source, MIT license, runs entirely locally — no data ever leaves the machine, aligning with FireForm's privacy-first design):

from voice_manager import VoiceManager

vm = VoiceManager()
transcript = vm.transcribe("incident_report.wav")


## ✅ Acceptance Criteria

- [x] VoiceManager transcribes audio files to text using Whisper locally
- [x] Multiple audio formats supported (wav, mp3, m4a, ogg, flac, webm)
- [x] Model size configurable via WHISPER_MODEL environment variable 
- [x] Runs entirely offline (no external API calls)
- [x] Feature works in Docker container.
- [x] Documentation updated in `docs/`.
- [x] JSON output validates against the schema.

## 📌 Additional Context

- OpenAI Whisper: https://github.com/openai/whisper (MIT license, runs on CPU/GPU)
- The base model (~140MB) offers a good accuracy/speed tradeoff for field use
- Whisper supports 99+ languages — useful for FireForm's goal as a Digital Public Good for global adoption
- This aligns with the GSoC project description: "report incidents using natural language, either voice or text"


Source	Reference
README.md	"A firefighter records a single voice memo* or fills out one 'master' text field"* — voice listed first
Hackathon submission	"Our goal is to use AI to allow firefighters to complete paperwork simply by talking out loud"
Hackathon project tags	`llm`, `Paperwork`, `Voice-to-Text`, `FireForm`
GSoC listing	"report incidents using natural language, either voice* or text"*
backend.py line 37	LLM prompt says "information extracted from transcribed voice recordings"
backend.py line 11	Parameter is named `transcript_text` — designed for voice transcription
Zero audio/speech/transcription code exists in the repo or the original prototype (rte2025).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Enhancement]: Voice-to-Text Transcription Module — Implementing FireForm's Core "Talk to Fill" Capability #44

📝 Description

💡 Rationale

🛠️ Proposed Solution

✅ Acceptance Criteria

📌 Additional Context

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[Enhancement]: Voice-to-Text Transcription Module — Implementing FireForm's Core "Talk to Fill" Capability #44

Description

📝 Description

💡 Rationale

🛠️ Proposed Solution

✅ Acceptance Criteria

📌 Additional Context

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions