[Enhancement]: Voice-to-Text Transcription Module: Implementing FireForm's Core "Talk to Fill" Capability #44

@pigeio

📝 Description

FireForm's foundational design describes voice memos as the primary input method for first responders, but no voice-to-text transcription code exists in the codebase. This enhancement proposes adding a Whisper-based transcription module to complete this core capability.

Evidence that voice input is part of FireForm's design:

| Source | Reference |
| --- | --- |
| README.md | "A firefighter records a single voice memo or fills out one 'master' text field" (voice listed first) |
| Hackathon submission | "Our goal is to use AI to allow firefighters to complete paperwork simply by talking out loud" |
| Hackathon project tags | llm, Paperwork, Voice-to-Text, FireForm |
| GSoC listing | "report incidents using natural language, either voice or text" |
| backend.py line 37 | LLM prompt says "information extracted from transcribed voice recordings" |
| backend.py line 11 | Parameter is named transcript_text, designed for voice transcription |

Zero audio/speech/transcription code exists in the repo or the original prototype (rte2025).

💡 Rationale

First responders work in the field: at fire scenes, in trucks, at emergency sites. Typing isn't practical when wearing firefighting gloves, driving an emergency vehicle, or exhausted after a 12-hour shift.

Voice input isn't a "nice to have": it's the only realistic input method for the target users. The entire existing pipeline (transcript_text → LLM prompt → JSON → PDF) already assumes voice transcription exists upstream, but the actual transcription module was never built.

🛠️ Proposed Solution

Add a VoiceManager module using OpenAI Whisper (open-source, MIT license, runs entirely locally, so no data ever leaves the machine, which aligns with FireForm's privacy-first design):

from voice_manager import VoiceManager

vm = VoiceManager()
transcript = vm.transcribe("incident_report.wav")
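A minimal sketch of what such a module might look like, to make the proposal concrete. Only the `VoiceManager` name and the `transcribe()` call come from the usage above; the constructor arguments, the lazy loading, and the `model_loader` injection point are assumptions for illustration, not existing FireForm code. The Whisper calls used (`whisper.load_model` and `model.transcribe`, which returns a dict with a `"text"` key) come from the openai-whisper package.

```python
from pathlib import Path
from typing import Any, Callable, Optional


class VoiceManager:
    """Hypothetical sketch of the proposed module; not code from the FireForm repo."""

    def __init__(
        self,
        model_name: str = "base",
        model_loader: Optional[Callable[[str], Any]] = None,
    ) -> None:
        self.model_name = model_name
        # model_loader is an assumed injection point so the class can be
        # exercised in tests without Whisper or model weights installed.
        self._model_loader = model_loader or self._load_whisper
        self._model: Any = None  # loaded lazily on the first transcribe() call

    @staticmethod
    def _load_whisper(name: str) -> Any:
        # Imported here so that importing this module does not require Whisper.
        import whisper  # provided by the openai-whisper package

        return whisper.load_model(name)

    def transcribe(self, audio_path: str) -> str:
        """Return the transcript of a local audio file as plain text."""
        path = Path(audio_path)
        if not path.is_file():
            raise FileNotFoundError(f"Audio file not found: {audio_path}")
        if self._model is None:
            self._model = self._model_loader(self.model_name)
        result = self._model.transcribe(str(path))
        return result["text"].strip()
```

Loading the model lazily keeps start-up cheap when no audio is submitted. The returned string could then be passed on as the existing transcript_text parameter.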

✅ Acceptance Criteria

  • VoiceManager transcribes audio files to text using Whisper locally
  • Multiple audio formats supported (wav, mp3, m4a, ogg, flac, webm)
  • Model size configurable via WHISPER_MODEL environment variable
  • Runs entirely offline (no external API calls)
  • Feature works in the Docker container
  • Documentation updated in docs/
  • JSON output validates against the schema
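Two of the criteria above, the supported formats and the WHISPER_MODEL variable, could be covered by small helpers along these lines. This is a sketch under assumptions: the function names, the `DEFAULT_MODEL` fallback to `base`, and the extension-based check are illustrative and not part of the existing codebase.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Extensions taken from the acceptance criteria above.
SUPPORTED_FORMATS = {".wav", ".mp3", ".m4a", ".ogg", ".flac", ".webm"}

# Assumed default, following the issue's suggestion of the base model.
DEFAULT_MODEL = "base"


def resolve_model_name(env: Optional[Mapping[str, str]] = None) -> str:
    """Read the model size from WHISPER_MODEL, falling back to the default."""
    env = os.environ if env is None else env
    # An unset or blank variable falls back to DEFAULT_MODEL.
    return env.get("WHISPER_MODEL", "").strip() or DEFAULT_MODEL


def is_supported_audio(path: str) -> bool:
    """Check the file extension (case-insensitive) against the supported set."""
    return Path(path).suffix.lower() in SUPPORTED_FORMATS
```

The extension check only rejects obviously wrong files early. Whisper decodes audio through ffmpeg, so actual format support also depends on ffmpeg being available in the Docker image.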

📌 Additional Context

  • OpenAI Whisper: https://github.com/openai/whisper (MIT license, runs on CPU/GPU)
  • The base model (~140MB) offers a good accuracy/speed tradeoff for field use
  • Whisper supports 99+ languages, which is useful for FireForm's goal as a Digital Public Good for global adoption
  • This aligns with the GSoC project description: "report incidents using natural language, either voice or text"
