[FE] Converts audio to markdown #147
base: main
Changes from 15 commits
a4dd2f7
2c5051c
24ad0d2
fc7523b
1c4f07d
b282176
c32ad88
864eeaf
cd2dfa6
804cc5b
260ec00
5721920
5d6df67
7a70a0d
59a502e
ce78f7e
82b80fa
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -114,9 +114,9 @@ | |
| "name": "stdout", | ||
| "output_type": "stream", | ||
| "text": [ | ||
| "\r", | ||
| "0% [Working]\r", | ||
| " \r", | ||
| "\r\n", | ||
| "0% [Working]\r\n", | ||
| " \r\n", | ||
| "Get:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,626 B]\n", | ||
| "Get:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64 InRelease [1,581 B]\n", | ||
| "Get:3 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]\n", | ||
|
|
@@ -1968,6 +1968,64 @@ | |
| "source": [ | ||
| "display(Markdown(result_md))" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "# Audio to Markdown (Support with Gemini)" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": 1, | ||
| "metadata": {}, | ||
| "outputs": [ | ||
| { | ||
| "name": "stderr", | ||
| "output_type": "stream", | ||
| "text": [ | ||
| "<>:3: SyntaxWarning: invalid escape sequence '\\h'\n", | ||
| "<>:3: SyntaxWarning: invalid escape sequence '\\h'\n", | ||
| "C:\\Users\\vaish\\AppData\\Local\\Temp\\ipykernel_16776\\2861124903.py:3: SyntaxWarning: invalid escape sequence '\\h'\n", | ||
| " document_path =\"inputs\\harvard.wav\"\n", | ||
| "e:\\Lexoid\\Lexoid\\.venv\\Lib\\site-packages\\tqdm\\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", | ||
| " from .autonotebook import tqdm as notebook_tqdm\n", | ||
| "\u001b[32m2025-11-17 16:28:41.463\u001b[0m | \u001b[34m\u001b[1mDEBUG \u001b[0m | \u001b[36mlexoid.core.utils\u001b[0m:\u001b[36mis_supported_file_type\u001b[0m:\u001b[36m92\u001b[0m - \u001b[34m\u001b[1mFile type: audio/wav\u001b[0m\n", | ||
| "\u001b[32m2025-11-17 16:28:41.463\u001b[0m | \u001b[34m\u001b[1mDEBUG \u001b[0m | \u001b[36mlexoid.core.utils\u001b[0m:\u001b[36mrouter\u001b[0m:\u001b[36m559\u001b[0m - \u001b[34m\u001b[1mUsing LLM_PARSE because the type of file is audio.\u001b[0m\n", | ||
| "\u001b[32m2025-11-17 16:28:41.464\u001b[0m | \u001b[34m\u001b[1mDEBUG \u001b[0m | \u001b[36mlexoid.api\u001b[0m:\u001b[36mwrapper\u001b[0m:\u001b[36m70\u001b[0m - \u001b[34m\u001b[1mAuto-detected parser type: ParserType.LLM_PARSE\u001b[0m\n", | ||
| "\u001b[32m2025-11-17 16:28:41.464\u001b[0m | \u001b[34m\u001b[1mDEBUG \u001b[0m | \u001b[36mlexoid.api\u001b[0m:\u001b[36mparse_chunk\u001b[0m:\u001b[36m135\u001b[0m - \u001b[34m\u001b[1mUsing LLM parser\u001b[0m\n" | ||
| ] | ||
| }, | ||
| { | ||
| "name": "stdout", | ||
| "output_type": "stream", | ||
| "text": [ | ||
| "The Stale Smell of Old Beer Lingers\n", | ||
| "\n", | ||
| "- The stale smell of old beer lingers.\n", | ||
| "- It takes heat to bring out the odor.\n", | ||
| "- A cold dip restores health and zest.\n", | ||
| "- A salt pickle tastes fine with ham.\n", | ||
| "- Tacos al pastor are my favorite.\n", | ||
| "- A zestful food is the hot cross bun.\n" | ||
| ] | ||
| } | ||
| ], | ||
| "source": [ | ||
| "from lexoid.api import parse\n", | ||
| "\n", | ||
| "document_path =\"inputs\\harvard.wav\"\n", | ||
| "parsed_md = parse(document_path, \"AUTO\",api=\"gemini\")[\"raw\"]\n", | ||
|
||
| "print(parsed_md)" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [] | ||
| } | ||
| ], | ||
| "metadata": { | ||
|
|
||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -11,6 +11,7 @@ | |||||
| import requests | ||||||
| import torch | ||||||
| from anthropic import Anthropic | ||||||
| from google import genai | ||||||
|
||||||
| from google import genai | |
| import google.generativeai as genai |
Vaishnav2804 marked this conversation as resolved.
Copilot
AI
Nov 25, 2025
The string formatting placeholder `{path}` is not replaced with the actual path value. This line should use an f-string: `system_prompt = AUDIO_TO_MARKDOWN_PROMPT + f"Audio file name is: {path}\n"` (note the `f` prefix and the corrected spelling of "Audio").
| system_prompt = AUDIO_TO_MARKDOWN_PROMPT + "Audo file name is: {path}\n" | |
| system_prompt = AUDIO_TO_MARKDOWN_PROMPT + f"Audio file name is: {path}\n" |
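To illustrate the difference the suggestion makes, here is a minimal sketch comparing the plain string with the f-string. The prompt text and the `path` value below are placeholders chosen for the example, not the values used in the PR.

```python
# Placeholder prompt text; the real AUDIO_TO_MARKDOWN_PROMPT lives in the PR.
AUDIO_TO_MARKDOWN_PROMPT = "Convert the audio to Markdown.\n"
path = "inputs/harvard.wav"  # example value only

# Without the f prefix, "{path}" stays in the prompt as literal text.
plain = AUDIO_TO_MARKDOWN_PROMPT + "Audio file name is: {path}\n"

# With the f prefix, "{path}" is replaced by the value of the variable.
formatted = AUDIO_TO_MARKDOWN_PROMPT + f"Audio file name is: {path}\n"

print(repr(plain))
print(repr(formatted))
```

Only the second string carries the actual file name to the model.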
Copilot
AI
Nov 25, 2025
The new audio parsing functionality lacks test coverage. Consider adding a test case similar to the existing `test_llm_parse` and `test_jpg_parse` functions to verify that audio file parsing works correctly with the Gemini API. This would help ensure the feature works as expected and prevent regressions.
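One possible shape for such a test is sketched below. It assumes only what the notebook cell in this PR shows: `parse(path, "AUTO", api="gemini")` returns a mapping with the Markdown under the `"raw"` key. The helper `check_audio_parse` and the stand-in `fake_parse` are hypothetical names introduced for this example so that it runs without an API key; they are not part of lexoid.

```python
from typing import Any, Callable, Dict


def check_audio_parse(parse_fn: Callable[..., Dict[str, Any]], audio_path: str) -> str:
    """Call parse_fn on an audio file and check the Markdown it returns."""
    result = parse_fn(audio_path, "AUTO", api="gemini")
    raw = result["raw"]
    assert isinstance(raw, str), "expected Markdown text under the 'raw' key"
    assert raw.strip(), "expected a non-empty transcription"
    return raw


def fake_parse(path: str, parser_type: str, **kwargs: Any) -> Dict[str, Any]:
    # Hypothetical stand-in for lexoid.api.parse, used to run the check offline.
    return {"raw": "- The stale smell of old beer lingers.\n"}


markdown = check_audio_parse(fake_parse, "inputs/harvard.wav")
print(markdown)
```

In the real test suite, the same helper could be called with `parse` imported from `lexoid.api` and a small audio fixture, in place of the stand-in.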
The path string uses a single backslash, which Python interprets as the start of an escape sequence, causing a SyntaxWarning (visible in the output at lines 1988-1991). Use either a raw string (`r"inputs\harvard.wav"`) or forward slashes (`"inputs/harvard.wav"`) to avoid this warning.
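As a small illustration of the options mentioned in this comment, the sketch below compares a raw string, an explicitly escaped backslash, and a forward slash. The use of `pathlib.PureWindowsPath` for the comparison is an addition for this example, not something the PR uses.

```python
from pathlib import PureWindowsPath

# Three spellings that avoid the invalid "\h" escape.
raw_path = r"inputs\harvard.wav"      # raw string: backslash kept literally
escaped_path = "inputs\\harvard.wav"  # explicitly escaped backslash
forward_path = "inputs/harvard.wav"   # forward slash, also accepted on Windows

# The raw and escaped spellings produce the same string.
same_string = raw_path == escaped_path

# PureWindowsPath treats both separators alike, so all forms name the same file.
same_file = PureWindowsPath(raw_path) == PureWindowsPath(forward_path)
parts = PureWindowsPath(raw_path).parts

print(same_string, same_file, parts)
```

The forward-slash form has the added benefit of working unchanged on Linux and macOS, which matters if the notebook is run outside Windows.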