server/routes/nuxt-api/albert/generate-dataservice-description.post.ts
| + `Here is the API information:\n` | ||
| + `Title: ${title.trim()}\n` | ||
| + (hasTechnical ? `Technical documentation URL: ${technicalDocumentationUrl.trim()}\n` : '') | ||
| + (hasMachine ? `Machine documentation URL (OpenAPI/Swagger): ${machineDocumentationUrl.trim()}\n` : '') |
Design: The openweight-small model can't browse these URLs — it only sees the URL strings. The prompt asks to "mention key endpoints, data types, and use cases" but the model has no access to the actual documentation content. This will likely produce hallucinated descriptions about specific endpoints.
Possible alternatives:
- Fetch the documentation content server-side and include it in the prompt
- Use createAgentCompletion if the Albert agent API supports web browsing
- Adjust the prompt to only describe what can be reasonably inferred from a title + URL patterns
Damn, I was tricked by the models' hallucinations, which made me think they were actually browsing the URLs! Good call.
I would go for option 1: fetch and format the data from those URLs, with some guardrails on the maximum prompt size and on how long a prompt those models can reasonably handle. I'll suggest a commit soon.
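As a rough illustration of option 1 with a size guardrail, the prompt could be assembled from already-fetched documentation text along these lines. This is only a sketch: buildPrompt, truncate, DocumentationSource and MAX_DOC_CHARS are hypothetical names, and the 120,000-character budget is taken from the cap mentioned in the PR description, not from the actual implementation.

```typescript
// Hypothetical character budget for all documentation content combined.
const MAX_DOC_CHARS = 120_000

// Hypothetical shape: one entry per documentation URL, content already fetched server-side.
interface DocumentationSource {
  label: string
  url: string
  content: string
}

/** Cuts text to at most maxChars characters and marks the cut explicitly. */
function truncate(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text
  return text.slice(0, maxChars) + '\n[truncated]'
}

/** Builds the user prompt from the title and the fetched documentation. */
function buildPrompt(
  title: string,
  sources: DocumentationSource[],
  maxDocChars: number = MAX_DOC_CHARS,
): string {
  // Split the budget evenly so one large document cannot crowd out the other.
  const perSource = sources.length > 0 ? Math.floor(maxDocChars / sources.length) : 0
  const sections = sources.map(
    s => `${s.label} (${s.url}):\n${truncate(s.content.trim(), perSource)}\n`,
  )
  return `Here is the API information:\nTitle: ${title.trim()}\n` + sections.join('')
}
```

Marking the cut with an explicit "[truncated]" line lets the model know the documentation is incomplete instead of silently ending mid-sentence.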
| const timeoutId = setTimeout(() => controller.abort(), FETCH_TIMEOUT_MS) | ||
| try { | ||
| const response = await $fetch<string>(url, { |
Security: SSRF vulnerability. This $fetch call will follow any URL provided by the user, including internal network addresses. An attacker could probe:
- http://169.254.169.254/latest/meta-data/ (cloud metadata: AWS, GCP)
- http://localhost:3000/... or http://127.0.0.1/... (internal endpoints)
- http://10.x.x.x/... (private network)
At a minimum, validate that the URL scheme is http/https and that the resolved hostname is not a private/reserved IP (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16).
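A minimal sketch of the two checks described above, assuming the caller has already resolved the hostname to an IPv4 address (for example with a DNS lookup, which is left out here). The helper names parseIPv4, isPrivateOrReservedIPv4 and hasAllowedScheme are hypothetical. The range list only covers the blocks named in this comment; IPv6 and redirect handling would need separate treatment.

```typescript
/** Parses a dotted-quad IPv4 string into four octets, or returns null if malformed. */
function parseIPv4(ip: string): number[] | null {
  const parts = ip.split('.')
  if (parts.length !== 4) return null
  const octets = parts.map(p => (/^\d{1,3}$/.test(p) ? Number(p) : NaN))
  return octets.every(o => Number.isInteger(o) && o >= 0 && o <= 255) ? octets : null
}

/** Returns true when the address falls in one of the private/reserved blocks listed above. */
function isPrivateOrReservedIPv4(ip: string): boolean {
  const octets = parseIPv4(ip)
  if (octets === null) return true // unparseable input is treated as unsafe
  const a = octets[0] as number
  const b = octets[1] as number
  return (
    a === 127 // 127.0.0.0/8 loopback
    || a === 10 // 10.0.0.0/8 private
    || (a === 172 && b >= 16 && b <= 31) // 172.16.0.0/12 private
    || (a === 192 && b === 168) // 192.168.0.0/16 private
    || (a === 169 && b === 254) // 169.254.0.0/16 link-local (cloud metadata)
  )
}

/** Returns true only for syntactically valid http/https URLs. */
function hasAllowedScheme(rawUrl: string): boolean {
  try {
    const { protocol } = new URL(rawUrl)
    return protocol === 'http:' || protocol === 'https:'
  }
  catch {
    return false // not a valid absolute URL
  }
}
```

The IP check has to run on the address the hostname actually resolves to, not on the hostname string, otherwise a public-looking domain pointing at 127.0.0.1 would pass.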
Security: no response size limit. The MAXIMUM_PROMPT_LENGTH check in the endpoint happens after the full body has been downloaded into memory. A malicious URL could return gigabytes of data and exhaust server memory before the check kicks in.
Mitigation options:
- Check the Content-Length header before reading the body and reject if too large
- Stream the response and abort once a byte threshold is reached
- Or at the very least, truncate raw immediately after reception (before formatDocumentationContent)
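The streaming option could be sketched as follows, assuming the response body is available as an async iterable of byte chunks. How that stream is obtained depends on the HTTP client and is not shown. readBodyWithLimit and ResponseTooLargeError are hypothetical names, and the onLimitExceeded callback is where something like controller.abort() would go.

```typescript
/** Thrown when the body grows past the allowed size. */
class ResponseTooLargeError extends Error {}

/**
 * Reads a chunked body into a string, stopping as soon as more than
 * maxBytes have been received so the full payload is never buffered.
 */
async function readBodyWithLimit(
  chunks: AsyncIterable<Uint8Array>,
  maxBytes: number,
  onLimitExceeded: () => void = () => {},
): Promise<string> {
  const decoder = new TextDecoder('utf-8')
  let received = 0
  let text = ''
  for await (const chunk of chunks) {
    received += chunk.byteLength
    if (received > maxBytes) {
      onLimitExceeded() // e.g. abort the underlying request
      throw new ResponseTooLargeError(`Response body exceeds ${maxBytes} bytes`)
    }
    // stream: true keeps multi-byte characters intact across chunk boundaries
    text += decoder.decode(chunk, { stream: true })
  }
  return text + decoder.decode() // flush any buffered bytes
}
```

A Content-Length pre-check is still worth doing as a cheap first filter, but the header can be absent or wrong, so the byte counter is what actually bounds memory use.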
| text = text | ||
| .replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, '') | ||
| .replace(/<style\b[^>]*>[\s\S]*?<\/style>/gi, '') | ||
| .replace(/<[^>]+>/g, ' ') |
Quality: HTML entities are not decoded after stripping tags. After removing HTML tags, entities like &amp;amp;, &amp;rsquo;, &amp;mdash;, &amp;nbsp; remain as-is in the text. The LLM will see raw entity strings in the prompt, which degrades description quality.
A simple entity decode pass after the tag strip would help (e.g. using a lightweight lib like he, or a manual replacement of the most common entities).
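For the manual route, a decode pass might look like the sketch below. decodeCommonEntities is a hypothetical helper, and the entity table is deliberately small; a library such as he covers the full named-entity list. Unknown entities are left untouched.

```typescript
// A small, non-exhaustive table of named entities (case-sensitive, as in HTML).
const NAMED_ENTITIES = new Map<string, string>([
  ['amp', '&'], ['lt', '<'], ['gt', '>'], ['quot', '"'], ['apos', "'"],
  ['nbsp', ' '], // mapped to a plain space, assuming whitespace is normalized afterwards
  ['lsquo', '\u2018'], ['rsquo', '\u2019'],
  ['ndash', '\u2013'], ['mdash', '\u2014'], ['hellip', '\u2026'],
])

/** Decodes numeric entities and a handful of named ones in a single pass. */
function decodeCommonEntities(text: string): string {
  return text.replace(/&(#x[0-9a-f]+|#\d+|[a-z]+);/gi, (match: string, body: string) => {
    if (body.startsWith('#')) {
      const isHex = body[1] === 'x' || body[1] === 'X'
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10)
      // Leave out-of-range code points as-is instead of throwing.
      return code > 0 && code <= 0x10ffff ? String.fromCodePoint(code) : match
    }
    return NAMED_ENTITIES.get(body) ?? match
  })
}
```

Doing everything in one replace call avoids double decoding: an input such as &amp;amp;lt; becomes the literal text &amp;lt; and is not decoded a second time into <.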
| /** | ||
| * Trims and formats content: strips HTML tags if present, normalizes whitespace. | ||
| */ | ||
| function formatDocumentationContent(raw: string, _url: string): string { |
Nit: _url parameter is declared but unused. Either use it (e.g. to infer content type from the URL extension) or remove it.
| {{ $t('Suggérer une description') }} | ||
| </template> | ||
| </BrandedButton> | ||
| <CdataLink |
UX: feedback link is visible before any suggestion has been generated. "Comment avez-vous trouvé cette suggestion ?" doesn't make sense when no description has been suggested yet. Consider conditioning on a hasGeneratedDescription boolean (set to true after a successful generation).
(Same issue exists in DescribeDataset.vue but no need to reproduce it here.)
Remove explicit ref import as it's auto-imported by Nuxt
Co-authored-by: Cursor <cursoragent@cursor.com>
- Use callAlbertAPI from albert-helpers in generate-dataservice-description (align with other Albert endpoints)
- Fix 422 for 'description too short' so it is returned to client instead of 500
- Remove validateAlbertConfig; useAlbertConfig() already throws when API key is missing
- Drop redundant error logging in dataservice-description handler
Co-authored-by: Cursor <cursoragent@cursor.com>
…ateDescriptionFeedbackUrl Co-authored-by: Cursor <cursoragent@cursor.com>
Add AI-powered description suggestion for dataservices (external APIs) via the Albert API.
When editing a dataservice, a "Suggérer une description" ("Suggest a description") button generates a French description. The button is enabled only when title and at least one of technicalDocumentationUrl or machineDocumentationUrl are set.
Changes:
EDIT (2026-02-17):
- Uses callAlbertAPI (albert-helpers), same pattern as other Albert endpoints.
- Documentation content is now fetched server-side (fetch-documentation.ts: HTML stripped, JSON/YAML as-is, 15s timeout, 120k char cap). Previously only the URLs were sent.
- Removed validateAlbertConfig from helper and all Albert endpoints.