feat: suggest dataservice description by bolinocroustibat · Pull Request #924 · datagouv/cdata

bolinocroustibat · 2026-02-05T16:30:20Z

Add AI-powered description suggestion for dataservices (external APIs) via the Albert API.

When editing a dataservice, a "Suggérer une description" ("Suggest a description") button generates a French description. The button is enabled only when:

title is filled
at least one of these is filled: technical documentation URL or machine documentation URL (OpenAPI/Swagger)
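The enabling rule above can be written as a small predicate. This is a minimal sketch: the `DataserviceForm` shape and the `isFilled` / `canSuggestDescription` names are assumptions for illustration, not the component's actual code.

```typescript
// Hypothetical form shape; field names mirror the endpoint parameters named in this PR.
interface DataserviceForm {
  title: string
  technicalDocumentationUrl?: string | null
  machineDocumentationUrl?: string | null
}

// True when a value is a non-empty string after trimming.
function isFilled(value?: string | null): boolean {
  return typeof value === 'string' && value.trim().length > 0
}

// The button is enabled when the title and at least one documentation URL are filled.
function canSuggestDescription(form: DataserviceForm): boolean {
  return isFilled(form.title)
    && (isFilled(form.technicalDocumentationUrl) || isFilled(form.machineDocumentationUrl))
}
```

The same predicate could back both the button's disabled state and the server-side validation, so the two cannot drift apart.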

Changes:

DescribeDataservice.vue: suggestion button with loading/disabled states and tooltip; enabled when title + at least one doc URL are present
generate-dataservice-description.post.ts (new): Nitro endpoint that fetches documentation from the given URLs, inlines it into the prompt, and calls Albert API. Requires title and at least one of technicalDocumentationUrl or machineDocumentationUrl
fetch-documentation.ts (new): utility to fetch doc content from URLs (HTML stripped, JSON/YAML as-is; 15s timeout, 120k char cap)

EDIT (2026-02-17):

generate-dataservice-description.post.ts uses shared callAlbertAPI (albert-helpers), same pattern as other Albert endpoints.
Doc content is now fetched from the URLs and inlined into the prompt (new fetch-documentation.ts: HTML stripped, JSON/YAML as-is, 15s timeout, 120k char cap). Previously only the URLs were sent.
Fixed 422 "description too short" being rethrown as 500.
Removed redundant validateAlbertConfig from helper and all Albert endpoints.
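For readers unfamiliar with the 422 issue, here is one way such a fix could look. This is a sketch under assumptions: `HttpError`, `assertDescriptionLongEnough`, `toHttpError` and the 50-character threshold are hypothetical and not taken from the PR, which may rely on the framework's own error helper instead.

```typescript
// Hypothetical error type carrying an HTTP status code.
class HttpError extends Error {
  readonly statusCode: number
  constructor(statusCode: number, message: string) {
    super(message)
    this.statusCode = statusCode
    this.name = 'HttpError'
  }
}

// Assumed threshold, not a value from the PR.
const MIN_DESCRIPTION_LENGTH = 50

// Rejects descriptions that are too short with a 422.
function assertDescriptionLongEnough(description: string): string {
  const trimmed = description.trim()
  if (trimmed.length < MIN_DESCRIPTION_LENGTH) {
    throw new HttpError(422, 'description too short')
  }
  return trimmed
}

// Converts any thrown value into an HttpError, preserving an existing status
// code instead of wrapping everything as a 500.
function toHttpError(error: unknown): HttpError {
  if (error instanceof Error && typeof (error as Partial<HttpError>).statusCode === 'number') {
    return error as HttpError
  }
  const message = error instanceof Error ? error.message : 'Unexpected error'
  return new HttpError(500, message)
}
```

The point is that the catch block checks for an existing status before falling back to 500, so validation errors reach the client unchanged.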

server/routes/nuxt-api/albert/generate-dataservice-description.post.ts

components/Dataservices/DescribeDataservice.vue

ThibaudDauce · 2026-02-17T09:04:32Z

server/routes/nuxt-api/albert/generate-dataservice-description.post.ts

+          + `Here is the API information:\n`
+          + `Title: ${title.trim()}\n`
+          + (hasTechnical ? `Technical documentation URL: ${technicalDocumentationUrl.trim()}\n` : '')
+          + (hasMachine ? `Machine documentation URL (OpenAPI/Swagger): ${machineDocumentationUrl.trim()}\n` : '')


Design: The openweight-small model can't browse these URLs; it only sees the URL strings. The prompt asks the model to "mention key endpoints, data types, and use cases", but the model has no access to the actual documentation content. This will likely produce hallucinated descriptions of specific endpoints.

Possible alternatives:

Fetch the documentation content server-side and include it in the prompt

Use createAgentCompletion if the Albert agent API supports web browsing

Adjust the prompt to only describe what can be reasonably inferred from a title + URL patterns

Damn, I was tricked by the model's hallucinations, which made me think it was indeed browsing the URLs! Good call.

I would go for option 1: fetch and format the content from those URLs, with guardrails on the maximum prompt size and on how long a prompt these models can reasonably handle. I'll push a commit soon.
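Option 1 with a size guardrail could be sketched roughly as follows. The `FetchedDoc` shape, the `buildPrompt` name and the shared character budget are assumptions for illustration; only the "Here is the API information" and "Title:" lines come from the reviewed diff, and the 120000 figure from the PR description.

```typescript
// Cap borrowed from the 120k character limit mentioned in the PR description.
const MAX_DOC_CHARS = 120000

// Hypothetical input shape: documentation already fetched server-side.
interface FetchedDoc {
  label: string
  url: string
  content: string
}

// Builds the user prompt, inlining each document truncated to a shared character budget.
function buildPrompt(title: string, docs: FetchedDoc[], maxChars: number = MAX_DOC_CHARS): string {
  let remaining = maxChars
  const sections: string[] = []
  for (const doc of docs) {
    if (remaining <= 0) break
    const excerpt = doc.content.slice(0, remaining)
    remaining -= excerpt.length
    sections.push(`${doc.label} (${doc.url}):\n${excerpt}`)
  }
  return `Here is the API information:\nTitle: ${title.trim()}\n\n${sections.join('\n\n')}`
}
```

A shared budget keeps the total prompt bounded regardless of how many URLs are provided; a per-document cap would be an equally reasonable choice.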

components/MarkdownEditor/InternalEditor.vue

server/routes/nuxt-api/albert/generate-dataservice-description.post.ts

ThibaudDauce · 2026-02-18T10:55:56Z

server/routes/nuxt-api/albert/utils/fetch-documentation.ts

+  const timeoutId = setTimeout(() => controller.abort(), FETCH_TIMEOUT_MS)
+
+  try {
+    const response = await $fetch<string>(url, {


Security: SSRF vulnerability. This $fetch call will follow any URL provided by the user, including internal network addresses. An attacker could probe:

http://169.254.169.254/latest/meta-data/ (cloud metadata — AWS, GCP)

http://localhost:3000/... or http://127.0.0.1/... (internal endpoints)

http://10.x.x.x/... (private network)

At a minimum, validate that the URL scheme is http/https and that the resolved hostname is not a private/reserved IP (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16).
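A minimal sketch of that validation, covering only the scheme check and the IPv4 ranges listed above. The function names are hypothetical, and IPv6 is deliberately not handled here (non-IPv4 input is rejected, which is a simplification).

```typescript
// Private and reserved IPv4 ranges from the comment above, as [base address, prefix length].
const BLOCKED_IPV4_RANGES: Array<[string, number]> = [
  ['127.0.0.0', 8],
  ['10.0.0.0', 8],
  ['172.16.0.0', 12],
  ['192.168.0.0', 16],
  ['169.254.0.0', 16],
]

// Converts dotted-quad IPv4 text to an unsigned integer, or null if malformed.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split('.')
  if (parts.length !== 4) return null
  let value = 0
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null
    const octet = Number(part)
    if (octet > 255) return null
    value = value * 256 + octet
  }
  return value
}

// True when the address is inside a blocked range. Anything that is not
// valid IPv4 is also treated as blocked (fail closed).
function isBlockedIpv4(ip: string): boolean {
  const address = ipv4ToInt(ip)
  if (address === null) return true
  return BLOCKED_IPV4_RANGES.some(([base, prefix]) => {
    const start = ipv4ToInt(base) as number
    const size = Math.pow(2, 32 - prefix)
    return address >= start && address < start + size
  })
}

// Scheme check only; the hostname must still be resolved and each resulting IP checked.
function hasAllowedScheme(rawUrl: string): boolean {
  try {
    const { protocol } = new URL(rawUrl)
    return protocol === 'http:' || protocol === 'https:'
  } catch {
    return false
  }
}
```

In a real handler the hostname would first be resolved (for example with Node's `dns.promises.lookup`) and every returned address passed through `isBlockedIpv4`. Redirects would need the same check on each hop, otherwise a public URL can still bounce to an internal one.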

Security: no response size limit. The MAXIMUM_PROMPT_LENGTH check in the endpoint happens after the full body has been downloaded into memory. A malicious URL could return gigabytes of data and exhaust server memory before the check kicks in.

Mitigation options:

Check Content-Length header before reading the body and reject if too large

Stream the response and abort once a byte threshold is reached

Or at the very least, truncate raw immediately after reception (before formatDocumentationContent)
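The first two mitigations could be combined along these lines. This is a sketch, not the PR's code: the function names and the byte budget are assumptions, and the reader takes a generic `AsyncIterable<Uint8Array>` rather than a specific HTTP client's response type.

```typescript
// Assumed byte budget, not a value from the PR.
const MAX_BODY_BYTES = 1000000

// Rejects early when the declared Content-Length already exceeds the budget.
// A missing or invalid header gives no guarantee, so the streaming limit still applies.
function declaredLengthTooLarge(contentLength: string | null, maxBytes: number = MAX_BODY_BYTES): boolean {
  if (contentLength === null) return false
  const declared = Number(contentLength)
  return Number.isFinite(declared) && declared > maxBytes
}

// Reads chunks until the byte budget is exceeded, then stops and throws.
async function readWithLimit(chunks: AsyncIterable<Uint8Array>, maxBytes: number = MAX_BODY_BYTES): Promise<Uint8Array> {
  const parts: Uint8Array[] = []
  let total = 0
  for await (const chunk of chunks) {
    total += chunk.byteLength
    if (total > maxBytes) throw new Error(`Response body exceeds ${maxBytes} bytes`)
    parts.push(chunk)
  }
  // Concatenate the accepted chunks into a single buffer.
  const body = new Uint8Array(total)
  let offset = 0
  for (const part of parts) {
    body.set(part, offset)
    offset += part.byteLength
  }
  return body
}
```

Throwing inside the `for await` loop ends the iteration, so no further chunks are buffered once the limit is crossed. In recent Node.js versions the `body` of a `fetch` response should be iterable this way, but that is worth verifying against the HTTP client actually used here.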

ThibaudDauce · 2026-02-18T10:55:56Z

server/routes/nuxt-api/albert/utils/fetch-documentation.ts

+    text = text
+      .replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, '')
+      .replace(/<style\b[^>]*>[\s\S]*?<\/style>/gi, '')
+      .replace(/<[^>]+>/g, ' ')


Quality: HTML entities are not decoded after stripping tags. After removing HTML tags, entities like &amp;amp;, &amp;rsquo;, &amp;mdash;, &amp;nbsp; remain as-is in the text. The LLM will see raw entity strings in the prompt, which degrades description quality.

A simple entity decode pass after the tag strip would help (e.g. using a lightweight lib like he, or a manual replacement of the most common entities).
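A manual decode pass could look like the sketch below. The entity table is a small hand-picked subset and `decodeHtmlEntities` is a hypothetical name; a library such as he covers the full named-entity set.

```typescript
// Small table of common named entities (assumed subset, not exhaustive).
// &nbsp; maps to a regular space since whitespace is normalized afterwards anyway.
const NAMED_ENTITIES = new Map<string, string>([
  ['amp', '&'],
  ['lt', '<'],
  ['gt', '>'],
  ['quot', '"'],
  ['apos', '\''],
  ['nbsp', ' '],
  ['rsquo', '\u2019'],
  ['lsquo', '\u2018'],
  ['mdash', '\u2014'],
  ['ndash', '\u2013'],
])

// Decodes named entities from the table plus decimal and hexadecimal numeric
// references, in a single pass. Unknown entities are left untouched.
function decodeHtmlEntities(text: string): string {
  return text.replace(/&(#x[0-9a-f]+|#\d+|[a-z]+);/gi, (match: string, body: string) => {
    if (body[0] === '#') {
      const isHex = body[1] === 'x' || body[1] === 'X'
      const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10)
      return codePoint > 0 && codePoint <= 0x10FFFF ? String.fromCodePoint(codePoint) : match
    }
    return NAMED_ENTITIES.get(body) ?? match
  })
}
```

Because the replacement runs in one pass, a double-encoded sequence such as `&amp;lt;` becomes `&lt;` rather than `<`, which avoids accidentally re-creating markup.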

ThibaudDauce · 2026-02-18T10:55:56Z

server/routes/nuxt-api/albert/utils/fetch-documentation.ts

+/**
+ * Trims and formats content: strips HTML tags if present, normalizes whitespace.
+ */
+function formatDocumentationContent(raw: string, _url: string): string {


Nit: _url parameter is declared but unused. Either use it (e.g. to infer content type from the URL extension) or remove it.
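If the parameter is kept, inferring a format hint from the URL extension could be as simple as the sketch below. `DocKind` and `inferDocKind` are hypothetical names, and the extension list is an assumption.

```typescript
type DocKind = 'json' | 'yaml' | 'html' | 'unknown'

// Guesses the documentation format from the URL path extension.
// This is only a hint: the response Content-Type header is usually more reliable.
function inferDocKind(url: string): DocKind {
  let pathname: string
  try {
    // Using the parsed pathname ignores query strings and fragments.
    pathname = new URL(url).pathname.toLowerCase()
  } catch {
    return 'unknown'
  }
  if (pathname.endsWith('.json')) return 'json'
  if (pathname.endsWith('.yaml') || pathname.endsWith('.yml')) return 'yaml'
  if (pathname.endsWith('.html') || pathname.endsWith('.htm')) return 'html'
  return 'unknown'
}
```

`formatDocumentationContent` could then skip the HTML stripping when the hint is `json` or `yaml`, and fall back to content sniffing for `unknown`.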

ThibaudDauce · 2026-02-18T10:55:56Z

components/Dataservices/DescribeDataservice.vue

+                  {{ $t('Suggérer une description') }}
+                </template>
+              </BrandedButton>
+              <CdataLink


UX: the feedback link is visible before any suggestion has been generated. "Comment avez-vous trouvé cette suggestion ?" ("How did you find this suggestion?") doesn't make sense when no description has been suggested yet. Consider conditioning it on a hasGeneratedDescription boolean (set to true after a successful generation).

(Same issue exists in DescribeDataset.vue but no need to reproduce it here.)


- Use callAlbertAPI from albert-helpers in generate-dataservice-description (align with other Albert endpoints)
- Fix 422 for 'description too short' so it is returned to client instead of 500
- Remove validateAlbertConfig; useAlbertConfig() already throws when API key is missing
- Drop redundant error logging in dataservice-description handler

Co-authored-by: Cursor <cursoragent@cursor.com>


bolinocroustibat requested review from ThibaudDauce, maudetes and nicolaskempf57 as code owners February 5, 2026 16:30

bolinocroustibat self-assigned this Feb 5, 2026

bolinocroustibat marked this pull request as draft February 5, 2026 16:30

bolinocroustibat added this to 🚀 Produit data.gouv.fr Feb 5, 2026

bolinocroustibat moved this to 🛠 Doing in 🚀 Produit data.gouv.fr Feb 5, 2026

bolinocroustibat force-pushed the feat/suggest-dataservice-description branch 2 times, most recently from 550dbdb to df4c507 Compare February 16, 2026 16:48

bolinocroustibat marked this pull request as ready for review February 16, 2026 16:49

ThibaudDauce requested changes Feb 17, 2026


bolinocroustibat force-pushed the feat/suggest-dataservice-description branch from 7c4bf25 to 3c20ce2 Compare February 17, 2026 12:51

bolinocroustibat moved this from 🛠 Doing to 👀 Review in 🚀 Produit data.gouv.fr Feb 17, 2026

bolinocroustibat force-pushed the feat/suggest-dataservice-description branch 3 times, most recently from 494af3c to ba1f4fa Compare February 17, 2026 19:53

ThibaudDauce requested changes Feb 18, 2026


bolinocroustibat and others added 7 commits February 23, 2026 15:17

feat: suggest dataservice description

f6bda0a

docs: add help text

c38c42d

refactor(markdown-editor): remove unnecessary ref import

85e6926

Remove explicit ref import as it's auto-imported by Nuxt Co-authored-by: Cursor <cursoragent@cursor.com>

refactor(config): rename generateShortDescriptionFeedbackUrl to generateDescriptionFeedbackUrl

7046b35

Co-authored-by: Cursor <cursoragent@cursor.com>

feat: fetch documentation URLs to build prompt to suggest description

458dd9c

docs: add comments

d99abad

bolinocroustibat force-pushed the feat/suggest-dataservice-description branch from ba1f4fa to d99abad Compare February 23, 2026 14:17

bolinocroustibat wants to merge 7 commits into main from feat/suggest-dataservice-description
