parseSummaryXml: regex parser fails on markdown code fences and extra text around XML, no retry on final summarize #783

@Acharnite

Summary

The parseSummaryXml function in src/functions/summarize.ts uses a simple regex to extract XML tags from the LLM response. Many LLMs (DeepSeek, GPT, and others) sometimes wrap structured output in markdown code fences (```xml ... ```) or include conversational text before or after the XML. The parser doesn't handle these cases, so session summaries are silently dropped.

Additionally, the final summarization call has no retry mechanism, unlike the chunk-level summarizeChunkWithRetry, which makes up to two attempts.

Symptoms

[agentmemory] warn Failed to parse summary XML {"sessionId":"ses_..."}

The mem::summarize function returns {success: false, error: "parse_failed"} and the session is never summarized.

Root Cause (v0.9.24)

src/functions/summarize.ts, parseSummaryXml (around line 148):

function parseSummaryXml(xml, sessionId, project, obsCount) {
    const title = getXmlTag(xml, "title");
    if (!title) return null;
    // ...
}

src/functions/summarize.ts, getXmlTag:

function getXmlTag(xml, tag) {
    if (!VALID_TAG.test(tag)) return "";
    const match = xml.match(new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`));
    return match ? match[1].trim() : "";
}

Two problems:

  1. No preprocessing of raw LLM output: if the LLM wraps the XML in markdown fences (```xml\n<summary>...\n```), adds text before or after it, or otherwise deviates from the exact XML format, the regex misses the tags and parseSummaryXml returns null.

  2. No retry on final summarize: the chunk-level summarizeChunkWithRetry makes up to two attempts on parse failure (line 155: for (let attempt = 1; attempt <= 2; attempt++)), but the mem::summarize function calls parseSummaryXml once with no retry (around line 183). This is inconsistent: the chunk level has resilience but the final merge doesn't.

Steps to Reproduce

  1. Have a session that triggers auto-compress (AGENTMEMORY_AUTO_COMPRESS=true)
  2. Use any LLM provider that sometimes wraps XML in markdown code fences (observed with deepseek-v4-flash via OpenAI-compatible endpoint, but also known to happen with GPT/Claude)
  3. The warning appears when the LLM returns something like:
Here's the summary:
```xml
<summary>
  <title>My session</title>
  ...
</summary>
```

The parser instead expects exact XML with no surrounding text or fences.
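
For a regression test, the problematic response shapes can be generated rather than waiting for a provider to produce them. The following is a minimal sketch; wrapAsLlmReply and ReplyShape are hypothetical names introduced here for illustration and are not part of agentmemory.

```typescript
// Hypothetical test fixture helper (not part of agentmemory).
// Three backticks are built at runtime so this snippet can itself be quoted
// inside a fenced block.
const FENCE = "`".repeat(3);

interface ReplyShape {
  fenced?: boolean;   // wrap the XML in an xml-tagged markdown fence
  preamble?: string;  // conversational text before the XML
  trailer?: string;   // conversational text after the XML
}

function wrapAsLlmReply(xml: string, shape: ReplyShape = {}): string {
  const body = shape.fenced ? `${FENCE}xml\n${xml}\n${FENCE}` : xml;
  // Drop undefined or empty parts, then join the rest with newlines.
  return [shape.preamble, body, shape.trailer].filter((part) => part).join("\n");
}

const sampleXml = "<summary>\n  <title>My session</title>\n</summary>";

// Clean XML, fenced XML, and fenced XML with a conversational preamble.
const variants: string[] = [
  wrapAsLlmReply(sampleXml),
  wrapAsLlmReply(sampleXml, { fenced: true }),
  wrapAsLlmReply(sampleXml, { fenced: true, preamble: "Here's the summary:" }),
];
```

Each entry in variants can then be fed to the parser under test; the last one reproduces the response shape shown above.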

Suggested Fix

Two changes in src/functions/summarize.ts:

1. Make parseSummaryXml robust to markdown fences and extra text

Before the regex extraction, strip markdown code fences and try to extract the raw XML from any surrounding text:

function parseSummaryXml(xml, sessionId, project, obsCount) {
    // Strip markdown code fences if present (```xml ... ``` or ``` ... ```)
    xml = xml.replace(/```xml\n?/gi, "").replace(/```\n?/g, "").trim();
    // If extra text exists before/after the XML root element, extract just the XML
    const rootMatch = xml.match(/(<[a-zA-Z_][a-zA-Z0-9_-]*>[\s\S]*<\/[a-zA-Z_][a-zA-Z0-9_-]*>)/);
    if (rootMatch) xml = rootMatch[1].trim();
    const title = getXmlTag(xml, "title");
    if (!title) return null;
    // ...
}
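
The same preprocessing could also be pulled into a standalone helper, which makes it easier to unit-test separately from parseSummaryXml. The sketch below is only an illustration: extractXmlPayload is a hypothetical name, not an existing agentmemory function, and it assumes the response contains at most one XML root element. Unlike the inline version above, it requires the closing tag to match the opening tag's name.

```typescript
// Hypothetical helper (not part of agentmemory): normalizes a raw LLM reply
// before tag extraction.
// Three backticks are built at runtime so this snippet can itself be quoted
// inside a fenced block.
const FENCE = "`".repeat(3);

// Matches an opening or closing fence, an optional language tag, and the
// line break that follows it.
const FENCE_RE = new RegExp(FENCE + "[a-zA-Z]*[ \\t]*\\r?\\n?", "g");

// Matches from the first opening tag to the last closing tag with the same
// name, so text before and after the root element is discarded.
const ROOT_RE = /<([a-zA-Z_][\w.-]*)>[\s\S]*<\/\1>/;

function extractXmlPayload(raw: string): string {
  const unfenced = raw.replace(FENCE_RE, "").trim();
  const root = unfenced.match(ROOT_RE);
  // Fall back to the unfenced text when no root element is found, so the
  // caller's existing "missing title" check still decides the outcome.
  return root ? root[0].trim() : unfenced;
}

const reply =
  "Here's the summary:\n" +
  FENCE + "xml\n" +
  "<summary>\n  <title>My session</title>\n</summary>\n" +
  FENCE + "\nLet me know if you need more.";

const payload = extractXmlPayload(reply);
```

Here payload should contain only the summary element, and XML that is already clean passes through unchanged.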

2. Add retry loop to the final summarize call

Match the retry behavior already present in summarizeChunkWithRetry:

let summary = null;
for (let attempt = 1; attempt <= 2; attempt++) {
    summary = parseSummaryXml(response, sessionId, session.project, compressed.length);
    if (summary) break;
    logger.warn("Failed to parse summary XML", { sessionId, attempt });
    if (attempt === 1) {
        const retryResult = await produceSummaryXml(provider, compressed, sessionId, session.project);
        response = retryResult.response;
        if (!response || !response.trim()) break;
    }
}
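
If the chunk-level and final-level paths should share one retry policy, the loop could be factored into a small generic helper. This is a sketch under assumptions, not the project's code: parseWithRetry is a hypothetical name, and its parse and regenerate parameters stand in for parseSummaryXml and produceSummaryXml, whose real signatures may differ.

```typescript
// Hypothetical generic retry helper (not part of agentmemory).
interface RetryResult<T> {
  value: T | null;   // parsed value, or null if every attempt failed
  attempts: number;  // number of parse attempts actually made
}

async function parseWithRetry<T>(
  initial: string,
  parse: (text: string) => T | null,
  regenerate: () => Promise<string>,
  maxAttempts = 2,
): Promise<RetryResult<T>> {
  let text = initial;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const value = parse(text);
    if (value !== null) return { value, attempts: attempt };
    // Out of attempts: do not request another response.
    if (attempt === maxAttempts) break;
    text = await regenerate();
    // An empty regenerated response cannot parse, so stop early.
    if (!text.trim()) return { value: null, attempts: attempt };
  }
  return { value: null, attempts: maxAttempts };
}

// Usage with stand-in functions: the first response fails to parse and the
// regenerated one succeeds.
let regenerateCalls = 0;
const fakeParse = (text: string): string | null =>
  text.includes("<title>") ? text : null;
const fakeRegenerate = async (): Promise<string> => {
  regenerateCalls++;
  return "<summary><title>Recovered</title></summary>";
};

const retryDemo = parseWithRetry("not xml at all", fakeParse, fakeRegenerate);
```

With maxAttempts left at 2, this mirrors the behavior of the loop above: one initial parse, plus one regeneration and a second parse if the first fails.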

Impact

Without the fix, every session summarization that hits a fence-wrapping LLM response silently drops the session summary. This cascades: no session summary means no crystal digest, no episodic→semantic→procedural consolidation, and gaps in the memory graph.

In our environment (agentmemory v0.9.24, deepseek-v4-flash via OpenAI-compatible endpoint), this occurred 31 times across 15+ sessions over a few days of normal usage before we patched it.

Workaround

Users can set SUMMARIZE_CHUNK_SIZE=999999 to force single-chunk mode (which avoids the chunk concurrency but doesn't fix the fence parsing), or apply the patch above to dist/index.mjs in their global installation.

However, a fix upstream would benefit everyone regardless of LLM provider.
