feat(express): Introduce A2UI Express compiler, draft proposals, and evaluations by gspencergoog · Pull Request #1678 · a2ui-project/a2ui

gspencergoog · 2026-06-16T21:24:43Z

Summary

This pull request introduces A2UI Express, an experimental, compact domain-specific language (DSL) that allows agents to express UI layouts with minimal token usage. It provides a complete compilation/decompilation pipeline, integrates it into the evaluations strategy suite, and documents the specification draft under proposals.

Evaluation Results

Below is a summary of the evaluations benchmark run (47 samples) comparing the layout generation strategies using the lightweight google/gemini-3.1-flash-lite model:

| Strategy | `a2ui_scorer` Accuracy (Syntax) | `measured_model_graded_qa` Accuracy (Semantics) | Avg Latency | Median Latency | Output Tokens | Total Tokens |
|---|---|---|---|---|---|---|
| `direct` (Raw JSON) | 87.23% | 93.62% | 4.99s (baseline) | 3.57s (baseline) | 35,022 (baseline) | 527,882 (baseline) |
| `express` (A2UI Express DSL) | 91.49% | 84.04% | 1.09s (78% reduction) | 1.06s (70% reduction) | 9,912 (72% reduction) | 232,748 (56% reduction) |

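By compiling into inline components and inline data models, the Express strategy exceeds the direct strategy's syntactic validity (91.49% vs. 87.23%) while reducing median generation latency by 70% (over 3.3x faster; average latency drops by 78%) and saving 56% of total generation tokens (and 72% of output tokens). Semantic accuracy, as graded by `measured_model_graded_qa`, is lower for Express (84.04% vs. 93.62%).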
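
The reduction and speedup figures can be rechecked from the table. This short snippet uses only numbers copied from the table above; the helper names are illustrative.

```python
# Figures copied from the evaluation table: (direct baseline, express).
METRICS = {
    "avg_latency_s": (4.99, 1.09),
    "median_latency_s": (3.57, 1.06),
    "output_tokens": (35_022, 9_912),
    "total_tokens": (527_882, 232_748),
}


def reduction_pct(baseline: float, candidate: float) -> float:
    """Percentage reduction of `candidate` relative to `baseline`."""
    return (1 - candidate / baseline) * 100


def speedup(baseline: float, candidate: float) -> float:
    """How many times smaller (faster) `candidate` is than `baseline`."""
    return baseline / candidate


if __name__ == "__main__":
    for name, (direct, express) in METRICS.items():
        print(f"{name}: {reduction_pct(direct, express):.0f}% reduction, "
              f"{speedup(direct, express):.2f}x")
```

The "70% (over 3.3x faster)" figure comes from the median latency (3.57s to 1.06s); the average latency falls further, by about 78% (roughly 4.6x).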

Changes

Python Agent SDK

agent_sdks/python/a2ui_agent/src/a2ui/express/: Added the core package including:
- compiler.py: Compiles flat Express DSL statements into standard A2UI wire JSON. Includes robust exception handling and positional argument validation.
  - Added support for the bare $ path token, which resolves to the relative root JSON Pointer {"path": ""}.
  - Added recursive nested check compilation (e.g. ?and([?required, ?email])).
  - Added automatic unrolling of inline component constructor instantiations (e.g. Row([Text("Soup")]) -> _inline_1).
  - Added lexical scanner rules to ignore Python-style # and JS-style // comments outside string literals.
- decompiler.py: Translates standard wire JSON layouts back into the flat DSL representation, with safety guards on missing properties.
- prompt_generator.py: Generates LLM system instruction prompt contracts containing component signatures, enums, static property indicators, and nested schema requirements dynamically.
- schema_helper.py: Crawls catalog JSON schemas to resolve positional parameter bounds and properties, with type checks to skip boolean schemas inside allOf loops.
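
The comment rule in the scanner (ignore `#` and `//` only outside string literals) is easy to get subtly wrong, for example with URLs such as `"https://..."`. A minimal sketch of the idea follows; it is not the `compiler.py` scanner, `strip_comment` is a hypothetical name, and it only handles single-line `"`-delimited strings with backslash escapes (no raw or triple-quoted strings).

```python
def strip_comment(line: str) -> str:
    """Remove a trailing `#` or `//` comment that appears outside a string."""
    in_string = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_string:
            if ch == "\\":
                i += 2  # skip the escaped character, e.g. \" or \\
                continue
            if ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "#" or line.startswith("//", i):
            return line[:i].rstrip()  # comment starts here, outside any string
        i += 1
    return line.rstrip()
```
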
agent_sdks/python/a2ui_core/: Improved core validation routines to make them schema-driven:
- node_graph.py & integrity_checker.py: Decoupled structural and static checks from hardcoded catalog definitions, resolving constraints dynamically from catalog schemas.
Feature Flag Control: Added environment variable switches:
- A2UI_EXPRESS_ENABLED: If not set to true, importing a2ui.express raises an ImportError.
- A2UI_VERSION_1_0: Gates the new dynamic JSON schema validator for v1.0. Automatically enabled if A2UI_EXPRESS_ENABLED is true.
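
The two switches above can be sketched as small helper functions. This is an illustration, not the package's actual `__init__.py`: treating only the case-insensitive string `true` as enabled is an assumption, and `require_express_enabled` is a hypothetical name for the import-time guard.

```python
import os
from typing import Mapping


def _is_true(environ: Mapping[str, str], name: str) -> bool:
    # Assumption: only the case-insensitive string "true" turns a flag on.
    return environ.get(name, "").strip().lower() == "true"


def express_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    return _is_true(environ, "A2UI_EXPRESS_ENABLED")


def v1_0_validator_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    # A2UI_EXPRESS_ENABLED implies A2UI_VERSION_1_0.
    return _is_true(environ, "A2UI_VERSION_1_0") or express_enabled(environ)


def require_express_enabled(environ: Mapping[str, str] = os.environ) -> None:
    """Hypothetical guard, called at import time by the gated package."""
    if not express_enabled(environ):
        raise ImportError(
            "a2ui.express is experimental; set A2UI_EXPRESS_ENABLED=true to import it"
        )
```

Calling `require_express_enabled()` at the top of the package's `__init__.py` makes `import a2ui.express` fail with `ImportError` unless the flag is set.
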
SDK Tests: Added comprehensive test cases in tests/express/test_express.py covering DSL tokenization, parsing, expression nesting, map variable inlining, comments, and validator gating. Added tests/express/test_cli_tools.py to test all 4 CLI wrapper scripts under specification/proposals/express/, achieving 90% test coverage for the CLI utility suite.

Specification & Proposals

specification/proposals/express/: Relocated and formalized the post-1.0 draft specification (a2ui_express.md, evolve_express.md, and sample configs) to prevent polluting the ratified v1_0 baseline.
DSL Examples: Relocated all .a2ui evaluation layout examples to specification/proposals/express/examples/.

Evaluations Suite

eval/a2ui_eval/strategies/express.py: Added the express evaluation solver that runs and merges layout accuracy scores. Express compiler results generate standard A2UI v1.0 JSON payloads directly.
eval/tasks.py / main.py: Shifted the core baseline evaluation task to run against A2UI v0.9.1 (renamed to a2ui_v0_9_1_eval), while keeping the express strategy target isolated to A2UI v1.0.
Evaluation Datasets: Introduced a new translated dataset eval/datasets/v1_0_prompts.yaml that adapts prompt texts and target descriptions to use v1.0 terminology (inline components, longText instead of multiline, and omission of returnType properties). This aligns target outputs with strategy versions for clean, accurate LLM grading.

Impact & Risks

No Breaking Changes: The compilation pipeline is isolated and disabled by default under the A2UI_EXPRESS_ENABLED environment switch.
No Regression: The existing baseline evaluation runs (direct, subagent_tool) are preserved as defaults.

Testing

Unit Tests: Run uv run pytest inside agent_sdks/python/a2ui_agent/ folder.
Evaluations Run: To run the new strategy in the benchmark runner, run uv run main.py --strategies express inside the eval/ folder.

refactor(express): implement thread-safety, type safety, and operations constants

- Refactors `ExpressCompiler` to use a stateless, thread-safe `_CompileContext` instance for compiler invocations, preventing race conditions during concurrent execution.
- Introduces `SurfaceOperation` constants for standard A2UI surface message envelope keys, removing hardcoded string lookups across the compiler and decompiler.
- Adds explicit, strict type annotations to `_load_mappings` and `get_property_enum` in `CatalogSchemaHelper`.
- Adds `test_compiler_concurrency` regression test to verify concurrent compile runs on a single compiler instance, with overall package test coverage maintained above 90%.

docs(express): refocus README on Gemini API execution

- Restructures `specification/proposals/express/README.md` to highlight executing inference and validation using remote Gemini models (e.g. `gemini-3.1-flash-lite`).
- Relocates local Apple Silicon MLX setup instructions to the bottom of the document.

docs(express): update README with env requirements and remove outdated documents

- Updates `specification/proposals/express/README.md` to document the mandatory `A2UI_EXPRESS_ENABLED` environment variable gate.
- Standardizes command usage in README to use `uv run` with the environment variable.
- Corrects the file extension in the compiler example from `.express` to `.a2ui`.
- Removes outdated `basic_prompt.md` and `evolve_express.md` files from the proposals directory.

fix(express): correct compiler check mapping, multiline string parsing, and decompiler formatting

- Fixes positional check parsing shifts so checks do not map to preceding optional properties (like weight).
- Corrects multiline string compiler logic to preserve blank lines inside active statements.
- Improves check argument matching to map string literals to custom validation messages when property schemas expect non-string types.
- Optimizes the decompiler to strip unnecessary "_" placeholders.
- Adds comprehensive round-trip tests to verify all 36 basic catalog examples against their Express DSL counterparts, achieving 91% total coverage.
- Fixes missing Optional typing imports in schema_helper.py and prompt_generator.py.
- Fixes fix_format.sh to make corepack enable non-fatal, preventing permission errors from aborting formatting early on developer machines.

feat(express): introduce A2UI Express compilation pipeline, specification proposals, and evaluations

- Introduce A2UI Express, an experimental, compact DSL notation allowing agents to describe UI layouts using minimal tokens.
- Add `agent_sdks/python/a2ui_agent/src/a2ui/express/` package implementing the `ExpressCompiler` (translating flat DSL to standard wire JSON) and `ExpressDecompiler` (reconstructing DSL from wire JSON).
- Secure Express package behind a disabled-by-default environment flag `A2UI_EXPRESS_ENABLED`.
- Relocate and formalize the specification draft to `/specification/proposals/express/` to avoid polluting the v1.0 baseline.
- Add the `express` layout generation strategy inside evaluations (`eval/`), featuring support for running it optionally via a comma-separated or repeating `--strategies` CLI list parameter.
- Exclude research-facing genetic prompt optimizer tooling from this base branch (moved to `a2ui_express_optimizer` branch).
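
The thread-safety refactor keeps all mutable state in a per-invocation context object rather than on the compiler instance. A minimal sketch of that pattern follows; only the names `ExpressCompiler` and `_CompileContext` come from the commit message, while the fields and the `compile` body are hypothetical stand-ins (it just records each non-empty statement under a synthetic ID).

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class _CompileContext:
    """All mutable state for exactly one compile() call (fields are hypothetical)."""

    next_id: int = 0
    components: Dict[str, str] = field(default_factory=dict)

    def new_id(self) -> str:
        self.next_id += 1
        return f"_stmt_{self.next_id}"


class ExpressCompiler:
    """Stores only read-only configuration, so one instance can be shared."""

    def __init__(self, catalog: dict) -> None:
        self._catalog = catalog  # never mutated after construction

    def compile(self, source: str) -> Dict[str, str]:
        ctx = _CompileContext()  # fresh state per call; nothing shared across threads
        for line in source.splitlines():
            if line.strip():
                # Stand-in for real compilation: record each non-empty statement.
                ctx.components[ctx.new_id()] = line.strip()
        return ctx.components


if __name__ == "__main__":
    compiler = ExpressCompiler(catalog={})
    source = 'a = Text("x")\nb = Text("y")'
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(compiler.compile, [source] * 100))
    # Every call starts from a fresh context, so all results are identical.
    assert all(r == results[0] for r in results)
```

If the counter and component map lived on `self` instead, concurrent calls on one shared instance could interleave IDs; keeping them in the context object makes each call independent.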

gemini-code-assist

Code Review

This pull request introduces A2UI Express, a compact, token-efficient declarative DSL for generative user interfaces, complete with a compiler, decompiler, prompt generator, CLI tools, and integration into the evaluation framework. The code review highlights several critical issues, including a sentinel parsing bug that ignores statements on the same line as tags, a potential crash in the schema helper when handling boolean schemas in allOf, and a decompiler bug that incorrectly strips quotes from string literals matching component IDs. Additionally, the reviewer noted hardcoded absolute paths in the dataset translation script and a PEP 8 import style violation in the evaluation solver.

- Fixes compiler sentinel tag parsing to strip tags and process statement content remaining on the same line.
- Adds boolean schema type guards in `CatalogSchemaHelper._load_mappings` when resolving `allOf` elements.
- Implements `is_ref` parameters in `ExpressDecompiler._decompile_value` to prevent string literal values matching component IDs from being decompiled without quotes.
- Corrects path resolution in `translate_dataset.py` to be relative to the script directory.
- Re-orders imports in `express.py` strategy file to follow PEP 8.
- Adds comprehensive regression tests in `test_express.py` covering all fixes, maintaining 100% pass rate.
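
The `is_ref` idea can be illustrated with a simplified, hypothetical analog of the decompiler's value formatting (it is not `_decompile_value` itself): emit a bare identifier only when the schema says the property holds component references, and otherwise quote the string even if it happens to equal a component ID. This sketch handles only strings and lists of strings, and its quoting escapes only `\` and `"`.

```python
from typing import Union

WireValue = Union[str, list]


def decompile_value(value: WireValue, is_ref: bool = False) -> str:
    """Render a wire-JSON string (or list of strings) as Express source.

    is_ref=True means the schema says this property holds component IDs,
    so strings become bare variable names instead of quoted literals.
    """
    if isinstance(value, list):
        return "[" + ", ".join(decompile_value(v, is_ref) for v in value) + "]"
    if is_ref:
        return value
    # Simplified quoting: escape backslashes and double quotes only.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'
```

With this split, a `Text` whose literal content is `"title"` stays quoted, while a `children` list containing the component ID `title` is written as `[title]`.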

jacobsimionato

Exciting!!!

jacobsimionato · 2026-06-18T05:49:58Z

+@solver
+def a2ui_express_prompt(catalog_path: str) -> Solver:
+    """Solver to inject A2UI Express prompt contract instructions."""
+    generator = ExpressPromptGenerator(catalog_path)


Next, let's combine ExpressPromptGenerator and ExpressCompiler into one ExpressInferenceFormat which implements AbstractInferenceFormat, so that people can implement other inference formats and reuse this inspect_ai strategy.

Good idea. I'll leave that for another PR though.

jacobsimionato · 2026-06-18T05:50:35Z

+  return results
+
+
+class ExpressDecompiler:


Yes! Love the idea of having the reverse direction too, so we can make sure that the model also sees content through the DSL.

This was partly to make sure that I could "round trip" things to make sure that the express format could support everything that A2UI can. It was also helpful in converting examples in the catalog to use in the prompt.

I think this is also important to handle conversation history, assuming it is stored in the more standardized, stable A2UI format. If the agent is communicating back and forth with A2UI, we want it to always read and write A2UI in the same format, to avoid confusion (and maximise efficiency!). So I think we need the decompiler for this.

Steps for an inference in a multi-turn conversation:

1. Convert all A2UI messages in the existing conversation history to Express format (requires the decompiler).
2. Convert the catalog to Express format.
3. Prompt the agent with a system prompt in Express format, and with the conversation history also in Express format.
4. Parse the Express output back to A2UI (requires the compiler).
5. Persist the conversation history in A2UI format.
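
These steps can be sketched as one function. `decompile`, `compile_express`, and `call_model` are hypothetical stand-ins passed in as callables (they are not the SDK's API), and the catalog is assumed to be converted to Express ahead of time. The point the sketch makes is that Express exists only inside a single turn, while history is stored as A2UI.

```python
from typing import Callable, List

A2uiMessage = dict  # one A2UI wire message, as stored in history


def run_turn(
    history: List[A2uiMessage],
    catalog_express: str,
    user_text: str,
    decompile: Callable[[A2uiMessage], str],
    compile_express: Callable[[str], List[A2uiMessage]],
    call_model: Callable[[str], str],
) -> List[A2uiMessage]:
    # 1. Convert stored A2UI history to Express (needs the decompiler).
    history_express = [decompile(msg) for msg in history]
    # 2-3. Prompt with the catalog (already in Express) plus the Express history.
    prompt = "\n\n".join([catalog_express, *history_express, user_text])
    express_output = call_model(prompt)
    # 4. Parse the model's Express output back to A2UI (needs the compiler).
    new_messages = compile_express(express_output)
    # 5. Persist history in A2UI form only; Express is never stored.
    return history + new_messages
```
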

jacobsimionato · 2026-06-18T05:51:42Z

Do these examples need to be checked in? Could we generate them on demand using the decompiler from the canonical A2UI examples?

We could generate them, yes. That's how I produced them.

jacobsimionato · 2026-06-18T05:52:06Z

Amazing! Should this be checked in though?

No, that is an artifact of the improvement loop I was running. I'll remove it.

jacobsimionato · 2026-06-18T05:54:16Z

Are you intending to land this PR as-is? Any chance of breaking it up a bit, to make it easier to do thorough reviewing?

jiahaog · 2026-06-18T05:15:41Z

+The design of A2UI Express focuses on four main requirements:
+
+- Token footprint reduction. Generative models spend excessive output tokens when producing verbose JSON structures. A2UI Express removes structural keys, brackets, and repeated quotes, reducing output tokens by 55% to 70% compared to native A2UI wire payloads.
+- On-device model optimization. Small local models, such as Gemma 4 E2B and E4B, operate with limited context windows and reasoning budgets. The syntax uses clean positional signatures that fit into prompt contracts without consuming excessive context.


What does "clean positional signatures" refer to?

It just means that the DSL uses argument positions without named arguments. "Clean" is because they don't have the named arguments, I guess.
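
To make "positional signatures" concrete: each component has an ordered list of property names, and the i-th argument binds to the i-th name, with `_` holding a position for an omitted optional property (as in `Column([a, b], _, "center")`). The signature table and property names below are hypothetical; the real order is derived from the catalog schema by `schema_helper.py`.

```python
from typing import Dict, List, Sequence

# Hypothetical signatures; the real positional order comes from the catalog schema.
SIGNATURES: Dict[str, List[str]] = {
    "Text": ["text", "variant"],
    "Column": ["children", "justify", "align"],
}

SKIP = object()  # what the parser would produce for a `_` placeholder argument


def bind_positional(component: str, args: Sequence[object]) -> Dict[str, object]:
    """Bind the i-th positional argument to the i-th property name."""
    names = SIGNATURES[component]
    if len(args) > len(names):
        raise ValueError(
            f"{component} takes at most {len(names)} arguments, got {len(args)}"
        )
    # `_` means "leave this optional property unset"; it only holds the position.
    return {name: arg for name, arg in zip(names, args) if arg is not SKIP}
```

The model writes no property names at all; the prompt contract only has to list each component's argument order once.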

jiahaog · 2026-06-18T05:16:12Z

+
+The syntax supports three literal primitive types:
+
+- Strings are enclosed in straight double quotes, for example `"Enter your name"`.


What about escaping?

And newlines in strings.

Embedded newlines are allowed; the string just has to be properly closed.
I added more rules about quoting and escaping too:

Strings are represented in two formats:

Standard Strings: Enclosed in single double quotes (e.g., "Enter your name") or triple double quotes (e.g., """Line 1\nLine 2"""). Standard strings support common escape sequences: \n (newline), \t (tab), \\ (backslash), and \" (double quote). Embedded newlines are allowed.

Raw Strings: Prefaced by r (e.g., r"^[a-zA-Z]+$" or r"""Raw multi-line content"""). In raw strings, no escape sequences are processed, and backslashes are interpreted as literal characters. This is particularly useful for validation regex patterns containing backslashes. Embedded newlines are allowed.
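
These rules are small enough to sketch in full. This is an illustrative reader, not the regex-based lexer in `compiler.py`: `read_string` is a hypothetical name that takes the source and the index of the opening `r` or `"`, and returns the decoded value plus the index just past the closing quote. Unknown escapes such as `\d` are kept literally, matching the "treat all other escape sequences literally" rule in the later string-quoting commit. Treating the last three quotes of a run as the closing delimiter of a triple-quoted string (so `"""He said "hi""""` ends with a quote) is an assumption about how nested quotes are handled.

```python
from typing import Tuple

ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}


def read_string(src: str, start: int) -> Tuple[str, int]:
    """Decode one string literal that begins at src[start]."""
    raw = src[start] == "r"
    i = start + 1 if raw else start
    triple = src.startswith('"""', i)
    i += 3 if triple else 1
    out = []
    while i < len(src):
        ch = src[i]
        if ch == '"':
            if not triple:
                return "".join(out), i + 1
            j = i
            while j < len(src) and src[j] == '"':
                j += 1
            if j - i >= 3:
                # The last three quotes close the string; any extras are content.
                out.append('"' * (j - i - 3))
                return "".join(out), j
            out.append(src[i:j])  # one or two quotes inside a triple string
            i = j
            continue
        if ch == "\\" and not raw and i + 1 < len(src):
            nxt = src[i + 1]
            # Only \n, \t, \\ and \" are escapes; anything else stays literal.
            out.append(ESCAPES.get(nxt, "\\" + nxt))
            i += 2
            continue
        out.append(ch)  # embedded newlines are kept as-is
        i += 1
    raise SyntaxError("unterminated string literal")
```
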

jiahaog · 2026-06-18T06:55:57Z

+
+## Compilation pipeline
+
+The compilation pipeline runs on the host application. It takes the plain text stream of A2UI Express, processes it, and emits a standard A2UI v1.0 JSON payload.


The compilation pipeline runs on the host application.

Do we need to opine on this? I thought the compiler runs on the agent (and is an implementation detail of the agent). If it runs on the client, we have more problems, like dealing with versioning.

It does. I think "host application" is the agent here. I'll reword it.

jiahaog · 2026-06-18T07:13:13Z

Some more general comments:

For the benchmarks, we should at least use a non-lite model. It's unlikely that a lite model will be sufficient for the "direct" strategy, since the agent needs to do domain-specific things in addition to generating output as A2UI. I'll send some changes to allow us to benchmark multiple providers besides just Gemini too.
Is your intention for express to be an implementation detail of the agent SDK? If not, perhaps we should start with that first (also see my inline comment on where the compiler should run)

gspencergoog · 2026-06-18T16:57:33Z

Are you intending to land this PR as-is? Any chance of breaking it up a bit, to make it easier to do thorough reviewing?

No, I mainly wanted to get your (and Jia Hao's) impressions on it. I'll split it up today into at least two PRs, one with the proposal directory and docs, and one with the Python agent implementation and evals, taking into account both of your review comments.

I was maybe a little overexcited to share the results and it needs some more pruning and cleaning.

gspencergoog · 2026-06-18T17:01:15Z

Some more general comments:

For the benchmarks, we should at least use a non-lite model. It's unlikely that a lite model will be sufficient for the "direct" strategy, since the agent needs to do domain-specific things in addition to generating output as A2UI. I'll send some changes to allow us to benchmark multiple providers besides just Gemini too.

Yes, I agree. I did run the direct model on Lite too, so it's at least comparable, but you're right that there isn't a lot more reasoning overhead left there after building the UIs. I mainly wanted to see 1) that a Lite model could handle the reasoning needed (showing that it wasn't too reasoning heavy), and 2) to see how fast we could go if latency were the only driver.

It works well in Flash 3.5, but is of course slower. I found that limiting the thinking budget helped a bit there.

Is your intention for express to be an implementation detail of the agent SDK? If not, perhaps we should start with that first (also see my inline comment on where the compiler should run)

Yes, that is my intention. The rendering clients should have no idea that Express was ever involved, and just see regular A2UI.

yjbanov · 2026-06-18T18:11:27Z

+username = Text($/username, "caption")
+bio = Text($/bio, "body")
+stats-row = Row([followers-col, following-col, posts-col], "spaceAround")
+followers-col = Column([followers-count, followers-label], _, "center")


Are we sure we want to support names that contain "-" and other special characters? In the future, when models become more advanced, we might want to introduce features that would take advantage of special characters, but allowing them in identifier names could cause ambiguities in the grammar.

Good point. We probably should limit it to the recommendations in https://www.unicode.org/reports/tr31/ so that we can allow Unicode characters in identifiers (I recently added this to A2UI as well).
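
One low-effort way to follow the UAX #31 recommendation, sketched here as an assumption rather than a decided rule, is to reuse Python's identifier definition, which is based on UAX #31 (`XID_Start` or `_`, then `XID_Continue`). `is_valid_identifier` is a hypothetical name; a bare `_` is rejected because Express already uses it as an argument placeholder. Under this rule, names like `main-column` from the examples would have to become `main_column` or `mainColumn`.

```python
def is_valid_identifier(name: str) -> bool:
    """Hypothetical Express identifier rule based on UAX #31 default syntax.

    str.isidentifier() applies Python's identifier rules, which are based on
    UAX #31, so non-ASCII letters are allowed but "-" is not.
    """
    return name != "_" and name.isidentifier()
```
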

yjbanov · 2026-06-18T18:14:01Z

+root = Card(main-column)
+main-column = Column([title-text, markdown-content], _, "stretch")
+title-text = Text("### Markdown Rendering")
+markdown-content = Text("# Heading 1


Another " can almost certainly appear inside a multiline string like this. How do we ensure we can still parse it correctly?

"Naaah, that'll
"never" happen." :-)

Sure, we should specify quoting rules. I really like the """ rules that other languages like Dart have. I want to make sure we can keep the prompt small though, so it's easy to describe, but I don't want to open the door to something like just saying "Use Python string quoting rules" because that opens the door to supporting lots of other syntax (raw string quoting, multiple character escape formats, misinterpreting conflicting string interpolation formats, etc.). Maybe there's a way we can say it that is concise but doesn't mention a language.

e.g. "For quoting strings, surround with a double quote. If a double quote appears in the string, then use """ around the string. Individual double quotes may also be escaped with a preceding backslash. For strings that are raw strings interpreted literally, precede the string with an r: r" or r""" ".

Unlike other languages, one of the driving criteria here is that its rules can be described concisely in English prose.

yjbanov · 2026-06-18T18:17:22Z

+$/now = "2025-12-15T12:00:00Z"
+root = Card(main-column)
+main-column = Column([welcome-text, email-field, phone-field, zip-field, terms-checkbox, submit-btn], _, "stretch")
+welcome-text = Text(formatString("Hello! Today is ${formatDate(value: ${/now}, format: 'EEEE, MMMM d')}."))


Do we need formatString? Dart and JS can format strings without it. Just the ${ syntax is enough. It's also fewer tokens this way.

formatString is implemented in the catalog. This makes it so that a catalog author can implement whatever string formatting they want to use (e.g. you could implement a printf function and use printf formatting instead). It also makes it a lot easier to implement a renderer, since the string formatting isn't part of the specification, and implementing formatString is actually pretty involved (it's reactive, and so generates new strings whenever the interpolated values, including interpolated function calls, change, for instance).

We provide an implementation in the core libraries that catalog writers can leverage so they can just include it.

I can see the argument for including it in the language, but if we do that for Express, it needs to be included in A2UI as well, and implemented in all renderers.

I agree with just using our existing formatString function call approach for the first cut of express, to keep it catalog-agnostic.

But in the future, I think it'd be interesting to pursue these optimizations which make the format catalog-specific in service of greater performance gains.

We might be able to generalize this - JSON render has a concept of "directives" which is something like this - https://json-render.dev/docs/directives.

So perhaps you create a generic ExpressInferenceFormat and then install a directive that transforms ${/number} to formatString(/number) etc
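
A rough sketch of the directive idea, assuming a directive is simply a source-to-source rewrite applied before the generic, catalog-agnostic compiler runs. The function name and the rule (wrap any plain double-quoted literal containing `${` in `formatString(...)`, leave existing `formatString("...")` calls alone) are hypothetical. It works on raw text and ignores raw, triple-quoted, and escaped-quote strings; a real directive would more likely operate on tokens.

```python
import re

# Either an existing formatString("...") call (left alone) or a plain
# double-quoted literal without escapes or embedded newlines.
_LITERAL = re.compile(r'formatString\("[^"\\\n]*"\)|"[^"\\\n]*"')


def format_string_directive(source: str) -> str:
    """Hypothetical directive: wrap "${...}" literals in formatString(...)."""

    def rewrite(match: "re.Match[str]") -> str:
        text = match.group(0)
        if text.startswith("formatString(") or "${" not in text:
            return text  # already explicit, or nothing to interpolate
        return f"formatString({text})"

    return _LITERAL.sub(rewrite, source)
```
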

munificent

@yjbanov asked me to take a look with an eye to language design stuff. I have very little context on the overall problem being solved and I'm definitely not an expert at what kinds of code LLMs are good at writing and reading, so take all of this with a very large grain of salt.

Most of my comments are probably fine, but I figured it's better to bring them up than to potentially miss a problem.

Overall, this makes sense to me. Unlike a syntax where the entire thing is one monolithic tree which isn't meaningfully parsable until you have the whole thing, it gives you a way to break the code into smaller separately handle-able units.

That does raise questions around forward declarations and how names are resolved and managed. Name resolution in general is pretty vague here and is something you'll likely want to be pedantic about. It's a part of language design that has a lot of sharp edges.

Also, an explicit grammar in something like EBNF would be nice to see. I know it's not something that everyone loves, but the exercise of writing it will force you to answer a lot of things that might otherwise be left implicit and then become subtle parser bugs. (For example, it's not clear from the spec here if component definitions can have map literals as arguments or not. Can lists have trailing commas? Be empty? Be empty except for just a comma?)

I'm always excited to see people doing novel language design to approach a problem a different way! :D

munificent · 2026-06-18T19:04:52Z

+
+The design of A2UI Express focuses on four main requirements:
+
+- Token footprint reduction. Generative models spend excessive output tokens when producing verbose JSON structures. A2UI Express removes structural keys, brackets, and repeated quotes, reducing output tokens by 55% to 70% compared to native A2UI wire payloads.


It's surprising that this reduces token size so much. In theory, a directly nested syntax should be more concise than what's proposed here because it avoids repeating path components, as in:

foo/bar/baz/bang/a = 1
foo/bar/baz/bang/b = 2
foo/bar/baz/bang/c = 3
foo/bar/baz/bang/d = 4
foo/bar/baz/bang/e = 5
// 55 tokens ("token" in the PL sense, not LLM sense)

// Versus:
foo(
  bar(
    baz(
      bang(
        a = 1
        b = 2
        c = 3
        d = 4
        e = 5
      )
    )
  )
)
// 27 tokens
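
The two counts can be reproduced with a tiny PL-style tokenizer in which identifiers, numbers, and single punctuation characters each count as one token and comments are left out. The tokenizer is illustrative only and says nothing about LLM tokenization.

```python
import re

# Identifier, integer, or any single non-space, non-word character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")

PAIRS = list(zip("abcde", range(1, 6)))
FLAT = "\n".join(f"foo/bar/baz/bang/{k} = {v}" for k, v in PAIRS)
NESTED = "foo(bar(baz(bang(\n" + "\n".join(f"{k} = {v}" for k, v in PAIRS) + "\n))))"


def count_tokens(source: str) -> int:
    return len(TOKEN.findall(source))
```
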

So is the improvement here really from the flattening, or from not having a nesting syntax that has a lot of other unnecessary boilerplate like quoted key names, argument names, comma separators, etc.?

If so, perhaps it would be beneficial for this notation to allow nesting too?

There were two reasons I stayed away from this kind of nesting:

1. If we want to "stream" user interfaces, nesting like this is hard, since we have to wait until the end, or do some auto-closing magic, in order to render intermediate states. If I use adjacency lists to keep each component definition separate, streaming becomes much nicer: we ignore symbols that we don't recognize yet, and fill them in when they are defined. This lets the LLM be sloppy about the order of its definitions.
2. The LLM doesn't need to keep track of all the nesting paren levels, which it is not terribly good at.

munificent · 2026-06-18T19:07:34Z

+
+- Token footprint reduction. Generative models spend excessive output tokens when producing verbose JSON structures. A2UI Express removes structural keys, brackets, and repeated quotes, reducing output tokens by 55% to 70% compared to native A2UI wire payloads.
+- On-device model optimization. Small local models, such as Gemma 4 E2B and E4B, operate with limited context windows and reasoning budgets. The syntax uses clean positional signatures that fit into prompt contracts without consuming excessive context.
+- Streaming compatibility. The line-oriented grammar allows the client host to parse and build the component hierarchy line-by-line, enabling progressive rendering of the interface before the model finishes its output.


One of the examples below looks like:

root = Card(main-column)
main-column = Column([icon, title, description, actions], _, "center")
icon = Icon($/icon)
title = Text($/title, "h3")
...

Lines here often refer to names declared on later lines. That implies that we can't always process lines as they come in, unless the system can gracefully handle references to unknown entities.

Yes, this is on purpose to let the LLM control streaming behavior. The A2UI renderers already handle this by ignoring symbols that they don't recognize until they are defined, and also caching symbol definitions that aren't yet connected to anything until they get used. This lets us stream in a Column with identifiers for the children, and then fill in the children as they come in, or vice versa and have them all pop in at once when the Column definition arrives.
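
The renderer behavior described here (ignore unknown names, cache unattached definitions, attach everything once the missing piece arrives) can be sketched as a small incremental resolver. `StreamingSurface` and its methods are hypothetical; the sketch only tracks which components are reachable and defined, not how they render.

```python
from typing import Dict, List, Set


class StreamingSurface:
    """Accepts component definitions in any order (hypothetical sketch)."""

    def __init__(self) -> None:
        self.children: Dict[str, List[str]] = {}  # id -> child ids as written

    def define(self, comp_id: str, child_ids: List[str]) -> None:
        # Definitions may arrive before or after the components that use them.
        self.children[comp_id] = child_ids

    def renderable(self, root: str = "root") -> Set[str]:
        """IDs reachable from root whose definitions have arrived so far."""
        seen: Set[str] = set()
        stack = [root]
        while stack:
            comp_id = stack.pop()
            if comp_id in seen or comp_id not in self.children:
                continue  # unknown names are ignored until they are defined
            seen.add(comp_id)
            stack.extend(self.children[comp_id])
        return seen
```
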

munificent · 2026-06-18T19:09:46Z

+
+### Variable declarations
+
+Every component definition is assigned to a unique, alphanumeric variable. The compiler uses these variables to resolve parent-child hierarchies. A reserved variable named `root` acts as the primary entry point for the interface tree.


What happens if a user refers to root? What does this do:

root = Row([root])

Or is it "write-only" in some way?

Related: how are cyclic references handled?

It is an error to have circular references.

The renderers will throw an error back to the agent if they catch circular references. This is a somewhat bad design in that in order to catch these (algorithmically) on the server before they hit the client, the server has to keep track of everything the client has seen. On the other hand, the renderer is the final say as to the actual state of things, so maybe it's appropriate there.
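
Catching cycles on the agent side before sending amounts to a depth-first search over child references. The sketch below is hypothetical and assumes the agent has the complete id-to-child-ids map, which, as noted above, means tracking everything the client has already seen. References to IDs that are not defined yet are skipped, since they cannot close a cycle.

```python
from typing import Dict, List, Optional


def find_cycle(children: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one reference cycle as a list of IDs, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in children}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for child in children.get(node, []):
            state = color.get(child, BLACK)  # undefined IDs cannot form cycles
            if state == GREY:
                return path[path.index(child):] + [child]
            if state == WHITE:
                found = visit(child)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in children:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None
```

For the `root = Row([root])` example, `find_cycle({"root": ["root"]})` reports `["root", "root"]`.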

munificent · 2026-06-18T19:14:52Z

+
+Every component definition is assigned to a unique, alphanumeric variable. The compiler uses these variables to resolve parent-child hierarchies. A reserved variable named `root` acts as the primary entry point for the interface tree.
+
+To eliminate syntax errors from complex bracket structures and enable line-oriented streaming compilation, A2UI Express prohibits inline component nesting. Component constructor calls (e.g., `Text(...)`, `Column(...)`) can **only** appear on the right-hand side of a variable assignment (`var = ComponentName(...)`). They **cannot** be passed directly as positional arguments to other components. Instead, you must declare them separately and reference their variable names.


For what it's worth, I've seen various little DSLs and hobby languages over the years try to stake a claim like this in the name of simplicity (or because their authors aren't comfortable writing a full expression parser) and most usually end up dialing it back over time. It becomes really annoying if you can't do any computation in a nested expression.

If someone wants to do:

root = Framed(app-frame-thickness + (is-android ? android-frame-adjust : ios-frame-adjust) + 4)

Do you really want them to have to write something like:

a = is-android ? android-frame-adjust : ios-frame-adjust
b = app-frame-thickness + a
c = b + 4
root = Framed(c)

If this DSL is really only for authoring component trees, it's probably fine. But you do have literal values and even lists. Presumably it will be useful to add numbers, concatenate strings, or append to lists. Having to hoist all of that out to separate named declarations could get really annoying.

Though if this code will never be written or read by a human... 🤷

Well, exactly, it will not be written by a human. Which is weird.

The prohibition is there more to keep an LLM from writing an entire tree in one expression, so that streaming works better. It forces the LLM to split the tree into a bunch of lines that can be evaluated as they come in.
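
The compiler's inline-unrolling step (listed under Changes as `Row([Text("Soup")]) -> _inline_1`) performs this hoisting mechanically when a model nests anyway. Below is a sketch over an already-parsed tree; the `Call` and `Ref` types and the children-first numbering order are assumptions, not the compiler's internals.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Ref:
    """A reference to another variable, e.g. `_inline_1`."""
    name: str


@dataclass
class Call:
    """A parsed component constructor call, e.g. Text("Soup")."""
    name: str
    args: list  # items are literals, Refs, Calls, or lists of those


Value = Union[str, int, float, Ref, Call, list]


def unroll(var: str, call: Call) -> List[Tuple[str, Call]]:
    """Hoist nested constructor calls into separate `_inline_N` statements."""
    hoisted: List[Tuple[str, Call]] = []
    counter = 0

    def hoist(value: Value) -> Value:
        nonlocal counter
        if isinstance(value, Call):
            flat = Call(value.name, [hoist(a) for a in value.args])  # children first
            counter += 1
            ref = Ref(f"_inline_{counter}")
            hoisted.append((ref.name, flat))
            return ref  # the parent now refers to the hoisted variable
        if isinstance(value, list):
            return [hoist(v) for v in value]
        return value  # literals and existing references pass through

    top = Call(call.name, [hoist(a) for a in call.args])
    hoisted.append((var, top))
    return hoisted
```
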

munificent · 2026-06-18T19:15:10Z

+
+The syntax supports three literal primitive types:
+
+- Strings are enclosed in straight double quotes, for example `"Enter your name"`.


And newlines in strings.

munificent · 2026-06-18T19:30:00Z

+- Client functions are written as `<FunctionName>(<args>)`, matching the exact function names registered in the loaded catalog.
+- If the client catalog contains a text formatting helper (such as `formatString`), it is called explicitly: `welcomeText = Text(formatString("Welcome, ${/user/firstName}!"))`. This prevents failures if a client catalog uses a different naming convention for interpolation.
+- Local actions use this same signature to trigger behaviors, for example `openUrl("https://example.com")`. The compiler maps these to standard client function actions.
+- Server events use a reserved `Event` signature to declare backend actions, for example `Event("save_deal", {rep: $/form/rep})`.


So it seems like map literals can be used as expressions basically anywhere? If so, you probably want to add them to ### Core primitive types.

Yeah, good point, I'll do that.

munificent · 2026-06-18T19:31:41Z

+
+### Validation and logic expressions
+
+Validation checks are defined using the `?` prefix. If a component expects validation rules, the compiler converts these expressions into standard client-side functions:


I don't have enough context to know what "validation" means here. But if the leading ? is just syntactic sugar for calling a function with that name, does it carry its weight?

Validation here is in the context of "form validation", in the sense of wanting to check that a text field contains an email address, for instance. It is handled by defining a client side function to do the validation and return a boolean. Any function that returns a boolean can be used as a validation function.

The ? is just syntactic sugar for calling a boolean-returning function that takes an implicit "value" first argument.

To be honest, I haven't really thought this part through that well. I think probably it could just be a regular function call syntax, but that does have slightly higher (LLM) token size (not much though).

Maybe it should instead be something like

username = TextInput($/form/username, [required(_), regex(_, "^[0-9]{5}$")])

and we can sub in the value for the _. Right now that would be:

username = TextInput($/form/username, [?required, ?regex("^[0-9]{5}$")])
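
Both spellings can desugar to the same call: the field value is either prepended as the implicit first argument (`?regex("...")`) or substituted for `_` (`regex(_, "...")`). In this hypothetical sketch the output shape (`call`/`args`) is a placeholder, not the A2UI v1.0 wire format for checks, and the bound value is shown as a path object like the `{"path": ...}` form mentioned earlier.

```python
from typing import Any, Dict, List

PLACEHOLDER = object()  # stands in for the `_` token in the proposed syntax


def desugar_check(name: str, args: List[Any], value: Any, implicit: bool) -> Dict[str, Any]:
    """Turn a check into a function call with the field value bound.

    implicit=True models `?regex("^[0-9]{5}$")`: the value is prepended.
    implicit=False models `regex(_, "^[0-9]{5}$")`: `_` is replaced.
    """
    if implicit:
        bound = [value, *args]
    else:
        bound = [value if a is PLACEHOLDER else a for a in args]
    return {"call": name, "args": bound}
```
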

munificent · 2026-06-18T19:32:34Z

+
+- Simple checks are written with the function name, for example `?required`.
+- Parameterized checks accept arguments in parentheses, for example `?regex("^[0-9]{5}$", "Must be a valid zip code")`.
+- Multiple checks are grouped in lists: `[?required, ?email]`.


So the system implicitly understands that a list containing validation checks is itself a validation check? What about:

[?required, "oops, not a validation check"]

Would it make more sense to do:

?[required, email]

So the system implicitly understands that a list containing validation checks is itself a validation check? What about:

[?required, "oops, not a validation check"]

This would fail because the string isn't a boolean.

Would it make more sense to do:

?[required, email]

No, the values don't have to be functions, they could also be from the data model.

munificent · 2026-06-18T19:34:30Z

+
+### Line parsing and tokenization
+
+The compiler reads the input text line-by-line. It discards empty lines and parses assignments into tokens.


You do say an assignment can span multiple lines, so it's probably clearer to say that separate top-level assignments or standalone operations may be executed before later ones are parsed. Is that the intent here?

Are there comments?

Yes, that's the intent.

There aren't explicit comments, but the parser does ignore both # and // end-of-line comments, because the LLM sometimes adds them anyway. We don't want to mention or "allow" them, because they just take up tokens we're not going to use.
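A minimal sketch of stripping such comments without touching a `#` or `//` that sits inside a string literal. `strip_line_comment` is a hypothetical name. The sketch assumes only standard double-quoted strings with backslash escapes. Raw and triple-quoted strings, which the compiler also supports, are out of scope here.

```python
def strip_line_comment(line: str) -> str:
    """Drop a trailing `#` or `//` comment that is outside any "..." string.

    Only standard double-quoted strings with backslash escapes are tracked;
    raw and triple-quoted strings are not handled by this sketch.
    """
    in_string = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_string:
            if ch == "\\":
                i += 2  # skip the escaped character, e.g. \"
                continue
            if ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "#" or line.startswith("//", i):
            return line[:i].rstrip()
        i += 1
    return line
```

Tracking string state is what keeps a URL such as `"http://x.com"` or a label such as `"#1 pick"` intact.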

munificent · 2026-06-18T19:35:47Z

+
+If the compiler encounters a syntax error or catalog schema mismatch during parsing, it triggers a structured error recovery workflow:
+
+1. Isolation. The compiler flags the invalid line, discards that sub-branch of the AST, and continues parsing the remaining lines to avoid collapsing the user interface.


What if the offending line is defining some name that is referred to elsewhere on other lines?

The other place will just ignore the undefined value. It might not render because of that, but it would wait until there's a valid value there, which could arrive in an error-correction update.
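A toy Python sketch of the isolation behavior described above. It assumes an oversimplified statement shape (`name = Component(arg, ...)`, with arguments split on commas), and `compile_isolated` and `unresolved` are hypothetical names. A malformed line is reported and skipped, the remaining lines still compile, and a name whose line failed simply stays unresolved until a later correction defines it.

```python
import re
from typing import Dict, List

# Toy statement shape for illustration only: name = Component(arg, ...)
STATEMENT = re.compile(r"\s*([^\W\d]\w*)\s*=\s*([A-Z]\w*)\((.*)\)\s*$")
IDENT = re.compile(r"[^\W\d]\w*")


def compile_isolated(lines: List[str], surface: Dict[str, dict]) -> List[str]:
    """Compile each line independently into `surface`; return error messages.

    A line that fails to parse is reported and skipped, so one bad line
    does not prevent the remaining lines from compiling.
    """
    errors: List[str] = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        match = STATEMENT.match(line)
        if match is None:
            errors.append(f"line {lineno}: {line.strip()}")
            continue
        name, component, raw_args = match.groups()
        args = [a.strip() for a in raw_args.split(",") if a.strip()]
        surface[name] = {"component": component, "args": args}
    return errors


def unresolved(surface: Dict[str, dict]) -> List[str]:
    """Names referenced as arguments that are not (yet) defined."""
    refs = {a for node in surface.values() for a in node["args"] if IDENT.fullmatch(a)}
    return sorted(refs - set(surface))
```

After a first batch in which `body` fails to parse, `root` still compiles and `unresolved` reports `["body"]`. A later `body = Text("fixed")` update clears it. Whether the renderer shows an empty slot or waits in the meantime is not modeled here.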

* Add lexer regexes and token parsing logic for standard triple-quoted strings (using a refined lookahead pattern to support nested quotes) and raw strings (single- or triple-quoted, with zero escape processing).
* Implement strict unescaping for standard strings, resolving only `\n`, `\t`, `\\`, and `\"`, and treating all other escape sequences literally.
* Update prompt generator instructions to include the simplified raw/triple string rules.
* Implement decompiler changes to format string values into the most readable quote style (raw strings for paths/regexes, triple quotes for multi-line or quote-nested strings).
* Update standard catalog example files and evaluation strategy documentation to use the new string formats.
* Add comprehensive test suite covering all string quoting, escaping, and formatting choices.
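The strict unescaping rule in this commit can be sketched in Python as follows. `unescape_standard` is a hypothetical helper. It assumes it receives the body of a standard (non-raw) string with the surrounding quotes already removed.

```python
# Only these four escapes are resolved; anything else stays literal.
_ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}


def unescape_standard(body: str) -> str:
    """Resolve \\n, \\t, \\\\ and \\" and keep every other backslash sequence as-is."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body) and body[i + 1] in _ESCAPES:
            out.append(_ESCAPES[body[i + 1]])
            i += 2
        else:
            out.append(ch)  # includes the backslash of an unknown escape
            i += 1
    return "".join(out)
```

Leaving unknown escapes such as `\d` untouched means regex patterns survive unescaping without double backslashes. Raw strings would bypass this function entirely.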

…mpts to format-agnostic

- Move a2ui.express package to a2ui.experimental.express.
- Rename prompt texts in v1_0_prompts.yaml to be format-agnostic (removing JSON/createSurface terminology).
- Delete regex prompt-rewriting hacks from eval/a2ui_eval/strategies/express.py.
- Add parse_express_response parser helper in python SDK and use it in express strategy solver.
- Move development run_* helper scripts to specification/proposals/express/scripts/ subfolder.
- Remove temporary leaderboard.json artifact.
- Fix Prettier and Pyink formatting across changed files.

…ocks

- Convert all multi-line and long strings in v1_0_prompts.yaml to use literal block scalar notation (| or |-).
- Remove all manual quote escapes and newline characters from the prompt entries.

- Replace legacy references to createSurface and updateComponents in registration, cart, openUrl, and nestedLayout prompts with format-agnostic descriptions.

- Update TOKEN_SPEC lexer rules in compiler.py to support unclosed strings at end-of-stream and separate horizontal/vertical whitespace.
- Shift statement slicing from a raw line-by-line balancer to a token-by-token statement grouper.
- Add is_final parameter to compile/tokenize to manage streaming chunks versus completed inputs.
- Add specification details on string literal variants in a2ui_express.md.
- Add tests for multi-line unescaped parenthesis in strings, parser syntax checks, and unbalanced trailing structures.
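A minimal sketch of token-by-token statement grouping with an `is_final` flag. It assumes the lexer has already produced a flat list of token strings, with line breaks as separate `"\n"` tokens. String literals are single tokens after lexing, so brackets inside them do not affect the depth count. `group_statements` is a hypothetical name, not the compiler's actual function.

```python
from typing import List, Tuple

NEWLINE = "\n"  # assumed token value for a line break
OPENERS = {"(", "[", "{"}
CLOSERS = {")", "]", "}"}


def group_statements(tokens: List[str], is_final: bool) -> Tuple[List[List[str]], List[str]]:
    """Group a token stream into complete statements.

    A statement ends at a NEWLINE seen at bracket depth 0; newlines inside
    brackets are treated as continuation and dropped. Returns
    (complete_statements, pending_tokens). While streaming (is_final=False),
    the trailing, possibly incomplete statement is held back as pending.
    """
    statements: List[List[str]] = []
    current: List[str] = []
    depth = 0
    for tok in tokens:
        if tok == NEWLINE:
            if depth == 0 and current:
                statements.append(current)
                current = []
            continue
        if tok in OPENERS:
            depth += 1
        elif tok in CLOSERS:
            depth = max(depth - 1, 0)
        current.append(tok)
    if is_final and current:
        statements.append(current)
        current = []
    return statements, current
```

While streaming, the caller would prepend the returned pending tokens to the next chunk. On the final call, whatever remains is flushed as a last statement.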

Variable identifiers now strictly conform to the Unicode Identifier standard (UAX #31), allowing Unicode letters, digits, and underscores, but forbidding dashes (-). This prevents naming ambiguity with future expression/subtraction syntax support.

- Update identifier regex pattern to `[^\W\d]\w*` in `compiler.py`.
- Document identifier rules in `a2ui_express.md` and `prompt_generator.py`.
- Convert all example `.a2ui` variables and `.json` component IDs to use underscores instead of dashes.
- Refactor python tests in `test_express.py`.

BREAKING-CHANGE: Separators like dashes (`-`) are no longer permitted in A2UI Express variable names. Existing DSL definitions containing dashes in variables will fail parsing.
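The identifier pattern quoted in this commit can be exercised directly with Python's `re` module. In Python 3, `\w` and `\d` are Unicode-aware for `str` patterns. `is_valid_identifier` is a hypothetical wrapper for illustration.

```python
import re

# Pattern quoted from the commit: [^\W\d] is "a word character that is not a
# decimal digit" (in practice a letter or underscore), then word characters.
IDENTIFIER = re.compile(r"[^\W\d]\w*")


def is_valid_identifier(name: str) -> bool:
    """Return True if the whole of `name` matches the identifier pattern."""
    return IDENTIFIER.fullmatch(name) is not None
```

`fullmatch` matters here: with `match`, `user-name` would be accepted because its prefix `user` is a valid identifier.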

gspencergoog · 2026-06-19T01:11:11Z

@yjbanov asked me to take a look with an eye to language design stuff. I have very little context on the overall problem being solved and I'm definitely not an expert at what kinds of code LLMs are good at writing and reading, so take all of this with a very large grain of salt.

Thank you for taking a look! I really appreciate the feedback.

Overall, this makes sense to me. Unlike a syntax where the entire thing is one monolithic tree which isn't meaningfully parsable until you have the whole thing, it gives you a way to break the code into smaller separately handle-able units.

Yes, we want to be able to stream the data, make corrections, and recover from missing pieces.

That does raise questions around forward declarations and how names are resolved and managed. Name resolution in general is pretty vague here and is something you'll likely want to be pedantic about. It's a part of language design that has a lot of sharp edges.

Okay, point well taken. That makes a lot of sense. I'll see if I can lock that down.

Also, an explicit grammar in something like EBNF would be nice to see. I know it's not something that everyone loves, but the exercise of writing it will force you to answer a lot of things that might otherwise be left implicit and then become subtle parser bugs. (For example, it's not clear from the spec here if component definitions can have map literals as arguments or not. Can lists have trailing commas? Be empty? Be empty except for just a comma?)

Also a great idea. We won't be giving it to the LLM because that's too verbose, but we need it for the compiler and it would help formalize the language.

I'm always excited to see people doing novel language design to approach a problem a different way! :D

Thanks! This one is weird because:

- it's not meant to be authored, or even read, by a human;
- it needs to be easily, accurately, and completely describable in a minimum of English words in a prompt;
- it's throwaway code: we're not going to compile it to machine code, optimize it, etc.; and
- we're actively trying to stay away from making a Turing-complete language so that we can avoid all the security and complexity implications of that.

chenman9226 · 2026-06-22T03:47:35Z

One thing I’m curious about: since Express drops many of the JSON/property keys, have you observed any impact on generation quality from losing those semantic cues? For example, keys like label, value, children, action, or validation seem like they might help the model understand which argument is which, especially for less common components or longer positional signatures.

I can see the token-efficiency benefit, so I’m mostly wondering whether this showed up in practice, or whether the shorter format generally outweighed the loss of those semantic anchors.

gspencergoog · 2026-06-22T16:14:01Z

One thing I’m curious about: since Express drops many of the JSON/property keys, have you observed any impact on generation quality from losing those semantic cues? For example, keys like label, value, children, action, or validation seem like they might help the model understand which argument is which, especially for less common components or longer positional signatures.

I can see the token-efficiency benefit, so I’m mostly wondering whether this showed up in practice, or whether the shorter format generally outweighed the loss of those semantic anchors.

The thing I thought might be a problem, but doesn't appear to be, is that LLMs aren't great at counting, so I thought that positional parameters would be the issue (getting them in the wrong order, or inserting something between them).

The lack of property keys also doesn't seem to affect quality as long as the descriptions from the JSON schema in the catalog are included. For example, if the catalog item for TextField is converted to this:

• TextField(label, value?, placeholder?, variant? (static only), weight? (static only), checks? (static only))
  - label: The text label for the input field.
  - value: The value of the text field.
  - placeholder: The placeholder text for the input field.
  - variant: The type of input field to display. Must be one of: 'longText', 'number', 'shortText', 'obscured'
  - weight: The relative weight of this component within a Row or Column. This is similar to the CSS 'flex-grow' property. Note: this may ONLY be set when the component is a direct descendant of a Row or Column.

Then the LLM has enough context to be able to decide what each argument means when it writes the output. We may have to play with how the catalogs are converted to prompts to minimize the tokens in the prompt, but we will need to include the entire descriptions that are supplied by the catalog developers because they can include important instructions for how to use the components.

In fact, for a while I wasn't including any of the descriptions at all (which saved a lot of input tokens!), and as long as the parameter was something well named and intuitive, the LLM seemed to be able to extrapolate. If anything was vague or unconventional then it started to break down, however.
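A minimal sketch of turning a catalog entry into the kind of signature block shown above. It assumes a JSON-Schema-like dict: property order gives argument order, `required` lists mandatory arguments, and a hypothetical boolean `static` key marks properties that cannot be data-bound. `signature_block` is an illustrative name, not the `ExpressPromptGenerator` API.

```python
from typing import Dict, List


def signature_block(name: str, schema: dict) -> str:
    """Render a component schema as a positional signature plus descriptions."""
    props: Dict[str, dict] = schema.get("properties", {})
    required = set(schema.get("required", []))

    # Optional parameters get a "?" suffix; static-only ones are flagged.
    params: List[str] = []
    for prop, spec in props.items():
        param = prop if prop in required else f"{prop}?"
        if spec.get("static"):
            param += " (static only)"
        params.append(param)

    lines = [f"• {name}({', '.join(params)})"]
    # Descriptions are carried over verbatim; enums are appended as choices.
    for prop, spec in props.items():
        desc = spec.get("description", "")
        if "enum" in spec:
            choices = ", ".join(repr(v) for v in spec["enum"])
            desc = f"{desc} Must be one of: {choices}".strip()
        if desc:
            lines.append(f"  - {prop}: {desc}")
    return "\n".join(lines)
```

Keeping each description verbatim follows the point above: catalog authors' descriptions can carry usage instructions that the model needs.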

Resolved merge conflicts in eval/a2ui_eval/scorers.py and eval/tasks.py. Refactored express evaluation solvers to be dynamic and resolve the catalog path from TaskState metadata at runtime.

gspencergoog · 2026-06-22T21:51:47Z

FYI, I created a stack of three PRs that I split this PR into:

- feat(validation): Support A2UI v1.0 validation and nested reference path checks #1718
  - The core library and validation layers for the A2UI v1.0 and Express compiler outputs. It introduces support for v1.0 schemas and enhances component reference resolution for nested slot structures.
- feat(express): Implement A2UI Express compiler, decompiler and parser gspencergoog/A2UI#3
  - The A2UI Express technical specification, introducing the A2UI Express format. It includes the compiler, decompiler, schema helper, and parser modules, as well as an EBNF description of the format.
- feat(eval): Integrate A2UI Express into evaluation suite gspencergoog/A2UI#4
  - Integrates the A2UI Express compiler and prompt strategies into the Inspect-ai evaluation suite. It updates the scorers, solvers, and task configurations to support v1.0 and Express DSL targets.

The last two are stacked diffs: the second PR is a diff against the first, and the third is a diff against the second. The last two are PRs on my fork, not on the main repo, until the first one lands; then I'll change their target to the main repo and land them too.

I converted this PR back to a draft and will close it once the other PRs land.

…oducing a highly compressed, model-optimized declarative syntax (DSL) for generative user interfaces. It includes the compiler, decompiler, schema helper, and parser modules. This contains the A2UI Express compiler/decompiler portions of a2ui-project#1678, with some additional issues fixed, additional tests, and refinements.

* **Compiler & Parser**: Implemented the `ExpressCompiler` and `Parser` in `a2ui.experimental.express` to parse line-oriented DSL and compile it into standard A2UI v1.0 JSON. Supports standard strings, raw strings (`r"..."`), raw multi-line strings (`r"""..."""`), and partial streaming recovery.
* **Strict Enum Validation**: Added strict validation for component property enums to raise ValueError on invalid inputs instead of silently ignoring them.
* **Event Context Compilation**: Simplified event context processing to avoid redundant compilation.
* **Decompiler**: Implemented `ExpressDecompiler` to convert standard v1.0 JSON payloads back into compact Express DSL.
* **Schema Helper & Prompt Generator**: Implemented `ExpressPromptGenerator` to compile active catalog schemas into positional signatures used by generative models.
* **Examples**: Added 36 `.a2ui` layout examples and corresponding compiled `.json` targets.
* **Format Checks**: Integrated `pyink` style verification for specification proposals.

The feature is fully experimental and gated behind the `A2UI_EXPRESS_ENABLED=true` environment variable. It does not affect any stable production paths.

* Added 44 comprehensive unit tests in `tests/express/` covering parser correctness, thread-safe compilation, raw string escaping, strict enum validation, and round-trip integrity.

…s into the Inspect-ai evaluation suite. It updates the scorers, solvers, and task configurations to support v1.0 and Express DSL targets. This contains the portions of a2ui-project#1678 that integrate A2UI Express into the evaluation suite, with some additional issues fixed, additional tests, and refinements.

Changes:

* **Inspect Solver**: Implemented the `express` solver strategy in `eval/a2ui_eval/strategies/express.py` to rewrite prompts for Express DSL targets and extract `<a2ui>` sentinel blocks.
* **Inspect Scorer**: Updated `a2ui_scorer` to support v1.0 and compile generated Express outputs before validating them against the schema.
* **Datasets**: Added the translated `v1_0_prompts.yaml` dataset containing prompt targets updated for v1.0 component requirements.
* **Documentation**: Checked in `express_dsl_examples.md` describing component signatures for model context.
* **Unit Tests**: Added `test_strategies.py` and updated CI test runs to verify the evaluation strategies.

Impact & Risks: None. This is an evaluation-only integration and does not affect runtime SDK paths.

Testing:

* Added 23 integration tests covering dataset loading, scoring, and solver rewriting. All tests pass successfully.

…#1726)

## Summary

This PR implements the A2UI Express technical specification, introducing A2UI Express — a highly compressed, model-optimized declarative syntax (DSL) for generative user interfaces. It provides a complete end-to-end Python implementation, including an ANTLR-based compiler, a decompiler, a schema-based system prompt generator, helper scripts, and comprehensive test suites. This is a refined, standalone extraction of the A2UI Express compiler/decompiler portions originally proposed in PR #1678, incorporating automated parser generation, strict validation, and extensive test coverage.

## Changes

* **Build System & Code Generation**:
  * Added Hatch build hook in [pack_specs_hook.py](file:///Users/gspencer/code/a2ui/main/agent_sdks/python/a2ui_agent/pack_specs_hook.py) to automatically compile the ANTLR grammar [Express.g4](file:///Users/gspencer/code/a2ui/main/agent_sdks/python/a2ui_agent/src/a2ui/experimental/express/Express.g4) into Python3 source files at build-time.
  * The build hook handles target case-insensitive file renaming to clean snake_case (`express_lexer.py`, `express_parser.py`, `express_visitor.py`), relative import post-processing, and automatic formatting of generated code with `pyink`.
  * Updated [pyproject.toml](file:///Users/gspencer/code/a2ui/main/agent_sdks/python/a2ui_agent/pyproject.toml) to include `antlr4-python3-runtime` as a runtime dependency, and `antlr4-tools` in the build system requirements.
* **Compiler & Parser** (`a2ui.experimental.express`):
  * Implemented an ANTLR-based parsing pipeline using `Express.g4` to parse line-oriented declarative layout files.
  * The `ExpressCompiler` compiles the AST directly into standard A2UI v1.0 JSON payloads (with dynamic positional parameter resolution and variable flattening).
  * Supports rich string types: standard strings, raw strings (`r"..."`), raw multiline strings (`r"""..."""`), and escaped carriage returns.
  * Integrates a partial parser mode supporting streaming recovery for incomplete layouts.
  * Incorporates strict enum validation for component properties, raising `ValueError` on mismatch rather than silently ignoring invalid values.
* **Decompiler**:
  * Implemented `ExpressDecompiler` to convert standard A2UI v1.0 JSON payloads back into the highly compact, line-oriented Express DSL.
* **Schema Helper & Prompt Generator**:
  * Implemented `CatalogSchemaHelper` to parse catalog schema definitions.
  * Implemented `ExpressPromptGenerator` to compile active catalog schemas into positional signatures used to prompt generative models.
* **Evaluation & Testing Scripts**:
  * Added `run_inference.py` to evaluate the A2UI Express prompt contract by converting JSON examples to Express DSL via Gemini/Ollama/MLX models and validating the round-trip compilation.
  * Added `recreate_dsl_examples.py` to programmatically regenerate the dynamic markdown documentation.
* **Documentation & Examples**:
  * Added comprehensive layout examples under `specification/proposals/express/examples/*.a2ui` (36 files) along with their corresponding compiled JSON targets.
  * Created [README.md](file:///Users/gspencer/code/a2ui/main/specification/proposals/express/README.md) and [a2ui_express.md](file:///Users/gspencer/code/a2ui/main/specification/proposals/express/a2ui_express.md) detailing the DSL grammar, compiler mechanics, and usage.
  * Created [express_dsl_examples.md](file:///Users/gspencer/code/a2ui/main/specification/proposals/express/express_dsl_examples.md) detailing the active system prompt contract and compiled weather forecast examples.

## Impact & Risks

* The feature is fully experimental, contained in the `a2ui.experimental.express` namespace, and gated behind the `A2UI_EXPRESS_ENABLED=true` environment variable.
* There is no impact on stable production paths or other existing SDK modules.
* Build-time code generation introduces a dependency on `antlr4` (via `antlr4-tools` and `antlr4-python3-runtime`) during development/builds, which is automatically resolved by Hatch and standard pip/uv environments.

## Testing

* Added 44 robust unit tests under `agent_sdks/python/a2ui_agent/tests/express/` including:
  * [test_compiler.py](file:///Users/gspencer/code/a2ui/main/agent_sdks/python/a2ui_agent/tests/express/test_compiler.py): Verifies parser correctness, token parsing, raw string handling, and carriage return unescaping.
  * [test_decompiler.py](file:///Users/gspencer/code/a2ui/main/agent_sdks/python/a2ui_agent/tests/express/test_decompiler.py): Validates round-trip integrity (JSON -> Express -> JSON).
  * [test_integration.py](file:///Users/gspencer/code/a2ui/main/agent_sdks/python/a2ui_agent/tests/express/test_integration.py): Tests the compiler against all 36 catalog layout examples.
  * [test_cli_tools.py](file:///Users/gspencer/code/a2ui/main/agent_sdks/python/a2ui_agent/tests/express/test_cli_tools.py): Tests script interfaces and prompt generation.
* The tests can be executed via the standard Dart/Python test runners (e.g. `uv run pytest`).

github-project-automation Bot added this to A2UI Jun 16, 2026

github-project-automation Bot moved this to Todo in A2UI Jun 16, 2026

gspencergoog force-pushed the a2ui_express_base branch from 7d7fdeb to e242e09 June 16, 2026 21:27

gspencergoog requested a review from jiahaog June 16, 2026 23:04

gspencergoog force-pushed the a2ui_express_base branch from f696fd7 to cc1114d June 18, 2026 01:07

gspencergoog force-pushed the a2ui_express_base branch from dfcec4a to 71b9534 June 18, 2026 01:23

gspencergoog marked this pull request as ready for review June 18, 2026 01:23

gspencergoog requested a review from jacobsimionato June 18, 2026 01:26

gemini-code-assist Bot reviewed Jun 18, 2026

jacobsimionato reviewed Jun 18, 2026

jiahaog reviewed Jun 18, 2026

yjbanov reviewed Jun 18, 2026

munificent reviewed Jun 18, 2026

gspencergoog force-pushed the a2ui_express_base branch from 46b3073 to 6c527e0 June 18, 2026 19:48

gspencergoog added 4 commits June 18, 2026 15:27

style(express): convert prompts dataset to literal multi-line YAML bl…

45d5233

style(express): clean final low-level payload terms from prompts

f7da8dd

gspencergoog force-pushed the a2ui_express_base branch from 40c7583 to ce0af3c June 18, 2026 23:50

gspencergoog added 2 commits June 22, 2026 09:30

Fix static types that aren't static types

a9b6345

Merge branch 'main' into a2ui_express_base

5bbf529

gspencergoog marked this pull request as draft June 22, 2026 20:52

gspencergoog mentioned this pull request Jun 23, 2026

feat(express): Implement A2UI Express compiler, decompiler and parser #1726

Merged


The syntax supports three literal primitive types:

- Strings are enclosed in straight double quotes, for example `"Enter your name"`.


## Compilation pipeline

The compilation pipeline runs on the host application. It takes the plain text stream of A2UI Express, processes it, and emits a standard A2UI v1.0 JSON payload.


The design of A2UI Express focuses on four main requirements:

- Token footprint reduction. Generative models spend excessive output tokens when producing verbose JSON structures. A2UI Express removes structural keys, brackets, and repeated quotes, reducing output tokens by 55% to 70% compared to native A2UI wire payloads.


### Variable declarations

Every component definition is assigned to a unique, alphanumeric variable. The compiler uses these variables to resolve parent-child hierarchies. A reserved variable named `root` acts as the primary entry point for the interface tree.


To eliminate syntax errors from complex bracket structures and enable line-oriented streaming compilation, A2UI Express prohibits inline component nesting. Component constructor calls (e.g., `Text(...)`, `Column(...)`) can only appear on the right-hand side of a variable assignment (`var = ComponentName(...)`). They cannot be passed directly as positional arguments to other components. Instead, you must declare them separately and reference their variable names.
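Because every component is a named assignment, building the tree is a matter of following variable names outward from `root`. A minimal sketch, assuming the definitions have already been parsed into a `name -> (component, children)` mapping. `flatten_from_root` and the output dict shape are illustrative, not the exact A2UI v1.0 wire format.

```python
from collections import deque
from typing import Dict, List, Tuple


def flatten_from_root(defs: Dict[str, Tuple[str, List[str]]]) -> List[dict]:
    """Emit components reachable from `root`, using variable names as IDs.

    `defs` maps each variable name to (component type, child variable names).
    Children that are not defined yet are skipped rather than treated as errors.
    """
    out: List[dict] = []
    seen = set()
    queue = deque(["root"])
    while queue:
        name = queue.popleft()
        if name in seen or name not in defs:
            continue  # already emitted, or not (yet) defined
        seen.add(name)
        component, children = defs[name]
        out.append({"id": name, "component": component, "children": list(children)})
        queue.extend(children)
    return out
```

Definitions that are never reachable from `root` are not emitted. The `seen` set keeps a repeated or cyclic reference from being emitted twice.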


### Validation and logic expressions

Validation checks are defined using the `?` prefix. If a component expects validation rules, the compiler converts these expressions into standard client-side functions:

