Research / 2025

SpecifyUI - Supporting Iterative UI Design Intent Expression through Structured Specifications and Generative AI

Yunnong Chen, Chengwei Shi, Liuqing Chen

UI design
SPEC
intermediate representation
multimodal
RAG
LLM
HCI
code generation

In brief

The paper introduces SPEC, a structured intermediate representation for UI design that externalizes designer intent and enables controllable, iterative generation. Built on SPEC, the SpecifyUI system extracts specs from references, supports targeted edits, and renders high-fidelity UIs with a multi-agent pipeline.

Executive Summary

This work tackles two persistent gaps in LLM-assisted UI creation: making designer intent explicit and preserving control across iterations. The authors propose SPEC, a hierarchical, parameterized representation that captures global style and page composition, then present SpecifyUI, an interactive system that extracts SPEC from reference UIs, supports scoped edits, and deterministically maps SPEC to executable code. Quantitative benchmarks and a user study with 16 designers show higher fidelity to reference intent and stronger controllability than prompt-based baselines and a commercial tool.

Key Technical Advancements

SPEC schema for global plus page composition. SPEC formalizes design intent in two layers. The Global UI Specification encodes layout structure, color system, shape language, and usage scenario, while Page Composition recursively decomposes pages into sections and components with parameterized attributes and semantic tags. This hybrid of numeric fields and short semantic labels makes intent both machine-actionable and designer-friendly.
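To make the two-layer structure concrete, here is a minimal sketch of how such a schema could be modeled. The four global fields come from the description above; the class names, the `kind` values, and every attribute key in the example (`corner_radius`, `columns`, and so on) are illustrative assumptions, not the paper's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class GlobalSpec:
    """Global UI Specification: the four facets named in the summary."""
    layout_structure: str
    color_system: Dict[str, str]
    shape_language: str
    usage_scenario: str


@dataclass
class Node:
    """One node of the Page Composition tree (hypothetical shape)."""
    kind: str  # assumed to be "section" or "component"
    name: str
    attributes: Dict[str, Union[float, str]] = field(default_factory=dict)
    tags: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


@dataclass
class Spec:
    global_spec: GlobalSpec
    page: Node


def count_components(node: Node) -> int:
    """Recursively count component nodes below (and including) `node`."""
    own = 1 if node.kind == "component" else 0
    return own + sum(count_components(child) for child in node.children)


# Illustrative instance: numeric fields sit next to short semantic labels.
spec = Spec(
    global_spec=GlobalSpec(
        layout_structure="single-column",
        color_system={"primary": "#1677ff", "background": "#ffffff"},
        shape_language="rounded",
        usage_scenario="saas-landing-page",
    ),
    page=Node("section", "page", children=[
        Node("section", "hero", tags=["prominent"], children=[
            Node("component", "cta_button",
                 {"corner_radius": 8, "emphasis": "primary"}),
        ]),
        Node("section", "pricing", children=[
            Node("component", "price_card", {"columns": 3}),
            Node("component", "price_card", {"columns": 3}),
        ]),
    ]),
)
```

The recursive `children` field is what lets a page decompose into sections and then components, which is the property the summary attributes to Page Composition.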

Automated SPEC construction from references. The pipeline segments reference screenshots into coherent regions with a trained Co-DETR detector, then uses a multimodal LLM to produce Region SPEC Units that enumerate layout, components, roles, and styles. A separate pass extracts a Global Design Profile from the full page and integrates it with region outputs to form a unified SPEC.
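The integration step can be sketched roughly as follows. The detector and the multimodal LLM are stood in for by plain callables, and the reading-order sort, the `region_index` and `bbox` keys, and the output layout are all assumptions made for illustration; the summary does not describe how the merge is actually performed.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels, assumed


def build_spec(
    boxes: List[Box],
    describe_region: Callable[[Box], Dict],
    describe_global: Callable[[], Dict],
) -> Dict:
    """Merge per-region units with a page-level profile into one SPEC dict.

    `boxes` stands in for detector output, `describe_region` for the
    multimodal LLM call on one cropped region, and `describe_global` for
    the separate full-page pass. All three are hypothetical placeholders.
    """
    # Assumed ordering: top-to-bottom, then left-to-right.
    ordered = sorted(boxes, key=lambda box: (box[1], box[0]))
    units = []
    for index, box in enumerate(ordered):
        unit = dict(describe_region(box))  # copy so the caller's dict is untouched
        unit["region_index"] = index
        unit["bbox"] = box
        units.append(unit)
    return {"global": describe_global(), "page": {"sections": units}}


# Stub usage with canned answers in place of real model calls.
canned = {
    (0, 0, 1280, 80): {"role": "navbar"},
    (0, 80, 1280, 400): {"role": "hero"},
}
unified = build_spec(
    boxes=[(0, 80, 1280, 400), (0, 0, 1280, 80)],
    describe_region=lambda box: canned[box],
    describe_global=lambda: {"shape_language": "rounded"},
)
```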

Direct, scoped editing via structured operations. User edits, expressed in natural language or by selecting reference elements, are translated into triplets <operation, path, value> that target precise SPEC nodes. A guarded application loop validates edits and triggers repair on failures, preserving structure while enabling global, regional, or component-level changes without unintended drift.
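A guarded loop over such triplets might look like the sketch below. It treats the SPEC as a nested dictionary and paths as slash-separated keys; the operation names (`set`, `add`, `remove`), the path syntax, and the `repair` callback are assumptions for illustration, since the summary only states that edits are triplets and that failures trigger repair.

```python
import copy

ALLOWED_OPS = {"set", "add", "remove"}  # assumed operation vocabulary


def apply_edit(spec, edit):
    """Apply one (operation, path, value) triplet; return a new SPEC dict."""
    op, path, value = edit
    if op not in ALLOWED_OPS:
        raise ValueError(f"unknown operation: {op}")
    keys = path.split("/")
    new_spec = copy.deepcopy(spec)  # never mutate the caller's SPEC
    parent = new_spec
    for key in keys[:-1]:
        if not isinstance(parent, dict) or key not in parent:
            raise KeyError(f"path not found: {path}")
        parent = parent[key]
    if not isinstance(parent, dict):
        raise KeyError(f"path not found: {path}")
    leaf = keys[-1]
    if op == "add":
        if leaf in parent:
            raise KeyError(f"node already exists: {path}")
        parent[leaf] = value
    elif leaf not in parent:
        raise KeyError(f"path not found: {path}")
    elif op == "set":
        parent[leaf] = value
    else:  # "remove"
        del parent[leaf]
    return new_spec


def apply_edits_guarded(spec, edits, repair=None, max_repairs=1):
    """Validate and apply edits in order; ask `repair` to fix failing edits."""
    current = spec
    for edit in edits:
        attempts = 0
        while True:
            try:
                current = apply_edit(current, edit)
                break
            except (KeyError, ValueError) as err:
                if repair is None or attempts >= max_repairs:
                    raise
                edit = repair(edit, err)  # e.g. a model call that rewrites the triplet
                attempts += 1
    return current


base = {
    "global": {"color": {"primary": "#1677ff"}},
    "page": {"hero": {"cta": {"radius": 4}}},
}
edited = apply_edits_guarded(base, [("set", "page/hero/cta/radius", 12)])
```

Because each edit names a single path, everything outside that path is copied through unchanged, which is one way to read the claim that scoped edits avoid unintended drift.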

Multi-agent UI code generation with retrieval and self-correction. Final SPECs are rendered by a generator based on Qwen3-Coder, augmented with retrieval from a curated SPEC–code database and a debug agent that compiles and feeds error reports back for revision. The team assembled approximately 2,500 web UIs and retained 2,000 SPEC–code pairs after validation to ground generation. React is the target framework with Ant Design components and ECharts for visualization.
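The generate, compile, and revise cycle can be sketched as a small control loop. The three callables below are hypothetical stand-ins for the retrieval step, the Qwen3-Coder-based generator, and the debug agent's compile check; their signatures and the round limit are assumptions, not details from the paper.

```python
def render_with_self_correction(spec, retrieve, generate, compile_check,
                                max_rounds=3):
    """Generate code for `spec`, feeding compile errors back until it builds.

    retrieve(spec) -> list of reference SPEC-code examples
    generate(spec, examples, feedback) -> source code string
    compile_check(code) -> list of error messages (empty means success)
    Returns (code, rounds_used).
    """
    examples = retrieve(spec)  # grounding is fetched once, up front
    feedback = None
    for round_no in range(1, max_rounds + 1):
        code = generate(spec, examples, feedback)
        errors = compile_check(code)
        if not errors:
            return code, round_no
        feedback = errors  # the next round sees the error report
    raise RuntimeError(f"no compiling output after {max_rounds} rounds")


# Stub agents: the first draft is broken, the revision compiles.
def fake_generate(spec, examples, feedback):
    if feedback is None:
        return "function Page() { return <div> }"
    return "export default function Page() { return <div /> }"


def fake_compile_check(code):
    return [] if code.startswith("export default") else ["missing default export"]


code, rounds = render_with_self_correction(
    spec={"page": {}},
    retrieve=lambda spec: [],
    generate=fake_generate,
    compile_check=fake_compile_check,
)
```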

Measured fidelity gains over prompt baselines. Against three prompting baselines, the integrated SPEC pipeline achieves an MSE of 40.99, a CLIP score of 0.887, and an SSIM of 0.854, improving on reconstruction error (MSE) as well as semantic (CLIP) and structural (SSIM) similarity. Ablations attribute the gains to structure, retrieval grounding, and region cropping, which act as complementary strategies.
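Of the three metrics, MSE is the simplest to state: the mean of squared per-pixel differences between the reference and the generated rendering, where lower is better. The sketch below assumes equally sized grayscale images given as nested lists; the summary does not say how the authors resized, color-converted, or scaled the screenshots, so it should not be read as their exact procedure.

```python
def mean_squared_error(reference, generated):
    """Mean of squared per-pixel differences between two equal-sized images.

    Both images are lists of rows of numeric pixel intensities
    (grayscale is an assumption made for this sketch).
    """
    same_height = len(reference) == len(generated)
    if not same_height or any(
        len(ref_row) != len(gen_row)
        for ref_row, gen_row in zip(reference, generated)
    ):
        raise ValueError("images must have identical dimensions")
    total = 0.0
    count = 0
    for ref_row, gen_row in zip(reference, generated):
        for ref_px, gen_px in zip(ref_row, gen_row):
            total += (ref_px - gen_px) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    return total / count


reference_image = [[0, 255], [10, 10]]
generated_image = [[0, 250], [10, 14]]
error = mean_squared_error(reference_image, generated_image)  # (0 + 25 + 0 + 16) / 4 = 10.25
```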

Practical Implications and Use Cases

Technical impact. SPEC provides a stable interface between human intent and model behavior, turning free-form prompting into parameter updates. Designers can compose from multiple references, borrowing layout from one and style from another, while the system preserves coherence and deterministically regenerates UI code, reducing rework and code handoff friction.

Design and UX implications. By exposing a visual hierarchy aligned to SPEC, SpecifyUI supports the natural coarse-to-fine rhythm of professional workflows. Reference-driven generation jumpstarts early exploration, and SPEC-based edits enable precise, non-destructive refinement of layout, components, and style with far less text entry.

Strategic implications. Specification-driven pipelines encourage reuse and standardization by linking to component libraries and domain templates. They also point toward production-oriented co-creation where AI outputs are not static mockups but structured, interoperable artifacts that flow into engineering.

Challenges and Limitations

Upfront overhead and communication friction. Building an explicit SPEC adds effort during initial generation, though it pays back in faster refinement. Participants rated intent communication as improved yet still noted that expressing early, fuzzy ideas remained easier through quick prompting, indicating a tradeoff between speed of ideation and precision of control.

Modalities and scope. Current interactions center on references and text; the paper argues for sketch-based and other non-verbal inputs to capture spatial and stylistic intent more naturally. The present pipeline focuses on single-screen designs, with future work needed on page-level interactions, data binding, and multi-page navigation.

Future Outlook and Considerations

The authors envision extending SPEC as a shared collaboration layer spanning UI understanding, retrieval, code generation, and debugging, with richer input channels like sketches and direct manipulation. Integrating established design systems and domain knowledge, plus expanding beyond single pages to connected prototypes, would further tighten the designer–developer loop and push toward deployable artifacts.

Conclusion

The paper reframes LLM-assisted UI design from one-shot prompting to specification-driven iteration. With SPEC as the intermediate representation and an accompanying system that extracts, edits, and renders specs, SpecifyUI improves fidelity to intent and gives designers reliable, scoped control over changes, demonstrating measurable gains over prompt baselines and a commercial benchmark.