Research / 2025
The role of large language models in UI/UX design: A systematic literature review
In brief
A systematic review of 38 peer-reviewed studies (2022–2025) mapping how LLMs are integrated across the UI/UX lifecycle, the best practices that enable effective use, and the limitations that still constrain reliability, creativity, and adoption.
Executive Summary
The paper synthesizes current evidence on how large language models are changing UI/UX work, from ideation and prototyping to evaluation and refinement. It identifies the most common LLMs and interaction patterns, distills recurring best practices for integrating models into design tools and workflows, and surfaces seven categories of risks that must be addressed to realize trustworthy, inclusive, and effective human-AI collaboration in design.
Key Technical Advancements
Prompt-based prototyping as a core interaction pattern
The review finds natural-language prompting to be the near-universal interface between designers and LLMs. Systems largely rely on zero-shot and few-shot prompting, chain-of-thought for task decomposition, and retrieval-augmented grounding. Typical outputs include code generation for HTML/CSS/JS/SVG, UI components and microcopy, heuristic and accessibility analyses, and persona or content generation.
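The few-shot pattern above can be sketched as plain prompt assembly. This is an illustrative sketch, not a system from the surveyed corpus: the function name, example intents, and HTML snippets are assumptions chosen for the sketch.

```python
# Illustrative few-shot prompt assembly for UI component generation.
# The examples pair a design intent with the HTML the model should emit;
# the new request is appended in the same "Intent:/HTML:" shape.

FEW_SHOT_EXAMPLES = [
    ("A primary call-to-action button labeled 'Sign up'",
     '<button class="btn btn-primary" type="button">Sign up</button>'),
    ("A dismissible warning banner",
     '<div class="alert alert-warning" role="alert">Check your input. '
     '<button class="close" aria-label="Close">&times;</button></div>'),
]

def build_ui_prompt(request: str, examples=FEW_SHOT_EXAMPLES) -> str:
    """Compose a few-shot prompt pairing design intents with HTML output."""
    parts = ["You are a UI assistant. Return accessible HTML only."]
    for intent, html in examples:
        parts.append(f"Intent: {intent}\nHTML: {html}")
    # Leave the final HTML slot empty for the model to complete.
    parts.append(f"Intent: {request}\nHTML:")
    return "\n\n".join(parts)

prompt = build_ui_prompt("A search input with a visible label")
```

Keeping the examples in a shared list is what lets a prompt like this be versioned and reused like a design template, a point the review returns to later.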
End-to-end integration across the UI/UX lifecycle
LLMs are embedded beyond ideation into prototyping, simulation, evaluation, and iterative refinement. They participate as design collaborators from early research through post-prototype usability work, rather than acting only as back-end automation.
Multimodality and real-time interaction
Vision-language and multimodal pipelines process images, screenshots, video, and audio to provide context-aware outputs. Examples include layout suggestions from UI screenshots, live captioning and planning during meetings or VR, and hybrid stacks that pair LLMs with computer vision or OCR to improve semantic understanding of interfaces.
Modular and iterative workflows
Effective systems decompose design tasks into interpretable stages and specialized agents, combining natural-language prompting with GUI manipulation. This modularity improves controllability, reduces token overhead, isolates failures, and supports rapid iteration with designer oversight.
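The decomposition pattern can be sketched as a minimal staged pipeline. The stage names (understand, generate, critique) and the dict-based contract are assumptions for illustration, not a specific surveyed system; the point is that each stage is small, inspectable, and logged so a designer can intervene between steps.

```python
# Minimal sketch of a modular design pipeline: each stage transforms a
# shared context dict and is recorded, so failures can be isolated to a
# stage and iteration stays traceable.
from typing import Callable, Dict, List

Stage = Callable[[Dict], Dict]

def understand(ctx: Dict) -> Dict:
    # Toy requirement extraction standing in for an LLM analysis step.
    ctx["requirements"] = [w for w in ctx["brief"].split() if w.isalpha()]
    return ctx

def generate(ctx: Dict) -> Dict:
    # Toy generation step standing in for prompted code/UI output.
    ctx["draft"] = f"<section>{' '.join(ctx['requirements'])}</section>"
    return ctx

def critique(ctx: Dict) -> Dict:
    # Toy heuristic check standing in for an LLM critique step.
    ctx["issues"] = [] if "<section>" in ctx["draft"] else ["missing landmark"]
    return ctx

def run_pipeline(brief: str, stages: List[Stage]) -> Dict:
    ctx: Dict = {"brief": brief, "log": []}
    for stage in stages:
        ctx = stage(ctx)
        ctx["log"].append(stage.__name__)  # audit trail per stage
    return ctx

result = run_pipeline("login form with email field",
                      [understand, generate, critique])
```

Because stages are plain functions, a team can swap one out, rerun from a midpoint, or place a human review between any two steps, which is the controllability the review attributes to modular designs.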
Human-AI collaboration and accountability
LLMs increasingly act as co-creators that ideate, critique, and even help simulate users in testing. Studies emphasize inclusive design support such as accessibility checks, bias and harm identification, and mechanisms for explanation and evaluation that align with core UX principles of control and feedback.
Practical Implications and Use Cases
Technical impact
Embedding LLMs inside existing tools and environments lowers adoption friction and preserves context. Integrations with platforms like Figma, conferencing and collaboration tools, or visual programming interfaces enable real-time feedback, editable pipelines, and traceable iteration, turning prompts into reusable assets akin to design templates.
Design and UX implications
Designers benefit when prompts are treated as design artifacts that can be sketched, refined, versioned, and shared. Multimodal inputs expand what “context” means in usability, enabling systems that respond to what users say, see, and do. As LLMs become co-creators, explainability features, dismissible suggestions, and visible error localization help maintain trust and reduce cognitive disruption.
Strategic implications
Organizations should prepare for hybrid roles that blend design, machine learning literacy, and ethics. Governance and evaluation practices need to evolve toward transparent, inclusive, and auditable workflows, with shared prompt libraries, dataset stewardship, and standards for assessing creativity, accessibility, and bias in AI-assisted outputs.
Challenges and Limitations
Hallucinations and reliability
Generated UI elements, critiques, or code can be incorrect or fabricated, undermining validity and requiring manual verification. Output quality remains inconsistent across similar prompts.
Prompt sensitivity and instability
Results depend heavily on phrasing and iterative tuning. Non-determinism limits reproducibility and complicates systematic improvement of designs.
Ambiguity and context limits
Text-only models struggle with visual and spatial UI context, multi-screen flows, or persistent interaction history. Token limits and weak memory impair reasoning over richer artifacts.
Creativity constraints and over-reliance
Outputs tend to converge on conventional patterns, which can narrow exploration if designers anchor too early on AI suggestions. Over-dependence risks skill stagnation.
Trust, transparency, and interpretability
Black-box behavior hinders debugging and can create a false sense of precision when presented in polished interfaces. Designers need insight into why suggestions were produced.
Ethical, privacy, and legal concerns
Risks include data privacy exposure, unclear ownership of generated content, biased training data, and insufficient governance and auditability for accountability and fairness.
Tooling and integration gaps
Seamless connections to widely used design tools, real-time collaboration, error recovery, and iteration tracking remain immature. Hardware and deployment constraints also limit practicality.
Future Outlook and Considerations
The review recommends building validation and explanation into tools, such as confidence indicators, rationale summaries, bias indicators, and standardized usability metrics that make model behavior legible. Structured prompt support—grammars, templates, and interactive copilots—can reduce trial-and-error overhead while improving consistency across teams. Ethical safeguards should be embedded early through dataset governance, auditing, and inclusivity checks, and the field would benefit from benchmarks for creativity, usability, accessibility, and bias in real-world UI/UX contexts. As LLMs become more deeply integrated, modular and adaptive workflows with human-in-the-loop oversight, coupled with organizational readiness and cross-functional training, will be essential for responsible scale.
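One form structured prompt support can take is a shared template with required slots that fails fast when context is missing, so teams get consistent prompts instead of ad-hoc phrasing. This is a hedged sketch under assumed conventions: the slot names and the heuristic-review template are illustrative, not drawn from a surveyed tool.

```python
# Sketch of a shared, slot-based prompt template for heuristic reviews.
# Missing slots raise immediately, trading trial-and-error for a clear
# contract that can be versioned alongside other design assets.
from string import Template

HEURISTIC_REVIEW = Template(
    "Evaluate the $artifact against $heuristic_set.\n"
    "Audience: $audience\n"
    "Report each issue as: severity, location, rationale."
)

REQUIRED_SLOTS = {"artifact", "heuristic_set", "audience"}

def render(template: Template, **slots) -> str:
    """Fill the template, refusing to render with incomplete context."""
    missing = REQUIRED_SLOTS - slots.keys()
    if missing:
        raise ValueError(f"missing prompt slots: {sorted(missing)}")
    return template.substitute(**slots)

prompt = render(
    HEURISTIC_REVIEW,
    artifact="checkout screen mock-up",
    heuristic_set="Nielsen's 10 usability heuristics",
    audience="first-time mobile shoppers",
)
```

Requesting a fixed report shape (severity, location, rationale) is one lightweight way to make outputs comparable across runs and reviewers, in line with the call for standardized usability metrics.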
Conclusion
The paper shows that LLMs—especially GPT-4 class and multimodal variants—are already functioning as co-creative partners across the UI/UX lifecycle. Best practices coalesce around structured prompting, iterative human supervision, tight tool integration, modular decomposition, multimodal grounding, and mechanisms for transparency and evaluation. Despite progress, persistent issues with hallucinations, prompt instability, explainability, ethics, and tooling must be resolved to align AI-assisted design with principles of transparency, inclusivity, and user-centered design.