This project addresses a key challenge in smart manufacturing: how to generate realistic, editable, and engineering-aware industrial 3D scenarios from natural-language requirements and iterative user feedback. The vision is to support agentic and interactive digital twins that are not only visually plausible, but also operationally meaningful for downstream applications such as robotic simulation, training, planning, embodied AI, human–robot collaboration, workflow testing, factory reconfiguration, and virtual commissioning.
The research will tackle four interconnected problems. First, it will investigate how natural-language industrial requirements can be translated into a structured, interpretable, and verifiable scene representation that captures objects, spatial relations, functional zones, workflow logic, and engineering constraints. Second, it will address how iterative user feedback can be supported through localised, controllable scene edits while preserving global spatial consistency, design intent, and constraint satisfaction. Third, it will study how to maintain consistency between 2D layout planning and 3D scene generation, so that spatial configurations, semantic relations, and engineering rules remain aligned across modalities. Fourth, it will evaluate whether generated scenes are realistic, constraint-compliant, and operationally useful for applications including robotic planning and training, embodied AI interaction, simulation-based optimisation, and manufacturing decision support.
To address these challenges, the project will develop four methodological components. The first is a domain-adapted scene planner, distilled from a larger LLM-based foundation model and augmented with industrial knowledge. This planner will map natural-language requirements to a hierarchical industrial scene representation spanning object, zone, and workflow levels, together with an initial 2D layout. A constraint completion module will then infer engineering rules that the requirements leave implicit, such as robot safety buffers, equipment clearance, minimum aisle widths, and access constraints.
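As an illustration only, the hierarchical representation and the constraint completion step might be sketched as follows. The class names, the rule table, and the numeric thresholds are hypothetical placeholders rather than project decisions; in practice the rules would be drawn from safety standards and site-specific requirements, and the completion module would likely be learned rather than a lookup.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SceneObject:
    name: str
    kind: str      # e.g. "robot", "conveyor", "workbench"
    x: float       # footprint centre in the 2D layout, metres
    y: float
    width: float
    depth: float


@dataclass
class Zone:
    name: str
    objects: List[SceneObject] = field(default_factory=list)


@dataclass
class Scene:
    zones: List[Zone]
    workflow: List[Tuple[str, str]]   # directed links between zone names
    constraints: List[Dict] = field(default_factory=list)


# Hypothetical defaults; real values would come from standards and site rules.
IMPLICIT_RULES = {
    "robot": {"type": "safety_buffer", "min_m": 1.0},
    "conveyor": {"type": "clearance", "min_m": 0.5},
}
MIN_AISLE_M = 1.2


def complete_constraints(scene: Scene) -> List[Dict]:
    """Return constraints implied by object kinds and workflow links."""
    inferred = []
    for zone in scene.zones:
        for obj in zone.objects:
            rule = IMPLICIT_RULES.get(obj.kind)
            if rule is not None:
                inferred.append({"target": obj.name, **rule})
    # Each workflow link implies a passable aisle between the two zones.
    for src, dst in scene.workflow:
        inferred.append(
            {"target": f"{src}->{dst}", "type": "aisle_width", "min_m": MIN_AISLE_M}
        )
    return inferred


scene = Scene(
    zones=[
        Zone("welding", [SceneObject("robot_1", "robot", 2.0, 2.0, 1.0, 1.0)]),
        Zone("packing", [SceneObject("bench_1", "workbench", 6.0, 2.0, 2.0, 1.0)]),
    ],
    workflow=[("welding", "packing")],
)
scene.constraints = complete_constraints(scene)
```

Keeping inferred constraints as explicit records alongside the objects, zones, and workflow links is what would make the representation inspectable and verifiable by an engineer before any layout is generated.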
The second component is a two-stage local refinement engine for 2D layout editing. In Stage A, a subgraph-based local proposal mechanism will generate targeted updates in response to user feedback or revised requirements. In Stage B, a constraint repair module will resolve collisions, clearance violations, blocked access paths, and other local inconsistencies, while preserving key global properties of the scene. This will allow iterative and controllable editing without destabilising the overall design.
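The two-stage idea can be illustrated with a deliberately simplified sketch in which objects are circles on a plane. Stage A applies the requested edit to a single object, and Stage B pushes only the objects that now violate clearance, leaving the edited object and all unaffected objects in place. The function names, the circular footprints, and the 0.5 m clearance are assumptions made for the example; the iterative push used here is a stand-in for whatever repair method the project adopts and is not guaranteed to converge in crowded layouts.

```python
import math
from typing import Dict, List, Tuple

Layout = Dict[str, Tuple[float, float, float]]  # name -> (x, y, radius), metres


def propose_local_edit(layout: Layout, target: str, dx: float, dy: float) -> Layout:
    """Stage A: move only the target object; all others are untouched."""
    edited = dict(layout)
    x, y, r = edited[target]
    edited[target] = (x + dx, y + dy, r)
    return edited


def violations(layout: Layout, clearance: float) -> List[Tuple[str, str, float]]:
    """List pairs whose free gap is below the clearance, with the shortfall."""
    names = sorted(layout)
    found = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            ax, ay, ar = layout[a]
            bx, by, br = layout[b]
            gap = math.hypot(bx - ax, by - ay) - ar - br
            if gap < clearance - 1e-9:
                found.append((a, b, clearance - gap))
    return found


def repair(layout: Layout, pinned: str, clearance: float = 0.5,
           max_iters: int = 100) -> Layout:
    """Stage B: push violating neighbours apart; the pinned object never moves."""
    fixed = dict(layout)
    for _ in range(max_iters):
        found = violations(fixed, clearance)
        if not found:
            break
        for a, b, deficit in found:
            mover, anchor = (a, b) if b == pinned else (b, a)
            mx, my, mr = fixed[mover]
            ax, ay, _ = fixed[anchor]
            dist = math.hypot(mx - ax, my - ay)
            # Push along the line between centres; pick +x if they coincide.
            ux, uy = ((mx - ax) / dist, (my - ay) / dist) if dist > 0 else (1.0, 0.0)
            fixed[mover] = (mx + ux * deficit, my + uy * deficit, mr)
    return fixed


layout = {"robot": (0.0, 0.0, 1.0), "bench": (3.0, 0.0, 1.0), "rack": (10.0, 10.0, 1.0)}
edited = propose_local_edit(layout, "robot", dx=1.0, dy=0.0)
repaired = repair(edited, pinned="robot")
```

In this example the bench is displaced just far enough to restore clearance, while the rack, which is not involved in any violation, stays where it was. A caller can run `violations` again on the result to confirm that the repair succeeded within the iteration budget.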
The third component is a cross-modal generation framework built around a canonical scene state that links the validated 2D layout to the generated 3D scene. The framework will include a 2D-to-3D instantiation module, a cross-modal consistency objective, and a round-trip validation strategy to ensure that semantics, geometry, and engineering constraints remain coherent between representations. The aim is a trustworthy 2D-to-3D pipeline suitable for industrial deployment, rather than one that is only visually convincing.
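A round-trip check of this kind could, in its simplest geometric form, look like the sketch below: footprints from the validated 2D layout are lifted into 3D boxes, projected back to the floor plane, and compared with the original layout within a tolerance. The types `Footprint` and `Box3D`, the per-object heights, and the 1 cm tolerance are assumptions for illustration; a real generator would produce meshes rather than boxes, and the check would also need to cover semantics and constraints, not only geometry.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class Footprint:
    x: float       # centre on the floor plane, metres
    y: float
    width: float
    depth: float


@dataclass(frozen=True)
class Box3D:
    x: float
    y: float
    z: float       # centre height, so the box rests on the floor
    width: float
    depth: float
    height: float


def instantiate_3d(layout: Dict[str, Footprint],
                   heights: Dict[str, float]) -> Dict[str, Box3D]:
    """Lift each 2D footprint into a floor-standing box."""
    return {
        name: Box3D(f.x, f.y, heights[name] / 2.0, f.width, f.depth, heights[name])
        for name, f in layout.items()
    }


def project_to_2d(scene: Dict[str, Box3D]) -> Dict[str, Footprint]:
    """Drop the vertical axis to recover a floor-plane footprint."""
    return {n: Footprint(b.x, b.y, b.width, b.depth) for n, b in scene.items()}


def round_trip_errors(layout: Dict[str, Footprint], scene: Dict[str, Box3D],
                      tol: float = 0.01) -> List[str]:
    """Compare the projected 3D scene with the validated 2D layout."""
    projected = project_to_2d(scene)
    errors = []
    for name, want in layout.items():
        got = projected.get(name)
        if got is None:
            errors.append(f"{name}: missing from 3D scene")
            continue
        for attr in ("x", "y", "width", "depth"):
            delta = abs(getattr(got, attr) - getattr(want, attr))
            if delta > tol:
                errors.append(f"{name}: {attr} drifted by {delta:.3f} m")
    for name in projected.keys() - layout.keys():
        errors.append(f"{name}: not present in validated 2D layout")
    return errors
```

For instance, if `scene = instantiate_3d(layout, heights)` is later modified with `replace(scene["robot_1"], x=2.5)` for an object whose layout position is `x=2.0`, the check reports that `robot_1` has drifted in `x`, which would signal that the 3D scene no longer matches the canonical state.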
The fourth component is a multi-level evaluation framework. This will combine expert visual scoring, geometric comparison against reference layouts, rule-based engineering checks, and downstream case-study testing in robotics simulation. These evaluations will assess scene fidelity, functional feasibility, safety compliance, and usefulness for planning, training, and interaction tasks.
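To make the multi-level idea concrete, the sketch below reports three of the levels side by side: geometric agreement with a reference layout (measured here as mean intersection-over-union of axis-aligned footprints), the pass rate of rule-based checks, and an externally supplied expert score. The choice of IoU as the geometric measure, the function names, and the report keys are assumptions for this example. The levels are deliberately not merged into one number, since any weighting between them would be a design decision the proposal has not made.

```python
from typing import Dict, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), metres


def iou(a: Rect, b: Rect) -> float:
    """Intersection-over-union of two axis-aligned rectangles."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mean_layout_iou(generated: Dict[str, Rect], reference: Dict[str, Rect]) -> float:
    """Average IoU over reference objects; missing objects count as zero."""
    if not reference:
        return 0.0
    shared = generated.keys() & reference.keys()
    return sum(iou(generated[n], reference[n]) for n in shared) / len(reference)


def rule_pass_rate(results: Dict[str, bool]) -> float:
    """Fraction of rule-based engineering checks that passed."""
    return sum(results.values()) / len(results) if results else 0.0


def evaluate(generated: Dict[str, Rect], reference: Dict[str, Rect],
             rule_results: Dict[str, bool], expert_score: float) -> Dict[str, float]:
    """Report each evaluation level separately rather than as one score."""
    return {
        "geometric_iou": mean_layout_iou(generated, reference),
        "rule_pass_rate": rule_pass_rate(rule_results),
        "expert_visual": expert_score,
    }
```

Downstream case-study testing in robotics simulation is not represented here, because it depends on the simulator and task chosen for each case study.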
The expected contributions are: a constraint-aware framework for industrial scene generation; a structured and verifiable representation of industrial design intent; a trustworthy and editable 2D-to-3D generation pipeline; and a practical evaluation protocol for robotics, embodied AI, and broader smart manufacturing applications. More broadly, the project will help bridge generative AI, digital twins, and industrial intelligence, enabling reusable virtual environments for training, validation, optimisation, and future autonomous manufacturing systems.