Tencent Hunyuan Unveils Hunyuan Image 2.0, an Industry-First Real-Time Text-to-Image Model with Millisecond Response
TMTPOST — Tencent’s Hunyuan AI division has released Hunyuan Image 2.0, the industry's first real-time text-to-image generation model capable of millisecond-level response.
The new model has tens of times as many parameters as its predecessor and supports multimodal inputs, including text, voice, and sketches.
With just a spoken command, a written prompt, or a simple line drawing, users can generate realistic images in real time. Hunyuan Image 2.0 is built on a single- and dual-stream DiT (Diffusion Transformer) architecture, which boosts generation efficiency without compromising image quality or detail.
The system also uses a multimodal large language model (MLLM) as its text encoder, paired with a proprietary structured captioning system. This allows the model to understand the semantics of the input, infer visual intent, and progressively generate high-fidelity images.