zhangxinyue ・ Jan. 28, 2025
DeepSeek Releases Open-Source Multimodal AI Model Janus-Pro, Surpassing DALL-E 3 and Stable Diffusion
The Janus-Pro-7B model has outperformed OpenAI's DALL-E 3 and Stable Diffusion on text-to-image benchmarks such as GenEval and DPG-Bench, while also handling image understanding in the same model.

TMTPOST -- In the early hours of Tuesday, the AI community was abuzz as DeepSeek's latest open-source multimodal AI model, Janus-Pro, was released on Hugging Face. The model is available in two configurations, with roughly 1 billion and 7 billion parameters.

Janus-Pro combines recent advances in multimodal AI. Its image understanding relies on the SigLIP-L vision encoder, while its image generation draws on LlamaGen. The two released sizes cater to a range of computational needs.

This launch comes at a time when the image-generation capability of OpenAI's multimodal GPT-4o model remains unavailable to the public, adding to the excitement surrounding Janus-Pro's open-source debut.

DeepSeek has been at the forefront of multimodal generative AI research. The company launched its original Janus model in late 2024 as a unified framework for understanding and generating multimodal content. Janus was built on DeepSeek-LLM-1.3b-base, a language model trained on roughly 500 billion text tokens. Its design decoupled visual encoding to serve both understanding and generation tasks, using the SigLIP-L encoder for visual input. A sibling model, JanusFlow, added rectified flow for image generation.

This progress culminated in Janus-Pro, an enhanced autoregressive framework with architectural refinements. By decoupling visual encoding into independent pathways, Janus-Pro reduces the conflict that arises when a single visual encoder must serve both understanding and generation, while still using one unified Transformer to process the results. This modularity improves flexibility and task-specific performance.
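The decoupling described above can be illustrated with a toy sketch. It is not DeepSeek's implementation: every function name, the embedding width, the codebook size, and the use of a discrete tokenizer on the generation path are assumptions made for illustration only. It shows two independent visual pathways feeding one shared sequence model.

```python
# Toy illustration only: all names and sizes below are hypothetical.
EMBED_DIM = 8        # assumed embedding width
CODEBOOK_SIZE = 16   # assumed size of the discrete image-token vocabulary


def understanding_encoder(image):
    """Stand-in for a semantic vision encoder: one continuous vector per patch."""
    return [[patch / 255.0] * EMBED_DIM for patch in image]


def generation_tokenizer(image):
    """Stand-in for a discrete tokenizer: one codebook index per patch."""
    return [patch * CODEBOOK_SIZE // 256 for patch in image]


def embed_codes(codes):
    """Map discrete codes into the shared embedding space."""
    return [[code / CODEBOOK_SIZE] * EMBED_DIM for code in codes]


def shared_transformer(sequence):
    """Stand-in for the unified backbone: reduces each vector to its mean."""
    return [sum(vec) / len(vec) for vec in sequence]


def run_task(task, text_embeddings, image):
    """Route the image through the pathway for the task, then share the backbone."""
    if task == "understand":
        visual = understanding_encoder(image)
    elif task == "generate":
        visual = embed_codes(generation_tokenizer(image))
    else:
        raise ValueError(f"unknown task: {task}")
    return shared_transformer(text_embeddings + visual)


if __name__ == "__main__":
    text = [[0.0] * EMBED_DIM for _ in range(2)]  # two dummy text positions
    image = [0, 128, 255]                         # three dummy patch intensities
    print(run_task("understand", text, image))
    print(run_task("generate", text, image))
```

Only the visual front end differs between the two tasks; the backbone call is identical, which is the point of the decoupled design.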

Janus-Pro is built on DeepSeek-LLM-1.5b-base and DeepSeek-LLM-7b-base and was trained using HAI-LLM, a high-performance distributed training framework built on PyTorch. Training ran on clusters of 16 to 32 nodes, each equipped with 8 Nvidia A100 GPUs, and took 7 to 14 days depending on the model size.
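For a rough sense of scale, the figures above translate into GPU-hours as follows. This is back-of-the-envelope arithmetic, not a number reported by DeepSeek. It assumes the low ends of the ranges (16 nodes, 7 days) belong to the smaller model and the high ends (32 nodes, 14 days) to the larger one, and that every GPU ran for the full duration.

```python
def gpu_hours(nodes: int, gpus_per_node: int, days: int) -> int:
    """Total GPU-hours, assuming every GPU is busy for the whole run."""
    return nodes * gpus_per_node * days * 24


# Assumed pairing of the ranges with the two model sizes.
smaller_model = gpu_hours(nodes=16, gpus_per_node=8, days=7)
larger_model = gpu_hours(nodes=32, gpus_per_node=8, days=14)

print(smaller_model)  # 21504
print(larger_model)   # 86016
```

Under these assumptions the larger run uses four times the GPU-hours of the smaller one, since both the node count and the duration double.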

The complete Janus-Pro codebase is now available in the Janus repository on GitHub.

DeepSeek’s rapid advancements in multimodal AI may heighten competition with industry giants such as OpenAI, Meta, and Nvidia. However, the company has faced challenges, including recent large-scale cyberattacks on its online services. To mitigate these issues, DeepSeek has temporarily restricted new user registrations outside China, requiring international users to register using virtual numbers.

With Janus-Pro setting new standards for multimodal AI, the industry eagerly anticipates further developments, including potential advancements in text-to-image and text-to-video capabilities. 
