zhangxinyuezhangxinyue ・ Jan. 28, 2025
DeepSeek Releases Open-Source Multimodal AI Model Janus-Pro, Surpassing DALL-E 3 and Stable Diffusion
The Janus-Pro-7B model has outperformed OpenAI's DALL-E 3 and Stable Diffusion in benchmark tests such as GenEval and DPG-Bench, establishing its superiority in both image generation and understanding.

TMTPOST -- In the early hours of Tuesday, the AI community was abuzz as Hugging Face announced the release of DeepSeek's latest open-source multimodal AI model, Janus-Pro. Available in two configurations with 1 billion and 7 billion parameters, the model marks a significant leap in AI capabilities.

The Janus-Pro-7B model has outperformed OpenAI's DALL-E 3 and Stable Diffusion in benchmark tests such as GenEval and DPG-Bench, establishing its superiority in both image generation and understanding.

Janus-Pro integrates cutting-edge advancements in multimodal AI. The model's ability to process and understand images is powered by the innovative SigLIP-L architecture, while its image generation capabilities draw inspiration from LlamaGen. The model is offered in two sizes, with configurations at 1.5 billion and 7 billion parameters, catering to a range of computational needs.

This launch comes at a time when OpenAI's highly anticipated multimodal image-generation model, GPT-4o, remains unavailable to the public, adding to the excitement surrounding Janus-Pro's open-source debut.

DeepSeek has been at the forefront of multimodal generative AI research. The company launched its original Janus model in late 2024 as a unified framework for understanding and generating multimodal content. Built on DeepSeek-LLM-1.3b-base, Janus utilized a massive dataset of 500 billion text tokens for training. Its design decoupled visual encoding to optimize both understanding and generation tasks, employing advanced techniques like SigLIP-L for visual input and an innovative rectified flow for image generation.

This progress culminated in Janus-Pro, an enhanced self-regressive framework with significant architectural refinements. By decoupling visual encoding into independent pathways, Janus-Pro eliminates previous conflicts in understanding and generation tasks while maintaining a unified Transformer architecture. This modularity improves flexibility and task-specific performance.

Janus-Pro is built on DeepSeek-LLM-1.5b-base and DeepSeek-LLM-7b-base, trained using HAI-LLM, a high-performance distributed training framework on PyTorch. The training involved clusters of 16 to 32 nodes, each equipped with 8 Nvidia A100 GPUs, and required 7–14 days depending on the model size.

The complete Janus-Pro codebase is now available on GitHub: Janus GitHub Repository.

DeepSeek’s rapid advancements in multimodal AI may heighten competition with industry giants such as OpenAI, Meta, and Nvidia. However, the company has faced challenges, including recent large-scale cyberattacks on its online services. To mitigate these issues, DeepSeek has temporarily restricted new user registrations outside China, requiring international users to register using virtual numbers.

With Janus-Pro setting new standards for multimodal AI, the industry eagerly anticipates further developments, including potential advancements in text-to-image and text-to-video capabilities. 

LIKE 0
Related Posts
KKR Nears Completion of Dayao Soda Buyout in Rare Foreign Takeover of Chinese Beverage Brand
KKR Nears Completion of Dayao Soda Buyout in Rare Foreign Takeover of Chinese Beverage Brand
Trump Said to Sign Executive Orders Promoting Exports of U.S. Chips and AI Tools
Trump Said to Sign Executive Orders Promoting Exports of U.S. Chips and AI Tools
AI Can Solve Problems, But That's No Excuse to Stop Learning, Says Nvidia CEO
AI Can Solve Problems, But That's No Excuse to Stop Learning, Says Nvidia CEO
India In Trade Talks Reported to Seek U.S.Tariffs Lower Than Indonesia
India In Trade Talks Reported to Seek U.S.Tariffs Lower Than Indonesia
U.S. Services May be Added to EU's Retaliatory Target List as More Members Seek Powerful Trade Tool If Talks Fail
U.S. Services May be Added to EU's Retaliatory Target List as More Members Seek Powerful Trade Tool If Talks Fail
Nvidia CEO: Next Wave of AI is "Physical AI," Taps China's Expanding Role in Global AI Ecosystem
Nvidia CEO: Next Wave of AI is "Physical AI," Taps China's Expanding Role in Global AI Ecosystem

  • Subscribe To Our News