zhangxinyue ・ Jan. 28, 2025
DeepSeek Releases Open-Source Multimodal AI Model Janus-Pro, Surpassing DALL-E 3 and Stable Diffusion
The Janus-Pro-7B model, which handles both image understanding and image generation, outperformed OpenAI's DALL-E 3 and Stable Diffusion on text-to-image benchmarks such as GenEval and DPG-Bench.

TMTPOST -- In the early hours of Tuesday, the AI community was abuzz as Hugging Face announced the release of DeepSeek's latest open-source multimodal AI model, Janus-Pro. Available in two configurations with 1 billion and 7 billion parameters, the model marks a significant leap in AI capabilities.

Janus-Pro builds on recent advances in multimodal AI. Its image understanding relies on the SigLIP-L vision encoder, while its image generation draws on LlamaGen. The two released sizes, 1 billion and 7 billion parameters, cater to a range of computational needs.

This launch comes at a time when OpenAI's highly anticipated multimodal image-generation model, GPT-4o, remains unavailable to the public, adding to the excitement surrounding Janus-Pro's open-source debut.

DeepSeek has been at the forefront of multimodal generative AI research. The company launched its original Janus model in late 2024 as a unified framework for understanding and generating multimodal content. Janus was built on DeepSeek-LLM-1.3b-base, a language model trained on roughly 500 billion text tokens. Its design decoupled visual encoding to serve both understanding and generation tasks, using SigLIP-L for visual input. A related model, JanusFlow, paired the language model with rectified flow for image generation.

This progress culminated in Janus-Pro, an enhanced autoregressive framework with significant architectural refinements. By decoupling visual encoding into independent pathways, Janus-Pro reduces the conflict between the demands of understanding and generation tasks while keeping a single, unified Transformer architecture. This modularity improves flexibility and task-specific performance.

Janus-Pro is built on DeepSeek-LLM-1.5b-base and DeepSeek-LLM-7b-base and was trained using HAI-LLM, a high-performance distributed training framework built on PyTorch. Training ran on clusters of 16 to 32 nodes, each equipped with 8 Nvidia A100 GPUs, and took 7 to 14 days depending on the model size.
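To put those cluster figures in perspective, the short Python sketch below works out the total GPU count and GPU-days at each end of the reported range. The article gives only ranges, so pairing the 16-node cluster with the 7-day run and the 32-node cluster with the 14-day run is an assumption made here for illustration; the `TrainingRun` class and its labels are likewise hypothetical.

```python
from dataclasses import dataclass

GPUS_PER_NODE = 8  # from the article: 8 Nvidia A100 GPUs per node


@dataclass(frozen=True)
class TrainingRun:
    """One training configuration (illustrative helper, not DeepSeek code)."""
    label: str
    nodes: int
    days: int

    @property
    def gpus(self) -> int:
        # Total GPUs in the cluster
        return self.nodes * GPUS_PER_NODE

    @property
    def gpu_days(self) -> int:
        # Total GPU time consumed, in GPU-days
        return self.gpus * self.days


# ASSUMPTION: the smaller cluster goes with the shorter run and the larger
# cluster with the longer run. The article states only the ranges.
RUNS = [
    TrainingRun("smaller model (assumed 16 nodes, 7 days)", nodes=16, days=7),
    TrainingRun("larger model (assumed 32 nodes, 14 days)", nodes=32, days=14),
]

for run in RUNS:
    print(f"{run.label}: {run.gpus} GPUs, {run.gpu_days} GPU-days")
```

Under that assumed pairing, the smaller run comes to 128 GPUs and 896 GPU-days, and the larger run to 256 GPUs and 3,584 GPU-days.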

The complete Janus-Pro codebase is now available on GitHub in the Janus repository.

DeepSeek’s rapid advancements in multimodal AI may heighten competition with industry giants such as OpenAI, Meta, and Nvidia. However, the company has faced challenges, including recent large-scale cyberattacks on its online services. To mitigate these issues, DeepSeek has temporarily restricted new user registrations outside China, requiring international users to register using virtual numbers.

With Janus-Pro setting new standards for multimodal AI, the industry eagerly anticipates further developments, including potential advancements in text-to-image and text-to-video capabilities. 
