zhangxinyue ・ Jan. 28, 2025
DeepSeek Releases Open-Source Multimodal AI Model Janus-Pro, Surpassing DALL-E 3 and Stable Diffusion
The Janus-Pro-7B model has outperformed OpenAI's DALL-E 3 and Stable Diffusion in benchmark tests such as GenEval and DPG-Bench, establishing its superiority in both image generation and understanding.

TMTPOST -- In the early hours of Tuesday, the AI community was abuzz as Hugging Face announced the release of DeepSeek's latest open-source multimodal AI model, Janus-Pro. Available in two configurations with 1 billion and 7 billion parameters, the model marks a significant leap in AI capabilities.

Janus-Pro integrates recent advances in multimodal AI. Its image understanding is powered by the SigLIP-L vision encoder, while its image generation draws on LlamaGen. The two sizes, at 1 billion and 7 billion parameters, cater to a range of computational needs.

The launch comes at a time when the highly anticipated image-generation capability of OpenAI's multimodal model, GPT-4o, remains unavailable to the public, adding to the excitement surrounding Janus-Pro's open-source debut.

DeepSeek has been at the forefront of multimodal generative AI research. The company launched its original Janus model in late 2024 as a unified framework for understanding and generating multimodal content. Built on DeepSeek-LLM-1.3b-base, Janus utilized a massive dataset of 500 billion text tokens for training. Its design decoupled visual encoding to optimize both understanding and generation tasks, employing advanced techniques like SigLIP-L for visual input and an innovative rectified flow for image generation.

This progress culminated in Janus-Pro, an enhanced autoregressive framework with significant architectural refinements. By decoupling visual encoding into independent pathways, Janus-Pro resolves the conflict between understanding and generation tasks that arises when both share a single visual encoder, while still maintaining a unified Transformer architecture. This modularity improves flexibility and task-specific performance.
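The decoupling idea can be illustrated with a toy sketch. This is not DeepSeek's code: the function names, the averaging and quantization steps, and the list-based "backbone" are all hypothetical stand-ins, chosen only to show two independent visual pathways feeding one shared model.

```python
from typing import Callable, Dict, List

Embedding = List[float]

def understanding_pathway(image: List[float]) -> List[Embedding]:
    # Hypothetical stand-in for a semantic vision encoder:
    # average neighbouring values into continuous features.
    return [[(a + b) / 2.0] for a, b in zip(image[0::2], image[1::2])]

def generation_pathway(image: List[float]) -> List[Embedding]:
    # Hypothetical stand-in for an image tokenizer:
    # map each value to a coarse discrete code.
    return [[float(int(v * 4))] for v in image]

# Each task gets its own visual pathway (the "decoupled" part).
PATHWAYS: Dict[str, Callable[[List[float]], List[Embedding]]] = {
    "understand": understanding_pathway,
    "generate": generation_pathway,
}

def shared_backbone(text: List[Embedding], visual: List[Embedding]) -> List[Embedding]:
    # Placeholder for the single Transformer: both tasks end up
    # as one sequence consumed by the same model.
    return text + visual

def run(task: str, text: List[Embedding], image: List[float]) -> List[Embedding]:
    visual = PATHWAYS[task](image)  # task-specific encoding
    return shared_backbone(text, visual)  # unified processing

if __name__ == "__main__":
    image = [0.0, 1.0, 0.5, 0.5]
    print(run("understand", [[1.0]], image))
    print(run("generate", [[1.0]], image))
```

The point of the sketch is the routing: the choice of encoder depends on the task, but everything downstream is shared.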

Janus-Pro is built on DeepSeek-LLM-1.5b-base and DeepSeek-LLM-7b-base, trained using HAI-LLM, a high-performance distributed training framework on PyTorch. The training involved clusters of 16 to 32 nodes, each equipped with 8 Nvidia A100 GPUs, and required 7–14 days depending on the model size.
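For a rough sense of scale, the figures above can be turned into GPU counts and GPU-days. The sketch below assumes that the low end of each range (16 nodes, 7 days) applies to the smaller model and the high end (32 nodes, 14 days) to the larger one; the article gives only the ranges, so that pairing is an assumption.

```python
# Figures from the article: 16 to 32 nodes, 8 GPUs per node, 7 to 14 days.
GPUS_PER_NODE = 8

def gpu_days(nodes: int, days: int, gpus_per_node: int = GPUS_PER_NODE) -> int:
    """Total GPU-days = nodes * GPUs per node * wall-clock days."""
    return nodes * gpus_per_node * days

# Assumption: low end of each range for the smaller model, high end for the larger.
configs = {
    "smaller model (assumed 16 nodes, 7 days)": (16, 7),
    "larger model (assumed 32 nodes, 14 days)": (32, 14),
}

if __name__ == "__main__":
    for label, (nodes, days) in configs.items():
        total_gpus = nodes * GPUS_PER_NODE
        print(f"{label}: {total_gpus} GPUs, {gpu_days(nodes, days)} GPU-days")
```

Under that assumed pairing, the clusters would comprise 128 and 256 GPUs respectively.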

The complete Janus-Pro codebase is now available in the Janus repository on GitHub.

DeepSeek’s rapid advancements in multimodal AI may heighten competition with industry giants such as OpenAI, Meta, and Nvidia. However, the company has faced challenges, including recent large-scale cyberattacks on its online services. To mitigate these issues, DeepSeek has temporarily restricted new user registrations outside China, requiring international users to register using virtual numbers.

With Janus-Pro setting new standards for multimodal AI, the industry eagerly anticipates further developments, including potential advancements in text-to-image and text-to-video capabilities. 
