Chelsea_Sun ・ Jul. 26, 2024
Application of AI-Generated Videos in the Film Industry Will Take Time, Says Zhipu AI CEO
Zhang acknowledged that current AI video generation technology cannot replace traditional film production, but said it can serve as an auxiliary tool and have a positive impact on the industry.

TMTPOST--Zhipu AI, a leading Chinese AI unicorn, unveiled its AI video generation technology "Ying" on Friday in Beijing. The new product, now fully available to all users on the "Zhipu Qingyan" app, supports both text-to-video and image-to-video generation.

The technology behind Ying is based on Zhipu's self-developed video generation model, CogVideoX. Through technological optimization, the inference speed of this generative video model has increased sixfold, reducing the generation time of a six-second video to a theoretical 30 seconds. Unlike the commonly used DiT architecture, Ying employs a self-developed Transformer architecture that integrates text, time, and space dimensions.

To address content coherence issues, Zhipu AI has developed an efficient three-dimensional variational autoencoder (3D VAE) structure, capable of compressing raw video data to 2% of its original size. This significantly lowers the training cost and difficulty of video diffusion generation models.
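As a rough illustration of what a 2% compression target means in practice, the sketch below computes the sizes for a hypothetical uncompressed clip. Only the 2% figure comes from the article; the clip dimensions (48 frames at 720x480, 8-bit RGB) and the function names are assumptions chosen for illustration and are not figures published by Zhipu AI.

```python
# Illustrative arithmetic only: clip dimensions below are hypothetical.

def raw_video_bytes(frames: int, height: int, width: int,
                    channels: int = 3, bytes_per_value: int = 1) -> int:
    """Size of an uncompressed clip stored as one value per channel per pixel."""
    return frames * height * width * channels * bytes_per_value


def compressed_bytes(raw_bytes: int, percent_of_original: int = 2) -> int:
    """Size after compression to a given percentage of the original (2% per the article)."""
    return raw_bytes * percent_of_original // 100


# Assumed example clip: 48 frames of 720x480 RGB, one byte per channel.
raw = raw_video_bytes(frames=48, height=480, width=720)
latent = compressed_bytes(raw)

print(f"raw clip:        {raw:,} bytes")
print(f"compressed (2%): {latent:,} bytes")
print(f"reduction:       {raw // latent}x")
```

Under these assumed dimensions, a clip of roughly 50 MB shrinks to about 1 MB, a 50x reduction. That is the amount of data the diffusion model has to process per clip, which is why the article links the compression to lower training cost.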

Zhang Peng, the CEO of Zhipu AI, said that AI multimodal technology is inspired by the workings of the human brain, which coordinates various functions, including text, vision, and hearing. As an AI company focused on AGI, Zhipu has always prioritized multimodal technology.

"The exploration of multimodal models in the AI industry is still in its early stages. We will continue to strive to provide better models and products," Zhang said.

Additionally, Zhipu AI has created an end-to-end video understanding model that generates precise and contextually relevant descriptions for large volumes of video data. These descriptions improve the generation model's understanding of text and its adherence to instructions, so that the generated videos better match user prompts.

The CogVideoX model is now available on the PC, mobile app, and mini-program versions of Zhipu Qingyan, featuring rapid generation, efficient instruction following, improved content coherence, and flexible video arrangement.

Specifically, Qingyan offers two modes: text-to-video and image-to-video.

Text-to-video is ideal for imaginative scenarios: a dog dancing on fingertips, dolphins flying into deep space, the universe sparkling for you. Regardless of how complex or abstract the scene, Qingyan can vividly bring it to life with just a few descriptive sentences.

Image-to-video adds more fun to existing pictures: by inputting an image and a simple description, you can animate the picture. Make people in old photos move, bringing memories to life, or have characters from famous paintings and movie stills perform imaginative actions.

During the initial testing phase, all users can use Qingyan for free. For faster processing, users can pay five yuan for a day of high-speed access or 199 yuan for a year of high-speed access.
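To make the two price points concrete, here is a minimal sketch of the break-even arithmetic between day passes and the annual pass. The prices are those quoted above; the function and constant names are hypothetical, and the sketch assumes each day of high-speed use requires a separate five-yuan day pass.

```python
import math

DAY_PASS_YUAN = 5      # one day of high-speed access, per the article
YEAR_PASS_YUAN = 199   # one year of high-speed access, per the article


def cheaper_option(days_of_use: int) -> str:
    """Return which option costs less for a given number of high-speed days in a year."""
    day_pass_total = days_of_use * DAY_PASS_YUAN
    return "day passes" if day_pass_total < YEAR_PASS_YUAN else "annual pass"


# Smallest number of days at which day passes cost at least as much as the annual pass.
break_even_days = math.ceil(YEAR_PASS_YUAN / DAY_PASS_YUAN)

print(break_even_days)      # 40
print(cheaper_option(39))   # day passes (195 yuan < 199 yuan)
print(cheaper_option(40))   # annual pass (200 yuan >= 199 yuan)
```

Under that assumption, the annual pass becomes the cheaper choice for anyone who expects to need high-speed generation on 40 or more days in a year.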

Zhang acknowledged that current AI video generation technology cannot replace traditional film production, though it can serve as an auxiliary tool and have a positive impact on the industry. For now, AI might be used in small-scale creations, but fully integrating it into audience-facing film production will take time.

He said AI video is currently used mainly for online e-commerce marketing and for short videos made by independent content creators, but he believes its applications will expand beyond these areas.

On the business side, Zhang noted that Zhipu Qingyan is still in the early stages of commercializing AI video generation, with revenue coming primarily from paid API access.

"The launch of the Ying feature is a phase-specific achievement. It's not perfect yet and requires ongoing refinement. Our goal is to demonstrate what video generation can achieve for everyone under current conditions, rather than being confined to laboratories," Zhang said.

Zhang highlighted the high costs of computational power and algorithms for video generation. "Building large models is extremely costly and demands market demand and commercialization. Our approach is to innovate from the ground up and commercialize based on these innovations."

Looking ahead, Zhang emphasized that Zhipu aims to position Qingyan as an "AI assistant" that helps solve real-life problems, enhances productivity, and makes work more convenient.

"We believe that the so-called super app doesn't necessarily have to be 'super.' Instead, it can subtly integrate into daily use, gradually changing people's lives through AI efficiency tools," Zhang said.
