Application of AI-Generated Videos in the Film Industry Will Take Time, Says Zhipu AI CEO | TMTPOST

Chelsea_Sun ・ Jul. 26, 2024

Zhang acknowledged that current AI video generation technology cannot replace film production outright, but said it can serve as an auxiliary tool and have a positive impact on the film industry.

TMTPOST--Zhipu AI, a leading Chinese AI unicorn, unveiled its AI video generation product "Ying" on Friday in Beijing. The product, now fully available to all users in the "Zhipu Qingyan" app, supports both text-to-video and image-to-video generation.

The technology behind Ying is based on Zhipu's self-developed video generation model, CogVideoX. Through technological optimization, the inference speed of this generative video model has increased sixfold, reducing the generation time of a six-second video to a theoretical 30 seconds. Unlike the commonly used DiT architecture, Ying employs a self-developed Transformer architecture that integrates text, time, and space dimensions.
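The speed figures above can be put into rough numbers. The following is a minimal sketch in Python, assuming the sixfold gain and the theoretical 30 seconds both refer to the same six-second clip; the function names are hypothetical and the pre-optimization time is only implied by the article, not stated in it.

```python
# Figures quoted in the article.
SPEEDUP = 6.0             # "increased sixfold"
OPTIMIZED_SECONDS = 30.0  # theoretical generation time after optimization
CLIP_SECONDS = 6.0        # length of the generated clip


def implied_baseline_seconds(optimized_seconds: float, speedup: float) -> float:
    """Generation time before optimization, assuming the speedup applies to this clip."""
    return optimized_seconds * speedup


def compute_per_video_second(generation_seconds: float, clip_seconds: float) -> float:
    """Seconds of generation time needed for each second of output video."""
    return generation_seconds / clip_seconds


baseline = implied_baseline_seconds(OPTIMIZED_SECONDS, SPEEDUP)
per_second = compute_per_video_second(OPTIMIZED_SECONDS, CLIP_SECONDS)
print(f"Implied time before optimization: {baseline:.0f} s")
print(f"Generation time per second of video: {per_second:.0f} s")
```

Under these assumptions, the same clip would have taken about 180 seconds before optimization, and the optimized model still needs roughly five seconds of generation time for each second of video.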

To address content coherence issues, Zhipu AI has developed an efficient three-dimensional variational autoencoder (3D VAE) capable of compressing raw video data to 2% of its original size. This significantly lowers the cost and difficulty of training video diffusion models.
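To illustrate what compressing to 2% means in practice, here is a minimal sketch in Python. Only the 2% figure comes from the article; the clip resolution, frame rate, and one-byte-per-channel storage are hypothetical values chosen for the example and are not CogVideoX specifications.

```python
COMPRESSED_PERCENT = 2  # "2% of its original size" (from the article)


def compression_ratio(percent: int) -> float:
    """Return the ratio as N in N:1, e.g. 2% corresponds to 50:1."""
    return 100 / percent


def compressed_bytes(raw_bytes: int, percent: int) -> int:
    """Size after compression, using integer arithmetic."""
    return raw_bytes * percent // 100


# Hypothetical uncompressed clip: 720x480 RGB, 8 frames per second, 6 seconds,
# one byte per colour channel.
width, height, channels, fps, seconds = 720, 480, 3, 8, 6
raw = width * height * channels * fps * seconds

print(f"Compression ratio: {compression_ratio(COMPRESSED_PERCENT):.0f}:1")
print(f"Raw clip: {raw:,} bytes")
print(f"Compressed: {compressed_bytes(raw, COMPRESSED_PERCENT):,} bytes")
```

A 2% figure corresponds to a 50:1 reduction, so the diffusion model operates on a representation about one-fiftieth the size of the raw pixels.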

Zhang Peng, the CEO of Zhipu AI, said that AI multimodal technology is inspired by the workings of the human brain, which coordinates various functions, including text, vision, and hearing. As an AI company focused on AGI, Zhipu has always prioritized multimodal technology.

"The exploration of multimodal models in the AI industry is still in its early stages. We will continue to strive to provide better models and products," Zhang said.

Zhipu AI has also built an end-to-end video understanding model that generates precise, contextually relevant descriptions for large volumes of video data. Training on these descriptions improves the model's understanding of text and its adherence to instructions, so that generated videos better match user prompts.

The CogVideoX model is now available on the PC, mobile app, and mini-program versions of Zhipu Qingyan, featuring rapid generation, reliable instruction following, improved content coherence, and flexible video arrangement.

Specifically, Qingyan offers two modes: text-to-video and image-to-video.

Text-to-video is ideal for imaginative scenarios: a dog dancing on fingertips, dolphins flying into deep space, the universe sparkling for you. Regardless of how complex or abstract the scene, Qingyan can vividly bring it to life with just a few descriptive sentences.

Image-to-video adds more fun to existing pictures: by inputting an image and a simple description, you can animate the picture. Make people in old photos move, bringing memories to life, or have characters from famous paintings and movie stills perform imaginative actions.

During the initial testing phase, all users can use Qingyan for free. For faster processing, users can pay five yuan for a day of high-speed access or 199 yuan for a year of high-speed access.
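For readers weighing the two paid tiers, a minimal sketch in Python of the break-even point, assuming a user would otherwise buy one day pass for each day of high-speed use. The prices are from the article; the function name is hypothetical.

```python
import math

DAY_PASS_YUAN = 5     # one day of high-speed access
YEAR_PASS_YUAN = 199  # one year of high-speed access


def break_even_days(day_price: int, year_price: int) -> int:
    """Smallest number of day passes whose total cost reaches the annual price."""
    return math.ceil(year_price / day_price)


days = break_even_days(DAY_PASS_YUAN, YEAR_PASS_YUAN)
print(f"The annual pass is cheaper from {days} days of use per year")
```

At 39 days the day passes cost 195 yuan, still below the annual price; at 40 days they cost 200 yuan, so the annual pass is the cheaper option from that point on.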

Zhang acknowledged that current AI video generation technology cannot replace film production outright, but said it can serve as an auxiliary tool and have a positive impact on the film industry. For now, AI may be used in small-scale creative work, but fully integrating it into audience-facing film production will take time.

He said AI video is currently used mainly for e-commerce marketing and for short videos made by "self-media" (independent content creators), but he believes its applications will expand beyond these areas.

On the commercialization of AI video generation, Zhang noted that Zhipu Qingyan is still in the early stages, primarily offering paid API access.

"The launch of the Ying feature is a phase-specific achievement. It's not perfect yet and requires ongoing refinement. Our goal is to demonstrate what video generation can achieve for everyone under current conditions, rather than being confined to laboratories," Zhang said.

Zhang highlighted the high costs of computational power and algorithms for video generation. "Building large models is extremely costly and demands market demand and commercialization. Our approach is to innovate from the ground up and commercialize based on these innovations."

Looking ahead, Zhang said Zhipu aims to position Qingyan as an "AI assistant" that helps solve real-life problems, enhances productivity, and makes work more convenient.

"We believe that the so-called super app doesn't necessarily have to be 'super.' Instead, it can subtly integrate into daily use, gradually changing people's lives through AI efficiency tools," Zhang said.
