Google says it is almost ready to reveal its own AI-powered text-to-video generator, which it calls Google Imagen Video. Meta unveiled its own text-to-video generator just a few days earlier.
Although the generator is still in the development stage, when it is ready for public release, it will be able to create 1280×768 videos at a frame rate of 24 frames per second from a simple written prompt.
The research paper from Google claims that Imagen Video will be able to produce videos in the style of well-known artists such as Vincent van Gogh. It will also render text in different animation styles and produce rotating 3D objects while maintaining their structural integrity.
Google’s new Imagen Video Al turns text descriptions into high resolutions 5.3 second long videos🤩🤩🤩 pic.twitter.com/KhvsvGqLFh— Tansu YEĞEN (@TansuYegen) October 8, 2022
According to Google, Imagen Video was trained on 14 million video-text pairs and 60 million image-text pairs, along with the LAION image-text dataset, which was also used to train Stable Diffusion.
Google aspires to “significantly reduce the difficulty of high-quality content generation” with its AI-video model. Google’s Imagen, a text-to-image programme akin to OpenAI’s DALL-E, serves as the foundation for Imagen Video.
According to Google’s researchers, Imagen Video takes a text description and first produces a 16-frame video at three frames per second with a resolution of 24×48 pixels. After upscaling and “predicting” more frames, the system creates a final 128-frame, 24-frames-per-second video at 720p.
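The frame counts and frame rates quoted above imply a clip length, which can be checked with a little arithmetic. The sketch below is only an illustration of that calculation, not anything from Google's code; the helper name `clip_duration` is hypothetical.

```python
from fractions import Fraction


def clip_duration(frames: int, fps: int) -> Fraction:
    """Return the clip length in seconds as an exact fraction (frames / fps)."""
    return Fraction(frames, fps)


# Figures taken from the article: the initial clip and the final clip.
base = clip_duration(16, 3)      # 16 frames at 3 frames per second
final = clip_duration(128, 24)   # 128 frames at 24 frames per second

print(f"base clip:  {float(base):.2f} s")
print(f"final clip: {float(final):.2f} s")

# Frame count and frame rate grow by the same factor,
# so the clip length is unchanged by the added frames.
print("same duration:", base == final)
print("frame factor:", 128 // 16, "fps factor:", 24 // 3)
```

Both stages work out to 16/3 seconds, which matches the roughly 5.3-second length mentioned in the tweet: the extra frames make the motion smoother rather than the clip longer.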
It is important to note that all of the Imagen Video results shown so far were selected by Google, and that no outside testers have used the programme yet.
However, the study asserts that Imagen Video can render text accurately, whereas DALL-E and Stable Diffusion find it difficult to do so. These programmes produce text that is hardly readable.
Additionally, it asserts that Imagen Video has shown an understanding of depth and three-dimensionality by enabling the creation of drone flythrough videos that rotate and accurately capture objects from various angles.
Google has acknowledged concerns about the “problematic data” used to develop its AI-based image-generating software. The company has tried to filter out social stereotypes and cultural biases, as well as content that is sexually explicit or violent. It is concerned that the tool could be used to “generate fake, hateful, explicit, or harmful content.”
Google adds, “We have decided not to release the Imagen Video model or its source code until these concerns are addressed.”