Microsoft unveils VASA-1, creating next-level AI video from a single static image

Figure caption: From a single image, VASA-1 can create lifelike video using an audio track. The portrait images depicted are fictional, created entirely by AI. (Source: Microsoft Research)

Earlier today, Microsoft researchers unveiled a first look at VASA-1, a new system for generating lifelike talking heads of virtual characters using only a single static image and a speech audio clip. See the video below for an example of an AI character animated with VASA-1.

The system, which has not been released for public use, represents a significant step forward in the quality of AI-generated video. According to Microsoft, the system precisely aligns nuanced facial expressions and head motions with spoken audio, making virtual conversations highly realistic.

In an unusual addendum to the paper, the nine authors addressed what they acknowledged as “the possibility of misuse” of the VASA-1 technology.

VASA-1 “is not intended to create content that is used to mislead or deceive. However, like other related content generation techniques, it could still potentially be misused for impersonating humans.”

The benefits, they wrote, could be enormous: improving educational quality, offering therapeutic support, and more. But the risk of abuse at this point is also, they implied, fairly obvious.

“Given such context,” they concluded, “we have no plans to release an online demo, API, product, additional implementation details, or any related offerings until we are certain that the technology will be used responsibly and in accordance with proper regulations.”

Read more about VASA-1 in the paper published today by Microsoft Research Asia.
