Data2vec is part of a major trend in AI towards models that can learn to understand the world in more ways than one. “It’s a clever idea,” says Ani Kembhavi of the Allen Institute for AI in Seattle, who studies vision and language. “It’s a promising advance when it comes to generalized learning systems.”
An important caveat is that while the same learning algorithm can be used for different skills, it can only learn one skill at a time. Once it has learned to recognize images, it has to start from scratch to learn to recognize speech. Giving an AI multiple abilities at once is difficult, but that’s what the Meta AI team wants to look at next.
The researchers were surprised to find that their approach outperforms existing techniques on image and speech recognition and performs as well as leading language models on text comprehension.
Mark Zuckerberg is already dreaming of possible Metaverse applications. “All of this will eventually be built into AR glasses with an AI assistant,” he posted on Facebook today. “It might help you cook dinner, notice when you’re missing an ingredient, and prompt you to turn down the heat or do more complex tasks.”
For Auli, the key takeaway is that researchers should move out of their silos. “Hey, you don’t have to focus on one thing,” he says. “If you have a good idea, it might actually help across the board.”