Philip Lelyveld: The World of Entertainment Technology

16 Feb 2024

Meta’s new AI model learns by watching videos

Meta’s AI researchers have released a new model that’s trained in a similar way to today’s large language models, but instead of learning from written words, it learns from video. 

LLMs are normally trained on thousands of sentences or phrases where some of the words are masked, forcing the model to find the best words to fill in the blanks. In doing so, they pick up a rudimentary sense of the world. Yann LeCun, who leads Meta’s FAIR (Fundamental AI Research) group, has proposed that if AI models could use the same masking technique, but on video footage, they could learn more quickly. ...
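As a rough illustration of the masking idea described above, and not Meta's actual training code, here is a minimal sketch of how a masked-word training example might be built; the function name, mask token, and masking probability are all illustrative assumptions.

```python
import random

def make_masked_example(sentence, mask_prob=0.15, mask_token="[MASK]"):
    """Build a toy masked-language-modeling example: hide some words and
    keep them as the targets the model must reconstruct."""
    tokens = sentence.split()
    inputs, targets = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            inputs.append(mask_token)
            targets.append(tok)    # the model is scored on recovering this word
        else:
            inputs.append(tok)
            targets.append(None)   # unmasked positions carry no prediction target
    return inputs, targets

# The model sees the masked input and must fill in the blanks.
masked, answers = make_masked_example("the cat sat on the warm window sill")
print(masked)   # e.g. ['the', '[MASK]', 'sat', 'on', 'the', 'warm', 'window', 'sill']
print(answers)  # e.g. [None, 'cat', None, None, None, None, None, None]
```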

The embodiment of LeCun’s theory is a research model called Video Joint Embedding Predictive Architecture (V-JEPA). It learns by processing unlabeled video and figuring out what probably happened in a certain part of the screen during the few seconds it was blacked out.  ...
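To make the video analogy concrete, the sketch below shows the general shape of masked prediction on a video clip, done in feature space rather than pixel space: hide a block of the screen for a stretch of frames, encode the visible context, and score a prediction of the hidden block's representation. The encoder and predictor here are toy stand-ins with made-up names, not V-JEPA's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the real networks; names, shapes, and logic are illustrative only.
def context_encoder(clip):    # encodes the visible (unmasked) pixels
    return clip.mean(axis=(0, 1, 2))      # -> small feature vector

def target_encoder(region):   # encodes the hidden region the model must predict
    return region.mean(axis=(0, 1, 2))    # -> small feature vector

def predictor(context_features):
    return context_features               # placeholder for a learned predictor

# A fake 16-frame, 64x64 RGB clip.
clip = rng.random((16, 64, 64, 3))

# Mask a spatiotemporal block: a patch of the screen over a few frames.
masked = clip.copy()
t0, t1, y0, y1, x0, x1 = 4, 10, 16, 48, 16, 48
masked[t0:t1, y0:y1, x0:x1, :] = 0.0

# Predict the representation of the hidden block from the visible context,
# and compare it to the representation of what was actually there.
pred = predictor(context_encoder(masked))
target = target_encoder(clip[t0:t1, y0:y1, x0:x1, :])
loss = np.mean((pred - target) ** 2)
print(f"feature-space prediction error: {loss:.4f}")
```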

Meta’s next step after V-JEPA is to add audio to the video, which would give the model a whole new dimension of data to learn from, just like a child watching a muted TV and then turning the sound up.  ...

See the full story here: https://www.fastcompany.com/91029951/meta-v-jepa-yann-lecun
