Meta Platforms has built and is open-sourcing ImageBind, an artificial intelligence that combines six modalities: audio, visual, text, thermal, movement and depth data. Currently a research project, it suggests a future in which AI models generate multisensory content. “ImageBind equips machines with a holistic understanding that connects objects in a photo with how they will sound, their 3D shape, how warm or cold they are, and how they move,” Meta says. In other words, ImageBind’s approach more closely approximates human thinking by training on the relationship between things rather than ingesting massive datasets so as absorb every possibility. ...
See the full story here: https://www.etcentric.org/metas-open-source-imagebind-works-across-six-modalities/
and here https://arxiv.org/abs/2305.05665