The discipline of building or teaching a neural net to have this kind of multifaceted awareness is called multimodal modeling. Tools like DALL-E are designed to generate images based on text descriptions, while CLIP (Contrastive Language-Image Pre-training) learns to associate text and images in a shared representation, matching captions to pictures more robustly than earlier models. ...
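The contrastive idea behind models like CLIP can be sketched in a few lines: text and images are each mapped into the same vector space, and a caption "matches" whichever image lies nearest to it. The embeddings below are made up purely for illustration; a real model would compute them from pixels and tokens.

```python
import numpy as np

def normalize(v):
    """Scale each row to unit length so dot products become cosine similarity."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Hypothetical embeddings for three images (invented values, not model output).
image_embeddings = normalize(np.array([
    [0.9, 0.1, 0.0],   # photo of a dog
    [0.1, 0.9, 0.0],   # photo of a cat
    [0.0, 0.1, 0.9],   # photo of a car
]))

# Hypothetical embeddings for two captions.
text_embeddings = normalize(np.array([
    [0.0, 0.2, 0.8],   # "a red sports car"
    [0.8, 0.2, 0.0],   # "a golden retriever"
]))

# Similarity matrix: each caption scored against each image.
similarity = text_embeddings @ image_embeddings.T

# Each caption is matched to its most similar image.
best_match = similarity.argmax(axis=1)
print(best_match)  # -> [2 0]: caption 0 matches the car, caption 1 the dog
```

During training, a contrastive loss pulls matching text-image pairs together in this space and pushes mismatched pairs apart, which is what makes the nearest-neighbor lookup above meaningful.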
Values matter to this bias problem, in a concrete sense. It’s such a low-tech part of the solution to a high-tech problem. Deepfakes are here, and to get software to identify deepfakes reliably, we are going to have to teach computers how to identify and understand us better — and there’s no putting that back in the box. As AI progresses, we’re going to leave our comfort zone over and over. It’s an inevitable consequence of making tools with near-human capabilities. What matters is how our values are reflected in the intelligent systems we create. Maybe we don’t need to worry about a robot uprising. Maybe the dystopia is coming from inside the house.