How do DALL-E, Midjourney, Stable Diffusion, and other forms of generative AI work?
... Image-generating AI of this kind is powered by a computer program called a diffusion model. In simple terms, a diffusion model destroys and recreates images to find statistical patterns in them. The way it operates is not like natural intelligence.
We cannot predict how well, or even why, an AI like this works. We can only judge whether its outputs look good. ...
Diffusion models perform two sequential processes. They ruin images, then they try to rebuild them. Programmers give the model real images with meanings ascribed by humans: dog, oil painting, banana, sky, 1960s sofa, etc. The model diffuses (that is, moves) them through a long chain of sequential steps. In the ruining sequence, each step slightly alters the image handed to it by the previous step, adding random noise in the form of scattered, meaningless pixels, then hands it off to the next step. Repeated over and over, this causes the original image to gradually fade into static and its meaning to disappear.
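The ruining sequence can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind any of the products named above: the "image" is a flat list of pixel values, and the function names, the noise strength `beta`, and the step count are all hypothetical choices made for the example.

```python
import math
import random

def add_noise_step(pixels, beta, rng):
    """One ruining step: shrink the signal slightly and mix in Gaussian noise."""
    keep = math.sqrt(1.0 - beta)   # how much of the incoming image survives
    spread = math.sqrt(beta)       # how much fresh random noise is added
    return [keep * p + spread * rng.gauss(0.0, 1.0) for p in pixels]

def forward_chain(pixels, num_steps=1000, beta=0.02, seed=0):
    """Pass the image through num_steps ruining steps, keeping every stage."""
    rng = random.Random(seed)
    chain = [list(pixels)]
    for _ in range(num_steps):
        chain.append(add_noise_step(chain[-1], beta, rng))
    return chain

def correlation(a, b):
    """Pearson correlation: near 1 means 'same picture', near 0 means unrelated."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

if __name__ == "__main__":
    # Hypothetical 256-pixel "image": a smooth gradient from -1 to 1.
    image = [2.0 * i / 255.0 - 1.0 for i in range(256)]
    chain = forward_chain(image)
    for step in (1, 10, 100, 1000):
        print(step, round(correlation(image, chain[step]), 3))
```

Run as a script, this prints how closely each stage still resembles the original. The resemblance should start high and drift toward zero, which is the "fade into static" described above.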
When this process is finished, the model runs it in reverse. Starting with the nearly meaningless noise, it pushes the image back through the series of sequential steps, this time attempting to reduce noise and bring back meaning. At each step, the model’s performance is judged by the probability that the less noisy image created at that step has the same meaning as the original, real image. ...
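The scoring idea can also be sketched. Note the substitution: rather than the probability score described above, this example uses a mean squared error on the predicted noise, which is a common training objective for diffusion models but is an assumption here, not something the article states. All names and the `signal_kept` value are hypothetical, and no real neural network is involved; a "perfect" and a "lazy" noise guess stand in for a trained and an untrained model.

```python
import math
import random

def noisy_version(original, signal_kept, rng):
    """Blend the original with Gaussian noise; return the result and the noise used."""
    noise = [rng.gauss(0.0, 1.0) for _ in original]
    a = math.sqrt(signal_kept)
    b = math.sqrt(1.0 - signal_kept)
    noisy = [a * x + b * n for x, n in zip(original, noise)]
    return noisy, noise

def denoise(noisy, predicted_noise, signal_kept):
    """Undo the blend using a guess at the noise; a better guess gives a cleaner image."""
    a = math.sqrt(signal_kept)
    b = math.sqrt(1.0 - signal_kept)
    return [(y - b * n) / a for y, n in zip(noisy, predicted_noise)]

def mean_squared_error(a, b):
    """Average squared difference; 0 means the two lists match exactly."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

if __name__ == "__main__":
    rng = random.Random(1)
    original = [i / 10 for i in range(10)]
    noisy, true_noise = noisy_version(original, signal_kept=0.5, rng=rng)

    # Stand-in for a well-trained model: it guesses the noise exactly.
    perfect = denoise(noisy, true_noise, signal_kept=0.5)
    # Stand-in for an untrained model: it guesses "no noise at all".
    lazy = denoise(noisy, [0.0] * len(noisy), signal_kept=0.5)

    print("perfect guess error:", mean_squared_error(perfect, original))
    print("lazy guess error:", mean_squared_error(lazy, original))
```

During training, a real model's parameters are adjusted so that its noise guesses move from the lazy end of this range toward the perfect end.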
To produce images that have associated text meanings, words that describe the training images are taken through the noising and de-noising chains at the same time. In this way, the model is trained to produce an image that has not only a high likelihood of meaning, but also a high likelihood of the same descriptive words being associated with it. ...
A set of parameters that produces good images is indistinguishable from a set that creates bad images — or nearly perfect images with some unknown but fatal flaw. Thus, we cannot predict how well, or even why, an AI like this works. We can only judge whether its outputs look good. ...
Towering linguist Noam Chomsky pointed out that a generative model like GPT-3 does not produce words in a meaningful language any differently from how it would produce words in a meaningless or impossible language. ...
See the full story here: https://bigthink.com/the-future/dall-e-midjourney-stable-diffusion-models-generative-ai/