IncognitoErgoSum@kbin.social · edit-2 1 year ago

The AI genie is here. What we're deciding now is whether we all have access to it, or whether it's a privilege afforded only to rich people, corporations, and governments.

IncognitoErgoSum@kbin.social · 1 year ago

You need to do your own homework. I’m not doing it for you. What I will do is lay this to rest:

https://en.wikipedia.org/wiki/Stable_Diffusion

Stable Diffusion is a latent diffusion model, a kind of deep generative artificial neural network. Its code and model weights have been released publicly […]

https://jalammar.github.io/illustrated-stable-diffusion/

The image information creator works completely in the image information space (or latent space). We’ll talk more about what that means later in the post. This property makes it faster than previous diffusion models that worked in pixel space. In technical terms, this component is made up of a UNet neural network and a scheduling algorithm.

[…]

With this we come to see the three main components (each with its own neural network) that make up Stable Diffusion:

[…]

https://stable-diffusion-art.com/how-stable-diffusion-work/

The idea of reverse diffusion is undoubtedly clever and elegant. But the million-dollar question is, “How can it be done?”

To reverse the diffusion, we need to know how much noise is added to an image. The answer is teaching a neural network model to predict the noise added. It is called the noise predictor in Stable Diffusion. It is a U-Net model. The training goes as follows.

[…]

It is done using a technique called the variational autoencoder. Yes, that’s precisely what the VAE files are, but I will make it crystal clear later.

The Variational Autoencoder (VAE) neural network has two parts: (1) an encoder and (2) a decoder. The encoder compresses an image to a lower dimensional representation in the latent space. The decoder restores the image from the latent space.

https://www.pcguide.com/apps/how-does-stable-diffusion-work/

Stable Diffusion is a generative model that uses deep learning to create images from text. The model is based on a neural network architecture that can learn to map text descriptions to image features. This means it can create an image matching the input text description.

https://www.vegaitglobal.com/media-center/knowledge-base/what-is-stable-diffusion-and-how-does-it-work

Forward diffusion process is the process where more and more noise is added to the picture. Therefore, the image is taken and the noise is added in t different temporal steps where in the point T, the whole image is just the noise. Backward diffusion is a reversed process when compared to forward diffusion process where the noise from the temporal step t is iteratively removed in temporal step t-1. This process is repeated until the entire noise has been removed from the image using U-Net convolutional neural network which is, besides all of its applications in machine and deep learning, also trained to estimate the amount of noise on the image.
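To make the forward half of that description concrete, here is a rough sketch in plain Python. It assumes the common DDPM-style update with a constant noise rate `beta`; the function name, the value of `beta`, and the step count are illustrative choices of mine, not something taken from the quoted article or from Stable Diffusion's actual code.

```python
import math
import random

def forward_diffusion(image, T=1000, beta=0.02, seed=0):
    """Toy forward diffusion: mix a little Gaussian noise in at every step.

    `image` is a flat list of pixel values. `beta` (assumed constant here)
    controls how much of the signal is replaced by noise at each step.
    """
    rng = random.Random(seed)
    x = list(image)
    for _ in range(T):
        # Shrink the current values slightly and add fresh noise.
        x = [math.sqrt(1 - beta) * v + math.sqrt(beta) * rng.gauss(0.0, 1.0)
             for v in x]
    return x

def signal_remaining(T=1000, beta=0.02):
    """Fraction of the original pixel value that survives T steps."""
    return (1 - beta) ** (T / 2)
```

With these assumed settings, `signal_remaining()` is far below one percent, which is the "at the point T, the whole image is just the noise" part of the quote. The backward process is the part that needs the trained U-Net, because something has to estimate which noise to take back out.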

So, I’ll grant that you’re trivially right that Stable Diffusion uses a Markov chain. As it turns out, though, I had the same misconception you did: that a Markov chain was some sort of mathematical equation. A Markov chain is actually just a process where each step depends only on the step immediately before it, so the fact that Stable Diffusion uses one most certainly doesn’t mean you’re right that it doesn’t use a neural network. Stable Diffusion works by feeding the prompt and the partly denoised image into the neural network over some given number of steps (it can do it in a single step, although the results are usually pretty messy). That loop in and of itself is a Markov chain. However, the piece that’s actually doing the real work (essentially doing a Rorschach test over and over) is a neural network.
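Here is a minimal sketch of what I mean, in plain Python. Everything in it is a toy: `predict_noise` is a hypothetical stand-in for the U-Net (a real noise predictor is a trained neural network, not a one-liner), and the "latent" is just a short list of numbers. The point is only the shape of the loop: each step takes the current latent and the prompt, nothing from earlier steps, which is what makes it a Markov chain.

```python
import random

def predict_noise(latent, prompt, step):
    """Hypothetical stand-in for the neural network noise predictor.

    This toy treats everything that differs from a fixed target value
    (derived from the prompt length) as noise. A real model learns this.
    """
    target = float(len(prompt))
    return [x - target for x in latent]

def denoise(prompt, steps=20, size=4, seed=0):
    rng = random.Random(seed)
    # Start from pure random noise.
    latent = [rng.gauss(0.0, 1.0) for _ in range(size)]
    for step in range(steps, 0, -1):
        noise = predict_noise(latent, prompt, step)
        # Markov property: the next latent depends only on the current one.
        latent = [x - n / step for x, n in zip(latent, noise)]
    return latent
```

The loop is the Markov chain; the call inside it is where the neural network sits. Swap the toy `predict_noise` for a trained model and the loop stays the same.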