Having fun with Stable Diffusion

The latest edition of the German computer magazine C’t has a feature on Stable Diffusion and deploying it on your own hardware, which prompted me to install it on my desktop PC. While AMD graphics cards are supported, Nvidia GPUs are better supported and faster. My desktop has an RTX 2070, which works fine.

Python 3.10 is required. The easiest way to get this on Windows is to install it from the Microsoft Store. You might have to remove other Python installations from the PATH to make 3.10 the default Python on the command line. The Stable Diffusion WebUI can be downloaded from https://github.com/AUTOMATIC1111/stable-diffusion-webui. Git is not required if you click on the green Code button and choose Download ZIP. Unzip the ZIP in a convenient location.
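
To see which Python the command line actually picks up, a few lines of Python are enough. This is only a sketch: the helper name is_required_python is my own, and it simply encodes the "3.10 is required" rule from above.

```python
import sys

# The web UI release described here expects Python 3.10.
REQUIRED = (3, 10)


def is_required_python(version_info=sys.version_info):
    """Return True if the major.minor version matches the required one."""
    return tuple(version_info[:2]) == REQUIRED


# Report the interpreter that is first on the PATH.
status = "OK" if is_required_python() else "not 3.10, check your PATH"
print(f"Python {sys.version.split()[0]}: {status}")
```

Save it as a file and run it with python from the same command prompt you will use to start the web UI.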

It should then be sufficient to start the web UI by typing webui.bat --xformers --listen in the directory (prepend .\ for PowerShell). On the first start it should install all dependencies, including the v1.5 Stable Diffusion model. The --listen argument makes it possible to access the web UI from other computers in the same network at http://ip-address:7860 in a web browser.

Compared to DALL·E 2, the basic model appears to be somewhat limited, especially when requesting more far-fetched images. My daughter requested an avocado drinking a glass of wine; the closest it got was avocados next to a glass of wine. It also struggled with the picture of a shark sitting at a computer, as on my front page. But the beauty of Stable Diffusion is that there are many different models available, often trained for a specific style. These can be downloaded from https://huggingface.co. C’t recommends Protogen 5.3 as a base model, which does indeed work better. I would definitely set the batch count to 4 to get multiple outputs for each prompt. All images are stored in the outputs subfolder, so you can always retrieve images that you’ve generated but not saved directly.
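
The batch count can also be set from a script instead of the browser. The following is a minimal sketch, not something I needed for the images in this post. It assumes the web UI was started with the additional --api argument (not part of the command above) and is reachable on the local machine at its default port 7860. The function names are my own; the endpoint /sdapi/v1/txt2img and the n_iter field are, to my knowledge, what the web UI's API uses for text-to-image requests and for the batch count.

```python
import base64
import json
import urllib.request

# Assumption: web UI running locally on the default port, started with --api.
WEBUI_URL = "http://127.0.0.1:7860"


def build_txt2img_payload(prompt, batch_count=4, width=512, height=512):
    """Build the JSON body for a text-to-image request."""
    return {
        "prompt": prompt,
        "n_iter": batch_count,  # API name for "Batch count" in the UI
        "batch_size": 1,        # images generated in parallel per batch
        "width": width,
        "height": height,
    }


def txt2img(payload, base_url=WEBUI_URL):
    """POST the payload and return the generated images as raw bytes."""
    request = urllib.request.Request(
        f"{base_url}/sdapi/v1/txt2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    # The response carries the images as base64-encoded strings.
    return [base64.b64decode(image) for image in result["images"]]
```

Calling txt2img(build_txt2img_payload("an avocado drinking a glass of wine")) should then return the generated images as bytes, which can be written to files of your choice.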

But there’s more fun to be had with some of the other models. Mo Di Diffusion has been trained on images from Pixar animation movies, so this is what my life would look like in a Pixar movie (use "modern disney style" in the prompt):

Uhm, daydreaming again? We all know reality looks like this (Protogen 5.3):

Stable Diffusion can also take an existing picture as a starting point and generate a new image, using the prompt as a modifier. This makes it possible to fulfill my son’s dream of becoming a Pokémon trainer, thanks to the Anything v3 model, which has been trained to generate anime-style pictures:

I won’t show the original image here, but Stable Diffusion has done a good job of reproducing both the posture and the clothing with a simple prompt of "young boy pokemon trainer". Other models deliver similar results, so a Pokémon prompt appears to have a high chance of resulting in anime-style images.

You can of course also describe the style of painting. It gets interesting when you ask for a specific car model, such as "BMW E39" along with "water color style". Mdjrny-v4 appears to have seen its share of the best BMW ever (mostly M5s I guess), whereas other models ended up producing mostly two-door cars, some with an interesting blend of E39, E46, and E90 features.

You can give more weight to certain words in your prompt by selecting them and pressing Ctrl+Up. I used this to convince Stable Diffusion to generate me a picture of a Cray supercomputer in the style of Rembrandt van Rijn, which was quite a challenge for some models. My favourite here was Anything v3, which created this Steampunk-like image with a computer that definitely has a Cray 2/X-MP/Y-MP vibe:

So far all these images have been generated at the default size of 512 x 512 pixels. You can generate larger images directly, but this will take considerably more time and will often result in multiples of the same object being shown. This can be fixed by using the Hires. fix option and choosing a suitable upscale factor. A base size of 960 x 540 with an upscale factor of 2 yields 1920 x 1080 images, suitable as a full HD desktop background. The prompt "a landscape photograph of a blue Porsche 911 driving along the pacific coast highway in sunny weather" led Anything to create this lovely anime-style wallpaper:

There are more possibilities, such as modifying only selected parts of an image (inpainting) or extending an existing image beyond its borders (outpainting). Have fun experimenting!
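
Two of the tricks above are easy to write down as code, if only to see what is going on. As far as I can tell, Ctrl+Up wraps the selected words as (words:1.1) and raises the number by 0.1 with each further press, and the size you enter for Hires. fix is the target size divided by the upscale factor. The sketch below uses helper names of my own, and the Cray prompt is only an illustration, not the exact prompt I used.

```python
def weighted(term, weight):
    """Wrap a prompt term in the (term:weight) emphasis syntax."""
    return f"({term}:{weight:g})"


def hires_base_size(target_width, target_height, upscale_factor):
    """Return the base size that reaches the target size after upscaling."""
    return (round(target_width / upscale_factor),
            round(target_height / upscale_factor))


# Illustrative prompt: give the Cray more weight than the rest.
prompt = f"a {weighted('Cray supercomputer', 1.3)} in the style of Rembrandt van Rijn"
print(prompt)

# Base size to enter for a full HD wallpaper with an upscale factor of 2.
print(hires_base_size(1920, 1080, 2))
```

A weight above 1 increases the attention a term gets; the base size printed for full HD with a factor of 2 is the 960 x 540 mentioned above.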

Edit: Here’s a webpage that compares many different models: https://daniel.von-appen.org/stable-diffusion-model-comparison/
