StableDiffusion

All aboard the hype-train, the future is now. The time has come when AI can create incredible artworks on demand, of any subject and in any style imaginable. Skilled artists are now obsolete, as mere cavemen can create masterpieces with the click of a button.

Jokes aside, this technology is fantastic and has the potential to transform many fields. This might become a new essential tool for artists, animators, game-developers or anyone working with graphics.

Stable diffusion

Stable Diffusion is a text-to-image machine learning model, trained on millions of images, that turns a text prompt into an image. The model can be downloaded and set up to run on a consumer PC with a modest GPU. With 10 GB of disk space and ~6 GB of VRAM it can generate 512x512 pixel images.

Setup

Instructions on how to set up Stable Diffusion can be found on the GitHub page. That setup uses a command-line interface, though. To make it easier to use, there are variants with a web UI. Generating one 512x512 image takes about 30 s on my old PC.
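For the command-line route, here is a minimal sketch of how a run could be scripted from Python. It assumes the GitHub page in question is the CompVis/stable-diffusion repository, whose `scripts/txt2img.py` accepts `--prompt`, `--W`, `--H`, `--seed`, `--n_samples` and `--plms`. The helper name `build_txt2img_command` is made up for illustration, and other forks may use different flags.

```python
import shlex
import sys


def build_txt2img_command(prompt, width=512, height=512, seed=42, n_samples=1):
    """Build the argument list for scripts/txt2img.py (hypothetical helper).

    Assumes the CompVis/stable-diffusion repository layout and flag names.
    """
    return [
        sys.executable, "scripts/txt2img.py",
        "--prompt", prompt,
        "--W", str(width),
        "--H", str(height),
        "--seed", str(seed),
        "--n_samples", str(n_samples),
        "--plms",
    ]


command = build_txt2img_command("a lighthouse on a cliff at sunset, oil painting")
# Show the command as a single shell-style line for inspection.
print(shlex.join(command))
```

To actually run it, the list could be handed to `subprocess.run(command)` from the root of the repository, once the model weights are in place.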

Prompt

The text-to-image model takes a text prompt and produces an image based on it. When starting off you might notice that you sometimes get bland results if you just put in a short, concise sentence. The trick is to be overly specific, giving plenty of details about the subject, and also to give artist references, coloring, style etc. There are plenty of resources giving tips and tricks on how to formulate the prompt for better results.

image-1
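The "be overly specific" advice can be turned into a tiny helper that assembles a prompt from its parts. This is only a sketch: the function `build_prompt`, the comma-separated layout and the example subject, artist and style terms are all illustrative assumptions, not a format the model requires.

```python
def build_prompt(subject, details=(), artists=(), styles=()):
    """Join a subject with optional details, artist references and style terms."""
    parts = [subject.strip()]
    parts.extend(d.strip() for d in details)
    if artists:
        parts.append("by " + " and ".join(a.strip() for a in artists))
    parts.extend(s.strip() for s in styles)
    # Drop empty fragments so stray blanks do not produce double commas.
    return ", ".join(p for p in parts if p)


bland = build_prompt("a castle")
detailed = build_prompt(
    "a castle on a misty mountain",
    details=["towering spires", "golden hour lighting"],
    artists=["Caspar David Friedrich"],
    styles=["oil painting", "muted colors", "highly detailed"],
)
print(bland)
print(detailed)
```

The first prompt is the kind of short sentence that tends to give bland results; the second packs in subject details, an artist reference, and coloring and style terms.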

One source of inspiration, and a way to get better at making prompts, is to look at what others have created on sites like PromptHero. With a single prompt you can generate as many images as you want; each new seed gives a randomized, unique image. And with a good prompt you can generate countless images that mostly look amazing. I have not done any cherry-picking in the images shown on this page.

image-1
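Since every new seed gives a different image for the same prompt, a batch run mostly comes down to picking a list of seeds. A minimal sketch, assuming the generator accepts an integer seed in the 32-bit range (the helper `pick_seeds` is hypothetical):

```python
import random


def pick_seeds(count, rng=None):
    """Return `count` distinct integer seeds in the range [0, 2**32)."""
    if rng is None:
        rng = random.Random()
    return rng.sample(range(2**32), count)


seeds = pick_seeds(4)
for seed in seeds:
    # Each seed would be passed to the generator together with the same prompt.
    print(f"same prompt, seed {seed}")
```

Writing the seed down next to each saved image is worthwhile: reusing the same prompt, settings and seed should normally recreate the same picture.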

Sizes

The default size is 512x512. If you have enough VRAM you can bump up the resolution a little bit. With my poor old GTX 1070 with 8 GB of VRAM I could increase the resolution to 512x768. This allows you to experiment with portrait or landscape formats.

image-1
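To get a feel for what a size bump costs, the sketch below compares a few resolutions against the 512x512 default. It assumes the Stable Diffusion v1 layout, where the model works on a 4-channel latent image that is 8 times smaller per side than the output; the constants and helper names are illustrative.

```python
LATENT_CHANNELS = 4  # assumption: Stable Diffusion v1 latent channels
DOWNSAMPLE = 8       # assumption: the autoencoder shrinks each side by 8


def latent_shape(width, height):
    """Return the (channels, rows, columns) shape of the latent image."""
    if width % DOWNSAMPLE or height % DOWNSAMPLE:
        raise ValueError("width and height should be multiples of 8")
    return (LATENT_CHANNELS, height // DOWNSAMPLE, width // DOWNSAMPLE)


def relative_pixels(width, height, base=(512, 512)):
    """Pixel count relative to the base resolution."""
    return (width * height) / (base[0] * base[1])


for w, h in [(512, 512), (512, 768), (768, 512)]:
    print(f"{w}x{h}: latent {latent_shape(w, h)}, {relative_pixels(w, h):.2f}x pixels")
```

A 512x768 image has 1.5 times the pixels of the default, which gives a rough idea of why VRAM runs out quickly as the resolution goes up.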

Artifacts

There are always some weird artifacts popping up in the images. The model usually struggles with symmetry. It is not uncommon to see people with missing (or extra) limbs or, as in the example below, a hand with 7 fingers. But to be fair, I struggle with drawing hands as well.

image-1

Styles

With a few descriptive terms, such as "by artist" or an art genre, many different styles can be achieved. It's amazing how you can get the subject of your choice rendered in a specific art style or in the manner of a specific artist. I mean, why pay 80 million dollars for a Van Gogh when you can generate your own for free.

image-1 image-1 image-1 image-1
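Trying one subject in several styles is easy to script: keep the subject fixed and swap the style term. A small sketch, where the helper `style_variants`, the subject and the list of styles are just examples:

```python
def style_variants(subject, style_terms):
    """Return one prompt per style term, all sharing the same subject."""
    return [f"{subject}, {term}" for term in style_terms]


prompts = style_variants(
    "a fishing village at dawn",
    ["by Vincent van Gogh", "by Claude Monet", "art nouveau", "pixel art"],
)
for prompt in prompts:
    print(prompt)
```

Running each prompt with the same seed makes it easier to see what the style term alone changes.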

Lejondahl