Image-to-image conversion with Stable Diffusion

If you have a great idea for some artwork but lack the skill to pull it off, you should try the image-2-image workflow in Stable Diffusion. It can transform a simple toddler-level drawing into professional-looking artwork while still preserving composition and form. To me this is even more amazing than the text-2-image workflow, as it gives you the control to get the image just as you want it.

In the previous post I went over how to use the text-2-image workflow. The image-2-image workflow is very similar, but in addition to the prompt you provide an image as well. Essentially, it generates an image starting partly from your image instead of purely from noise.
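To make the workflow concrete, here is a minimal sketch of an image-2-image call. It assumes the Hugging Face `diffusers` library and the `CompVis/stable-diffusion-v1-4` checkpoint, neither of which is necessarily what I used for the images in this post. The function name `run_img2img`, the default values and the file names are just illustrative.

```python
def run_img2img(init_path, prompt, strength=0.6, guidance_scale=7.5,
                model_id="CompVis/stable-diffusion-v1-4"):
    """Generate an image from a prompt plus a starting image (a sketch, not a recipe).

    Assumes `torch`, `diffusers` and `Pillow` are installed; the model id is an
    assumption and can be swapped for any Stable Diffusion checkpoint.
    """
    # Heavy imports live inside the function so the module loads without them.
    import torch
    from diffusers import StableDiffusionImg2ImgPipeline
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model_id, torch_dtype=dtype)
    pipe = pipe.to(device)

    # The doodle is the starting point; 512x512 matches what v1 models were trained on.
    init_image = Image.open(init_path).convert("RGB").resize((512, 512))

    result = pipe(
        prompt=prompt,
        image=init_image,
        strength=strength,              # how much of the doodle gets redrawn
        guidance_scale=guidance_scale,  # how strongly the prompt steers the result
    )
    return result.images[0]
```

Calling it would look something like `run_img2img("chest_doodle.png", "wooden treasure chest with lock").save("chest.png")`, where the file names are placeholders.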


I started out making a simple sketch of a car, thinking it would be converted into a nice image of a car. Instead it just gave me back another sketch of a car, albeit a better one. So if you want a colored object, you need to input one. The coloring is quite well preserved in the process, so choose the right palette when making your input doodle.


Making 2D game assets

I stumbled upon this excellent tutorial describing how to make game assets imitating the Blizzard/Warcraft style. I decided to try it out for myself, so I drew a simple chest on my tablet to use as a base. For the prompt I used text similar to the tutorial's, naming artists like Ariel Fain and Calvin Boice, together with the text "wooden treasure chest with lock". The original image can be seen to the left.


Tweaking parameters

There are some key parameters to tweak to get the output right. The most important one is the denoising strength, which controls how much of the original image is preserved. Setting it low, to 0.2-0.3, will only slightly change the image, while setting it higher, to 0.6-0.7, will redraw it extensively. This can be seen below for different values.
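One way to build intuition for the denoising strength is to look at how many denoising steps actually get run. The sketch below mirrors the arithmetic used by the Hugging Face `diffusers` img2img pipeline, which is an assumption on my part about the tooling; other front ends may round differently. The function name `steps_actually_run` is made up for illustration.

```python
def steps_actually_run(num_inference_steps, strength):
    """Return how many denoising steps run for a given strength.

    The input image is noised up to an intermediate point of the schedule,
    and only the remaining steps are denoised. Low strength means few steps,
    so the result stays close to the input.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    # Number of schedule steps the input image is pushed back by.
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    # Index in the schedule where denoising starts.
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start


if __name__ == "__main__":
    for strength in (0.2, 0.3, 0.6, 0.7, 1.0):
        print(strength, steps_actually_run(50, strength))
```

With a strength of 1.0 the full schedule runs and the input image is almost entirely replaced by noise before denoising. With a strength of 0 nothing is redrawn at all.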


Guidance scale

Another parameter to tweak is the guidance scale, which governs how closely the generation should follow the prompt. Increasing it gives the prompt a stronger influence on the result.
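Under the hood, the guidance scale is usually implemented as classifier-free guidance: at each step the model predicts the noise twice, once with the prompt and once without, and the scale extrapolates from the unconditioned prediction towards the prompted one. Here is a toy sketch on plain lists of numbers. Real implementations work on tensors, and the name `apply_guidance` is just for illustration.

```python
def apply_guidance(noise_uncond, noise_cond, guidance_scale):
    """Combine unconditioned and prompt-conditioned noise predictions.

    guidance_scale = 0 ignores the prompt, 1 uses the prompted prediction
    as is, and larger values push further in the prompt's direction.
    """
    return [
        u + guidance_scale * (c - u)
        for u, c in zip(noise_uncond, noise_cond)
    ]


if __name__ == "__main__":
    uncond = [0.0, 1.0]  # toy prediction without the prompt
    cond = [1.0, 3.0]    # toy prediction with the prompt
    for scale in (0.0, 1.0, 7.5):
        print(scale, apply_guidance(uncond, cond, scale))
```

This also hints at why very high values can look overcooked: the combined prediction ends up far from anything the model predicted on its own.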


Input image

When I drew the input images I used a semi-transparent artificial brush. This makes the images fuzzy, and the fuzziness sometimes shows up in the generated results. It would probably be better to use a sharper brush.



For the image below I wanted a scroll with a wax seal and a leather binding. It's kind of amusing that it misinterpreted "wax-seal" and put an actual seal in the image.



I find this technology completely amazing. I can draw a doodle in 30 seconds at the level of a fifth grader, run it through Stable Diffusion a couple of times and pick the best result, ending up with a production-ready game asset within minutes. It's impressive what you can accomplish in very little time.


Part of the user agreement for the Stable Diffusion model is that any image created with it is public domain, and you can't claim any copyright on it. But still, the model has been trained on copyrighted images without consent. I can't help thinking that this will ultimately water down copyright on images, as these systems will only get better and better.



As a final piece I upscaled one of the images using BSRGAN. This also sharpens the image, giving a very nice look.