Image generation with DALL-E 2

The waitlist for the image-generation AI system DALL-E 2 was recently removed, so anyone can now sign up and use it immediately. Having used Stable Diffusion for some time, I thought I'd give it a try to see how it performs in comparison. It certainly has some interesting features, like outpainting, generating variations and editing, as well as very impressive natural-language processing.

The platform🔗

The DALL-E 2 image-generation system is made available as a beta on OpenAI's platform, i.e. it isn't available to run yourself like the Stable Diffusion model is. You pay for the images you generate, but on signing up you are given 50 free credits, enough for generating 50 sets of 4 images. The pricing seems quite modest in my opinion, considering the GPU time involved.

When generating images you enter a text prompt and get four 1024x1024 images based on it. This takes about 30 seconds.
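The same generation flow is also exposed through OpenAI's API. The sketch below shows roughly what one request looks like with the beta-era `openai` Python package; the helper function is my own illustration, and the commented-out call assumes you have the package installed and an API key set.

```python
# Sketch of generating one 4-image set (1024x1024) via the beta-era
# OpenAI Image API. The helper only builds the request parameters;
# the actual network call (commented out) needs an API key.

def image_request(prompt: str, n: int = 4, size: str = "1024x1024") -> dict:
    """Parameters for one generation: a prompt and four 1024x1024 images."""
    return {"prompt": prompt, "n": n, "size": size}

params = image_request("Cat sipping a margarita in the style of Vermeer")

# With the openai package installed and OPENAI_API_KEY set:
# import openai
# response = openai.Image.create(**params)
# urls = [item["url"] for item in response["data"]]
```

Each such request costs one credit, which is where the "50 credits, 50 sets of 4 images" arithmetic above comes from.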



How does this compare to Stable Diffusion? I find they both have their strengths and weaknesses.

Take the prompt "Cat sipping a margarita in the style of Vermeer": DALL-E 2 gets it right every time, while Stable Diffusion struggles to understand the context and mostly produces cats inserted into random Vermeer paintings. The original DALL-E was built on GPT-3, and that lineage shows: its language processing is far more sophisticated than Stable Diffusion's, which works better with keywords and short phrases than with long, intricate sentences.
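For reference, the Stable Diffusion side of this comparison can be reproduced locally with Hugging Face's diffusers library. The sketch below is illustrative: the model id and settings are assumptions about your setup, not the exact configuration I used, and the final call needs a GPU.

```python
# Illustrative sketch: running the same prompt through Stable Diffusion
# locally via the diffusers library. The model id is an assumption.
prompt = "Cat sipping a margarita in the style of Vermeer"

def run_locally(prompt: str, model_id: str = "CompVis/stable-diffusion-v1-4"):
    """Generate one image for the prompt; requires torch, diffusers and a GPU."""
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe = pipe.to("cuda")
    return pipe(prompt).images[0]  # a PIL image

# run_locally(prompt).save("cat_vermeer.png")  # uncomment on a machine with a GPU
```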


That said, I find Stable Diffusion more capable of generating interesting artwork in general. Maybe this comes down to the images the models were trained on.

Datasets and training🔗

The team behind Stable Diffusion is very transparent about how their model was trained; in short, massive datasets were assembled by scraping websites like Pinterest, WordPress blogs, stock-photo sites, DeviantArt, ArtStation and so on. As for DALL-E, OpenAI is, ironically, less open about how their model was trained.



The company behind DALL-E 2 is OpenAI. Having the word "open" in your company name suggests an open-source-software company, and that may well have been what they set out to do in the first place: to democratize AI and bring the technology to the masses. Training huge AI systems is a costly endeavour, and AI development has typically been done behind closed doors by the likes of Google, Facebook and Apple, so the goal of making the technology more open is a noble one. This does not seem to be the route the company is taking, however, as they haven't done much to make the technology free or open source. One could argue that they are being as open as they can while still attracting the investors needed to keep the whole thing going, but then again Stability AI managed to release their model to the public.


Controversies in training data🔗

One artist whose style has been assimilated very well by models like Midjourney and Stable Diffusion is Greg Rutkowski, and he has been vocal about the implications of this new technology. It is very polarizing: artwork that previously required highly skilled artists can now be produced by the dozen by anyone, and training a model on the works of living artists can ultimately cost them commissions. DALL-E, however, has not been as effective at incorporating his style.