Generating 3D models with Shap-E

Text-to-3D generation is starting to become a thing. OpenAI has been so kind as to release their Shap-E project as open source. Shap-E introduces a diffusion process that can generate 3D objects from a text prompt. So what does this mean? Will it take the jobs of all the 3D artists?

The GitHub project can be found here. The model weights are downloaded automatically when you run the scripts.

Setting up

The setup was fairly simple, in theory: just do a pip install and launch the application through a Jupyter notebook. In reality I spent way too much time trying to resolve package dependencies between PyTorch, CUDA, and glibc for my conda environment. Not very rewarding.
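If you end up in the same dependency mess, a quick sanity check like the one below (not part of Shap-E, just plain PyTorch) at least confirms that the CUDA-enabled build is the one your environment actually picked up:

```python
import torch

# Environment sanity check: confirm the installed PyTorch build
# actually sees CUDA before debugging anything Shap-E related.
print("torch version:", torch.__version__)
print("built with CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```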

Simple prompts, animals

Running the model is as simple as running the notebook 'sample_text_to_3d.ipynb' and entering whatever prompt you desire. I started simple with some animals. The results are a bit rough but still recognizable. I also noticed that there is not much diversity in the output; the pig, for example, always came out as the same stylized round thing.
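Under the hood, that notebook boils down to a handful of calls. The sketch below is condensed from the repo's sample code; the parameter values are the notebook's defaults, except the batch size, which I set to three to match my runs:

```python
import torch
from shap_e.diffusion.sample import sample_latents
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# The transmitter decodes latents into renders/meshes; text300M is
# the text-conditional diffusion model. Both download on first use.
xm = load_model("transmitter", device=device)
model = load_model("text300M", device=device)
diffusion = diffusion_from_config(load_config("diffusion"))

prompt = "a pig"
batch_size = 3  # generate a few candidates and pick the best one

latents = sample_latents(
    batch_size=batch_size,
    model=model,
    diffusion=diffusion,
    guidance_scale=15.0,
    model_kwargs=dict(texts=[prompt] * batch_size),
    progress=True,
    clip_denoised=True,
    use_fp16=True,
    use_karras=True,
    karras_steps=64,
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)
```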

"a penguin""a pig""a tiger"

Mesh close-up

The generated meshes use so-called vertex paint for coloring: instead of texture maps, the color information is embedded in the mesh as vertex data. This is a simple way to add color, since there is no need for separate textures or UV mapping. It is, however, quite crude, giving only a single RGB value per vertex.
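Saving the colored meshes is also covered by the sample notebook; condensed, the export looks roughly like this. The PLY file keeps the per-vertex colors, which is what most tools expect for vertex paint (the file names are my own):

```python
from shap_e.util.notebooks import decode_latent_mesh

# Decode each sampled latent into a triangle mesh and save it.
# PLY stores the per-vertex RGB "paint" directly.
for i, latent in enumerate(latents):
    mesh = decode_latent_mesh(xm, latent).tri_mesh()
    with open(f"mesh_{i}.ply", "wb") as f:
        mesh.write_ply(f)
    with open(f"mesh_{i}.obj", "w") as f:
        mesh.write_obj(f)
```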

The diffusion process outputs an intermediate, implicit neural representation that can be turned into a proper mesh with, for example, the marching cubes algorithm. Looking at the mesh close up, we see a fairly dense grid of vertices. That density is needed, though, to give the vertex paint some resolution.
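To get a feel for just how dense that grid is, you can inspect an exported PLY with a third-party library such as trimesh. This is not part of Shap-E, just a minimal sketch assuming the mesh_0.ply file from the export above:

```python
import trimesh

# Load an exported mesh and report how much geometry the
# vertex paint is riding on.
mesh = trimesh.load("mesh_0.ply")
print(f"{len(mesh.vertices)} vertices, {len(mesh.faces)} faces")
# Per-vertex colors come back as an (n_vertices, 4) RGBA array.
print("vertex colors:", mesh.visual.vertex_colors.shape)
```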

[Close-up of a generated mesh, showing the dense vertex grid]

Running

Running the model took about 30 seconds per object and used up about 10 GB of VRAM; generating the meshes and rendering the images took the longest. I ran prompts in batches of three and picked the model that looked best. Running this for extended periods of time gets my GPU real toasty.
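The renders I used for picking a winner come from the notebook's turntable preview, which, condensed, looks roughly like this:

```python
from shap_e.util.notebooks import create_pan_cameras, decode_latent_images, gif_widget

render_mode = "nerf"  # 'stf' renders the textured mesh instead
size = 64             # render resolution; higher is slower

# Render a turntable GIF for each candidate in the batch,
# then eyeball them and keep the best-looking one.
cameras = create_pan_cameras(size, device)
for latent in latents:
    images = decode_latent_images(xm, latent, cameras, rendering_mode=render_mode)
    display(gif_widget(images))  # shows inline in the notebook
```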

Simple prompts, vehicles

How about vehicles? Apart from the awkward-looking excavator, it performed all right. It struggled a little with the fine details of the sail on the boat.

"a blue pickup truck""an airballoon, striped red white""a red sportscar"
"a sailing boat""a fighter aircraft""a yellow excavator"

Simple prompts, fantasy

How about simple assets for, say, a fantasy game?

"a sword""a round shield""a medieval helmet""a battle-axe"

Characters

Can it create fantasy characters?

"a troll""a warrior in plate armor, paladin""female sorcerer in robes"

Architecture

How about something a little more complex? What about buildings?

"a greek temple, columns""a medieval castle""a gothic church""a guard tower"

Closing thoughts

To put it mildly, these generated objects are not production-ready assets. The shapes are crude and the topology is a mess, but the technology itself is very promising. Once these models become decent, this could have huge ramifications for the game and animation industries, where vast resources go into the making of 3D models. As for myself, and other aspiring game developers, this could be an enabler to create things that would otherwise be out of reach.

Lejondahl