Friday, May 17, 2024

DALL-E New Generation Masterpieces

It has been a while since OpenAI announced DALL-E, which generates digital images from natural language descriptions, and we have been seeing pictures generated by DALL-E in social media posts. However, we did not expect this AI algorithm to work so well that almost all the portraits we generated were nearly masterpieces.

The Generative Pre-trained Transformer (GPT) model was initially developed by OpenAI in 2018 using a Transformer architecture. The first iteration, GPT, was scaled up to produce GPT-2 in 2019; in 2020 it was scaled up again to produce GPT-3, with 175 billion parameters. DALL-E’s model is a multimodal implementation of GPT-3 with 12 billion parameters which “swaps text for pixels”, trained on text-image pairs from the Internet. DALL-E 2 uses 3.5 billion parameters, a smaller number than its predecessor.

DALL-E was developed and announced to the public in conjunction with CLIP (Contrastive Language-Image Pre-training). CLIP is a separate model based on zero-shot learning that was trained on 400 million pairs of images with text captions scraped from the Internet. Its role is to “understand and rank” DALL-E’s output by predicting which caption from a list of 32,768 captions randomly selected from the dataset (of which one was the correct answer) is most appropriate for an image. This model is used to filter a larger initial list of images generated by DALL-E and select the most appropriate output.
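To make the ranking step a bit more concrete, here is a minimal sketch of the idea in Python. It is not OpenAI's code: the function names (cosine_similarity, rerank) and the tiny three-number vectors are our own assumptions, standing in for the embeddings a CLIP-like model would produce for a caption and for each candidate image. The sketch scores every candidate against the caption and keeps the best ones.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rerank(text_embedding, image_embeddings, top_k):
    """Return indices of the top_k images most similar to the caption.

    Hypothetical helper: real embeddings would come from a trained
    text encoder and image encoder, not from hand-written lists.
    """
    scored = [
        (cosine_similarity(text_embedding, emb), idx)
        for idx, emb in enumerate(image_embeddings)
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:top_k]]


# Toy stand-ins for embeddings (assumed values, for illustration only).
caption = [1.0, 0.0, 0.5]
candidates = [
    [0.0, 1.0, 0.0],  # unrelated to the caption
    [2.0, 0.0, 1.0],  # points the same way as the caption
    [1.0, 1.0, 0.0],  # partially related
]

best = rerank(caption, candidates, top_k=2)
print(best)
```

In this toy run the second candidate ranks first because it points in the same direction as the caption vector, and the unrelated first candidate is filtered out.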

What could DALL-E generate?

DALL-E can generate imagery in multiple styles, including photorealistic imagery, paintings, and emoji. It can “manipulate and rearrange” objects in its images, and can correctly place design elements in novel compositions without explicit instruction. DALL-E is able to produce images for a wide variety of arbitrary descriptions from various viewpoints.

Here are some examples that we created!

“an astronaut playing basketball with cats in space, digital art”
“van gogh style train line oil painting in the middle of nowhere with vast steppe”
“a green cat with hat walking on railway overhead line cables in a sunny British weather while Big Ben is on the background”
“a woman in a fur coat is eating her hot steamy soup in a train couch served with a garlic bread in a very cold snowy day in the UK and watching the snow from the window”

DALL-E 2 is more functional

The latest version of this OpenAI product, DALL-E 2, can produce “variations” of an image as unique outputs based on the original, as well as edit the image to modify or expand upon it. DALL-E 2’s “inpainting” and “outpainting” use context from an image to fill in missing areas using a medium consistent with the original, following a given prompt. For example, this can be used to insert a new subject into an image, or expand an image beyond its original borders. According to OpenAI, “Outpainting takes into account the image’s existing visual elements — including shadows, reflections, and textures — to maintain the context of the original image.”
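As a rough illustration of what "expanding an image beyond its original borders" means in terms of data, here is a small Python sketch. It only prepares the input: a larger canvas with the original pixels in the middle, plus a mask marking the empty cells a generative model would be asked to fill. The function name make_outpaint_canvas and the list-of-lists image format are our own assumptions, and the sketch does not show how DALL-E 2 actually fills the new area.

```python
def make_outpaint_canvas(image, pad):
    """Place `image` in the centre of a canvas enlarged by `pad` on each side.

    `image` is a list of rows of pixel values (a hypothetical toy format).
    Returns (canvas, mask): canvas cells outside the original are None,
    and mask is True exactly where new content would need to be generated.
    """
    height = len(image)
    width = len(image[0])
    new_height = height + 2 * pad
    new_width = width + 2 * pad

    canvas = [[None] * new_width for _ in range(new_height)]
    mask = [[True] * new_width for _ in range(new_height)]

    for row in range(height):
        for col in range(width):
            canvas[row + pad][col + pad] = image[row][col]
            mask[row + pad][col + pad] = False  # original pixel, keep as-is

    return canvas, mask


# A 2x2 "image" with made-up pixel values.
original = [[1, 2], [3, 4]]
canvas, mask = make_outpaint_canvas(original, pad=1)

cells_to_fill = sum(cell for row in mask for cell in row)
print(len(canvas), len(canvas[0]), cells_to_fill)
```

Inpainting can be pictured the same way, except that the masked cells sit inside the original image instead of around it.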


