diavlex: text-to-paint using Neural Painters and CLIP

Let us know if you have any comments or questions, or if you want to share your creations (notebooks at the end of the post).

Cheers!

Ernesto @vedax & diavlex @diavlex_ai

Intro

Artists combine colors and brushstrokes to paint their masterpieces; they do not create paintings pixel by pixel. However, most current generative AI art methods based on Machine Learning (ML) still teach machines how to 'paint' at the pixel level in order to achieve or mimic a painting style, e.g., GAN-based approaches and style transfer. This can be effective, but it is not very intuitive, especially when explaining the process to artists, who think in terms of colors and brushstrokes.

Our goal is to teach a machine how to paint using a combination of colors and strokes by telling it in natural language what to paint. How can we achieve this?

Materials and Approach

We need two basic ingredients:

  1. An ML model that knows how to paint using colors and strokes. To this end we will use a Neural Painter [1], [2].

  2. An ML model that connects text with images, that is, one that can associate text with visual concepts. We use CLIP (Contrastive Language–Image Pre-training) for this task [3].
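
As a quick illustration of ingredient 2, loading CLIP and embedding a prompt with OpenAI's reference implementation can be as short as the following sketch ("ViT-B/32" is just one of the released model variants; any of them works):

    import torch
    import clip  # OpenAI's reference implementation (pip install git+https://github.com/openai/CLIP.git)

    device = "cuda" if torch.cuda.is_available() else "cpu"
    clip_model, preprocess = clip.load("ViT-B/32", device=device)

    # Embed a text prompt into CLIP's shared text-image feature space.
    tokens = clip.tokenize(["black sheep"]).to(device)
    with torch.no_grad():
        text_features = clip_model.encode_text(tokens)  # shape (1, 512) for ViT-B/32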

TL;DR

The following steps capture the essence of our idea. Note that we use Python-based pseudocode, which does not necessarily reflect the models' APIs. Please have a look at the notebooks themselves for the actual code and methods used.

  1. Specify what to paint, e.g.,
    prompt = "black sheep"

  2. Encode the text using CLIP's language portion to obtain the text features
    text_features = clip_model.encode_text(prompt)

  3. Initialize a list of brushstrokes (actions) and ask the neural painter to paint them on a canvas. At the beginning, the canvas will look random.
    canvas = neural_painter.paint(actions)

  4. Use the vision portion of the CLIP model to extract the image features of this initial canvas.
    image_features = clip_model.encode_image(canvas)

  5. The goal is to teach the neural painter to modify the strokes (i.e., its actions) based on how much the current painting differs from the initial text request (prompt). For example, in the perfect-case scenario, the cosine similarity between the text and image feature vectors would be 1.0.
    Following this intuition, we use the cosine distance, which measures how different the two vectors are, as the loss guiding the optimization process. In our case, the cosine distance corresponds to
    loss = 1.0 - cos(text_features, image_features)

  6. We minimize this loss by adapting the neural painter's actions, which in the end should produce a canvas as close as possible to the original request (see the sketch right after this list).
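
Putting the steps together, here is a minimal, self-contained sketch of the optimization loop. Note that the paint function below is only a toy stand-in that renders soft color blobs, and the action layout (x, y, radius, r, g, b) is purely our illustrative choice; the actual Neural Painter is a learned stroke renderer, and the notebooks contain the real code.

    import torch
    import torch.nn.functional as F
    import clip

    clip_model, _ = clip.load("ViT-B/32", device="cpu")  # CPU for simplicity; use a GPU in practice

    def paint(actions, size=224):
        # Toy stand-in for the Neural Painter: each action (x, y, radius, r, g, b)
        # renders a soft color blob. The real painter is a learned stroke renderer.
        ys, xs = torch.meshgrid(torch.linspace(0, 1, size),
                                torch.linspace(0, 1, size), indexing="ij")
        canvas = torch.ones(3, size, size)              # start from a white canvas
        for a in torch.sigmoid(actions):                # squash parameters into [0, 1]
            x, y, radius, color = a[0], a[1], 0.05 + 0.2 * a[2], a[3:6]
            mask = torch.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * radius ** 2))
            canvas = canvas * (1 - mask) + color.view(3, 1, 1) * mask
        return canvas.unsqueeze(0)                      # shape (1, 3, size, size)

    # Steps 1-2: encode the prompt once; its features stay fixed during optimization.
    with torch.no_grad():
        text_features = clip_model.encode_text(clip.tokenize("black sheep"))

    # Step 3: random initial strokes, optimized directly via gradient descent.
    actions = torch.randn(50, 6, requires_grad=True)    # 50 strokes, 6 parameters each
    optimizer = torch.optim.Adam([actions], lr=0.05)

    for step in range(200):
        canvas = paint(actions)
        # Step 4: embed the canvas (CLIP's input normalization is omitted for brevity).
        image_features = clip_model.encode_image(canvas)
        # Step 5: cosine distance between text and image embeddings as the loss.
        loss = 1.0 - F.cosine_similarity(text_features, image_features).mean()
        # Step 6: backpropagate through CLIP and the painter to update the strokes.
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

Because both CLIP and the painter are differentiable, the gradient of the cosine distance flows all the way back to the stroke parameters; that is the whole trick.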

Enjoy Neural Painting! ;)

Stroke-by-stroke painting for the prompt "black sheep".

Notebook at Kaggle

Note: If you are part of Kaggle's community, this is probably the better option, given that Kaggle currently offers an NVIDIA Tesla P100 for all notebook sessions, whereas on Google Colab you might get a less powerful GPU for your session.

Notebook at Colab


References

[1] The Joy of Neural Painting. Ernesto Diaz-Aviles, Claudia Orellana-Rodriguez, Beth Jochim. Libre AI Technical Report 2019-LAI-CUEVA-X01, 2019.

[2] Neural Painters: A learned differentiable constraint for generating brushstroke paintings. Reiichiro Nakano. 2019.

[3] Learning Transferable Visual Models From Natural Language Supervision (CLIP). Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever. OpenAI, 2021.