It’s Thursday, October 27, 2022 in Portland, Oregon and I just downloaded an archive of my Twitter data because it’s a good thing to do every now and then, and it’s just a coincidence that Twitter’s sale to Elon Musk might be completing tomorrow. You can download an archive of your Twitter data too!
Meantime, it is a bright, mildly cloudy day here. It is not raining. Yet.
The whole concept of prompt engineering has been stuck in my head. Previously I thought about how prompt engineering felt more like “how to write a creative brief”1 or “how to do creative direction”, and even earlier as a sort of toy or game where you’d try to get the closest to reverse-engineering a prompt based on the final image2.
You may be unsurprised to hear that my thinking has continued to evolve! Part of this is because there are more and more jokes about prompt engineering that, for reasons of time, I am not going to look for and insert here. Actually, wait: it was the Sistine Chapel, and if I just search for “Pope Julius artstation”, I get what I was looking for:
Michelangelo: Your Holiness, please describe to me how you imagine the fresco.
Pope Julius II: epic huge ceiling, christianity, realistic, 1000 bible characters, creation of eve,huge breasts realistic breasts trending on artstation (@zuza_real)3
I mean, this is funny! I am going to kill the joke by explaining it: it is funny because Pope Julius II commissioned a stunning work of art that has soul, and the joke imagines that the creative brief given to Michelangelo was in the form of a prompt, essentially a creative brief to a machine learning image generation system! It parodies the type of prompt that people are sharing (and the type of image that many people are generating), and the last part, the best part, is “trending on artstation”, where ArtStation, if you’re not aware, is a portfolio and asset sale/hiring site for artists4.
There is the creative brief part, and then there are the, I don’t know, modifiers. The first is the part that describes the scene in terms of composability (e.g. a horse on a hill with a person with sausage-fingered hands).
AI/ML image generation typically finds composability difficult “at the moment”, with composability being the latest test of “is this thing actually intelligent or not”, as opposed to “a particularly smart magic brush that paints what you tell it to paint but, if you’re not careful, will turn into a mop and buckets and flood your entire world. Not grey goo so much as drowning in art”.
That’s the first part. The second part is the “trending on artstation” part, which also includes things like leica 1.2 or sigma 50mm or hdr: canny keywords intended to guide the model toward specific types of images it might have encountered (god, this anthropomorphizing and agent-intention) as tagged data in its training set.
Prompt engineering is learning and adapting to the language of a new tool, at the same time as people try to mold the new tool into something that understands us without us having to change our language. (Me)
There’s jargon that you need to learn, and there’s possibly even a unique grammar that you have to learn (does order matter? Do you just tack on as many lens specifications as you want at the end? Do those modifiers need to agree with each other, if at all?)
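As a toy illustration of that “tack modifiers on the end” grammar (everything here is invented for illustration: the helper name, the modifier list, and the comma convention are my assumptions, not anything canonical to any particular image generator):

```python
# A toy sketch of assembling an image-generation prompt from a
# "creative brief" part plus style modifiers. The helper and the
# modifier list are invented examples, not a real tool's API.
def build_prompt(subject: str, modifiers: list[str]) -> str:
    """Tack the modifiers onto the end, comma-separated."""
    return ", ".join([subject] + modifiers)

prompt = build_prompt(
    "a horse on a hill at golden hour",
    ["hdr", "sigma 50mm", "trending on artstation"],
)
# -> "a horse on a hill at golden hour, hdr, sigma 50mm, trending on artstation"
```

Of course, whether order matters, or whether “hdr” and “sigma 50mm” can coexist, is exactly the kind of thing you only find out by experimenting.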
It starts feeling like a language, but then my linguistics is rusty, so I’m just going to wave my hands and say it feels like a language, right?
And if it’s a language, then you can learn it, or at least a quick translator’s guide if you’re visiting the country for a few days.
Which brings me to Duolingo, but for learning the language of machine learning prompts, whether they’re image generation or conversational “finish this sentence” GPT-3/text generation systems.
There’s an adaptation here. One of my go-to examples is Palm’s Graffiti5: handwriting recognition wasn’t quite good enough yet, so a simplified, single-stroke cursive alphabet was created, and honest to god, enough people learned how to use it because… PDAs were useful enough back then? I say “enough”: that’s enough for Palm, then 3Com, then Handspring or whatever to survive, and the people who worked on it went on to contribute to the iPhone and other seminal conventions of smartphones.
But the thing with Graffiti was that it was designed and intentional. With ML image generation prompts, it is much more like discovering an uncontacted civilization that’s a black box: you slide a bit of paper in and, instead of understanding Chinese, a picture comes out of the slot a little bit later. (Or a lot later, if you’re running on a Mac.) So you need to explore. It’s pretty interesting and not, I’d hasten to add, anything like contacting an alien civilization, apart from when people start talking about why it values crungus and, actually, what a bunch of longtermists are very worried about in terms of artificial general superintelligences not having value systems aligned with our own. But! You know what, humans don’t have fully aligned value systems. So there’s that, too. Which is why this is seen as a race: if the U.S. doesn’t get to a values-aligned artificial general superintelligence then someone else might get there first, and what will happen to shareholder value then?! Or your right to be a dick on Twitter without suffering any consequences?!
Following all my what’s-next-for-PlayStation reckons, I’m tracking two stories about Microsoft that came out of the Wall Street Journal’s Tech Live… thing. Phil Spencer, their CEO of gaming, said that the metaverse is a poorly built videogame6, his ultimate point being that current metaverses (i.e. Horizon Worlds, Horizon Workrooms) “look like a meeting room” and, you know, the shared environments in Horizon Worlds are bad: low poly, little to no texturing, feels like they’re only around the era of Quake, to be honest. Quake One. The second story is that Microsoft’s Game Pass is profitable7 and is 15% of Microsoft’s overall content and services revenue. If I’m skimming Sony’s 2022 Q1 report right, Network Services (which includes PlayStation Plus and Now, and advertising revenue) comprise 17.5% of PlayStation’s revenue.
Okay, that’s it for today!
I’m considering, at this late stage, cutting back to one or two episodes a week during November so that I can also work on NaNoWriMo. There are only so many words in me a day.
How are you doing?
“I look at the prompt there and the first thing that I think is, ‘that’s not prompt engineering’, that’s writing a creative brief for a photoshoot combining an art director, a creative director, the producer looking for photographers, and the brief for the photographer.”, in s12e44: Yet Another DALL·E-related Reckon, 3 August, 2022 ↩
“Me, on June 14th: Has anyone made PROMPT-LE yet, where you have to guess the prompt for a DALL-E-esque generated image in 6 tries”, in s12e26: PROMPT-LE, 22 June, 2022 ↩
@zuza_real on Twitter, 8 October, 2022 ↩
“Since 2014, ArtStation has provided artists with an amazing platform to showcase their portfolio, find work and connect with opportunities.” ↩
Graffiti (Palm OS), Wikipedia ↩
Xbox’s Phil Spencer says the metaverse is a ‘poorly built video game’, Jay Peters, The Verge, 26 October, 2022 ↩
Microsoft says Xbox Game Pass is profitable as it sees subscription growth slow, Tom Warren, The Verge, 26 October, 2022 ↩