s11e35: A Method for Efficient Exploration of Search Space
0.0 Context setting
It’s Monday, April 18, 2022 in Portland, Oregon and there is absolutely nothing interesting going on with the weather out my window.
Apparently this is a long one today, or a one-shot stream of consciousness one, so let’s get on with it.
A quick reminder: Things That Caught My Attention, Volume 1, collecting the 45 best essays from episodes 1-50, is out now. You, my three thousand-odd subscribers, can get a copy with 20% off.
Paid supporters and subscribers get a free copy and also bask in the knowledge that they’re encouraging my writing habit. So go and become a paid supporter/subscriber and remember, you can probably expense or deduct it!
1.0 Some things that caught my attention
A Method for Efficient Exploration of Search Space
Michael Tsai collected a couple links about DALL·E 2, first from Ben Thompson, and then from Bram Adams.
What caught my attention about Thompson’s take on DALL·E 2 was his jump to games and the metaverse. In games because “creating art assets is a lot of effort” which, you know. True. Games need a lot of art! They are creative and can be very high fidelity (for example, one expression of high fidelity might be “at 8K resolution and photorealistic”)! Even when they are not very high fidelity (for example, a text adventure), you still have to create a whole bunch of assets like, er, text.
To skip to the end, Thompson says that DALL·E is like achieving the Zero Marginal Content world wonder in Civilization: that as soon as you’ve got it, it’s a step-change (“transformational technology matched only by the printing press”), and it’s not like I’m disagreeing?
Thoughts I have in reaction that I am not going to go into because they are tired/stereotypical:
- Humans have value as sort of AI-whisperers, where the Best Humans (i.e. the ones who are Most Economically Successful, ha) are those who can Best Creatively Direct AI, because this thought is boring
- People are going to be out of a job because, well, the friendlier version of saying “content has zero marginal cost” is “making content is free”, and also, duh, obviously it is not “zero”; that’s just an exaggeration.
So what would be interesting? Okay, well first let’s go sideways to what Bram Adams wrote up. Adams’ piece is kind of like mapping out cheat codes for DALL·E, right? The way DALL·E works is you give it a prompt (for those advertising/creative agency people still subscribed to this list, that’s a Brief!) and then it goes away and then comes back with a bunch of stuff for you to whittle away and start homing in on a creative direction or, ha, here’s a realization, just doing the equivalent of picking whatever’s on the first page or the first 5 hits of Google search results. How’s that for a creative process? AI-generated content search space result optimization!
Okay, so I got distracted: if the marginal cost to create content is practically zero (which it isn’t, there are other things that cost) then one of the next Jobs To Be Done is to enable the… efficient? creative? surprising? effective? serendipitous? exploration of the search space of the response to the prompt. But Dan, you’re saying, you’re using bullshit words again which you said not to do, can you not speak plainly?
And I would say: when you have more results than you know what to do with, then the interface that helps you find the one you want/need is potentially the difference between a shitty tool and a good tool. Now people will get distracted with being able to produce more results faster so their products compete on depth or breadth of search. nVidia would like to say that instead of achieving 40 Tflop/s, its ML artists can generate 500 8K images/second using the latest models. You want a fast creative partner, right? But now you’ve got those 500 images, so how do you search through them? A boring light table? No, that’s an Opportunity For Design Innovation!
(Call me)
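(Or, if you just want a quick-and-dirty version of that light table: embed every result, cluster the embeddings, and show one representative image per cluster, so a human can browse 500 images as ten piles instead of one endless scroll. A minimal sketch, where generate_images() and embed_image() are hypothetical stand-ins for whatever model and embedding stack you actually have; the clustering is stock scikit-learn.)

```python
# Sketch: turn 500 generated images into a browsable handful of "piles".
# generate_images() and embed_image() are hypothetical stand-ins for whatever
# generator and embedding API you actually use; KMeans is plain scikit-learn.
import numpy as np
from sklearn.cluster import KMeans

def browse(prompt: str, n_results: int = 500, n_piles: int = 10):
    images = generate_images(prompt, n=n_results)            # hypothetical
    vectors = np.stack([embed_image(im) for im in images])   # hypothetical
    piles = KMeans(n_clusters=n_piles, n_init=10).fit(vectors)
    # Pick the image nearest each cluster centre as the "cover" of that pile.
    covers = []
    for centre in piles.cluster_centers_:
        nearest = int(np.argmin(np.linalg.norm(vectors - centre, axis=1)))
        covers.append(images[nearest])
    return covers  # show these ten, let the human click into a pile
```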
Okay fine, you don’t want to do that because that requires thinking and user research and a bunch of other boring things and you like using ML as a hammer to hit everything with because human-like-hammer, so what else might you do? Oh, you train a model that works over the search results, right? That’s just another layer and nVidia is happy because you buy more flop/s and Google is happy because you use more Google Cloud TPUs and Apple is happy because… something something Neural Engine/Bionic? I do not know where Microsoft or Amazon are here, other than “buying nVidia”, oh wait, let’s make a note of that. (And if someone does that, then it’s only so long before someone ends up buying AMD?)
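(If you squint, that extra layer could be as dumb as a taste model: log which generations the human keeps versus bins, fit a tiny preference model over image embeddings, and use it to re-rank the next batch. A sketch, where embed_image() is again a hypothetical stand-in and the rest is stock scikit-learn:)

```python
# Sketch: a tiny "taste model" layered over the generator's output.
# embed_image() is a hypothetical embedding call; labels are 1 for images
# the human kept and 0 for images they threw away.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_taste_model(kept_images, discarded_images):
    X = np.stack([embed_image(im) for im in kept_images + discarded_images])
    y = np.array([1] * len(kept_images) + [0] * len(discarded_images))
    return LogisticRegression(max_iter=1000).fit(X, y)

def rerank(taste_model, new_images):
    X = np.stack([embed_image(im) for im in new_images])
    scores = taste_model.predict_proba(X)[:, 1]  # P(human keeps this one)
    order = np.argsort(-scores)
    return [new_images[i] for i in order]
```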
But that’s boring because then you need to train the model on what you want and all you end up with then is, I don’t know, pointing to more stuff, which is more like “interfaces for providing briefs”, which I suppose is more interesting than “provide a written brief”. Thinking out loud, the way you might do this is by lobbing example images in, like: “stuff like this”, getting a different model to produce a textual description of that image (and the other ones you provide), and then throwing those textual descriptions back into the generator? I expect someone smart who is being paid a lot has already done this and if they haven’t then really what am I even doing.
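(For the record, the loop I’m imagining is about four lines of glue, something like the sketch below; caption_image() and generate_images() are hypothetical stand-ins for whatever image-to-text and text-to-image models you’re actually running, not anyone’s real API.)

```python
# Sketch of the "stuff like this" loop: caption the example images,
# fold the captions back into the written brief, regenerate.
# caption_image() and generate_images() are hypothetical stand-ins.

def brief_from_examples(written_brief: str, example_images) -> str:
    descriptions = [caption_image(im) for im in example_images]
    return written_brief + " in the style of: " + "; ".join(descriptions)

def generate_from_examples(written_brief: str, example_images, n: int = 50):
    prompt = brief_from_examples(written_brief, example_images)
    return generate_images(prompt, n=n)
```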
ANYWAY, I got distracted with “how do you efficiently work through the search space of the answers to your creative brief” because we were talking about Bram Adams’ cheat codes.
Adams’ cheat codes are like when people earlier on figured out that you could get more interesting images by including terms like “ArtStation” or “HDR” in their prompts. Now, the reason those terms radically change the output of the ML artist-in-a-box is that those people had an understanding of the training set used to create the model. See, these models are trained on Images That Are On The Internet, so you, the human, need a model in your head of What Sort Of Images The ML Will Have Seen To Know What References It Gets.
Some of these references are easy, like “like a film poster” or “A Simpsons episode like…”, but then you get more creative about the attributes and need to Think Like A Machine That’s Been Trained On Datasets In A Certain Way, which means you know the machine’s been trained on Creative Commons/freely licensed images and you know they’ve come from, e.g., DeviantArt and ArtStation.
Now we’re getting somewhere interesting. Sure, you want Humans Who Are Good At Writing Briefs For ML Models, and those humans are potentially the ones who’re already Thinking Like A Machine Thinks. Sure, you get pop culture references, but do you get the references that a machine trained on “images that are accessible on the internet and that are likely to be included in a training set” gets? Because down that route lies things like “well, if you’re really creative, then you might start thinking about the biases inherent in those datasets”.
All of this is before we even get to What The Zero Marginal Cost Of Content Means For Games And The Metaverse because duh, we already have the equivalent of a 3-month-old generating text adventures and it already has the issue of moderating/censoring the text it creates (see: AI Dungeon). One tired observation here is that GitHub Copilot is already writing bad code based on a prompt/autocompleting, the latest ML models are already creating short stories based on prompts, so, you know, it’s not unreasonable that it’s just a Matter Of Effort And Stringing Bits Together With Tape And Um String before you get a generic 3D engine hooked up to a non-shitty level creator responding to a prompt, with object meshes hooked up to a prompt and textures hooked up to a prompt. I mean, sure. And then what? We’re still at There’s More Shitty Stuff Than Before and oh my god here’s another Douglas Adams reference coming: there’s so much TV that we had to invent robots to watch TV for us and tell us what was worth watching, and that’s essentially what spam filters that learn from our feedback are. They are horribly bad, horribly simplistic models of “what Dan, a person, thinks is spam or not”. One secret here, of course, is that you don’t need to simulate a model of me to recommend me TV. You can probably make a good, non-ML guess 99% of the time without having to burn a shit-ton of GPU time, but I guess it keeps a lot of people employed?
So, to recap. Sure, zero marginal cost of creative content, but not actually zero because you’ve just externalized the cost somewhere else; you lazily mean “it costs much less money”, but that money’s going to go somewhere. There are opportunities to make narrowing down the search space to get to your result easier (“give me something really boring? Give me something that is at least 50% dissimilar in vector space to a drug name that already exists?”), opportunities to make browsing that search space super interesting, and FOR THE LOVE OF GOD don’t make it turn into The Lawnmower Man, flying through a city or a filing cabinet or a virtual gallery full of ML-generated art, oh god someone’s doing that right now aren’t they.
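(That drug-name parenthetical is about the only part of this that’s trivially codeable today, by the way. A sketch, reading “at least 50% dissimilar in vector space” loosely as “cosine similarity below 0.5” and assuming a hypothetical embed() function for strings:)

```python
# Sketch: keep only candidate names that are "at least 50% dissimilar" to
# every existing drug name, read loosely as cosine similarity < 0.5.
# embed() is a hypothetical string-embedding function.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def filter_candidates(candidates, existing_names, max_similarity=0.5):
    existing = [embed(name) for name in existing_names]
    keep = []
    for cand in candidates:
        v = embed(cand)
        if all(cosine(v, e) < max_similarity for e in existing):
            keep.append(cand)
    return keep
```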
Well. This was fun! And easy to write!
It’s the beginning of the week. How’re you doing, and what are you looking forward to?
Best,
Dan