s12e44: Yet Another DALL·E-related Reckon
The Trouble With OpenAI, Prompt Engineering, and More
0.0 Context Setting
It’s Wednesday, August 3, 2022 and still hot in Portland, although marginally less hot than before. This is the new normal.
I went tide pooling over the weekend. Here is a picture of two sea stars hanging out.
Okay, on with the show:
1.0 Some Things That Caught My Attention
1.1 Yet Another DALL·E-related Reckon
Some brief reckons about OpenAI’s DALL·E:
Broadly, DALL·E does two things: 1) it creates photorealistic, high-resolution images from text prompts, and 2) it is very good at style transfer.
People are (understandably) mixing the two strengths of DALL·E together.
Back at the end of June, Michael Green posted a thread of DALL·E “replicating styles of well-known portrait photographers”. They are very good! I am going to set aside whether Green spent a lot of time culling generated images; presumably he also had an eye for the ones he chose to showcase in the thread. But the basic point, again, is this:
1) These are high-quality, canny plains-dwelling (i.e. not in the uncanny valley) images of humans who do not exist, the product of machine learning trained on lots and lots of pictures of real people.
2) The images of people also impressively ape, instantly and recognizably, the distinctive styles of photographers with large bodies of work.
First, if I were a photographer, I’d be itching to sue, even if I weren’t sure what I was suing for. I’m sure there are lawyers tripping over themselves to get involved. This also presupposes that there are sufficiently well-known photographers (and/or their agents or managers) who are up on what DALL·E is doing these days.
Second, you’ve got to think about how DALL·E is able to do style transfer in this way. Style transfer isn’t new; it’s one of the first mainstream examples of image-generation AI catching the public’s attention, the whole “oh look, now you can make a photograph look like it was painted by Van Gogh” thing, without having to pay someone a pittance to do a copy for you.
But for this style transfer to work, just like for a human to be able to copy another artist’s style, the machine learning system needs to be trained on those images. Which, presumably, are copyrighted. Sure, there’s the usual exception for academia, but… OpenAI isn’t that. It is offering a commercial service. You pay to have it create images in a certain photographer’s style and, well, nobody’s stopping you from then using those images in whichever way you wish. You could have a field day fighting about liability, and I imagine that OpenAI’s terms and conditions have the usual disclaimers of warranty, liability, and indemnities: if you end up using one of these images and get sued over it, then it’s all on you.
But, you know. OpenAI is making money off replicating people’s recognizable creative work, which is just to say that while I have issues with intellectual property rights, I also have issues when creative work isn’t, at the very least, attributed. But then figuring out attribution would be a speed-bump along the road to a superhuman artificial general intelligence for the benefit of all1 humanity.
So if you ask me, while DALL·E’s impressive style transfer is going to cause a shit-ton of disruption because now you can replicate someone else’s style at scale, it’s not that interesting.
No, what’ll be interesting is when DALL·E itself has a recognizable style, or when people use DALL·E to create recognizable visual styles. I would go off and wikipedia about different genres of art and try to think about, well, what is the distinct visual art movement enabled by DALL·E and other text-to-image generation networks? If you want to get truly creative, then that’s what I want to see. Sure, you can say everything you want about artists copying/stealing, but at some point there’s going to be a threshold which’ll be: well, that’s in-the-style-of-DALL·E, a sort of visual image that’s (only?) possible with this kind of tool.
That was one reckon.
The other one was a continuation of the thread about prompt engineering, why I dislike that term, and why it’s interesting that the term even exists. Look, here’s Andy Baio with a nice image from a couple of weeks ago, generated from the prompt “two slugs in wedding attire getting married, stunning editorial photo for bridal magazine shot at golden hour”.
I look at the prompt there and the first thing I think is, “that’s not prompt engineering”; that’s writing a creative brief for a photoshoot, combining an art director, a creative director, the producer looking for photographers, and the brief for the photographer. I mean sure, you can call it prompt engineering, because putting engineering in things is what’s needed these days to survive in our late-capitalism economy, but with my creative agency background what I’m pretty sure I see is someone who would be good at directing a creative team, wherever that team sits on the homogeneous/heterogeneous spectrum of human/ML tool/hybrid.
And then, if you want to get super late-night-college-conversation about it, I’m far, far more interested in what happens when image generation networks produce images with little to no prompt, as if (he says, waving hands) you’re seeing the noise in the network and what it’s dreaming of, and whether that can be mapped to the development of any sort of awareness, never mind consciousness. Look, if you were going to train an AI and wanted to figure out whether it was going to be friendly to humans or not, do you think you’d want one that dreams of crungus, or one that dreams of… something else?
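(If you actually want to poke at the “no prompt” idea, you can’t do it with DALL·E itself, which only exists behind OpenAI’s interface. But here’s a minimal sketch of what it might look like with an open text-to-image diffusion model, assuming the Hugging Face diffusers library and an open Stable Diffusion checkpoint; the model name and settings here are illustrative, not anything OpenAI runs.)

```python
# A minimal sketch: sample an open text-to-image diffusion model with an
# empty prompt, to see what it produces when it isn't being steered by text.
# Assumes the Hugging Face `diffusers` library and an open Stable Diffusion
# checkpoint -- not DALL·E, which is only reachable through OpenAI's service.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",  # illustrative checkpoint name
    torch_dtype=torch.float16,
).to("cuda")

# An empty prompt plus guidance_scale=1.0 switches off classifier-free
# guidance, so the sample is driven by the model's own learned priors
# rather than by any text.
image = pipe(prompt="", guidance_scale=1.0, num_inference_steps=50).images[0]
image.save("unprompted_dream.png")
```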
Welp. I am about as tired as usual, which is to say the new amount of tired, which is all tired, all the time. That’s… probably not good?
How are you doing?
Best,
Dan
-
ha, ha ↩