s12e58: We can hallucinate it for you wholescale
0.0 Context Setting
As I write this, it is, weirdly, still Tuesday, September 27, 2022 and also hotter than 80f in this room.
You know that thing where you spend ages trying to squeeze out some words (i.e. episode 57) and trying to make sure that whatever you write is sufficiently interesting Things That Caught Your Attention? Well, I did that, and then a mere few hours later, a bunch of things really did catch my attention, so now I am writing about them.
Strap in, as game theory guy used to say.
1.0 Some Things That Caught My Attention
We can hallucinate it for you wholescale1
There is a thing (there are citations, I’m just not looking them up) where we know that human memory is fallible and that we can remember things that did not happen. There is (also citations, etc) the theory that “memory” isn’t actually “memory”: memories are instead reconstructions or re-predictions of what happened, and there isn’t some sort of actual event log that gets written to, immutably, and then played back. As if memories are a story we tell ourselves about the past, one that’s affected by the act of recall.
Anyway, all of that is to say that Stable Diffusion, the open source image generation AI deep learning model, has now been hacked together to do some sort of image compression2, see also3.
I’m going to do that thing where I’m going to be annoyed at the people who write headlines and subheads because oh my gosh. Here’s the subhead for the Medium article:
Stable Diffusion makes for a very powerful lossy image compression codec.
Which, you know, I know what a lossy image compression codec is. It’s the kind where you, well, lose detail in the image and you can’t reconstruct, pixel for pixel, the original image. It’s lossy in a good-enough way because it turns out our eyes (hey, wait for it) and our human visual cognition are super good at filling in gaps. MP3 compression works this way too! Fraunhofer modeled how we acoustically perceive sound and figured out what to throw away based on how our brains work: what we hear isn’t a 1:1 mapping of the physical waveform. (Look, I am just wanking around here, I know I’m handwaving. Hopefully it is good-enough handwaving. You should go do your own research.)
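(If you want to feel that lossy-but-good-enough trade-off for yourself, here’s a tiny sketch, nothing to do with either article, that saves the same picture at a few JPEG quality settings so you can compare file sizes and squint at the artifacts. The photo.png filename is just my placeholder; you’ll need Pillow installed.)

```python
import os

from PIL import Image  # pip install Pillow

# Hypothetical input image; any RGB photo will do.
image = Image.open("photo.png").convert("RGB")

# Save the same picture at a few JPEG quality settings.
# Lower quality = more detail thrown away = smaller file,
# and our visual system papers over a lot of the damage.
for quality in (95, 50, 10):
    path = f"photo_q{quality}.jpg"
    image.save(path, quality=quality)
    print(path, os.path.getsize(path), "bytes")
```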
Matthias figured out a way to use parts of Stable Diffusion’s architecture (i.e. its different neural networks) to encode an image into the latent space of the model, which (I’m handwaving, bear with me) roughly means you take all those concepts that are learned from the giant dataset, stick them all in a stupendous multidimensional array, and that’s how you get clusters of concepts around other concepts. Which means you encode the image from the pixels into a (handwaving) learned conceptual representation of it. This is, I think, a bit like saying the image has a table and an owl, the owl is roughly whatever shape, and it’s a brown owl, or whatever. I am, of course, and still, handwaving.

So Matthias took an image, fed it into the model, and the representation of the image in the model was Very Big, because the map to the representation used big 32-bit floating point numbers. He tried a thing where he did the equivalent of, er, rounding the numbers down to 8-bit floats, which is (handwaving) kind of like taking a 16-bit, 44kHz stereo waveform and quantizing it down to an 8-bit, 22kHz one, the kind of thing you had to put up with if you had an early Sound Blaster in your PC.
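To make my handwaving slightly more concrete, here’s a rough sketch of the encode, squash to 8 bits, decode round-trip using the Stable Diffusion VAE from the diffusers library. To be clear, this is not Matthias’s actual pipeline (he does much cleverer quantization and cleanup); the filenames and the crude min/max scaling are my own placeholders.

```python
import numpy as np
import torch
from PIL import Image
from diffusers import AutoencoderKL  # pip install diffusers torch

# The VAE Stable Diffusion uses to map pixels <-> latent space.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
vae.eval()

# Load an image (hypothetical filename) and scale to the [-1, 1] range the VAE expects.
image = Image.open("owl.png").convert("RGB").resize((512, 512))
pixels = torch.from_numpy(np.array(image)).float() / 127.5 - 1.0
pixels = pixels.permute(2, 0, 1).unsqueeze(0)  # shape (1, 3, 512, 512)

with torch.no_grad():
    # Encode: 512x512x3 pixels -> a 4x64x64 latent, the learned representation.
    latents = vae.encode(pixels).latent_dist.mean

    # Crude stand-in for the real quantization: squash the 32-bit floats
    # down to 8 bits per value with a simple min/max scale.
    lo, hi = latents.min(), latents.max()
    quantized = ((latents - lo) / (hi - lo) * 255.0).round().to(torch.uint8)

    # Dequantize and decode back to pixels. The decoder reconstructs
    # plausible pixels; it does not undo a deterministic transform.
    dequantized = quantized.float() / 255.0 * (hi - lo) + lo
    decoded = vae.decode(dequantized).sample

# Back to a viewable image.
out = ((decoded[0].permute(1, 2, 0).clamp(-1, 1) + 1.0) * 127.5).round().byte().numpy()
Image.fromarray(out).save("owl_roundtrip.png")
```

The point to notice is that the decoder isn’t undoing a compression algorithm so much as reconstructing plausible pixels from the learned representation, which is exactly where the hallucinated details come from.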
Just like with that sound, though, you can still hear it, and it turns out, Matthias found, that the image is still good enough.
But also not quite. It’s covered in his limitations section, and neatly summarized by this part of the Ars Technica article:
Bühlmann’s method currently comes with significant limitations, however: It’s not good with faces or text, and in some cases, it can actually hallucinate detailed features in the decoded image that were not present in the source image. (You probably don’t want your image compressor inventing details in an image that don’t exist.) Also, decoding requires the 4GB Stable Diffusion weights file and extra decoding time.
Remember how I talked about how humans reconstruct memory? That there are probably shortcuts (we think) because it would be tremendously energy inefficient to literally transcribe the pattern of light rays hitting our retinas that makes up a memory, never mind how you’d transcribe, I don’t know, your proprioception at the time? That it’s a reconstruction?
Stable Diffusion isn’t necessarily compressing images so much as remembering them, and because it’s using a lossy neural architecture (or rather, we’re making it use a lossy, quantized approach in order to – ha! – be more energy efficient), it is doing (handwaving, again) what we do: hallucinating and filling in gaps with something that is kind of right, and perhaps good enough.
That is a thing that definitely caught my attention.
Americans are weird
I do not know where this one came from, but here is a literal thought that went through my fingers into a tweet, and what can I say, you decided to subscribe to this (and in some cases, for which I am very grateful, are supporting it monetarily):
also, so visiting disney for a lot of people is like visiting a place where the infrastructure works and has been continually invested in? (tweet)4
Oh wait, now I remember. It was because one of my internet friends went on their first flights since the beforetimes and there was no potable water. (You should buy Tim’s books. After you’ve bought them, you should read them.)
Anyway, the whole no-potable-water thing triggered a whole “ha, what is infrastructure in a modern society anyway” and before you know it, I’m realizing that Disney resorts are a vacation for Americans to the place where infrastructure works and you can drink the water.
This is… wild? And as a Brit (sigh) who’s lived in America for over a decade now, it also makes so much sense? There is a place that people will save up money for, a lot of money, that has rubbish collection, that is walkable, that has the equivalent of free public transit.
Look, on one level I know that this is part of the point of a vacation, and yet if I am to keep up this episode’s frankly unprecedented levels of handwaving, this totally reinforces some of my understandings of America? It’s as if you (generalising, of course) can’t collectively behave as if you just deserve this kind of theme park vacation infrastructure, that it can only be the reward for some sort of hard work, that it can’t be for everyone, and that obviously you couldn’t just live there. Instead you’ve all been punished, and you get to visit a walkable, functioning city as a treat.
I am less interested in the whole (valid) point of view of Disney, the corporation, and its values and how it acts as a giant sinkhole in our time, how it protects its interests. All of that is there, and all of that is true. But this realization for me, such as it is, feels pretty revelatory.
Not entirely sure what I’d do with it, though. It’d probably be good fodder for a political speech or campaign, given what a cultural bedrock position Disney enjoys.
Visual Basic But For AI
I am now reasonably convinced that Google Colab is some sort of Visual Basic successor in the ML/AI space. This is why:
- it is a sort of code/but not code environment
- it’s fairly accessible but not too accessible
- people are building a lot with it
- in fact I wonder how many Colab notebooks are being used in production
- it is also very Excel-like in how code executes in cells
It’s also not very Visual Basic at all, and that’s more a result of it being a product of Our Internet Times: it’s collaborative, it’s sharable, it’s forkable, it’s inspectable, it’s only really usable online, and so on.
Anyway. A potentially useful analogy, I thought.
Okay, that’s it for today!
The end of September is always busy for us, so everything is kind of stressful. And yet tomorrow will be a new day. I should remind myself of that, at least. Maybe it will be helpful for you, too. Anyway, how have you been?
Best,
Dan
Here’s a brief note that if you’re enjoying this newsletter, or are at least stimulated by it (I can read the room and know the state of the world), then you can join 189 others and become a pay-what-you-want paid supporter. Maybe you’ll help me hit 200!
(If you’re the kind of person whose boss will pay for things, there are options for you, too5).
Paid supporters get a free copy of Things That Caught My Attention, Volume 1, collecting the best essays from the first 50 episodes. Free subscribers get a 20% discount.
1. No, it’s not a typo. It’s intentional, see, the actual quote would be wholesale, but because it’s tech, it’s at scale.
2. Better than JPEG? Researcher discovers that Stable Diffusion can compress images, Benj Edwards, Ars Technica, 27 September 2022.
3. Stable Diffusion Based Image Compression, Matthias Bühlmann, Medium and Towards AI, 19 September 2022.
5. Okay, here are the boss options: $25/month or $270/year; $35/month or $380/year; or $50/month or $500/year. It’s coming up to the end of Fiscal!