It’s Friday, 6 May 2022, a grey day that’s an improvement over the full-on rain that we had during most of yesterday.
It’s also episode 49 of season 11, the penultimate episode before my arbitrary 50-episode cut-off. Arbitrary excitement!
Two things today:
Meta admits, in its own documentation for its new large language model, that “OPT-175B has a high propensity to generate toxic language and reinforce harmful stereotypes” and that the model can produce harmful content “even when provided with a relatively innocuous prompt”.
At this point, this observation is not new and shouldn’t surprise anyone involved or interested in the field: the toxicity likely comes from “a primary source” of the model’s training data, namely unmoderated text from Reddit.
Look. Reddit is not an unmitigated cesspool. I have made this point before: the good bits of Reddit (of which there’s a non-negligible number) are, generally, the bits with active human involvement in community management, as opposed to mere content moderation. Toxic language is as much about context as it is about the literal words; otherwise we lose the ability to talk about things at all. But the internet has been great at so-called context collapse.
Now, I’m not sure why this feels so controversial, or such a surprise. Is it because I’ve been interested in childhood language development for ages? (Thanks, linguistics-PhD mum!) Is it because I’m a parent now? Is it because my temperament leans towards the control side, when it’s not being taken on a free-wheeling, wonderful ADHD rollercoaster?
I mean, the point is this: what values are you trying to instill in a language model? There are certainly values that you’re literally instilling in a language model, and you need to understand that even if you don’t think of them as values, they are implicitly judgements about what is important and what is not important. That’s why, I think, people are excited about attention-based transformers!
What is important? Finding cancer. Or lung damage. That’s what you think is important. But because of imperfections in your ability to clearly and precisely explain what you actually want, what turns out to be important to the model can be something else: you get a model that’s also good at identifying people who were lying down when their x-rays were taken. (Portable x-ray machines get wheeled out for the sickest patients, so “was lying down” turns out to be a handy proxy for “is ill”.)
In a way, this feels exactly like the science fiction trope about AI, the “well, I did it because I learned it from you, parental unit” kind. We exposed these models, insufficiently moderated and guided, to “the human experience”; we didn’t take into account that some spaces of the human experience (funnily enough, technology-mediated internet spaces that don’t prioritize or recognize the value of adequately resourced and supported community management, prioritizing instead some sort of “free speech”, itself a difficult concept) are actually super shitty and toxic places to be; and then we’re surprised that what comes out isn’t a nice, benevolent AI but a reflection of ourselves? It strikes me that we are human and we like finding shortcuts, so if you were to handwave it you’d say that, in retrospect, the evolutionary pathway of AGI involved humans taking the shortest, least-energy path, which happened to involve “not spending that much time on the quality of training”, because, you know. Evolution didn’t do that for us, either. Which is a bit depressing, but hey, we have the ability to reason.
I did not know this story about how an in-joke in PHP’s parser existed for many, many, many years, was demonstrably bad for the language, and the main reason for not fixing it was “tradition”. Short story: PHP’s internal parser has a token (a representation, say) for unexpectedly encountering a double colon (“::”), and what used to be displayed to a developer (or to a user, if the error wasn’t handled correctly) was this:
Parse error: syntax error, unexpected T_PAAMAYIM_NEKUDOTAYIM in Command line code on line 1
The end result being that one of the most common Google searches relating to PHP was, roughly, “what the fuck does T_PAAMAYIM_NEKUDOTAYIM mean”. This, if you’re a reasonable person who assumes a language should be relatively clear, easy to use, and on the side of the developer who’s trying to get shit done, is, uh, “non-optimal”. There are a bunch of ways to fix it, one of which being: you don’t need to show the internal token from the parser, i.e. there’s no need to show T_PAAMAYIM_NEKUDOTAYIM in the first place; you can display literally anything else to the developer/user.
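For flavor, here’s one way to trip over it yourself: a hypothetical one-liner, and assuming an old enough PHP (roughly pre-5.4, before parse errors started including the literal token text), since any stray “::” somewhere the parser isn’t expecting one will do:

$ php -r '$x = ::bar;'
Parse error: syntax error, unexpected T_PAAMAYIM_NEKUDOTAYIM in Command line code on line 1

And that’s essentially what the eventual fix does: modern PHP puts the literal “::” in the message (PHP 8 drops internal token names from parse errors entirely), while T_PAAMAYIM_NEKUDOTAYIM lives on as a tokenizer constant.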
“Paamayim nekudotayim” apparently means “double colon” in Hebrew, and the token name was kept to “recognize the contribution of Israel to PHP”, which, you know, fine. It’s also what Thanks and Contributors sections in READMEs etc. are for.
But the main objection to fixing it was, essentially: “having to Google to find this out is a reasonable thing to require, a reasonable cost in developer experience, and if you don’t know what it means, or have forgotten after your first Google, are you really a Real Programmer?”
To which the response was, quite rightly: fuck you, no, are you out of your goddamn mind.
Caught my attention because: it’s a nice writeup of in-group/out-group dynamics, of the changing audience of a programming language and the accessibility requirements that come with it (look, if your language is successful… more people are going to use it?), and of the handling of outright trollish behavior, combined with a neat technical solution that shut the trolls up.
I’ve launched three new tiers to support this newsletter as a professional expense, and my goal is to sign up five supporters to each tier by the end of May. One sign-up so far, so thank you!
The new tiers are:
If you’ve got any questions or comments about these new tiers, drop me a line.
Meanwhile, if your boss isn’t paying, then here’s the regular link to become a paid supporter.
That’s it for this week. It’s been a long one, but I bought Googly Eyes and now I have a whole bunch, and it’s ended/ending on a high note.
How are you?
Arthur Holland Michel’s commentary on the Meta OPT-175B language model, Twitter, 4 May 2022