s17e04: You don’t want an intelligent assistant; Protocols, Not Platforms
0.0 Context Setting
Thursday, 11 January, 2024.
Yesterday I used LlamaIndex to shove a corpus of several hundred newsletter episodes into a vector store, hooked it up to a query engine and a chat engine, and started figuring out how to ask questions about what I’ve written. First I tried to do it against a local model, gave up, and then became the biggest hypocrite by spending about $13 worth of OpenAI credits against GPT-4. One thing that didn’t work out: I hadn’t marked up the individual episodes as date-based documents, so I couldn’t get good responses to queries like “when did I write about x” or, as one suggestion I had, “give me examples where I’ve changed my opinion over time”. So I’ve got to do that.
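For what it’s worth, the fix is mostly about attaching a date to each document before indexing. Here’s a minimal stdlib sketch of the idea -- the filename scheme and field names are invented for illustration, but the resulting dict is the shape of metadata you’d hand to something like LlamaIndex’s Document alongside the episode text:

```python
import re
from datetime import date

# Hypothetical filename scheme for archived episodes, e.g. "s17e04-2024-01-11.md".
# Parse the episode id and date out of each file and attach them as metadata,
# so the query engine can filter and cite by date.
EPISODE_RE = re.compile(r"(s\d+e\d+)-(\d{4})-(\d{2})-(\d{2})")

def episode_metadata(filename: str) -> dict:
    """Build the metadata dict to attach to each indexed document."""
    m = EPISODE_RE.search(filename)
    if not m:
        raise ValueError(f"no episode/date in {filename!r}")
    episode = m.group(1)
    y, mo, d = (int(g) for g in m.group(2, 3, 4))
    return {"episode": episode, "date": date(y, mo, d).isoformat()}

print(episode_metadata("s17e04-2024-01-11.md"))
# {'episode': 's17e04', 'date': '2024-01-11'}
```

With dates in the metadata, “when did I write about x” stops being a pure semantic-similarity question and becomes something the retriever can actually filter on.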
The entire experience was... weird, but not so weird to put me off playing with it. I still want to get it running against a local model, though.
1.0 Some Things That Caught My Attention
1.1 You don’t want an intelligent assistant
I mean clearly you do, because who doesn’t want a magic thing that magically does what you tell it to do? A sort of WYGIWYM, a What You Get Is What You Mean machine.
This follows my writing about the Rabbit R1 from the previous episode1.
Benedict Evans had kicked off a discussion on Threads2 about what he thought was the failure of the pen computing paradigm, because Apple have shipped “a technically flawless pen computer, and it’s pretty much useless for anything except actually drawing. Pen computing didn’t happen”. He then compares it to voice/natural language processing and its surfacing in chatbots. I think this is also in response to the Rabbit R1 demo, of which there’s a brilliant writeup in Aftermath3. You should read the Aftermath piece because it takes the time to note all of the demoed activities. This included:
- Order me a 12 inch pizza from Pizza Hut, the most ordered one on the app is fine.
(Who the hell orders like this? At the very least I would’ve re-written the demo to “the last one I ordered is fine”)
I wrote last time, briefly, about the potential technical implementation of what Rabbit is supposed to do -- which could easily be in the form of an app rather than the interesting hardware -- and how I really can’t see how it works in today’s economic landscape. Rabbit can only work, I think, in these ways:
- formal API integration with the underlying service (at the whim of the providing service -- and I emphasize whim)
- using your credentials to access a website on your behalf, and hooking up a ReAct4 loop to control a headless browser
- using your credentials to access a native application on your behalf (running on... a virtualized device? An actual device?), and hooking up a ReAct loop to control some sort of scripting or robotic process automation (i.e. “show it what you want it to do by letting it watch you use Photoshop”)
All of these methods rely on the good graces of the underlying providing service, none of which I think are particularly incentivized to hand off access (value!) to a third party provider, even if they’re being paid. Never mind the security implications if this access isn’t mediated through a formal API.
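To make the second and third options concrete, here’s the skeleton of a ReAct loop with the model and the browser stubbed out. Everything here is illustrative -- the action format, the tool names, the stop token -- not Rabbit’s actual implementation, but it’s the reason/act/observe cycle the ReAct paper describes:

```python
from typing import Callable

def react_loop(model: Callable[[str], str], tools: dict,
               goal: str, max_steps: int = 5) -> str:
    """Alternate model reasoning with tool calls until the model says FINISH."""
    transcript = f"Goal: {goal}"
    for _ in range(max_steps):
        # 1. Reason: the model emits its next step, e.g. "ACTION click #order-button".
        step = model(transcript)
        if step.startswith("FINISH"):
            return step.removeprefix("FINISH").strip()
        verb, _, arg = step.removeprefix("ACTION ").partition(" ")
        # 2. Act: dispatch to a tool -- in the Rabbit case, a headless-browser
        # or RPA command running against the service's own UI.
        observation = tools[verb](arg)
        # 3. Observe: feed the result back into the context for the next step.
        transcript += f"\n{step}\nObservation: {observation}"
    return "gave up"

# Toy run with a scripted "model" and a fake browser tool.
script = iter(["ACTION click #order-button", "FINISH order placed"])
result = react_loop(lambda ctx: next(script),
                    {"click": lambda sel: f"clicked {sel}"},
                    "order a pizza")
print(result)  # order placed
```

Notice that every tool call lands on a UI the provider controls and can change or rate-limit at will -- which is the “at the whim of the providing service” problem in code form.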
Anyway, that’s all beside the point!
A useful intelligent agent that can be invoked over voice or text requires, I think, a frankly terrifying amount of contextual and personal information about you. Here’s one of the other demonstrated cases:
- I want to take my family to London, it’s going to be two of us and a child of age 12. We’re thinking of January 30th to February 5th. We like cheap nonstop flights, grouped seats, a cool SUV, and a nice hotel with wi-fi.
- Could you come up with a schedule of fun things to do while I’m over there?
- It seems like this is too intense. Could you plan an easier schedule for us? 5
The response to the results -- which aren’t shown! -- involves just tapping “confirm, confirm, confirm”.
Look, I don’t know demo guy’s family situation. But I would have to be fantastically rich to use this to not just plan but also book a family trip because holy shit do you know what’s involved in a family trip? These are best cases.
In this example, for the agent to be useful, you’d want it to have access to your calendar so it doesn’t book flights that are less convenient. If it didn’t do that, then you end up cross-referencing against your calendar anyway to make sure you’re OK with getting up at 6 in the morning, or that you’ve got enough time to pack.
One way of looking at this is that useful agents have an experienced, mature world model. A good assistant -- human or not -- isn’t likely to suggest flights with a 30 minute layover when I’m traveling with two young kids.
A human assistant might know that because they either have direct experience, or because, you know, they’re able to imagine what-it-is-like-to-fly-with-two-kids, which I have to admit must be easier to imagine than what it is like to be a bat6. (As an aside, I would like Matt Berry to record a reading of this paper)
But I digress. The agent -- whether human or software -- would need to know where you live, your calendar, in this case it’s explicitly told the age of your kids. If you’ve been flying for a while, you’d want it to know that you prefer one airline over another. You might want to use your reward points. You might want to pay with one credit card over a different one.
The point being that an agent becomes more useful the more it knows about you, which includes how you operate in the world, which also includes how the world works.
In our current environment, what would you trust with that kind of information?
If you were going to supply that information explicitly, through voice, here’s some considerations:
- you’d probably want some defaults so that you don’t end up repeating yourself every single time, i.e. remember that when I say “a family trip”, I shouldn’t have to remind it that I have kids. (And ideally remember that people age over time!)
- for this reason it is clearly an idiosyncratic affectation that Captain Picard requests “Tea, Earl Grey, Hot” every single time, and I maintain, still, that “Tea, Earl Grey, Hot” is just the name of the 13 million loc macro he wrote
- which implies some sort of hybrid interface: it’s not like text entry is going to go away completely (I don’t think) -- yes, gesturing and talking are more natural, but that doesn’t mean they’re more efficient, or that they can’t also be more precise.
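That “remembered defaults” idea is simple enough to sketch: stored preferences fill in whatever a request leaves unsaid, explicit requests win, and you store birth years rather than ages so the profile doesn’t go stale. All the field names here are invented for illustration:

```python
from datetime import date

# A hypothetical stored profile -- the stuff you shouldn't have to repeat.
profile = {
    "home_airport": "PDX",
    "kids_birth_years": [2014, 2016],  # birth years, not ages: people age over time!
    "preferred_airline": "Alaska",
}

def kids_ages(profile: dict, on: date) -> list:
    """Derive current ages from stored birth years (roughly; ignores birthdays)."""
    return [on.year - y for y in profile["kids_birth_years"]]

def resolve_request(profile: dict, request: dict) -> dict:
    """Merge a request over the defaults: explicit values override remembered ones."""
    return {**profile, **request}

trip = resolve_request(profile, {"destination": "London", "preferred_airline": "BA"})
print(trip["preferred_airline"])              # BA -- the explicit ask wins
print(kids_ages(profile, date(2024, 1, 11)))  # [10, 8]
```

The merge is trivial; the hard part is everything around it -- deciding what to remember, getting it right, and who you trust to hold it.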
At this point, I’m now thinking about voice-visual assistants, and imagining a Panama-hatted colonial Brit POINTING AND SHOUTING at things. But seriously, voice-visual assistants.
Now I assume there are people out there who’ll throw their hands up and declare a sort of “fuck it, you all know everything about me anyway, it’s not like there’s any use fighting anymore” and that the utility of the purported intelligent assistant will outweigh any qualms about any further abuse, invasion, or third party breach of privacy. But what I’m saying is that the need for more contextual information has no ceiling: more information will always be better, so there will always be requests for more information if, for example, we are lazy and want to utter some magic words and have brooms sweep everything up for us.
Which is, apparently, what we have wanted forever and ever since we thought digital watches were a neat idea.
1.2 Protocols, Not Platforms
For some reason I re-read Mike Masnick’s paper, Protocols, Not Platforms: A Technological Approach to Free Speech7 recently. I lie, it wasn’t for some reason, it was because Substack were succeeding at annoying as many people as possible.
Mike’s smart, it’s a good paper, and it’s a good companion to my thinking over the past years (decades?) and also what’s recently been borne out, for example, when I was going on about Substack being stuck, for now, because they’re based on the open protocols that make up email8.
The very, very, very short version is that lots of different people have very different values of what they consider acceptable speech, and not only do those values differ between people, they also differ between spaces. This is clearly reflected in the approaches not only of different nation states, but also within those states.
Second, platforms are big! Too big! What they do well is they shave the edges off in terms of user experience, which invariably opens up access to more people through which to communicate (in general, something most democracies agree is a good thing), and also makes it easier to manage and moderate both content and communities.
Masnick’s proposal is choice in moderation. Right now, as I’ve written about before, the regulatory framework assumes choice between platforms, and that’s the layer at which competition works. You can choose between Twitter and how Twitter is managed and Reddit and how Reddit is managed. (Reddit is a more interesting example, which I’ll go into below)
Separating out a protocol layer would be a way of unbundling part of how social network platforms work. One way this would work would be for the platform to remain responsible for the ingest and display of feeds and the storage and transmission of content. In this way, Facebook would be separated out, I think, into your mailserver (both SMTP and IMAP/JMAP (ooh!)) and your mail client. What’s different in this analogy is that you’d be able to choose your spam filter. But Facebook would still offer a first-party client -- in the same way that you can use other clients with Gmail (for now!).
You may have figured out that this is, essentially, the deal behind Twitter and Reddit renegotiating the commercial terms behind API access to their platforms, instantly killing off or mortally wounding the viability of third party clients. One of the purported reasons for this is that third party clients were circumventing the ability of the platform owners to display ads and monetize the feeds and content. Which is fair! And yet.
A protocol-based approach would, at a high level, seek to provide a marketplace of filters for each platform. You don’t even need to have interoperable platforms for this (i.e. federation and cross-posting -- though it would be nice, if only because a consistent interface would make it cheaper for those operating filter products).
Masnick suggests that there are new business models here -- one might subscribe to (pay for?) moderation filters or other services provided by the ACLU or the EFF. I think a sticking point here is the cost and infrastructure required to sustain such filters at scale, if that scale is needed. One example or reference I keep coming back to just because of my age is the introduction of local loop unbundling in Europe and the UK, which eventually required the telcos with monopolies over the last mile of copper to provide ISPs with at-cost rack space in exchanges, kickstarting ADSL provision and ISP competition.
Thinking about it, these are filters on incoming content, from content that you might have subscribed to. A marketplace in moderation filters, but not a marketplace in recommendations, and a lot of talk right now -- see the previous episode -- is in how For You recommendations work.
Another aside. LinkedIn’s homepage/feed is now effectively a For You feed with no option for a following feed. I hate it, especially now that, post Twitter’s implosion, friends and peers are posting there more and I’d like to hear what they’re up to.
Anyway. One thing that stood out to me from Masnick’s 2019 paper was its mention of Reddit. First, he compares Reddit to Usenet -- both are collections of communities, in which management of each community is devolved down to its members.
I’ve written before that because management of subreddit communities is in practice devolved to the moderators of those communities, you can get “good” subreddits and “bad” subreddits: it’s purely down to how those communities are managed by the individuals involved, their approaches, and so on. Sometimes really big subreddits are managed really well! Sometimes smaller ones are managed really badly.
The thing is, Reddit offers a great positive example of Masnick’s protocols-based approach. The moderators of pretty much all the big subreddits (citation needed) use bots to manage their communities. These bots use the Reddit APIs to do content and community moderation tailored to that community and its needs, because the bots are, well, written by highly integrated (and responsible?) members of the community. The kerfuffle around Reddit’s leadership dicking around with API terms wasn’t just about third party clients, it included moderators pointing out that they too relied on Reddit’s APIs to keep their communities in order as places that people would like to visit and post.
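In practice those bots talk to Reddit’s API via libraries like PRAW, but the community-tailored part is usually a set of rules the moderators maintain themselves. Here’s a stdlib sketch of that rule-driven core -- the rule format is invented, loosely in the spirit of AutoModerator, and the API plumbing is left out:

```python
import re

# Hypothetical per-community rules a moderator might maintain:
# (pattern, action, reason). Tailored to this community's needs.
RULES = [
    (re.compile(r"https?://", re.I), "remove", "no links in this community"),
    (re.compile(r"\bfor sale\b", re.I), "report", "possible spam"),
]

def moderate(comment: str):
    """Return (action, reason) for a comment; 'approve' if no rule matches.

    A real bot would stream new comments from the platform API and call
    its remove/report endpoints with the returned action.
    """
    for pattern, action, reason in RULES:
        if pattern.search(comment):
            return action, reason
    return "approve", ""

print(moderate("check http://example.com"))  # ('remove', 'no links in this community')
print(moderate("nice write-up, thanks!"))    # ('approve', '')
```

Which is exactly why the API pricing fight hit moderators too: cut off the API and you cut off the automation that keeps these communities livable.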
A shift to protocols will not happen absent regulation. I don’t think there are sufficient incentives for platforms to open themselves up, never mind the fact that having to offer support for protocols like these would be... expensive. They would only come about, I think, as a proactive response to threatened regulation, and even then, I think Facebook has demonstrated that it would rather pull out of a market than be threatened with unbundling.
The thing is, a protocol-based approach would, I think, be a more efficient way -- a less hacky one, at least? -- to deal with the increasingly different demands of nation states in terms of how platforms treat content. Platforms are already localized, and yes, there’s clearly a difference between private, internal APIs and commercially supported external APIs.
With so many companies now wanting to speedrun content moderation, plus the rise in usage of protocols like ActivityPub, and then on top of that Meta’s roadmap for ActivityPub federation through Threads9 I think there’s actually room for some practical demos and experimentation. I don’t think I’ve yet seen big-enough examples of what a choice in computational moderation would look like, in an end-to-end stack for something like Mastodon/ActivityPub10. Would the user client be aware of the store? The client would need, I think, to present the storefront? Does a content moderation provider become an intermediary between your home Mastodon account’s server and your client? Who knows! I should probably figure this out. Same with Threads, at the point where Mastodon users will be able to read Threads posts.
Well, that’s Thursday. How’ve you been?
Oh, a quick question -- do you read the footnotes? Are they annoying?
Best,
Dan
How you can support Things That Caught My Attention
Things That Caught My Attention is a free newsletter, and if you like it and find it useful, please consider becoming a paid supporter.
Let my boss pay!
Do you have an expense account or a training/research materials budget? Let your boss pay, at $25/month, or $270/year, $35/month, or $380/year, or $50/month, or $500/year.
Paid supporters get a free copy of Things That Caught My Attention, Volume 1, collecting the best essays from the first 50 episodes, and free subscribers get a 20% discount.
1. s17e03: Personality; A Rule about Designing Games (archive.is), me, 10 January, 2024 ↩
2. Benedict Evans, Threads, 10 January, 2024 ↩
3. Why Would I Buy This Useless, Evil Thing? - Aftermath (archive.is), Chris Person, Aftermath, 10 January, 2024 ↩
4. ReAct: Synergizing Reasoning and Acting in Language Models (archive.is) ↩
5. Introducing r1, rabbit, 9 January, 2024 at 14:25 ↩
6. What Is It Like to Be a Bat? on JSTOR (archive.is) (PDF copy) ↩
7. Protocols, Not Platforms: A Technological Approach to Free Speech | Knight First Amendment Institute (archive.is), Mike Masnick, Knight First Amendment Institute at Columbia University, 21 August, 2019 ↩
8. s16e11: Substack's Thin End of the Quora Wedge (archive.is), me, this here newsletter, 9 October, 2023 ↩
9. How Threads will integrate with the Fediverse – plasticbag.org (archive.is), Tom Coates, Plasticbag.org, 11 January, 2024 ↩
10. Ugh, it’s like I just said GNU/Linux ↩