s16e18: Generative People Personalities; Content Distribution Networks (Taylor's Version)
0.0 Context Setting
It's Friday, 27 October, 2023 in Portland, Oregon and I am tired and have a headache and need to look at something other than this monitor.
0.1 Hallway Track 004: Work, the internet, and technology in fiction
If you're interested in how technology and the internet have affected work and what work might look like in the future, then you should probably register for Hallway Track 004.
I'll be chatting with two of my favorite authors and writers, Joanne McNeil and Tim Maughan, and hosting around 25 of you to pretend we're mobbing them after an exceptionally interesting panel.
Next Thursday, 2 November, 2023, free and on Zoom at 9am Pacific, 12pm Eastern and 5pm London. Come join us!
1.0 Some Things That Caught My Attention
1.1 Generative People Personalities
OK, I'm calling it. Via Metafilter1, Boston Dynamics glued together:
- their robot dog Spot;
- OpenAI's GPT-4 for controlling Spot and also Spot's speech output;
- BLIP-2, for identifying objects in images and answering questions about images;
- Whisper, to convert incoming audio into English text; and
- ElevenLabs, to convert text to speech
to create a Spot-based robot tour guide2.
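For the curious, here's a rough sketch of how that kind of gluing might fit together. This is my guess at the shape of it, not Boston Dynamics' actual code: `TourGuide`, `build_prompt` and every callable below are hypothetical stand-ins, named only for the role each real component (Whisper, BLIP-2, GPT-4, ElevenLabs) plays in the list above. The stubs at the bottom just pass text through so the whole thing runs without any of those services.

```python
from dataclasses import dataclass
from typing import Callable


def build_prompt(personality: str, seen: str, heard: str) -> str:
    """Assemble the text handed to the language model (hypothetical format)."""
    return (
        f"You are a robot tour guide with this personality: {personality}.\n"
        f"You can see: {seen}\n"
        f"A visitor said: {heard}\n"
        "Reply in character."
    )


@dataclass
class TourGuide:
    personality: str
    transcribe: Callable[[bytes], str]      # speech -> text (Whisper's role)
    describe_scene: Callable[[bytes], str]  # image -> caption (BLIP-2's role)
    generate_reply: Callable[[str], str]    # prompt -> text (GPT-4's role)
    synthesize: Callable[[str], bytes]      # text -> audio (ElevenLabs' role)

    def respond(self, audio: bytes, camera_frame: bytes) -> bytes:
        heard = self.transcribe(audio)
        seen = self.describe_scene(camera_frame)
        prompt = build_prompt(self.personality, seen, heard)
        return self.synthesize(self.generate_reply(prompt))


# Stub components: assumptions for illustration only, no real services called.
guide = TourGuide(
    personality="Fancy Butler",
    transcribe=lambda audio: audio.decode("utf-8"),
    describe_scene=lambda frame: "a lobby",
    generate_reply=lambda prompt: "(stub reply)",
    synthesize=lambda text: text.encode("utf-8"),
)

if __name__ == "__main__":
    print(guide.respond(b"Where are we?", b""))
```

Swapping personalities is then just a matter of changing one string in the prompt, which is presumably why there are eight of them in the video.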
This is all fine and good but what caught my attention is not the gluing-together of things, but exactly how well the things behave when they're glued together. The qualitative glue-result, as it were.
For that, there's a video on YouTube3.
Assuming this isn't an elaborate prank, what really caught my attention is the quality of the generated text and speech output.
The video shows people (most of them Boston Dynamics employees) demoing interactions with these tour guides:
- Fancy Butler4
- Precious Metal Cowgirl5
- Excited Tour Guide Robot6
- "Josh"7
- 1920s Archaeologist8
- Shakespearean Time Traveler9
- Teenage Robot10
- Nature Documentary11
The speech synthesis is off the charts to the extent that I'm a bit suspicious, but on the other hand, ElevenLabs'12 demos sound super good.
Listening to the demos and the different personalities is amazing. I think it's the speech synthesis that tips the whole thing out of the uncanny valley and into something different.
One personality stuck out in particular -- "Josh", if only because the Josh examples feel like they have the same sensibility and attitude as TARS and CASE from Nolan's Interstellar. These all feel like, well, if not genuine people personalities13, then something I'm going to call generative people personalities. The thing about Adams' GPPs was that they were more a commentary on how those with the power to create and deploy those personalities would use them. I am relatively convinced that, somewhere out there, there's a team trying to figure out how to shoehorn the above types of technologies into smart home windows and doors so that they can deliver surprise and delight, a sort of horrific torment-nexus manifestation of Share and Enjoy.
And these systems don't necessarily require processing power the size of a planet, but seriously: inference on GPT-4 takes a lot, and I have no idea what sort of hardware ElevenLabs' speech synthesis is running on.
Now that I'm thinking about it, there's one other thing that feels surprising, or at least was interesting to see, provided there's no sneaky editing in the video: there's totally enough room, within the experience needed to fool people into thinking they're talking with something, to cover round-trip latency. That said, there's no interrupting in any of the videos. That's still hard.
Anyway, it's nice to see that Boston Dynamics had fun with this, if only in the creation of different kinds of personality and that at least one of them is ornery.
1.2 Content Distribution Networks (Taylor's Version)
This is how my brain works: Taylor Swift's album 1989 (Taylor's Version) was released today, which I think for most people means that the streaming content became available at some timezone's local midnight.
Apparently Taylor Swift is a big deal!
So I wonder: what's the infrastructure required to support the release of a new Taylor Swift album?
How many copies of 1989 exist at the edge? How many were seeded to the edge ahead of time?
Some random Twitter account says 1989 (Taylor's Version) has hit over 700 million streams on Spotify. The version I've got downloaded over Apple Music is ~380MB, so naively, if every one of those streams were the whole album and nothing was cached at the client, that's 266 petabytes of Swift, just on Spotify, in the last (as of time of writing) 16 hours.
Meanwhile, the Mirror breathlessly and tabloidly reports that Spotify and Apple Music were DOWN14 after the release of the album.
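Here's that back-of-the-envelope sum as a few lines of Python, mostly so the assumptions are visible. The stream count and the ~380MB figure are the ones quoted above; everything else is my assumption: decimal units (1MB = 10^6 bytes), a track count of 21 for the album, and the idea that a "stream" is more plausibly one track than the whole album. It also uses an Apple Music download size to guess at Spotify traffic, so treat the result as an order of magnitude at best.

```python
STREAMS = 700_000_000       # stream count quoted from Twitter in the text
ALBUM_BYTES = 380 * 10**6   # ~380MB Apple Music download, decimal megabytes
HOURS = 16                  # time since release, as of writing
TRACKS = 21                 # assumption: track count, not stated in the text


def petabytes(n_bytes: float) -> float:
    """Convert bytes to decimal petabytes (10**15 bytes)."""
    return n_bytes / 10**15


# Naive version from the text: every stream moves the whole album.
whole_album_pb = petabytes(STREAMS * ALBUM_BYTES)

# Less naive: every stream moves one average-sized track.
per_track_pb = petabytes(STREAMS * ALBUM_BYTES / TRACKS)

# Average rate needed to move the naive total in the given window.
avg_terabits_per_second = (STREAMS * ALBUM_BYTES * 8) / (HOURS * 3600) / 10**12

if __name__ == "__main__":
    print(f"whole album per stream: {whole_album_pb:.0f} PB")
    print(f"one track per stream:   {per_track_pb:.1f} PB")
    print(f"naive average rate:     {avg_terabits_per_second:.1f} Tbit/s")
```

The per-track version lands more than an order of magnitude lower than the naive one, which is still a lot of Swift.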
I would love to see an internet traffic report, something like "on 27 October, x% of the internet's traffic was Taylor Swift-related".
OK, that's it for today, and that's it for this week.
How have you been?
Best,
Dan
How you can support Things That Caught My Attention
Things That Caught My Attention is a free newsletter, and if you like it and find it useful, please consider becoming a paid supporter.
Let my boss pay!
Do you have an expense account or a training/research materials budget? Let your boss pay, at $25/month or $270/year, $35/month or $380/year, or $50/month or $500/year.
Paid supporters get a free copy of Things That Caught My Attention, Volume 1, collecting the best essays from the first 50 episodes, and free subscribers get a 20% discount.
- You have 20 seconds to comply, old sport | MetaFilter (archive.is), Rhaomi, 27 October, 2023, Metafilter
- Robots That Can Chat | Boston Dynamics (archive.is), Boston Dynamics
- Making Chat (ro)Bots - YouTube (archive.is), Boston Dynamics, 26 October, 2023, YouTube
- Fancy Butler @ 0:00
- Precious Metal Cowgirl @ 1:39
- Excited Tour Guide Robot @ 2:36
- 1920s Archaeologist @ 3:50
- Shakespearean Time Traveler @ 4:19
- Teenage Robot @ 6:02
- Nature Documentary @ 6:42
- ElevenLabs - Generative AI Text to Speech & Voice Cloning (archive.is)
- Genuine People Personalities | Hitchhikers | Fandom (archive.is)
- Spotify and Apple Music DOWN after Taylor Swift's 1989 rerecord release - Mirror Online (archive.is), Zoe Forsey, 27 October, 2023, The Mirror