s2e29: Robots; Other things 

by danhon

0.0 Sitrep

5:35pm on Thursday 17th December, having interleaved conference calls and reviewing documents and making a particular point about a particular thing, *for emphasis* during a conference call, with running a bunch of errands and getting a whole bunch of Adult Things done.

1.0 Robots

And so in thinking about podcasts from last episode[1], a few follow-on thoughts:

– no, I’m not saying that all podcasts *must* provide transcripts
– nor that podcasts must be *required* to provide transcripts
– instead, that podcasts should *strongly consider* providing transcripts because it makes the information in the podcasts *more* and *differently* accessible

This is because:

– *requiring* people to produce transcripts is obviously a barrier to entry, and we like letting people create things because creating things, in general, is good, and reducing barriers to entry for things like freedom of expression is also good[citation needed]

And yet:

– would transcripts of podcasts kill podcasts? I have no idea. I just know that there are things in podcasts that I want to know that are *hard for me to know*. The argument here is, I think, that sponsors might not have as much incentive to sponsor podcasts when there’s a whole bunch of other sponsored content (namely: just the transcripts) floating around, to which my response is: well, make sponsoring the podcast cover the cost of producing the transcript, duh.

But then:

– a dollar-a-minute transcription cost is still *too high* for some podcast producers, in which case we throw software at the whole problem and you get automatic transcription that for a lot of purposes is probably *good enough*, and everyone else can get handmade artisanal transcripts that are better, but then a whole bunch of people will say, well, what are commas and punctuation good for anyway?

OK, so, robots.txt, right? Because *one* of the aspects of the medium of podcasts is that there is a barrier to entry to participate or to consume, with which I can draw a parallel to this newsletter. I *like* newslettering instead of blogging because of the feel of the medium and the material of the newsletter form. I like no public comments. I like it appearing in your inbox. I like that when people reply, they reply to just me, and not to the world. I like the different relationship I appear to have made with the people I’m writing to, or, at least, the people who have chosen to receive my writing. Those are all inherently newslettery things and, I feel, there are podcasty equivalents as well. Namely, it’s harder to just point and link and then do an internet pile-on if someone says something unfortunate or ill-advised in a podcast.

So, what to do when the googlebot that indexes things comes along?[2] There are podcasts that I don’t want indexed / made searchable, so I can just include them in my robots.txt, right?

At *that* point something starts going ding ding ding in my brain because yes I get it, robots.txt is talking about excluding a spider from grabbing content and indexing it, but there’s something different, some sort of step change, when you’re talking about excluding a robot from grabbing a piece of content (ie: podcast audio) and *understanding it* and then making *its understanding of it* available to the public. It’s like letting a toddler loose and not being able to tell it not to ask questions about EVERYTHING THAT IT SAW.

The point of this is a robot exclusion protocol[3] not for saying things like:

User-agent: Googlebot # all Google services
Disallow: /secret-feelings-journal/ # don’t index my secret feelings journal

but instead for a robot exclusion protocol for processing, or understanding, or some sort of higher-level intent or outcome. A robot exclusion protocol for things like this:

User-agent: googlebot-transcription # Google audio transcription bot
Allow: /podcasts-cleared-for-public-consumption/ # these ones are ok
Disallow: /podcasts-recorded-with-friends-for-fun/ # look, I don’t see the harm in just having a bit of fun with the lads, oh okay, maybe yeah that’s totally sexist and misogynist

which is a bit of a no-brainer, but also things like this:

User-agent: googlebot-faceid # Google face recognizer bot
Disallow: / # don’t fucking come into my house and start recognizing all of my friends and then linking my photos to their now-defunct Google+ pages

User-agent: googlebot-sentimentrank # content sentiment engine
Disallow: / # yeah, and don’t come in thinking you know what I *mean* when you’re just a piece of deep-learning cloud-deployed architecture and then associating sentiment analysis of everything I’ve ever written against my now-defunct Google+ page
Allow: /resume/ # apart from my resume, please index and show the world the awesome sentiment in that

User-agent: googlebot-traitengine
Disallow: / # oh and I don’t want you coming in here and reading everything I’ve ever written and using it to infer character traits that you’ll also associate against my now-defunct Google+ page
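The funny thing is that the robots.txt *syntax* wouldn’t even need to change for any of this. Python’s standard library will happily parse rules for a made-up user-agent today; here’s a minimal sketch using `urllib.robotparser` with the hypothetical `googlebot-transcription` agent and paths from above (the agent doesn’t exist; the parser doesn’t care):

```python
import urllib.robotparser

# Hypothetical rules for a (made-up) transcription crawler, written in
# ordinary robots.txt syntax.
rules = """\
User-agent: googlebot-transcription
Allow: /podcasts-cleared-for-public-consumption/
Disallow: /podcasts-recorded-with-friends-for-fun/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# A well-behaved transcription bot would check before fetching:
print(rp.can_fetch("googlebot-transcription",
                   "/podcasts-cleared-for-public-consumption/ep1.mp3"))  # True
print(rp.can_fetch("googlebot-transcription",
                   "/podcasts-recorded-with-friends-for-fun/ep1.mp3"))   # False
```

The hard part isn’t the syntax, of course: `can_fetch` only answers “may I fetch this?”, not “may I understand it?”, and that second question is a matter of convention, not code.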

Of course, you could accomplish this with HTTP response headers too, and if you’re using *those* then you might be able to have more fun:

X-Robots-Tag: notranscription # don’t transcribe anything you see here
X-Robots-Tag: nofaceidentification # don’t recognize any faces you see here
X-Robots-Tag: nosentimentanalysis # don’t perform any sentiment analysis
X-Robots-Tag: nosocialassociation  # don’t make any social associations

In other words, do we need robot tags for “index this, but don’t understand it”?
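To sketch what a polite processing bot might do with those entirely hypothetical header tokens, here’s a minimal check, assuming the `no<capability>` naming above and the usual comma-separated X-Robots-Tag value format:

```python
def may_process(x_robots_values, capability):
    """Return False if a (hypothetical) 'no<capability>' token appears in
    any of the X-Robots-Tag header values, True otherwise."""
    forbidden = "no" + capability.lower()
    for value in x_robots_values:
        tokens = {t.strip().lower() for t in value.split(",")}
        if forbidden in tokens:
            return False
    return True

# A response might carry several X-Robots-Tag headers; collect their values:
headers = ["notranscription, nofaceidentification"]
print(may_process(headers, "transcription"))      # False: don't transcribe
print(may_process(headers, "sentimentanalysis"))  # True: no objection stated
```

The design choice mirrors how `noindex` works already: the default is permissive, and the header only ever takes capabilities away, so a bot that has never heard of a token can ignore it safely.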

[1] s2e28: Podcasts; Other Things; Hue
[2] Robot Exclusion Protocol (Ftrain.com) (this is, I note, not the *first* time I have linked to Ford’s Robot Exclusion Protocol story)
[3] Robots exclusion standard – Wikipedia, the free encyclopedia

2.0 Other things

– Stubbornly minimally viable products are ones where there are very “small” changes that could be made that would make the product “better”, but the product owners are being Really Fucking Obtuse about their aesthetic[1]

[1] William Henderson on Twitter: “It would be trivial for @craigslist to require serial numbers for bike listings. Only their idealism prevents them. https://t.co/m9OohJ5A5S”

– The code that is displayed on on-screen displays in THE MARTIAN (e.g. when, say, Watney turns something off and leaves) is actually code from NASA’s PVSlib[1], which is their formal verification library (which is a *very* NASA thing to have). I know this because I am about to humblebrag that, for some very good reasons that were very good at the time, I happen to be a full voting member of the British Academy of Film and Television* Arts[2] and I got to have some freeze-frame fun with a DVD screener. Bonus points for the Rover having an IP address (10.30.80), minus some points for it being a weird one (I initially misread it as a valid 24-bit private network), but hey…

[1] nasa/pvslib
[2] https://www.bafta.org I say “*” because it now includes Videogames, which is the ostensible reason why I’m in it, for valid yet (to some people) inexplicable historical reasons

– I continue to have opinions about how to get governments and technology from here to over there.

6:09pm. OK, got to run. Notes welcome as ever.