Tuesday December 17 2019.
The thing about calling your doctor’s office to make an appointment because you’re having chest pain and shortness of breath is that when you talk to them about your symptoms, they’re very thorough and tell you that if certain other things happen, you should go to the emergency room straight away. They also make you an appointment for as soon as reasonably practical.
The thing about seeing your doctor when you have that appointment a few days later is that it actually turns out to be quite reassuring when, after a pretty thorough consult, the most likely cause of the pain and shortness of breath is a viral chest infection.
The thing about getting an EKG done just in case, just to be sure, you see, after you’ve gotten an initial diagnosis of that viral chest infection is that you get to talk to the nurse about the 12-lead EKG she’s setting up and that time you got to write about the wrist EKG on the Apple Watch and find out what this particular nurse and doctor think about it.
The thing about your nurse handing the EKG print-out to your doctor is that you’d normally hope for them to take a quick look at it and pronounce everything is absolutely fine, and that it’s definitely most probably a viral chest infection, just make sure you get rest and you should be fine.
The thing about your doctor taking a bit longer to look at that EKG and making a sort of hmmmm face, which is entirely understandable, and then stepping out to speak to their attending (whom you met previously, and who was a thoroughly nice and reassuring attending) is that clearly there’s something not quite right about the EKG, but on the other hand, not catastrophic, because people aren’t yelling and you’re not getting the feeling that surgery is imminent.
The thing about getting some labs done to rule out a blood clot that might be indicated by the EKG and your other symptoms is that these days, you go to see the phlebotomist and they’re very good at taking your blood, something you’re used to from getting your A1Cs done, and it doesn’t happen to hurt you at all. But the thing about that is you’re told to expect a call if the results are abnormal, and in this day and age, you actually get the MyChart notification that the results are available before anyone is going to call you. So you get a little bit anxious when you see the results, because you might not know what they mean.
The thing about looking at the results in MyChart is that you think you know what they mean and you’re pretty relieved, and then you get another MyChart notification and it’s your doctor, and yes, everything is fine, but they would still like you to get an echocardiogram anyway because of those readings at the bottom of your heart.
Anyway. How are you?
This episode: one big thing about tattletales and data exhaust and so on. Then some smaller things.
I keep teasing you all but this is… probably? the last newsletter of the year and we’ll start a new season in 2020. We’ll see if I can exercise any self-control and not write anymore, hm?
The generally accepted idea now is that pretty much anything that’s connected to the internet is spying on you.
In the space of ideas, “everything is spying on you” is close to “if you’re not buying, you’re the product being sold”: when the Portland FBI office feels like it’s appropriate (or, at least, not stupid) to issue a notice about securing your smart TV around Black Friday, I think enough of these concerns have seeped through into the popular consciousness. (At least, the popular consciousness that spends time on the internet. So, you know. Not all popular consciousness).
One point, that I’m not going to get into here, is that consumers (sorry, people) don’t appear to have much of a choice. If you want a TV (never mind a smart one), it’s going to be connected to the internet. When you buy it, are you going to know, as a high-level product feature, whether it will or won’t transmit data about your viewing habits back to its manufacturer or a third party? Yeah, I thought not. This is exactly why Mozilla’s *privacy not included effort exists, to help people be more aware about the data and privacy implications of whatever gadget. But I still think that while we say we care about these things it’s… you know, effort to have to check whether any of these devices will tattle on you, and what they might tattle.
There’s a general problem here, which is that for pretty much any internet-connected device, I imagine you probably have zero practical information before you purchase it (or after) about what kind of information it’s sending back to, well, whomever.
Which is kind of the point.
One way of framing this issue in, say, a more emotional tone would be: people don’t like being gossiped about. People don’t like other people talking about them behind their backs. Although I suppose people in general would at least understand the behavior and some reasons why gossiping happens. But my point here is that gossiping is more-or-less seen as shameful behavior. It’s not polite.
So it’s in this context that voice assistants like Alexa are yet more troubling. Because it’s not just the Amazon Echo-the-object that’s tattling on you, acting as a smart home IoT hub and sending whatever information back to whomever. There’s the related point, which is that an Echo presents itself as Alexa. You deal with Alexa-the-interface, not really Echo-the-object. And Alexa defaults to presenting as a woman. The feminization of intelligent voice assistants strikes again, and Alexa ends up with a reputation as a gossip. Great.
As an aside, if you watch The Good Place, there’s a whole continuing bit about how Janet, a, well, interface, looks like a woman and sounds like a woman but constantly has to remind everyone that she is not a woman. (Spoiler-filled recap video)
Alexa, Siri, “Hey Google” and Cortana (who is kind of disappearing) are the front-end interfaces that, to varying degrees, have access to vast amounts of data about you, and are intimately connected to and inseparable from that data and its collection.
So it’s a bit strange when I feel like we can step back and say: hang on, I get a Digital Wellbeing dashboard on Android and a Screen Time dashboard with management controls on iOS but… not a dashboard about what sort of data is being exfiltrated and who’s getting it?
In other words, why don’t these commands exist?
OK Google, what data have you shared about me today?
Alexa, what data have you shared about us today?
Or why don’t these commands about location exist?
OK Google, who are you sharing my location with?
Hey Siri, what apps have used my location today?
I made a sort-of-joke in the conversation that prompted this thought: perhaps after everyone learns to code, maybe the next step is for everyone to learn deep packet inspection. The easiest analogy for DPI is that it’s the equivalent of opening all the packets of mail to see what’s being sent (which is fine for governments to do when they follow their own, er, rules, but with the caveat that gentlemen do not open each other’s mail, apart from when they absolutely try to do that all the time, always).
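For a toy sense of what “opening the packets” actually involves, here’s a minimal sketch using only Python’s standard library. It unpacks the fixed 20-byte IPv4 header of a hand-built example packet; the addresses and payload are invented for illustration, and real deep packet inspection would of course run against live captured traffic rather than a byte string you made yourself:

```python
import struct
import socket

def inspect_ipv4_header(packet: bytes) -> dict:
    """Unpack the 20-byte fixed IPv4 header: the 'envelope' a
    deep packet inspector reads before looking at the contents."""
    (version_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    return {
        "version": version_ihl >> 4,
        "header_len": (version_ihl & 0x0F) * 4,   # in bytes
        "total_len": total_len,
        "ttl": ttl,
        "protocol": proto,                        # 6 = TCP, 17 = UDP
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
        "payload": packet[20:],                   # what DPI actually reads
    }

# A hand-built example packet: a header plus a plaintext payload.
header = struct.pack("!BBHHHBBH4s4s",
                     0x45, 0, 20 + 5, 1, 0, 64, 6, 0,
                     socket.inet_aton("192.168.1.10"),
                     socket.inet_aton("93.184.216.34"))
info = inspect_ipv4_header(header + b"hello")
```

The point of the analogy holds: even without reading the payload, the envelope alone tells an inspector who is talking to whom, how much, and over what protocol.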
In other words, if we’re inviting these devices into our homes and our lives, don’t we have some sort of right to understand what they’re saying about us? Or you know, if you don’t want to go so far as to say you have an automatic right to it, aren’t you at least interested? And how would you even go about finding out?
Apple’s iOS offers, I think, more visibility than Android into what sort of data is being used, in part because of its slightly more fine-grained access controls and what feels like slightly more nuanced and thought-through use restrictions. The whole kerfuffle around iOS 13 upgrades suddenly slamming users with “this app wants to use Bluetooth, is that OK?”, and users being quite confused as to why, say, their banking app had been using Bluetooth (a short-range communications protocol that also happens to sometimes be used for getting your location), revealed that Apple has, outwardly at least, a particular opinion about what it wants software to be able to do.
I remember, a while ago, the idea that various home routers might do something like this: aggressively inspect every single packet of information coming into and out of a household and produce some sort of dashboard or overview report. They do it now at a coarse level: routers will let you know how much bandwidth and data is being transferred for gaming or p2p or streaming.
But we’re probably being a bit optimistic if we think that the default router/gateway provided to an internet subscriber by the telco is going to voluntarily tell you which programmatic ad network is getting the most pings, given that (shock horror!) your telco is probably attempting to do deep packet inspection on all your traffic anyway. It’s not as if they haven’t been trying to inject ads into your web browsing in the first place, not as if they don’t already have a history of that.
A somewhat facetious answer to “why don’t these commands exist?” is that the answer would probably take bloody forever to read out. But isn’t that the point? It would be like the goddamn awful CVS receipts: the forever itemization of exactly what has been shared and where.
There are opposing tensions at work here, too, though. Generally speaking, if your traffic is encrypted—and much web traffic is now, especially after Google introduced its policy of using HTTPS as a ranking signal in search results in 2014—then you can’t quite use vanilla deep packet inspection: your mail can’t easily be steamed open and looked at by any random third party. There are ways around this because, of course, people are extremely financially motivated to have a look at your traffic, even when it’s encrypted.
So what might this kind of report show? Thinking aloud, privacy and data reports on iOS might be able to cover:
which apps have accessed your location in a time period (24h, a week, etc.), and how often, but only when they’re using OS-supplied APIs and not doing, say, some browser-based shenanigans to try to determine it, or getting it coarsely, by your IP.
But it’s difficult to tell (maybe practically impossible?) — without seriously looking at the content of all network traffic leaving the device — exactly who or where any location information is being sent. An operating-system level report can tell you what APIs have been accessed (at least, the ones it knows about), but not what’s been done with that data or even where it’s been sent. Easy example: your location data could get encrypted or hashed or obfuscated in whatever way even before it gets sent over the wire over an encrypted connection. So, really, you don’t know.
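To make concrete just how far an OS-level report could go, here’s a sketch that aggregates a hypothetical API-access event log into per-app counts over a reporting window. Everything here, the event format, timestamps, and app names, is invented for illustration; the point is that counting accesses is trivial for the OS, while tracing where the data went afterwards is not:

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical OS event log: (timestamp, app, api). The OS knows which
# app called which API and when — but not where the data went afterwards.
now = datetime(2019, 12, 17, 12, 0)
events = [
    (now - timedelta(hours=2), "Spotify", "location"),
    (now - timedelta(hours=5), "Spotify", "location"),
    (now - timedelta(hours=8), "BankApp", "bluetooth"),
    (now - timedelta(days=2), "Spotify", "location"),
]

def access_report(events, window: timedelta, end: datetime) -> Counter:
    """Count API accesses per (app, api) pair within the reporting window."""
    start = end - window
    return Counter((app, api) for ts, app, api in events if start <= ts <= end)

# "Spotify asked for your location n times in the last 24 hours."
daily = access_report(events, timedelta(days=1), now)
```

This is the whole trick: the dashboard version of the report is a glorified tally, which is exactly why its absence feels like a choice rather than a technical limitation.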
On the other hand, I do believe that some information is better than none, and information presented in a certain manner (ie: with the purpose to disclose and provide transparency, as opposed to information in order to inform an immediate choice) is definitely useful.
I’d love a privacy and personal data-focussed dashboard report, whether at the mobile device OS level, or reportable at a home smart device level (a la asking Alexa), or even, say, across a browser and covering the usage of tracking beacons etc. This report would cover two fundamentals: what data has gone out, and who it went to, with, you know, how often and so on.
But it’s not that simple because this is the real world and nothing is ever simple.
Such a report can easily lull you into a false sense of security:
Cory’s right, and I’ve written about this before, obliquely, in the form of gossip networks. It’s useful, on one level, to be told what my phone has been gossiping and to whom. But whoever that data’s exfiltrated to can easily forward that data on. Once it’s out there, it’s out there. We’re all familiar with this theory, of course:
 I swear to god if someone OK Boomer’s me about a fucking Wayne’s World reference…
There is not much to be done about this. Or, rather, it’s a completely different problem. There is no transparency for data gossiping or data promiscuity. Never mind unintentional breaches resulting in the dissemination of personal data, there’s no visibility right now on, say, the data that Facebook has about you and who it’s sharing it with, or co-mingling it with.
I mean, just imagine if before going into a physical store you were aware, whether on a conscious or subconscious level, that your credit card transactions were being logged and who they were being shared with. You have no idea. You might have a slight belief that they’re being shared with “some people” who “make some decisions” maybe about things like a “credit report” but, really, who knows?
I think this is part of why people are suspicious about algorithms, even though they’re all human-designed. It’s because they’re invisible. The crude, citation-needed evo-psych angle here is that we’ve evolved in a physical world where cause-and-effect (if that even exists, ha!) at a very simple level is governed by visible, interrogate-able newtonian mechanics. I hit X, it goes this way. But data, information about you, about people, is intangible and invisible now. Where does it go? How is it transformed? It has zero cost of duplication. How are people supposed to understand this as a fundamental concept?
Occasional attempts — like, say, requesting your Facebook data and then printing it out and binding it into books a la pre-CD, pre-internet encyclopaedias — to me feel like a sort of three-dimensional peek into a stupendously high-dimensional universe. They’re just one fraction. You’re not seeing so many of the other dimensions: the directed graph of data flows, what goes in what direction, how data is split and recombined and multiplied, how it cycles.
The closest we get to this are those sorts of simplified water cycle diagrams from high school geography textbooks, where 25 years later, I can tell you that water evaporates and condenses and rains. Three fundamental concepts! Three points in a circle! The behavior and dissemination of personal data is nothing like that, everything like that, that times a gazillion degrees of freedom.
And consider this: these types of cycles that we’re familiar with again come from the natural world. There’s only so much water. There’s as much data as we want to harvest. It doesn’t go away if we don’t want it to. It multiplies. It does all the things that you don’t want to do in a living dynamical system because if you did those things, you’d be a virus and even viruses are smart enough not to destroy their hosts.
My point is this: there is a giant underground network that isn’t even a network, it’s a whole goddamn warren of dataflows that aren’t visible. Worst case, they’re intentionally invisible; best case, even if we could make them visible, they’d still be too complicated to fully grok. (Good job we have computers who might help us grok them, then).
I have to admit that at the back of all of this there is an itchy feeling that I’m trying to ignore, because it is yelling BLOCKCHAIN, or rather IMMUTABLE LEDGER SUPPORTING ENCRYPTION. Which is to say: if you really wanted to restrict access to your personal information to only the people you choose, then a) you can’t, because as soon as information can be read it can be copied, but b) maybe you can figure out where it came from and who the gossipy leaker was, if you’re enforcing some sort of evidentiary chain. That is: here’s the canonical information, you three entities get to decrypt it using my key, and if you want to share it with anyone else, then it’s signed with your key and my key, so if they do anything with it, I know you are also partly to blame for trusting that they’d act as good custodians for my data. I am sure, of course, that many other people, some of whom are very into crypto, have probably thought about this a lot. But nothing will stop information from spreading.
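That “signed with your key and my key” evidentiary-chain idea can be sketched minimally. A big caveat: this uses HMAC from Python’s standard library as a crude stand-in for real public-key signatures (which the stdlib doesn’t provide), and every name and key below is invented. It shows only the chaining, that each custodian who forwards the data signs over the previous link, so a leaked copy carries the trail of who passed it along:

```python
import hashlib
import hmac
import json

def sign(record: dict, key: bytes) -> str:
    """HMAC stands in for a real digital signature in this sketch."""
    payload = json.dumps(record, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def share(data, prev_sig, holder, key):
    """Each custodian forwarding the data signs over the previous link,
    so a leaked copy carries the chain of who passed it along."""
    record = {"data": data, "holder": holder, "prev": prev_sig}
    return {**record, "sig": sign(record, key)}

def verify(link: dict, key: bytes) -> bool:
    record = {k: link[k] for k in ("data", "holder", "prev")}
    return hmac.compare_digest(link["sig"], sign(record, key))

# Me -> entity A: if A's copy turns up somewhere it shouldn't,
# the chain shows A vouched for whoever it shared with.
my_key, a_key = b"my-secret", b"entity-a-secret"
link0 = share("canonical personal info", None, "me", my_key)
link1 = share(link0["data"], link0["sig"], "entity-a", a_key)
```

Which demonstrates the point in the paragraph above: the chain gives you attribution after a leak, not prevention of one. The data itself still copies freely.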
Laws and regulations might help restrict behavior, though. But at the same time, have we learned nothing? Who actually reads all the TOS and Ts&Cs on, say, a credit card? Even the little info-boxes about minimum payments and interest rates now required by U.S. legislation? I might make the argument that maybe if that information had been subject to user testing and user needs (ie: what actually helps meet the need of paying off a credit card and not being stuck?) then those infoboxes might be more effective, but that really is one digression too far into government and user needs in this somewhat unrelated point.
Anyway, here’s my underpants->gnomes->profit summary. Get industry leaders either of their own volition or through hook/crook regulation and legislation, to:
Produce usable, understandable regular reports about data exfiltration that are based on user behavior (e.g. Spotify asked for your location 24 times in the last 7 days)
Produce usable, understandable, real-time reports about who that information is shared with, which likely can’t be done programmatically, but would instead be required via legislation, with the appropriate information published over an API (e.g. Spotify has data sharing agreements with x, y, z and has shared your data with them n times this week)
Do the same but for the web: who are the adtech tracking companies? What do they know about you? How often has an adtech tracking company been pinged through your browser today? This week? What does it know? In other words, something like inverse Google analytics but for you.
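The “inverse Google analytics” idea in that last item could, at its simplest, be a count of pings per third-party domain in the browser’s request log. A sketch, with the log entries and tracker domains entirely made up:

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical browser request log; the tracker domains are invented.
requests_log = [
    "https://tracker-one.example/pixel?uid=123",
    "https://news.example/article",
    "https://tracker-one.example/pixel?uid=123&page=2",
    "https://adnet.example/beacon",
]

# A blocklist-style set of known tracking domains (also invented).
TRACKER_DOMAINS = {"tracker-one.example", "adnet.example"}

def tracker_pings(log):
    """Inverse-analytics sketch: who got pinged through your browser,
    and how often, over whatever period the log covers."""
    hits = Counter(urlparse(url).hostname for url in log)
    return {domain: n for domain, n in hits.items() if domain in TRACKER_DOMAINS}

report = tracker_pings(requests_log)
```

In practice this is roughly what tracker-blocking browser extensions already compute internally; the missing piece the newsletter is asking for is the other half, what those domains know, which no client-side count can tell you.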
I’m under no illusions that anything like the above would change user behavior or corporate behavior on its own or even “quickly”. But it feels like we are blind right now, describing an invisible bogeyman. I would like to know, easily, quickly, the answers to these questions. I may be scared by the answers and elect to ignore them, but I don’t feel like I can make any informed decisions right now without at least some order-of-magnitude understanding. And right now, that understanding is “it’s all bad”.
Yes I know some of you are going to tell me to use a different browser. Sure, fine. But this is about more than that.
Robinson Meyer noticed that iOS’ FaceID icon references Susan Kare’s original Happy Mac icon, which is just a reminder that my first vintage Mac purchase was merely a gateway drug, and I will probably need to keep going until I also have an all-in-one original Mac.
You might have already seen the winner, but this page has all of the finalists for the 2019 Illusion of the Year award, which is only partly interesting, and more interesting because it reminded me of a phrase I’d been trying to remember: the neural correlates of consciousness, the idea that you can map 1:1 a particular physical state of the brain to a particular experience of consciousness.
In a past life I co-founded a games studio with my brother that is now very well known for chasing you with zombies, which is to say that I’m still super into game design (and environmental storytelling, and so on), so I am a sucker for this piece on Kotaku pulling together concept art for the video game Control.
In fact, when I was doing games, gamification (sigh) was one of the things that exploded. One of the things our studio tried to do well (and I think succeeded at!) was smartly using game design to teach, but not in that horrible badge-centric corporate HR course way. Which is why I really appreciate this Knight Lab SQL Murder Mystery that teaches SQL (and also see its GitHub project). There’s a bunch to note about this, namely that yes: of course it’s hard, but it’s for a specific audience, and it is supposed to teach you SQL. There’s another thought that it prompts which I think is Simon Willison-bait, more recently of Datasette, which is: what are the good visual tools for interrogating SQL datasets? Or, for datasets in general?
For later reading, if/when I ever get around to it, so just noting for now: an interesting paper from NeurIPS on the use of indigenous data (sacred waveforms, spiritual data) in machine learning.
Joke’s on you, that was 3,500 words!
If this is the last newsletter of the year, and if you have been enjoying it and you’re able, a subscription is always nice :)
Do drop me a note. I’m not entirely sure I stuck the landing on that long bit about tattletales, but I remember thinking there was something there when I started it, and being only mildly surprised about where it ended up.
And again, if I don’t write again: have a good rest of the year. It’s been a long hard difficult one for some of us, and we get a do-over and another chance in just a couple weeks.