s07e21: Personal Data Gossip Networks
0.0 Sitrep
Hello, I’m back.
Friday, November 22, back in Portland, Oregon. I was in Missouri last week on a sort-of vacation, dealing with the farm we prematurely inherited. If I’m completely honest, I’m probably dealing with some sort of horrendous freezing anxiety attack, so I’ll see if I can write my way out of this one.
1.0 Some things, etc.
1.1 On data
For various reasons, I’ve been thinking about “data”. I have a newsletter section in draft, started a couple of weeks ago, along the same lines, but this one is subtly different.
Here’s the first thought:
No decisions without data.
This one feels dumb and obvious, so I feel the need to explain it. Of course you can’t make decisions without information! But I feel like the implications of scale are the parts that aren’t intuitive. People have talked before about data exhaust and plumes and so on, but I don’t think there’s a good folk understanding of what we mean when we say “data” as it relates to how people live in the world right now.
What I mean by this is: when you’re thinking about any function of government or administration, any decision needs to be based on information. Even something like UBI, which purports to reduce the number of decisions involved in social programs by just giving you the money (no more figuring out whether you’re worthy or deserving, sorry, “qualified”, to receive assistance), still presumably needs to keep track of to whom money has been given, at the very least to make sure you don’t accidentally give it to the same person twice. That would not be good!
So we need to accept that nothing really happens without information, whether that’s a government entity or Facebook or whomever we’re concerned about these days.
But! One of the things I’ve been thinking about is how, in general, data just keeps piling up.
There are data lakes. Data mountains. Warehouses full of the stuff.
There’s a default belief in VC-style tech, I’d say, that the more data the better, along with the idea (the one I’ve been noodling on) of never throwing data away because it’s just so (relatively) cheap to store these days.
And who knows, you may as well hoard the stuff because it could be valuable in the future. It’s not like it’s going to become less valuable, right?
There is a big black hole here. What if we thought about the opposite? What pressure could create the incentive to ask: what’s the absolute minimum amount of data or information you can get away with while still delivering a good enough present decision?
This isn’t about economic value, and this isn’t about potential future value. This is a negotiated consideration of what a system or process needs to know to make a good enough decision right now.
I remember now (you can tell that this is stream-of-consciousness stuff) that part of why this is stuck in my head is the news that Apple’s credit card has its credit decisions outsourced to Goldman Sachs, and that Goldman’s “algorithm” is clearly off, making weird (sexist) decisions and assumptions about people’s creditworthiness.
One part here is being able to understand what sort of information goes into making a decision. Or maybe not even being able to understand it, but being able to be told what sort of information goes into a decision. In other words:
Should you have a right to know a complete list of all the information a bank uses to make a lending decision? Right now, you have access to an abstraction of this information: a credit score. Is it important to society that we have access to how that score is computed and what information goes into making that score?
There’s a difference between knowing the types and categories of information and knowing the underlying entities providing it. On one level, there’s “we use your rental history”; on another, there’s “we used your rental history from these people/entities”. There’s “we talk to your utility providers” and there’s “these are the exact utility providers we talked to”.
My folk understanding is that if I looked at the TOS for companies or entities that make decisions about me, they’d list some examples and, to cover their asses, make clear it was a non-exhaustive list. They would give examples of things, but not the specific things. Their lawyers would say this gives them wiggle room to update their business practices without needing to update the TOS every single time.
I do not think, in general, that requiring companies to disclose all the sources of information they use to make decisions about us would make a practical difference in behavior in the short term. I think the value would be in freaking people the fuck out about just how much information is out there. I think people understand this in theory, but not in practice. And I think it would be useful, although hard work, for society in general to have a better understanding of this.
Which led me on to the next thought:
Visible gossip networks
Over the last few years there’s been more public talk about “data brokers”, especially since Cambridge Analytica-style political and privacy scandals and the continued lack of clarity about whether all of these devices we carry and live with are actually listening to us to provide us with really unhelpful “targeted” advertising.
I wonder what it would be like if we had a better understanding of what products and services did in terms of gossiping about us.
We don’t like it [citation needed] when people talk about us behind our backs, especially when they talk about things we shared in confidence. This feels like a small-groups-of-humans social behavior.
What we have now are large networks of entities that do gossip behind our backs. They learn everything they want to learn about us, and then they sell it to each other, for whatever nefarious (or altruistic!) ends they may have.
But those networks are invisible. I can’t for the life of me imagine how I might go about something like this:
Me: Hey, Facebook. Do you know how much I spend on groceries every year?
Facebook: Oh, ha. Yes. Well. Caught me red-handed there. Yep. I know that.
Me: Well I wish you wouldn’t do stuff like that, but it’s not like I have any negotiating power with you anyway. At least you could tell me how you got it and who from?
Facebook: Well, since you had someone subpoena it from me and talk about it in testimony, I guess I’ll tell you: I bought it from Dave.
Me: Dave…?
Facebook: Yeah, Data Broker Dave.
Me: Do you have Dave’s phone number?
Facebook: Yeah, got a pen?
Me: Oh hey, I’d like to speak to Dave.
Dave: Who’s asking?
Me: Yeah, it’s me. Facebook gave me your number. Says you sold him some data about me? About my groceries?
Dave: Oh, OK. Sure yeah. I know about groceries.
Me: OK, well… where did you find that out from?
Dave: Well! A bunch of people actually. I called up Amy who lets me know what you spent on your Amex. And then there’s Nicole who knows everything about what’s being sold at the grocery stores. But she only does one grocery store, Nick does the other one…
It would be simpler, I think (and there would be a visible, non-externalized cost), if TOS didn’t just describe in legalese what could be done with our data, but actually showed us.
Look, I’m not saying that being presented with a directed graph of information flows, annotated with the types of information, is going to fix everything (obviously not), but we don’t even have that information. We don’t even really know where Facebook is buying additional data about us from, or what kind. When I go to the grocery store, all I might know is that they “share data with partners”, but it strikes me that in this day and age, I might want to know:
a) which partners
b) what data
Because they’re gossiping, that’s what.
And this doesn’t even get into third-party APIs and libraries embedded in mobile applications, products and services that make it easy to de-anonymize users and produce fuller records of them.
So. That’s today’s thing. Gossip networks for data.
People might have a vague idea that they’re leaving a cloud of data behind them, some sort of exhaust. But if we mapped that exhaust and made it visible, then I suspect we might freak a bunch more people out.
I want a transparent credit report for the data that’s being shared about me. Not just “what data Google has about me”. I want to know who’s buying and who’s selling and who’s giving it away for free in exchange for favors.
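A toy sketch of what that kind of transparent report might look like: record every hop as an edge in a directed graph, then walk backwards from whoever holds a piece of data to everyone who passed it along. Everything here is hypothetical: the `GossipNetwork` class, its `record` and `report` methods, and the `terms` labels are made up for illustration, the edges are just the Facebook/Dave/Amy/Nicole/Nick dialogue above, and it assumes each hop passes along the same named type of data, which real brokers almost certainly don’t.

```python
from collections import defaultdict, deque


class GossipNetwork:
    """Hypothetical record of who passed which data about one person to whom."""

    def __init__(self):
        # recipient -> list of (source, data_type, terms) edges pointing at it
        self._incoming = defaultdict(list)

    def record(self, source, recipient, data_type, terms="unknown"):
        """Note that `source` passed `data_type` to `recipient` on `terms`
        (e.g. "sold", "free", or "unknown")."""
        self._incoming[recipient].append((source, data_type, terms))

    def report(self, holder, data_type):
        """Walk upstream from `holder`, breadth-first, following only edges
        that carry `data_type`. Returns (source, recipient, terms) tuples."""
        edges = []
        seen = {holder}
        queue = deque([holder])
        while queue:
            recipient = queue.popleft()
            for source, kind, terms in self._incoming.get(recipient, []):
                if kind != data_type:
                    continue
                edges.append((source, recipient, terms))
                if source not in seen:  # guard against gossip loops
                    seen.add(source)
                    queue.append(source)
        return edges


# The grocery dialogue as edges. Only "Facebook bought it from Dave" is
# known; the other terms are left as "unknown".
net = GossipNetwork()
net.record("Dave", "Facebook", "groceries", terms="sold")
net.record("Amy", "Dave", "groceries")     # Amex spending
net.record("Nicole", "Dave", "groceries")  # one grocery store
net.record("Nick", "Dave", "groceries")    # the other grocery store

report = net.report("Facebook", "groceries")
for source, recipient, terms in report:
    print(f"{source} -> {recipient} ({terms})")
```

This prints one line per hop, starting with `Dave -> Facebook (sold)` and then Amy, Nicole and Nick feeding Dave. Breadth-first order puts the closest gossip first, which is roughly the order you’d find things out in if you had to phone each of them in turn.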
OK. That wasn’t too bad. Heart didn’t completely explode. I do think the above was a bit of a horrific mess, but a horrific mess is slightly more valuable (sigh, valuable) than nothing at all, so I guess there’s that.
I’ve been away for a couple of weeks. How have you been? Are you as tired as I am?
Best,
Dan