s20e02: A Model for the People
Monday, 11 August 2025 in Portland, Oregon where I am sitting in my basement office. In here, it’s a comfortable 71f/21.7c. Outside, it’s apparently 96f/35.6c which is not comfortable. That is why I am inside.
Me and the smaller kid went and saw &Juliet the Musical the other night as a Father’s Day present. Laughed all the way through.
0.1 Hallway Track
While there are no Hallway Tracks coming up, I have three planned in my head that I’m super excited about. Hopefully something coming soon.
1.0 Some Things That Caught My Attention
1.1 A Model for the People
Here’s a thought experiment. I’ve written about this before, but it came back into my head recently when I told Ted Han about it a few weeks ago and he said his mind is still blown by it.
National copyright libraries should have been entrusted with building public ML datasets and models.
There you go. I’ll walk through it.
- They’re supposed to have it already
Broadly, national copyright libraries exist as the expression of copyright policy. They’re entitled to a copy of everything, even if they sometimes don’t actually have possession. Because copyright is a social policy, they are a method to make material available to everyone. They are entrusted and, so far (*looking pointedly at Washington, D.C.*), have been trusted, independent institutions.
- They set copyright policy
You know, how copyright works in practice. They can make rules like when you’re allowed to circumvent protection methods.
- They are supposed to be for everyone
I made this point earlier, but it bears repeating.
I like referring to and reminding people of the reasons for things existing, especially when it can be used to bind toward a particular direction or outcome.
So you have an institution that, broadly, “has all the books”. That’s a good starting point.
That institution decides to train a model on “all the books”. Yeah, you can drive a truck through the practicalities of “train”. Bear with me.
That institution decides to make the resulting model/weights/etc freely available to the public but, say, preferentially available to Persons with status in the institution’s state.
Everyone benefits. Anyone and everyone can use the model under the same terms.
The model can be presented as a public good that everyone contributes to and that everyone can benefit from.
Benefits can accrue from applications built on the model, through to actual revenue based on model usage. Sure, everyone’s output (including creative output) is going into the model.
But maybe usage is taxed, and, well, that’s tax revenue. Everyone benefits from tax revenue, everyone gets a say in what it goes into. Hell, it goes to NEA, NIH, whoever.
I don’t know. Not a national oil dividend/fund, but a national data dividend/fund.
The point is community equity and accessibility. A library as a repository of knowledge and creative expression of the members of a polity, held in trust for that polity.
I mean it sounds good to me?
Now, an argument here is that the government shouldn’t be doing this kind of thing because it requires lots and lots of investment because nobody would do it on their own.
But organizations did invest a shit-ton of speculative money in it because, I don’t know. It would let them make more money. They are in a race with each other. It offers more control. Reinforcement of monopolies, of gate-keeping.
Those last two things don’t sound great to me. But it’s been the general policy of the U.S. (let the market compete and discover, it’s better at it) as opposed to Europe (set a top-down standard and requirements in order to enable a competitive market).
Fine, how much money would it have taken? It’s not like governments have lots of spare money.
Ah, but that’s the thing. You decide to. That’s it. Same way you decide to spend more money on defence. What you spend money on is a reflection of what’s important. If it’s important enough, you find a way. That’s it. Nothing more complicated.
(A comparison here might be “the People’s Model” as the output of basic research, and governments fund (ha) basic research because they are bets that can be transformatively enabling. Ugh, what a sentence. You know what I mean)
So. At some point in the last ten years, the government would have had to have decided:
- Hey, this generative AI stuff is pretty interesting.
- If there’s something here, it could be a national asset that multiple sectors could benefit from.
- What, we need to do policy wrangling too? Sure, we can do that. We’re the government.
- OK, let’s do it. Let’s buy all the GPUs. Let’s label all the data. Let’s hire the people.
Fine, I’ll play the other side. Who are you going to hire? If this is really a big deal, the private sector is going to want to do it, and hell, did you read how much Meta is dangling in front of people to poach them for its superintelligence group?
My counter is ideology and idealism because I choose to believe enough people still believe in the common good.
Fine, you’re declaring that you’re effectively seizing all this valuable output. Nationalizing it, even! Disney is pissed! Whatever giant entity Penguin is now is also pissed! Great. Look, I’m not saying charisma and persuasion don’t play a part here. This is the lobbying part, and a political part. Everyone benefits.
“But no!” cries Disney. The model will have Mickey in it! Anyone will be able to make a Mickey do unspeakable, dirty, dirty things!
OK, and? They can do that now.
“We have to stop them!”
Fine, go ahead and stop them. You could make a ton of Harry Potter stuff. Who’s going to stop you? Oh right. Rowling (spit) and Max HBO Warner Bros Discovery. Sure looks like you could spend a bunch of time suing each other if you want.
“But children!”
Yes, children can certainly draw cartoons of Mickey doing horrific, dirty, unspeakable things. Hopefully you’re not asking us to infringe upon their freedom of expression?
I mean if only there were ways to coordinate self-interested parties into behaving in a way that might benefit all of them? Surely such a thing couldn’t be impossible.
OK, fine. We’ve wrangled stakeholders and lobbyists. Maybe this even has Support at the Highest Levels!
You need a lot of GPUs? Billions of dollars’ worth? Sure, government doesn’t have any track record of spending billions of dollars on things.
I mean, until this whole A.I. thing came along you know who was buying the most FLOPs in one place? The government!
(It occurs to me that another play you could make, in national economic interest of course, is that you could require all car manufacturers selling vehicles in the U.S. to contribute training data to a national autonomous driving model that U.S. manufacturers could benefit from and that may or may not be made available to non-U.S. manufacturers. Huh. And where else are all the training miles being driven, hm?)
Oh, you need pork? Senators, please line up to require datacenters to be built in your districts. Please line up to request foundries for your states.
(I find it funny that this might also be a lever to wrest the CUDA moat from Nvidia.)
The window to do this was 2012-2016. Andrej Karpathy’s The Unreasonable Effectiveness of Recurrent Neural Networks was published in May 2015. Sure, this is the benefit of hindsight, but I would say that then, someone forward-looking enough could see the hint of something and start putting together policy. Hell, without looking it up I’m half sure there already was national A.I. policy by then. I am not surprised that it would’ve been industry-led. Something for the people would have been too disruptive. How dare the government throw its weight around and choose favorites?
(What favorites? Isn’t everyone getting access to the data, the models, the open weights?)
You’d have to be careful, though. Don’t impose too many conditions under which the models may be used. You’ve got to be impartial. You’ve got to make sure you stay true to the aim of access for everyone to benefit everyone. But it’s not like private entities impose conditions upon which their models are made available or may be used.
It would be a Manhattan project, people would say.
It might very well look like that. But are we not already in a nation-state race?
Anyway, we’re only talking about training. The national library could train a model, but it could also provide the dataset for anyone else to train a model, too. Or to fine-tune the model that’s made available. And you still need a bunch of compute for inference. There can still be a race. Meta and Microsoft and everyone can still use all their free cash flow and borrow money for the billions of dollars of capex.
I’m only sad that it feels too late for something like this to have been tried and talked about at the policy level. I mean, it’s not like any other western, English-language government was busy putting together its A.I. strategy either.
2.0 Sponsored Content: How People Work
Hello. It’s me again, sponsoring my own content.
Even though you know how to build great software, you might still be getting stuck. You need to know how to work with other people -- but we don’t really get taught how to do that.
How People Work is my four-week workshop teaching teams in the tech industry the skills they need to work with other people.
Get the decisions you need to get things done by learning how to become more influential.
Get the answers you need faster, spot where people are lying (and then get the truth) by building stronger relationships.
Get heard, get your point across, and reduce miscommunication -- which means less risk! -- by learning how to communicate clearly to different audiences.
And get focussed, unstuck, and remove distractions by learning and applying strategy and tactics in ways that work for you.
It’s four weeks. We spend three hours a week learning how people work, and then three hours a week applying what we’ve learned on your real work, together.
Grab a quick chat with me if this sounds like something you need, or send this along to someone if you think it could help them.
OK, that’s it! I experimented with writing the body for this ON A DIFFERENT DAY, almost as if I’m breaking my rule of writing something and sending it on the same day. It seems this might be healthier?
How are you?
Best,
Dan
How you can support Things That Caught My Attention
Things That Caught My Attention is a free newsletter, and if you like it and find it useful, please consider becoming a paid supporter.
Let my boss pay!
Do you have an expense account or a training/research materials budget? Let your boss pay, at $25/month or $270/year, $35/month or $380/year, or $50/month or $500/year.
Paid supporters get a free copy of Things That Caught My Attention, Volume 1, collecting the best essays from the first 50 episodes, and free subscribers get a 20% discount.