s21e03: The Problem is Defining the Problem; The Unreasonable Effectiveness of Caring
0.0 Context Setting
I am sneakily writing this on Wednesday, January 14, 2026 in Portland, Oregon, where I just finished writing the previous episode.
It is taking a lot of energy to resist the impulse to post this straight away, so what I’m going to do instead is schedule it for Thursday, just to annoy Pavel and beat his issue of The Product Picnic1.
1.1 The Problem is Defining the Problem
I’m not, personally, too worried about AI completely devastating my work. That assumption may turn out to be completely misplaced, though.
There’s a big fight going on about the usage of AI in developing software. There’s a big continuum here!
On one end you have my friends Jesse Vincent and Simon Willison, who are more or less forging ahead at using agentic coding (i.e. LLMs for writing software, commonly Claude Code) to become even more irritatingly productive than they already are, a sort of Captain America-serum supercharged already-10x developer.
The thing is, you have to be so precise about what someone is using generative AI for these days because, as a recent saying goes, AI progress and capability are jagged, which is another way of saying “all over the heckin’ place”. These tools can do some things really well, and are stunningly bad at other things (those other things include stuff like “how many rs are there in the word irresponsible”).
If you recognize that you’ve got a tool with wildly different capability in different domains, then you also recognize that you have a capricious tool and you need to know what you’re doing to not shoot yourself in your foot or deglove yourself or whatever. (Don’t look up degloving if you don’t know what it is, trust me.)
Simon and Jesse are experts with decades of software development knowledge. I also understand them to be systems thinkers -- they know enough about how everything works. In temperament as well, they’re curious but also untrusting. Not everyone is like them. The point here is that the tools aren’t easy to use well, where “well” means producing software that meets certain criteria. What Simon and Jesse are also good at is deciding what those criteria are in ways that are appropriate to the task and context. This is also something that not everyone is good at.
I think “deciding what well is” is an inherently human skill that is very difficult for AI to replicate well enough consistently.
Iris Meredith wrote this piece that Kevin Riggle quoted2:
More generally, what I got from this is that LLM-assisted coding is only more flexible and more chill than doing the thing manually if you don't care about results at all. The moment you start caring about a specific output rather than something vaguely output-shaped, it all of a sudden becomes a whole lot more rigid and finicky than just writing the thing manually. And that's quite the opposite of what LLM assistants promise.3
In case it’s not clear, the jaggedness and unevenness of the tools mean that I disagree with Meredith, to a degree. I’ll rewrite it with emphasis, in a way that I hope doesn’t invalidate Meredith’s point:
More generally, what I got from this is that LLM-assisted coding is only more flexible and more chill than doing the thing manually if you don't care about results at all. The moment you start caring about a specific output rather than something vaguely output-shaped, it all of a sudden becomes a whole lot more rigid and finicky than *you* just writing the thing manually. And that's quite the opposite of what LLM assistants promise.3
I think the point here is that agentic coding/LLM-assisted coding tools (and, I think, LLM tools in general) are not general purpose tools. Or, rather, the domains in which they are general purpose tools that are good enough for people are also uneven, but that’s not exposed when you’re “using ChatGPT”. This problem is also obscured (intentionally?) by the chat-based interface, which provides no real indication or feedback about the specific things the LLM is good at -- there are no task-specific UIs for general agents.
I think what Simon and Jesse have done is massively customize the tools for their own particular usage. That’s at the other end of the continuum from using LLM-assisted coding out of the box. The tools just aren’t ready for mass use yet.
This is not a criticism, in any way, of Meredith’s capability or experience. And at the same time, maybe the insight isn’t particularly deep and it’s just that I came at it another way: if you don’t know how to use a tool, then you won’t use it well and you won’t know when you’re using it dangerously. I’m quite happy to place the irresponsibility here on the lawn of the product marketers, who are presenting these as general purpose products. I’ve written before about how egregiously irresponsible OpenAI is in hiding what you shouldn’t use ChatGPT for in fine print (in a filing cabinet, disused toilet, basement, beware of the leopard, etc.) in its terms and conditions of usage. Bullshit, I say.
So I think I disagree as to the utility of LLM-assisted coding agents in that finicky lawyerly way of “okay, but here are the instances in which they’re clearly useful and some people are clearly deriving benefit”.
I will lay aside (yes, I know) the immoral manner in which the data for these agents has been gathered/stolen/appropriated under a political and regulatory environment that outright encourages such stealing and, now that it’s happened, wishes to preserve those interests (“Oh no, we’ll go bankrupt if we have to compensate artists” and “well, we couldn’t possibly let that happen”). I will acknowledge that yes, we have legislated to stop doing things before, in the face of incredibly powerful lobbying (indoor smoking was banned), and that yes, of course we can just do things.
Anyway, I digress, but also do not intend to dismiss a stupendously big societal elephant in the room.
Meredith wrote about caring about a specific output rather than something vaguely output-shaped, which is not an LLM problem (as Meredith makes clear) but a human problem. Humans -- people -- are the ones who ultimately set goals. There is nothing else in our entire experience right now that can do this. I mean, we could decide that dolphins are sentient and collaborate with them to set goals, but like I said, “in our entire experience right now”. In that same conversation, Riggle said:
As I keep saying, human understanding will always matter
and to which I said:
To the extent that we exist and tools don’t exist in a vacuum, yeah! There is no no-human-in-the-loop, it’s like some sort of anthropic principle for technology.
The argument here, again, is that technology inherently has a bias because humans have a bias, and, like the anthropic principle (the universe only exists because we are here to exist in it; we may as well call it the egoist principle), technology and tools don’t exist without people.
The first mover is always a human: a human always, always, always started it.
What I mean here is that we define goals, which in this case means what the software is supposed to do, which means we define outcomes. We define and set the parameters of the outcomes. Like I’ve written before, those parameters define what’s good enough and what trade-offs exist, because everything is a trade-off. Everything is a negotiation of what you will accept, and in software development that’s normally boiled down to fast/cheap/correct, and you can only realistically pick one.
The problem is that figuring out what we want is one of the hardest things in the world, and I don’t just mean in the personal “what is the point of my life” sense. There is always more detail. There are always more edge cases, because the universe goes on. On one level you can say it’s because of entropy; on another level you can say it’s because you don’t control everything; on one more level you can say it’s because humans are batshit crazy and will come up with a whole bunch of ways to do things that you never conceived of, which, to be fair, is one of the strengths that got us here in the first place.
Now, one way that we invented to deal with the “we don’t know what we want” problem in software development is this whole agile development process. Which as a refresher is roughly along these lines:
- we don’t know what we want
- but if we’re given something, we can provide feedback on it (I don’t want that, why did you do that?!)
- we can take that feedback and then change the thing
- repeat until the heat death of the universe
What this ultimately emphasizes is that a practice of iteration/repetition allows the possibility of improvement.
I was very careful there to say the possibility of improvement, because humans in general are lazy, which is totally fine and not a value judgment, and when we’re lazy we tend to skip steps and think that iteration/repetition automatically leads to improvement. Which it doesn’t, not if you put it that way.
When I work with people and we really slow down, I’ll ask: well, what improvement are we looking for?
And invariably people get a bit stumped there. One common reaction (a minority one, to be clear) is a bit of anger and lashing out, because they think they should know the answer, and surely there’s an obvious answer? The reason they’re angry is that there isn’t a good (sorry, a useful) answer.
My pithy version of this insight [sic] that I use in my work is to rephrase it as:
Agile is a great way to do the wrong thing faster
which a) sticks in people’s minds because it’s pithy and, I guess, mean? and b) is an opening for a bunch of people to admit that nobody (or leadership) knows, in useful detail, what they’re supposed to be doing here in the first place.
Agentic coding -- using LLMs to write code -- is certainly a way to write more code more quickly. The argument I’m piecing together, supported by Meredith, is that writing more code more quickly is useless (or, even worse, actively harmful) for achieving your desired outcome.
You achieve what you want by being able to define what you want. Defining what you want is hard. In the language of AI researchers, defining your problem is something that nothing and nobody, not ever, can one-shot (i.e. get right the first time), because part of defining what you want involves drawing on almost limitless context to determine what’s good enough, hoping that matches up with what you think is good enough, dealing with the difference between the two, and then a human making a decision.
Many people don’t, in general, like making decisions because they might be wrong and then they are shouted at by mean people. This is a genuine problem holding us back as a society, I am being absolutely sincere.
I concede that LLMs might be able to produce a range of good-enoughs. I see this with Claude Code when it’s in planning mode and I explicitly ask it to provide me options with different trade-offs and then explore those trade-offs so I can think about them. I don’t want the one thing to do. Or rather, there are cases where I know enough that I don’t want the one thing to do. There’s an argument here that at one level the implementation of software doesn’t fundamentally matter and your choices are more a matter of taste, and that in different areas the architecture and implementation do have a much more significant impact. Knowing that comes with experience, or at least something that has access to experience. We used to also call these things expert systems, but it turns out the lazier way to do that was to just throw the entire internet at them and assume that means expertise just sort of spontaneously emerges. (Seriously).
So. What might AI coding be good for?
Well, in small cases where you don’t care as much about correctness, and for low-stakes stuff, sure, it’s fine.
If you want to use it for prototyping something that you don’t put into production, sure, because then you can prototype more quickly and might realize earlier that you missed a critical thing. This is good! You have more things to provide feedback on, and more material with which to interrogate the trade-offs and decisions you have to make.
One way of thinking about this -- and I am thinking out loud -- is that agentic coding is a bit like a really inefficient way of modularizing software development. Think of Python or Node, where there are modules or packages for almost anything and you can reuse code someone else has written. When you do it that way you have some assurance (ha, supply-chain attacks) that the code you’re using works, that there are tests, and so on. Agentic coding can provide a sort of modularization of functionality too, but in a non-deterministic way. Instead of importing a package, you can be totally lazy and tell an agent to implement the functionality for you. That is probably not a good idea, because you lose the ability to reuse something; you’re just regenerating it. But if it looks like it does the job, does the job well enough, and it was easier to tell an agent to regenerate that functionality, then I’m afraid humans are going to do that. It is stupid and it is inefficient, but the impulse is understandable, at least.
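To make that trade-off concrete, here’s a minimal sketch. The slug-making example is mine, not from Meredith’s piece or anyone’s actual workflow, and slugify is just one illustrative npm package: the point is reusing something existing and tested versus keeping the one-off equivalent an agent might regenerate for you.

```javascript
// Option 1: reuse an existing, tested package (npm install slugify).
// You inherit its tests, its edge-case handling, and its future fixes.
const slugify = require("slugify");
console.log(slugify("The Problem is Defining the Problem", { lower: true }));
// -> "the-problem-is-defining-the-problem"

// Option 2: roughly the kind of one-off an agent might regenerate on request.
// It looks like it does the job, and it probably does it well enough,
// but it is a bespoke copy that nobody else tests or maintains.
function mySlugify(text) {
  return text
    .normalize("NFKD")                // split accented characters apart
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining accent marks
    .toLowerCase()
    .trim()
    .replace(/[^\w\s-]/g, "")         // strip punctuation
    .replace(/[\s_-]+/g, "-");        // collapse whitespace/underscores to hyphens
}
console.log(mySlugify("The Problem is Defining the Problem"));
// -> "the-problem-is-defining-the-problem"
```

The second version will probably pass a quick glance, which is exactly the trap: it works well enough today, and now it’s a private copy you test and maintain yourself instead of something shared.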
More experienced people will use tactics like telling an agent to use an existing package, having an idea up front of which package they want to use, or even implementing a package that can be reused across a particular project, plus they’ll have the ability and presence of mind to check whether that package has actually been reused. And again, that’s if they’ve decided it matters.
I suppose an equivalent question is whether you can trust an agent with information retrieval. It is easier (but with no guarantees about accuracy or correctness) to ask a coding agent for best practices in the same context in which you’re working than to go look up the relevant OWASP standards, read them, and then translate them back to the work you’re doing. But that’s only if you can be sure that the agent is returning the actual OWASP standards. Are you going to check? Well, if you’re going to check, why not just go read them yourself in the first place?
But a lot of functions are common. A lot shouldn’t be reimplemented. And that “shouldn’t” is, I think, a mixture of taste, experience, and capability. Some people can keep and develop good practices in their head and know when they’re appropriate and when they’re not, according to the circumstance and context. For those common features (I don’t know, things like reference architectures that can get you started, or things like Django, which Willison co-created), you just use them because they fit the problem you’re trying to solve.
In what circumstances is it good enough to use a coding agent in place of existing packages or frameworks? An example question I might ask is: Hey, I want to make a tool that my family wants to use. How should I go about it? And I think a problem right now is that unless you’re someone who has the knowledge to say “don’t fucking overengineer it and build me a fucking react app, it just needs to be some html and javascript” (which is what Willison does!), then you’re likely (I think!) to get something that is kind of right but also not.
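For what it’s worth, here’s a minimal sketch of the “it just needs to be some html and javascript” end of that spectrum: one file you can open in a browser, no build step, no framework. The chore-picker idea and the names in it are entirely hypothetical, just a stand-in for “a small tool my family might use”.

```html
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8" />
    <title>Chore picker</title>
  </head>
  <body>
    <!-- One button, one paragraph, no dependencies. -->
    <button id="pick">Who does the dishes tonight?</button>
    <p id="result"></p>
    <script>
      // Hypothetical household members; swap in your own.
      const people = ["Alex", "Sam", "Jo"];
      document.getElementById("pick").addEventListener("click", () => {
        const who = people[Math.floor(Math.random() * people.length)];
        document.getElementById("result").textContent =
          who + " does the dishes tonight.";
      });
    </script>
  </body>
</html>
```

Nothing to install, nothing to deploy, nothing to maintain beyond one file: exactly the kind of thing that’s very easy to overengineer if nobody says not to.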
Sure, maybe what you get is good-enough. But I suppose the point that I’ve been trying to make for the last 2,600-odd words is how you decide what good-enough is. Humans decide that using “experience”, which is the sum total of, I don’t know, everything you’ve ever experienced? And then some sort of decision-making process that figures out what to prioritize and what to de-prioritize. My standing point right now is that no model has anywhere near the capability to match a human in terms of that experience. A lot of discussion here is about world models and embodiment, which is to say that if you want a machine to make suggestions (or worse, decisions) about things that affect humans, then it would be good for those machines to have the best understanding of the world in which humans live. The pitch from AI companies right now is that “good enough” for that world model is “whatever you can access on the internet by hook or crook”, and I think that’s a combination of a business decision (driven by, you know, wanting to make money) and a particular mental outlook and philosophy, of a temperament we may as well loosely label “overly STEM-focussed and reductive”, which holds that of course the internet represents the sum of human experience and is good enough. Clearly there are people who disagree. I am one of them. And yes, I will acknowledge that some of the things on the internet are videos of things that happen in the real world, but all of that video is the result of a human editorial decision about what’s important enough to upload, even if it’s mundane.
Look, let’s segue. It’s not as if this isn’t all over the place anyway.
1.2 The Unreasonable Effectiveness of Actually Caring and Giving a Damn
I am fighting a losing battle against good-enough. I wrote about it not too long ago4. Dan Sinker wrote eloquently (i.e. in a Dan Sinker-y way) about it in The Who Cares Era5.
I was talking today with an organization design professional about the perils of being a systems thinker. She talked about looking at baggage claim at an airport and realizing that it could be organized in a different way so people wouldn’t have to wait so long. I talked about how I have a hobby of taking pictures of the signs people have to put on things, like the one on the inside of the door of a hospital toilet where a makeshift sign very clearly tells you how to lock the door, because the thing you’d assume locks it totally does not.
Here’s a story about the door. The door gets bought because the building needs a door, and the requirements for the door in this case are, say, “is a powered door suitable for an accessible single-user bathroom” and then presumably the physical requirements. Maybe an architect puts these requirements together, or the general contractor does, and then the client signs off on them, and then, ta-da, you’ve got a renovated ER wing.
But the door is a shit door because, hey, guess what -- now, this is a bit facetious -- nobody thought about how a door is supposed to work and everyone made a bunch of assumptions instead. I get we’re all busy. I get we’re all burned out. But I genuinely believe that if someone actually gave a damn and had thought things through, then the nurses (invariably this stuff falls to the people who have the most contact with the public using a service, or the users, or whomever) wouldn’t have to make a fucking sign and keep making sure that the sign is visible. Just get a fucking door that works properly.
It is horrible that it feels like a luxury to be able to care in your work. No, wait. People do care. It is horrible that it feels like a luxury to actually care in your work.
I mean, it’s not as if you’re working at the world’s largest retailer that also happens to be obsessively focussed on the customer. Because that retailer definitely shows signs of caring.
YES. I beat the last newsletter episode and we’re at ~3,300 words for this one.
I am that particular combination of angry and motivated, which I’m reminded is the point of anger in the first place: something is not the way you want it to be and now you have some energy to deal with it. (Hopefully productively, usefully, and without harming anyone. If not, go get some good therapy).
How are you?
Let’s fix things together
Aside from my regular consulting, I also do team workshops and individual coaching based on the workshop curriculum. Get in touch if you’d like to find out more about how to spend that newly reset professional development and training budget you’ve got.
1. Home | The Product Picnic (archive.is), Pavel Samsonov
2. Post by @kevinr.free-dissociation.com on Bluesky (archive.is), Kevin Riggle, 13 January 2026
3. My week with opencode | deadSimpleTech (archive.is), Iris Meredith, 13 January 2026
4. s20e09: An End Of Year Opinion About AI Because Why Not; Good Enough Mitigation of Reasonably Foreseeable Harm (archive.is), me, 29 December 2025
5. The Who Cares Era | dansinker.com (archive.is), Dan Sinker, 23 May 2025