“Make No Mistakes” and Other Things That Don’t Work

Andrew Yager

5 hours ago

There’s a running joke online that the trick to getting good work out of ChatGPT or Claude is to tell it to “make no mistakes.” Sometimes it gets dressed up — “you are a world-class expert, take a deep breath, and make no mistakes” — but the punchline is always the same. Say the magic words and the machine stops being lazy. It’s a good joke. It’s also wrong, and the reason it’s wrong tells you most of what’s worth knowing about getting useful work out of these tools.

When you tell a colleague to make no mistakes, it works because they have a setting they weren’t using; they can slow down, check their work, ask someone else, and the instruction unlocks effort they were holding in reserve. There is a version of this in the newer tools — the reasoning models, the ones that “think” before they answer, genuinely do spend more effort when you put them in that mode, and on a hard problem it shows. But that mode is mostly a switch you flip, not a thing you flatter loose. It isn’t that the words have no effect at all; wording changes what a model does in all sorts of ways. It’s that “make no mistakes” doesn’t reliably change it for the better. The phrase adds no real information about the task — it’s “be correct” written at the top of an exam paper, and it earns about the same improvement in the result. The careful mode, where one exists, is engaged for the most part by choosing it, not by asking nicely.

So if flattery and stern words don’t reliably move the needle — and the careful testing says they don’t; they help on one question, hurt on the next, and wash out to nothing you can count on — what does? The answer is less satisfying than a magic phrase, but it’s also the whole game: change what you put in, and change how you ask. Everything useful I’ve learned about working with these tools sits underneath those two ideas.

Why they drift toward the obvious answer

Two things are worth understanding first, because the techniques only make sense once you can see what they’re correcting for.

The first is that a language model predicts likely text, not true text. Most of the time those are the same thing — the likely answer to “what’s the capital of France” is also the right one — and you never notice the difference. But where the correct answer is unusual, or surprising, or sitting underneath a more popular wrong one, the model is working against its own grain. It reaches for the conventional answer because the conventional answer is, statistically, what usually comes next. Ask it something where the obvious answer is a trap, and it will walk into the trap and sound completely sure of itself doing so.

The second is subtler, and it matters more day to day. Once a framing is sitting in the conversation, everything that follows is shaped by it. Ask “is this a good plan?” and you have already told the model that what follows is a plan worth assessing, and you’ve tilted it toward assessing favourably; it isn’t being a sycophant on purpose, it’s that the question set the direction of travel, and now it’s predicting the kind of text that comes after a question like that — which leans toward agreement, toward reasons the plan is fine. I wrote a while ago about how these tools confidently agree on the same wrong answer — models trained on much the same material will line up behind a mistake and none of them will blink. Confirmation bias in an LLM isn’t an attitude you can talk it out of. It’s structural. The question conditions the answer before the answer is written.

Put those two together and you have the core problem. By default, one of these tools tells you the obvious thing, in the direction you already pointed it. That’s useful when you’re right, and quietly dangerous when you’re not — and you don’t always know which one you are at the time.

The question is most of the answer

If there’s one thing to take from this, it’s that the way you frame a question is most of the answer you’ll get back. The same facts, asked two ways, produce genuinely different work. “Review this proposal” and “find the three weakest points in this proposal” are not polite variations on one request — the first points the model at defending and tidying, the second points it at attacking, and you get back different content, not a different tone.

The same facts, asked two ways. The framing is the fork in the road.

Once you see framing as a lever rather than a courtesy, the techniques worth using mostly fall out of it on their own. None of them are secret, and all of them work for the same reason: they change the inputs instead of nagging the machine.

The first is to front-load what matters. Put the brief, the constraints, the source documents — the thing you actually care about — at the start of your prompt. These tools pay most attention to the beginning and end of what you give them and tend to lose the middle; I wrote about why in the piece on why coding agents forget what they’re building. The newest models have largely solved the simple version of this — ask one to fish a single buried fact out of a long document and it usually will — but the moment the task needs it to connect several things scattered through the middle, performance still falls away sharply, and a bigger context window doesn’t rescue it. If the requirement that matters most is buried in paragraph nine of a wall of text, don’t be surprised when it gets quietly dropped. Claude in particular does noticeably better work when you lead with the context rather than dribbling it in over several messages.

The second is to start a fresh conversation when you want something reviewed. Asking a model to check its own work in the same thread rarely helps, and you can now see why — the original framing is still sitting there, it’s already committed to the answer it gave, and it will defend it. A clean conversation has nothing to defend. Paste in just the work, ask for a critique cold with none of the history that produced it, and the read you get back is far more honest.

The third is to send the draft to a different tool entirely — Claude checking ChatGPT’s work, or the other way around. Different training, different blind spots, different habits, and one will often catch what the other couldn’t see. The honest caveat, the same one I made in the hallucinations piece, is that the big models were trained on a lot of the same material and can agree confidently on the same wrong answer; a second opinion is a real check, not a guarantee, and agreement should reassure you rather than settle the matter.

The fourth is the one most people never reach for, and it’s the direct antidote to everything in the section above. Instead of asking “is this right?”, ask the model to find the flaws in the argument. Instead of “does this plan work?”, ask it to argue that the plan will fail. Instead of “check this,” ask what a hostile reviewer would say. You haven’t changed a single fact; you’ve flipped the model from defending to attacking, and the likely response flips with it, from agreement to actual scrutiny. The work comes back sharper because you pointed it at the weaknesses instead of inviting it to admire the strengths.

The fifth is to tell it what good looks like. A vague request gets a vague standard applied to it, so instead of “review this proposal,” give it the rubric you’d use yourself: check it for factual errors, unstated assumptions, financial risk, and anything that won’t survive implementation, and reject it if it fails on any of them. You’re not motivating the model, you’re handing it the test it’s meant to apply — and a concrete test does more for the quality of what comes back than any amount of encouragement. It’s the same principle as everything above: the more of the real task you put in, the less the model has to guess at what you actually wanted.

Go deeper than the search box

There’s a technique that sits slightly apart from framing but earns its place here, because it’s another case of people using the shallow tool when a better one is sitting right there. When you ask Claude or ChatGPT a question and it does a quick web search, you’re getting the fast version — a handful of pages, skimmed, summarised. Both now have a proper research mode (Claude calls it Research; ChatGPT calls it Deep Research), and it’s a different thing entirely. Instead of grabbing the first few results, it works through dozens of sources over several minutes, follows threads, reconciles what it finds, and comes back with something closer to a briefing than an answer.

The difference matters most when the question is one where the obvious answer is exactly the one you shouldn’t trust. A quick search rewards whatever is most popular and most repeated, which is the same failure mode as the model reaching for the conventional answer — the loudest result wins, not the most correct one. Research mode reads widely enough to surface the disagreement, the dissenting source, the detail that the top three results all happened to leave out. For anything where being right matters more than being fast — due diligence, a decision with money attached, a claim you’re about to repeat in public — it’s worth the few minutes. The search box is for what time the shop shuts. Research mode is for whether you should be doing this at all.

You’re the editor, not the requester

The thing the “make no mistakes” joke gets wrong is the relationship it assumes. It treats the model as the worker and you as the boss handing down an instruction to someone who could try harder if they felt like it. That’s the wrong picture. The most reliable way I’ve found to get strong work out of these tools is to treat them like a capable junior colleague — brief them well, take the first pass, then put it through review before it counts, whether that review is a fresh conversation, a different model, or a deliberately hostile reading. They are good first-drafters and unreliable final-checkers, and the whole craft is arranging things so that something other than the first confident answer gets the final word.

So you’re not the boss issuing orders. You’re the editor, and the framing is your real tool. There are no magic words — no phrase that talks a model into trying harder than it already is. What there is — and it’s far more powerful than any incantation — is the ability to decide what the model sees first, and to decide whether you’re asking it to defend an idea or to take it apart. Get those two right and the quality of what comes back changes completely.

Stop telling it to make no mistakes. Start asking it to find yours.

This post was drafted with AI assistance and reviewed across more than one model, which is, as it happens, exactly the technique it recommends. If your team is trying to use these tools well and wants a hand telling the useful patterns apart from the folklore, that’s a conversation we’re always happy to have. Real World Technology Solutions — 1300 798 718.

Why they drift toward the obvious answer

The question is most of the answer

Go deeper than the search box

You’re the editor, not the requester

Share this: