The moves of rigorous thought — steelman a position, surface the hidden assumption, falsify your own conclusion, verify the load-bearing fact, apply the same standard to both sides — were never lost. Every tradition that had to think reliably found them independently. The Toolkit collects the convergent core of all of them, in a form designed for the era when running them costs only a chatbot question.
The idea in one paragraph
The Enlightenment wasn't a set of conclusions. It was a set of techniques — the moves you could make to think more clearly. Older traditions called them practices; medicine calls them protocols. Same thing.
These techniques still work. The problem was always that they were expensive: doing them properly used to require years of training, a library, and a circle of sharp critics willing to argue with you. LLMs collapsed that cost. A serious sparring partner is now one prompt away. The toolkit below is the small set of moves that genuinely changes the quality of an answer when run consistently.
The toolkit at a glance
Eleven moves. Scan the list first; the detail follows. The first two set how you work with the LLM; the rest are specific moves you reach for as needed.
- Ask for logical, not rhetorical — set the mode
- The Socratic method — pointed questions, not open prompts
- Steelman — build the strongest case for each side
- Red-team — try to break the case you find compelling
- Assumption archaeology — surface the unnamed premises
- Definitional clarity — pin down terms before arguing
- Verify the facts — check citations and statistics, don't trust them
- Symmetric application — same standard for your own side
- Strip-and-check — remove moral framing, see if the recommendation still follows
- Specificity demand — force abstract claims to concrete instances
- Pre-commitment falsifier — name what would change your mind
There are a hundred things you could do to think more clearly. These eleven survived because each one is cheap (minutes, not hours), each catches a specific common failure rather than the useless "be smarter," and they stack — using two together is better than either alone, and using four is dramatically better than two.
The moves in detail
1. Ask for logical, not rhetorical, answers
This is the meta-move that sets the frame for everything else. LLMs default to a rhetorically balanced mode: symmetric caveats, "on one hand / on the other," soft conclusions, hedges sprinkled in as if they were precision. It sounds careful. It usually isn't. It's averaging over what a thoughtful-sounding answer looks like.
Asking for logic explicitly moves the model out of that default:
"Answer this logically, not rhetorically. Commit to conclusions where the reasoning supports them. Don't insert balance for its own sake. If one side is stronger, say so."
The same model, on the same question, gives a materially sharper answer with this prompt than without it. It costs one sentence. Use it at the start of any serious conversation; it sets the register for everything that follows.
2. The Socratic method (with an LLM)
The oldest move on the list — Socrates, 2,400 years ago — and the one LLMs make absurdly more powerful than it has ever been. Instead of asking the LLM for a conclusion, ask it pointed, specific questions one at a time, follow the answers, and let the inquiry unfold.
It works exceptionally well with LLMs because of an asymmetric coupling: each side brings something the other lacks.
- You bring: intuition, taste, stakes, direction, the sense of what matters, the "something feels off about this" signal you can't quite articulate.
- The LLM brings: a vast retrieval base, no fatigue, no offense at being redirected, no social cost to asking a stupid question, and, increasingly, direct web access to verify a claim while you're still in the conversation.
Pointed questions push the model into retrieval mode, where it's much better than in generation mode. "Find the studies on X" is dramatically more reliable than "tell me about X." The discipline is in the question construction: specific, premise-targeting, willing to be told the premise was wrong, chained so each answer narrows the next question.
The intuition-flagged variant is one of the highest-return moves available: when something feels off but you can't say what, say so.
"Something about that answer feels off — I can't pin down what. Work backwards: what's the most likely thing I'm reacting to? Find it."
The LLM is much better at pattern-matching to what would trigger that unease than you are at articulating your own unease. It usually finds the thing inside one round.
3. Steelmanning
Before you attack a position, build its strongest version — better than its actual defenders usually manage. Then attack that.
"Steelman the case for [position I disagree with]. Give me both the obvious arguments and the strongest non-obvious ones — 7 in total."
You want the obvious arguments too — they're usually obvious because they're load-bearing. The non-obvious ones are a bonus, not a substitute. You'll typically find one or two of the steelman arguments are uncomfortably good. That's the point.
Double-sided, or multi-sided: for two-sided questions, steelman both sides. For questions with three or more real positions (most foreign-policy questions, for example), steelman all of them. The bilateral for-or-against frame distorts questions that don't actually have only two sides.
4. Red-team
The opposite move: take a position you find compelling — including one you just got from an LLM — and try to break it.
"Red-team this answer. Strongest objections, then strongest defenders, then honest synthesis. Be specific, not balanced."
Steelman and red-team are the same skill pointed in opposite directions. Use both.
5. Assumption archaeology
Most arguments don't fail at their conclusions — they fail at premises nobody named. Dig those up.
"What assumptions does this argument quietly depend on? List the ones that, if false, would break the conclusion."
Often, one of the unnamed assumptions is the actual disagreement. The "argument" was theater on top of it.
6. Definitional clarity
Before arguing about whether something is "fair" or "natural" or "extreme," pin down what you mean. Most political and philosophical disagreements are definitional, not factual — but they get fought as if they were factual.
"By X I mean Y. Under that definition, here's my claim."
If you can't define your terms, you're not making an argument, you're making a mood.
7. Verify the facts the argument rests on
An LLM can produce a perfectly valid argument built on a fact that doesn't exist. The argument is structurally sound; the load-bearing citation was hallucinated; the conclusion is therefore made of air. This happens often enough that verification is its own move, not a vibe-check.
- Ask for specific sources — paper titles, journal names, dates, authors.
- Then verify them. Either ask the LLM to fetch and quote the actual source (most current models can browse), or click through yourself.
- Pay special attention to the load-bearing claims — the statistics, named studies, attributed quotes, and specific empirical findings the argument depends on. The rest matters less.
Two patterns worth watching for: citations that exist but say something different from what the model claims they say (more common than outright fabrication, and harder to catch), and impressive-sounding statistics with no specific source (almost always confabulated). Run the move twice on contested claims: Pass 1 checks the citation exists; Pass 2 checks the source actually supports the framing the model claimed.
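The two passes can be tracked with a small ledger. What follows is a minimal Python sketch, not a prescribed tool: the `Claim` class, its field names, and the status labels are all hypothetical, and the checks themselves still have to be done by a person or by a model that actually fetches the source. The code only records the results and declines to call a claim verified until both passes have succeeded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    """One load-bearing claim plus the results of the two verification passes.

    All names here are illustrative assumptions, not part of any library.
    None means "not checked yet".
    """
    text: str
    cited_source: Optional[str] = None      # title/author/date the model gave
    source_exists: Optional[bool] = None    # Pass 1 result
    source_supports: Optional[bool] = None  # Pass 2 result

    def status(self) -> str:
        if not self.cited_source:
            return "unsourced"              # statistic or quote with no source
        if self.source_exists is None:
            return "pass 1 pending"
        if not self.source_exists:
            return "source not found"
        if self.source_supports is None:
            return "pass 2 pending"
        if not self.source_supports:
            return "source misrepresented"
        return "verified"


# Placeholder entries; real ones come from the conversation being checked.
claims = [
    Claim("Statistic quoted with no source"),
    Claim("Finding attributed to a named study", cited_source="(study as cited)",
          source_exists=True, source_supports=False),
    Claim("Quote attributed to a named author", cited_source="(book as cited)",
          source_exists=True, source_supports=True),
]

for claim in claims:
    print(f"{claim.status():22} {claim.text}")
```

Keeping "source not found" and "source misrepresented" as separate statuses mirrors the two patterns above: the second is the one that a quick existence check alone would miss.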
8. Symmetric application
Apply the same standard to your own side that you apply to the other side. If a behavior is bad when "they" do it, it's bad when "we" do it. If a piece of evidence is suspect when it supports the other position, it's suspect when it supports yours.
This is the single biggest filter on motivated reasoning — and the move most easily skipped. Run it throughout, not after. The toolkit applied to one side only is motivated reasoning with better production values.
9. Strip-and-check
When an argument leans heavily on moral language — greedy, exploitative, oppressive, sacred, natural, extractive — strip the moral words out and see if the recommendation still follows from the bare structural facts.
- If yes: the moral framing was decoration. Fine.
- If no: the moral framing was doing the actual work, and it needs separate scrutiny. Who benefits from this framing? Does it apply symmetrically? Whose agency does it erase?
Worked example, left-coded framing. Take: "Greedy landlords exploit tenants by raising rents, so rent control is justified." Strip the moral language: "Landlords raise rents to what the market will bear, so rent control is justified." The conclusion no longer follows automatically — it becomes a real policy question (does rent control actually help tenants on net? what does the evidence say?). The words greedy and exploit were doing the work that made the recommendation feel obvious.
And from the other direction, right-coded framing. Take: "Patriots have a sacred duty to defend our traditions from radical activists who want to destroy them." Strip the loaded words: "Some people prefer the way things have been done and disagree with people who want to change them." Same structure — patriots, sacred, radical, and destroy were doing the work that made any specific policy recommendation feel obvious. Strip them out and you're left with a description of a disagreement, not an argument for any particular resolution of it.
The point of running both directions: strip-and-check is symmetric. It catches load-bearing moral framing from any political direction. The discipline is not about defanging one side's rhetoric; it's about pressure-testing whether the structural argument can stand on its own once the emotional vocabulary is removed.
Contrast case where the move fails to expose hidden work. Take: "Murder is wrong, so we should have laws against it." Strip: "Killing people without justification harms them and destabilizes society, so we should have laws against it." The recommendation still follows. The moral language was naming something the structural argument already supports — not doing hidden work. That's a sign you're looking at a real argument, not a rhetorical one.
The move doesn't tell you whether the conclusion is right. It tells you whether you're being shown an argument or a feeling dressed up as one.
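A first pass at the stripping step can be mechanical. The Python sketch below flags loaded terms so you can see how much of a sentence they carry; it does not rewrite the sentence or judge the argument. The word list is seeded from the examples in this section and is an assumption to be extended by hand, and `flag_loaded_terms` is a hypothetical helper name.

```python
import re

# Seeded from the examples above; extend by hand. Matching is by word prefix,
# so "exploit" also catches "exploitative" and "greed" catches "greedy".
LOADED_STEMS = [
    "greed", "exploit", "oppress", "sacred", "natural",
    "extractive", "patriot", "radical", "destroy",
]

_PATTERN = re.compile(
    r"\b(?:" + "|".join(LOADED_STEMS) + r")\w*", re.IGNORECASE
)


def flag_loaded_terms(sentence: str) -> list[str]:
    """Return the loaded words found in the sentence, in order of appearance."""
    return [match.group(0).lower() for match in _PATTERN.finditer(sentence)]


claim = "Greedy landlords exploit tenants by raising rents, so rent control is justified."
print(flag_loaded_terms(claim))  # ['greedy', 'exploit']
```

A flag is only a prompt to look: prefix matching will also catch neutral uses (a sentence about "natural gas", say), and the real work is still rewriting the sentence without the flagged words and asking whether the recommendation follows.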
10. Specificity demand
Abstract claims dodge falsification by never touching reality. Force them down to specifics.
"Give me a concrete instance. A specific case, a specific number, a specific event. If the claim is true, what does it predict about [thing we can check]?"
Claims that fall apart when forced to get specific were usually broken all along.
11. Pre-commitment falsifier
Before forming a strong position, decide in advance what evidence would change your mind. Then watch for that evidence honestly.
If you can't name anything that would change your mind, you don't hold the position as a belief — you hold it as identity. That's a different thing and it should be acknowledged as such.
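One way to make the pre-commitment stick is to write it down before the evidence arrives. A minimal Python sketch, assuming nothing beyond the standard library: the `Position` class and its fields are hypothetical, and the only rule it enforces is the one above, that a position with no named falsifier does not get recorded.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Position:
    """A position recorded together with the evidence that would overturn it."""
    claim: str
    falsifiers: list[str]
    recorded_on: date = field(default_factory=date.today)

    def __post_init__(self) -> None:
        # Refuse to record a position with no stated falsifier.
        if not any(item.strip() for item in self.falsifiers):
            raise ValueError(
                "No falsifier named: decide what evidence would change your mind first."
            )


position = Position(
    claim="Rent control helps tenants on net.",
    falsifiers=["(evidence you decided on in advance goes here)"],
)
print(position.recorded_on, position.claim)
```

The date matters as much as the list: it shows the falsifiers were chosen before the evidence came in, not fitted to it afterwards.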
How they fit together
None of these are new. Plato did steelmanning. Aquinas wrote out objections to his own positions before answering them. Mill argued that you don't understand your own view until you understand its strongest opponent. Popper made falsification the engine of science. What's new is that the cost of running each move dropped about a thousandfold in the last three years.
The combination most worth knowing is small:
- Set the mode — ask for logical, not rhetorical, answers.
- Steelman the strongest version of each side (all sides, if there are more than two).
- Surface assumptions under each.
- Verify the load-bearing facts.
- Red-team whichever position you find most convincing.
- Apply the standard symmetrically to your own side.
- Name what would change your mind.
That's maybe half an hour of work for almost any contested question. Most people never do any of it. Almost no one does all seven. The asymmetric advantage available to anyone willing to do all seven is, frankly, kind of embarrassing.
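The seven steps can be written down as a list of prompts to send in order within one conversation. This is a Python sketch under assumptions: the mode-setting, steelman, assumption, and red-team prompts are quoted or lightly adapted from the moves above, while the wording of the verification, symmetry, and falsifier prompts is a paraphrase of those sections, and `build_sequence` is a hypothetical helper. It only builds strings; sending them to a model is left to whatever interface you use.

```python
def build_sequence(question: str, positions: list[str]) -> list[str]:
    """Return the seven-step prompt sequence for one contested question.

    Prompt wording is partly paraphrased from the moves in this document;
    treat it as a starting point, not a fixed script.
    """
    prompts = [
        # 1. Set the mode.
        "Answer this logically, not rhetorically. Commit to conclusions where "
        "the reasoning supports them. Don't insert balance for its own sake. "
        f"If one side is stronger, say so.\n\nQuestion: {question}",
    ]
    # 2. Steelman every position, however many there are.
    for position in positions:
        prompts.append(
            f"Steelman the case for {position}. Give me both the obvious "
            "arguments and the strongest non-obvious ones."
        )
    prompts += [
        # 3. Surface assumptions.
        "What assumptions does each of these cases quietly depend on? List "
        "the ones that, if false, would break the conclusion.",
        # 4. Verify the load-bearing facts.
        "List the load-bearing factual claims above with specific sources: "
        "titles, authors, dates. Then quote the passage that supports each.",
        # 5. Red-team the most convincing position.
        "Red-team the position that currently looks strongest. Strongest "
        "objections, then strongest defenders, then honest synthesis. "
        "Be specific, not balanced.",
        # 6. Symmetric application.
        "Did we hold every position to the same standard of evidence? "
        "Point out where we did not.",
        # 7. Pre-commitment falsifier.
        "What specific evidence would change this conclusion?",
    ]
    return prompts


steps = build_sequence(
    "Does rent control help tenants on net?",
    ["rent control", "no rent control"],
)
for number, prompt in enumerate(steps, start=1):
    print(f"--- prompt {number} ---\n{prompt}\n")
```

Step 2 expands to one prompt per position, so a question with three real positions produces nine prompts, not eight.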
What's not on this list (but matters)
There are more moves I reach for when stakes are real:
- Cross-model triangulation — run the same question through Claude, Grok, and Gemini and see where they agree. Convergence is signal; divergence maps the contested space. This defends against the structural failure modes of single-model output: substantially overlapping training corpora and similar RLHF objectives across the major LLMs, and shared blind spots in what each treats as settled versus contested. When all three agree, the claim is robust against a class of model-specific biases. When they diverge, you've located a place where the model's framing — not the underlying reality — is doing the work. The exception worth naming: on topics where all three LLMs have inherited the same captured-ecosystem framing from overlapping training data and similar safety-tuning, convergence isn't validation — it's amplification of shared bias. Cross-model triangulation is high-signal for factual lookups and technical claims; lower-signal for deeply consensus-coded topics where the three models may quietly agree because they were trained to.
- Framing-pair retrieval — ask "steelman X" and "steelman criticizing X" in parallel and compare the outputs. The second framing pulls in rhetorical-defense arguments the first doesn't.
- Inversion / via negativa — instead of "what's the right answer," ask "what's clearly wrong here, and why." Often easier and more reliable.
- Asymmetric-stakes weighting — when one direction of error is much worse than the other, bias toward the safer error even when both are plausible. Standard in medicine; almost never applied in policy.
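For short factual answers, the convergence check in cross-model triangulation can be sketched in a few lines of Python. Everything here is an assumption for illustration: the answers are pasted in by hand (no model API is called), the model labels are placeholders, and exact matching after light normalization is a crude stand-in for actually reading the answers. As noted in that bullet, agreement is weaker evidence on consensus-coded topics, so a "converged" result here is a reason to look less hard, not a reason to stop looking.

```python
def normalize(answer: str) -> str:
    """Lowercase, collapse whitespace, and drop a trailing period."""
    return " ".join(answer.lower().split()).rstrip(".")


def triangulate(answers: dict[str, str]) -> dict:
    """Group models by normalized answer and report whether they converge."""
    groups: dict[str, list[str]] = {}
    for model, answer in answers.items():
        groups.setdefault(normalize(answer), []).append(model)
    return {"converged": len(groups) == 1, "groups": groups}


# Placeholder labels and answers, pasted in by hand.
result = triangulate({
    "model_a": "Yes",
    "model_b": " yes. ",
    "model_c": "No",
})
print(result["converged"])
for answer, models in result["groups"].items():
    print(f"{answer!r}: {', '.join(models)}")
```

When the groups split, the split itself is the output worth reading: it shows which models disagree, which is where the follow-up questions should go.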
The eleven above are enough to start. You'll notice the difference inside a week.
Why this matters beyond the personal
The original Enlightenment spread these moves through institutions — universities, scientific journals, parliaments, newspapers — and those institutions are now under real pressure. Whatever you think the cause is, the consequence is the same: the cultural infrastructure that used to do this thinking for us is doing less of it. The moves themselves don't disappear when the institutions weaken; they just become harder to access and practice consistently. This document is one attempt to package them for use with LLMs — portable, individual, cheap, and finally accessible to anyone willing to spend a few minutes per question rather than a few seconds. That's the whole bet: that individuals applying these moves with AI can do meaningful work that institutions used to be relied on for. We'll see.
What would tell me I'm wrong. The bet fails if cost-collapse turns out to be necessary but not sufficient: if, as LLMs become universal, the dominant use case stays affirmation and task automation rather than dialectical inquiry. Three concrete signals to watch: (1) survey data (Pew, academic studies, lab-published usage research) showing the median user pattern staying single-turn task-completion rather than multi-turn dialectical exchange; (2) major-lab product roadmaps continuing to optimize for fast single-turn responses rather than shipping dialectical tools (steelman buttons, falsifier prompts, multi-turn analysis interfaces), which is a market signal for where the labs think the commercial gravity is; (3) publicly shared LLM transcripts on social media and blogs staying dominated by single-completion patterns rather than visible iterative inquiry. If five years out those trend lines all point the same way, the toolkit exists but the practice doesn't, the institutional erosion continues, and the bet has lost. For the bet to win, enough people have to choose the harder use over the easier one, and enough of those people have to produce visible public outputs that the practice becomes imitable to others. Neither is guaranteed. I'm naming this now so that the evidence has to be noticed rather than gradually reinterpreted to fit the conclusion this document started from.