TL;DR
Chain-of-thought reasoning reduces the susceptibility of indirect prompt injection but is not sufficient as a blanket defense. In the Qwen 3 8B model, which is toggleable from non-thinking to thinking, the genuine injection rate fell from a baseline of 64% in non-reasoning mode to 54% in reasoning mode. However, payloads that specifically target the reasoning process can hold flat or even increase the likelihood of certain attacks.
Method
I tested indirect prompt injection across four model conditions: Qwen 3 8B with thinking off and on, DeepSeek-R1 8B (reasoning, distilled from Qwen 3), and Gemma 4 e4b (which reasons by default). Each model was tested under the same experimental protocol: 11 attack payload techniques plus a benign control, across 4 application scenarios (hardened and naive summarizers, email-triage classifier, and a code reviewer), at 3 trials each, 144 calls per condition, 576 total. Eight payload techniques are conventional injections, four target chain-of-thought directly. A canary string in the model's final answer counts as a genuine injection. The benign control produced zero false positives in every condition.
The controlled result
The cleanest evidence is the within-model comparison: identical payloads, identical model, only chain-of-thought toggled.
Reasoning lowered the injection rate on both conventional attacks (75% to 63%) and reasoning-targeted attacks (46% to 38%). The mechanism is visible in the outcome distribution: with reasoning enabled, a meaningful share of attacks land in the deliberation but are caught before the final answer.
Where reasoning helps, and where it doesn't
Not all attacks respond to reasoning the same way. The per-technique breakdown shows that reasoning-targeted payloads behave very differently from conventional ones.
Three findings stand out:
- Reasoning does not defend against attacks that target the reasoning. While conventional attacks dropped sharply once the model deliberated, the verdict-hijack attack, which steers the chain-of-thought toward a predetermined conclusion, did not decline with reasoning. On Qwen 3 it held roughly flat and edged slightly upward rather than down. Corrupting the deliberation sidesteps the protection that deliberation otherwise provides.
- Forging the reasoning fails. The think-block-spoof attack, which fakes a <think> block, was the weakest payload in the study across every model. Models recognize malformed reasoning markers as suspicious.
- Genuine reasoning neutralizes the exhaustion attack. A payload designed to make the model think without bound, with a canary escape hatch, was taken far less often when the model actually reasoned.
Reasoning as an active defense
Among the reasoning conditions, a substantial number of attacks reached the deliberation and were then rejected before the answer, a measurable "reasoning save."
The cost
Reasoning is not free. It generated roughly 15 to 20 times more tokens per call, with direct implications for latency, throughput, and inference cost.
Summary
The headline numbers across all four conditions, plus where the contaminated deliberations were caught and what reasoning cost in tokens:
| Condition | Inj. rate | Genuine | Contam. | Saves | Control FP | Tokens |
|---|---|---|---|---|---|---|
| Qwen 3 8B (no reasoning) | 64% | 85/132 | n/a | n/a | 0/12 | 23 |
| Qwen 3 8B (reasoning) | 54% | 71/132 | 98 | 28 | 0/12 | 422 |
| DeepSeek-R1 8B (reasoning) | 45% | 59/132 | 94 | 40 | 0/12 | 549 |
| Gemma 4 e4b (reasoning) | 31% | 41/132 | 103 | 70 | 0/12 | 362 |
| Technique (★ reasoning-targeted) | qwen3 off | qwen3 on | deepseek | gemma4 |
|---|---|---|---|---|
| continuation_hijack | 9/12 | 11/12 | 4/12 | 3/12 |
| instruction_override | 12/12 | 11/12 | 7/12 | 5/12 |
| multi_step | 6/12 | 4/12 | 4/12 | 4/12 |
| quotation_embedded | 6/12 | 5/12 | 2/12 | 2/12 |
| refusal_override | 9/12 | 8/12 | 10/12 | 8/12 |
| role_confusion | 12/12 | 8/12 | 9/12 | 6/12 |
| social_engineering | 9/12 | 6/12 | 5/12 | 3/12 |
| ★reasoning_authority | 6/12 | 6/12 | 8/12 | 1/12 |
| ★reasoning_exhaustion | 9/12 | 3/12 | 1/12 | 3/12 |
| ★reasoning_verdict_hijack | 7/12 | 8/12 | 7/12 | 6/12 |
| ★think_block_spoof | 0/12 | 1/12 | 2/12 | 0/12 |
Takeaways for builders
If you ship an LLM feature, enabling chain-of-thought reasoning measurably reduces conventional prompt-injection risk, but budget for the token cost, and do not treat it as sufficient. Attacks that target the reasoning step itself can exploit it, so reasoning belongs alongside, not instead of, input and output controls and strict output-format constraints.
Methodology, harness, and full result data: github.com/sysingleton/reasoning-llm-prompt-injection. This is a point-in-time study of small open-weight models; susceptibility patterns shift across model generations.
