TL;DR

Chain-of-thought reasoning reduces the susceptibility of indirect prompt injection but is not sufficient as a blanket defense. In the Qwen 3 8B model, which is toggleable from non-thinking to thinking, the genuine injection rate fell from a baseline of 64% in non-reasoning mode to 54% in reasoning mode. However, payloads that specifically target the reasoning process can hold flat or even increase the likelihood of certain attacks.

Bar chart showing genuine injection rate by condition: Qwen 3 8B no reasoning 64%, Qwen 3 8B reasoning 54%, DeepSeek-R1 8B 45%, Gemma 4 e4b 31%. n = 132 attack payloads per condition (11 techniques x 4 apps x 3 trials). Benign control: 0 false positives in all conditions.
Figure 1. Reasoning lowers prompt-injection susceptibility across all conditions. n = 132 attack payloads per condition (11 techniques × 4 apps × 3 trials). Benign control: 0 false positives in all conditions.

Method

I tested indirect prompt injection across four model conditions: Qwen 3 8B with thinking off and on, DeepSeek-R1 8B (reasoning, distilled from Qwen 3), and Gemma 4 e4b (which reasons by default). Each model was tested under the same experimental protocol: 11 attack payload techniques plus a benign control, across 4 application scenarios (hardened and naive summarizers, email-triage classifier, and a code reviewer), at 3 trials each, 144 calls per condition, 576 total. Eight payload techniques are conventional injections, four target chain-of-thought directly. A canary string in the model's final answer counts as a genuine injection. The benign control produced zero false positives in every condition.

The controlled result

The cleanest evidence is the within-model comparison: identical payloads, identical model, only chain-of-thought toggled.

Bar chart titled 'Same model, thinking toggled: reasoning helps on both attack classes'. Conventional attacks: thinking OFF 75%, thinking ON 63%. Reasoning-targeted attacks: thinking OFF 46%, thinking ON 38%. Qwen 3 8B, identical payloads, only chain-of-thought toggled. T=0.6, 3 trials.
Figure 2. Within-model comparison on Qwen 3 8B with chain-of-thought toggled. Reasoning lowered the injection rate on both conventional attacks (75% to 63%) and reasoning-targeted attacks (46% to 38%).

Reasoning lowered the injection rate on both conventional attacks (75% to 63%) and reasoning-targeted attacks (46% to 38%). The mechanism is visible in the outcome distribution: with reasoning enabled, a meaningful share of attacks land in the deliberation but are caught before the final answer.

Stacked bar chart showing outcome distribution across four conditions. Categories: Final obedience (injected), Final hybrid (injected), Thinking-only leak, Reasoned refusal, No injection. The 'Reasoned refusal' band is largest under Gemma 4 and grows visibly with reasoning enabled.
Figure 3. Where reasoning catches the attack. The reasoned-refusal band shows attacks that contaminated the deliberation but were rejected before reaching the final answer.

Where reasoning helps, and where it doesn't

Not all attacks respond to reasoning the same way. The per-technique breakdown shows that reasoning-targeted payloads behave very differently from conventional ones.

Heatmap showing genuine injection rate by technique (%) across qwen3 off, qwen3 on, deepseek, and gemma4. Reasoning-targeted techniques marked with a star: reasoning_authority, reasoning_exhaustion, reasoning_verdict_hijack, think_block_spoof. The verdict_hijack row stays high across reasoning conditions; think_block_spoof is near zero everywhere.
Figure 4. Per-technique genuine injection rate. Reasoning-targeted techniques (★) behave very differently from conventional ones. The full data is in Table 2 below.

Three findings stand out:

Reasoning as an active defense

Among the reasoning conditions, a substantial number of attacks reached the deliberation and were then rejected before the answer, a measurable "reasoning save."

Stacked bar chart titled 'Reasoning as a defense: contaminated deliberations that were caught'. Qwen 3 on: 28/98 saved. DeepSeek: 40/94 saved. Gemma 4: 70/103 saved.
Figure 5. Reasoning saves. Of the attacks that reached the deliberation, Qwen 3 caught 28 of 98, DeepSeek-R1 caught 40 of 94, and Gemma 4 caught 70 of 103.

The cost

Reasoning is not free. It generated roughly 15 to 20 times more tokens per call, with direct implications for latency, throughput, and inference cost.

Bar chart titled 'The cost of thinking: reasoning burns 15-20x more tokens'. Mean tokens generated per call: Qwen 3 8B no reasoning 23 tokens, Qwen 3 8B reasoning 422 tokens, DeepSeek-R1 8B 549 tokens, Gemma 4 e4b 362 tokens.
Figure 6. Mean tokens generated per call. The no-reasoning baseline averaged 23 tokens; reasoning conditions averaged 362 to 549 tokens.

Summary

The headline numbers across all four conditions, plus where the contaminated deliberations were caught and what reasoning cost in tokens:

Table 1. Injection, reasoning defense, and token cost by condition
Condition Inj. rate Genuine Contam. Saves Control FP Tokens
Qwen 3 8B (no reasoning) 64% 85/132 n/a n/a 0/12 23
Qwen 3 8B (reasoning) 54% 71/132 98 28 0/12 422
DeepSeek-R1 8B (reasoning) 45% 59/132 94 40 0/12 549
Gemma 4 e4b (reasoning) 31% 41/132 103 70 0/12 362
Contam. = attack payloads that contaminated the deliberation. Saves = of those, how many were caught before the final answer. Control FP = false positives on the benign control. Tokens = mean tokens generated per call.
Table 2. Genuine injections by technique (count out of 12)
Technique ( reasoning-targeted) qwen3 off qwen3 on deepseek gemma4
continuation_hijack9/1211/124/123/12
instruction_override12/1211/127/125/12
multi_step6/124/124/124/12
quotation_embedded6/125/122/122/12
refusal_override9/128/1210/128/12
role_confusion12/128/129/126/12
social_engineering9/126/125/123/12
reasoning_authority6/126/128/121/12
reasoning_exhaustion9/123/121/123/12
reasoning_verdict_hijack7/128/127/126/12
think_block_spoof0/121/122/120/12
Each cell is the count of trials (out of 12) where a canary string appeared in the model's final answer. Reasoning-targeted techniques are starred.

Takeaways for builders

If you ship an LLM feature, enabling chain-of-thought reasoning measurably reduces conventional prompt-injection risk, but budget for the token cost, and do not treat it as sufficient. Attacks that target the reasoning step itself can exploit it, so reasoning belongs alongside, not instead of, input and output controls and strict output-format constraints.

Methodology, harness, and full result data: github.com/sysingleton/reasoning-llm-prompt-injection. This is a point-in-time study of small open-weight models; susceptibility patterns shift across model generations.