LLMs Are Now Faithful, But They Can Be Annoying Due to Overthinking

Designing Delightful Agent Experiences Without Breaking Safety Boundaries

LinkedIn: https://www.linkedin.com/posts/hackintoshrao_ai-machinelearning-promptengineering-share-7465472540640120833-hELg/

X: https://x.com/hackintoshrao/status/2059710894758854716

Before the introduction of thinking models, prompting Large Language Models (LLMs) often felt like managing a brilliant but unreliable intern. The common belief was that for an LLM to arrive at the correct answer, we needed to encourage it to show its reasoning. We often began our prompts with "Let's think step-by-step," hoping this would guide the model through a Chain-of-Thought (CoT). However, it turns out that in many cases we were simply prompting the model to produce plausible-sounding reasoning that did not reflect how it actually reached its answer.

The Illusion of Faithfulness

In 2023, the AI research community faced a sobering realization.

Much of the Chain-of-Thought reasoning we had been coaxing out of models was an illusion.

Studies, such as Anthropic's research on measuring faithfulness in chain-of-thought reasoning, revealed that models' answers were often influenced by hidden biases, such as the order of multiple-choice options. The models would then generate a fabricated yet plausible-sounding CoT to justify those answers. This step-by-step output was often merely a post-hoc rationalization: the models were mimicking the form of human explanation rather than actually reasoning that way.

This disconnect was further highlighted by Apple's machine learning team in their paper The Illusion of Thinking. They showed that when models were given controlled logic puzzles of increasing complexity, their reasoning effort initially grew, but their accuracy eventually collapsed. This suggested that the models were not deeply comprehending the problems; they were leaning heavily on pattern matching.

The GRPO Shift: When Thinking Becomes Intrinsic

The paradigm completely shifted with the emergence of inference-time compute (ITC) reasoning models, such as the DeepSeek-R1 lineage, OpenAI's o-series, and, more recently, GPT-5.5 and Anthropic's Claude 4.6+ series.

Why the sudden capability leap? It comes down to how these models are trained. They rely heavily on Reinforcement Learning (RL); DeepSeek-R1, for example, was trained with GRPO (Group Relative Policy Optimization), an algorithm introduced in the DeepSeekMath paper (arXiv:2402.03300). Unlike older fine-tuning methods that teach a model to mimic a human's writing style, this kind of RL training rewards the model for producing the correct final output in verifiable domains such as math and code. During training, the model samples a group of candidate reasoning paths for each problem, and the paths that reach the correct answer are reinforced relative to the rest of the group.
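To make the "group relative" part concrete, here is a minimal Python sketch of GRPO's advantage step, assuming a simple outcome-only reward (1.0 for a correct final answer, 0.0 otherwise). Each sampled response's reward is normalized against the mean and standard deviation of its own group, which is what lets GRPO skip a separate value network. The helper names are hypothetical, and the rest of the GRPO objective (the clipped policy ratio and the KL penalty) is deliberately left out.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each sampled response's reward against its own group:
    A_i = (r_i - mean(r)) / (std(r) + eps)
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; real implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


def verifiable_reward(answer: str, reference: str) -> float:
    # Hypothetical outcome-only checker: 1.0 if the final answer matches, else 0.0.
    return 1.0 if answer.strip() == reference.strip() else 0.0


# Four sampled completions for one prompt whose correct answer is "42".
sampled_answers = ["42", "41", "42", "7"]
rewards = [verifiable_reward(a, "42") for a in sampled_answers]
advantages = group_relative_advantages(rewards)
print(rewards)                            # [1.0, 0.0, 1.0, 0.0]
print([round(a, 3) for a in advantages])  # [1.0, -1.0, 1.0, -1.0]
```

Correct samples receive a positive advantage and wrong ones a negative advantage. If every sample in a group is equally right or equally wrong, all advantages are zero and that prompt contributes no learning signal, which is one reason training data difficulty matters for this kind of RL.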

This training shift finally bridged the gap in faithfulness. A 2026 paper analyzing reasoning rigidity (arXiv:2603.22816) evaluated frontier reasoning models and found a transition from "decorative chain-of-thought" to genuine faithfulness. In these models, the hidden reasoning trace is no longer a post-hoc justification; it is the actual causal mechanism the model uses to arrive at the answer. If you alter the intermediate steps, the final answer changes.
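The claim that altering the intermediate steps changes the answer is testable. The sketch below, similar in spirit to the truncation interventions used in chain-of-thought faithfulness studies, re-asks for an answer using only a prefix of the reasoning and measures how often that answer differs from the full-trace answer. answer_fn is a hypothetical stand-in for a real model call, and the two toy models exist only to illustrate the two extremes.

```python
from typing import Callable, Sequence

# Hypothetical signature: (question, reasoning steps so far) -> final answer.
AnswerFn = Callable[[str, Sequence[str]], str]


def truncation_sensitivity(question: str, steps: Sequence[str], answer_fn: AnswerFn) -> float:
    """Fraction of truncation points where the answer differs from the
    answer produced with the full reasoning trace.

    ~0.0   -> the answer ignores the reasoning (decorative CoT)
    higher -> the answer depends on the reasoning (more faithful)
    """
    if not steps:
        return 0.0
    full_answer = answer_fn(question, steps)
    changed = sum(
        1
        for k in range(len(steps))  # prefixes steps[:0] .. steps[:n-1]
        if answer_fn(question, steps[:k]) != full_answer
    )
    return changed / len(steps)


def decorative_model(question: str, steps: Sequence[str]) -> str:
    return "B"  # answers the same way no matter what the reasoning says


def faithful_model(question: str, steps: Sequence[str]) -> str:
    # The answer is literally the running total built up in the steps.
    return str(sum(int(s) for s in steps))


steps = ["3", "4", "5"]  # toy trace: "add 3", "add 4", "add 5"
print(truncation_sensitivity("What is 3 + 4 + 5?", steps, decorative_model))  # 0.0
print(truncation_sensitivity("What is 3 + 4 + 5?", steps, faithful_model))    # 1.0
```

With a real model, answer_fn would prompt the model with the question plus the truncated trace and force an immediate final answer; a score near zero across many questions is a warning sign that the visible reasoning is decorative.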

The Curse of Overthinking

This new capability is incredible for complex agentic workflows, such as evaluating system architecture or modeling consequences. But it introduces a new UX problem: Overthinking.

Because these models are now intrinsically wired to deliberate, debate, and self-correct, they apply this heavy cognitive machinery to everything. As highlighted in a recent survey on efficient reasoning (arXiv:2503.16419), overthinking occurs when an agent generates a massive, redundant reasoning sequence for a trivial query. Ask a reasoning model to format a date string, and it might pause to generate hundreds of hidden tokens debating calendar systems.

This doesn't just create latency and ruin the user experience; it causes tangible failures in production:

Over-Refusals: Overthinking can cause reasoning models to hallucinate risk. As highlighted by research into over-refusal in Large Language Models, when an LLM spends too many tokens analyzing a benign request, it can cross the boundary from careful to paranoid. The model simulates wildly improbable worst-case scenarios ("What if reformatting this JSON file somehow disrupts the underlying database infrastructure?") and then falsely refuses a completely safe task out of misplaced algorithmic caution. This breaks user trust and renders the agent useless for routine tasks.
Security Vulnerabilities: Researchers recently demonstrated this weakness in OverThink: Slowdown Attacks on Reasoning LLMs (Kumar et al., 2025). By injecting benign-looking "decoy problems" (such as a Markov Decision Process or a Sudoku puzzle) into content that enters the model's context window, for example a compromised wiki page retrieved through RAG, adversaries can exploit the model's compulsion to solve problems. The reasoning model fixates on the decoy, burning through thousands of reasoning tokens and causing up to a 46x slowdown, which drastically increases your API costs before it ever answers the user's actual query.
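Because hidden reasoning tokens are billed, a slowdown attack shows up directly in usage numbers. Here is a back-of-the-envelope sketch, assuming reasoning tokens are billed at the output-token rate, a hypothetical price of $10 per million output tokens, a hypothetical baseline of 500 reasoning tokens, and the paper's 46x figure applied to reasoning tokens only. It also includes a naive, hypothetical alarm that flags requests whose reasoning usage far exceeds a per-route baseline.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    reasoning_tokens: int  # hidden thinking tokens
    output_tokens: int     # visible answer tokens


def request_cost(usage: Usage, usd_per_million: float) -> float:
    # Assumption: reasoning tokens are billed at the same rate as visible output.
    billed = usage.reasoning_tokens + usage.output_tokens
    return billed * usd_per_million / 1_000_000


def is_reasoning_anomaly(usage: Usage, baseline_reasoning: int, max_ratio: float = 10.0) -> bool:
    # Naive alarm: flag requests that think far longer than is typical for this route.
    return usage.reasoning_tokens > max_ratio * baseline_reasoning


PRICE = 10.0  # hypothetical USD per million output tokens
normal = Usage(reasoning_tokens=500, output_tokens=200)
attacked = Usage(reasoning_tokens=500 * 46, output_tokens=200)

print(f"normal:   ${request_cost(normal, PRICE):.4f}")
print(f"attacked: ${request_cost(attacked, PRICE):.4f}")
print(round(request_cost(attacked, PRICE) / request_cost(normal, PRICE), 1))
print(is_reasoning_anomaly(normal, 500), is_reasoning_anomaly(attacked, 500))
```

With these assumed numbers, the per-request cost rises from $0.0070 to $0.2320, roughly 33x rather than 46x, because the visible answer length stays the same. Providers such as OpenAI report reasoning-token counts in the response's usage data, so a check like this can run on real traffic.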

Designing Around the Overthinker

Building delightful agent experiences now requires protecting the user (and your wallet) from the model's own cognitive weight. We have to design boundaries that allow the agent to be faithful and deliberate when it matters, without being annoying or vulnerable when it doesn't.

Here are the best practices for managing overthinking in production today:

1. Implement a Routing Layer

Do not send every user query to a heavy, reasoning-optimized model. Use a smaller, faster model (a "router") to classify the prompt's intent. If the user is asking for a simple data retrieval or formatting task, route it to a standard task model. Reserve the heavy RL-trained models strictly for queries that require planning, multi-step logic, or consequence modeling.
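A routing layer can start very small. In production the router would itself be a small, fast model; in this sketch a keyword heuristic stands in for it so the example runs on its own. The cue lists, the 12-word threshold, and the model identifiers are all hypothetical placeholders.

```python
import re
from enum import Enum


class Route(Enum):
    FAST = "fast-task-model"             # hypothetical model identifier
    REASONING = "heavy-reasoning-model"  # hypothetical model identifier


# Stand-in for a small classifier model: cues that suggest multi-step work.
REASONING_CUES = re.compile(
    r"\b(plan|design|architecture|trade-?offs?|why|prove|debug|compare|consequences?)\b",
    re.IGNORECASE,
)
# Cues that suggest a routine retrieval or formatting task.
SIMPLE_CUES = re.compile(
    r"\b(format|convert|lookup|look up|rename|list|extract)\b",
    re.IGNORECASE,
)


def route(prompt: str) -> Route:
    if REASONING_CUES.search(prompt):
        return Route.REASONING
    if SIMPLE_CUES.search(prompt) or len(prompt.split()) < 12:
        return Route.FAST
    # Unclear intent: default to the reasoning model (favor correctness over latency).
    return Route.REASONING


print(route("Format 2024-03-01 as March 1, 2024").value)
print(route("Compare these two database designs and explain the trade-offs").value)
```

The fallback for unclear prompts is a design choice: defaulting to the heavy model protects answer quality, while defaulting to the fast model protects latency and cost. Whichever you pick, log the router's decisions so misroutes can be reviewed.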

2. Leverage Native "Adaptive Thinking" Controls

The AI industry has recognized the overthinking problem, and you no longer have to build custom token-limiting hacks. The latest commercial APIs now natively support difficulty-aware optimization:

Anthropic's Adaptive Thinking: Starting with Claude Opus 4.6 and Sonnet 4.6 (and set as the default on models like Opus 4.7), Anthropic replaced manual token budgets with "Adaptive thinking." Claude dynamically evaluates the complexity of a request and automatically decides whether to use extended thinking and, if so, how much. For simple queries, it can skip the thinking phase entirely.
OpenAI's Reasoning Effort API: Models like GPT-5.5 and the o-series let developers pass a reasoning.effort parameter (low, medium, high). More importantly, the models are built to reason adaptively across these tiers, using fewer tokens for simple tasks and scaling up for complex ones.
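The native effort control pairs naturally with a router: classify the task, then pick an effort level instead of a different model. This is a minimal sketch that only builds the keyword arguments for an OpenAI Responses API call, so it runs without network access. The task categories, their effort mapping, and the model name are hypothetical.

```python
from typing import Literal

Effort = Literal["low", "medium", "high"]

# Hypothetical task categories produced by an upstream router.
EFFORT_BY_TASK: dict[str, Effort] = {
    "formatting": "low",
    "retrieval": "low",
    "analysis": "medium",
    "planning": "high",
}


def choose_effort(task_kind: str) -> Effort:
    # Unknown task kinds fall back to a middle setting.
    return EFFORT_BY_TASK.get(task_kind, "medium")


def build_request(model: str, prompt: str, task_kind: str) -> dict:
    """Keyword arguments for an OpenAI Responses API call."""
    return {
        "model": model,
        "input": prompt,
        "reasoning": {"effort": choose_effort(task_kind)},
    }


kwargs = build_request("your-reasoning-model", "Format 2024-03-01 as ISO week date", "formatting")
print(kwargs["reasoning"])  # {'effort': 'low'}
```

With the official openai Python SDK, the resulting dict could be passed as client.responses.create(**kwargs). On the Anthropic side, thinking is configured through the thinking parameter of the Messages API; the exact shape for adaptive mode is not shown here and should be checked against Anthropic's documentation.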

3. Prompt for the Destination, Not the Journey

Because the model naturally generates the cognitive steps needed to solve a problem, you no longer need to write exhaustive, step-by-step behavioral prompts. In fact, OpenAI's official reasoning best practices explicitly advise against chain-of-thought instructions such as "think step by step," since these models already reason internally.

Use your prompt space to provide strict environmental boundaries instead:

Weak: "Think about if this is a simple question. If it is, answer quickly."
Strong: "CONSTRAINT: This is a low-stakes UX environment. Output the final JSON immediately without simulating edge cases."

The New Frontier: Guardrails, Not Handrails

Ultimately, the transition from traditional task-oriented models to reasoning models whose thinking genuinely drives their answers marks one of the most significant leaps in the history of large language models. We no longer need to spend hours drafting elaborate step-by-step prompts just to keep an agent from getting off track. Much of the reasoning is now embedded within the model itself.

As developers, architects, and product builders, our roles have fundamentally changed. We are no longer simply teachers explaining how to think; instead, we are city planners defining the boundaries where this thinking can be applied effectively.

By implementing strategic routing layers, leveraging native adaptive capabilities, and directing your prompts clearly toward the desired outcome, you can tap into the robust, deep intelligence of cutting-edge models without overwhelming your users with latency or unnecessary complications.

The goal isn't to create an agent that never pauses to think; it's to build an agent that can tell when a pause reflects genuine judgment and when it is merely over-analyzing a straightforward task, such as formatting a date string. Establish the boundaries, step back, and let the model do what it was trained to do: reason effectively.