Why I Don't Believe in AI Agents in 2025, Despite Building Them Myself
A developer who has built over 12 production AI agent systems explains why the current hype around autonomous agents faces fundamental mathematical, economic, and engineering barriers that most companies underestimate.
Introduction
I've built over 12 AI agent systems for real-world projects spanning development, DevOps, and data processing. Despite the buzz around 2025 being the "year of agents," I believe the current approach to agent architecture faces serious mathematical and economic barriers that most companies are ignoring.
Problem 1: Error Accumulation in Multi-Step Processes
Errors grow exponentially with each additional step in a workflow. If each step has 95% reliability (an optimistic estimate for modern LLMs):
- 5 steps = 77% success
- 10 steps = 60% success
- 20 steps = 36% success
For production systems, you need 99.9%+ reliability. Even at 99% reliability per step, a 20-step process achieves only 82% success. This is mathematical reality, not a problem of prompts or model capabilities.
The formula is simple: P(success) = p^n, where p is per-step reliability and n is the number of steps. No amount of prompt engineering changes the underlying math.
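The compounding effect is easy to verify numerically (a quick sketch; the function name is mine):

```python
def success_probability(p: float, n: int) -> float:
    """Probability that all n steps succeed, assuming each step
    independently succeeds with probability p."""
    return p ** n

# Reproducing the figures above:
print(f"{success_probability(0.95, 5):.0%}")   # ~77%
print(f"{success_probability(0.95, 20):.0%}")  # ~36%
print(f"{success_probability(0.99, 20):.0%}")  # ~82%
```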
Problem 2: Quadratic Token Costs in Conversational Systems
Each new interaction re-sends and re-processes the entire previous conversation history, so the total tokens processed over a session grow quadratically with dialogue length (turn k pays for all k preceding turns):
- A 100-step conversation costs $50-$100 in tokens alone
- By the 50th request in a session, each response costs several dollars
- Scaling to thousands of users becomes economically infeasible
Successful systems are typically stateless: description goes in, result comes out, session ends. The moment you need multi-turn interaction with tool use, costs explode.
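A toy cost model makes the quadratic growth concrete (illustrative numbers only; `tokens_per_turn` and the flat per-token price are my assumptions, not measured values):

```python
def total_input_tokens(turns: int, tokens_per_turn: int = 1000) -> int:
    """Total input tokens over a session where turn k re-sends all
    k turns of history: an arithmetic series, hence O(turns**2)."""
    return sum(k * tokens_per_turn for k in range(1, turns + 1))

tokens = total_input_tokens(100)           # 5,050,000 input tokens
cost = tokens / 1_000_000 * 10             # hypothetical $10 per 1M input tokens
print(f"{tokens:,} tokens, roughly ${cost:.2f}")
```

Doubling the conversation length quadruples the total token bill, which is why long multi-turn sessions stop making economic sense.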
Problem 3: Tool Interface Engineering
Tool calling itself is now fairly reliable. The real complexity lies in designing tools for effective AI interaction:
- How do you report partial success of an operation without losing context?
- A database query might return 10,000 rows, but the agent only needs: "10K results, here are the first 5"
- When a tool fails: too little information means the agent hangs; too much means context is lost
- Handling dependencies between operations (transactions, locks, resource dependencies)
The dirty secret of every production agent system: AI does maybe 30% of the work. The remaining 70% is tool interface engineering, error handling, and feedback loops, work that most companies drastically underestimate.
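The row-summarization point above can be sketched as a tool-result wrapper (a minimal illustration; all names are mine):

```python
def compact_result(rows: list, limit: int = 5) -> dict:
    """Return a count plus a small sample instead of flooding the
    agent's context with every row the query matched."""
    return {
        "total": len(rows),
        "sample": rows[:limit],
        "truncated": len(rows) > limit,
    }

result = compact_result([{"id": i} for i in range(10_000)])
print(result["total"], len(result["sample"]))  # 10000 5
```

The agent sees "10,000 results, here are the first 5" in a few dozen tokens, and the `truncated` flag tells it whether a follow-up query is needed.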
What Actually Works
Successful production agents follow a consistent pattern: limited context, verifiable operations, and human checkpoints at critical stages.
1. UI Generator: AI generates interface components, but each one is reviewed by a human before deployment. Clear boundaries, verifiable output.
2. Database Agent: Handles complex queries and data transformations, but confirms all destructive operations (DELETE, UPDATE, DROP) before execution. The human stays in the loop for anything irreversible.
3. Stateless Function Generator: Works within strict boundaries — specification goes in, function comes out. No multi-step reasoning, no accumulated context, no compounding errors.
4. DevOps Automation: Generates infrastructure-as-code that can be version-controlled, reviewed, and rolled back. The AI handles complexity; traditional tooling ensures reliability.
5. CI/CD Agent: Each pipeline stage has explicit success criteria and rollback mechanisms. If any step fails, the system knows exactly how to recover.
The pattern across all of these: AI handles the complexity, humans retain control, and traditional software provides reliability guarantees.
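The destructive-operation checkpoint from the database agent pattern can be sketched as a guard around execution (a simplified illustration; the regex and the callback interface are my assumptions):

```python
import re

# Statement types that never run without explicit human approval.
DESTRUCTIVE = re.compile(r"^\s*(DELETE|UPDATE|DROP|TRUNCATE)\b", re.IGNORECASE)

def guarded_execute(sql: str, confirm, execute):
    """Route destructive SQL through a human `confirm` callback;
    everything else goes straight to `execute`."""
    if DESTRUCTIVE.match(sql) and not confirm(sql):
        return {"status": "aborted", "reason": "operator declined"}
    return {"status": "ok", "result": execute(sql)}
```

In production, `confirm` would prompt an operator; here it is any callable returning a boolean, which also makes the checkpoint trivially unit-testable.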
Predictions for 2025
- Startups with venture funding promising "full autonomy" will hit economic barriers when they try to scale beyond demos
- Enterprise software will struggle to integrate deeply enough for real workflows, getting stuck at surface-level automation
- Winners will build limited, specialized tools with clear boundaries and human oversight
- The market will learn to distinguish agents that work in demos from those that function reliably in production
How to Build AI Agents Correctly
- Define clear boundaries — what does the agent do, and what does it hand off to a human?
- Design for failure — how do you handle the 20-40% of cases where AI gets it wrong?
- Solve the economics — what does each interaction cost at scale?
- Prioritize reliability over autonomy — users trust stable tools, not impressive demos
- Use AI for the complex parts (intent understanding, content generation) and traditional software for the critical parts (execution, error handling, state management)
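The last principle, AI for intent understanding and traditional software for execution, can be sketched as a small dispatcher (hypothetical interfaces throughout; `llm` is any callable that maps text to an intent name):

```python
def handle_request(user_text: str, llm, handlers: dict):
    """Let the model classify intent, then hand execution to
    deterministic, testable code. Unknown intents escalate to a human."""
    intent = llm(user_text)
    handler = handlers.get(intent)
    if handler is None:
        return {"status": "escalate_to_human", "intent": intent}
    try:
        return {"status": "ok", "result": handler(user_text)}
    except Exception as exc:  # failures surface explicitly, never silently
        return {"status": "error", "detail": str(exc)}
```

The model's output only ever selects among handlers you wrote and tested; it never executes anything directly, which is exactly the boundary the principles above describe.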
The future of AI agents isn't about replacing humans — it's about building systems where AI and humans each do what they're best at, connected by robust engineering that handles everything in between.