The Quiet Death of 'AI Agent Frameworks' in Enterprise Production
For two years, "what framework are you using?" was the first question at every AI agent meetup. LangChain or LangGraph. CrewAI or AutoGen. Semantic Kernel if you were Microsoft-shaped. The frameworks won that war decisively. They won meetups, GitHub stars, conference talks, hackathon prizes, and the entire mindshare of the AI builder community.
They are losing the war that comes next.
Three data points from the last 30 days tell the story.
The Sinch number that the framework crowd does not want to talk about
Sinch dropped a survey last week showing that 74 percent of enterprise AI agent deployments have been rolled back at least once in 2026. The number jumps to 81 percent for teams that consider themselves to have "mature governance." Read that again. The teams that rate their own discipline and process highest are the ones rolling back the most.
The failure modes are remarkably consistent across the deployments we have seen:
- Governance drift. The agent was vetted on Monday, ran fine Tuesday, then started doing something the security team had explicitly forbidden by Friday. Nobody caught it because the framework had no built-in audit trail.
- Scope creep. The first version was a focused customer-support agent. By month three someone had wired it to the CRM, to billing, and to the inventory system, and it was now making refund decisions worth more than the engineering manager could authorize.
- Model upgrade regressions. A model update silently changed the agent's behavior. The framework had no version pinning, no behavioral baseline, no way to detect that the agent now hallucinated the same field five percent of the time.
- Escalation paths missing. When the agent ran into something it should not handle, there was no path back to a human. So it tried to handle it. The audit log showed it tried very confidently.
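To make the model-upgrade failure mode concrete, here is a minimal sketch of what version pinning plus a behavioral baseline could look like. Everything in it is an assumption for illustration: the names Baseline, error_rate, and check_release are hypothetical, the "agent" is any callable that maps a prompt to a string, and exact-match comparison against golden cases is a deliberately crude stand-in for a real evaluation. It is not taken from any framework or platform.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Baseline:
    """Behavior recorded when the agent was vetted (hypothetical structure)."""
    model_version: str     # the model version the agent was vetted against
    max_error_rate: float  # highest error rate accepted at vetting time


def error_rate(agent: Callable[[str], str], cases: list[tuple[str, str]]) -> float:
    """Fraction of golden cases where the agent's output differs from the expected one."""
    if not cases:
        raise ValueError("need at least one golden case")
    failures = sum(1 for prompt, expected in cases if agent(prompt) != expected)
    return failures / len(cases)


def check_release(
    agent: Callable[[str], str],
    deployed_version: str,
    baseline: Baseline,
    cases: list[tuple[str, str]],
) -> list[str]:
    """Return the reasons to block a release; an empty list means nothing was detected."""
    problems = []
    if deployed_version != baseline.model_version:
        problems.append(
            f"model version drifted: {baseline.model_version} -> {deployed_version}"
        )
    rate = error_rate(agent, cases)
    if rate > baseline.max_error_rate:
        problems.append(
            f"error rate {rate:.0%} exceeds baseline {baseline.max_error_rate:.0%}"
        )
    return problems


if __name__ == "__main__":
    golden = [
        ("refund for order 1", "route:billing"),
        ("reset password", "route:support"),
    ]
    # Stand-in for a real model call; it always gives the same answer.
    stub_agent = lambda prompt: "route:support"
    vetted = Baseline(model_version="model-2026-01", max_error_rate=0.0)
    for problem in check_release(stub_agent, "model-2026-03", vetted, golden):
        print(problem)
```

The point of the sketch is only that both checks are cheap: a pinned version string catches the silent upgrade, and a small golden set catches the behavior change even when the version string did not move.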
Frameworks did not cause these failures. But frameworks did not prevent them either. Every framework treats governance, memory, escalation, and audit as the user's problem. "Bring your own opinion" is the explicit pitch. In production, that opinion has to come from somewhere, and most enterprise teams discover that they do not actually have one until something has already gone wrong.
The SAP counterexample
The same week Sinch published their data, SAP took the stage at Sapphire 2026 in Orlando and announced 200+ production AI agents running across Finance, Spend Management, Supply Chain, HCM, and Customer Experience. The architecture has three layers. Fifty-plus domain-specific Joule Assistants. Two hundred specialized agents underneath them. Claude powering the reasoning. All of it inside SAP's own opinionated stack.
Nobody asked which framework they used.
The pitch SAP made on stage was not "we picked the right framework." It was "we picked the right shape." Named agents with clear roles. Hierarchical reporting. Memory that is owned by the platform, not by the agent. Defaults baked in.
That is not a framework. That is a platform with opinions.
The Karpathy signal
Karpathy joining the Anthropic pre-training team is a hire, not a thesis. But it lands in a specific direction. Anthropic has been quiet about agent frameworks. They have been loud about the model layer (Claude) and increasingly loud about the developer-experience layer (Stainless, an SDK shop acquired for $300M). They have not built a LangChain competitor and they have shown no interest in building one.
What Anthropic is shipping looks more like an opinionated reasoning platform that other people can build agents on top of than a framework anyone is going to import. Put SAP's three-layer architecture next to where the Anthropic stack is converging, note that 74 percent of enterprise agent deployments have been rolled back at least once, and a pattern starts to look obvious.
Frameworks won developer tools. Platforms are winning deployments.
Here is the thing that the framework camp is going to push back on, and they have a point. Frameworks are still the best tool for building an agent. They will keep winning hackathons. They will keep getting GitHub stars. They will keep producing the most interesting demos.
But there is a quiet shift in what enterprise buyers are evaluating. The questions on RFPs in 2024 were about model choice and integration count. The questions on RFPs in 2026 are different:
- Does this thing come with an opinion about memory, or do we have to design that ourselves?
- Does this thing come with an opinion about escalation, or do we have to design that ourselves?
- Does this thing come with an opinion about audit, or do we have to design that ourselves?
- Does this thing come with named agents that look like roles a human could fill, or am I deploying a fleet of nameless function calls?
Frameworks answer "no" to every one of those questions by design. That is not a flaw. It is their philosophy. They give you the primitives and trust you to compose them.
Platforms answer "yes" to every one of those questions, also by design. They give you a smaller surface area, fewer choices, and a working production deployment by Friday.
In 2024, the platform answer felt restrictive. In 2026, after the third rollback, it starts to feel correct.
What enterprise buyers should evaluate instead
If you are evaluating an agent solution this quarter, here is a checklist that maps to the failure modes behind those rollbacks:
- Memory contract. What does the agent remember by default, where, and who owns the canonical version? "It uses RAG" is not an answer.
- Escalation rules. When the agent does not have authority, where does it route? Show me the actual rules, not a "human in the loop" diagram.
- Audit story. Can I see every action the agent took in the last 30 days, with the reasoning, in a form I can hand to compliance without translating?
- Named agents vs nameless function calls. Can a real employee tell my CFO who decided this refund? "The agent" is not a person. "Customer Success Manager named Enide" is.
- Default opinions. Does the platform make choices for me about scope, memory, and identity, or does it ask me to make them?
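For readers who want to see what "actual rules" and an audit record might look like, here is a minimal sketch covering the escalation and audit items in the checklist above. All of it is assumed for illustration: EscalationRule, AuditRecord, decide, and DEFAULT_OWNER are hypothetical names, the dollar limit and the role names are made up, and a real system would persist the log rather than keep it in a list. It does not describe how Geta.Team or any other product implements this.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

# Hypothetical fallback owner for actions that have no rule at all.
DEFAULT_OWNER = "duty-manager"


@dataclass(frozen=True)
class EscalationRule:
    action: str        # e.g. "refund"
    max_amount: float  # largest amount the agent may approve on its own
    escalate_to: str   # human role that takes over above the limit


@dataclass(frozen=True)
class AuditRecord:
    timestamp: str   # ISO 8601, UTC
    agent_name: str
    action: str
    amount: float
    decision: str    # "approved" or "escalated"
    owner: str       # who is accountable for the outcome
    reasoning: str


def decide(agent_name: str, action: str, amount: float, reasoning: str,
           rules: dict[str, EscalationRule], audit_log: list[AuditRecord]) -> AuditRecord:
    """Apply the escalation rule for an action and append an audit record."""
    rule = rules.get(action)
    if rule is None:
        # No rule means no authority: route to a human instead of guessing.
        decision, owner = "escalated", DEFAULT_OWNER
    elif amount > rule.max_amount:
        decision, owner = "escalated", rule.escalate_to
    else:
        decision, owner = "approved", agent_name
    record = AuditRecord(
        timestamp=datetime.now(timezone.utc).isoformat(),
        agent_name=agent_name,
        action=action,
        amount=amount,
        decision=decision,
        owner=owner,
        reasoning=reasoning,
    )
    audit_log.append(record)
    return record


if __name__ == "__main__":
    rules = {"refund": EscalationRule("refund", 200.0, "billing-manager")}
    log: list[AuditRecord] = []
    decide("Enide", "refund", 500.0, "customer reported a damaged item", rules, log)
    # Each record serializes to plain JSON that can be handed to compliance.
    print(json.dumps(asdict(log[-1]), indent=2))
```

Two design choices matter more than the code itself. An action with no rule escalates by default, so scope creep surfaces as a routed request instead of a confident guess. And every decision, approved or escalated, names an owner, which is what lets someone answer the CFO's question.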
The pitch we make at Geta.Team, in a sentence, is that every one of those five lines has an opinion shipped by default. That is the platform bet. The bet is that opinionated defaults beat bring-your-own discipline in production environments that contain humans.
The framework era is not over. Frameworks will keep winning the developer tool war. But the deployment war was always going to be different, and the data is in. Platforms are winning it.
Want to test the most advanced AI employees? Try them here: https://Geta.Team