
Enterprise AI agents are facing a reckoning as companies grapple with reliability issues in production environments. Teams are learning that large language model (LLM) performance alone doesn’t guarantee success. Long-running workflows must survive crashes, preserve state, recover from failures, manage costs, and coordinate across APIs and systems. Preeti Somal, Senior VP Engineering at Temporal Technologies, highlighted this shift during an AI Impact Series event in New York.
“Many customers are building version 2.0 of the same agent,” Somal said. “They moved fast but skipped the plumbing. Now they’re rebuilding with a reliable foundation.” For Temporal, a workflow orchestration company, the challenge reflects a broader realization: production AI systems need durable execution, state management, visibility, and recovery mechanisms.
Agentic systems add complexity by spanning multiple services, models, APIs, and tools. A single workflow might call several LLMs, access retrieval systems, trigger applications, and manage state over hours or days. Engineering questions often emerge only after deployment. “People build agents but don’t think about what happens if the agent crashes,” Somal noted. “Do you restart the entire flow?”
For cost-conscious enterprises, the answer matters. Restarting workflows after failures can multiply inference costs, increase latency, and harm customer experiences. Somal compared the current AI adoption rush to early cloud migrations, where companies moved workloads without redesigning underlying architectures. “Everyone realized they were spending more on cloud without value,” she said.
Enterprise workflows now involve agents that run for long periods and interact with many tools and systems. The longer a workflow persists, the harder reliability becomes, and the more it depends on two concepts often conflated in AI discussions: state and memory. State refers to where a workflow is in its execution, meaning which steps have completed, while memory captures information carried across interactions.
“State is about recovery points, memory is about context,” Somal explained. This distinction matters as enterprises move beyond chatbots to longer business processes. A healthcare example from Abridge illustrates this: workflows process physician visits through audio processing, summarization, model calls, and after-visit summary generation.
“That flow isn’t just one step,” Somal said. “It’s slicing videos, summarizing, calling LLMs, and generating summaries—all orchestrated.” Successful agents now depend on systems that survive interruptions, coordinate services, and maintain continuity.
Temporal’s approach centers on a “deterministic spine,” a fixed, orchestrated sequence of steps that keeps the workflow reliable. “It defines the path you want to take,” Somal said. “If the brain doesn’t respond, it calls again. If a step fails, it picks up from there.” In this picture the LLM is the “brain,” a probabilistic component, while the orchestration software keeps execution consistent even though the models themselves are non-deterministic.
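The idea of a fixed path with retries around an unreliable model call can be sketched in a few lines of plain Python. This is an illustration only and does not use Temporal's SDK; the names `run_spine`, `call_with_retry`, `make_flaky_step`, and `ModelUnavailable` are hypothetical, and the flaky step stands in for a real LLM call.

```python
class ModelUnavailable(Exception):
    """Raised by the stand-in model call when it fails to respond."""


def call_with_retry(step, max_attempts=3):
    """Run one step, calling it again if the model does not respond."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except ModelUnavailable:
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt


def run_spine(steps, max_attempts=3):
    """Execute steps in a fixed order; only what each step returns may vary."""
    results = {}
    for name, step in steps:
        results[name] = call_with_retry(step, max_attempts)
    return results


def make_flaky_step(failures_before_success, value):
    """Hypothetical model call that fails a set number of times, then answers."""
    remaining = {"n": failures_before_success}

    def step():
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise ModelUnavailable("no response")
        return value

    return step
```

The order of steps is fixed by the list passed to `run_spine`, while each individual call is allowed to fail and be repeated. A production system would also need backoff, timeouts, and durable storage, none of which are shown here.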
Cost visibility has become a concern as enterprises evaluate AI ROI. Long-running agents make multiple model calls, creating opaque spending patterns. Orchestration provides visibility into where tokens are consumed. “You see all steps in one pane of glass,” Somal said. “Now you know where tokens are spent across systems.”
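A minimal sketch of per-step token accounting shows what that kind of visibility could look like in code. The `TokenLedger` class, its method names, and the step names in the usage note are assumptions made for illustration, not part of any product described here.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TokenLedger:
    """Accumulates token counts per workflow step (hypothetical helper)."""

    usage: dict = field(default_factory=lambda: defaultdict(int))

    def record(self, step, prompt_tokens, completion_tokens):
        # Add both directions of the call to the step's running total.
        self.usage[step] += prompt_tokens + completion_tokens

    def total(self):
        # Tokens consumed across every step of the workflow.
        return sum(self.usage.values())

    def by_step(self):
        # Steps ordered from most to least expensive.
        return sorted(self.usage.items(), key=lambda kv: kv[1], reverse=True)
```

Recording each model call against the step that made it, for example `ledger.record("summarize", 1200, 300)`, lets `by_step()` surface which part of a long-running agent accounts for most of the spend.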
Workflow recovery also shapes cost efficiency. Without durable orchestration, late-stage failures force rerunning entire processes. “You pick up from where the crash happened,” Somal said. “We save you the cost of restarting from step one.”
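The cost difference between restarting and resuming can be illustrated with a small checkpointing sketch. It assumes an in-memory dictionary as the checkpoint store, where a real system would use durable storage, and the step names are hypothetical, loosely modeled on the physician-visit example.

```python
def run_with_checkpoints(steps, checkpoint):
    """Run steps in order, skipping any whose result is already recorded."""
    for name, step in steps:
        if name in checkpoint:
            continue  # completed before the crash; do not pay for it again
        checkpoint[name] = step()  # recorded only once the step succeeds
    return checkpoint


calls = []  # every step invocation, used to count what was actually paid for


def tracked(name, fail_once=False):
    """Hypothetical step that logs each call and can crash on its first run."""
    state = {"failed": False}

    def step():
        calls.append(name)
        if fail_once and not state["failed"]:
            state["failed"] = True
            raise RuntimeError(f"{name} crashed")
        return f"{name}-done"

    return step


steps = [
    ("transcribe", tracked("transcribe")),
    ("summarize", tracked("summarize")),
    ("after_visit", tracked("after_visit", fail_once=True)),
]

checkpoint = {}
try:
    run_with_checkpoints(steps, checkpoint)  # crashes on the last step
except RuntimeError:
    pass
run_with_checkpoints(steps, checkpoint)  # resumes at the failed step only
```

After the second run, `calls` shows that the first two steps ran once each and only the failed step was repeated, which is the saving Somal describes.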
Governance concerns are growing as agentic AI expands. Enterprises prefer standardized internal frameworks over fully managed systems, seeking guardrails with flexibility. They want governance controls, model selection policies, identity systems, cost management, and observability.
“Enterprises are building paved paths,” Somal said. “Off-the-shelf solutions may not work with all requirements.” As companies revisit first-generation deployments, the challenges increasingly look like systems engineering issues rather than model problems. Temporal is positioned to help, since it is already deployed at many enterprises as part of broader modernization efforts.


