Agentic Business
2026-05-05
Two real moves at the platform layer, one big claim from a vendor, and a paper that puts a number on the cost problem.
Mornin'. AWS just slid OpenAI's Codex inside Bedrock and quietly stood up a managed-agent runtime alongside it, which is a polite way of saying your next procurement meeting got 30% shorter. Frontier coding agents on your existing AWS bill, with orchestration and sandboxing handled, means the "do we run our own harness" debate finally has an off-ramp for operators who never wanted that debate in the first place. Multi-cloud agent strategy is starting to look like a 2025 flex.
-Ben
In today's newsletter:
- AWS swallows OpenAI's Codex
- Agents go on a 30% token diet
- Pinecone's pre-compiled retrieval pitch
CLOUD CROSSOVER
AWS pulls OpenAI's Codex into Bedrock, plus a managed agent runtime
via AWS
Multi-cloud juggling for frontier agents just got one juggle shorter.
AWS announced a limited preview that puts OpenAI's Codex coding agent inside Bedrock, accessible through the CLI, desktop app, and VS Code extension, and billable against existing AWS commitments. Alongside it: Bedrock Managed Agents, a fully managed runtime powered by OpenAI that handles orchestration, sandboxing, and session state.
Translation: if you're already on AWS, you can run OpenAI's coding and research agents without spinning up a separate vendor relationship, and you can ship production agents without writing your own harness.
- Codex on Bedrock is in limited preview, charged against AWS commitments
- Bedrock Managed Agents handle orchestration, sandboxing, and session state out of the box
- Amazon Connect also got an agentic suite (Decisions, Talent, Customer, Health) with AgentCore optimization and A/B testing
Why it matters: The infra layer for production agents is consolidating fast, and "frontier model + managed harness on your existing cloud bill" is now a real procurement story. Read more.
TOKEN DIET
AgentDiet trims a third off agent costs without trimming results
via FSE 2026
If your agent's token bill looks suspicious, ByteDance and Peking University researchers think they know why.
Their FSE 2026 paper, AgentDiet, argues that multi-turn agent trajectories carry a lot of dead weight: useless steps, redundant context, expired information that the model is still being charged to read. Their fix automatically prunes the trajectory between turns.
The reported result: input tokens drop by 39.9% to 59.7% and total computational cost falls by 21.1% to 35.9%, with task performance holding steady.
- Targets the "ever-growing trajectory" problem that scales with agent complexity
- Tested on multi-turn agent systems where token bloat is most painful
- Peer-reviewed at FSE 2026 in Montreal, presented today
Why it matters: Cost is the silent killer of agent products that work in demos and die in pilots; a published, repeatable technique that cuts roughly a third of the bill is directly relevant to anyone running long-horizon agents in prod. Read more.
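For the mechanically curious, here's roughly what "prune the trajectory between turns" means in practice. This is a toy sketch of the idea, not AgentDiet's actual algorithm; the message format and the expired/duplicate heuristics are illustrative assumptions.

```python
# Minimal sketch of between-turn trajectory pruning: before the next model
# call, drop steps that are expired (stale tool output) or redundant
# (verbatim repeats), so the model isn't billed to re-read dead weight.

def prune_trajectory(messages):
    """Return a pruned copy of a multi-turn agent trajectory.

    messages: list of dicts like {"role": ..., "content": ..., "expired": bool}
    """
    seen = set()
    kept = []
    for msg in messages:
        if msg.get("expired"):           # stale info the agent no longer needs
            continue
        key = (msg["role"], msg["content"])
        if key in seen:                  # redundant repeat of earlier context
            continue
        seen.add(key)
        kept.append(msg)
    return kept


trajectory = [
    {"role": "user", "content": "Find the config file"},
    {"role": "tool", "content": "ls output: 400 files...", "expired": True},
    {"role": "assistant", "content": "Checking ./config"},
    {"role": "assistant", "content": "Checking ./config"},  # duplicate step
    {"role": "tool", "content": "Found config.yaml"},
]

pruned = prune_trajectory(trajectory)
print(len(trajectory), "->", len(pruned))  # 5 -> 3
```

The paper's contribution is doing this automatically and safely at scale; the sketch just shows where the savings come from, since every pruned message is input tokens you stop paying for on every subsequent turn.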
FOUND IT
Pinecone's pitch: pre-compile the knowledge before the agent runs
via Pinecone
This one is a vendor pitch, not breaking news, but it's the most interesting vendor pitch I read this week. Pinecone is calling their new product Nexus and reframing the retrieval layer as a knowledge engine: instead of agents pulling raw chunks at inference time and stitching context together on the fly, Nexus pre-compiles task-optimized artifacts with per-field citations.
The headline numbers in their post are a 30x speedup over retrieval-loop agents and 90%+ task completion rates. Take those with the usual vendor-benchmark skepticism. The argument underneath is the part worth chewing on: that the retrieval-as-loop pattern most agents use today is doing the work in the wrong place.
Worth tracking if: you're running RAG-heavy agents in production and your latency or cost numbers don't pencil out. The architecture story is interesting whether or not Nexus specifically holds up. Read more.
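To make the architectural argument concrete, here's a toy contrast between the two patterns. Everything here (the chunk store, compile_artifact, the citation format) is illustrative, not Nexus's actual API.

```python
# Toy contrast: retrieval-loop vs pre-compiled knowledge artifact.

CHUNKS = {
    "billing": "Invoices are generated on the 1st. [doc:billing.md]",
    "refunds": "Refunds are processed within 5 days. [doc:refunds.md]",
}

def retrieval_loop(task_keywords):
    # The pattern most agents use today: fetch raw chunks at inference
    # time, one round-trip per reasoning step, and stitch context on the fly.
    context = []
    for kw in task_keywords:
        context.append(CHUNKS[kw])
    return "\n".join(context)

def compile_artifact(task_keywords):
    # The pre-compile idea: do the stitching once, ahead of the run,
    # producing a task-optimized artifact with a citation attached per field.
    return {
        kw: {
            "text": CHUNKS[kw].split(" [")[0],
            "citation": CHUNKS[kw].split("[")[1].rstrip("]"),
        }
        for kw in task_keywords
    }

artifact = compile_artifact(["billing", "refunds"])
print(artifact["refunds"]["citation"])  # doc:refunds.md
```

The design question the pitch raises: the loop pays retrieval latency inside every agent run, while the artifact pays it once at compile time, which is where a claimed 30x speedup would have to come from if it holds up.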
PRIME NUMBER
21-36%
That's the share of compute cost AgentDiet shaves off multi-turn LLM agents while keeping task performance flat, which is roughly the difference between an agent product that pencils out and one that doesn't.
- Technique: automatic pruning of useless, redundant, and expired steps from agent trajectories
- Token impact: 39.9% to 59.7% reduction in input tokens per run
- Source: peer-reviewed FSE 2026 paper from ByteDance and Peking University, presented May 5
WHAT ELSE IS SHIPPING
- Pydantic AI v1.90.0 - native support for OpenAI's Conversations API so multi-turn agents stop hand-rolling their own context store, plus typed OpenTelemetry metadata for code tool calls.
- LangGraph v1.2.0a6 - DeltaChannel beta cuts checkpoint write overhead, plus per-node timeouts, error handlers, graceful shutdown, and a Streaming API v3 with typed content-block events.
- LangGraph v1.2.0a7 - quick follow-up adds a public get_writes_history saver API and a delta-cadence rework for checkpoint persistence.
- CrewAI v1.14.5a2 - pre-release with task output restoration in finally blocks, corrected token counting, and a fix for shared LLM stop words mutating across agents.
- Claude Code May 5 release - iTerm2 clipboard integration with tmux support, MCP server auto-retry on transient errors, terminal session title generation, and LSP diagnostic improvements.