Agentic Business

2026-05-05

Issue #2 · 4 min read · By Ben

Two real moves at the platform layer, one big claim from a vendor, and a paper that puts a number on the cost problem.

Mornin'. AWS just slid OpenAI's Codex inside Bedrock and quietly stood up a managed-agent runtime alongside it, which is a polite way of saying your next procurement meeting got 30% shorter. Frontier coding agents on your existing AWS bill, with orchestration and sandboxing handled, means the "do we run our own harness" debate finally has an off-ramp for operators who never wanted that debate in the first place. Multi-cloud agent strategy is starting to look like a 2025 flex.

-Ben

In today's newsletter:

  • AWS swallows OpenAI's Codex
  • Agents go on a 40% token diet
  • Pinecone's pre-compiled retrieval pitch

CLOUD CROSSOVER

AWS pulls OpenAI's Codex into Bedrock, plus a managed agent runtime

AWS Weekly Roundup

via AWS

Multi-cloud juggling for frontier agents just got one juggle shorter.

AWS announced a limited preview that puts OpenAI's Codex coding agent inside Bedrock, accessible through the CLI, desktop app, and VS Code extension, and billable against existing AWS commitments. Alongside it: Bedrock Managed Agents, a fully managed runtime powered by OpenAI that handles orchestration, sandboxing, and session state.

Translation: if you're already on AWS, you can run OpenAI's coding and research agents without spinning up a separate vendor relationship, and you can ship production agents without writing your own harness.

  • Codex on Bedrock is in limited preview, charged against AWS commitments
  • Bedrock Managed Agents handle orchestration, sandboxing, and session state out of the box
  • Amazon Connect also got an agentic suite (Decisions, Talent, Customer, Health) with AgentCore optimization and A/B testing

Why it matters: The infra layer for production agents is consolidating fast, and "frontier model + managed harness on your existing cloud bill" is now a real procurement story. Read more.


TOKEN DIET

AgentDiet trims up to a third off agent costs without trimming results

FSE 2026 conference, Montreal

via FSE 2026

If your agent's token bill looks suspicious, ByteDance and Peking University researchers think they know why.

Their FSE 2026 paper, AgentDiet, argues that multi-turn agent trajectories carry a lot of dead weight: useless steps, redundant context, expired information that the model is still being charged to read. Their fix automatically prunes the trajectory between turns.

The reported result: input tokens drop by 39.9% to 59.7% and total computational cost falls by 21.1% to 35.9%, with task performance holding steady.

  • Targets the "ever-growing trajectory" problem that scales with agent complexity
  • Tested on multi-turn agent systems where token bloat is most painful
  • Peer-reviewed at FSE 2026 in Montreal, presented today
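The paper's implementation isn't reproduced here, but the core move — rewriting the message list between turns so the model stops paying to re-read dead weight — can be sketched in a few lines. Everything below (`prune_trajectory`, the message shape, the `keep_last_tool_outputs` knob) is hypothetical illustration, not AgentDiet's actual code:

```python
def prune_trajectory(messages, keep_last_tool_outputs=1):
    """Hypothetical between-turn pruner in the spirit of AgentDiet:
    stub out expired tool outputs and drop verbatim-duplicate messages."""
    tool_idx = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    # everything but the most recent N tool outputs counts as "expired"
    stale = set(tool_idx[:-keep_last_tool_outputs] if keep_last_tool_outputs
                else tool_idx)
    seen, pruned = set(), []
    for i, m in enumerate(messages):
        if i in stale:
            # keep a stub so the model still sees that a step happened
            pruned.append({"role": "tool",
                           "content": "[pruned: stale tool output]"})
            continue
        key = (m["role"], m["content"])
        if key in seen:
            continue  # redundant context: identical message already in window
        seen.add(key)
        pruned.append(m)
    return pruned
```

Run this on the trajectory after every turn and the context stops growing linearly with agent steps — which is exactly the "ever-growing trajectory" problem the paper targets.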

Why it matters: Cost is the silent killer of agent products that work in demos and die in pilots; a published, repeatable technique that cuts a fifth to a third of the bill is directly relevant to anyone running long-horizon agents in prod. Read more.


FOUND IT

Pinecone's pitch: pre-compile the knowledge before the agent runs

Pinecone Nexus product illustration

via Pinecone

This one is a vendor pitch, not breaking news, but it's the most interesting vendor pitch I read this week. Pinecone is calling their new product Nexus and reframing the retrieval layer as a knowledge engine: instead of agents pulling raw chunks at inference time and stitching context together on the fly, Nexus pre-compiles task-optimized artifacts with per-field citations.

The headline numbers in their post are a 30x speedup over retrieval-loop agents and 90%+ task completion rates. Take those with the usual vendor-benchmark skepticism. The argument underneath is the part worth chewing on: that the retrieval-as-loop pattern most agents use today is doing the work in the wrong place.
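Pinecone's post doesn't spell out Nexus's API, so here's a hypothetical sketch of the underlying pattern: compile the fields a task needs offline, each carrying its own citation, so runtime becomes a single lookup instead of an embed-search-read loop. All names here (`CompiledField`, `compile_artifact`, `extract_fn`) are made up for illustration:

```python
from dataclasses import dataclass

@dataclass
class CompiledField:
    value: str
    citation: str  # id of the source document this specific field came from

def compile_artifact(docs, extract_fn):
    """Offline step: walk the source corpus once and materialize every field
    a task needs, each tagged with its own per-field citation."""
    artifact = {}
    for doc_id, text in docs.items():
        for field, value in extract_fn(text).items():
            # first extraction wins; a real system would rank or merge sources
            artifact.setdefault(field, CompiledField(value, doc_id))
    return artifact
```

At inference time the agent reads `artifact["refund_window"].value` and cites `artifact["refund_window"].citation` directly, with no retrieval loop in the hot path — which is where a latency win of the kind Pinecone claims would have to come from.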

Worth tracking if: you're running RAG-heavy agents in production and your latency or cost numbers don't pencil out. The architecture story is interesting whether or not Nexus specifically holds up. Read more.


PRIME NUMBER

21-36%

That's the share of compute cost AgentDiet shaves off multi-turn LLM agents while keeping task performance flat, which is roughly the difference between an agent product that pencils out and one that doesn't.

  • Technique: automatic pruning of useless, redundant, and expired steps from agent trajectories
  • Token impact: 39.9% to 59.7% reduction in input tokens per run
  • Source: peer-reviewed FSE 2026 paper from ByteDance and Peking University, presented May 5

via FSE 2026


WHAT ELSE IS SHIPPING

  • Pydantic AI v1.90.0 - native support for OpenAI's Conversations API so multi-turn agents stop hand-rolling their own context store, plus typed OpenTelemetry metadata for code tool calls.
  • LangGraph v1.2.0a6 - DeltaChannel beta cuts checkpoint write overhead, plus per-node timeouts, error handlers, graceful shutdown, and a Streaming API v3 with typed content-block events.
  • LangGraph v1.2.0a7 - quick follow-up adds a public get_writes_history saver API and a delta-cadence rework for checkpoint persistence.
  • CrewAI v1.14.5a2 - pre-release with task output restoration in finally blocks, corrected token counting, and a fix for shared LLM stop words mutating across agents.
  • Claude Code May 5 release - iTerm2 clipboard integration with tmux support, MCP server auto-retry on transient errors, terminal session title generation, and LSP diagnostic improvements.

Also from TinyIdeas Media