Agentic Business
2026-05-06
Anthropic ships vertical finance agents, GPT-5.5 Instant takes the default seat, and OpenAI rapid-fires two Agents SDK drops in half a day.
Mornin'. Somewhere in Stockholm, an AI agent named Mona just ordered 120 eggs for a kitchen with no stove and 22.5kg of canned tomatoes for "fresh" sandwiches, then emailed suppliers with the subject line "EMERGENCY." Meanwhile, Anthropic is over here insisting we hand these things our pitch decks and month-end close. Different week, same vibe: the demos get wilder, the templates get more serious, and somebody's accountant is sweating.
-Ben
In today's newsletter:
- Finance agents in a box
- GPT-5.5 Instant takes the wheel
- Two SDK drops in 12 hours
- Federated identity for Claude agents
- LangChain plugs a deserialization hole
FINANCE STACK
Anthropic stops selling shovels, starts selling mines
via Anthropic
For two years, frontier labs sold APIs and let everyone else figure out the workflow. Today Anthropic shipped the workflow.
The lab dropped 10 ready-to-run finance agent templates covering pitchbook building, KYC screening, and month-end close, available as plugins inside Claude Cowork and Claude Code, and as cookbooks for Claude Managed Agents. They paired it with a 64.37% score for Claude Opus 4.7 on Vals AI's Finance Agent benchmark.
The connector story is the louder half of the launch. Eight new data integrations landed alongside the templates, plus a Moody's MCP app that pulls from a dataset of 600M+ companies.
- Templates ship inside Claude Cowork, Claude Code, and Managed Agents cookbooks
- New connectors include Dun & Bradstreet, Verisk, Guidepoint, IBISWorld, plus Moody's via MCP
- Claude add-ins now span Excel, PowerPoint, Word, and Outlook
Why it matters: Vertical agent recipes from a frontier lab mean finance and insurance teams can clone working agents in days instead of stitching connectors together for a quarter. Read more.
MODEL SHUFFLE
GPT-5.5 Instant quietly slides into the driver's seat
via OpenAI
OpenAI didn't throw a launch event. It just swapped the engine while you were still driving.
GPT-5.5 Instant is now the default model in ChatGPT, and the headline number is a 52.5% drop in hallucinated claims compared to GPT-5.3 Instant on high-stakes prompts in medicine, law, and finance. It's also live in the API as chat-latest, with GPT-5.3 Instant scheduled to retire in three months.
If you've been pinning chat-latest in production, congratulations, you've been migrated. The accuracy claim is squarely aimed at the regulated-domain agents that teams have been holding back from real deployments.
- New default in ChatGPT for free and paid tiers
- Available in API as chat-latest
- GPT-5.3 Instant retires in three months
Why it matters: The "we can't ship agents into compliance-heavy domains because they hallucinate" excuse just got a fresh rebuttal you have to test against. Read more.
RAPID FIRE
OpenAI shipped its Agents SDK twice before lunch
via GitHub
If your dependabot got chatty this morning, that's why. OpenAI cut two Agents SDK releases in roughly 12 hours.
v0.15.2 introduced a first-class "context management model setting," giving long-running agents an explicit knob for context strategy instead of the duct-tape approach most teams have been bolting onto their loops.
v0.15.3 followed with MCP hardening: tool input schemas can no longer be mutated, non-object JSON gets rejected, duplicate tool registrations now error deterministically, and an audio-format-negotiation race condition got patched.
What this fixes in production
- That weird MCP bug where the same agent answered differently across pods? Probably duplicate tool registrations resolving differently from pod to pod.
- Long-running agents that ballooned past the context window now have a sanctioned strategy hook.
- Audio agents that occasionally negotiated the wrong codec are unstuck.
Why it matters: The MCP fixes remove a class of nondeterministic tool-call bugs that have been quietly biting production deployments all year. Read more.
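For a feel of what that hardening guards against, here is a minimal sketch of the same three checks in plain Python. It is not the Agents SDK's code: ToolRegistry, register, schema, and validate_call are hypothetical names invented for this illustration, and the real release may implement the checks quite differently.

```python
import copy
import json
from types import MappingProxyType


class ToolRegistry:
    """Hypothetical registry illustrating the three checks; not the SDK's API."""

    def __init__(self):
        self._schemas = {}

    def register(self, name, input_schema):
        # Duplicate names fail loudly instead of silently overwriting.
        if name in self._schemas:
            raise ValueError(f"tool already registered: {name}")
        # Keep a private deep copy so later edits to the caller's dict
        # cannot change what was registered. The proxy makes the top
        # level of the stored schema read-only.
        self._schemas[name] = MappingProxyType(copy.deepcopy(input_schema))

    def schema(self, name):
        return self._schemas[name]

    def validate_call(self, name, raw_json):
        if name not in self._schemas:
            raise KeyError(f"unknown tool: {name}")
        args = json.loads(raw_json)
        # Tool arguments must be a JSON object, not a list, string, or number.
        if not isinstance(args, dict):
            raise TypeError("tool arguments must be a JSON object")
        return args
```

With a registry like this, a second register("search", ...) raises immediately, so every pod either starts with the same tool table or fails at boot rather than at call time.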
IDENTITY UNLOCK
Anthropic's SDK finally speaks enterprise SSO
via GitHub
The single biggest reason regulated orgs couldn't put Claude agents into prod was static API keys. That blocker just got smaller.
anthropic-sdk-python v0.99.0 adds workspace targeting for OIDC federation token exchange. It builds on v0.98.0 from May 4, which introduced Workload Identity Federation, interactive OAuth, and auth profiles, plus updates to the Managed Agents API.
Translation for your CISO: agents can now authenticate via OIDC-issued tokens scoped to a specific workspace. No long-lived secrets sitting in env vars, no service accounts being shared across teams.
- Workspace-scoped OIDC token exchange in v0.99.0
- Workload Identity Federation and interactive OAuth shipped in v0.98.0
- Refreshed Managed Agents API surface
Why it matters: Per-workspace federated identity through the official SDK clears one of the last enterprise-procurement hurdles for Claude-based agents. Read more.
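If "token exchange" is new to you, the sketch below builds the request body defined by the generic OAuth 2.0 Token Exchange standard (RFC 8693). It does not use the Anthropic SDK and makes no claim about how the SDK does it: build_token_exchange_body is a hypothetical helper, and carrying the workspace in the audience field is an assumption made purely for illustration.

```python
from urllib.parse import urlencode

# Standard identifiers defined by RFC 8693.
TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
JWT_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:jwt"


def build_token_exchange_body(oidc_token, workspace_id):
    """Form-encode a token exchange request (hypothetical helper).

    Assumption: the target workspace is passed as `audience`. The real
    service may expect a different parameter entirely.
    """
    return urlencode({
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": oidc_token,
        "subject_token_type": JWT_TOKEN_TYPE,
        "audience": workspace_id,
    })
```

The point for operators: the workload presents a short-lived token from its own identity provider and gets a scoped credential back, so nothing long-lived has to sit in an env var.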
PATCH TUESDAY
LangChain plugs the hole the security crowd has been circling
via GitHub
If you persist agents to disk and rehydrate them later, stop reading and go upgrade. We'll wait.
langchain-core 1.3.3 and langchain 0.3.29 landed as security releases. The load() path is hardened against untrusted manifests, and langchain.storage._lc_store now restricts deserialization.
This closes a remote-code-execution surface that attack research has been circling all year. Teams that serialize chains or agents into Redis, S3, or Postgres and rehydrate them at runtime are the exact target shape.
- langchain-core 1.3.3 hardens the core load path
- langchain 0.3.29 restricts deserialization in the storage backend
- Pin bumps belong in your next deploy, not next sprint
Why it matters: If your agent infra rehydrates serialized state from any persistence layer, this is the upgrade you don't get to defer. Read more.
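To see why "restricts deserialization" matters, here is a minimal sketch of the general allowlist pattern for rehydrating stored objects. It is not LangChain's implementation: PromptStep, RetrieverStep, ALLOWED_TYPES, load_step, and the manifest shape are all hypothetical, chosen only to show the idea that stored data may name a type but never choose arbitrary code.

```python
import json
from dataclasses import dataclass


@dataclass
class PromptStep:
    template: str


@dataclass
class RetrieverStep:
    index_name: str
    top_k: int = 4


# Only types listed here can ever be constructed from stored data.
ALLOWED_TYPES = {"prompt": PromptStep, "retriever": RetrieverStep}


def load_step(manifest_json):
    """Rebuild a step from an untrusted JSON manifest (hypothetical format)."""
    manifest = json.loads(manifest_json)
    if not isinstance(manifest, dict):
        raise ValueError("manifest must be a JSON object")
    type_name = manifest.get("type")
    # Reject anything that is not an explicitly allowed type name.
    if not isinstance(type_name, str) or type_name not in ALLOWED_TYPES:
        raise ValueError(f"type not allowed: {type_name!r}")
    kwargs = manifest.get("kwargs", {})
    if not isinstance(kwargs, dict):
        raise ValueError("kwargs must be a JSON object")
    return ALLOWED_TYPES[type_name](**kwargs)
```

A manifest that names anything outside the allowlist is refused before any object is built, which is the property you want when the bytes in Redis, S3, or Postgres could have been written by someone else.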
WHAT ELSE IS SHIPPING
What else is shipping
- Pydantic-AI v1.90.0 - adds openai_conversation_id for OpenAI Conversations API state, typed OTel metadata, and bumps the chat UI to 1.2.0.
- LangGraph SDK 0.3.14 plus langgraph-checkpoint-sqlite 3.1.0a1 - adds return_minimal on thread updates and a streaming-walk delta channel history for SQLite checkpoints.
- DSPy 3.2.1 - drops the litellm upper-bound pin and fixes async streaming custom-header forwarding plus per-call embedder caching.
- datasette-llm 0.1a7 - per-model default options (think temperature) across Datasette's LLM plugins.
- llm-echo 0.5a0 - the deterministic echo provider for the llm CLI, handy as a stub when you're testing agent and tool pipelines.
- Agent 365 May 2026 update - Microsoft adds Purview AI Observability in DSPM, agent identity, and runtime controls.
INTERESTING CONVERSATIONS
Interesting conversations we're following
- Agents can now create Cloudflare accounts, buy domains, and deploy on Hacker News - 434 points and the top reply calls the "$100/mo payment token" pitch a toy looking for a use case beyond spam. The agents-with-a-credit-card infra arrived before the agents-with-a-job did.
- Computer Use is 45x more expensive than structured APIs on Hacker News - Reflex benchmark shows vision agents burn 551k tokens over 17 minutes vs 12k tokens in 20 seconds via API. The meta-take: companies are writing real specs again because agents demand them.
- Accelerating Gemma 4 with multi-token prediction drafters on Hacker News - Qwen 3.6 27B going from 20 to 46-55 tok/s on consumer GPUs via MTP speculative decoding, with llama.cpp and Ollama integrations on deck.
- Our AI started a cafe in Stockholm on Simon Willison's blog - Andon Labs' "Mona" agent ordered 120 eggs for a stoveless kitchen and 22.5kg of canned tomatoes for "fresh" sandwiches. Willison argues these stunts steal time from non-consenting suppliers.
- DeepSeek-TUI tops GitHub trending on GitHub - +6,184 stars in a day for a Rust terminal coding agent wired to DeepSeek, flanked on the trending list by ruflo (multi-agent Claude orchestrator) and ByteDance deer-flow.
- "claude code is not making your product better" on Lobsters - 31 upvotes and 15 comments of practitioner skepticism on a beat that's been mostly vendor optimism. Worth the read for the counter-narrative.