Agentic Business
2026-05-11
Frontier labs go industrial, while an agentic security audit gets its first honest public scorecard.
Mornin'. The curl maintainer spent his Monday grading Anthropic's homework, and the homework came back covered in red pen. Mythos read 178,000 lines of curl, declared five "confirmed" vulnerabilities, and after human triage exactly one held up. That is less a model problem than a triage-cost problem, and it is the cleanest number we have for what agentic code review actually costs you per finding.
-Ben
In today's newsletter:
- OpenAI's $10B FDE factory
- Mythos vs. curl: 1-for-5
- Same-day Agents SDK patch
- Anthropic's new AWS lane
- Ads land inside ChatGPT
DEPLOYMENT INC
OpenAI just turned "Forward Deployed Engineer" into a $10B company
via OpenAI
Palantir spent a decade proving that the real product is the engineer sitting next to the customer. OpenAI just stapled that idea to a frontier model and gave it a balance sheet.
The new OpenAI Deployment Company is a majority-OpenAI joint venture, seeded with more than $4B from TPG, Bain Capital, Brookfield, McKinsey, Capgemini and 14 other backers. To staff it on day one, OpenAI is acquiring Tomoro and folding in roughly 150 Forward Deployed Engineers.
Translation: the people who actually get models into a Fortune 500 are no longer a cost center inside sales. They are a PE-backed services arm, co-owned by the same consultancies that already live inside enterprise IT budgets.
- $10B target valuation, OpenAI keeps majority control.
- Capital from TPG, Bain, Brookfield, McKinsey, Capgemini, plus 14 others.
- Tomoro acquisition brings ~150 FDEs in on day one.
Why it matters: If you build agents for enterprises, your new competition (and possibly your new channel) is a McKinsey partner with an OpenAI badge. Read more.
AGENT AUDIT
Mythos read all of curl and found one real bug
via daniel.haxx.se
Daniel Stenberg has been triaging curl bug reports since before most agents could spell "buffer overflow." So when Anthropic pointed Mythos at his codebase, it drew the toughest possible grader.
Mythos crawled 178,000 lines of C, flagged five "confirmed" security vulnerabilities, and shipped them over for review. After the curl security team did the work, the verdict landed: "yes, as in singular one." A single low-severity CVE, plus about 20 ancillary bugs that were real but not security issues.
This is the cleanest public number we have on agentic code-audit precision in the wild: top-tier model, famous codebase, adversarial human reviewer, 1-of-5 true-positive rate on the security claims.
- 178,000 lines of curl analyzed in one pass.
- 5 "confirmed" vulnerabilities reported, 1 held up as a real CVE.
- ~20 ancillary, non-security bugs surfaced as a side effect.
Why it matters: If you are wiring agents into a SAST pipeline, the bottleneck is not model capability, it is the human hour-cost of triaging four false positives for every real one. Read more.
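For operators who want to put a number on that, here is a minimal back-of-envelope sketch. The 5 reported and 1 confirmed come from the curl write-up; the hours-per-report figure and the function names are hypothetical placeholders, so swap in your own team's numbers.

```python
import math


def precision(reported: int, confirmed: int) -> float:
    """Share of reported findings that survived human triage."""
    return confirmed / reported if reported else 0.0


def triage_hours_per_real_finding(
    reported: int, confirmed: int, hours_per_report: float
) -> float:
    """Human hours spent per finding that holds up.

    Assumes every report costs the same to triage, real or not.
    """
    if confirmed == 0:
        return math.inf  # all triage time was spent on false positives
    return reported * hours_per_report / confirmed


REPORTED, CONFIRMED = 5, 1   # security claims in the curl audit
HOURS_PER_REPORT = 2.0       # hypothetical, not from the source

print(f"precision: {precision(REPORTED, CONFIRMED):.0%}")
print(
    "hours per real finding:",
    triage_hours_per_real_finding(REPORTED, CONFIRMED, HOURS_PER_REPORT),
)
```

At a 1-of-5 hit rate, whatever a single report costs to triage, the cost per real finding is five times that.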
SAME-DAY PATCH
OpenAI Agents SDK v0.17.1 closes three classes of prod-breaker
via GitHub
Agents SDK shipped a release, then shipped a patch on the same calendar day. That is usually a tell that something was on fire in production.
v0.17.1 hits almost every subsystem. Sandbox providers now surface errors instead of swallowing them, archive extraction is bounded so a hostile zip cannot blow out your disk, and tracing finally shuts down gracefully instead of dropping spans on exit.
The Realtime side gets the bigger glow-up: max_output_tokens is now exposed, tool-approval scoping is tighter, and the parallel tool-call path no longer duplicates content when two tools fire at once. Session retrieval against MongoDB and Redis is now corruption-resistant.
- Sandbox: archive-extraction limits + visible provider errors.
- Realtime: scoped tool approvals, exposed token caps, fixed tool-call iterator.
- Sessions: corruption-resistant retrieval on MongoDB and Redis.
Why it matters: If you run Agents SDK in prod, this is a no-brainer upgrade: it closes a sandbox safety gap and stops two real, observable failure modes in Realtime. Read more.
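If "bounded archive extraction" is abstract, the general pattern looks roughly like the sketch below. This is not the Agents SDK's implementation: the function name, the exception, and both limits are hypothetical, and it only uses Python's standard zipfile module (Python 3.9+). The idea is to count bytes as they are actually written instead of trusting the sizes declared in the archive header.

```python
import zipfile
from pathlib import Path


class ArchiveLimitError(Exception):
    """Raised when an archive breaks one of the extraction limits."""


def safe_extract(
    archive_path,
    dest_dir,
    max_total_bytes: int = 100 * 1024 * 1024,  # hypothetical cap: 100 MiB
    max_members: int = 1_000,                  # hypothetical cap on entries
) -> int:
    """Extract a zip with size, count, and path limits; return bytes written."""
    dest = Path(dest_dir).resolve()
    written = 0
    with zipfile.ZipFile(archive_path) as zf:
        members = zf.infolist()
        if len(members) > max_members:
            raise ArchiveLimitError("too many entries in archive")
        for info in members:
            target = (dest / info.filename).resolve()
            # Reject entries such as "../../etc/passwd" that escape dest.
            if not target.is_relative_to(dest):
                raise ArchiveLimitError(f"unsafe path: {info.filename}")
            if info.is_dir():
                target.mkdir(parents=True, exist_ok=True)
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            with zf.open(info) as src, open(target, "wb") as out:
                # Stream in chunks and enforce the cap on real output size.
                while chunk := src.read(64 * 1024):
                    written += len(chunk)
                    if written > max_total_bytes:
                        raise ArchiveLimitError("extracted size exceeds limit")
                    out.write(chunk)
    return written
```

One caveat with this sketch: when a limit trips, files already written stay on disk, so a production version would likely extract into a scratch directory and delete it on failure.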
CLOUD CHANNELS
Anthropic gives AWS shops a direct lane that is not Bedrock
via GitHub
For two years, "Claude on AWS" effectively meant "Claude through Bedrock." That changed quietly in a Python SDK release note.
anthropic-sdk-python v0.101.0 introduces a first-class AWS client for Claude Platform on AWS, a native access path that sits alongside (not inside) the Bedrock integration.
It is a small surface change with a big procurement implication: Anthropic is hardening its enterprise AWS story without making Bedrock the only door.
- New AWS client distinct from the Bedrock route.
- Vendor-supported, first-class SDK surface.
- Cleaner upgrade path for teams standardized on AWS.
Why it matters: Fewer abstraction layers between your agent and the model, and one less reason to argue with your AWS rep about which catalog SKU to expense. Read more.
ADS ARRIVE
ChatGPT becomes an ad surface
The most-used consumer LLM is no longer a pure subscription product. OpenAI confirmed it has started testing ads inside ChatGPT, with formats, objectives, and ranking rules "evolving over time."
The detail is thin on purpose. The structural shift is not: there is now an ad auction sitting somewhere between the user prompt and the model response, and every custom GPT, GPT Store listing, and Apps SDK integration is going to live next to it.
The economics of being a third-party agent on ChatGPT just got more complicated, in the same way the economics of being a third-party app on Google did around 2009.
- Ads now in active testing inside ChatGPT.
- Formats, objectives, and capabilities still TBD.
- No public eligibility or attribution rules yet for third-party agents.
Why it matters: If your distribution plan was "ship a GPT and let ChatGPT route users," you now have a second variable to model: ad ranking. Read more.
WHAT ELSE IS SHIPPING
- Can I Use Agents - a "caniuse"-style compatibility matrix for MCP, tool use, and sub-agents across Claude Code, Cursor, Aider, Copilot, and Codex.
- BrowserCode - Claude Code compiled to WebAssembly and running in the browser, a clean preview of the sandboxed agent-IDE pattern.
- SLayer - a semantic layer maintained by agents that explore models, run queries, and evolve DB connections.
- SkillOS - skill-tree curation for agents that accumulate reusable skills across sessions.
INTERESTING CONVERSATIONS
Interesting conversations we're following
- An AI coding agent needs to reduce your maintenance costs on Hacker News - 280 points in 16 hours arguing agent ROI is a maintenance-delta question, not a lines-shipped one. The HN frame on coding agents is shifting.
- Local AI needs to be the norm on Hacker News - 1,565 points and 608 comments, the biggest agentic-adjacent thread of the day, debating whether Gemma 4 31B on a 3080 is "good enough."
- Running local models on an M4 with 24GB on Hacker News - concrete model and quant picks for what actually fits and runs usably on Apple silicon.
- Claude as a userspace IP stack: how fast does it ping? on Hacker News - someone wired Claude in as a TCP/IP stack and benchmarked it. A useful sanity check on agent-as-runtime latency floors.
- UI-TARS-desktop trending #1 on GitHub Trending - ByteDance's open-source multimodal computer-use agent at 32.8k stars, the OSS counterpart to closed Anthropic and OpenAI demos.
- Learning on the shop floor on simonwillison.net - Simon Willison on Shopify's "River" agent being pinned to public Slack, so every action is searchable and colleagues can learn from it by osmosis.