Agentic Business

Issue #6 · 12 min read · By Ben

Frontier labs go industrial, while an agentic security audit gets its first honest public scorecard.

Mornin'. The curl maintainer spent his Monday grading Anthropic's homework, and the homework came back covered in red ink. Mythos read 178,000 lines of curl, declared five "confirmed" vulnerabilities, and after real human triage exactly one held up. That one-in-five hit rate is less a model problem than a triage-cost problem, and it is the cleanest number we have for what agentic code review actually costs you per finding.

-Ben

In today's newsletter:

OpenAI's $10B FDE factory
Mythos vs. curl: 1-for-5
Same-day Agents SDK patch
Anthropic's new AWS lane
Ads land inside ChatGPT

DEPLOYMENT INC

OpenAI just turned "Forward Deployed Engineer" into a $10B company

via OpenAI

Palantir spent a decade proving that the real product is the engineer sitting next to the customer. OpenAI just stapled that idea to a frontier model and gave it a balance sheet.

The new OpenAI Deployment Company is a majority-OpenAI joint venture, seeded with more than $4B from TPG, Bain Capital, Brookfield, McKinsey, Capgemini and 14 other backers. To staff it on day one, OpenAI is acquiring Tomoro and folding in roughly 150 Forward Deployed Engineers.

Translation: the people who actually get models into a Fortune 500 are no longer a cost center inside sales. They are a PE-backed services arm, co-owned by the same consultancies that already live inside enterprise IT budgets.

$10B target valuation, OpenAI keeps majority control.
Capital from TPG, Bain, Brookfield, McKinsey, Capgemini, plus 14 others.
Tomoro acquisition brings ~150 FDEs in on day one.

Why it matters: If you build agents for enterprises, your new competition (and possibly your new channel) is a McKinsey partner with an OpenAI badge. Read more.

AGENT AUDIT

Mythos read all of curl and found one real bug

via daniel.haxx.se

Daniel Stenberg has been triaging curl bug reports since before most agents could spell "buffer overflow." So when Anthropic pointed Mythos at his codebase, he was the toughest possible grader.

Mythos crawled 178,000 lines of C, flagged five "confirmed" security vulnerabilities, and shipped them over for review. After the curl security team did the work, the verdict landed: "yes, as in singular one." A single low-severity CVE, plus about 20 ancillary bugs that were real but not security issues.

This is the cleanest public number we have on agentic code-audit precision in the wild: top-tier model, famous codebase, adversarial human reviewer, and only 1 of 5 security claims confirmed (20% precision).

178,000 lines of curl analyzed in one pass.
5 "confirmed" vulnerabilities reported, 1 held up as a real CVE.
~20 ancillary, non-security bugs surfaced as a side effect.

Why it matters: If you are wiring agents into a SAST pipeline, the bottleneck is not model capability, it is the human hour-cost of triaging four false positives for every real one. Read more.
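To put a rough number on that triage cost, here is a minimal sketch of the arithmetic. The 5 reported and 1 confirmed come from the curl review; the two hours of reviewer time per report is a purely hypothetical figure, so swap in your own.

```python
def triage_summary(reported: int, confirmed: int, hours_per_report: float) -> dict:
    """Summarize precision and reviewer cost for a batch of agent findings."""
    if reported <= 0 or not 0 <= confirmed <= reported:
        raise ValueError("need reported > 0 and 0 <= confirmed <= reported")

    precision = confirmed / reported             # share of claims that held up
    false_positives = reported - confirmed       # claims a human had to reject
    total_hours = reported * hours_per_report    # every claim gets triaged
    # Reviewer hours spent per real finding; infinite if nothing held up.
    hours_per_real = total_hours / confirmed if confirmed else float("inf")

    return {
        "precision": precision,
        "false_positives": false_positives,
        "total_hours": total_hours,
        "hours_per_real_finding": hours_per_real,
    }


# 5 reported and 1 confirmed are from the curl review.
# hours_per_report=2.0 is an assumption for illustration only.
summary = triage_summary(reported=5, confirmed=1, hours_per_report=2.0)
print(summary)
```

Under that assumption, every real finding costs the reviewer five reports' worth of time, which is the number to compare against your own security team's hourly budget.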

SAME-DAY PATCH

OpenAI Agents SDK v0.17.1 closes three classes of prod-breaker

via GitHub

The Agents SDK shipped a release, then shipped a patch on the same calendar day. That is usually a tell that something was on fire in production.

v0.17.1 hits almost every subsystem. Sandbox providers now surface errors instead of swallowing them, archive extraction is bounded so a hostile zip cannot blow out your disk, and tracing finally shuts down gracefully instead of dropping spans on exit.

The Realtime side gets the bigger glow-up: max_output_tokens is now exposed, tool-approval scoping is tighter, and the parallel tool-call path no longer duplicates content when two tools fire at once. Session retrieval against MongoDB and Redis is now corruption-resistant.

Sandbox: archive-extraction limits + visible provider errors.
Realtime: scoped tool approvals, exposed token caps, fixed tool-call iterator.
Sessions: corruption-resistant retrieval on MongoDB and Redis.

Why it matters: If you run Agents SDK in prod, this is a no-brainer upgrade: it closes a sandbox safety gap and stops two real, observable failure modes in Realtime. Read more.
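For readers who unpack untrusted archives in their own sandboxes, this is roughly what "bounded extraction" can look like. It is a generic sketch using Python's standard zipfile module, not the Agents SDK's actual implementation; the function name, exception, and both limits are hypothetical.

```python
import os
import zipfile

MAX_MEMBERS = 1_000                  # hypothetical cap on entries per archive
MAX_TOTAL_BYTES = 50 * 1024 * 1024   # hypothetical cap on bytes written


class ArchiveTooLarge(Exception):
    """Raised when an archive exceeds the configured extraction limits."""


def safe_extract(archive, dest_dir, max_members=MAX_MEMBERS,
                 max_total_bytes=MAX_TOTAL_BYTES) -> int:
    """Extract a zip while capping entry count and total bytes written."""
    dest_root = os.path.realpath(dest_dir)
    written = 0
    with zipfile.ZipFile(archive) as zf:
        members = zf.infolist()
        if len(members) > max_members:
            raise ArchiveTooLarge(f"{len(members)} entries exceeds {max_members}")
        for info in members:
            target = os.path.realpath(os.path.join(dest_root, info.filename))
            # Reject entries that would land outside the destination directory.
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe path in archive: {info.filename}")
            if info.is_dir():
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with zf.open(info) as src, open(target, "wb") as dst:
                # Count bytes as they decompress; header sizes can be forged.
                while True:
                    chunk = src.read(64 * 1024)
                    if not chunk:
                        break
                    written += len(chunk)
                    if written > max_total_bytes:
                        # Note: the partially written file is left on disk.
                        raise ArchiveTooLarge(f"more than {max_total_bytes} bytes")
                    dst.write(chunk)
    return written
```

The key design choice is counting bytes while streaming rather than trusting the sizes declared in the archive headers, since a hostile zip controls those.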

CLOUD CHANNELS

Anthropic gives AWS shops a direct lane that is not Bedrock

via GitHub

For two years, "Claude on AWS" effectively meant "Claude through Bedrock." That changed quietly in a Python SDK release note.

anthropic-sdk-python v0.101.0 introduces a first-class AWS client for Claude Platform on AWS, a native access path that sits alongside (not inside) the Bedrock integration.

It is a small surface change with a big procurement implication: Anthropic is hardening its enterprise AWS story without making Bedrock the only door.

New AWS client distinct from the Bedrock route.
Vendor-supported, first-class SDK surface.
Cleaner upgrade path for teams standardized on AWS.

Why it matters: Fewer abstraction layers between your agent and the model, and one less reason to argue with your AWS rep about which catalog SKU to expense. Read more.

ADS ARRIVE

ChatGPT becomes an ad surface

The most-used consumer LLM is no longer a pure subscription product. OpenAI confirmed it has started testing ads inside ChatGPT, with formats, objectives, and ranking rules "evolving over time."

The detail is thin on purpose. The structural shift is not: there is now an ad auction sitting somewhere between the user prompt and the model response, and every custom GPT, GPT Store listing, and Apps SDK integration is going to live next to it.

The economics of being a third-party agent on ChatGPT just got more complicated, in the same way the economics of being a third-party app on Google did around 2009.

Ads now in active testing inside ChatGPT.
Formats, objectives, and capabilities still TBD.
No public eligibility or attribution rules yet for third-party agents.

Why it matters: If your distribution plan was "ship a GPT and let ChatGPT route users," you now have a second variable to model: ad ranking. Read more.

WHAT ELSE IS SHIPPING

What else is shipping

Can I Use Agents - a "caniuse"-style compatibility matrix for MCP, tool use, and sub-agents across Claude Code, Cursor, Aider, Copilot, and Codex.
BrowserCode - Claude Code compiled to WebAssembly and running in the browser, a clean preview of the sandboxed agent-IDE pattern.
SLayer - a semantic layer maintained by agents that explore models, run queries, and evolve DB connections.
SkillOS - skill-tree curation for agents that accumulate reusable skills across sessions.

INTERESTING CONVERSATIONS

Interesting conversations we're following

"An AI coding agent needs to reduce your maintenance costs" on Hacker News - 280 points in 16 hours arguing agent ROI is a maintenance-delta question, not a lines-shipped one. The HN frame on coding agents is shifting.
"Local AI needs to be the norm" on Hacker News - 1,565 points and 608 comments, the biggest agentic-adjacent thread of the day, debating whether Gemma 4 31B on a 3080 is "good enough."
"Running local models on an M4 with 24GB" on Hacker News - concrete model and quant picks for what actually fits and runs usably on Apple silicon.
"Claude as a userspace IP stack: how fast does it ping?" on Hacker News - someone wired Claude in as a TCP/IP stack and benchmarked it. A useful sanity check on agent-as-runtime latency floors.
UI-TARS-desktop trending #1 on GitHub Trending - ByteDance's open-source multimodal computer-use agent at 32.8k stars, the OSS counterpart to closed Anthropic and OpenAI demos.
"Learning on the shop floor" on simonwillison.net - Simon Willison on Shopify's "River" agent being pinned to public Slack so every action becomes searchable osmosis learning.
