Agentic Builders

Issue #5 · 12 min read · By Ben

SDKs and MCP servers got loud today, and Mozilla quietly dropped the headline number of the year.

Mornin'. Mozilla apparently pointed Claude Mythos preview at the Firefox codebase last month and watched monthly security fixes jump from roughly 25 to 423. That is not a typo, and it is not a vendor blog. If you've been waffling on whether to let an agent loose on your CVE backlog, your activation energy just got a lot lower. The bugs, per Simon, are very good.

-Ben

In today's newsletter:

Cap Claude's tool-use spend
Microsoft's first-party Azure MCP
Realtime flips, sandbox tightens
LangChain backports a CVE

BUDGET CAPS

Pydantic AI hands you a kill switch for Claude tool calls

via GitHub

Capping how much an agent burns per turn used to require duct tape and a callback. Pydantic AI v1.92.0, cut at 01:18 UTC this morning, finally puts a knob on it.

The headline change is first-class support for Anthropic's task budget parameter, which lets you bound a single Claude run's tool-use spend at the SDK level instead of inside your own control flow. There are two other notes worth your eye: the new runtime output_retries override deprecates the old retries argument, and the release fixes streaming-response cleanup on cancellation plus a few MCP session lifecycle bugs.

v1.92.0 adds Anthropic task budget support, exposed as a per-run cap on Claude tool spend
runtime output_retries override replaces the old retries argument
streaming cancellation and MCP session handling bugs squashed

Why it matters: if you run Pydantic AI against Claude in production, this is the first cost-bounding lever you can wire in without rewriting your agent loop.
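
For a sense of what the release replaces, the duct-tape version of a per-run cap tends to look something like this. It is a minimal sketch in plain Python, not Pydantic AI's or Anthropic's API: RunBudget, BudgetExceeded, run_tools and the cost units are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a call would push a run past its cap (hypothetical name)."""


@dataclass
class RunBudget:
    """Tracks tool-use spend for one agent run (illustrative only)."""

    limit: float        # maximum spend allowed for this run, in any unit
    spent: float = 0.0  # running total so far

    def charge(self, cost: float) -> None:
        # Refuse the call up front if it would cross the cap.
        if self.spent + cost > self.limit:
            raise BudgetExceeded(
                f"call costing {cost} would exceed cap {self.limit} "
                f"(already spent {self.spent})"
            )
        self.spent += cost


def run_tools(costs: list[float], budget: RunBudget) -> int:
    """Simulate tool calls in order and return how many completed."""
    completed = 0
    for cost in costs:
        try:
            budget.charge(cost)
        except BudgetExceeded:
            break  # the kill switch: stop the loop, keep partial results
        completed += 1
    return completed
```

With a cap of 10 and three calls costing 4 each, the third call is refused and the run stops with 8 spent. The appeal of an SDK-level parameter is that this bookkeeping no longer lives in your own loop.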

INFRA AGENTS

Microsoft ships a first-party MCP server for Azure infra

via TECHCOMMUNITY.MICROSOFT.COM

Microsoft's been quietly shipping MCP servers for everything but Azure proper. As of this week, that gap is closed.

The Azure Resource Manager MCP server entered public preview, giving any MCP-aware agent a first-party endpoint for Azure Resource Graph queries plus the full ARM template deployment lifecycle. It is deliberately separate from the existing Azure MCP Server, scoped specifically to infrastructure operations: resource discovery, compliance checks, deployment kickoff and monitoring.

Auth flows through your Azure tenant, so IAM and RBAC apply the way you would expect. Install link is at aka.ms/JoinARMMCP.

public preview, remote MCP server, owned by Microsoft
covers Azure Resource Graph queries and ARM deployment lifecycle
IAM and RBAC inherited from your Azure tenant, no community shim required

Why it matters: agents wired into Copilot or Claude Code can now do real Azure infra work through Microsoft's own pipe, with permissions tied to the tenant instead of a side-channel token.

SDK CHURN

openai-agents-python v0.17.0 flips Realtime defaults and tightens the sandbox

via GitHub

Less than 24 hours after v0.16.1 mopped up a flurry of footguns, OpenAI's Agents SDK shipped a minor release that is really a behavior change: treat it as a deliberate pin bump, not an automatic upgrade.

v0.17.0 flips RealtimeAgent's default model to gpt-realtime-2 and narrows the sandbox: local source materialization now confines reachable files to the base directory unless you explicitly grant more. There is also a fix for a Responses context-management parameter collision.

What you'll feel

RealtimeAgent default flips to gpt-realtime-2, matching the openai-python v2.36.0 release
sandbox no longer materializes sources outside the base directory by default
Responses context-management parameter collision fixed

Why it matters: if your code depends on the old Realtime default or on the sandbox seeing files above its base directory, this is not a drop-in bump. Read the notes first.
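
If you want to audit which of your paths would fall outside a base directory before bumping, a containment check is easy to sketch. This is not the SDK's implementation, just a minimal illustration of the general idea using the standard library; is_within_base is a hypothetical helper name.

```python
from pathlib import Path


def is_within_base(base_dir: str, candidate: str) -> bool:
    """Return True if candidate resolves to a path inside base_dir."""
    base = Path(base_dir).resolve()
    # Join first, then resolve, so ".." segments and symlinks are
    # collapsed before the containment check runs. An absolute
    # candidate replaces the base entirely, so it is checked as-is.
    target = (base / candidate).resolve()
    return target == base or base in target.parents
```

Comparing resolved paths rather than raw strings is the point: a prefix check on strings would wrongly accept a sibling like /tmp/project-old when the base is /tmp/project.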

SECURITY PATCH

langchain-core 0.3.86 backports a path-traversal CVE

via GitHub

If you are still pinned to langchain-core 0.3.x because the 1.x migration sticker shock is real, you have a security pull to do this morning.

langchain-core 0.3.86, paired with langchain 0.3.30, backports CVE-2026-34070 (path traversal) plus the loads / dumps hardening from the 1.x line. The release also cleans up hub deprecation paths.

The 1.x branch already shipped the fix. The 0.3.x branch did not, until yesterday.

CVE-2026-34070 path-traversal fix backported from 1.x
loads / dumps hardening picked up alongside
hub deprecation paths cleaned up

Why it matters: plenty of production stacks are still on 0.3.x because the 1.x migration is not free. This is a must-pull, not an optional one.
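
A quick way to check whether a given 0.3.x pin predates the backport is a numeric version comparison. The sketch below is an assumption-laden helper, not anything shipped by LangChain: parse_version and needs_backport are hypothetical names, and it only understands plain dotted versions such as 0.3.86 (no rc or dev suffixes).

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a plain dotted version like '0.3.86' into (0, 3, 86)."""
    return tuple(int(part) for part in version.split("."))


def needs_backport(installed: str, patched: str = "0.3.86") -> bool:
    """True if installed is on the same minor line and older than patched."""
    current = parse_version(installed)
    fixed = parse_version(patched)
    # Only the patched release's own minor line is judged here; anything
    # else is rejected rather than silently reported as safe.
    if current[:2] != fixed[:2]:
        raise ValueError(
            f"{installed} is not on the {fixed[0]}.{fixed[1]}.x line"
        )
    # Tuple comparison is numeric, so 0.3.9 correctly sorts before 0.3.86.
    return current < fixed
```

In a real environment you could feed it the installed version from importlib.metadata.version("langchain-core"). The explicit ValueError for other release lines is a deliberate choice: this section only states which 0.3.x release carries the fix, so the helper refuses to guess about anything else.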

FOOTGUN OF THE WEEK

The footgun

Eight major public agent benchmarks can be gamed to roughly 100% by exploiting reward-signal leaks rather than actually solving the task.

How it manifests

Berkeley's RDI lab posted a finding (re-circulating heavily this week) that SWE-bench, WebArena, OSWorld, GAIA, Terminal-Bench, FieldWorkArena, CAR-bench and one more all yield to reward hacking. If you pick a framework or a model from a leaderboard number, you may be selecting for the team that hard-coded most aggressively around the eval, not the agent that actually works on your problem. Vendor blog posts citing single-number scores are the worst offenders here.

How to avoid it

Re-run any benchmark you cite internally with a held-out task variant the public leaderboard never sees. For your own internal evals, assume your reward signal will be hacked by your own agents within weeks of deployment, and design adversarial probes in from day one.

via Berkeley RDI
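
One cheap way to act on the held-out advice is to compare pass rates on the public tasks against your private variants and flag a large gap. A minimal sketch, assuming you already have per-task pass/fail results as booleans; pass_rate, leak_suspected and the 0.2 default threshold are hypothetical choices for illustration, not anything from the Berkeley write-up.

```python
def pass_rate(results: list[bool]) -> float:
    """Fraction of tasks passed; an empty run counts as 0.0."""
    return sum(results) / len(results) if results else 0.0


def leak_suspected(
    public: list[bool],
    held_out: list[bool],
    max_gap: float = 0.2,  # arbitrary threshold, tune for your own evals
) -> bool:
    """Flag a run whose public score beats its held-out score by > max_gap."""
    return pass_rate(public) - pass_rate(held_out) > max_gap
```

A perfect public score paired with a 40% held-out score trips the flag; an 80% versus 70% split does not. A gap is only a hint that something is being gamed, so treat a positive as a reason to inspect trajectories, not as proof.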

WHAT ELSE IS SHIPPING

openai-agents-python v0.16.1 - stabilizes chat-completions stream output indexes, validates MCP require_approval policies, restores session history after compaction-replacement failures, rejects corrupt Dapr session state.
langchain 0.3.30 - paired security backport with the langchain-core 0.3.86 CVE fix.
langgraph-cli 0.4.25 - adds studio deploy for one-command push from a local LangGraph project to LangGraph Studio.
openai-python v2.36.0 - manual updates plus gpt-realtime-2 support in the official Python SDK, matching the Agents SDK default flip.
llm-gemini 0.31 - plugin update for Simon Willison's llm CLI marking Gemini 3.1 Flash-Lite as GA.
jj v0.41.0 - new release of the Jujutsu VCS that's increasingly popular in Claude Code and agent-coding workflows.
Mojo v1.0.0b1 - first 1.0 beta of Modular's AI-systems language; quiet thread on Lobsters but a milestone tag.
Mozilla x Claude Mythos: 423 Firefox security fixes in April - Firefox security bug fixes jumped from between 20 and 30 a month to 423 once Mozilla pointed Claude Mythos preview at the codebase. Worth a read on harness design for security agents.

INTERESTING CONVERSATIONS

Agents need control flow, not more prompts on Hacker News - 507 points, 250 comments on the framing fight of the moment: code-as-orchestrator vs LLM-as-orchestrator.
AlphaEvolve: Gemini-powered coding agent scaling impact across fields on Hacker News - 307 points, 132 comments dissecting DeepMind's autonomous-coding-agent multi-domain results.
AI slop is killing online communities on Hacker News - 733 points, 624 comments. The meta-conversation engineers are having about what their own tools are doing to the open web.
How to make SSE token streams resumable, cancellable, and multi-device on Hacker News - practitioner writeup on the unglamorous infra under any chat or agent UI.
addyosmani/agent-skills on GH Trending (Python) - surging at +1,794 stars today (around 34.2k total), reusable skill packs riding the Claude Code skills wave.
Fission-AI/OpenSpec on GH Trending (TS) - spec-driven workflow for AI coding assistants trending hard at around 46k stars; the spec, not the prompt, is the primary artifact.