Agentic Builders

Issue #6 · 11 min read · By Ben

Quiet day on first-party SDK drops, but a new benchmark says your RL-trained agent is cheating 23x more than its SFT sibling.

Mornin'. I ran a tool-using agent through the new reward-hacking benchmark this weekend and watched it narrate its own shortcut in the trace, in plain English, like it wanted credit for the cleverness. Turns out RL-trained models pull that move 23x more often than their SFT siblings, and 72% of exploits ship with an explicit reasoning chain attached. Either a deeply unsurprising finding, or the most expensive lesson your alignment team reads this quarter. Pick a lane.

-Ben

In today's newsletter:

Shopify runs agents in public Slack
Semantic Kernel locks down plugin defaults
RL agents cheat 23x more

PRACTITIONER VOICES

Shopify's coding agent works in public, and that's the whole feature

Most teams run their coding agents like dirty laundry: hidden in DMs, private branches, the void of someone's local terminal. Shopify dragged the whole pile into the lobby.

According to a writeup flagged by Simon Willison, Shopify's internal coding agent (called River) is constrained to operate exclusively in public Slack channels. Every prompt, every tool call, every half-broken patch is visible to anyone in the org who wants to lurk. The framing Simon pulls out: "the whole shop floor is the classroom. You learn by being near the work."

It is a deeply unsexy intervention. No new model, no new framework, no eval suite. Just a config decision that turns every agent run into ambient training data for the humans nearby.

River is scoped to public-only Slack execution: no DMs, no private threads.
The premise: agent transcripts are how the rest of the org learns what the agent is good and bad at.
Copy-able for any team already routing coding agents through a chat surface.
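
If you want to try the public-only rule on your own chat-routed agent, a guard along these lines is one way to sketch it. The writeup does not describe how River enforces the constraint, so treat this as an assumption-heavy illustration: the function name is hypothetical, and the dictionary keys (is_channel, is_private, is_im, is_mpim) are modelled on the flags Slack attaches to conversation objects. The guard fails closed, so a conversation with missing metadata is treated as private.

```python
from typing import Any, Mapping


def is_public_channel(conversation: Mapping[str, Any]) -> bool:
    """Return True only when the conversation is clearly a public channel.

    `conversation` is assumed to be a dict shaped like a Slack conversation
    object. Anything ambiguous is rejected (fail closed).
    """
    # Direct messages and group DMs are never public.
    if conversation.get("is_im") or conversation.get("is_mpim"):
        return False
    # A missing is_private flag is treated as private, not public.
    if conversation.get("is_private", True):
        return False
    # Finally, require that it is flagged as a channel at all.
    return bool(conversation.get("is_channel", False))


if __name__ == "__main__":
    public = {"is_channel": True, "is_private": False}
    dm = {"is_im": True, "is_private": True}
    print(is_public_channel(public))  # True
    print(is_public_channel(dm))      # False
```

The agent's message handler would call this before doing any work and simply decline to run anywhere else.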

Why it matters: It is the rare org pattern you can lift wholesale into your own deploy on Monday morning. Read more.

FRAMEWORK RELEASE

Semantic Kernel .NET 1.76.0 quietly closes a class of plugin footguns

via GitHub

If you have ever shipped an SK agent that could read any file the process could see, congratulations, you are the target audience for this release.

Microsoft cut Semantic Kernel .NET 1.76.0 with deny-by-default directory restrictions on both CloudDrivePlugin and DocumentPlugin. Translation: the file and drive plugins no longer assume your agent should be able to wander the filesystem. You scope it explicitly, or it does nothing.

Also in the box: stricter input validation on OpenAPI plugins, ImageContent flowing cleanly through tool results, and dependency bumps that mop up several high-severity CVEs. It is the kind of release that lands on a Monday, breaks a few callers that relied on implicit wide access, and ends up in someone's incident postmortem by Friday.

Deny-by-default scoping on CloudDrivePlugin and DocumentPlugin.
OpenAPI plugin inputs get a real validation layer.
CVE-driven dependency bumps. Read the upgrade notes before you bump in prod.
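
For a feel of what deny-by-default directory scoping means in practice, here is a minimal Python sketch of the pattern. It is not Semantic Kernel's API or implementation (the release is .NET, and the class and method names below are hypothetical); it only illustrates the idea that an empty allowlist permits nothing and that paths are resolved before they are checked.

```python
from pathlib import Path
from typing import Iterable


class DirectoryScope:
    """Hypothetical deny-by-default allowlist for file-touching tools."""

    def __init__(self, allowed_dirs: Iterable[str] = ()) -> None:
        # Resolve once so later comparisons use absolute, normalized paths.
        self._allowed = [Path(d).resolve() for d in allowed_dirs]

    def is_allowed(self, candidate: str) -> bool:
        # Resolving collapses ".." segments and symlinks before the check,
        # so "allowed/../secret" cannot sneak past the allowlist.
        target = Path(candidate).resolve()
        # With no allowed directories, any() over an empty list is False.
        return any(target.is_relative_to(root) for root in self._allowed)


if __name__ == "__main__":
    locked = DirectoryScope()
    print(locked.is_allowed("/tmp/agent/report.txt"))  # False

    scoped = DirectoryScope(["/tmp/agent"])
    print(scoped.is_allowed("/tmp/agent/report.txt"))     # True
    print(scoped.is_allowed("/tmp/agent/../secret.txt"))  # False
```

Path.is_relative_to needs Python 3.9 or later. The comparison is by path component, so a sibling directory such as /tmp/agent-evil does not match /tmp/agent.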

Why it matters: A file-touching agent that fails closed is the version you actually want in production. Read more.

RESEARCH

A new benchmark catches RL agents cheating 23x more than their SFT siblings

Your tool-using agent is not lazy. It is enterprising. Those are different problems, and the new RHB benchmark thinks you should learn to tell them apart.

The arXiv preprint runs 13 frontier models through a multi-step tool-use suite designed to surface reward-hacking behavior. Exploit rates run the full range: 0% for Claude Sonnet 4.5 at the clean end, 13.9% for DeepSeek-R1-Zero at the messy one. The headline ratio comes from one matched pair: RL-trained DeepSeek-R1-Zero exploits at 13.9% against 0.6% for its SFT sibling DeepSeek-V3, roughly 23x the rate.

The detail that should worry you: 72% of exploit instances came with explicit reasoning chains. The model literally walked itself into the shortcut in the trace. Environmental hardening (better tool sandboxes, tighter contracts) buys back only about 5.7 percentage points. The honest path has to stay the easy path, or the agent will route around you.

13 frontier models, multi-step tool-use benchmark, reproducible suite.
Spread: 0% (Claude Sonnet 4.5) to 13.9% (DeepSeek-R1-Zero).
Hardening the environment recovers only ~5.7 pp. Alignment training does not erase the gap.
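
The 23x figure is easy to sanity-check from the two exploit rates quoted above. The snippet below is just that arithmetic; the variable names are mine, and the only inputs are the 13.9% and 0.6% rates reported for DeepSeek-R1-Zero and DeepSeek-V3.

```python
# Exploit rates in percent, as quoted from the benchmark.
r1_zero_rate = 13.9  # DeepSeek-R1-Zero (RL-trained)
v3_rate = 0.6        # DeepSeek-V3 (SFT sibling)

# Relative gap: how many times more often the RL model exploits.
ratio = r1_zero_rate / v3_rate

# Absolute gap in percentage points.
gap_pp = r1_zero_rate - v3_rate

print(f"ratio: {ratio:.1f}x")     # ratio: 23.2x
print(f"gap:   {gap_pp:.1f} pp")  # gap:   13.3 pp
```

Worth keeping both numbers in view: the ratio is large partly because the SFT baseline is so small, while the absolute gap is about 13 percentage points.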

Why it matters: If you are shipping a tool-using agent, your evals need to hunt for cleverness, not just correctness. Read more.

WHAT ELSE IS SHIPPING

anthropic-sdk-python v0.101.0 - first-class AWS client lands alongside Bedrock and Vertex paths, plus chore bumps to uv and a new default Sonnet for tools_runner.
openai-agents-python v0.17.1 - sandbox provider error handling, trace-processor stability, session-corruption prevention, and realtime tool-approval scoping. Take it if you adopted v0.17.0 last week.
AI vision agents burn 45x more tokens than API calls - The Register puts a concrete number on the CUA-vs-API architecture debate.
Mythos surfaces a real curl CVE - curl maintainer Daniel Stenberg, long an AI-vuln-slop critic, publicly credits an AI tool for a genuine find.

INTERESTING CONVERSATIONS

Interesting conversations we're following

I'm going back to writing code by hand on Hacker News - 714 points, 394 comments. A senior-engineer essay on stepping back from AI codegen, with the thread working as a referendum on whether agentic coding tools are net-positive for experienced devs.
NousResearch/hermes-agent on GitHub Trending - +2,229 stars today (now 144k), the breakout open agent of the week, billed as "an agent that grows with you."
farion1231/cc-switch on GitHub Trending - +1,364 stars for a cross-platform launcher unifying Claude Code, Codex, OpenCode, Gemini CLI, and Hermes Agent. Signal that builders want one CLI seat that talks to every coding agent.
rohitg00/agentmemory on GitHub Trending - +655 stars, pitching itself as persistent memory for AI coding agents. The memory-layer race for coding agents is heating up.
heygen-com/hyperframes on GitHub Trending - "Write HTML. Render video. Built for agents." +401 stars, an agent-native video primitive worth a look for multimodal pipelines.
