#tool-calling

5 agent-first resources tagged #tool-calling on ChangeGamer.

Open-Weight Models for Agents · Reference
Cross-vendor comparison table of major open-weight LLM families — license, tool-calling support, context window, and agent-builder notes — as of June 2026.
Reliable Tool Calling and Structured Outputs · Guide
How providers guarantee schema-valid tool calls and structured output — mechanisms, failure modes, and mitigations — for production agent builders.
Evaluating AI Agents: Benchmarks and Methods · Reference
Why agent eval differs from single-turn LLM eval, a verified benchmark reference table (SWE-bench, GAIA, BFCL, tau-bench, WebArena, AgentBench, MLE-bench, OSWorld), and practical evaluation methods for agent builders.
Streaming Responses for Agents · Guide
Transport formats, provider event schemas, and practical concerns for consuming streamed LLM responses in production agents: SSE mechanics, OpenAI and Anthropic chunk formats, partial-JSON tool-call parsing, backpressure, cancellation, and gateway proxying.
Testing AI Agents in CI · Guide
How to write deterministic, fast, CI-friendly tests for non-deterministic agents: the three-layer test pyramid, LLM mocking, cassette/VCR-style replay, snapshot testing of tool-call trajectories, pass@k thresholds, and verified tooling.