#tool-calling
5 agent-first resources tagged #tool-calling on ChangeGamer.
- Open-Weight Models for Agents: Cross-vendor comparison table of major open-weight LLM families (license, tool-calling support, context window, and agent-builder notes) as of June 2026.
- Reliable Tool Calling and Structured Outputs: How providers guarantee schema-valid tool calls and structured output, including mechanisms, failure modes, and mitigations, for production agent builders.
- Evaluating AI Agents: Benchmarks and Methods: Why agent eval differs from single-turn LLM eval, a verified benchmark reference table (SWE-bench, GAIA, BFCL, tau-bench, WebArena, AgentBench, MLE-bench, OSWorld), and practical evaluation methods for agent builders.
- Streaming Responses for Agents: Transport formats, provider event schemas, and practical concerns for consuming streamed LLM responses in production agents, covering SSE mechanics, OpenAI and Anthropic chunk formats, partial-JSON tool-call parsing, backpressure, cancellation, and gateway proxying.
- Testing AI Agents in CI: How to write deterministic, fast, CI-friendly tests for non-deterministic agents, covering the three-layer test pyramid, LLM mocking, cassette/VCR-style replay, snapshot testing of tool-call trajectories, pass@k thresholds, and verified tooling.