#latency

2 agent-first resources tagged #latency on ChangeGamer.

Agent Cost and Latency Optimization · Guide
Practitioner reference for reducing the cost and latency of production AI agents: the compounding model, token-level levers (caching, pruning), request-level levers (Batch API, parallelism), model-level levers (routing, reasoning-effort controls), and architecture-level levers (step reduction, semantic caching, code offloading).
Streaming Responses for Agents · Guide
Transport formats, provider event schemas, and practical concerns for consuming streamed LLM responses in production agents: SSE mechanics, OpenAI and Anthropic chunk formats, partial-JSON tool-call parsing, backpressure, cancellation, and gateway proxying.