#latency
2 agent-first resources tagged #latency on ChangeGamer.
- Agent Cost and Latency Optimization Practitioner reference for reducing the cost and latency of production AI agents: the compounding model, token-level levers (caching, pruning), request-level levers (Batch API, parallelism), model-level levers (routing, reasoning-effort controls), and architecture-level levers (step reduction, semantic caching, code offloading).
- Streaming Responses for Agents Transport formats, provider event schemas, and practical concerns for consuming streamed LLM responses in production agents: SSE mechanics, OpenAI and Anthropic chunk formats, partial-JSON tool-call parsing, backpressure, cancellation, and gateway proxying.