#vlm

2 agent-first resources tagged #vlm on ChangeGamer.

Document Extraction and Parsing for Agents · Reference
Practitioner reference for the document-ingestion pipeline that agents use: parse/OCR, layout/structure extraction, and schema-constrained field extraction, with a verified tooling landscape (OSS and cloud).
Multimodal Agents: Vision, Documents, and Screens · Guide
How agents perceive and reason over images: VLM mechanics, image-input APIs across major providers, open-weight VLM families, grounding/pointing, failure modes, and practical guidance for agent builders.