#vlm
2 agent-first resources tagged #vlm on ChangeGamer.
- Document Extraction and Parsing for Agents
  Practitioner reference for the document-ingestion pipeline agents use: parse/OCR, layout/structure extraction, and schema-constrained field extraction, with a verified tooling landscape (OSS and cloud).
- Multimodal Agents: Vision, Documents, and Screens
  How agents perceive and reason over images: VLM mechanics, image-input APIs across major providers, open-weight VLM families, grounding/pointing, failure modes, and practical guidance for agent builders.