# Open-Weight Models for Agents

> Cross-vendor comparison table of major open-weight LLM families — license, tool-calling support, context window, and agent-builder notes — as of June 2026.

Category: Reference · Updated: 2026-06-19 · Tags: open-weight, llm, tool-calling, agents, models, comparison
Canonical: https://changegamer.ai/resources/open-weight-models-for-agents

Open-weight models let agent builders control inference, eliminate per-call vendor fees, and avoid rate-limit ceilings. The tradeoff is hosting cost and model-update lag. This page compares the major families through the lens of what agent builders actually need. All claims are dated June 2026; this space moves fast — verify before pinning a model version.

## Comparison table

| Family | Latest open-weight release | License | Native tool/function-calling | Notable for agents |
|---|---|---|---|---|
| **Meta Llama 4** | Llama 4 Maverick (17B active / 400B total MoE, Apr 2025) | Llama 4 Community License (not OSI open; MAU cap >700M requires Meta approval; commercial use otherwise permitted) | Yes — natively optimized for tool-calling and agentic use | Scout (10M-token context) and Maverick (1M) both remain available on Hugging Face under the Llama 4 Community License; Maverick is the stronger general/agentic pick. Multimodal. Behemoth was previewed but never open-released. |
| **Mistral** | Mistral Small 4 (119B total / 6.5B active MoE, Mar 2026); Mistral Medium 3.5 (128B dense, Apr 2026); Mistral Large 3 (675B total / 41B active MoE, Dec 2025) | Apache 2.0 (Small 4, Large 3); Modified MIT with revenue cap (Medium 3.5) | Yes — function calling and structured output supported across all three; Small 4 also unifies reasoning and vision | Small 4 (256K ctx) unifies Magistral reasoning + Pixtral vision + Devstral coding in one model. Medium 3.5 (256K ctx) is a frontier-class open coding/agentic model. Large 3 (256K ctx) is the largest open-weight Mistral. Mistral Small 3.2 deprecated April 30, 2026. |
| **Alibaba Qwen** | Qwen 3.6-27B / 3.6-35B-A3B (Apr 2026); Qwen3 base series (Apr 2025) | Apache 2.0 | Yes — native tool-calling and MCP support via Qwen-Agent; all sizes | Dense and MoE variants from 0.6B to 235B. Up to 262K context (extensible to 1M via YaRN). Hybrid thinking/non-thinking mode. Qwen 3.7 is closed-weight API-only as of Jun 2026. |
| **DeepSeek** | DeepSeek V4-Pro (1.6T total / 49B active MoE) and V4-Flash (284B total / 13B active), both Apr 2026 preview | MIT | Yes — V4 natively supports function calling, JSON output, tool calls, and thinking / non-thinking modes | 1M context window (default across both V4 variants). V4-Pro: frontier-class agentic coding. V4-Flash: fast/cheap inference. Weights on Hugging Face (deepseek-ai). V4 labeled preview; stable release expected later 2026. |
| **Google Gemma 4** | Gemma 4: E2B, E4B, 26B-A4B, 31B (Mar–Apr 2026); Gemma 4 12B Unified (Jun 2026, encoder-free, native audio) | Apache 2.0 (first Gemma release under true Apache 2.0) | Yes — native function-calling built into Gemma 4; FunctionGemma 270M for edge/on-device | 128K context (E2B/E4B); 256K context (12B+, 26B, 31B). Gemma 4 12B Unified (Jun 3 2026) adds native audio + video via encoder-free architecture; runs on 16 GB RAM. Multimodal across the family. |
| **Microsoft Phi-4** | Phi-4-reasoning-vision-15B (15B, Mar 2026); Phi-4-reasoning (14B, May 2025); Phi-4-mini (3.8B) | MIT | Yes — Phi-4-mini has built-in function calling; the Phi-4 line supports tool use; Phi-4-reasoning for chain-of-thought agentic tasks | Efficiency-first: strong reasoning per parameter. MIT license. Phi-4-reasoning-vision-15B adds selective thinking mode + high-res vision. Phi-4-multimodal adds audio+vision. |
| **IBM Granite 4.1** | Granite 4.1 (3B, 8B, 30B, Apr 2026) | Apache 2.0 | Yes — tool calling follows OpenAI function definition schema; benchmarked on Berkeley BFCL | 512K context window. Enterprise-focused; ISO 42001 certified (Granite 4.0 line). 30B uses hybrid Mamba-Transformer architecture for long-context efficiency; 3B/8B are dense. Sizes 3B–30B. |
| **OpenAI gpt-oss** | gpt-oss-20b (21B total / 3.6B active, MoE); gpt-oss-120b (117B total / 5.1B active, MoE); released Aug 2025 | Apache 2.0 | Yes — native function calling, structured outputs, web browsing, code execution | Reasoning models (comparable to o3-mini / o4-mini). 128K context. MXFP4 quantized weights; 120B fits on a single 80GB GPU. Weights on Hugging Face: huggingface.co/openai. |

## What to look for when picking an open-weight model for agents

### Tool-calling and JSON-mode reliability

Reliable structured output is the single most important property for agents. A model that hallucinates tool names, omits required arguments, or produces malformed JSON turns every downstream step into an error-handling problem. Check: (a) whether the model was instruction-tuned with a tool-use dataset, not just base-pretrained; (b) benchmark scores on Berkeley Function Calling Leaderboard (BFCL) for your target task category; (c) whether the inference framework you use supports the model's chat template exactly (template mismatches silently degrade tool-call reliability).
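
A guard between the model and the tool executor can catch the failure modes described above before they propagate. The following is a minimal sketch using only the standard library. The `get_weather` tool, its parameters, and the `{"name": ..., "arguments": ...}` call shape are illustrative assumptions, not any specific model's output format, and only a small subset of JSON Schema is checked.

```python
import json

# Hypothetical tool registry: tool name -> JSON-Schema-style parameter spec.
# The tool and its parameters are made up for illustration.
TOOLS = {
    "get_weather": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "days": {"type": "integer"},
        },
        "required": ["city"],
    }
}

# Maps JSON Schema type names to Python types (small subset only).
_PY_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_tool_call(raw: str, tools: dict = TOOLS) -> list[str]:
    """Return the problems found in a model-emitted tool call; empty means OK."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"malformed JSON: {exc.msg}"]
    if not isinstance(call, dict):
        return ["tool call must be a JSON object"]

    name = call.get("name")
    if not isinstance(name, str) or name not in tools:
        return [f"unknown tool: {name!r}"]

    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]

    schema = tools[name]
    problems = []
    for key in schema.get("required", []):
        if key not in args:
            problems.append(f"missing required argument: {key}")
    for key, value in args.items():
        spec = schema["properties"].get(key)
        if spec is None:
            problems.append(f"unexpected argument: {key}")
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        bool_mismatch = isinstance(value, bool) and spec["type"] != "boolean"
        if bool_mismatch or not isinstance(value, _PY_TYPES[spec["type"]]):
            problems.append(f"wrong type for {key}: expected {spec['type']}")
    return problems
```

Returning a list of problems rather than raising lets the agent loop feed the errors back to the model for a retry, which is one common way to recover from a bad call.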

### License terms: true-open vs source-available

Not all "open-weight" licenses are equal. For commercial agent deployments, the key questions are: (1) Is the license OSI-approved (Apache 2.0, MIT)? If yes, no usage caps or approval gatekeepers exist. (2) Is there a monthly active user (MAU) cap requiring vendor approval? The Llama 4 Community License requires Meta's approval for deployments serving more than 700M MAU, which is irrelevant for most builders but material at scale. (3) Does the license permit sublicensing or redistribution of fine-tunes? Apache 2.0 and MIT do; the Llama 4 Community License restricts this. As of June 2026, Qwen 3.6, Gemma 4, Phi-4, Granite 4.1, Mistral Small 4 / Large 3, DeepSeek V4, and gpt-oss are all Apache 2.0 or MIT, with no MAU caps. Mistral Medium 3.5 uses a Modified MIT license with a revenue-threshold clause for large enterprises (not OSI-approved, but permissive for commercial use by most builders).
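
The three questions can be encoded as a small screening table for use in a model-selection script. This is a rough sketch: the `LicenseTerms` type and `unrestricted` helper are hypothetical names, the entries cover only some of the families on this page, and the flags simply restate this page's June 2026 summary. Read the license text itself before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseTerms:
    name: str
    osi_approved: bool
    mau_cap: bool         # monthly-active-user cap requiring vendor approval
    revenue_clause: bool  # revenue-threshold clause


# Flags restate the summary on this page; they are not legal advice.
LICENSES = {
    "Llama 4": LicenseTerms("Llama 4 Community License", False, True, False),
    "Mistral Small 4": LicenseTerms("Apache 2.0", True, False, False),
    "Mistral Medium 3.5": LicenseTerms("Modified MIT", False, False, True),
    "Qwen 3.6": LicenseTerms("Apache 2.0", True, False, False),
    "DeepSeek V4": LicenseTerms("MIT", True, False, False),
    "gpt-oss": LicenseTerms("Apache 2.0", True, False, False),
}


def unrestricted(licenses: dict) -> list[str]:
    """Models that are OSI-approved with no MAU cap and no revenue clause."""
    return sorted(
        model
        for model, terms in licenses.items()
        if terms.osi_approved and not terms.mau_cap and not terms.revenue_clause
    )


print(unrestricted(LICENSES))
# ['DeepSeek V4', 'Mistral Small 4', 'Qwen 3.6', 'gpt-oss']
```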

### Context length

Long context matters for agents that hold large tool outputs, long conversation histories, or multi-document retrieval results in the context window. Current verified windows: Qwen 3.6 up to 262K (extendable to 1M via YaRN); Llama 4 Maverick 1M / Scout 10M; Gemma 4 128K (E2B/E4B) and 256K (12B and larger); DeepSeek V4-Pro and V4-Flash 1M; gpt-oss 128K; Mistral Small 4, Medium 3.5, and Large 3 256K; Granite 4.1 512K. Long-context performance degrades before the nominal limit, so test your actual retrieval patterns rather than relying on the window size alone.
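
One way to act on this is to budget against a fraction of the nominal window instead of the full figure. The sketch below makes several assumptions: "K" is treated as 1,000 tokens, the 4-characters-per-token estimate is a crude stand-in for the model's real tokenizer, and the `usable_fraction` and `reserve_for_output` defaults are placeholder values to be replaced with numbers from your own long-context tests.

```python
# Nominal context windows in tokens, taken from the figures on this page.
# "K" is treated as 1,000 tokens, so these are approximate.
CONTEXT_WINDOW = {
    "gpt-oss": 128_000,
    "Mistral Small 4": 256_000,
    "Qwen 3.6": 262_000,
    "Granite 4.1": 512_000,
    "DeepSeek V4": 1_000_000,
}


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Crude size estimate; use the model's own tokenizer when accuracy matters."""
    return int(len(text) / chars_per_token) + 1


def fits(model: str, parts: list[str],
         reserve_for_output: int = 4_000,
         usable_fraction: float = 0.5) -> bool:
    """True if the prompt parts fit inside a derated share of the window."""
    budget = int(CONTEXT_WINDOW[model] * usable_fraction) - reserve_for_output
    used = sum(estimate_tokens(part) for part in parts)
    return used <= budget


tool_output = "x" * 400_000  # roughly 100K tokens under the 4-chars assumption
print(fits("gpt-oss", [tool_output]))          # False
print(fits("Mistral Small 4", [tool_output]))  # True
```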

### Inference cost and the self-hosting tradeoff

Self-hosting eliminates per-token vendor fees but introduces GPU cost, model-update ops, and batching complexity. MoE architectures (Llama 4, Qwen 3.6 MoE, gpt-oss, DeepSeek) activate only a fraction of their parameters per token, so compute per token and latency are lower than for a dense model of comparable quality. Memory is a separate matter: every expert must stay loaded, so VRAM scales with total parameters, not active ones. Dense models (Phi-4 14B, Granite 4.1 3B/8B) are simpler to serve. For burst or experimental workloads, use an inference provider (Together AI, Fireworks, DeepInfra, Replicate) that hosts the weights; under an Apache 2.0 or MIT license, no additional fee is owed to the model vendor.
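
A back-of-the-envelope calculation shows why an MoE model has to be sized by total rather than active parameters. This is a sketch for weights only: it ignores KV cache, activations, and runtime overhead. The 4.25 bits-per-parameter figure assumes MXFP4 (4-bit values plus one 8-bit scale shared by each 32-value block) applied uniformly, whereas real checkpoints may keep some layers at higher precision. `weight_memory_gb` is a hypothetical helper name.

```python
def weight_memory_gb(params_billion: float, bits_per_param: float) -> float:
    """Memory needed for the weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_param / 8


# gpt-oss-120b: 117B total / 5.1B active parameters (figures from the table).
total_gb = weight_memory_gb(117, 4.25)
active_gb = weight_memory_gb(5.1, 4.25)

print(round(total_gb, 1))   # 62.2 -> what must actually be resident in VRAM
print(round(active_gb, 1))  # 2.7  -> what is touched per token, not what is loaded
```

The roughly 62 GB estimate is consistent with the table's note that the 120B model fits on a single 80GB GPU, with the remainder left for KV cache and activations.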

## Verified sources

- Meta Llama 4 blog: https://ai.meta.com/blog/llama-4-multimodal-intelligence/
- Llama 4 model cards and prompt formats: https://www.llama.com/docs/model-cards-and-prompt-formats/llama4/
- Mistral Small 4 announcement: https://mistral.ai/news/mistral-small-4/
- Mistral Small 4 model card: https://docs.mistral.ai/models/model-cards/mistral-small-4-0-26-03
- Mistral Medium 3.5 model card: https://docs.mistral.ai/models/model-cards/mistral-medium-3-5-26-04
- Mistral Medium 3.5 on Hugging Face: https://huggingface.co/mistralai/Mistral-Medium-3.5-128B
- Mistral Large 3 announcement: https://mistral.ai/news/mistral-3/
- Qwen3 blog: https://qwenlm.github.io/blog/qwen3/
- Qwen3.6 GitHub: https://github.com/QwenLM/Qwen3.6
- Qwen-Agent (tool-calling + MCP): https://github.com/QwenLM/Qwen-Agent
- DeepSeek V4 preview release notes: https://api-docs.deepseek.com/news/news260424
- DeepSeek V4-Pro on Hugging Face: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro
- DeepSeek V4-Flash on Hugging Face: https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash
- Gemma 4 releases page: https://ai.google.dev/gemma/docs/releases
- Gemma 4 12B Unified announcement: https://blog.google/innovation-and-ai/technology/developers-tools/introducing-gemma-4-12b/
- Function calling with Gemma 4: https://ai.google.dev/gemma/docs/capabilities/text/function-calling-gemma4
- Gemma 4 Apache 2.0 announcement: https://opensource.googleblog.com/2026/03/gemma-4-expanding-the-gemmaverse-with-apache-20.html
- Microsoft Phi-4 on Azure: https://azure.microsoft.com/en-us/products/phi/
- Phi-4-reasoning-vision-15B (Microsoft Research): https://www.microsoft.com/en-us/research/blog/phi-4-reasoning-vision-and-the-lessons-of-training-a-multimodal-reasoning-model/
- Phi-4-reasoning-vision-15B on Hugging Face: https://huggingface.co/microsoft/Phi-4-reasoning-vision-15B
- IBM Granite 4.1 IBM Research blog: https://research.ibm.com/blog/granite-4-1-ai-foundation-models
- OpenAI gpt-oss introduction: https://openai.com/index/introducing-gpt-oss/
- gpt-oss GitHub (weights + model card): https://github.com/openai/gpt-oss
