Curated for Edge AI
Open-weight language models from 1B to 35B parameters, selected for prompt adherence, tool calling, and reasoning — optimized for on-device deployment with EdgeAI.
Models that reliably generate structured function calls, enabling agentic workflows on-device (see the tool-calling sketch below).
Chain-of-thought and hybrid thinking modes for complex multi-step problem solving at the edge (see the thinking-mode sketch below).
Every model runs locally on CPUs, GPUs, or NPUs — no cloud dependency, no per-token costs.
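The first two capabilities are concrete enough to sketch. Below, a quantized model is loaded with llama-cpp-python and asked to answer against an OpenAI-style tool schema, entirely on local hardware. This is a minimal sketch, not a recipe: the GGUF path, tool schema, and question are illustrative, and structured output depends on the chosen model's chat template actually supporting tool use.

```python
# Minimal on-device tool-calling sketch with llama-cpp-python.
# Assumptions: a local GGUF quantization of a tool-calling-capable model
# (path is hypothetical) and a chat template that understands tool schemas.
from llama_cpp import Llama

llm = Llama(
    model_path="models/edge-model-q4_k_m.gguf",  # hypothetical local file
    n_ctx=8192,        # keep within the model's context window
    n_gpu_layers=-1,   # offload to GPU if present; falls back to CPU
)

# OpenAI-style schema the model is asked to target.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=tools,
)
# A tool-calling-capable model should return a structured tool_calls entry
# here rather than free text; that structure is what makes agent loops viable.
print(resp["choices"][0]["message"])
```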
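Hybrid thinking is usually a one-flag toggle at inference time. A minimal sketch, assuming a Qwen3-style chat template that accepts an enable_thinking argument; the model name is one plausible pick, not a prescription:

```python
# Hybrid thinking toggle, assuming a Qwen3-style chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen3-4B"  # illustrative; any hybrid-thinking model works similarly
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, device_map="auto")

prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Plan a 3-step rollout for a firmware update."}],
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # False skips chain-of-thought for low-latency replies
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt.
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```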
Tier 1 · 1 – 4B
Phones, IoT, Embedded
Smallest model with real tool calling and hybrid thinking modes.
Broadest ecosystem support with native tool use and Meta safety tools.
Best tool calling at this size, with a 128K context window.
256K context, vision support, and native function calling.
Tier 2 · 4 – 14B
Laptops, Jetson, Single GPU
Best balance of size and capability with hybrid thinking modes.
Multimodal with 140+ languages and Google edge SDK support.
Best-in-class reasoning at this size, distilled from DeepSeek-R1.
Strongest tool calling in tier with 128K context window.
Tier 3 · 14 – 35B
Workstations, Edge Servers
Large-model intelligence at small-model cost; best edge efficiency.
Best dense model for tool calling, on par with Llama 70B.
Beats Gemini 1.5 Pro, with the best Google ecosystem integration.
Strongest dense Qwen3 with hybrid thinking and 128K context.
Notable Picks
Standout models worth watching — novel architectures, first-of-their-kind releases, and compact VLMs for edge vision tasks.
OpenAI's first open-weight release since GPT-2: an MXFP4-quantized MoE that fits in 16GB, Apache 2.0 licensed (back-of-envelope check below).
Liquid AI's edge-native architecture — 2x faster than Qwen3 on CPU, optimized for embedded SoCs.
Hugging Face's compact VLM for image understanding, with strong vision performance under 3B parameters.
Tiny vision-language model that runs in 2GB RAM — ideal for edge visual understanding tasks.
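On the 16GB claim from the first pick, the arithmetic checks out on paper. This is a back-of-envelope sketch that assumes roughly 21B total parameters and treats all weights as MXFP4 at about 4.25 bits each (4-bit values plus shared per-block scales); in practice some layers stay at higher precision, so the real figure sits somewhat above this.

```python
# Does a ~21B-parameter MXFP4 MoE fit in 16 GB? Rough check.
params = 21e9           # assumed total parameter count
bits_per_weight = 4.25  # 4-bit values + per-block scale overhead (assumed)
weights_gb = params * bits_per_weight / 8 / 1e9
print(f"weights: ~{weights_gb:.1f} GB")  # ~11.2 GB
# The remaining ~4-5 GB of a 16 GB budget covers the KV cache,
# activations, and runtime overhead.
```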
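For the compact VLM picks, the calling pattern is the same as any transformers chat model plus an image. A minimal sketch; the model name is one real example of a sub-3B VLM, and the image file and question are illustrative:

```python
# Compact VLM inference sketch; model name illustrative of the picks above.
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

name = "HuggingFaceTB/SmolVLM-Instruct"
processor = AutoProcessor.from_pretrained(name)
model = AutoModelForVision2Seq.from_pretrained(name, device_map="auto")

image = Image.open("doorbell_frame.jpg")  # hypothetical camera frame
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Is there a package at the door?"},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```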