TL;DR: Agents push the full conversation history into context on every turn, so over a large number of turns they are extremely read-heavy, which is why cache hit rate is an important cost factor. This post analyzes the cache hit rates of 60+ providers using 398 data points, all sourced from openrouter.ai model pages. It assumes the reader is familiar with prefix caching; every mention of caching in this post refers to prefix caching.
Agentic workflows differ from most human-LLM conversations in one key characteristic: the average number of turns is far higher.
Context processing over a multi-turn conversation grows quadratically. Every turn passes the full conversation up to that point into context along with its own input, e.g. turn 100 pushes everything up to turn 99 into the context window again. The provider, on its end, tries to match the longest prefix it can against the caches it has available and processes the rest of the conversation as new input tokens, which are typically billed at 10x the cached price. So on a long 200k-token conversation (which, by the way, is a bad idea capability-wise even if you don't care about costs), if a model that costs $5 per million input tokens fails to hit any cache, you'll be charged $1 just for the input processing of that one request. Two things determine what you actually pay:
- 1. Cached input pricing - the headline metric everyone looks at.
- 2. Cache hit rate - the hidden variable that nobody talks about.
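To make the quadratic growth and the effect of the hit rate concrete, here is a minimal sketch. It assumes a fixed number of new tokens per turn, a cached-token price of one tenth of the list price (the "10x" figure above), and a single blended hit rate applied to all input tokens. The function names, `tokens_per_turn`, and `cached_discount` are illustrative, and cache-write surcharges that some providers bill are ignored.

```python
def total_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens sent when every turn resends the full history."""
    # Turn k sends k * tokens_per_turn tokens, so the total is
    # tokens_per_turn * (1 + 2 + ... + turns), which is quadratic in turns.
    return tokens_per_turn * turns * (turns + 1) // 2


def effective_input_price(list_price: float, hit_rate: float,
                          cached_discount: float = 0.1) -> float:
    """Blended $ per million input tokens for a given cache hit rate (0..1)."""
    # Assumption: cached tokens cost list_price * cached_discount.
    cached_price = list_price * cached_discount
    return hit_rate * cached_price + (1.0 - hit_rate) * list_price


def input_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of processing `tokens` input tokens."""
    return tokens / 1_000_000 * price_per_million


if __name__ == "__main__":
    # The example from the text: 200k uncached tokens at $5 per million.
    print(input_cost(200_000, 5.0))  # 1.0

    # 100 turns of 2,000 new tokens each resend 10.1M tokens in total.
    tokens = total_input_tokens(100, 2_000)
    for hit_rate in (0.0, 0.5, 0.9):
        price = effective_input_price(5.0, hit_rate)
        print(f"hit rate {hit_rate:.0%}: ${input_cost(tokens, price):.2f}")
```

Under these assumptions the same $5 model costs anywhere between $0.50 and $5.00 per million input tokens depending only on the hit rate, which is why the second item in the list matters as much as the first.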
I recently spent many hours convinced there was a bug in Dirac that was breaking caches, only to find in the end that it was entirely due to the cache hit rate of Gemini 3 Flash.
While looking for data on this, I found that OpenRouter fortunately publishes it (go to a model's page and look for the 'Effective Pricing' section). Since the data is hourly, we have to assume that it doesn't change too much from hour to hour.
Provider Cache-hit Tier list
Providers with multiple endpoints (e.g. Amazon Bedrock US, Bedrock Global, Bedrock (1)) are listed separately — each entry reflects the hit rate of that specific endpoint as observed.
DeepSeek remains the gold standard of caching, which probably doesn't surprise anyone who has used their official API. In fact, all S-tier entries (cache hit rates of roughly 75% and above) are Chinese labs: DeepSeek (87%), StepFun (86.1%), Moonshot AI (84.8%), MiniMax (75.4%), and Xiaomi (74.7%).
The mainstream US labs place somewhere in the middle, but as we will see in the next section, the variance is huge and rather interesting.
On the flip side, we have the "F-Tier". Providers like io.net, AkashML, SambaNova, and Nebius are clocking in at exactly 0.0% cache hit rates across all of their models.
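The post does not spell out how the per-provider figures were aggregated. As a sketch, assuming each figure is an unweighted mean over that provider's own rows in the full table, the following reproduces the quoted numbers for DeepSeek, StepFun, Moonshot AI, and Xiaomi. The function name `provider_hit_rates` is illustrative.

```python
from collections import defaultdict
from statistics import mean

# (model, provider, cache hit rate in %) rows copied from the full table,
# restricted to four labs serving their own models.
ROWS = [
    ("DeepSeek_DeepSeek_V4_Flash", "DeepSeek", 86.1),
    ("DeepSeek_DeepSeek_V4_Pro", "DeepSeek", 87.9),
    ("StepFun_Step_35_Flash", "StepFun", 86.1),
    ("MoonshotAI_Kimi_K25", "Moonshot AI", 92.8),
    ("MoonshotAI_Kimi_K26", "Moonshot AI", 76.8),
    ("Xiaomi_MiMo-V25-Pro", "Xiaomi", 94.8),
    ("Xiaomi_MiMo-V2-Flash", "Xiaomi", 55.0),
    ("Xiaomi_MiMo-V25", "Xiaomi", 74.2),
]


def provider_hit_rates(rows):
    """Unweighted mean cache hit rate per provider (assumed aggregation)."""
    by_provider = defaultdict(list)
    for _model, provider, hit_rate in rows:
        by_provider[provider].append(hit_rate)
    return {provider: mean(rates) for provider, rates in by_provider.items()}


if __name__ == "__main__":
    ranked = sorted(provider_hit_rates(ROWS).items(), key=lambda kv: -kv[1])
    for provider, rate in ranked:
        print(f"{provider}: {rate:.1f}%")
```

An unweighted mean treats a low-traffic endpoint the same as a high-traffic one, so a traffic-weighted average could rank providers differently.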
US closed-source big 3
The most interesting thing to me from the chart above is that older models from the same provider tend to get lower cache hit rates. If I had to explain it non-cynically, I would guess that, from a systems engineering POV, it comes down to the cache-pool size allocated to each model.
Google does worse than the other two providers across the board, which is surprising given that they own the full stack on TPUs. This gets full-on clowny when you look at the Vertex AI numbers (see table below): Opus 4.7 on Vertex AI has a 65.30% cache hit rate while Google's own Gemini 3.1 Pro Preview has 37.30% (and this trend applies to all Claude vs Gemini models hosted on Vertex)! How do you manage to get a lower cache hit rate on your own hardware, with your own model trained on that hardware, than a competitor's model does? If I were to speculate, I'd guess that the whole 'thought signature' architecture is just not working out.
OSS Models Cost Comparison
For OSS models, which provider you use makes a huge difference in cost, mostly due to cache hit rates.
| Model | Cheapest Eff. Input Price (Provider) | Most Expensive Eff. Input Price (Provider) | Difference (% of Cheapest) |
|---|---|---|---|
| Kimi K2.6 | 0.2760 (StreamLake) | 1.0900 (Phala) | 0.8140 (294.93%) |
| MiMo-V2.5-Pro | 0.3720 (Xiaomi) | 0.9060 (DeepInfra) | 0.5340 (143.55%) |
| DeepSeek V4 Pro (Max) | 0.0560 (DeepSeek) | 1.7220 (Parasail) | 1.6660 (2975.00%) |
| GLM-5.1 | 0.3230 (StreamLake) | 1.7470 (Venice) | 1.4240 (440.87%) |
| MiniMax-M2.7 | 0.1430 (MiniMax) | 0.6000 (SambaNova) | 0.4570 (319.58%) |
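The difference column can be reproduced with a few lines. This sketch assumes the percentage is the absolute difference relative to the cheapest price, which matches the rows above; the function name `price_spread` is illustrative.

```python
def price_spread(cheapest: float, most_expensive: float) -> tuple[float, float]:
    """Return (absolute difference, difference as a % of the cheapest price)."""
    diff = most_expensive - cheapest
    return diff, diff / cheapest * 100.0


if __name__ == "__main__":
    # Effective input prices taken from the table above.
    examples = {
        "Kimi K2.6": (0.2760, 1.0900),
        "DeepSeek V4 Pro (Max)": (0.0560, 1.7220),
    }
    for model, (low, high) in examples.items():
        diff, pct = price_spread(low, high)
        print(f"{model}: {diff:.4f} ({pct:.2f}%)")
    # Kimi K2.6: 0.8140 (294.93%)
    # DeepSeek V4 Pro (Max): 1.6660 (2975.00%)
```

Read this way, 2975% means the most expensive DeepSeek V4 Pro endpoint charges roughly 30x more per effective input token than the cheapest one.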
Small Model Grift
Now onto smaller models, which intuitively seem cheaper. Below is the average effective pricing for 4 of the most popular local models (hi r/LocalLLaMA).
| Model Name | Total Providers | Avg Eff. Input | Avg Eff. Output | Avg Cache Hit |
|---|---|---|---|---|
| Google_Gemma_4_26B_A4B | 10 | $0.1156 | $0.4150 | 21.57% |
| Google_Gemma_4_31B | 10 | $0.1729 | $0.5049 | 7.44% |
| Qwen_Qwen36_35B_A3B | 7 | $0.1643 | $1.1450 | 11.54% |
| Qwen_Qwen36_27B | 8 | $0.4096 | $2.9433 | 7.66% |
Compare that to DeepSeek's official API:
| Model Name (Official API) | Eff. Input Price | Eff. Output Price | Cache Hit Rate |
|---|---|---|---|
| DeepSeek_DeepSeek_V4_Pro | $0.0560 | $0.8690 | 87.90% |
| DeepSeek_DeepSeek_V4_Flash | $0.0220 | $0.2800 | 86.10% |
Yup, you can use DeepSeek V4 Pro, a 1.6-trillion-parameter model whose 49B active parameters exceed the total parameter count of any of these small models, for less than either of the Qwen3.6 models. That's thanks to providers like io.net and DeepInfra offering $0.32/$3.20 input/output pricing with zero caching.
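As a sanity check on that comparison, this sketch recomputes the Qwen_Qwen36_27B averages from its eight rows in the full table and compares them with the DeepSeek V4 Pro official API row. It assumes the averages are unweighted means across providers; the helper name `averages` is illustrative.

```python
from statistics import mean

# (provider, eff. input $, eff. output $, cache hit %) for Qwen_Qwen36_27B,
# copied from the full table.
QWEN36_27B = [
    ("DeepInfra", 0.3200, 3.1990, 0.0),
    ("Alibaba Cloud Int.", 0.4500, 2.6990, 0.0),
    ("Ambient", 0.2670, 3.2000, 32.8),
    ("Weights & Biases", 0.6000, 3.6000, 0.0),
    ("io.net", 0.3170, 3.1990, 0.0),
    ("Morph", 0.4980, 2.3990, 28.5),
    ("Chutes", 0.5000, 2.0000, 0.0),
    ("Venice", 0.3250, 3.2500, 0.0),
]

# DeepSeek official API row: (eff. input $, eff. output $).
DEEPSEEK_V4_PRO = (0.0560, 0.8690)


def averages(rows):
    """Unweighted mean of input price, output price, and hit rate."""
    return (
        mean(row[1] for row in rows),
        mean(row[2] for row in rows),
        mean(row[3] for row in rows),
    )


if __name__ == "__main__":
    avg_in, avg_out, avg_hit = averages(QWEN36_27B)
    print(f"Qwen3.6 27B avg: in ${avg_in:.4f}, out ${avg_out:.4f}, hit {avg_hit:.2f}%")
    print(f"input ratio vs DeepSeek V4 Pro:  {avg_in / DEEPSEEK_V4_PRO[0]:.1f}x")
    print(f"output ratio vs DeepSeek V4 Pro: {avg_out / DEEPSEEK_V4_PRO[1]:.1f}x")
```

With these rows, the average Qwen3.6 27B endpoint comes out at roughly 7x the effective input price and 3x the effective output price of DeepSeek V4 Pro on the official API.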
The full table
| Model Name | Provider | Eff. Input Price | Eff. Output Price | Cache Hit Rate |
|---|---|---|---|---|
| Zai_GLM_5 | SiliconFlow | $0.3100 | $2.5490 | 85.30% |
| Zai_GLM_5 | Baidu Qianfan | $0.3930 | $2.2390 | 54.70% |
| Zai_GLM_5 | GMICloud | $0.4200 | $1.9200 | 37.50% |
| Zai_GLM_5 | DeepInfra | $0.3410 | $2.0790 | 54.00% |
| Zai_GLM_5 | Z.ai | $0.4710 | $3.1990 | 66.10% |
| Zai_GLM_5 | Amazon Bedrock | $1.0000 | $3.1990 | 0.10% |
| Zai_GLM_5 | Friendli | $0.8700 | $3.2000 | 26.00% |
| Zai_GLM_5 | StreamLake | $0.4650 | $2.0790 | 35.60% |
| Zai_GLM_5 | NovitaAI | $0.7340 | $3.2000 | 33.20% |
| Zai_GLM_5 | AtlasCloud | $0.7830 | $3.1500 | 22.00% |
| Zai_GLM_5 | Parasail | $0.7720 | $3.2000 | 28.60% |
| Zai_GLM_5 | Together | $1.0000 | $3.2000 | 0.80% |
| Zai_GLM_5 | Chutes | $0.9410 | $2.5500 | 1.80% |
| Zai_GLM_5 | Phala | $1.2000 | $3.5000 | 0.10% |
| Qwen_Qwen3_VL_32B_Instruct | Alibaba Cloud Int. | $0.1040 | $0.4140 | 0.00% |
| Qwen_Qwen36_35B_A3B | Parasail | $0.1020 | $1.0000 | 47.60% |
| Qwen_Qwen36_35B_A3B | Ambient | $0.1170 | $1.0000 | 33.20% |
| Qwen_Qwen36_35B_A3B | io.net | $0.1500 | $1.0000 | 0.00% |
| Qwen_Qwen36_35B_A3B | AkashML | $0.1700 | $1.2000 | 0.00% |
| Qwen_Qwen36_35B_A3B | AtlasCloud | $0.1610 | $0.9650 | 0.00% |
| Qwen_Qwen36_35B_A3B | Weights & Biases | $0.2500 | $1.2500 | 0.00% |
| Qwen_Qwen36_35B_A3B | SiliconFlow | $0.2000 | $1.6000 | 0.00% |
| OpenAI_GPT-41_Nano | OpenAI | $0.0920 | $0.3990 | 11.00% |
| OpenAI_GPT-41_Nano | Azure (1) | $0.0710 | $0.3990 | 41.30% |
| OpenAI_GPT-41_Nano | Azure (2) | $0.1000 | $0.3980 | 0.00% |
| xAI_Grok_43 | xAI | $0.7490 | $2.5000 | 47.80% |
| Anthropic_Claude_Sonnet_46 | Claude Platform on AWS | $0.9370 | $15.0000 | 79.30% |
| Anthropic_Claude_Sonnet_46 | Anthropic | $0.6070 | $15.0000 | 89.90% |
| Anthropic_Claude_Sonnet_46 | Google Vertex (US East) | $2.6200 | $15.0000 | 21.80% |
| Anthropic_Claude_Sonnet_46 | Amazon Bedrock (Global) | $1.3370 | $15.0000 | 64.00% |
| Anthropic_Claude_Sonnet_46 | Amazon Bedrock | $0.9400 | $15.0000 | 78.40% |
| Anthropic_Claude_Sonnet_46 | Google Vertex (Global) | $1.8020 | $15.0000 | 52.70% |
| Anthropic_Claude_Sonnet_46 | Google Vertex (Europe) | $2.4390 | $15.0000 | 28.70% |
| Xiaomi_MiMo-V25-Pro | Xiaomi | $0.3720 | $3.1670 | 94.80% |
| Xiaomi_MiMo-V25-Pro | DeepInfra | $0.9060 | $3.0000 | 11.70% |
| Qwen_Qwen3_Coder_Next | Ionstream | $0.0860 | $0.7990 | 61.10% |
| Qwen_Qwen3_Coder_Next | Parasail | $0.0860 | $0.7990 | 68.40% |
| Qwen_Qwen3_Coder_Next | AtlasCloud | $0.1800 | $1.3490 | 0.00% |
| Qwen_Qwen3_Coder_Next | NovitaAI | $0.2000 | $1.4990 | 0.00% |
| Anthropic_Claude_Opus_46 | Claude Platform on AWS | $2.3320 | $25.0000 | 63.50% |
| Anthropic_Claude_Opus_46 | Amazon Bedrock | $1.4750 | $25.0000 | 81.40% |
| Anthropic_Claude_Opus_46 | Anthropic | $1.6520 | $25.0000 | 79.00% |
| Anthropic_Claude_Opus_46 | Google Vertex | $1.9690 | $25.0000 | 71.10% |
| Anthropic_Claude_Opus_46 | Google Vertex (Europe) | $3.0870 | $25.0000 | 44.30% |
| Anthropic_Claude_Opus_46 | Azure | $6.2500 | $25.0000 | 0.00% |
| Anthropic_Claude_Opus_47 | Claude Platform on AWS | $1.8440 | $25.0000 | 72.40% |
| Anthropic_Claude_Opus_47 | Google Vertex | $2.4580 | $25.0000 | 65.30% |
| Anthropic_Claude_Opus_47 | Amazon Bedrock (US) | $3.9520 | $25.0000 | 23.70% |
| Anthropic_Claude_Opus_47 | Amazon Bedrock | $4.9530 | $25.0000 | 1.20% |
| Anthropic_Claude_Opus_47 | Google Vertex (Europe) | $2.3000 | $25.0000 | 68.20% |
| Anthropic_Claude_Opus_47 | Anthropic | $1.5920 | $25.0000 | 79.10% |
| MiniMax_MiniMax_M27 | MiniMax | $0.1430 | $1.2000 | 65.60% |
| MiniMax_MiniMax_M27 | Together | $0.2010 | $1.1990 | 41.20% |
| MiniMax_MiniMax_M27 | Morph | $0.2790 | $1.1990 | 73.90% |
| MiniMax_MiniMax_M27 | Fireworks | $0.2050 | $1.1990 | 39.20% |
| MiniMax_MiniMax_M27 | MiniMax Highspeed | $0.2510 | $2.3990 | 64.70% |
| MiniMax_MiniMax_M27 | SambaNova | $0.6000 | $2.3990 | 0.00% |
| Qwen_Qwen36_27B | DeepInfra | $0.3200 | $3.1990 | 0.00% |
| Qwen_Qwen36_27B | Alibaba Cloud Int. | $0.4500 | $2.6990 | 0.00% |
| Qwen_Qwen36_27B | Ambient | $0.2670 | $3.2000 | 32.80% |
| Qwen_Qwen36_27B | Weights & Biases | $0.6000 | $3.6000 | 0.00% |
| Qwen_Qwen36_27B | io.net | $0.3170 | $3.1990 | 0.00% |
| Qwen_Qwen36_27B | Morph | $0.4980 | $2.3990 | 28.50% |
| Qwen_Qwen36_27B | Chutes | $0.5000 | $2.0000 | 0.00% |
| Qwen_Qwen36_27B | Venice | $0.3250 | $3.2500 | 0.00% |
| OpenAI_gpt-oss-120b | Google Vertex | $0.0900 | $0.3590 | 4.10% |
| OpenAI_gpt-oss-120b | DeepInfra | $0.0390 | $0.1890 | 0.00% |
| OpenAI_gpt-oss-120b | Groq | $0.1200 | $0.5990 | 40.00% |
| OpenAI_gpt-oss-120b | Cerebras | $0.3500 | $0.7490 | 48.60% |
| OpenAI_gpt-oss-120b | DekaLLM | $0.0390 | $0.1770 | 0.00% |
| OpenAI_gpt-oss-120b | Baseten | $0.1000 | $0.4990 | 52.10% |
| OpenAI_gpt-oss-120b | NovitaAI | $0.0500 | $0.2490 | 2.50% |
| OpenAI_gpt-oss-120b | Ambient | $0.1070 | $0.6000 | 57.00% |
| OpenAI_gpt-oss-120b | DeepInfra (Turbo) | $0.1500 | $0.5990 | 0.00% |
| OpenAI_gpt-oss-120b | Parasail | $0.0890 | $0.7490 | 23.60% |
| OpenAI_gpt-oss-120b | SiliconFlow | $0.0500 | $0.4490 | 0.00% |
| OpenAI_gpt-oss-120b | Amazon Bedrock (1) | $0.1500 | $0.5990 | 0.00% |
| OpenAI_gpt-oss-120b | Nebius Token Factory | $0.1500 | $0.5990 | 0.00% |
| OpenAI_gpt-oss-120b | SambaNova Dedicated | $0.1200 | $0.8990 | 0.00% |
| OpenAI_gpt-oss-120b | SambaNova | $0.1400 | $0.9490 | 0.00% |
| OpenAI_gpt-oss-120b | Together | $0.1500 | $0.6000 | 0.00% |
| OpenAI_gpt-oss-120b | Phala | $0.1000 | $0.4890 | 14.00% |
| OpenAI_gpt-oss-120b | MARA | $0.1500 | $0.7490 | 0.00% |
| OpenAI_gpt-oss-120b | Weights & Biases | $0.1500 | $0.5990 | 12.70% |
| OpenAI_gpt-oss-120b | Amazon Bedrock (2) | $0.1500 | $0.5990 | 0.00% |
| Zai_GLM_47_Flash | DeepInfra | $0.0230 | $0.3990 | 73.40% |
| Zai_GLM_47_Flash | NovitaAI | $0.0610 | $0.3990 | 14.30% |
| Zai_GLM_47_Flash | Cloudflare | $0.0600 | $0.3970 | 0.00% |
| Zai_GLM_47_Flash | Phala | $0.1000 | $0.4290 | 0.00% |
| Zai_GLM_47_Flash | Z.ai | $0.0430 | $0.3990 | 44.20% |
| Zai_GLM_47_Flash | Venice | $0.1250 | $0.4960 | 14.40% |
| OpenAI_GPT-51 | OpenAI | $0.8220 | $10.0000 | 38.10% |
| OpenAI_GPT-51 | Azure (1) | $0.9550 | $10.0000 | 26.30% |
| OpenAI_GPT-54_Mini | OpenAI | $0.3370 | $4.5110 | 61.50% |
| OpenAI_GPT-54_Mini | Azure | $0.6840 | $4.4990 | 9.80% |
| Meta_Llama_31_8B_Instruct | Groq | $0.0330 | $0.0750 | 67.80% |
| Meta_Llama_31_8B_Instruct | DeepInfra | $0.0200 | $0.0480 | 0.00% |
| Meta_Llama_31_8B_Instruct | NovitaAI | $0.0200 | $0.0480 | 0.00% |
| Meta_Llama_31_8B_Instruct | Cerebras | $0.0990 | $0.0950 | 87.50% |
| Meta_Llama_31_8B_Instruct | Cloudflare | $0.1520 | $0.2850 | 0.00% |
| Qwen_Qwen35-9B | DeepInfra | $0.0400 | $0.1480 | 0.00% |
| Qwen_Qwen35-9B | Together | $0.1000 | $0.1490 | 0.00% |
| Qwen_Qwen35-9B | SiliconFlow | $0.1000 | $0.1490 | 0.00% |
| Qwen_Qwen35-9B | Venice | $0.1000 | $0.1490 | 35.50% |
| MoonshotAI_Kimi_K25 | DeepInfra | $0.1660 | $2.2490 | 74.80% |
| MoonshotAI_Kimi_K25 | NovitaAI | $0.2320 | $2.8490 | 71.10% |
| MoonshotAI_Kimi_K25 | ModelRun | $0.1610 | $1.9000 | 77.10% |
| MoonshotAI_Kimi_K25 | Moonshot AI | $0.1360 | $3.0000 | 92.80% |
| MoonshotAI_Kimi_K25 | Fireworks | $0.1960 | $3.0000 | 80.70% |
| MoonshotAI_Kimi_K25 | Chutes | $0.3790 | $2.0000 | 27.70% |
| MoonshotAI_Kimi_K25 | AtlasCloud | $0.3790 | $2.4990 | 38.10% |
| MoonshotAI_Kimi_K25 | SiliconFlow | $0.2590 | $2.2500 | 50.10% |
| MoonshotAI_Kimi_K25 | Cloudflare | $0.3390 | $3.0000 | 52.20% |
| MoonshotAI_Kimi_K25 | Parasail | $0.3810 | $2.7990 | 54.80% |
| MoonshotAI_Kimi_K25 | Phala | $0.6000 | $3.0000 | 2.20% |
| MoonshotAI_Kimi_K25 | Venice | $0.5300 | $3.5000 | 8.90% |
| Tencent_Hy3_preview | SiliconFlow | $0.0350 | $0.2590 | 84.30% |
| OpenAI_GPT-53-Codex | OpenAI | $0.2820 | $14.0000 | 93.20% |
| OpenAI_GPT-53-Codex | Azure | $0.4470 | $14.0000 | 82.70% |
| StepFun_Step_35_Flash | StepFun | $0.0310 | $0.2990 | 86.10% |
| StepFun_Step_35_Flash | DeepInfra | $0.0900 | $0.2990 | 0.00% |
| StepFun_Step_35_Flash | SiliconFlow | $0.1000 | $0.3000 | 0.00% |
| OpenAI_GPT-54_Nano | OpenAI | $0.0870 | $1.2490 | 62.80% |
| OpenAI_GPT-54_Nano | Azure | $0.1530 | $1.2490 | 26.10% |
| DeepSeek_DeepSeek_V3_0324 | NovitaAI | $0.1800 | $1.1180 | 66.30% |
| DeepSeek_DeepSeek_V3_0324 | DeepInfra | $0.1600 | $0.7670 | 61.90% |
| DeepSeek_DeepSeek_V3_0324 | ModelRun | $0.1850 | $0.7980 | 50.40% |
| DeepSeek_DeepSeek_V3_0324 | SiliconFlow | $0.2500 | $1.0000 | 52.60% |
| DeepSeek_DeepSeek_V3_0324 | AtlasCloud | $0.2140 | $0.8770 | 4.00% |
| DeepSeek_DeepSeek_V3_0324 | GMICloud | $0.2890 | $1.1350 | 0.60% |
| Qwen_Qwen35_397B_A17B | Morph | $0.4220 | $3.5000 | 64.70% |
| Qwen_Qwen35_397B_A17B | Alibaba Cloud Int. | $0.3900 | $2.3400 | 0.00% |
| Qwen_Qwen35_397B_A17B | Chutes | $0.2710 | $3.0000 | 79.60% |
| Qwen_Qwen35_397B_A17B | DeepInfra | $0.4900 | $3.5990 | 0.00% |
| Qwen_Qwen35_397B_A17B | Together | $0.6000 | $3.6000 | 48.40% |
| Qwen_Qwen35_397B_A17B | NovitaAI | $0.6000 | $3.6000 | 5.10% |
| Qwen_Qwen35_397B_A17B | Nebius Token Factory | $0.6000 | $3.5980 | 0.00% |
| Qwen_Qwen35_397B_A17B | AtlasCloud | $0.5500 | $3.5000 | 0.00% |
| Qwen_Qwen35_397B_A17B | Phala | $0.5500 | $3.5000 | 35.50% |
| Qwen_Qwen35_397B_A17B | Parasail | $0.4090 | $3.6000 | 45.30% |
| Qwen_Qwen35_397B_A17B | GMICloud | $0.6000 | $3.6000 | 0.00% |
| Qwen_Qwen35_397B_A17B | Venice | $0.7500 | $4.5000 | 19.30% |
| Mistral_Mistral_Small_32_24B | Mistral | $0.0860 | $0.2990 | 15.90% |
| Mistral_Mistral_Small_32_24B | DeepInfra | $0.0750 | $0.1980 | 0.00% |
| Mistral_Mistral_Small_32_24B | Parasail | $0.0730 | $0.5980 | 41.90% |
| Mistral_Mistral_Small_32_24B | Venice | $0.0940 | $0.2490 | 0.00% |
| Meta_Llama_4_Maverick | Parasail | $0.3140 | $1.0000 | 19.30% |
| Meta_Llama_4_Maverick | DeepInfra | $0.1500 | $0.5970 | 0.00% |
| Meta_Llama_4_Maverick | NovitaAI | $0.2700 | $0.8460 | 0.00% |
| Meta_Llama_4_Maverick | SambaNova | $0.6300 | $1.7970 | 0.00% |
| Mistral_Mistral_Medium_35 | Mistral | $1.5000 | $7.4990 | 21.70% |
| Qwen_Qwen36_Flash | Alibaba Cloud Int. | $0.1920 | $1.1400 | 0.30% |
| OpenAI_GPT-41 | OpenAI | $1.1000 | $8.0000 | 60.00% |
| OpenAI_GPT-41 | Azure (1) | $1.0780 | $8.0000 | 61.50% |
| Mistral_Mistral_Nemo | DeepInfra | $0.0200 | $0.0370 | 0.00% |
| Mistral_Mistral_Nemo | DekaLLM | $0.0200 | $0.0250 | 0.00% |
| Mistral_Mistral_Nemo | Mistral | $0.0900 | $0.1430 | 44.40% |
| Mistral_Mistral_Nemo | NovitaAI | $0.0390 | $0.1640 | 0.00% |
| Google_Gemma_4_31B | DeepInfra Turbo | $0.1200 | $0.3690 | 0.00% |
| Google_Gemma_4_31B | DeepInfra | $0.1300 | $0.3790 | 0.00% |
| Google_Gemma_4_31B | NovitaAI | $0.1400 | $0.3990 | 5.80% |
| Google_Gemma_4_31B | Chutes | $0.1090 | $0.3780 | 31.80% |
| Google_Gemma_4_31B | Together (2) | $0.3900 | $0.9690 | 0.00% |
| Google_Gemma_4_31B | SiliconFlow | $0.1300 | $0.3990 | 0.00% |
| Google_Gemma_4_31B | Together (1) | $0.2800 | $0.8580 | 0.00% |
| Google_Gemma_4_31B | Ambient | $0.1180 | $0.3990 | 18.40% |
| Google_Gemma_4_31B | Venice | $0.1750 | $0.5000 | 12.00% |
| Google_Gemma_4_31B | Parasail | $0.1370 | $0.3990 | 6.40% |
| Google_Gemini_25_Flash_Lite | Google Vertex (EU) | $0.0800 | $0.3990 | 22.70% |
| Google_Gemini_25_Flash_Lite | Google Vertex | $0.0940 | $0.3990 | 7.40% |
| Google_Gemini_25_Flash_Lite | Google AI Studio | $0.0910 | $0.3990 | 12.00% |
| Google_Gemini_35_Flash | Google Vertex | $0.5520 | $9.0080 | 70.30% |
| Google_Gemini_35_Flash | Google AI Studio | $0.6340 | $7.9040 | 63.50% |
| DeepSeek_DeepSeek_V31 | Weights & Biases | $0.5500 | $1.6370 | 31.80% |
| DeepSeek_DeepSeek_V31 | NovitaAI | $0.2620 | $1.0000 | 5.70% |
| DeepSeek_DeepSeek_V31 | DeepInfra | $0.1750 | $0.7880 | 43.20% |
| DeepSeek_DeepSeek_V31 | SiliconFlow | $0.2700 | $1.0000 | 36.80% |
| DeepSeek_DeepSeek_V31 | AtlasCloud | $0.2870 | $0.9470 | 7.30% |
| DeepSeek_DeepSeek_V31 | Google Vertex | $0.6000 | $1.7000 | 19.30% |
| DeepSeek_DeepSeek_V31 | SambaNova | $0.6500 | $1.4990 | 0.00% |
| Google_Gemma_4_26B_A4B | Google Vertex | $0.1500 | $0.5970 | 19.10% |
| Google_Gemma_4_26B_A4B | NovitaAI | $0.1300 | $0.4000 | 16.80% |
| Google_Gemma_4_26B_A4B | NextBit | $0.1180 | $0.3980 | 63.40% |
| Google_Gemma_4_26B_A4B | DeepInfra | $0.0700 | $0.3370 | 0.00% |
| Google_Gemma_4_26B_A4B | Cloudflare | $0.1000 | $0.2970 | 0.00% |
| Google_Gemma_4_26B_A4B | Parasail | $0.0960 | $0.3980 | 41.80% |
| Google_Gemma_4_26B_A4B | DekaLLM | $0.0600 | $0.3270 | 0.00% |
| Google_Gemma_4_26B_A4B | SiliconFlow | $0.1200 | $0.3970 | 0.00% |
| Google_Gemma_4_26B_A4B | Venice | $0.1620 | $0.4990 | 74.60% |
| Google_Gemma_4_26B_A4B | io.net | $0.1500 | $0.5000 | 0.00% |
| OpenAI_gpt-oss-20b | Weights & Biases | $0.0500 | $0.2000 | 69.00% |
| OpenAI_gpt-oss-20b | DeepInfra | $0.0300 | $0.1400 | 0.00% |
| OpenAI_gpt-oss-20b | Amazon Bedrock (2) | $0.0700 | $0.1500 | 0.00% |
| OpenAI_gpt-oss-20b | Amazon Bedrock (1) | $0.0700 | $0.1500 | 0.00% |
| OpenAI_gpt-oss-20b | NovitaAI | $0.0400 | $0.1500 | 0.00% |
| OpenAI_gpt-oss-20b | Google Vertex | $0.0690 | $0.2460 | 63.20% |
| OpenAI_gpt-oss-20b | Groq | $0.0660 | $0.2990 | 22.40% |
| OpenAI_gpt-oss-20b | Parasail | $0.0250 | $0.1990 | 71.40% |
| OpenAI_gpt-oss-20b | SiliconFlow | $0.0400 | $0.1790 | 0.00% |
| OpenAI_gpt-oss-20b | Fireworks | $0.0580 | $0.2990 | 33.70% |
| OpenAI_gpt-oss-20b | NextBit | $0.1000 | $0.4490 | 0.00% |
| OpenAI_gpt-oss-20b | Together | $0.0490 | $0.1990 | 0.00% |
| OpenAI_GPT-41_Mini | OpenAI | $0.2800 | $1.5940 | 40.00% |
| OpenAI_GPT-41_Mini | Azure (2) | $0.2520 | $1.5980 | 49.30% |
| OpenAI_GPT-41_Mini | Azure (1) | $0.2020 | $1.5930 | 66.00% |
| DeepSeek_DeepSeek_V32 | NovitaAI | $0.2490 | $0.3990 | 15.20% |
| DeepSeek_DeepSeek_V32 | Baidu Qianfan | $0.1050 | $0.3760 | 64.90% |
| DeepSeek_DeepSeek_V32 | SiliconFlow | $0.1720 | $0.4190 | 69.80% |
| DeepSeek_DeepSeek_V32 | DeepInfra | $0.1950 | $0.3790 | 49.60% |
| DeepSeek_DeepSeek_V32 | AtlasCloud | $0.2410 | $0.3790 | 14.40% |
| DeepSeek_DeepSeek_V32 | Friendli | $0.3990 | $1.5000 | 40.50% |
| DeepSeek_DeepSeek_V32 | Alibaba Cloud Int. | $0.2730 | $1.1110 | 32.50% |
| DeepSeek_DeepSeek_V32 | Parasail | $0.2540 | $0.4470 | 17.20% |
| DeepSeek_DeepSeek_V32 | Google Vertex | $0.5600 | $1.6790 | 8.00% |
| Google_Gemini_20_Flash_Lite | Google Vertex | $0.0750 | $0.2980 | 0.00% |
| Google_Gemini_20_Flash_Lite | Google AI Studio | $0.0750 | $0.2950 | 0.00% |
| Anthropic_Claude_Sonnet_4 | Amazon Bedrock (1) | $1.6300 | $15.0000 | 53.70% |
| Anthropic_Claude_Sonnet_4 | Anthropic | $1.9410 | $15.0000 | 41.60% |
| Anthropic_Claude_Sonnet_4 | Amazon Bedrock (2) | $2.1510 | $15.0000 | 35.10% |
| Anthropic_Claude_Sonnet_4 | Google Vertex (Global) | $2.3880 | $15.0000 | 29.50% |
| Anthropic_Claude_Sonnet_4 | Google Vertex (Europe) | $1.7360 | $15.0000 | 58.40% |
| Anthropic_Claude_Haiku_45 | Amazon Bedrock (Global) | $0.3670 | $5.0000 | 72.40% |
| Anthropic_Claude_Haiku_45 | Anthropic | $0.5420 | $5.0000 | 54.80% |
| Anthropic_Claude_Haiku_45 | Google Vertex (Europe) | $0.9760 | $5.0000 | 2.80% |
| Anthropic_Claude_Haiku_45 | Amazon Bedrock | $0.5780 | $5.0000 | 48.70% |
| Anthropic_Claude_Haiku_45 | Google Vertex | $0.4890 | $5.0000 | 58.20% |
| OpenAI_GPT-54 | Azure | $1.8150 | $15.0000 | 30.40% |
| OpenAI_GPT-54 | OpenAI | $0.8980 | $15.1190 | 74.90% |
| OpenAI_GPT-5 | OpenAI | $0.4550 | $10.0000 | 70.60% |
| OpenAI_GPT-5 | Azure (1) | $1.2500 | $10.0000 | 0.00% |
| Xiaomi_MiMo-V2-Flash | Xiaomi | $0.0500 | $0.2980 | 55.00% |
| Xiaomi_MiMo-V2-Flash | NovitaAI | $0.0530 | $0.2990 | 58.70% |
| Qwen_Qwen35-35B-A3B | Parasail | $0.0910 | $1.0000 | 59.00% |
| Qwen_Qwen35-35B-A3B | Alibaba Cloud Int. | $0.1620 | $1.2990 | 0.00% |
| Qwen_Qwen35-35B-A3B | AkashML | $0.1600 | $1.2000 | 0.00% |
| Qwen_Qwen35-35B-A3B | Ambient | $0.1020 | $1.0000 | 41.60% |
| Qwen_Qwen35-35B-A3B | Venice | $0.1970 | $1.2500 | 73.90% |
| Qwen_Qwen35-35B-A3B | DeepInfra | $0.1400 | $1.0000 | 0.00% |
| Qwen_Qwen35-35B-A3B | DekaLLM | $0.1390 | $1.0000 | 0.00% |
| Qwen_Qwen35-35B-A3B | AtlasCloud | $0.2250 | $1.8000 | 0.00% |
| Qwen_Qwen35-35B-A3B | NextBit | $0.3000 | $1.8000 | 0.00% |
| Qwen_Qwen35-35B-A3B | SiliconFlow | $0.2400 | $1.8000 | 0.00% |
| Google_Gemini_25_Pro | Google Vertex (Global) | $1.0410 | $10.0650 | 30.10% |
| Google_Gemini_25_Pro | Google Vertex (EU) | $0.9680 | $10.0000 | 26.40% |
| Google_Gemini_25_Pro | Google AI Studio | $0.9680 | $10.0000 | 35.40% |
| Google_Gemini_25_Pro | Google Vertex (US) | $1.2200 | $10.0000 | 2.60% |
| Zai_GLM_51 | StreamLake | $0.3230 | $3.9590 | 91.30% |
| Zai_GLM_51 | Friendli | $0.6440 | $4.4000 | 66.30% |
| Zai_GLM_51 | Z.ai | $0.5660 | $4.3990 | 73.10% |
| Zai_GLM_51 | Chutes | $0.9960 | $4.0000 | 34.00% |
| Zai_GLM_51 | AtlasCloud | $0.5020 | $4.4000 | 78.50% |
| Zai_GLM_51 | DeepInfra | $0.4450 | $3.5000 | 71.60% |
| Zai_GLM_51 | SiliconFlow | $0.5070 | $4.3990 | 78.30% |
| Zai_GLM_51 | NovitaAI | $0.5120 | $4.4000 | 77.50% |
| Zai_GLM_51 | Baidu Qianfan | $0.5880 | $3.0790 | 49.10% |
| Zai_GLM_51 | Baseten | $1.3000 | $4.3000 | 45.60% |
| Zai_GLM_51 | Inceptron | $1.0810 | $4.4000 | 28.00% |
| Zai_GLM_51 | Together | $1.4000 | $4.4000 | 6.50% |
| Zai_GLM_51 | Parasail | $1.1110 | $4.4000 | 25.30% |
| Zai_GLM_51 | Ambient | $1.4000 | $4.4000 | 14.00% |
| Zai_GLM_51 | Phala | $1.2100 | $4.2000 | 0.70% |
| Zai_GLM_51 | io.net | $1.2900 | $4.4800 | 0.00% |
| Zai_GLM_51 | Fireworks | $1.2730 | $4.3990 | 11.10% |
| Zai_GLM_51 | Venice | $1.7470 | $5.5000 | 0.20% |
| Zai_GLM_51 | GMICloud | $0.0000 | $0.0000 | 0.00% |
| Zai_GLM_47 | Z.ai | $0.1170 | $2.1990 | 98.60% |
| Zai_GLM_47 | Google Vertex | $0.6000 | $2.2000 | 36.50% |
| Zai_GLM_47 | SiliconFlow | $0.2040 | $2.2000 | 72.30% |
| Zai_GLM_47 | DeepInfra | $0.2180 | $1.7500 | 56.90% |
| Zai_GLM_47 | Cerebras | $2.2500 | $2.7500 | 45.60% |
| Zai_GLM_47 | AtlasCloud | $0.3220 | $1.8500 | 49.60% |
| Zai_GLM_47 | Phala | $0.8500 | $3.2990 | 3.50% |
| Zai_GLM_47 | NovitaAI | $0.3700 | $2.0050 | 40.80% |
| Zai_GLM_47 | Parasail | $0.4270 | $2.1000 | 6.70% |
| Zai_GLM_47 | Venice | $0.5490 | $2.6490 | 0.30% |
| OpenAI_GPT-55 | OpenAI | $1.1170 | $30.6400 | 92.70% |
| OpenAI_GPT-55 | Azure | $2.3150 | $30.0820 | 62.00% |
| Google_Gemini_31_Flash_Lite | Google Vertex | $0.2100 | $1.4980 | 19.20% |
| Google_Gemini_31_Flash_Lite | Google AI Studio | $0.1860 | $1.4680 | 27.40% |
| Qwen_Qwen3_Coder_480B_A35B | DeepInfra (Turbo) | $0.1210 | $1.0000 | 89.30% |
| Qwen_Qwen3_Coder_480B_A35B | Together | $2.0000 | $2.0000 | 0.00% |
| Qwen_Qwen3_Coder_480B_A35B | Google Vertex | $0.2200 | $1.7990 | 49.20% |
| Qwen_Qwen3_Coder_480B_A35B | AtlasCloud | $0.7800 | $3.7980 | 0.00% |
| Qwen_Qwen3_Coder_480B_A35B | Weights & Biases | $1.0000 | $1.4980 | 70.50% |
| Qwen_Qwen3_Coder_480B_A35B | Alibaba OpenSource | $1.3730 | $6.8700 | 0.00% |
| Qwen_Qwen3_Coder_480B_A35B | NovitaAI | $0.3800 | $1.5490 | 0.00% |
| Qwen_Qwen3_Coder_480B_A35B | Venice | $0.3500 | $1.5000 | 0.00% |
| Google_Gemini_3_Flash_Preview | Google Vertex | $0.3600 | $2.9970 | 32.00% |
| Google_Gemini_3_Flash_Preview | Google AI Studio | $0.3480 | $2.9980 | 34.50% |
| Qwen_Qwen36_Plus | Alibaba Cloud Int. | $0.6690 | $1.9820 | 6.00% |
| Mistral_Mistral_Small_4 | Mistral | $0.1280 | $0.5990 | 16.10% |
| Mistral_Mistral_Small_4 | Venice | $0.1870 | $0.7490 | 11.20% |
| MoonshotAI_Kimi_K26 | SiliconFlow | $0.2920 | $4.0000 | 83.90% |
| MoonshotAI_Kimi_K26 | Moonshot AI | $0.3430 | $4.0000 | 76.80% |
| MoonshotAI_Kimi_K26 | Cloudflare | $0.6870 | $3.5000 | 13.60% |
| MoonshotAI_Kimi_K26 | Inceptron | $0.3360 | $3.5000 | 76.60% |
| MoonshotAI_Kimi_K26 | Weights & Biases | $0.5520 | $4.0000 | 50.30% |
| MoonshotAI_Kimi_K26 | NovitaAI | $0.2850 | $3.4000 | 80.50% |
| MoonshotAI_Kimi_K26 | Chutes | $0.4410 | $3.5000 | 80.80% |
| MoonshotAI_Kimi_K26 | Fireworks | $0.4160 | $4.0000 | 67.60% |
| MoonshotAI_Kimi_K26 | Together | $0.4250 | $4.5000 | 77.40% |
| MoonshotAI_Kimi_K26 | Parasail | $0.3120 | $3.5000 | 74.20% |
| MoonshotAI_Kimi_K26 | io.net | $0.7300 | $3.4900 | 0.00% |
| MoonshotAI_Kimi_K26 | AkashML | $0.9500 | $4.0000 | 0.00% |
| MoonshotAI_Kimi_K26 | DeepInfra | $0.3220 | $3.5000 | 71.30% |
| MoonshotAI_Kimi_K26 | AtlasCloud | $0.6370 | $4.0000 | 39.70% |
| MoonshotAI_Kimi_K26 | Nebius Token Factory | $0.9500 | $4.0000 | 0.00% |
| MoonshotAI_Kimi_K26 | StreamLake | $0.2760 | $3.8000 | 83.50% |
| MoonshotAI_Kimi_K26 | Phala | $1.0900 | $4.6000 | 6.10% |
| MoonshotAI_Kimi_K26 | Venice | $0.7270 | $4.6550 | 19.50% |
| Qwen_Qwen35-Flash | Alibaba Cloud Int. | $0.0650 | $0.2590 | 0.00% |
| OpenAI_GPT-5_Nano | OpenAI | $0.0380 | $0.3960 | 25.10% |
| OpenAI_GPT-5_Nano | Azure (1) | $0.0350 | $0.3990 | 36.50% |
| DeepSeek_DeepSeek_V4_Flash | DeepSeek | $0.0220 | $0.2800 | 86.10% |
| DeepSeek_DeepSeek_V4_Flash | SiliconFlow | $0.0890 | $0.2790 | 45.60% |
| DeepSeek_DeepSeek_V4_Flash | Alibaba Cloud Int. | $0.0720 | $0.2790 | 61.00% |
| DeepSeek_DeepSeek_V4_Flash | NovitaAI | $0.0720 | $0.2790 | 60.60% |
| DeepSeek_DeepSeek_V4_Flash | Parasail | $0.1300 | $0.2790 | 14.40% |
| DeepSeek_DeepSeek_V4_Flash | AtlasCloud | $0.0870 | $0.2790 | 47.40% |
| DeepSeek_DeepSeek_V4_Flash | DeepInfra | $0.0790 | $0.1990 | 25.70% |
| DeepSeek_DeepSeek_V4_Flash | GMICloud | $0.0690 | $0.2230 | 48.20% |
| DeepSeek_DeepSeek_V4_Flash | Baidu Qianfan | $0.0810 | $0.2510 | 44.40% |
| DeepSeek_DeepSeek_V4_Flash | AkashML | $0.1400 | $0.2790 | 0.00% |
| DeepSeek_DeepSeek_V4_Flash | Venice | $0.1510 | $0.3490 | 13.60% |
| OpenAI_GPT-52 | OpenAI | $0.9490 | $14.0000 | 50.90% |
| OpenAI_GPT-52 | Azure | $1.0480 | $14.0000 | 44.60% |
| Google_Gemini_25_Flash | Google Vertex (EU) | $0.2350 | $2.4990 | 25.30% |
| Google_Gemini_25_Flash | Google Vertex (Global) | $0.2330 | $2.4990 | 27.50% |
| Google_Gemini_25_Flash | Google AI Studio | $0.1770 | $2.4990 | 47.80% |
| Google_Gemini_25_Flash | Google Vertex | $0.1780 | $2.5000 | 45.10% |
| Google_Gemini_31_Pro_Preview | Google Vertex | $1.4470 | $12.0080 | 37.30% |
| Google_Gemini_31_Pro_Preview | Google AI Studio | $1.5670 | $11.9550 | 24.20% |
| OpenAI_GPT-5_Mini | OpenAI | $0.1330 | $1.9960 | 50.70% |
| OpenAI_GPT-5_Mini | Azure (1) | $0.0900 | $2.0000 | 72.80% |
| Xiaomi_MiMo-V25 | Xiaomi | $0.1830 | $2.0570 | 74.20% |
| Owl_Alpha | Stealth | $0.0000 | $0.0000 | 62.60% |
| DeepSeek_DeepSeek_V4_Pro | SiliconFlow | $0.7450 | $3.4790 | 59.30% |
| DeepSeek_DeepSeek_V4_Pro | DeepSeek | $0.0560 | $0.8690 | 87.90% |
| DeepSeek_DeepSeek_V4_Pro | NovitaAI | $0.4060 | $3.3790 | 82.00% |
| DeepSeek_DeepSeek_V4_Pro | Alibaba Cloud Int. | $0.9230 | $3.3600 | 49.10% |
| DeepSeek_DeepSeek_V4_Pro | GMICloud | $0.2460 | $2.7830 | 89.80% |
| DeepSeek_DeepSeek_V4_Pro | DeepInfra | $0.7350 | $2.5990 | 47.10% |
| DeepSeek_DeepSeek_V4_Pro | Baidu Qianfan | $0.8920 | $3.0410 | 45.10% |
| DeepSeek_DeepSeek_V4_Pro | AtlasCloud | $1.1450 | $3.3800 | 34.50% |
| DeepSeek_DeepSeek_V4_Pro | Parasail | $1.7220 | $3.4780 | 2.10% |
| DeepSeek_DeepSeek_V4_Pro | Fireworks | $1.3190 | $3.4800 | 26.40% |
| DeepSeek_DeepSeek_V4_Pro | Together | $1.4570 | $4.4000 | 33.80% |
| DeepSeek_DeepSeek_V4_Pro | Venice | $0.7990 | $3.7950 | 66.50% |
| OpenAI_GPT-4o-mini | OpenAI | $0.1440 | $0.5970 | 8.00% |
| OpenAI_GPT-4o-mini | Azure (1) | $0.1260 | $0.5980 | 32.00% |
| OpenAI_GPT-4o-mini | Azure (2) | $0.1250 | $0.5980 | 33.50% |
| Google_Gemini_31_Flash_Lite_Preview | Google AI Studio | $0.1640 | $1.4990 | 38.30% |
| Google_Gemini_31_Flash_Lite_Preview | Google Vertex | $0.2300 | $1.4990 | 10.40% |
| Zai_GLM_45_Air | Z.ai | $0.0750 | $1.0960 | 73.50% |
| Zai_GLM_45_Air | NovitaAI | $0.0540 | $0.8460 | 72.30% |
| Zai_GLM_45_Air | SiliconFlow | $0.1400 | $0.8580 | 42.40% |
| Anthropic_Claude_Sonnet_45 | Amazon Bedrock (1) | $2.2850 | $15.0010 | 30.40% |
| Anthropic_Claude_Sonnet_45 | Google Vertex (Global) | $1.6100 | $15.0000 | 55.10% |
| Anthropic_Claude_Sonnet_45 | Claude Platform on AWS | $0.9770 | $15.0000 | 77.80% |
| Anthropic_Claude_Sonnet_45 | Anthropic | $3.0210 | $15.0000 | 5.40% |
| Anthropic_Claude_Sonnet_45 | Google Vertex | $0.9840 | $15.0000 | 78.40% |
| Anthropic_Claude_Sonnet_45 | Amazon Bedrock (2) | $1.7740 | $15.0000 | 48.70% |
| MiniMax_MiniMax_M25 | DeepInfra | $0.0750 | $1.1480 | 62.50% |
| MiniMax_MiniMax_M25 | MiniMax Highspeed | $0.1800 | $2.3990 | 77.80% |
| MiniMax_MiniMax_M25 | Inceptron | $0.0740 | $0.8980 | 78.80% |
| MiniMax_MiniMax_M25 | MiniMax | $0.0700 | $1.1990 | 85.30% |
| MiniMax_MiniMax_M25 | AtlasCloud | $0.1110 | $1.1980 | 78.20% |
| MiniMax_MiniMax_M25 | NovitaAI | $0.0710 | $1.1970 | 84.90% |
| MiniMax_MiniMax_M25 | Chutes | $0.1080 | $1.1990 | 56.30% |
| MiniMax_MiniMax_M25 | Baidu Qianfan | $0.1170 | $1.0790 | 62.90% |
| MiniMax_MiniMax_M25 | Friendli | $0.1090 | $1.1990 | 79.50% |
| MiniMax_MiniMax_M25 | AkashML | $0.1500 | $1.1490 | 0.00% |
| MiniMax_MiniMax_M25 | Parasail | $0.1310 | $1.1990 | 62.60% |
| MiniMax_MiniMax_M25 | MARA | $0.3000 | $1.2000 | 0.00% |
| MiniMax_MiniMax_M25 | SiliconFlow | $0.1580 | $1.1990 | 52.60% |
| MiniMax_MiniMax_M25 | Phala | $0.2000 | $1.3790 | 32.90% |
| MiniMax_MiniMax_M25 | Weights & Biases | $0.2990 | $1.1960 | 24.20% |
| MiniMax_MiniMax_M25 | StreamLake | $0.3050 | $1.1990 | 6.80% |
| MiniMax_MiniMax_M25 | Venice | $0.1480 | $1.1880 | 63.80% |
| Anthropic_Claude_Opus_45 | Google Vertex | $4.0740 | $25.0000 | 36.70% |
| Anthropic_Claude_Opus_45 | Amazon Bedrock (1) | $0.9350 | $25.0000 | 91.90% |
| Anthropic_Claude_Opus_45 | Amazon Bedrock (2) | $3.3230 | $25.0000 | 41.30% |
| Anthropic_Claude_Opus_45 | Anthropic | $4.4120 | $25.0000 | 23.00% |
| Anthropic_Claude_Opus_45 | Claude Platform on AWS | $1.7950 | $25.0000 | 74.00% |
| Qwen_Qwen3_235B_A22B_Instruct_2507 | Weights & Biases | $0.1000 | $0.0950 | 32.30% |
| Qwen_Qwen3_235B_A22B_Instruct_2507 | NovitaAI | $0.0900 | $0.5770 | 0.00% |
| Qwen_Qwen3_235B_A22B_Instruct_2507 | DeepInfra | $0.0710 | $0.0970 | 0.00% |
| Qwen_Qwen3_235B_A22B_Instruct_2507 | Parasail | $0.0780 | $0.5990 | 43.10% |
| Qwen_Qwen3_235B_A22B_Instruct_2507 | Alibaba Cloud Int. | $0.1490 | $0.5630 | 0.00% |
| Qwen_Qwen3_235B_A22B_Instruct_2507 | Google Vertex (2) | $0.2200 | $0.8790 | 28.40% |
| Qwen_Qwen3_235B_A22B_Instruct_2507 | Cerebras | $0.6000 | $1.1950 | 69.90% |
| Qwen_Qwen3_235B_A22B_Instruct_2507 | Google Vertex (1) | $0.2500 | $1.0000 | 0.00% |
| Qwen_Qwen3_235B_A22B_Instruct_2507 | Friendli | $0.2000 | $0.7980 | 0.00% |
| Qwen_Qwen3_235B_A22B_Instruct_2507 | Together | $0.2000 | $0.5980 | 0.00% |
| Qwen_Qwen3_235B_A22B_Instruct_2507 | AtlasCloud | $0.2000 | $0.8760 | 0.00% |
| OpenAI_GPT-51_Chat | OpenAI | $0.6280 | $10.0000 | 55.30% |
| Google_Gemini_20_Flash | Google Vertex | $0.0980 | $0.3990 | 5.40% |
| Google_Gemini_20_Flash | Google AI Studio | $0.1010 | $0.3990 | 0.60% |
Opinion
The "cheap" providers aren't always cheap. Besides that, one thing we did not discuss in this essay is the constant price increases from almost every provider. The numbers look even worse when you consider that inference has been getting more expensive too. For coding agents, it increasingly makes sense to move to a hybrid setup first (with a smaller local model) and eventually to a fully local one.