
Cache hit rates of inference providers matter more than headline costs

By Max Trivedi

TL;DR: Agents push the full conversation history into context every turn, so over many turns they are extremely read-heavy, which is why cache hit rates matter so much. This post analyzes the cache hit rates of 60+ providers using 398 data points, all sourced from openrouter.ai model pages. It assumes the reader is familiar with prefix caching; every mention of caching in this post refers to prefix caching.

Agentic workflows differ from most human-LLM conversations in one key characteristic: the average number of turns is far higher.

The total context processed over a multi-turn conversation grows quadratically with the number of turns. Every turn passes the full conversation up to that point into context along with its own new input; e.g. turn 100 resends everything up to turn 99. On its end, the inference provider tries to match the longest prefix it can against the caches it has available and processes the rest of the conversation as fresh input tokens, typically at 10x the cached price. So on a long 200k-token conversation (which, by the way, is a bad idea capability-wise even if you don't care about costs), if a model that costs $5 per million input tokens fails to hit any cache, you'll be charged $1 just to process the input of that single request. Two things determine this:

  • 1. Cached input pricing - the headline metric everyone looks at.
  • 2. Cache hit rate - the hidden variable that nobody talks about.
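
To make the quadratic growth concrete, here is a minimal sketch. It assumes (hypothetically) that every turn adds a fixed 2,000 new tokens and that each request resends the whole history; real agent turns vary in size, and caching is deliberately ignored here.

```python
# Sketch: total input an agent resends over a session (caching ignored).
# Hypothetical assumption: each turn adds the same number of new tokens and
# every request resends the entire history accumulated so far.


def context_size_at_turn(turn: int, tokens_per_turn: int) -> int:
    """Tokens sent with turn `turn` (1-indexed): all history plus the new input."""
    return turn * tokens_per_turn


def total_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """tokens_per_turn * (1 + 2 + ... + turns), i.e. quadratic in `turns`."""
    return sum(context_size_at_turn(t, tokens_per_turn) for t in range(1, turns + 1))


def uncached_input_cost(tokens: int, price_per_m: float) -> float:
    """USD cost of `tokens` input tokens when none of them hit the cache."""
    return tokens * price_per_m / 1_000_000


print(uncached_input_cost(200_000, 5.0))  # 1.0, the 200k-token example above
print(total_input_tokens(100, 2_000))     # 10100000
print(total_input_tokens(200, 2_000))     # 40200000
```

Doubling the session from 100 to 200 turns roughly quadruples the input tokens sent (10.1M vs 40.2M), which is why the price paid for those resent tokens dominates the bill.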

I recently spent hours convinced there was a bug in Dirac that broke caching, only to find in the end that it was entirely down to Gemini 3 Flash's cache hit rate.

While looking for data on this, I found that OpenRouter fortunately publishes it (go to a model's page and look for the 'Effective Pricing' section). Since the data is hourly, we have to assume that it doesn't change much from hour to hour.
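
How a hit rate turns into an "effective" price can be sketched as a blend of the cached and uncached rates. This is a simplified model of my own, not OpenRouter's documented formula: it ignores cache-write surcharges (which some providers charge) and assumes one cached and one uncached rate per endpoint. The $5.00/$0.50 list prices are hypothetical.

```python
# Sketch: blending cached and uncached input prices by cache hit rate.
# Assumption: simplified model, not OpenRouter's documented formula; it ignores
# cache-write surcharges and any per-request minimums.


def effective_input_price(hit_rate: float, uncached_per_m: float, cached_per_m: float) -> float:
    """Average $ per 1M input tokens when `hit_rate` of tokens are cache reads."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    return hit_rate * cached_per_m + (1.0 - hit_rate) * uncached_per_m


def implied_hit_rate(effective_per_m: float, uncached_per_m: float, cached_per_m: float) -> float:
    """Invert the blend: which hit rate would produce this effective price?"""
    if uncached_per_m == cached_per_m:
        raise ValueError("cached and uncached prices must differ")
    return (uncached_per_m - effective_per_m) / (uncached_per_m - cached_per_m)


# Hypothetical list prices with a 10x cached discount.
UNCACHED, CACHED = 5.0, 0.5
for h in (0.0, 0.5, 0.9):
    print(h, effective_input_price(h, UNCACHED, CACHED))
print(implied_hit_rate(2.75, UNCACHED, CACHED))  # 0.5
```

At a 0% hit rate the effective price is simply the list price, so a cheap cached rate on the pricing page is worth nothing unless the endpoint actually hits its cache.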

Provider Cache-hit Tier list

Providers with multiple endpoints (e.g. Amazon Bedrock US, Bedrock Global, Bedrock (1)) are listed separately — each entry reflects the hit rate of that specific endpoint as observed.

DeepSeek remains the gold standard of caching, which probably doesn't surprise anyone who has used their official API. In fact, all S-tier entries (hitting roughly 75% or higher) are Chinese labs: DeepSeek (87%), StepFun (86.1%), Moonshot AI (84.8%), MiniMax (75.4%), and Xiaomi (74.7%).
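
The post doesn't spell out how several endpoint rows become a single provider score, but the numbers above line up (within rounding) with a plain unweighted mean of each lab's own rows in the full table. A sketch of that roll-up, assuming every model counts equally regardless of traffic (my assumption, not a documented OpenRouter metric):

```python
from statistics import mean

# Cache hit rates (%) of each lab's own endpoints, copied from the full table.
FIRST_PARTY_HIT_RATES = {
    "DeepSeek": [87.9, 86.1],      # V4 Pro, V4 Flash
    "StepFun": [86.1],             # Step 3.5 Flash
    "Moonshot AI": [92.8, 76.8],   # Kimi K2.5, Kimi K2.6
    "MiniMax": [65.6, 85.3],       # M2.7, M2.5 (the "MiniMax Highspeed" endpoint excluded)
    "Xiaomi": [94.8, 55.0, 74.2],  # MiMo-V2.5-Pro, MiMo-V2-Flash, MiMo-V2.5
}


def provider_score(rates: list[float]) -> float:
    """Assumed aggregation: unweighted mean over a provider's model rows."""
    return mean(rates)


for provider, rates in FIRST_PARTY_HIT_RATES.items():
    print(f"{provider}: {provider_score(rates):.2f}%")
```

MiniMax comes out at 75.45% and Xiaomi at 74.67%, so both sit right around the 75% line.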

The mainstream US labs land somewhere in the middle, but as we will see in the next section, the variance is huge and rather interesting.

On the flip side, we have the "F-Tier": providers like io.net, AkashML, SambaNova, and Nebius clock in at exactly 0.0% cache hit rate on every model they serve in this dataset.

US closed-source big 3

The most interesting thing to me in the chart above is that older models from the same provider tend to get lower cache hit rates. If I had to explain it non-cynically, I would guess that from a systems engineering POV, it comes down to the cache-pool sizes allocated to each model.

Google does worse than the other two providers across the board, which is striking given that they own the full stack on TPUs. This gets full-on clowny when you look at the Vertex AI numbers (see the table below): Opus 4.7 on Vertex AI has a 65.30% cache hit rate, while Google's own Gemini 3.1 Pro Preview has 37.30% (and the pattern holds on average across the Claude and Gemini models hosted on Vertex)! How do you end up with a lower cache hit rate for your own model, on the hardware it was trained on, than for a competitor's model on your own platform? If I were to speculate, I'd guess that the whole 'thought signature' architecture is just not working out.
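
To compare Claude and Gemini on Vertex across the whole dataset rather than a single pair, one can average every Claude row and every Gemini row served from a Google Vertex endpoint in the full table. A sketch, assuming an unweighted mean per row with all regions pooled (no traffic weighting is possible from this data):

```python
from statistics import mean

# Cache hit rates (%) for every Claude and Gemini row served by a Google Vertex
# endpoint in the full table (all regions pooled together).
VERTEX_CLAUDE = [
    21.8, 52.7, 28.7,  # Sonnet 4.6: US East, Global, Europe
    71.1, 44.3,        # Opus 4.6: default, Europe
    65.3, 68.2,        # Opus 4.7: default, Europe
    29.5, 58.4,        # Sonnet 4: Global, Europe
    2.8, 58.2,         # Haiku 4.5: Europe, default
    55.1, 78.4,        # Sonnet 4.5: Global, default
    36.7,              # Opus 4.5
]
VERTEX_GEMINI = [
    22.7, 7.4,         # 2.5 Flash Lite: EU, default
    70.3,              # 3.5 Flash
    0.0,               # 2.0 Flash Lite
    30.1, 26.4, 2.6,   # 2.5 Pro: Global, EU, US
    19.2,              # 3.1 Flash Lite
    32.0,              # 3 Flash Preview
    25.3, 27.5, 45.1,  # 2.5 Flash: EU, Global, default
    37.3,              # 3.1 Pro Preview
    10.4,              # 3.1 Flash Lite Preview
    5.4,               # 2.0 Flash
]

claude_avg = mean(VERTEX_CLAUDE)
gemini_avg = mean(VERTEX_GEMINI)
print(f"Claude on Vertex: {claude_avg:.1f}% over {len(VERTEX_CLAUDE)} endpoints")
print(f"Gemini on Vertex: {gemini_avg:.1f}% over {len(VERTEX_GEMINI)} endpoints")
```

Pooled this way, Claude averages roughly twice Gemini's hit rate on Vertex (about 47.9% vs 24.1%), though individual rows vary widely (Haiku 4.5 on Vertex Europe sits at 2.8%).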

OSS Models Cost Comparison

With OSS models, the provider you use them from makes a huge difference in effective price, mostly due to cache hit rates.

| Model | Cheapest eff. input price, $/1M tokens (provider) | Most expensive eff. input price, $/1M tokens (provider) | Difference (% above cheapest) |
|---|---|---|---|
| Kimi K2.6 | 0.2760 (StreamLake) | 1.0900 (Phala) | 0.8140 (294.93%) |
| MiMo-V2.5-Pro | 0.3720 (Xiaomi) | 0.9060 (DeepInfra) | 0.5340 (143.55%) |
| DeepSeek V4 Pro (Max) | 0.0560 (DeepSeek) | 1.7220 (Parasail) | 1.6660 (2975.00%) |
| GLM-5.1 | 0.3230 (StreamLake) | 1.7470 (Venice) | 1.4240 (440.87%) |
| MiniMax-M2.7 | 0.1430 (MiniMax) | 0.6000 (SambaNova) | 0.4570 (319.58%) |
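
The spread column can be recomputed from the full table. A sketch using the Kimi K2.6 rows, with the percentage taken relative to the cheapest provider (which is what the figures above match). Skipping $0.0000 rows, such as GMICloud's GLM-5.1 entry, is my assumption that they represent missing data rather than free inference:

```python
# Effective input prices ($ per 1M tokens) for Kimi K2.6, copied from the full table.
KIMI_K26 = {
    "SiliconFlow": 0.2920, "Moonshot AI": 0.3430, "Cloudflare": 0.6870,
    "Inceptron": 0.3360, "Weights & Biases": 0.5520, "NovitaAI": 0.2850,
    "Chutes": 0.4410, "Fireworks": 0.4160, "Together": 0.4250,
    "Parasail": 0.3120, "io.net": 0.7300, "AkashML": 0.9500,
    "DeepInfra": 0.3220, "AtlasCloud": 0.6370, "Nebius Token Factory": 0.9500,
    "StreamLake": 0.2760, "Phala": 1.0900, "Venice": 0.7270,
}


def price_spread(prices: dict[str, float]) -> tuple[str, float, str, float, float]:
    """Return (cheapest, its price, priciest, its price, % above cheapest).

    Assumption: a $0.0000 price is treated as missing data and skipped.
    """
    listed = {name: p for name, p in prices.items() if p > 0}
    cheapest = min(listed, key=listed.__getitem__)
    priciest = max(listed, key=listed.__getitem__)
    low, high = listed[cheapest], listed[priciest]
    return cheapest, low, priciest, high, 100 * (high - low) / low


cheapest, low, priciest, high, pct = price_spread(KIMI_K26)
print(f"{cheapest} ${low:.4f} vs {priciest} ${high:.4f}: +{high - low:.4f} ({pct:.2f}%)")
```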

Small Model Grift

Now on to smaller models, which instinctively seem cheaper. Below is the average effective pricing across providers for 4 of the most popular local models (hi r/LocalLLaMA):

| Model Name | Total Providers | Avg Eff. Input | Avg Eff. Output | Avg Cache Hit |
|---|---|---|---|---|
| Google_Gemma_4_26B_A4B | 10 | $0.1156 | $0.4150 | 21.57% |
| Google_Gemma_4_31B | 10 | $0.1729 | $0.5049 | 7.44% |
| Qwen_Qwen36_35B_A3B | 7 | $0.1643 | $1.1450 | 11.54% |
| Qwen_Qwen36_27B | 8 | $0.4096 | $2.9433 | 7.66% |
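
These averages are plain unweighted means across each model's providers. As a check, the Gemma 4 26B A4B row can be reproduced from its ten rows in the full table; the sketch below applies no traffic weighting, since the data has no per-provider volume:

```python
from statistics import mean

# Gemma 4 26B A4B rows from the full table: (provider, input $/1M, output $/1M, hit %).
GEMMA_4_26B_A4B = [
    ("Google Vertex", 0.1500, 0.5970, 19.1),
    ("NovitaAI", 0.1300, 0.4000, 16.8),
    ("NextBit", 0.1180, 0.3980, 63.4),
    ("DeepInfra", 0.0700, 0.3370, 0.0),
    ("Cloudflare", 0.1000, 0.2970, 0.0),
    ("Parasail", 0.0960, 0.3980, 41.8),
    ("DekaLLM", 0.0600, 0.3270, 0.0),
    ("SiliconFlow", 0.1200, 0.3970, 0.0),
    ("Venice", 0.1620, 0.4990, 74.6),
    ("io.net", 0.1500, 0.5000, 0.0),
]


def summarize(rows: list[tuple[str, float, float, float]]) -> tuple[int, float, float, float]:
    """Provider count plus unweighted mean input price, output price and hit rate."""
    return (
        len(rows),
        mean(r[1] for r in rows),
        mean(r[2] for r in rows),
        mean(r[3] for r in rows),
    )


n, avg_in, avg_out, avg_hit = summarize(GEMMA_4_26B_A4B)
print(f"{n} providers, ${avg_in:.4f} in, ${avg_out:.4f} out, {avg_hit:.2f}% hit")
```

Half of these ten endpoints report a 0% hit rate, which is what drags the average down to about a fifth.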

Compare that to:

| Model Name (Official API) | Eff. Input Price | Eff. Output Price | Cache Hit Rate |
|---|---|---|---|
| DeepSeek_DeepSeek_V4_Pro | $0.0560 | $0.8690 | 87.90% |
| DeepSeek_DeepSeek_V4_Flash | $0.0220 | $0.2800 | 86.10% |

Yup, you can use DeepSeek V4 Pro, a 1.6-trillion-parameter model whose 49B active parameters alone exceed the total parameter count of any of these small models, for less than either of the Qwen3.6 models costs. That's thanks to providers like io.net and DeepInfra charging roughly $0.32/$3.20 per million input/output tokens for Qwen3.6 27B with zero cache hits.
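
To put those effective prices in workload terms, here is a sketch that prices a hypothetical read-heavy agent session of 20M input and 0.5M output tokens (both numbers are illustrative assumptions). Since effective prices already fold in each endpoint's observed hit rate, this assumes your traffic caches about as well as the average traffic behind those numbers.

```python
# Effective prices ($ per 1M tokens: input, output) taken from the tables in this post.
EFFECTIVE_PRICES = {
    "DeepSeek V4 Pro (DeepSeek API)": (0.0560, 0.8690),
    "Qwen3.6 27B (DeepInfra)": (0.3200, 3.1990),
    "Qwen3.6 27B (average of 8 providers)": (0.4096, 2.9433),
}


def session_cost(input_tokens: int, output_tokens: int, in_per_m: float, out_per_m: float) -> float:
    """USD cost of a session at the given effective per-million-token prices."""
    return (input_tokens * in_per_m + output_tokens * out_per_m) / 1_000_000


# Hypothetical agent session: read-heavy, so input dwarfs output.
INPUT_TOKENS, OUTPUT_TOKENS = 20_000_000, 500_000
for name, (in_price, out_price) in EFFECTIVE_PRICES.items():
    print(f"{name}: ${session_cost(INPUT_TOKENS, OUTPUT_TOKENS, in_price, out_price):.2f}")
```

Under these assumptions the DeepSeek session comes to about $1.55, against roughly $8 for Qwen3.6 27B on DeepInfra.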

The full table

| Model Name | Provider | Eff. Input Price | Eff. Output Price | Cache Hit Rate |
|---|---|---|---|---|
| Zai_GLM_5 | SiliconFlow | $0.3100 | $2.5490 | 85.30% |
| Zai_GLM_5 | Baidu Qianfan | $0.3930 | $2.2390 | 54.70% |
| Zai_GLM_5 | GMICloud | $0.4200 | $1.9200 | 37.50% |
| Zai_GLM_5 | DeepInfra | $0.3410 | $2.0790 | 54.00% |
| Zai_GLM_5 | Z.ai | $0.4710 | $3.1990 | 66.10% |
| Zai_GLM_5 | Amazon Bedrock | $1.0000 | $3.1990 | 0.10% |
| Zai_GLM_5 | Friendli | $0.8700 | $3.2000 | 26.00% |
| Zai_GLM_5 | StreamLake | $0.4650 | $2.0790 | 35.60% |
| Zai_GLM_5 | NovitaAI | $0.7340 | $3.2000 | 33.20% |
| Zai_GLM_5 | AtlasCloud | $0.7830 | $3.1500 | 22.00% |
| Zai_GLM_5 | Parasail | $0.7720 | $3.2000 | 28.60% |
| Zai_GLM_5 | Together | $1.0000 | $3.2000 | 0.80% |
| Zai_GLM_5 | Chutes | $0.9410 | $2.5500 | 1.80% |
| Zai_GLM_5 | Phala | $1.2000 | $3.5000 | 0.10% |
| Qwen_Qwen3_VL_32B_Instruct | Alibaba Cloud Int. | $0.1040 | $0.4140 | 0.00% |
| Qwen_Qwen36_35B_A3B | Parasail | $0.1020 | $1.0000 | 47.60% |
| Qwen_Qwen36_35B_A3B | Ambient | $0.1170 | $1.0000 | 33.20% |
| Qwen_Qwen36_35B_A3B | io.net | $0.1500 | $1.0000 | 0.00% |
| Qwen_Qwen36_35B_A3B | AkashML | $0.1700 | $1.2000 | 0.00% |
| Qwen_Qwen36_35B_A3B | AtlasCloud | $0.1610 | $0.9650 | 0.00% |
| Qwen_Qwen36_35B_A3B | Weights & Biases | $0.2500 | $1.2500 | 0.00% |
| Qwen_Qwen36_35B_A3B | SiliconFlow | $0.2000 | $1.6000 | 0.00% |
| OpenAI_GPT-41_Nano | OpenAI | $0.0920 | $0.3990 | 11.00% |
| OpenAI_GPT-41_Nano | Azure (1) | $0.0710 | $0.3990 | 41.30% |
| OpenAI_GPT-41_Nano | Azure (2) | $0.1000 | $0.3980 | 0.00% |
| xAI_Grok_43 | xAI | $0.7490 | $2.5000 | 47.80% |
| Anthropic_Claude_Sonnet_46 | Claude Platform on AWS | $0.9370 | $15.0000 | 79.30% |
| Anthropic_Claude_Sonnet_46 | Anthropic | $0.6070 | $15.0000 | 89.90% |
| Anthropic_Claude_Sonnet_46 | Google Vertex (US East) | $2.6200 | $15.0000 | 21.80% |
| Anthropic_Claude_Sonnet_46 | Amazon Bedrock (Global) | $1.3370 | $15.0000 | 64.00% |
| Anthropic_Claude_Sonnet_46 | Amazon Bedrock | $0.9400 | $15.0000 | 78.40% |
| Anthropic_Claude_Sonnet_46 | Google Vertex (Global) | $1.8020 | $15.0000 | 52.70% |
| Anthropic_Claude_Sonnet_46 | Google Vertex (Europe) | $2.4390 | $15.0000 | 28.70% |
| Xiaomi_MiMo-V25-Pro | Xiaomi | $0.3720 | $3.1670 | 94.80% |
| Xiaomi_MiMo-V25-Pro | DeepInfra | $0.9060 | $3.0000 | 11.70% |
| Qwen_Qwen3_Coder_Next | Ionstream | $0.0860 | $0.7990 | 61.10% |
| Qwen_Qwen3_Coder_Next | Parasail | $0.0860 | $0.7990 | 68.40% |
| Qwen_Qwen3_Coder_Next | AtlasCloud | $0.1800 | $1.3490 | 0.00% |
| Qwen_Qwen3_Coder_Next | NovitaAI | $0.2000 | $1.4990 | 0.00% |
| Anthropic_Claude_Opus_46 | Claude Platform on AWS | $2.3320 | $25.0000 | 63.50% |
| Anthropic_Claude_Opus_46 | Amazon Bedrock | $1.4750 | $25.0000 | 81.40% |
| Anthropic_Claude_Opus_46 | Anthropic | $1.6520 | $25.0000 | 79.00% |
| Anthropic_Claude_Opus_46 | Google Vertex | $1.9690 | $25.0000 | 71.10% |
| Anthropic_Claude_Opus_46 | Google Vertex (Europe) | $3.0870 | $25.0000 | 44.30% |
| Anthropic_Claude_Opus_46 | Azure | $6.2500 | $25.0000 | 0.00% |
| Anthropic_Claude_Opus_47 | Claude Platform on AWS | $1.8440 | $25.0000 | 72.40% |
| Anthropic_Claude_Opus_47 | Google Vertex | $2.4580 | $25.0000 | 65.30% |
| Anthropic_Claude_Opus_47 | Amazon Bedrock (US) | $3.9520 | $25.0000 | 23.70% |
| Anthropic_Claude_Opus_47 | Amazon Bedrock | $4.9530 | $25.0000 | 1.20% |
| Anthropic_Claude_Opus_47 | Google Vertex (Europe) | $2.3000 | $25.0000 | 68.20% |
| Anthropic_Claude_Opus_47 | Anthropic | $1.5920 | $25.0000 | 79.10% |
| MiniMax_MiniMax_M27 | MiniMax | $0.1430 | $1.2000 | 65.60% |
| MiniMax_MiniMax_M27 | Together | $0.2010 | $1.1990 | 41.20% |
| MiniMax_MiniMax_M27 | Morph | $0.2790 | $1.1990 | 73.90% |
| MiniMax_MiniMax_M27 | Fireworks | $0.2050 | $1.1990 | 39.20% |
| MiniMax_MiniMax_M27 | MiniMax Highspeed | $0.2510 | $2.3990 | 64.70% |
| MiniMax_MiniMax_M27 | SambaNova | $0.6000 | $2.3990 | 0.00% |
| Qwen_Qwen36_27B | DeepInfra | $0.3200 | $3.1990 | 0.00% |
| Qwen_Qwen36_27B | Alibaba Cloud Int. | $0.4500 | $2.6990 | 0.00% |
| Qwen_Qwen36_27B | Ambient | $0.2670 | $3.2000 | 32.80% |
| Qwen_Qwen36_27B | Weights & Biases | $0.6000 | $3.6000 | 0.00% |
| Qwen_Qwen36_27B | io.net | $0.3170 | $3.1990 | 0.00% |
| Qwen_Qwen36_27B | Morph | $0.4980 | $2.3990 | 28.50% |
| Qwen_Qwen36_27B | Chutes | $0.5000 | $2.0000 | 0.00% |
| Qwen_Qwen36_27B | Venice | $0.3250 | $3.2500 | 0.00% |
| OpenAI_gpt-oss-120b | Google Vertex | $0.0900 | $0.3590 | 4.10% |
| OpenAI_gpt-oss-120b | DeepInfra | $0.0390 | $0.1890 | 0.00% |
| OpenAI_gpt-oss-120b | Groq | $0.1200 | $0.5990 | 40.00% |
| OpenAI_gpt-oss-120b | Cerebras | $0.3500 | $0.7490 | 48.60% |
| OpenAI_gpt-oss-120b | DekaLLM | $0.0390 | $0.1770 | 0.00% |
| OpenAI_gpt-oss-120b | Baseten | $0.1000 | $0.4990 | 52.10% |
| OpenAI_gpt-oss-120b | NovitaAI | $0.0500 | $0.2490 | 2.50% |
| OpenAI_gpt-oss-120b | Ambient | $0.1070 | $0.6000 | 57.00% |
| OpenAI_gpt-oss-120b | DeepInfra (Turbo) | $0.1500 | $0.5990 | 0.00% |
| OpenAI_gpt-oss-120b | Parasail | $0.0890 | $0.7490 | 23.60% |
| OpenAI_gpt-oss-120b | SiliconFlow | $0.0500 | $0.4490 | 0.00% |
| OpenAI_gpt-oss-120b | Amazon Bedrock (1) | $0.1500 | $0.5990 | 0.00% |
| OpenAI_gpt-oss-120b | Nebius Token Factory | $0.1500 | $0.5990 | 0.00% |
| OpenAI_gpt-oss-120b | SambaNova Dedicated | $0.1200 | $0.8990 | 0.00% |
| OpenAI_gpt-oss-120b | SambaNova | $0.1400 | $0.9490 | 0.00% |
| OpenAI_gpt-oss-120b | Together | $0.1500 | $0.6000 | 0.00% |
| OpenAI_gpt-oss-120b | Phala | $0.1000 | $0.4890 | 14.00% |
| OpenAI_gpt-oss-120b | MARA | $0.1500 | $0.7490 | 0.00% |
| OpenAI_gpt-oss-120b | Weights & Biases | $0.1500 | $0.5990 | 12.70% |
| OpenAI_gpt-oss-120b | Amazon Bedrock (2) | $0.1500 | $0.5990 | 0.00% |
| Zai_GLM_47_Flash | DeepInfra | $0.0230 | $0.3990 | 73.40% |
| Zai_GLM_47_Flash | NovitaAI | $0.0610 | $0.3990 | 14.30% |
| Zai_GLM_47_Flash | Cloudflare | $0.0600 | $0.3970 | 0.00% |
| Zai_GLM_47_Flash | Phala | $0.1000 | $0.4290 | 0.00% |
| Zai_GLM_47_Flash | Z.ai | $0.0430 | $0.3990 | 44.20% |
| Zai_GLM_47_Flash | Venice | $0.1250 | $0.4960 | 14.40% |
| OpenAI_GPT-51 | OpenAI | $0.8220 | $10.0000 | 38.10% |
| OpenAI_GPT-51 | Azure (1) | $0.9550 | $10.0000 | 26.30% |
| OpenAI_GPT-54_Mini | OpenAI | $0.3370 | $4.5110 | 61.50% |
| OpenAI_GPT-54_Mini | Azure | $0.6840 | $4.4990 | 9.80% |
| Meta_Llama_31_8B_Instruct | Groq | $0.0330 | $0.0750 | 67.80% |
| Meta_Llama_31_8B_Instruct | DeepInfra | $0.0200 | $0.0480 | 0.00% |
| Meta_Llama_31_8B_Instruct | NovitaAI | $0.0200 | $0.0480 | 0.00% |
| Meta_Llama_31_8B_Instruct | Cerebras | $0.0990 | $0.0950 | 87.50% |
| Meta_Llama_31_8B_Instruct | Cloudflare | $0.1520 | $0.2850 | 0.00% |
| Qwen_Qwen35-9B | DeepInfra | $0.0400 | $0.1480 | 0.00% |
| Qwen_Qwen35-9B | Together | $0.1000 | $0.1490 | 0.00% |
| Qwen_Qwen35-9B | SiliconFlow | $0.1000 | $0.1490 | 0.00% |
| Qwen_Qwen35-9B | Venice | $0.1000 | $0.1490 | 35.50% |
| MoonshotAI_Kimi_K25 | DeepInfra | $0.1660 | $2.2490 | 74.80% |
| MoonshotAI_Kimi_K25 | NovitaAI | $0.2320 | $2.8490 | 71.10% |
| MoonshotAI_Kimi_K25 | ModelRun | $0.1610 | $1.9000 | 77.10% |
| MoonshotAI_Kimi_K25 | Moonshot AI | $0.1360 | $3.0000 | 92.80% |
| MoonshotAI_Kimi_K25 | Fireworks | $0.1960 | $3.0000 | 80.70% |
| MoonshotAI_Kimi_K25 | Chutes | $0.3790 | $2.0000 | 27.70% |
| MoonshotAI_Kimi_K25 | AtlasCloud | $0.3790 | $2.4990 | 38.10% |
| MoonshotAI_Kimi_K25 | SiliconFlow | $0.2590 | $2.2500 | 50.10% |
| MoonshotAI_Kimi_K25 | Cloudflare | $0.3390 | $3.0000 | 52.20% |
| MoonshotAI_Kimi_K25 | Parasail | $0.3810 | $2.7990 | 54.80% |
| MoonshotAI_Kimi_K25 | Phala | $0.6000 | $3.0000 | 2.20% |
| MoonshotAI_Kimi_K25 | Venice | $0.5300 | $3.5000 | 8.90% |
| Tencent_Hy3_preview | SiliconFlow | $0.0350 | $0.2590 | 84.30% |
| OpenAI_GPT-53-Codex | OpenAI | $0.2820 | $14.0000 | 93.20% |
| OpenAI_GPT-53-Codex | Azure | $0.4470 | $14.0000 | 82.70% |
| StepFun_Step_35_Flash | StepFun | $0.0310 | $0.2990 | 86.10% |
| StepFun_Step_35_Flash | DeepInfra | $0.0900 | $0.2990 | 0.00% |
| StepFun_Step_35_Flash | SiliconFlow | $0.1000 | $0.3000 | 0.00% |
| OpenAI_GPT-54_Nano | OpenAI | $0.0870 | $1.2490 | 62.80% |
| OpenAI_GPT-54_Nano | Azure | $0.1530 | $1.2490 | 26.10% |
| DeepSeek_DeepSeek_V3_0324 | NovitaAI | $0.1800 | $1.1180 | 66.30% |
| DeepSeek_DeepSeek_V3_0324 | DeepInfra | $0.1600 | $0.7670 | 61.90% |
| DeepSeek_DeepSeek_V3_0324 | ModelRun | $0.1850 | $0.7980 | 50.40% |
| DeepSeek_DeepSeek_V3_0324 | SiliconFlow | $0.2500 | $1.0000 | 52.60% |
| DeepSeek_DeepSeek_V3_0324 | AtlasCloud | $0.2140 | $0.8770 | 4.00% |
| DeepSeek_DeepSeek_V3_0324 | GMICloud | $0.2890 | $1.1350 | 0.60% |
| Qwen_Qwen35_397B_A17B | Morph | $0.4220 | $3.5000 | 64.70% |
| Qwen_Qwen35_397B_A17B | Alibaba Cloud Int. | $0.3900 | $2.3400 | 0.00% |
| Qwen_Qwen35_397B_A17B | Chutes | $0.2710 | $3.0000 | 79.60% |
| Qwen_Qwen35_397B_A17B | DeepInfra | $0.4900 | $3.5990 | 0.00% |
| Qwen_Qwen35_397B_A17B | Together | $0.6000 | $3.6000 | 48.40% |
| Qwen_Qwen35_397B_A17B | NovitaAI | $0.6000 | $3.6000 | 5.10% |
| Qwen_Qwen35_397B_A17B | Nebius Token Factory | $0.6000 | $3.5980 | 0.00% |
| Qwen_Qwen35_397B_A17B | AtlasCloud | $0.5500 | $3.5000 | 0.00% |
| Qwen_Qwen35_397B_A17B | Phala | $0.5500 | $3.5000 | 35.50% |
| Qwen_Qwen35_397B_A17B | Parasail | $0.4090 | $3.6000 | 45.30% |
| Qwen_Qwen35_397B_A17B | GMICloud | $0.6000 | $3.6000 | 0.00% |
| Qwen_Qwen35_397B_A17B | Venice | $0.7500 | $4.5000 | 19.30% |
| Mistral_Mistral_Small_32_24B | Mistral | $0.0860 | $0.2990 | 15.90% |
| Mistral_Mistral_Small_32_24B | DeepInfra | $0.0750 | $0.1980 | 0.00% |
| Mistral_Mistral_Small_32_24B | Parasail | $0.0730 | $0.5980 | 41.90% |
| Mistral_Mistral_Small_32_24B | Venice | $0.0940 | $0.2490 | 0.00% |
| Meta_Llama_4_Maverick | Parasail | $0.3140 | $1.0000 | 19.30% |
| Meta_Llama_4_Maverick | DeepInfra | $0.1500 | $0.5970 | 0.00% |
| Meta_Llama_4_Maverick | NovitaAI | $0.2700 | $0.8460 | 0.00% |
| Meta_Llama_4_Maverick | SambaNova | $0.6300 | $1.7970 | 0.00% |
| Mistral_Mistral_Medium_35 | Mistral | $1.5000 | $7.4990 | 21.70% |
| Qwen_Qwen36_Flash | Alibaba Cloud Int. | $0.1920 | $1.1400 | 0.30% |
| OpenAI_GPT-41 | OpenAI | $1.1000 | $8.0000 | 60.00% |
| OpenAI_GPT-41 | Azure (1) | $1.0780 | $8.0000 | 61.50% |
| Mistral_Mistral_Nemo | DeepInfra | $0.0200 | $0.0370 | 0.00% |
| Mistral_Mistral_Nemo | DekaLLM | $0.0200 | $0.0250 | 0.00% |
| Mistral_Mistral_Nemo | Mistral | $0.0900 | $0.1430 | 44.40% |
| Mistral_Mistral_Nemo | NovitaAI | $0.0390 | $0.1640 | 0.00% |
| Google_Gemma_4_31B | DeepInfra Turbo | $0.1200 | $0.3690 | 0.00% |
| Google_Gemma_4_31B | DeepInfra | $0.1300 | $0.3790 | 0.00% |
| Google_Gemma_4_31B | NovitaAI | $0.1400 | $0.3990 | 5.80% |
| Google_Gemma_4_31B | Chutes | $0.1090 | $0.3780 | 31.80% |
| Google_Gemma_4_31B | Together (2) | $0.3900 | $0.9690 | 0.00% |
| Google_Gemma_4_31B | SiliconFlow | $0.1300 | $0.3990 | 0.00% |
| Google_Gemma_4_31B | Together (1) | $0.2800 | $0.8580 | 0.00% |
| Google_Gemma_4_31B | Ambient | $0.1180 | $0.3990 | 18.40% |
| Google_Gemma_4_31B | Venice | $0.1750 | $0.5000 | 12.00% |
| Google_Gemma_4_31B | Parasail | $0.1370 | $0.3990 | 6.40% |
| Google_Gemini_25_Flash_Lite | Google Vertex (EU) | $0.0800 | $0.3990 | 22.70% |
| Google_Gemini_25_Flash_Lite | Google Vertex | $0.0940 | $0.3990 | 7.40% |
| Google_Gemini_25_Flash_Lite | Google AI Studio | $0.0910 | $0.3990 | 12.00% |
| Google_Gemini_35_Flash | Google Vertex | $0.5520 | $9.0080 | 70.30% |
| Google_Gemini_35_Flash | Google AI Studio | $0.6340 | $7.9040 | 63.50% |
| DeepSeek_DeepSeek_V31 | Weights & Biases | $0.5500 | $1.6370 | 31.80% |
| DeepSeek_DeepSeek_V31 | NovitaAI | $0.2620 | $1.0000 | 5.70% |
| DeepSeek_DeepSeek_V31 | DeepInfra | $0.1750 | $0.7880 | 43.20% |
| DeepSeek_DeepSeek_V31 | SiliconFlow | $0.2700 | $1.0000 | 36.80% |
| DeepSeek_DeepSeek_V31 | AtlasCloud | $0.2870 | $0.9470 | 7.30% |
| DeepSeek_DeepSeek_V31 | Google Vertex | $0.6000 | $1.7000 | 19.30% |
| DeepSeek_DeepSeek_V31 | SambaNova | $0.6500 | $1.4990 | 0.00% |
| Google_Gemma_4_26B_A4B | Google Vertex | $0.1500 | $0.5970 | 19.10% |
| Google_Gemma_4_26B_A4B | NovitaAI | $0.1300 | $0.4000 | 16.80% |
| Google_Gemma_4_26B_A4B | NextBit | $0.1180 | $0.3980 | 63.40% |
| Google_Gemma_4_26B_A4B | DeepInfra | $0.0700 | $0.3370 | 0.00% |
| Google_Gemma_4_26B_A4B | Cloudflare | $0.1000 | $0.2970 | 0.00% |
| Google_Gemma_4_26B_A4B | Parasail | $0.0960 | $0.3980 | 41.80% |
| Google_Gemma_4_26B_A4B | DekaLLM | $0.0600 | $0.3270 | 0.00% |
| Google_Gemma_4_26B_A4B | SiliconFlow | $0.1200 | $0.3970 | 0.00% |
| Google_Gemma_4_26B_A4B | Venice | $0.1620 | $0.4990 | 74.60% |
| Google_Gemma_4_26B_A4B | io.net | $0.1500 | $0.5000 | 0.00% |
| OpenAI_gpt-oss-20b | Weights & Biases | $0.0500 | $0.2000 | 69.00% |
| OpenAI_gpt-oss-20b | DeepInfra | $0.0300 | $0.1400 | 0.00% |
| OpenAI_gpt-oss-20b | Amazon Bedrock (2) | $0.0700 | $0.1500 | 0.00% |
| OpenAI_gpt-oss-20b | Amazon Bedrock (1) | $0.0700 | $0.1500 | 0.00% |
| OpenAI_gpt-oss-20b | NovitaAI | $0.0400 | $0.1500 | 0.00% |
| OpenAI_gpt-oss-20b | Google Vertex | $0.0690 | $0.2460 | 63.20% |
| OpenAI_gpt-oss-20b | Groq | $0.0660 | $0.2990 | 22.40% |
| OpenAI_gpt-oss-20b | Parasail | $0.0250 | $0.1990 | 71.40% |
| OpenAI_gpt-oss-20b | SiliconFlow | $0.0400 | $0.1790 | 0.00% |
| OpenAI_gpt-oss-20b | Fireworks | $0.0580 | $0.2990 | 33.70% |
| OpenAI_gpt-oss-20b | NextBit | $0.1000 | $0.4490 | 0.00% |
| OpenAI_gpt-oss-20b | Together | $0.0490 | $0.1990 | 0.00% |
| OpenAI_GPT-41_Mini | OpenAI | $0.2800 | $1.5940 | 40.00% |
| OpenAI_GPT-41_Mini | Azure (2) | $0.2520 | $1.5980 | 49.30% |
| OpenAI_GPT-41_Mini | Azure (1) | $0.2020 | $1.5930 | 66.00% |
| DeepSeek_DeepSeek_V32 | NovitaAI | $0.2490 | $0.3990 | 15.20% |
| DeepSeek_DeepSeek_V32 | Baidu Qianfan | $0.1050 | $0.3760 | 64.90% |
| DeepSeek_DeepSeek_V32 | SiliconFlow | $0.1720 | $0.4190 | 69.80% |
| DeepSeek_DeepSeek_V32 | DeepInfra | $0.1950 | $0.3790 | 49.60% |
| DeepSeek_DeepSeek_V32 | AtlasCloud | $0.2410 | $0.3790 | 14.40% |
| DeepSeek_DeepSeek_V32 | Friendli | $0.3990 | $1.5000 | 40.50% |
| DeepSeek_DeepSeek_V32 | Alibaba Cloud Int. | $0.2730 | $1.1110 | 32.50% |
| DeepSeek_DeepSeek_V32 | Parasail | $0.2540 | $0.4470 | 17.20% |
| DeepSeek_DeepSeek_V32 | Google Vertex | $0.5600 | $1.6790 | 8.00% |
| Google_Gemini_20_Flash_Lite | Google Vertex | $0.0750 | $0.2980 | 0.00% |
| Google_Gemini_20_Flash_Lite | Google AI Studio | $0.0750 | $0.2950 | 0.00% |
| Anthropic_Claude_Sonnet_4 | Amazon Bedrock (1) | $1.6300 | $15.0000 | 53.70% |
| Anthropic_Claude_Sonnet_4 | Anthropic | $1.9410 | $15.0000 | 41.60% |
| Anthropic_Claude_Sonnet_4 | Amazon Bedrock (2) | $2.1510 | $15.0000 | 35.10% |
| Anthropic_Claude_Sonnet_4 | Google Vertex (Global) | $2.3880 | $15.0000 | 29.50% |
| Anthropic_Claude_Sonnet_4 | Google Vertex (Europe) | $1.7360 | $15.0000 | 58.40% |
| Anthropic_Claude_Haiku_45 | Amazon Bedrock (Global) | $0.3670 | $5.0000 | 72.40% |
| Anthropic_Claude_Haiku_45 | Anthropic | $0.5420 | $5.0000 | 54.80% |
| Anthropic_Claude_Haiku_45 | Google Vertex (Europe) | $0.9760 | $5.0000 | 2.80% |
| Anthropic_Claude_Haiku_45 | Amazon Bedrock | $0.5780 | $5.0000 | 48.70% |
| Anthropic_Claude_Haiku_45 | Google Vertex | $0.4890 | $5.0000 | 58.20% |
| OpenAI_GPT-54 | Azure | $1.8150 | $15.0000 | 30.40% |
| OpenAI_GPT-54 | OpenAI | $0.8980 | $15.1190 | 74.90% |
| OpenAI_GPT-5 | OpenAI | $0.4550 | $10.0000 | 70.60% |
| OpenAI_GPT-5 | Azure (1) | $1.2500 | $10.0000 | 0.00% |
| Xiaomi_MiMo-V2-Flash | Xiaomi | $0.0500 | $0.2980 | 55.00% |
| Xiaomi_MiMo-V2-Flash | NovitaAI | $0.0530 | $0.2990 | 58.70% |
| Qwen_Qwen35-35B-A3B | Parasail | $0.0910 | $1.0000 | 59.00% |
| Qwen_Qwen35-35B-A3B | Alibaba Cloud Int. | $0.1620 | $1.2990 | 0.00% |
| Qwen_Qwen35-35B-A3B | AkashML | $0.1600 | $1.2000 | 0.00% |
| Qwen_Qwen35-35B-A3B | Ambient | $0.1020 | $1.0000 | 41.60% |
| Qwen_Qwen35-35B-A3B | Venice | $0.1970 | $1.2500 | 73.90% |
| Qwen_Qwen35-35B-A3B | DeepInfra | $0.1400 | $1.0000 | 0.00% |
| Qwen_Qwen35-35B-A3B | DekaLLM | $0.1390 | $1.0000 | 0.00% |
| Qwen_Qwen35-35B-A3B | AtlasCloud | $0.2250 | $1.8000 | 0.00% |
| Qwen_Qwen35-35B-A3B | NextBit | $0.3000 | $1.8000 | 0.00% |
| Qwen_Qwen35-35B-A3B | SiliconFlow | $0.2400 | $1.8000 | 0.00% |
| Google_Gemini_25_Pro | Google Vertex (Global) | $1.0410 | $10.0650 | 30.10% |
| Google_Gemini_25_Pro | Google Vertex (EU) | $0.9680 | $10.0000 | 26.40% |
| Google_Gemini_25_Pro | Google AI Studio | $0.9680 | $10.0000 | 35.40% |
| Google_Gemini_25_Pro | Google Vertex (US) | $1.2200 | $10.0000 | 2.60% |
| Zai_GLM_51 | StreamLake | $0.3230 | $3.9590 | 91.30% |
| Zai_GLM_51 | Friendli | $0.6440 | $4.4000 | 66.30% |
| Zai_GLM_51 | Z.ai | $0.5660 | $4.3990 | 73.10% |
| Zai_GLM_51 | Chutes | $0.9960 | $4.0000 | 34.00% |
| Zai_GLM_51 | AtlasCloud | $0.5020 | $4.4000 | 78.50% |
| Zai_GLM_51 | DeepInfra | $0.4450 | $3.5000 | 71.60% |
| Zai_GLM_51 | SiliconFlow | $0.5070 | $4.3990 | 78.30% |
| Zai_GLM_51 | NovitaAI | $0.5120 | $4.4000 | 77.50% |
| Zai_GLM_51 | Baidu Qianfan | $0.5880 | $3.0790 | 49.10% |
| Zai_GLM_51 | Baseten | $1.3000 | $4.3000 | 45.60% |
| Zai_GLM_51 | Inceptron | $1.0810 | $4.4000 | 28.00% |
| Zai_GLM_51 | Together | $1.4000 | $4.4000 | 6.50% |
| Zai_GLM_51 | Parasail | $1.1110 | $4.4000 | 25.30% |
| Zai_GLM_51 | Ambient | $1.4000 | $4.4000 | 14.00% |
| Zai_GLM_51 | Phala | $1.2100 | $4.2000 | 0.70% |
| Zai_GLM_51 | io.net | $1.2900 | $4.4800 | 0.00% |
| Zai_GLM_51 | Fireworks | $1.2730 | $4.3990 | 11.10% |
| Zai_GLM_51 | Venice | $1.7470 | $5.5000 | 0.20% |
| Zai_GLM_51 | GMICloud | $0.0000 | $0.0000 | 0.00% |
| Zai_GLM_47 | Z.ai | $0.1170 | $2.1990 | 98.60% |
| Zai_GLM_47 | Google Vertex | $0.6000 | $2.2000 | 36.50% |
| Zai_GLM_47 | SiliconFlow | $0.2040 | $2.2000 | 72.30% |
| Zai_GLM_47 | DeepInfra | $0.2180 | $1.7500 | 56.90% |
| Zai_GLM_47 | Cerebras | $2.2500 | $2.7500 | 45.60% |
| Zai_GLM_47 | AtlasCloud | $0.3220 | $1.8500 | 49.60% |
| Zai_GLM_47 | Phala | $0.8500 | $3.2990 | 3.50% |
| Zai_GLM_47 | NovitaAI | $0.3700 | $2.0050 | 40.80% |
| Zai_GLM_47 | Parasail | $0.4270 | $2.1000 | 6.70% |
| Zai_GLM_47 | Venice | $0.5490 | $2.6490 | 0.30% |
| OpenAI_GPT-55 | OpenAI | $1.1170 | $30.6400 | 92.70% |
| OpenAI_GPT-55 | Azure | $2.3150 | $30.0820 | 62.00% |
| Google_Gemini_31_Flash_Lite | Google Vertex | $0.2100 | $1.4980 | 19.20% |
| Google_Gemini_31_Flash_Lite | Google AI Studio | $0.1860 | $1.4680 | 27.40% |
| Qwen_Qwen3_Coder_480B_A35B | DeepInfra (Turbo) | $0.1210 | $1.0000 | 89.30% |
| Qwen_Qwen3_Coder_480B_A35B | Together | $2.0000 | $2.0000 | 0.00% |
| Qwen_Qwen3_Coder_480B_A35B | Google Vertex | $0.2200 | $1.7990 | 49.20% |
| Qwen_Qwen3_Coder_480B_A35B | AtlasCloud | $0.7800 | $3.7980 | 0.00% |
| Qwen_Qwen3_Coder_480B_A35B | Weights & Biases | $1.0000 | $1.4980 | 70.50% |
| Qwen_Qwen3_Coder_480B_A35B | Alibaba OpenSource | $1.3730 | $6.8700 | 0.00% |
| Qwen_Qwen3_Coder_480B_A35B | NovitaAI | $0.3800 | $1.5490 | 0.00% |
| Qwen_Qwen3_Coder_480B_A35B | Venice | $0.3500 | $1.5000 | 0.00% |
| Google_Gemini_3_Flash_Preview | Google Vertex | $0.3600 | $2.9970 | 32.00% |
| Google_Gemini_3_Flash_Preview | Google AI Studio | $0.3480 | $2.9980 | 34.50% |
| Qwen_Qwen36_Plus | Alibaba Cloud Int. | $0.6690 | $1.9820 | 6.00% |
| Mistral_Mistral_Small_4 | Mistral | $0.1280 | $0.5990 | 16.10% |
| Mistral_Mistral_Small_4 | Venice | $0.1870 | $0.7490 | 11.20% |
| MoonshotAI_Kimi_K26 | SiliconFlow | $0.2920 | $4.0000 | 83.90% |
| MoonshotAI_Kimi_K26 | Moonshot AI | $0.3430 | $4.0000 | 76.80% |
| MoonshotAI_Kimi_K26 | Cloudflare | $0.6870 | $3.5000 | 13.60% |
| MoonshotAI_Kimi_K26 | Inceptron | $0.3360 | $3.5000 | 76.60% |
| MoonshotAI_Kimi_K26 | Weights & Biases | $0.5520 | $4.0000 | 50.30% |
| MoonshotAI_Kimi_K26 | NovitaAI | $0.2850 | $3.4000 | 80.50% |
| MoonshotAI_Kimi_K26 | Chutes | $0.4410 | $3.5000 | 80.80% |
| MoonshotAI_Kimi_K26 | Fireworks | $0.4160 | $4.0000 | 67.60% |
| MoonshotAI_Kimi_K26 | Together | $0.4250 | $4.5000 | 77.40% |
| MoonshotAI_Kimi_K26 | Parasail | $0.3120 | $3.5000 | 74.20% |
| MoonshotAI_Kimi_K26 | io.net | $0.7300 | $3.4900 | 0.00% |
| MoonshotAI_Kimi_K26 | AkashML | $0.9500 | $4.0000 | 0.00% |
| MoonshotAI_Kimi_K26 | DeepInfra | $0.3220 | $3.5000 | 71.30% |
| MoonshotAI_Kimi_K26 | AtlasCloud | $0.6370 | $4.0000 | 39.70% |
| MoonshotAI_Kimi_K26 | Nebius Token Factory | $0.9500 | $4.0000 | 0.00% |
| MoonshotAI_Kimi_K26 | StreamLake | $0.2760 | $3.8000 | 83.50% |
| MoonshotAI_Kimi_K26 | Phala | $1.0900 | $4.6000 | 6.10% |
| MoonshotAI_Kimi_K26 | Venice | $0.7270 | $4.6550 | 19.50% |
| Qwen_Qwen35-Flash | Alibaba Cloud Int. | $0.0650 | $0.2590 | 0.00% |
| OpenAI_GPT-5_Nano | OpenAI | $0.0380 | $0.3960 | 25.10% |
| OpenAI_GPT-5_Nano | Azure (1) | $0.0350 | $0.3990 | 36.50% |
| DeepSeek_DeepSeek_V4_Flash | DeepSeek | $0.0220 | $0.2800 | 86.10% |
| DeepSeek_DeepSeek_V4_Flash | SiliconFlow | $0.0890 | $0.2790 | 45.60% |
| DeepSeek_DeepSeek_V4_Flash | Alibaba Cloud Int. | $0.0720 | $0.2790 | 61.00% |
| DeepSeek_DeepSeek_V4_Flash | NovitaAI | $0.0720 | $0.2790 | 60.60% |
| DeepSeek_DeepSeek_V4_Flash | Parasail | $0.1300 | $0.2790 | 14.40% |
| DeepSeek_DeepSeek_V4_Flash | AtlasCloud | $0.0870 | $0.2790 | 47.40% |
| DeepSeek_DeepSeek_V4_Flash | DeepInfra | $0.0790 | $0.1990 | 25.70% |
| DeepSeek_DeepSeek_V4_Flash | GMICloud | $0.0690 | $0.2230 | 48.20% |
| DeepSeek_DeepSeek_V4_Flash | Baidu Qianfan | $0.0810 | $0.2510 | 44.40% |
| DeepSeek_DeepSeek_V4_Flash | AkashML | $0.1400 | $0.2790 | 0.00% |
| DeepSeek_DeepSeek_V4_Flash | Venice | $0.1510 | $0.3490 | 13.60% |
| OpenAI_GPT-52 | OpenAI | $0.9490 | $14.0000 | 50.90% |
| OpenAI_GPT-52 | Azure | $1.0480 | $14.0000 | 44.60% |
| Google_Gemini_25_Flash | Google Vertex (EU) | $0.2350 | $2.4990 | 25.30% |
| Google_Gemini_25_Flash | Google Vertex (Global) | $0.2330 | $2.4990 | 27.50% |
| Google_Gemini_25_Flash | Google AI Studio | $0.1770 | $2.4990 | 47.80% |
| Google_Gemini_25_Flash | Google Vertex | $0.1780 | $2.5000 | 45.10% |
| Google_Gemini_31_Pro_Preview | Google Vertex | $1.4470 | $12.0080 | 37.30% |
| Google_Gemini_31_Pro_Preview | Google AI Studio | $1.5670 | $11.9550 | 24.20% |
| OpenAI_GPT-5_Mini | OpenAI | $0.1330 | $1.9960 | 50.70% |
| OpenAI_GPT-5_Mini | Azure (1) | $0.0900 | $2.0000 | 72.80% |
| Xiaomi_MiMo-V25 | Xiaomi | $0.1830 | $2.0570 | 74.20% |
| Owl_Alpha | Stealth | $0.0000 | $0.0000 | 62.60% |
| DeepSeek_DeepSeek_V4_Pro | SiliconFlow | $0.7450 | $3.4790 | 59.30% |
| DeepSeek_DeepSeek_V4_Pro | DeepSeek | $0.0560 | $0.8690 | 87.90% |
| DeepSeek_DeepSeek_V4_Pro | NovitaAI | $0.4060 | $3.3790 | 82.00% |
| DeepSeek_DeepSeek_V4_Pro | Alibaba Cloud Int. | $0.9230 | $3.3600 | 49.10% |
| DeepSeek_DeepSeek_V4_Pro | GMICloud | $0.2460 | $2.7830 | 89.80% |
| DeepSeek_DeepSeek_V4_Pro | DeepInfra | $0.7350 | $2.5990 | 47.10% |
| DeepSeek_DeepSeek_V4_Pro | Baidu Qianfan | $0.8920 | $3.0410 | 45.10% |
| DeepSeek_DeepSeek_V4_Pro | AtlasCloud | $1.1450 | $3.3800 | 34.50% |
| DeepSeek_DeepSeek_V4_Pro | Parasail | $1.7220 | $3.4780 | 2.10% |
| DeepSeek_DeepSeek_V4_Pro | Fireworks | $1.3190 | $3.4800 | 26.40% |
| DeepSeek_DeepSeek_V4_Pro | Together | $1.4570 | $4.4000 | 33.80% |
| DeepSeek_DeepSeek_V4_Pro | Venice | $0.7990 | $3.7950 | 66.50% |
| OpenAI_GPT-4o-mini | OpenAI | $0.1440 | $0.5970 | 8.00% |
| OpenAI_GPT-4o-mini | Azure (1) | $0.1260 | $0.5980 | 32.00% |
| OpenAI_GPT-4o-mini | Azure (2) | $0.1250 | $0.5980 | 33.50% |
| Google_Gemini_31_Flash_Lite_Preview | Google AI Studio | $0.1640 | $1.4990 | 38.30% |
| Google_Gemini_31_Flash_Lite_Preview | Google Vertex | $0.2300 | $1.4990 | 10.40% |
| Zai_GLM_45_Air | Z.ai | $0.0750 | $1.0960 | 73.50% |
| Zai_GLM_45_Air | NovitaAI | $0.0540 | $0.8460 | 72.30% |
| Zai_GLM_45_Air | SiliconFlow | $0.1400 | $0.8580 | 42.40% |
| Anthropic_Claude_Sonnet_45 | Amazon Bedrock (1) | $2.2850 | $15.0010 | 30.40% |
| Anthropic_Claude_Sonnet_45 | Google Vertex (Global) | $1.6100 | $15.0000 | 55.10% |
| Anthropic_Claude_Sonnet_45 | Claude Platform on AWS | $0.9770 | $15.0000 | 77.80% |
| Anthropic_Claude_Sonnet_45 | Anthropic | $3.0210 | $15.0000 | 5.40% |
| Anthropic_Claude_Sonnet_45 | Google Vertex | $0.9840 | $15.0000 | 78.40% |
| Anthropic_Claude_Sonnet_45 | Amazon Bedrock (2) | $1.7740 | $15.0000 | 48.70% |
| MiniMax_MiniMax_M25 | DeepInfra | $0.0750 | $1.1480 | 62.50% |
| MiniMax_MiniMax_M25 | MiniMax Highspeed | $0.1800 | $2.3990 | 77.80% |
| MiniMax_MiniMax_M25 | Inceptron | $0.0740 | $0.8980 | 78.80% |
| MiniMax_MiniMax_M25 | MiniMax | $0.0700 | $1.1990 | 85.30% |
| MiniMax_MiniMax_M25 | AtlasCloud | $0.1110 | $1.1980 | 78.20% |
| MiniMax_MiniMax_M25 | NovitaAI | $0.0710 | $1.1970 | 84.90% |
| MiniMax_MiniMax_M25 | Chutes | $0.1080 | $1.1990 | 56.30% |
| MiniMax_MiniMax_M25 | Baidu Qianfan | $0.1170 | $1.0790 | 62.90% |
| MiniMax_MiniMax_M25 | Friendli | $0.1090 | $1.1990 | 79.50% |
| MiniMax_MiniMax_M25 | AkashML | $0.1500 | $1.1490 | 0.00% |
| MiniMax_MiniMax_M25 | Parasail | $0.1310 | $1.1990 | 62.60% |
| MiniMax_MiniMax_M25 | MARA | $0.3000 | $1.2000 | 0.00% |
| MiniMax_MiniMax_M25 | SiliconFlow | $0.1580 | $1.1990 | 52.60% |
| MiniMax_MiniMax_M25 | Phala | $0.2000 | $1.3790 | 32.90% |
| MiniMax_MiniMax_M25 | Weights & Biases | $0.2990 | $1.1960 | 24.20% |
| MiniMax_MiniMax_M25 | StreamLake | $0.3050 | $1.1990 | 6.80% |
| MiniMax_MiniMax_M25 | Venice | $0.1480 | $1.1880 | 63.80% |
| Anthropic_Claude_Opus_45 | Google Vertex | $4.0740 | $25.0000 | 36.70% |
| Anthropic_Claude_Opus_45 | Amazon Bedrock (1) | $0.9350 | $25.0000 | 91.90% |
| Anthropic_Claude_Opus_45 | Amazon Bedrock (2) | $3.3230 | $25.0000 | 41.30% |
| Anthropic_Claude_Opus_45 | Anthropic | $4.4120 | $25.0000 | 23.00% |
| Anthropic_Claude_Opus_45 | Claude Platform on AWS | $1.7950 | $25.0000 | 74.00% |
| Qwen_Qwen3_235B_A22B_Instruct_2507 | Weights & Biases | $0.1000 | $0.0950 | 32.30% |
| Qwen_Qwen3_235B_A22B_Instruct_2507 | NovitaAI | $0.0900 | $0.5770 | 0.00% |
| Qwen_Qwen3_235B_A22B_Instruct_2507 | DeepInfra | $0.0710 | $0.0970 | 0.00% |
| Qwen_Qwen3_235B_A22B_Instruct_2507 | Parasail | $0.0780 | $0.5990 | 43.10% |
| Qwen_Qwen3_235B_A22B_Instruct_2507 | Alibaba Cloud Int. | $0.1490 | $0.5630 | 0.00% |
| Qwen_Qwen3_235B_A22B_Instruct_2507 | Google Vertex (2) | $0.2200 | $0.8790 | 28.40% |
| Qwen_Qwen3_235B_A22B_Instruct_2507 | Cerebras | $0.6000 | $1.1950 | 69.90% |
| Qwen_Qwen3_235B_A22B_Instruct_2507 | Google Vertex (1) | $0.2500 | $1.0000 | 0.00% |
| Qwen_Qwen3_235B_A22B_Instruct_2507 | Friendli | $0.2000 | $0.7980 | 0.00% |
| Qwen_Qwen3_235B_A22B_Instruct_2507 | Together | $0.2000 | $0.5980 | 0.00% |
| Qwen_Qwen3_235B_A22B_Instruct_2507 | AtlasCloud | $0.2000 | $0.8760 | 0.00% |
| OpenAI_GPT-51_Chat | OpenAI | $0.6280 | $10.0000 | 55.30% |
| Google_Gemini_20_Flash | Google Vertex | $0.0980 | $0.3990 | 5.40% |
| Google_Gemini_20_Flash | Google AI Studio | $0.1010 | $0.3990 | 0.60% |

Opinion

The "cheap" providers aren't always cheap. One thing this post did not cover is the constant price increases from almost every provider; the numbers look even worse once you factor in that inference has been getting more expensive too. For coding agents, it makes increasingly more sense to move to a hybrid setup first (with a smaller local model) and eventually to a fully local one.