Context Window Benchmarks

Models
Model                  Context (tokens)  Speed    Latency
Llama 4 Scout          32,000,000        900 t/s  0.33s
Llama 4 Maverick       32,000,000        900 t/s  0.44s
Gemini 3 Pro           128,000           800 t/s  0.72s
Qwen 3 [Beta]          n/a               n/a      n/a
Gemini 2.0 Pro         1,000,000         800 t/s  0.50s
Claude 3.7 Sonnet      200,000           750 t/s  0.55s
GPT-4.5                128,000           450 t/s  1.20s
Claude 3.7 Sonnet [P]  200,000           750 t/s  0.55s
DeepSeek V3            128,000           350 t/s  4.00s
OpenAI o1-mini         200,000           250 t/s  14.00s
OpenAI o1              128,000           200 t/s  12.64s
OpenAI o1-mini-2       750,000           n/a      n/a
DeepSeek V3 G324       128,000           350 t/s  4.00s
Qwen o1                200,000           900 t/s  20.00s
Gemini 2.0 Flash       128,000           500 t/s  0.32s
Llama 3.1 70b          128,000           500 t/s  0.72s
Nous Pro               300,000           500 t/s  0.64s
Claude 3.5 Haiku       200,000           850 t/s  0.35s
Llama 3.1 405b         128,000           350 t/s  0.72s
GPT-4o-mini            128,000           450 t/s  0.50s
GPT-4o                 128,000           450 t/s  0.50s
Claude 3.5 Sonnet      200,000           750 t/s  1.20s
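
The speed and latency columns above can be combined into a rough end-to-end estimate: time to receive a response is approximately first-token latency plus output tokens divided by decode speed. The sketch below illustrates this using a few rows from the table; the helper name estimate_response_time is illustrative, not part of any real API.

```python
# Illustrative figures copied from the benchmark table:
# name -> (context_tokens, decode_speed_tps, first_token_latency_s)
MODELS = {
    "Llama 4 Scout":     (32_000_000, 900, 0.33),
    "Gemini 2.0 Pro":    (1_000_000,  800, 0.50),
    "Claude 3.7 Sonnet": (200_000,    750, 0.55),
    "OpenAI o1":         (128_000,    200, 12.64),
}

def estimate_response_time(model: str, output_tokens: int) -> float:
    """Rough seconds to stream output_tokens: latency + tokens/speed."""
    _, speed, latency = MODELS[model]
    return latency + output_tokens / speed

for name in MODELS:
    t = estimate_response_time(name, 1_000)
    print(f"{name}: ~{t:.2f}s for a 1K-token response")
```

Note how the estimate reshuffles the rankings: OpenAI o1's 12.64s first-token latency dominates for short responses, while for very long outputs the decode speed matters far more.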

Model Details

Llama 4 Series (Meta)

Leading the industry with massive 32M-token context windows, both the Scout and Maverick variants offer unmatched context capacity. Despite higher per-token costs, they maintain excellent speed (900 t/s) with minimal latency (0.33-0.44s), making them well suited to large-scale document processing and complex analyses.
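
To give a sense of scale for a 32M-token window, the sketch below converts tokens to approximate words and pages. The conversion factors are assumptions: roughly 0.75 English words per token is a common heuristic (actual ratios vary by tokenizer and text), and 500 words per page assumes a dense single-spaced page.

```python
# Rough scale of a 32M-token context window.
CONTEXT_TOKENS = 32_000_000
WORDS_PER_TOKEN = 0.75   # assumption: common English heuristic
WORDS_PER_PAGE = 500     # assumption: dense single-spaced page

words = CONTEXT_TOKENS * WORDS_PER_TOKEN
pages = words / WORDS_PER_PAGE
print(f"~{words:,.0f} words, ~{pages:,.0f} pages")
```

Under these assumptions the window holds on the order of tens of thousands of pages, which is why such models target whole-corpus analysis rather than single documents.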

Gemini Series (Google)

A notable contrast between versions: Gemini 3 Pro offers a standard 128K context window with strong speed (800 t/s), while Gemini 2.0 Pro pairs an impressive 1M-token context window with competitive per-token pricing. Both maintain excellent processing speeds with low latency, making them suitable for enterprise applications.

Claude Series (Anthropic)

The Claude family shows consistent performance with 200K context windows. The 3.7 Sonnet and its [P] variant offer excellent per-token value with strong speed (750 t/s). The 3.5 versions maintain similar capabilities with slight variations in latency.

OpenAI Series

A diverse lineup with varying capabilities: the o1 models range from 128K to 750K context windows, with the base o1 offering good per-token value. The mini variants trade speed against latency in different ways, while the GPT-4 models focus on consistent performance.

DeepSeek V3 Series

Provides a 128K context window at highly competitive per-token pricing. While the processing speed (350 t/s) and latency (4.00s) are moderate, its cost-effectiveness makes it suitable for budget-conscious applications.

Specialized Models

  • Qwen Series: Offers 200K context with mid-range pricing and impressive speed (900 t/s)
  • Nous Pro: Features a larger 300K context window with affordable per-token pricing
  • Llama 3.1 Series: Provides consistent 128K context with economical pricing and reliable performance
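
A practical first filter when choosing among these models is the minimum context the workload requires. The sketch below applies that filter to a handful of rows from the benchmark table; the function name fits and the 250K threshold are illustrative.

```python
# Context windows (tokens) copied from the benchmark table.
CONTEXTS = {
    "Llama 4 Scout": 32_000_000,
    "Llama 4 Maverick": 32_000_000,
    "Gemini 2.0 Pro": 1_000_000,
    "Nous Pro": 300_000,
    "Claude 3.5 Sonnet": 200_000,
    "GPT-4o": 128_000,
}

def fits(required_tokens: int) -> list[str]:
    """Models whose context window meets the requirement, sorted by name."""
    return sorted(m for m, c in CONTEXTS.items() if c >= required_tokens)

print(fits(250_000))
```

Only after narrowing on context does it make sense to rank the survivors on speed, latency, and price.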