Context Window Comparison
Context window, cost and speed comparison across different AI models.
Context Window Benchmarks
Model | Context Window (tokens) | Input Cost (per 1M tokens) | Output Cost (per 1M tokens) | Speed | Latency
---|---|---|---|---|---
Llama 4 Scout | 32,000,000 | | | 900 t/s | 0.33s
Llama 4 Maverick | 32,000,000 | | | 900 t/s | 0.44s
Gemini 3 Pro | 128,000 | n/a | n/a | 800 t/s | 0.72s
Qwen 3 [Beta] | n/a | n/a | n/a | n/a | n/a
Gemini 2.0 Pro | 1,000,000 | | | 800 t/s | 0.50s
Claude 3.7 Sonnet | 200,000 | | | 750 t/s | 0.55s
GPT-4.5 | 128,000 | | | 450 t/s | 1.20s
Claude 3.7 Sonnet [P] | 200,000 | | | 750 t/s | 0.55s
DeepSeek V3 | 128,000 | | | 350 t/s | 4.00s
OpenAI o1-mini | 200,000 | | | 250 t/s | 14.00s
OpenAI o1 | 128,000 | | | 200 t/s | 12.64s
OpenAI o1-mini-2 | 750,000 | n/a | n/a | n/a | n/a
DeepSeek V3 0324 | 128,000 | | | 350 t/s | 4.00s
Qwen o1 | 200,000 | | | 900 t/s | 20.00s
Gemini 2.0 Flash | 128,000 | | | 500 t/s | 0.32s
Llama 3.1 70b | 128,000 | | | 500 t/s | 0.72s
Nous Pro | 300,000 | | | 500 t/s | 0.64s
Claude 3.5 Haiku | 200,000 | | | 850 t/s | 0.35s
Llama 3.1 405b | 128,000 | | | 350 t/s | 0.72s
GPT-4o-mini | 128,000 | | | 450 t/s | 0.50s
GPT-4o | 128,000 | | | 450 t/s | 0.50s
Claude 3.5 Sonnet | 200,000 | | | 750 t/s | 1.20s
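To make the comparison easier to reuse, here is a minimal Python sketch, not part of any benchmark tooling, that encodes a handful of rows from the table as plain dictionaries and ranks them by context window. The figures are copied from the table above; the structure is just one possible representation.

```python
# A few rows from the table above, encoded as plain dictionaries.
# Context windows are in tokens, speed in tokens/second, latency in seconds.
MODELS = [
    {"name": "Llama 4 Scout",     "context": 32_000_000, "speed": 900, "latency": 0.33},
    {"name": "Gemini 2.0 Pro",    "context": 1_000_000,  "speed": 800, "latency": 0.50},
    {"name": "Claude 3.7 Sonnet", "context": 200_000,    "speed": 750, "latency": 0.55},
    {"name": "GPT-4o",            "context": 128_000,    "speed": 450, "latency": 0.50},
]

# Rank models by context window, largest first.
for m in sorted(MODELS, key=lambda m: m["context"], reverse=True):
    print(f'{m["name"]:<20} {m["context"]:>12,} tokens  {m["speed"]} t/s  {m["latency"]}s')
```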
Model Details
Llama 4 Series (Meta)
Leading the table with massive 32M-token context windows, both the Scout and Maverick variants offer unmatched context capacity. Despite higher per-token costs, they maintain excellent speed (900 t/s) with minimal latency (0.33-0.44s), making them well suited to large-scale document processing and complex analyses.
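To put a 32M-token window into perspective, a rough back-of-the-envelope estimate helps. The sketch below assumes the common ~4 characters-per-token heuristic and an invented example corpus, so the numbers are illustrative rather than exact.

```python
# Rough estimate: will a large corpus fit into a 32M-token context window?
# The 4-characters-per-token ratio is a common rule of thumb, not an exact figure.
CHARS_PER_TOKEN = 4
CONTEXT_WINDOW = 32_000_000  # Llama 4 Scout / Maverick, per the table above

corpus_chars = 5_000 * 3_000 * 6  # e.g. 5,000 documents x 3,000 words x ~6 chars/word
estimated_tokens = corpus_chars / CHARS_PER_TOKEN

print(f"Estimated tokens: {estimated_tokens:,.0f}")
print("Fits in one window" if estimated_tokens <= CONTEXT_WINDOW else "Needs chunking")
```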
Gemini Series (Google)
A notable contrast between versions: Gemini 3 Pro offers a standard 128K context window, while Gemini 2.0 Pro pairs an impressive 1M-token context window with competitive pricing. Both run at 800 t/s with low latency, making them suitable for enterprise applications.
Claude Series (Anthropic)
The Claude family shows consistent performance, with 200K context windows across the board. Claude 3.7 Sonnet and its [P] variant offer excellent value per token with strong speed (750 t/s). The 3.5 models maintain similar capabilities, though latency varies more within the family (0.35s for 3.5 Haiku versus 1.20s for 3.5 Sonnet).
OpenAI Series
A diverse lineup with varying capabilities: the o1 models range from 128K to 750K context windows, with the base o1 offering good value per token. The mini variants trade speed against latency in different ways (o1-mini reaches 250 t/s but takes 14.00s to respond), while the GPT-4 models focus on consistent performance (450 t/s, 0.50s latency).
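One way to make these trade-offs concrete: the wall-clock time for a response is roughly the latency figure plus the output length divided by throughput (treating the table's latency as time to first token, which is an assumption about how it was measured). A quick sketch using the table's numbers for OpenAI o1 and GPT-4o, with an assumed 1,000-token response:

```python
# Approximate wall-clock time for a response:
#   total_time ~= first-token latency + output_tokens / throughput
def response_time(latency_s: float, speed_tps: float, output_tokens: int = 1_000) -> float:
    return latency_s + output_tokens / speed_tps

# Figures taken from the table above; 1,000 output tokens is an assumed workload.
print(f"OpenAI o1: {response_time(12.64, 200):.1f}s")  # ~17.6s
print(f"GPT-4o:    {response_time(0.50, 450):.1f}s")   # ~2.7s
```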
DeepSeek V3 Series
Provides a 128K context window at highly competitive per-token pricing. While processing speed (350 t/s) and latency (4.00s) are moderate, its cost-effectiveness makes it well suited to budget-conscious applications.
Specialized Models
- Qwen Series: Offers 200K context with mid-range pricing and impressive speed (900 t/s), though its 20.00s latency is the highest in the table
- Nous Pro: Features a larger 300K context window with affordable per-token pricing and low latency (0.64s)
- Llama 3.1 Series: Provides consistent 128K context with economical pricing and reliable performance (350-500 t/s); a small selection sketch over these figures follows below
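Choosing among these models by hard constraints is straightforward once the figures are in code. The sketch below uses the table's numbers for a few of the specialized models; the thresholds are picked arbitrarily for the example.

```python
# Pick the fastest model that satisfies minimum-context and maximum-latency constraints.
# (name, context window in tokens, speed in t/s, latency in seconds), from the table above.
CANDIDATES = [
    ("Qwen o1",        200_000, 900, 20.00),
    ("Nous Pro",       300_000, 500, 0.64),
    ("Llama 3.1 70b",  128_000, 500, 0.72),
    ("Llama 3.1 405b", 128_000, 350, 0.72),
]

def pick(min_context: int, max_latency: float):
    ok = [m for m in CANDIDATES if m[1] >= min_context and m[3] <= max_latency]
    return max(ok, key=lambda m: m[2], default=None)  # fastest acceptable model

print(pick(min_context=200_000, max_latency=1.0))  # -> ('Nous Pro', 300000, 500, 0.64)
```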