Context Window Benchmarks

Models
Model                  Context (tokens)  Speed    Latency
Llama 4 Scout          32,000,000        900 t/s  0.33s
Llama 4 Maverick       32,000,000        900 t/s  0.44s
Gemini 3 Pro           128,000           800 t/s  0.72s
Qwen 3 [Beta]          n/a               n/a      n/a
Gemini 2.0 Pro         1,000,000         800 t/s  0.50s
Claude 3.7 Sonnet      200,000           750 t/s  0.55s
GPT-4.5                128,000           450 t/s  1.20s
Claude 3.7 Sonnet [P]  200,000           750 t/s  0.55s
DeepSeek V3            128,000           350 t/s  4.00s
OpenAI o1-mini         200,000           250 t/s  14.00s
OpenAI o1              128,000           200 t/s  12.64s
OpenAI o1-mini-2       750,000           n/a      n/a
DeepSeek V3 G324       128,000           350 t/s  4.00s
Qwen o1                200,000           900 t/s  20.00s
Gemini 2.0 Flash       128,000           500 t/s  0.32s
Llama 3.1 70b          128,000           500 t/s  0.72s
Nous Pro               300,000           500 t/s  0.64s
Claude 3.5 Haiku       200,000           850 t/s  0.35s
Llama 3.1 405b         128,000           350 t/s  0.72s
GPT-4o-mini            128,000           450 t/s  0.50s
GPT-4o                 128,000           450 t/s  0.50s
Claude 3.5 Sonnet      200,000           750 t/s  1.20s
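
The speed and latency columns above can be combined into a rough end-to-end estimate: time to receive a response is approximately first-token latency plus output tokens divided by decode speed. The sketch below illustrates this using a few rows from the table; the helper name estimate_response_time is illustrative, not part of any real API.

```python
# Illustrative figures copied from the benchmark table:
# name -> (context_tokens, decode_speed_tps, first_token_latency_s)
MODELS = {
    "Llama 4 Scout":     (32_000_000, 900, 0.33),
    "Gemini 2.0 Pro":    (1_000_000,  800, 0.50),
    "Claude 3.7 Sonnet": (200_000,    750, 0.55),
    "OpenAI o1":         (128_000,    200, 12.64),
}

def estimate_response_time(model: str, output_tokens: int) -> float:
    """Rough seconds to stream output_tokens: latency + tokens/speed."""
    _, speed, latency = MODELS[model]
    return latency + output_tokens / speed

for name in MODELS:
    t = estimate_response_time(name, 1_000)
    print(f"{name}: ~{t:.2f}s for a 1K-token response")
```

Note how the estimate reshuffles the rankings: OpenAI o1's 12.64s first-token latency dominates for short responses, while for very long outputs the decode speed matters far more.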

Model Details

Llama 4 Series (Meta)

Leading the industry with massive 32M-token context windows, both the Scout and Maverick variants offer unmatched context capacity. Despite higher per-token costs, they maintain excellent speed (900 t/s) with minimal latency (0.33-0.44s), making them well suited to large-scale document processing and complex analyses.
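
To give a sense of scale for a 32M-token window, the sketch below converts tokens to approximate words and pages. The conversion factors are assumptions: roughly 0.75 English words per token is a common heuristic (actual ratios vary by tokenizer and text), and 500 words per page assumes a dense single-spaced page.

```python
# Rough scale of a 32M-token context window.
CONTEXT_TOKENS = 32_000_000
WORDS_PER_TOKEN = 0.75   # assumption: common English heuristic
WORDS_PER_PAGE = 500     # assumption: dense single-spaced page

words = CONTEXT_TOKENS * WORDS_PER_TOKEN
pages = words / WORDS_PER_PAGE
print(f"~{words:,.0f} words, ~{pages:,.0f} pages")
```

Under these assumptions the window holds on the order of tens of thousands of pages, which is why such models target whole-corpus analysis rather than single documents.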

Gemini Series (Google)

A notable contrast between versions: Gemini 3 Pro offers a standard 128K context window with strong speed (800 t/s), while Gemini 2.0 Pro pairs an impressive 1M-token context window with competitive per-token pricing. Both maintain excellent processing speeds with low latency, making them suitable for enterprise applications.

Claude Series (Anthropic)

The Claude family shows consistent performance with 200K context windows. The 3.7 Sonnet and its [P] variant offer excellent per-token value with strong speed (750 t/s). The 3.5 versions maintain similar capabilities with slight variations in latency.

OpenAI Series

A diverse lineup with varying capabilities: the o1 models range from 128K to 750K context windows, with the base o1 offering good per-token value. The mini variants trade speed against latency in different ways, while the GPT-4 models focus on consistent performance.

DeepSeek V3 Series

Provides a 128K context window at highly competitive per-token pricing. While the processing speed (350 t/s) and latency (4.00s) are moderate, its cost-effectiveness makes it suitable for budget-conscious applications.

Specialized Models

  • Qwen Series: Offers 200K context with mid-range pricing and impressive speed (900 t/s)
  • Nous Pro: Features a larger 300K context window with affordable per-token pricing
  • Llama 3.1 Series: Provides consistent 128K context with economical pricing and reliable performance
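
A practical first filter when choosing among these models is the minimum context the workload requires. The sketch below applies that filter to a handful of rows from the benchmark table; the function name fits and the 250K threshold are illustrative.

```python
# Context windows (tokens) copied from the benchmark table.
CONTEXTS = {
    "Llama 4 Scout": 32_000_000,
    "Llama 4 Maverick": 32_000_000,
    "Gemini 2.0 Pro": 1_000_000,
    "Nous Pro": 300_000,
    "Claude 3.5 Sonnet": 200_000,
    "GPT-4o": 128_000,
}

def fits(required_tokens: int) -> list[str]:
    """Models whose context window meets the requirement, sorted by name."""
    return sorted(m for m, c in CONTEXTS.items() if c >= required_tokens)

print(fits(250_000))
```

Only after narrowing on context does it make sense to rank the survivors on speed, latency, and price.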