# Model Comparison for Falcon

A detailed comparison of LLM research capabilities across models and evaluation metrics.
## Performance Metrics

The tables below summarize benchmarking results for each model, first as raw numeric scores and then as qualitative ratings.
### Average Numeric Scores
Model | Average Score (%) | Token Count | Response Time (s) | Factual Accuracy (%) | Hallucination Avoidance (%) | Logical Correctness (%) | Source Quality (%) |
---|---|---|---|---|---|---|---|
Claude | 87 | 1623 | 104 | 84 | 88 | 91 | 81 |
Gemini | 92 | 10501 | 600 | 91 | 92 | 95 | 88 |
Manus | 79 | 4278 | 891 | 73 | 72 | 90 | 69 |
OpenAI | 93 | 9316 | 737 | 93 | 93 | 95 | 89 |
Perplexity | 88 | 3152 | 180 | 86 | 89 | 93 | 82 |
xAI Grok 3 Deep Search | 89 | 2929 | 39 | 83 | 89 | 93 | 86 |
xAI Grok 3 Deeper Search | 85 | 2905 | 229 | 80 | 80 | 92 | 78 |
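The table does not state how the Average Score column is derived, and it does not exactly match a simple mean of the four quality metrics. As a rough sanity check, the sketch below (illustrative only; all values hard-coded from the table above, and the unweighted-mean assumption is ours) compares that mean against the published averages; they agree to within a few points for every model.

```python
# Illustrative sanity check, not the scoring formula used above: the
# published "Average Score" column is approximated as the unweighted
# mean of the four quality metrics. All values are hard-coded from
# the table.

quality_metrics = {
    # model: (factual accuracy, hallucination avoidance,
    #         logical correctness, source quality), all in %
    "Claude": (84, 88, 91, 81),
    "Gemini": (91, 92, 95, 88),
    "Manus": (73, 72, 90, 69),
    "OpenAI": (93, 93, 95, 89),
    "Perplexity": (86, 89, 93, 82),
    "xAI Grok 3 Deep Search": (83, 89, 93, 86),
    "xAI Grok 3 Deeper Search": (80, 80, 92, 78),
}

published_average = {
    "Claude": 87, "Gemini": 92, "Manus": 79, "OpenAI": 93,
    "Perplexity": 88, "xAI Grok 3 Deep Search": 89,
    "xAI Grok 3 Deeper Search": 85,
}

for model, scores in quality_metrics.items():
    mean = sum(scores) / len(scores)
    delta = published_average[model] - mean
    print(f"{model:<26} mean {mean:5.1f}%  "
          f"published {published_average[model]}%  delta {delta:+.2f}")
```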
### Qualitative Ratings

Each criterion is rated on a 1-5 scale, where 5 is best.
Model | Overall Rating | Token Count | Response Time | Factual Accuracy | Hallucination Avoidance | Logical Correctness | Source Quality |
---|---|---|---|---|---|---|---|
Claude | 3 | 2 | 5 | 3 | 4 | 4 | 3 |
Gemini | 4 | 5 | 2 | 5 | 5 | 5 | 5 |
Manus | 1 | 3 | 1 | 1 | 1 | 4 | 1 |
OpenAI | 4 | 5 | 2 | 5 | 5 | 5 | 5 |
Perplexity | 3 | 2 | 5 | 4 | 5 | 4 | 4 |
xAI Grok 3 Deep Search | 3 | 2 | 5 | 3 | 4 | 4 | 5 |
xAI Grok 3 Deeper Search | 2 | 2 | 4 | 2 | 2 | 4 | 3 |
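The qualitative ratings compress the numeric scores onto a coarse 1-5 scale, so the two tables should broadly agree on model ordering. The short sketch below (illustrative; values hard-coded from the tables above) sorts the models both ways; with rating ties broken by numeric score, the two orderings coincide.

```python
# Quick consistency check between the two tables above: sort models by
# overall qualitative rating and by average numeric score, and print
# both orderings side by side. Values are hard-coded from the tables.

overall_rating = {
    "Claude": 3, "Gemini": 4, "Manus": 1, "OpenAI": 4,
    "Perplexity": 3, "xAI Grok 3 Deep Search": 3,
    "xAI Grok 3 Deeper Search": 2,
}
average_score = {
    "Claude": 87, "Gemini": 92, "Manus": 79, "OpenAI": 93,
    "Perplexity": 88, "xAI Grok 3 Deep Search": 89,
    "xAI Grok 3 Deeper Search": 85,
}

# Break rating ties with the numeric score so the ordering is stable.
by_rating = sorted(overall_rating,
                   key=lambda m: (overall_rating[m], average_score[m]),
                   reverse=True)
by_score = sorted(average_score, key=average_score.get, reverse=True)

for rated, scored in zip(by_rating, by_score):
    print(f"{rated:<26} | {scored}")
```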
## Subscription Costs
Model | Cost |
---|---|
OpenAI ChatGPT Deep Research | $200/month (Pro, 250 queries/month); $20/month (Plus, Team, Edu, Enterprise, 25 queries/month) |
Anthropic Claude Research | $100 or $200/month (Max tier only; US/JP/BR; early beta) |
Gemini 2.5 Pro Deep Research | $20/month |
Perplexity Deep Research | Free (5 queries/day) or $20/month |
xAI Grok 3 Deep Search | $30/month (SuperGrok) |
xAI Grok 3 Deeper Search | $30/month (SuperGrok, Deeper mode) |
Manus AI | $2-10 per task (depending on task intensity and difficulty) |
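For tiers that publish an explicit query limit, the subscription price reduces to a per-query figure. The sketch below (illustrative; the Perplexity monthly quota assumes a 30-day month, and tiers without a stated limit are omitted) does the arithmetic.

```python
# Back-of-the-envelope cost per research query for the tiers above
# that state an explicit query limit; tiers without one are omitted.

tiers = {
    # tier: (monthly cost in USD, research queries per month)
    "OpenAI Pro": (200, 250),
    "OpenAI Plus/Team/Edu/Enterprise": (20, 25),
    "Perplexity free tier": (0, 5 * 30),  # 5/day, assuming a 30-day month
}

for tier, (cost_usd, queries) in tiers.items():
    print(f"{tier:<32} ${cost_usd / queries:.2f} per query")
```

Both OpenAI tiers work out to $0.80 per query, so the Pro tier buys volume rather than a lower unit price.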