AI Agent Benchmark

Performance Evaluation Leaderboard

Updated: 2025-12-01 01:08:44

Top Performers
1. with_code_executor
Accuracy: 80.0%
Avg Time: 3.21s
Avg Input Tokens: 450
Avg Output Tokens: 111

Detailed Task Results

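Task Name                      | Status | Time (s) | Input Tokens | Output Tokens
Simple calculation             | Pass   | 1.26     | 59           | 10
Moderately complex calculation | Pass   | 1.91     | 169          | 34
Factorization (3 digits)       | Pass   | 2.55     | 71           | 318
Factorization (10 digits)      | Pass   | 6.66     | 228          | 72
Search task                    | Fail   | 3.67     | 1721         | 122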

2. gemini_2_5_flash
Accuracy: 40.0%
Avg Time: 15.22s
Avg Input Tokens: 70
Avg Output Tokens: 3311

Detailed Task Results

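Task Name                      | Status | Time (s) | Input Tokens | Output Tokens
Simple calculation             | Pass   | 1.32     | 62           | 55
Moderately complex calculation | Fail   | 1.30     | 71           | 56
Factorization (3 digits)       | Pass   | 3.31     | 74           | 525
Factorization (10 digits)      | Fail   | 59.58    | 80           | 14196
Search task                    | Fail   | 10.60    | 64           | 1725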

3. with_google_search
Accuracy: 40.0%
Avg Time: 2.69s
Avg Input Tokens: 173
Avg Output Tokens: 71

Detailed Task Results

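Task Name                      | Status | Time (s) | Input Tokens | Output Tokens
Simple calculation             | Pass   | 0.89     | 59           | 10
Moderately complex calculation | Fail   | 1.97     | 162          | 55
Factorization (3 digits)       | Fail   | 2.38     | 159          | 67
Factorization (10 digits)      | Fail   | 4.18     | 330          | 186
Search task                    | Pass   | 4.04     | 155          | 38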

4. baseline
Accuracy: 20.0%
Avg Time: 0.93s
Avg Input Tokens: 61
Avg Output Tokens: 11

Detailed Task Results

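Task Name                      | Status | Time (s) | Input Tokens | Output Tokens
Simple calculation             | Pass   | 0.88     | 53           | 2
Moderately complex calculation | Fail   | 1.08     | 62           | 12
Factorization (3 digits)       | Fail   | 0.71     | 65           | 8
Factorization (10 digits)      | Fail   | 1.08     | 71           | 27
Search task                    | Fail   | 0.88     | 55           | 6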
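
The summary figures for each agent are consistent with a pass rate and unweighted means over its five task rows, rounded for display. The page does not state how they are computed, so the aggregation below is an assumption, and the summarize helper and its field names are hypothetical. A minimal sketch that reproduces the baseline summary from the baseline rows above:

```python
from statistics import mean

# Per-task rows for the baseline agent, copied from its Detailed Task Results:
# (status, time in seconds, input tokens, output tokens)
rows = [
    ("Pass", 0.88, 53, 2),
    ("Fail", 1.08, 62, 12),
    ("Fail", 0.71, 65, 8),
    ("Fail", 1.08, 71, 27),
    ("Fail", 0.88, 55, 6),
]


def summarize(rows):
    """Assumed aggregation: pass rate plus unweighted means over all tasks."""
    passed = sum(1 for status, *_ in rows if status == "Pass")
    return {
        "accuracy_pct": round(100 * passed / len(rows), 1),
        "avg_time_s": round(mean(r[1] for r in rows), 2),
        "avg_input_tokens": round(mean(r[2] for r in rows)),
        "avg_output_tokens": round(mean(r[3] for r in rows)),
    }


summary = summarize(rows)
print(summary)
```

The same calculation applied to the other three agents' rows matches their reported summaries as well. Because these are plain means, a single outlier task can dominate an average: for gemini_2_5_flash, the 10-digit factorization row alone accounts for most of the average time and output tokens.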