
How We Achieved 36% Accuracy on ARC-AGI-3 in a Single Day

March 27, 2026

Symbolica's Agentica SDK achieves an unverified 36.08% score on the ARC-AGI-3 benchmark [1], solving 113 of 182 playable challenges and fully completing 7 of the 25 available game scenarios [2].

This result significantly surpasses the Chain-of-Thought baselines (Opus 4.6 Max at 0.2% and GPT 5.4 High at 0.3%) while delivering far better cost efficiency: Agentica reaches 36.08% accuracy for $1,005, compared with Opus 4.6's 0.25% at $8,900.
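As a rough cost-efficiency check, the quoted figures can be reduced to dollars spent per percentage point of benchmark score (a back-of-the-envelope sketch using only the numbers above; the benchmark's own scoring is more involved):

```python
def cost_per_point(cost_usd: float, score_pct: float) -> float:
    """Dollars spent per percentage point of benchmark score."""
    return cost_usd / score_pct

# Figures quoted in the text above.
agentica = cost_per_point(1005, 36.08)  # ~$27.85 per point
opus_cot = cost_per_point(8900, 0.25)   # $35,600 per point

print(f"Agentica: ${agentica:,.2f}/point")
print(f"Opus 4.6 CoT: ${opus_cot:,.2f}/point")
print(f"Efficiency ratio: {opus_cot / agentica:,.0f}x")
```

By this crude measure the agentic setup is over three orders of magnitude more cost-efficient than the Chain-of-Thought baseline.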

Explore the implementation on GitHub: symbolica-ai/ARC-AGI-3-Agents
[Figure: ARC-AGI-3 score (%) versus cost ($, log scale) for Gemini 3.1 Pro (Preview), Grok 4.20 (Beta Reasoning), GPT-5.4 (High), Opus 4.6 (Max), and Agentica Opus 4.6 (High), with the SOTA frontier marked.]
Figure 1. Performance-cost analysis comparing Chain-of-Thought models against the Agentica ARC-AGI-3 agent using Opus 4.6 (120k) High on the public evaluation set. A per-task cost breakdown for Agentica Opus 4.6 (120k) High is available in the repository.
Performance Metrics Across All Sessions

Game   Score    Actions   Result
CN04   97.6%    118       Win
LP85   84.16%   273       Win
AR25   83.28%   516       Win
FT09   77.59%   123       Win

Status legend: Exceeded human benchmark · Victory achieved · Session concluded.