
How We Achieved 36% Accuracy on ARC-AGI-3 in a Single Day

March 27, 2026

Symbolica's Agentica SDK achieves an unverified 36.08% score on the ARC-AGI-3 benchmark [1], solving 113 of 182 playable challenges and fully completing 7 of the 25 available game scenarios [2].

This result significantly surpasses the Chain-of-Thought baselines (Opus 4.6 Max at 0.2% and GPT 5.4 High at 0.3%) while delivering far better cost efficiency: Agentica reaches 36.08% accuracy for $1,005, compared with Opus 4.6's 0.25% at $8,900.
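As a rough cost-efficiency check, the quoted figures can be reduced to dollars spent per percentage point of benchmark score (a back-of-the-envelope sketch using only the numbers above; the benchmark's own scoring is more involved):

```python
def cost_per_point(cost_usd: float, score_pct: float) -> float:
    """Dollars spent per percentage point of benchmark score."""
    return cost_usd / score_pct

# Figures quoted in the text above.
agentica = cost_per_point(1005, 36.08)  # ~$27.85 per point
opus_cot = cost_per_point(8900, 0.25)   # $35,600 per point

print(f"Agentica: ${agentica:,.2f}/point")
print(f"Opus 4.6 CoT: ${opus_cot:,.2f}/point")
print(f"Efficiency ratio: {opus_cot / agentica:,.0f}x")
```

By this crude measure the agentic setup is over three orders of magnitude more cost-efficient than the Chain-of-Thought baseline.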

Explore the implementation on GitHub: symbolica-ai/ARC-AGI-3-Agents
[Figure: ARC-AGI-3 score (%) versus cost ($, log scale) for Gemini 3.1 Pro (Preview), Grok 4.20 (Beta Reasoning), GPT-5.4 (High), Opus 4.6 (Max), and Agentica Opus 4.6 (High), with the SOTA frontier marked.]
Figure 1. Performance-cost analysis comparing Chain-of-Thought models against the Agentica ARC-AGI-3 agent using Opus 4.6 (120k) High on the public evaluation set. A per-task cost breakdown for Agentica Opus 4.6 (120k) High is available in the repository.
Performance Metrics Across All Sessions

Game   Score    Actions   Result
CN04   97.6%    118       Win
LP85   84.16%   273       Win
AR25   83.28%   516       Win
FT09   77.59%   123       Win

Status legend: Exceeded human benchmark · Victory achieved · Session concluded.