Machine translation into English using my sentence-wise GPT translator (that should hopefully do a bit better than Google Translate):

---

# EXAONE Deep Unveiled ━ Setting a New Standard for Reasoning AI

The era of Agentic AI, in which AI independently formulates and tests hypotheses and makes decisions without human instruction, is approaching. In the shift to the Agentic AI era, developing reasoning-enhanced models is essential, but securing high-performance ones is no easy task. Globally, only a handful of companies with foundation models are developing their own reasoning-enhanced models. LG AI Research unveils EXAONE Deep, a powerful Reasoning AI that stands on par with these models.

EXAONE Deep is a high-performance, reasoning-specialized model able to understand mathematical logic, reason through scientific concepts, and solve programming problems. In developing EXAONE Deep, we focused on dramatically enhancing reasoning performance in the areas of Math, Science, and Coding, while ensuring it can also understand and apply essential knowledge across a variety of other domains.

- Math – EXAONE Deep 32B: Outperforming competing models on high-difficulty math benchmarks at just 5% of their size.
- Science & Coding – EXAONE Deep 7.8B & 2.4B: Ranking first on all major benchmarks with a clear performance lead.
- MMLU – EXAONE Deep 32B: The highest performance among South Korean models, with an MMLU score of 83.0.

Shortly after its release, the EXAONE Deep 32B model was added to the Notable AI Models list maintained by Epoch AI, a U.S. non-profit research organization, further validating its performance. This is the second consecutive listing, following EXAONE 3.5, making EXAONE the only South Korean model to appear on the list in the past two years.

With EXAONE Deep, we aim to establish a new standard for Reasoning AI that transcends mere numerical performance.

Image 1. EXAONE Deep and EXAONE 3.5 Listed on Epoch AI's Notable AI Models List (Source: Epoch AI)

We will now present the core features and performance of EXAONE Deep alongside actual benchmark results. Experience firsthand the future and potential of AI with the more powerful EXAONE Deep!

Try EXAONE Deep Model →

EXAONE Deep Technical Report →

## 1. Math ━ Highest Score in the 2025 CSAT Mathematics Section, with 7.8B & 2.4B Ranking First on All Major Benchmarks.

The EXAONE Deep 32B, 7.8B, and 2.4B models all demonstrated markedly superior performance over global reasoning models on the mathematics section of the 2025 CSAT (Korea's College Scholastic Ability Test). Each achieved the highest score among models of comparable size, showing overwhelming competitiveness in mathematical reasoning.

Image 2. Performance Comparison in the Mathematics Category
※ An asterisk (*) represents officially reported figures, and scores highlighted in purple denote the highest performance.

Image 3. CSAT 2025 Mathematics Evaluation Results

### EXAONE Deep 32B ━ Outperforming competing models on high-difficulty math benchmarks at just 5% of their size.

The EXAONE Deep 32B model scored 94.5 on the CSAT Mathematics section and 90.0 on AIME 2024 (the American Invitational Mathematics Examination, used in U.S. Mathematical Olympiad qualification), the highest performance among competing models. On AIME 2025, it matched the performance of the DeepSeek-R1 (671B) model. These results attest to its exceptional problem-solving and logical reasoning on challenging mathematical assessments. Notably, its strong scores on demanding evaluations like AIME, achieved against much larger models, reaffirm EXAONE's hallmark strengths in learning efficiency and cost-effectiveness.

### EXAONE Deep 7.8B & 2.4B ━ Achieving First Place on All Major Benchmarks.

The 7.8B and 2.4B models, representing the lightweight and on-device categories respectively, both secured first place on all major benchmarks, proving their overwhelming performance. The 7.8B model scored 94.8 on MATH-500 and 59.6 on AIME 2025, while the 2.4B model scored 92.3 and 47.9, respectively.

## 2. Science & Coding ━ Exceptional reasoning in specialized scientific domains and strong software coding skills.

Image 4. Performance Comparison in the Coding Category
※ An asterisk (*) denotes officially reported figures, and scores highlighted in purple indicate the highest performance.

The EXAONE Deep models also proved their overwhelming performance in science and coding. The 32B model scored 66.1 on GPQA Diamond, which evaluates doctoral-level problem solving in physics, chemistry, and biology, and 59.5 on LiveCodeBench, which assesses coding skills. This suggests it offers high utility even in domains that demand specialized knowledge.

The 7.8B and 2.4B models likewise secured first place on both GPQA Diamond and LiveCodeBench. Notably, following the EXAONE 3.5 2.4B model's first-place ranking in the 'Edge' category of Hugging Face's LLM Leaderboard last December, EXAONE Deep's top results further confirm it as a world-class model family for lightweight and on-device applications.

## 3. MMLU ━ Demonstrating the Highest Performance Among South Korean Proprietary Models.

EXAONE Deep is not only specialized for reasoning in mathematics, science, and coding; its performance in general domains has also been significantly strengthened. In particular, the 32B model achieved a score of 83.0 on MMLU (Massive Multitask Language Understanding), the top performance among South Korean proprietary models.

Image 5. Performance Comparison in the General Category
※ An asterisk (*) signifies officially reported figures, and scores highlighted in purple represent the highest performance.

EXAONE Deep is broadening AI's reasoning capabilities across diverse domains such as mathematics, science, and coding, and is venturing into ever more complex problems. We will continue to advance through ongoing research and innovation, so that AI can help make human life richer and more convenient.
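For anyone who wants to try the model rather than just read the numbers, below is a minimal inference sketch. It assumes the EXAONE Deep weights are published on Hugging Face under the LGAI-EXAONE organization (the repository name `LGAI-EXAONE/EXAONE-Deep-7.8B` is an assumption), that the checkpoint ships custom model code requiring `trust_remote_code`, and that the standard transformers chat-template API applies; check the model card for the exact usage LG AI Research recommends.

```python
# Minimal sketch: load an EXAONE Deep checkpoint and ask a reasoning question.
# Assumptions: the repo id below is hypothetical, trust_remote_code may be
# needed for custom EXAONE model code, and the repo provides a chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LGAI-EXAONE/EXAONE-Deep-7.8B"  # assumed Hugging Face repository name

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # place layers on available GPU(s)/CPU (needs accelerate)
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "How many positive divisors does 360 have?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=1024)
# Print only the newly generated tokens (the model's step-by-step answer).
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

A reasoning-tuned model typically emits a long chain of thought before its final answer, so the generous `max_new_tokens` budget is deliberate.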