
Logos of generative artificial intelligence (AI) applications are seen in this illustration. gettyimagesbank
Korean artificial intelligence (AI) models being developed under the government’s bid to assert technological sovereignty were found to lag behind leading overseas systems in tests modeled on the College Scholastic Ability Test (CSAT) and in advanced, essay-style mathematics questions.
The comparison was conducted by a research team led by Kim Jon-lark, a professor of mathematics at Sogang University, which asked 10 large language models to solve a total of 50 questions.
The set comprised 20 advanced CSAT mathematics problems, covering calculus, probability and statistics, geometry and common subjects, along with 30 essay-style questions drawn from top South Korean universities, Indian entrance examinations and graduate-level tests at the University of Tokyo.
Overseas models took the top spots. Google’s Gemini 3 Pro Preview ranked first with 92 points, correctly solving 46 of the 50 questions. Claude Opus 4.5 from Anthropic followed with 84 points, while Grok 4.1 Fast from xAI scored 82. GPT-5.1 from OpenAI earned 80 points, and China’s DeepSeek V3.2 posted 76 points.
By contrast, Korean models clustered near the bottom of the rankings. Solar Pro 2, developed by Upstage, recorded 58 points, the highest among domestic systems. HCX-007 from Naver scored 26 points, while EXAONE 4.0.1 from LG AI Research and A.X 4.0 (72B) from SK Telecom each scored 24 points. NC AI’s lightweight model, Llama-VARCO-8B-Instruct, finished last with just 2 points.
The research team said the gap remained even after allowing Korean models to use Python-based calculation tools to aid problem-solving, a measure intended to offset limitations in step-by-step reasoning.
Similar patterns emerged in follow-up tests using EntropyMath, a proprietary dataset designed to span difficulty levels from undergraduate coursework to research-grade mathematics.
Kim said the test was conducted after repeated questions about why there had been no public evaluation of Korean sovereign AI models using CSAT-level problems.
“We kept being asked why no one had assessed the five domestic models with college entrance exam questions, so we decided to run the tests ourselves,” he said. “The results showed that the current versions of Korean models remain significantly behind global frontier models.”
He stressed that the evaluation was based on publicly released versions of the domestic systems.
“Once the national team versions of these models are released, we plan to test them again using problems we develop ourselves,” Kim said.
Still, the results have fueled debate over Korea’s AI strategy. Industry officials say many domestic developers have prioritized service-oriented and enterprise applications rather than advanced mathematical reasoning.
A chief technology officer at an applied AI startup said Korean developers are primarily focused on building industrial AI systems, such as AI agents, making lower scores on exam-style math tests almost inevitable.
The findings have also raised questions about whether Korea’s current approach is sufficient to meet the government’s goal of building a homegrown AI foundation model ranked among the world’s top 10.
Responding to questions on the issue at a press briefing in Sejong City on Monday, Science and ICT Minister Bae Kyung-hoon said many companies developing domestic AI foundation models had optimized their systems for commercial use, leaving gaps in data training for science- and math-focused reasoning.
“If we create and train domain-specific datasets — for example, by converting chemical molecular structures into formats AI systems can recognize — those models can achieve competitiveness at a global top 10 level,” he said.
He added that developing AI systems capable of excelling across all academic disciplines is not realistic, and that greater focus should be placed on lightweight models that can be directly deployed in real-world services.