Arithmetic Reasoning Problems

Achieving >97% on GSM8K: Deeply understanding the problems makes LLMs better solvers for math word problems

Chain-of-Thought (CoT) prompting has enhanced the performance of Large Language Models (LLMs) across various reasoning tasks.

Tech Xplore

Reasoning: A smarter way for AI to understand text and images

Engineers at the University of California San Diego have developed a new way to train artificial intelligence systems to ...

Unite.AI

When AI Solves Open Math Problems, What’s Left for Genius?

AI systems are beginning to produce proof ideas that experts take seriously, even when final acceptance is still pending.

A New AI Math Startup Just Cracked 4 Previously Unsolved Problems

Axiom says its AI found solutions to several long-standing math problems, a sign of the technology’s steadily advancing reasoning capabilities.

NextBigFuture

Autonomous Deepmind AI Generates Publishable Math Papers – Next Accelerate Science Research

DeepMind's Aletheia is a huge advance in AI-driven mathematical reasoning. It is a research agent built on top of Gemini Deep ...

Communications of the ACM

Formal Reasoning Meets LLMs: Toward AI for Mathematics and Verification

A marriage of formal methods and LLMs seeks to harness the strengths of both.

Morning Overview on MSN (Opinion)

Top AI models are failing hard at solving fresh math problems

Top artificial intelligence systems now ace many textbook-style math questions, yet they still fall apart on genuinely new ...

Mirage News

LLMs Excel in Math Word Problems with >97% on GSM8K

Chain-of-Thought (CoT) prompting has enhanced the performance of Large Language Models (LLMs) across various reasoning tasks. However, CoT still falls ...

TechJuice

Is This AGI? The Shocking New Reasoning Scores from Google’s Deep Think

Google upgrades its Gemini 3 Deep Think AI mode with stronger reasoning and practical problem-solving for science, research, ...

AI Can Solve Many Complex Problems, So Why Isn’t Science Moving Faster?

One would imagine that an AI capable of solving the hardest Olympiad problems would naturally produce novel scientific ...

Nvidia’s new technique cuts LLM reasoning costs by 8x without losing accuracy

Nvidia researchers developed dynamic memory sparsification (DMS), a technique that compresses the KV cache in large language models by up to 8x while maintaining reasoning accuracy — and it can be ...

eWeek

Axiom.AI Just Solved a Math Problem No Human Could Crack

AxiomProver solved a real open math conjecture using formal verification, signaling a shift from AI that assists research to AI that discovers new truths.