Programming Language Benchmarks

Quesma Releases OTelBench: Independent Benchmark Reveals Frontier LLMs Struggle with Real-World SRE Tasks

New benchmark shows top LLMs achieve only 29% pass rate on OpenTelemetry instrumentation, exposing the gap between ...

'Most powerful programming language of the future isn’t C++ or Python, it’s...', says Nvidia CEO Jensen Huang

Nvidia CEO Jensen Huang says English could become the most powerful programming language as AI reduces the need for traditional coding and shifts focus toward intent-driven human-machine interaction.

Sarvam AI claims edge over larger global models on Indic benchmarks

Capable of reasoning, designed for voice, and fluent in Indian languages, the model would be ready for population-scale deployment ...

Communications of the ACM

Formal Reasoning Meets LLMs: Toward AI for Mathematics and Verification

The mathematical reasoning performed by LLMs is fundamentally different from the rule-based symbolic methods in traditional formal reasoning.

GitHub

LangArena: A Balanced Programming Language Benchmark Suite

The suite was initially authored in Crystal and then translated to other languages using AI-assisted tools (DeepSeek). This approach ensures functional and algorithmic parity, though the resulting ...

GitHub

Tilus: A Tile-Level GPU Kernel Programming Language

It also includes automatic tuning, caching, and a Pythonic interface for ease of use. Tilus is pronounced as tie-lus, /ˈtaɪləs/. Tilus supports Ampere architecture, and we are actively working on the ...

IEEE

Towards Generalizable Vision-Language Robotic Manipulation: A Benchmark and LLM-Guided 3D Policy

Abstract: Generalizing language-conditioned robotic policies to new tasks remains a significant challenge, hampered by the lack of suitable simulation benchmarks. In this paper, we address this gap by ...

IEEE

Defying Distractions in Multimodal Tasks: A Novel Benchmark for Large Vision-Language Models

Abstract: Large Vision-Language Models (LVLMs) with “multimodal distractibility,” where plausible but irrelevant visual or textual inputs cause significant drops in reasoning consistency and lead to ...
