Human Benchmark Test - Search News

Opinion

The Kafka Test: On Trusting AI’s Eloquent Incoherence

The AI’s responses were intriguing; I affirmed its continued use as an analytical partner, a digital hevruta, while remaining mindful of its contaminated knowledge. Note that my earlier articles ...

ExecutiveGov

NIST Seeks Public Input on Draft Best Practices for Automated AI Benchmark Testing

NIST said Friday that its Center for AI Standards and Innovation, or CAISI, released an initial public draft of NIST AI 800-2 ...

Frontiers

Automated versus human scoring of the Rey-Osterrieth Complex Figure Test: a rapid review

Introduction: Despite digital advances in healthcare, clinical neuropsychology has been slow to adopt automated assessment tools. Automated scoring of the Rey-Osterrieth Complex Figure Test (ROCFT) ...

ascopubs.org

Artificial Intelligence–Assisted Error Detection in Complex Clinical Documentation: Leveraging Large Language Models to Enhance Patient Safety in Oncology

We conducted a two-phase evaluation. First, we assessed LLMs (GPT o4-mini and Gemini 2.5 Pro) on 1,000 synthetic clinical hematology/oncology vignettes with ...

Frontiers

LLMs achieve adult human performance on higher-order theory of mind tasks

We introduce a new benchmark, MoToMQA, to assess human and LLM ToM abilities at increasing orders. MoToMQA is based upon the format of the Imposing Memory Task (IMT), a well-validated psychological ...

IEEE

HHI-Assist: A Dataset and Benchmark of Human-Human Interaction in Physical Assistance Scenario

Abstract: The increasing labor shortage and aging population underline the need for assistive robots to support human care recipients. To enable safe and responsive assistance, robots require accurate ...

Business Wire

Hack The Box Launches the World’s First AI Cyber Range to Benchmark AI Agents and Accelerate Human-AI Teaming Across Offensive and Defensive Cyber Operations

NEW YORK--(BUSINESS WIRE)--Hack The Box (HTB), the global leader in AI-powered cybersecurity readiness, today unveiled HTB AI Range, the world’s first controlled AI cyber range built to test and ...

Morningstar

Hack The Box Launches the World’s First AI Cyber Range to Benchmark AI Agents and Accelerate Human-AI Teaming Across Offensive and Defensive Cyber Operations

Hack The Box (HTB), the global leader in ...

VentureBeat

Gemini 3 Pro scores 69% trust in blinded testing up from 16% for Gemini 2.5: The case for evaluating AI on real-world trust, not academic benchmarks

Just a few short weeks ago, Google debuted its Gemini 3 model, claiming it scored a leadership position in multiple AI benchmarks. But the challenge with vendor-provided benchmarks is that they are ...
