Language Testing Methods

Giving Language Models a ‘Truth Dial’

True or chatty: pick one. A new training method lets users tell AI chatbots exactly how 'factual' to be, turning accuracy into a dial you can crank up or down. A new research collaboration between the ...

LondonLovesBusiness

The 10 best AI red teaming tools of 2026

Discover the top 10 AI red teaming tools of 2026 and learn how they help safeguard your AI systems from vulnerabilities.

Microsoft

Detecting backdoored language models at scale

Learn how Microsoft research uncovers backdoor risks in language models and introduces a practical scanner to detect tampering and strengthen AI security.

How our lab is helping develop an Alzheimer’s test that can be done at home

Our team at the UK Dementia Research Institute’s Biomarker Factory at UCL are part of the global effort working to develop ...

Qwen3-Max Thinking beats Gemini 3 Pro and GPT-5.2 on Humanity's Last Exam (with search)

On HMMT Feb 25, a rigorous reasoning benchmark, Qwen3-Max-Thinking scored 98.0, edging out Gemini 3 Pro (97.5) and ...

Hosted on MSN

How to improve your memory with the 2-7-30 method

150 years of science shows this brain hack can radically improve your memory. Entrepreneurs and anyone else who needs to learn things fast should take note. This is a column about a helpful trick that ...

Frontiers

Testing emergency language skills in situational teaching: a quasi-experimental research

Despite the growing recognition of the need for specialized training in emergency language skills, educational frameworks in which disaster scenarios are integrated into language pedagogy remain ...

GitHub

Step-level Verifier-guided Hybrid Test-Time Scaling for Large Language Models

Recent breakthroughs in large language models (LLMs) on complex reasoning tasks have been largely driven by Test-Time Scaling (TTS) — a paradigm that enhances reasoning by intensifying inference-time ...

Medical Xpress

New method accelerates resistance testing in urinary tract infections

Researchers at the Technical University of Munich (TUM) have developed a method for diagnosing urinary tract infections that significantly accelerates antibiotic resistance testing in urine. Because ...

unite

A ‘Zen’ Method to Stop Language Models from Hallucinating

Telling ChatGPT to fact-check a random answer before solving an actual problem makes it think harder, and get the answer right more often – even if the earlier ‘random’ answer has nothing to do with ...

Skift

Airbnb Is Testing a ‘What Box’ for Natural Language Search

Say what? Yes, Airbnb is testing adding a "What" box to the search interface atop its homepage. If it works, then over the next few years it may transform the way people look for a place to stay.

IEEE

Mutation Testing via Iterative Large Language Model-Driven Scientific Debugging

Abstract: Large Language Models (LLMs) can generate plausible test code. Intuitively they generate this by imitating tests seen in their training data, rather than reasoning about execution semantics.
