The method has two main features: it evaluates how AI models reason through problems instead of just checking whether their ...