In some ways, data and its quality can seem strange to people used to assessing the quality of software. There’s often no observable behaviour to check and little in the way of structure to help you ...
To some, METR’s “time horizon plot” indicates that AI utopia—or apocalypse—is close at hand. The truth is more complicated.
Some results have been hidden because they may be inaccessible to you
Show inaccessible results