Large language models (LLMs) like ChatGPT show reasoning errors across many domains. Identifying vulnerabilities is good for public safety, industry, and the scientists making these models. The human ...