Have AI Chatbots Developed Theory of Mind? What We Do and Do Not Know. – The New York Times
Mind reading is common among us humans. Not in the ways that psychics claim to do it, by gaining access to the warm streams of consciousness that fill every individuals experience, or in the ways that mentalists claim to do it, by pulling a thought out of your head at will. Everyday mind reading is more subtle: We take in peoples faces and movements, listen to their words and then decide or intuit what might be going on in their heads.
Among psychologists, such intuitive psychology the ability to attribute to other people mental states different from our own is called theory of mind, and its absence or impairment has been linked to autism, schizophrenia and other developmental disorders. Theory of mind helps us communicate with and understand one another; it allows us to enjoy literature and movies, play games and make sense of our social surroundings. In many ways, the capacity is an essential part of being human.
What if a machine could read minds, too?
Recently, Michal Kosinski, a psychologist at the Stanford Graduate School of Business, made just that argument: that large language models like OpenAIs ChatGPT and GPT-4 next-word prediction machines trained on vast amounts of text from the internet have developed theory of mind. His studies have not been peer reviewed, but they prompted scrutiny and conversation among cognitive scientists, who have been trying to take the often asked question these days Can ChatGPT do this? and move it into the realm of more robust scientific inquiry. What capacities do these models have, and how might they change our understanding of our own minds?
Psychologists wouldnt accept any claim about the capacities of young children just based on anecdotes about your interactions with them, which is what seems to be happening with ChatGPT, said Alison Gopnik, a psychologist at the University of California, Berkeley and one of the first researchers to look into theory of mind in the 1980s. You have to do quite careful and rigorous tests.
Dr. Kosinskis previous research showed that neural networks trained to analyze facial features like nose shape, head angle and emotional expression could predict peoples political views and sexual orientation with a startling degree of accuracy (about 72 percent in the first case and about 80 percent in the second case). His recent work on large language models uses classic theory of mind tests that measure the ability of children to attribute false beliefs to other people.
A brave new world. A new crop of chatbotspowered by artificial intelligence has ignited a scramble to determine whether the technology could upend the economics of the internet, turning todays powerhouses into has-beens and creating the industrys next giants. Here are the bots to know:
ChatGPT. ChatGPT, the artificial intelligence language model from a research lab, OpenAI, has been making headlines since November for its ability to respond to complex questions, write poetry, generate code, plan vacationsand translate languages. GPT-4, the latest version introduced in mid-March, can even respond to images(and ace the Uniform Bar Exam).
Bing. Two months after ChatGPTs debut, Microsoft, OpenAIs primary investor and partner, added a similar chatbot, capable of having open-ended text conversations on virtually any topic, to its Bing internet search engine. But it was the bots occasionally inaccurate, misleading and weird responsesthat drew much of the attention after its release.
Ernie. The search giant Baidu unveiled Chinas first major rival to ChatGPT in March. The debut of Ernie, short for Enhanced Representation through Knowledge Integration, turned out to be a flopafter a promised live demonstration of the bot was revealed to have been recorded.
A famous example is the Sally-Anne test, in which a girl, Anne, moves a marble from a basket to a box when another girl, Sally, isnt looking. To know where Sally will look for the marble, researchers claimed, a viewer would have to exercise theory of mind, reasoning about Sallys perceptual evidence and belief formation: Sally didnt see Anne move the marble to the box, so she still believes it is where she last left it, in the basket.
Dr. Kosinski presented 10 large language models with 40 unique variations of these theory of mind tests descriptions of situations like the Sally-Anne test, in which a person (Sally) forms a false belief. Then he asked the models questions about those situations, prodding them to see whether they would attribute false beliefs to the characters involved and accurately predict their behavior. He found that GPT-3.5, released in November 2022, did so 90 percent of the time, and GPT-4, released in March 2023, did so 95 percent of the time.
The conclusion? Machines have theory of mind.
But soon after these results were released, Tomer Ullman, a psychologist at Harvard University, responded with a set of his own experiments, showing that small adjustments in the prompts could completely change the answers generated by even the most sophisticated large language models. If a container was described as transparent, the machines would fail to infer that someone could see into it. The machines had difficulty taking into account the testimony of people in these situations, and sometimes couldnt distinguish between an object being inside a container and being on top of it.
Maarten Sap, a computer scientist at Carnegie Mellon University, fed more than 1,000 theory of mind tests into large language models and found that the most advanced transformers, like ChatGPT and GPT-4, passed only about 70 percent of the time. (In other words, they were 70 percent successful at attributing false beliefs to the people described in the test situations.) The discrepancy between his data and Dr. Kosinskis could come down to differences in the testing, but Dr. Sap said that even passing 95 percent of the time would not be evidence of real theory of mind. Machines usually fail in a patterned way, unable to engage in abstract reasoning and often making spurious correlations, he said.
Dr. Ullman noted that machine learning researchers have struggled over the past couple of decades to capture the flexibility of human knowledge in computer models. This difficulty has been a shadow finding, he said, hanging behind every exciting innovation. Researchers have shown that language models will often give wrong or irrelevant answers when primed with unnecessary information before a question is posed; some chatbots were so thrown off by hypothetical discussions about talking birds that they eventually claimed that birds could speak. Because their reasoning is sensitive to small changes in their inputs, scientists have called the knowledge of these machines brittle.
Dr. Gopnik compared the theory of mind of large language models to her own understanding of general relativity. I have read enough to know what the words are, she said. But if you asked me to make a new prediction or to say what Einsteins theory tells us about a new phenomenon, Id be stumped because I dont really have the theory in my head. By contrast, she said, human theory of mind is linked with other common-sense reasoning mechanisms; it stands strong in the face of scrutiny.
In general, Dr. Kosinskis work and the responses to it fit into the debate about whether the capacities of these machines can be compared to the capacities of humans a debate that divides researchers who work on natural language processing. Are these machines stochastic parrots, or alien intelligences, or fraudulent tricksters? A 2022 survey of the field found that, of the 480 researchers who responded, 51 percent believed that large language models could eventually understand natural language in some nontrivial sense, and 49 percent believed that they could not.
Dr. Ullman doesnt discount the possibility of machine understanding or machine theory of mind, but he is wary of attributing human capacities to nonhuman things. He noted a famous 1944 study by Fritz Heider and Marianne Simmel, in which participants were shown an animated movie of two triangles and a circle interacting. When the subjects were asked to write down what transpired in the movie, nearly all described the shapes as people.
Lovers in the two-dimensional world, no doubt; little triangle number-two and sweet circle, one participant wrote. Triangle-one (hereafter known as the villain) spies the young love. Ah!
Its natural and often socially required to explain human behavior by talking about beliefs, desires, intentions and thoughts. This tendency is central to who we are so central that we sometimes try to read the minds of things that dont have minds, at least not minds like our own.
See the original post:
Have AI Chatbots Developed Theory of Mind? What We Do and Do Not Know. - The New York Times
- How Humans Could Soon Understand and Talk to Animals, Thanks to Machine Learning - SYFY - November 10th, 2025 [November 10th, 2025]
- Machine learning based analysis of diesel engine performance using FeO nanoadditive in sterculia foetida biodiesel blend - Nature - November 10th, 2025 [November 10th, 2025]
- Machine Learning in Maternal Care - Johns Hopkins Bloomberg School of Public Health - November 10th, 2025 [November 10th, 2025]
- Machine learning-based differentiation of benign and malignant adrenal lesions using 18F-FDG PET/CT: a two-stage classification and SHAP... - November 10th, 2025 [November 10th, 2025]
- How to Better Use AI and Machine Learning in Dermatology, With Renata Block, MMS, PA-C - HCPLive - November 10th, 2025 [November 10th, 2025]
- Avoiding Catastrophe: The Importance of Privacy when Leveraging AI and Machine Learning for Disaster Management - CSIS | Center for Strategic and... - November 10th, 2025 [November 10th, 2025]
- Efferocytosis-related signatures identified via Single-cell analysis and machine learning predict TNBC outcomes and immunotherapy response - Nature - November 10th, 2025 [November 10th, 2025]
- Arc Raiders' use of AI highlights the tension and confusion over where machine learning ends and generative AI begins - PC Gamer - November 3rd, 2025 [November 3rd, 2025]
- From performance to prediction: extracting aging data from the effects of base load aging on washing machines for a machine learning model - Nature - November 3rd, 2025 [November 3rd, 2025]
- Meet 'kvcached': A Machine Learning Library to Enable Virtualized, Elastic KV Cache for LLM Serving on Shared GPUs - MarkTechPost - October 28th, 2025 [October 28th, 2025]
- Bayesian-optimized machine learning boosts actual evapotranspiration prediction in water-stressed agricultural regions of China - Nature - October 28th, 2025 [October 28th, 2025]
- Using machine learning to shed light on how well the triage systems work - News-Medical - October 28th, 2025 [October 28th, 2025]
- Our Last Hope Before The AI Bubble Detonates: Taming LLMs - Machine Learning Week US - October 28th, 2025 [October 28th, 2025]
- Using multiple machine learning algorithms to predict spinal cord injury in patients with cervical spondylosis: a multicenter study - Nature - October 28th, 2025 [October 28th, 2025]
- The diagnostic potential of proteomics and machine learning in Lyme neuroborreliosis - Nature - October 28th, 2025 [October 28th, 2025]
- Using unsupervised machine learning methods to cluster cardio-metabolic profile of the middle-aged and elderly Chinese with general and central... - October 28th, 2025 [October 28th, 2025]
- The prognostic value of POD24 for multiple myeloma: a comprehensive analysis based on traditional statistics and machine learning - BMC Cancer - October 28th, 2025 [October 28th, 2025]
- Reducing inequalities using an unbiased machine learning approach to identify births with the highest risk of preventable neonatal deaths - Population... - October 28th, 2025 [October 28th, 2025]
- Association between SHR and mortality in critically ill patients with CVD: a retrospective analysis and machine learning approach - Diabetology &... - October 28th, 2025 [October 28th, 2025]
- AI-Powered Visual Storytelling: How Machine Learning Transforms Creative Content Production - About Chromebooks - October 28th, 2025 [October 28th, 2025]
- How beauty brand Shiseido nearly tripled revenue per user with machine learning - Performance Marketing World - October 28th, 2025 [October 28th, 2025]
- Magnite introduces machine learning-powered ad podding for streaming platforms - PPC Land - October 26th, 2025 [October 26th, 2025]
- Krafton is an AI first company and will invest 70M USD on machine learning - Female First - October 26th, 2025 [October 26th, 2025]
- Machine learning prediction of bacterial optimal growth temperature from protein domain signatures reveals thermoadaptation mechanisms - BMC Genomics - October 24th, 2025 [October 24th, 2025]
- Data Proportionality and Its Impact on Machine Learning Predictions of Ground Granulated Blast Furnace Slag Concrete Strength | Newswise - Newswise - October 24th, 2025 [October 24th, 2025]
- The Evolution of Machine Learning and Its Applications in Orthopaedics: A Bibliometric Analysis - Cureus - October 24th, 2025 [October 24th, 2025]
- Sentiment Analysis with Machine Learning Achieves 83.48% Accuracy in Predicting Consumer Behavior Trends - Quantum Zeitgeist - October 24th, 2025 [October 24th, 2025]
- Use of machine learning for risk stratification of chest pain patients in the emergency department - BMC Medical Informatics and Decision Making - October 24th, 2025 [October 24th, 2025]
- Mass spectrometry combined with machine learning identifies novel protein signatures as demonstrated with multisystem inflammatory syndrome in... - October 24th, 2025 [October 24th, 2025]
- How Machine Learning Is Shrinking to Fit the Sensor Node - All About Circuits - October 24th, 2025 [October 24th, 2025]
- Machine learning models for mechanical properties prediction of basalt fiber-reinforced concrete incorporating graphical user interface - Nature - October 24th, 2025 [October 24th, 2025]
- Ohio wins national cybersecurity award for fraud solutions using machine learning - Spectrum News NY1 - October 24th, 2025 [October 24th, 2025]
- Itron Partners with Gordian Technologies to Enhance Grid Edge Intelligence with AI and Machine Learning Solutions - Quiver Quantitative - October 24th, 2025 [October 24th, 2025]
- Wearable sensors and machine learning give leg up on better running data - Medical Xpress - October 23rd, 2025 [October 23rd, 2025]
- Geophysical-machine learning tool developed for continuous subsurface geomaterials characterization - Phys.org - October 23rd, 2025 [October 23rd, 2025]
- Ohio wins national cybersecurity award for fraud solutions using machine learning - Spectrum News 1 - October 23rd, 2025 [October 23rd, 2025]
- Machine learning predictions of climate change effects on nearly threatened bird species ( Crithagra xantholaema) habitat in Ethiopia for conservation... - October 23rd, 2025 [October 23rd, 2025]
- A machine learning tool for predicting newly diagnosed osteoporosis in primary healthcare in the Stockholm Region - Nature - October 23rd, 2025 [October 23rd, 2025]
- ECBs New Perspective on Machine Learning in Banking - KPMG - October 23rd, 2025 [October 23rd, 2025]
- Ensemble Machine Learning for Digital Mapping of Soil pH and Electrical Conductivity in the Andean Agroecosystem of Peru - Frontiers - October 21st, 2025 [October 21st, 2025]
- New UA research develops machine learning to address needs of children with autism - AZPM News - October 21st, 2025 [October 21st, 2025]
- NMDSI Speaker Series on Weather Forecasting: What Machine Learning Can and Can't Do, Oct. 23 - Marquette Today - October 21st, 2025 [October 21st, 2025]
- Polyskill Achieves 1.7x Improved Skill Reuse and 9.4% Higher Success Rates through Polymorphic Abstraction in Machine Learning - Quantum Zeitgeist - October 21st, 2025 [October 21st, 2025]
- University of Strathclyde opens admission for MSc in Machine & Deep Learning for Jan 2026 intake - The Indian Express - October 21st, 2025 [October 21st, 2025]
- Reducing Model Biases with Machine Learning Corrections Derived from Ocean Data Assimilation Increments - ESS Open Archive - October 19th, 2025 [October 19th, 2025]
- Unlocking Obesity: Multi-Omics and Machine Learning Insights - Bioengineer.org - October 19th, 2025 [October 19th, 2025]
- Lockheed Martin advances PAC-3 MSE interceptor using artificial intelligence and machine learning - Defence Industry Europe - October 19th, 2025 [October 19th, 2025]
- Semi-automated surveillance of surgical site infections using machine learning and rule-based classification models - Nature - October 19th, 2025 [October 19th, 2025]
- AI and Machine Learning - City of San Jos to release RFP for generative AI platform - Smart Cities World - October 19th, 2025 [October 19th, 2025]
- Machine learning helps identify 'thermal switch' for next-generation nanomaterials - Phys.org - October 17th, 2025 [October 17th, 2025]
- Machine Learning Makes Wildlife Data Analysis Less of a Trek - Maryland.gov - October 17th, 2025 [October 17th, 2025]
- An interpretable multimodal machine learning model for predicting malignancy of thyroid nodules in low-resource scenarios - BMC Endocrine Disorders - October 17th, 2025 [October 17th, 2025]
- In First-Episode Psychosis Patients, Machine Learning Predicted Illness Trajectories to Potentially Improve Outcomes - Brain and Behavior Research - October 17th, 2025 [October 17th, 2025]
- Novel Machine Learning Model Improves MASLD Detection in Type 2 Diabetes - The American Journal of Managed Care (AJMC) - October 17th, 2025 [October 17th, 2025]
- Hybrid machine learning models for predicting the tensile strength of reinforced concrete incorporating nano-engineered and sustainable supplementary... - October 17th, 2025 [October 17th, 2025]
- Modelling of immune infiltration in prostate cancer treated with HDR-brachytherapy using Raman spectroscopy and machine learning - Nature - October 17th, 2025 [October 17th, 2025]
- Association between atherogenic index of plasma and sepsis in critically ill patients with ischemic stroke: a retrospective cohort study using... - October 17th, 2025 [October 17th, 2025]
- AI enters the nuclear age: Pentagon modernizes warheads with machine learning - Washington Times - October 17th, 2025 [October 17th, 2025]
- AI and Machine Learning - Bentley Systems shares its vision for trustworthy AI - Smart Cities World - October 17th, 2025 [October 17th, 2025]
- Looking back to move forward: can historical clinical trial data and machine learning drive change in participant recruitment in anticipation of... - October 15th, 2025 [October 15th, 2025]
- Physics-Based Machine Learning Paves the Way for Advanced 3D-Printed Materials - Bioengineer.org - October 15th, 2025 [October 15th, 2025]
- Predicting one-year overall survival in patients with AITL using machine learning algorithms: a multicenter study - Nature - October 15th, 2025 [October 15th, 2025]
- Explainable machine learning models for predicting of protein-energy wasting in patients on maintenance haemodialysis - BMC Nephrology - October 15th, 2025 [October 15th, 2025]
- Feasibility of machine learning analysis for the identification of patients with possible primary ciliary dyskinesia - Orphanet Journal of Rare... - October 15th, 2025 [October 15th, 2025]
- Machine learning-based prediction of preeclampsia using first-trimester inflammatory markers and red blood cell indices - BMC Pregnancy and Childbirth - October 15th, 2025 [October 15th, 2025]
- Utilizing AI and machine learning to improve railroad safety: Detecting trespasser hotspots - masstransitmag.com - October 15th, 2025 [October 15th, 2025]
- Precision medicine meets machine learning: AI and oncology biomarkers - pharmaphorum - October 15th, 2025 [October 15th, 2025]
- Aether Pro Exchange Transforms Execution Dynamics with Machine-Learning Optimization - GlobeNewswire - October 15th, 2025 [October 15th, 2025]
- Prevalence, associated factors, and machine learning-based prediction of depression, anxiety, and stress among university students: a cross-sectional... - October 15th, 2025 [October 15th, 2025]
- Artificial Intelligence vs. Machine Learning: Which skills will open better career options in the global - Times of India - October 15th, 2025 [October 15th, 2025]
- Study Reveals Impact of Negative Class Definitions on Machine Learning Accuracy in Immunotherapy - geneonline.com - October 15th, 2025 [October 15th, 2025]
- Muna Al-Khaifi: Detection of Breast Cancer Using Machine Learning and Explainable AI - Oncodaily - October 13th, 2025 [October 13th, 2025]
- Expedia Group Unveils Innovative AI and Machine Learning Solutions to Transform Partner Travel Experiences - Travel And Tour World - October 13th, 2025 [October 13th, 2025]
- Machine Learning-Guided Prediction of Formulation Performance in Inhalable CiprofloxacinBile Acid Dispersions with Antimicrobial and Toxicity... - October 13th, 2025 [October 13th, 2025]
- Machine Learning and BIG DATA workshop planned Oct. 14-15 - West Virginia University - October 11th, 2025 [October 11th, 2025]
- How Google enables third-party circularity by increasing recycling rates with Machine Learning - The World Business Council for Sustainable... - October 11th, 2025 [October 11th, 2025]
- Integrating Artificial Intelligence and Machine Learning in Hydroclimatic Research - A Promising Step Forward - University of Northern British... - October 11th, 2025 [October 11th, 2025]
- Semi-automatic detection of anteriorly displaced temporomandibular joint discs in magnetic resonance images using machine learning - BMC Oral Health - October 11th, 2025 [October 11th, 2025]
- AI and Machine Learning - Partnership to bring infrastructure intelligence to US public sector - Smart Cities World - October 11th, 2025 [October 11th, 2025]
- Between rain and snow, machine learning finds nine precipitation types - Phys.org - October 9th, 2025 [October 9th, 2025]