A Chatbot Beat the SAT. What Now?

Last fall, when generative AI abruptly started turning out competent high-school- and college-level writing, some educators saw it as an opportunity. Perhaps it was time, at last, to dispose of the five-paragraph essay, among other bad teaching practices that have lingered for generations. Universities and colleges convened emergency town halls before winter terms began to discuss how large language models might reshape their work, for better and worse.

But just as quickly, most of those efforts evaporated into the reality of normal life. Educators and administrators have so many problems to address even before AI enters the picture; the prospect of utterly redesigning writing education and assessment felt impossible. Worthwhile, but maybe later. Then, with last week's arrival of GPT-4, came another provocation. OpenAI, the company that created the new software, put out a paper touting its capacities. Among them: taking tests. AIs are no longer just producing passable five-paragraph essays. Now they're excelling at the SAT, earning a score of 1410. They're getting passing grades on more than a dozen different AP exams. They're doing well enough on bar exams to be licensed as lawyers.

It would be nice if this news inspired educators, governments, certification agencies, and other groups to rethink what these tests really mean, or even to reinvent them altogether. Alas, as was the case for rote-essay writing, whatever appetite for change the shock inspires might prove to be short-lived. GPT-4's achievements help reveal the underlying problem: Americans love standardized tests as much as we hate them, and we're unlikely to let them go even if doing so would be in our best interest.

Many of the initial responses to GPT-4's exam prowess were predictably immoderate: AI can keep up with human lawyers, or apply to Stanford, or make education useless. But why should it be startling in the slightest that software trained on the entire text of the internet performs well on standardized exams? AI can instantly run what amounts to an open-book test on any subject through statistical analysis and regression. Indeed, that anyone is surprised at all by this success suggests that people tend to get confused about what it means when computers prove effective at human activities.

Back in the late 1990s, nobody thought a computer could ever beat a human at Go, the ancient Chinese game played with black and white stones. Chess had been mastered by supercomputers, but Go remained, at least in the hearts of its players, immune to computation. They were wrong. Two decades later, DeepMind's AlphaGo was regularly beating Go masters. To accomplish this task, AlphaGo initially mimicked human players' moves before running innumerable games against itself to find new strategies. The victory was construed by some as evidence that computers could overtake people at complex tasks previously thought to be uniquely human.

By rights, GPT-4's skill at the SAT should be taken as the opposite. Standardized tests feel inhuman from the start: You, a distinct individual, are forced to perform in a manner that can be judged by a machine, and then compared with that of many other individuals. Yet last week's announcement (of the 1410 score, the AP exams, and so on) gave rise to an unease similar to that produced by AlphaGo.

Perhaps we're anxious not that computers will strip us of humanity, but that machines will reveal the vanity of our human concerns. The experience of reasoning about your next set of moves in Go, as a human player doing so from the vantage point of human culture, cannot be replaced or reproduced by a Go-playing machine, unless the only point of Go were to prove that Go can be mastered, rather than played. Such cultural values do exist: The designation of chess grand masters and Go 9-dan professionals suggests expertise in excess of mere performance in a folk game. The best players of chess and Go are sometimes seen as smart in a general sense, because they are good at a game that takes smarts of a certain sort. The same is true for AIs that play (and win) these games.

Standardized tests occupy a similar cultural role. They were conceived to assess and communicate general performance on a subject such as math or reading. Whether and how they ever managed to do that is up for debate, but the accuracy and fairness of the exams became less important than their social function. To score a 1410 on the SAT says something about your capacities and prospects: maybe you can get into Stanford. To pursue and then emerge victorious against a battery of AP tests suggests general ability warranting accelerated progress in college. (That the victory doesn't necessarily provide that acceleration only emphasizes the seduction of its symbolism.) The bar exam measures, one hopes, someone's subject-matter proficiency, but it doesn't promise to ensure lawyerly effectiveness or even competence. To perform well on a standardized test indicates potential to perform well at some real future activity, but it has also come to have some value in itself, as a marker of success at taking tests.

That value was already being questioned, machine intelligence aside. Standardized tests have long been scrutinized for contributing to discrimination against minority and low-income students. The coronavirus pandemic, and its disruptions to educational opportunity, intensified those concerns. Many colleges and universities made the SAT and ACT optional for admissions. Graduate schools are giving up on the GRE, and aspiring law students may no longer have to take the LSAT in a couple of years.

GPT-4's purported prowess at these tests shows how little progress has been made at decoupling appearance from reality in the tests' pursuit. Standardized tests might fairly assess human capacity, or they might do so unfairly, but either way, they hold an outsize role in Americans' conception of themselves and their communities. We're nervous that tests might turn us into computers, but also that computers might reveal the conceit of valuing tests so much in the first place.

AI-based chess and Go computers didn't obsolesce play by people, but they did change human-training practices. Large language models may do the same for taking the SAT and other standardized exams, and evolve into a fancy form of test prep. In that case, they could end up helping those who would already have done well enough to score even higher. Or perhaps they will become the basis for a low-cost alternative that puts such training in the hands of everyone: a reversal of examination inequity, and a democratization of vanity. No matter the case, the standardized tests will persist; only now, the chatbots have to take them too.
