This could lead to the next big breakthrough in common sense AI – MIT Technology Review
AI models that can parse both language and visual input also have very practical uses. If we want to build robotic assistants, for example, they need computer vision to navigate the world and language to communicate about it to humans.
But combining both types of AI is easier said than done. It isn't as simple as stapling together an existing language model with an existing object recognition system. It requires training a new model from scratch with a data set that includes text and images, otherwise known as a visual-language data set.
The most common approach for curating such a data set is to compile a collection of images with descriptive captions. A picture like the one below, for example, would be captioned "An orange cat sits in the suitcase ready to be packed." This differs from typical image data sets, which would label the same picture with only one noun, like "cat." A visual-language data set can therefore teach an AI model not just how to recognize objects but how they relate to and act on one another, using verbs and prepositions.
But you can see why this data curation process would take forever. This is why the visual-language data sets that exist are so puny. A popular text-only data set like English Wikipedia (which indeed includes nearly all the English-language Wikipedia entries) might contain nearly 3 billion words. A visual-language data set like Microsoft Common Objects in Context, or MS COCO, contains only 7 million. It's simply not enough data to train an AI model for anything useful.
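To put the gap in perspective, here is a quick back-of-the-envelope calculation using the two word counts quoted above. The Wikipedia figure is rounded up from "nearly 3 billion," so treat the result as a rough estimate rather than an exact ratio.

```python
# Word counts as quoted in the article; the Wikipedia figure is an
# approximation ("nearly 3 billion"), so the ratio is only a rough guide.
wikipedia_words = 3_000_000_000
coco_words = 7_000_000

ratio = wikipedia_words / coco_words
print(f"English Wikipedia is roughly {ratio:.0f} times larger than MS COCO")
```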
Vokenization gets around this problem, using unsupervised learning methods to scale the tiny amount of data in MS COCO to the size of English Wikipedia. The resultant visual-language model outperforms state-of-the-art models in some of the hardest tests used to evaluate AI language comprehension today.
"You don't beat state of the art on these tests by just trying a little bit," says Thomas Wolf, the cofounder and chief science officer of the natural-language processing startup Hugging Face, who was not part of the research. "This is not a toy test. This is why this is super exciting."
Let's first sort out some terminology. What on earth is a "voken"?
In AI speak, the words that are used to train language models are known as tokens. So the UNC researchers decided to call the image associated with each token in their visual-language model a voken. Vokenizer is what they call the algorithm that finds vokens for each token, and vokenization is what they call the whole process.
The point of this isn't just to show how much AI researchers love making up words. (They really do.) It also helps break down the basic idea behind vokenization. Instead of starting with an image data set and manually writing sentences to serve as captions (a very slow process), the UNC researchers started with a language data set and used unsupervised learning to match each word with a relevant image (more on this later). This is a highly scalable process.
The unsupervised learning technique, here, is ultimately the contribution of the paper. How do you actually find a relevant image for each word?
Let's go back for a moment to GPT-3. GPT-3 is part of a family of language models known as transformers, which represented a major breakthrough in applying unsupervised learning to natural-language processing when the first one was introduced in 2017. Transformers learn the patterns of human language by observing how words are used in context and then creating a mathematical representation of each word, known as a "word embedding," based on that context. The embedding for the word "cat" might show, for example, that it is frequently used around the words "meow" and "orange" but less often around the words "bark" or "blue."
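The intuition that a word is characterized by its neighbors can be sketched with simple counting. This is a deliberate simplification: transformers learn their embeddings through training rather than by tallying co-occurrences, and the tiny corpus and the `context_counts` helper below are invented purely for illustration.

```python
from collections import Counter

# Toy corpus, invented for illustration (not taken from the article or the paper).
corpus = [
    "the orange cat says meow",
    "the cat says meow",
    "an orange cat sleeps",
    "the dog says bark",
    "the blue bird sings",
]

def context_counts(target, sentences):
    """Count the words that appear in the same sentence as `target`."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        if target in words:
            counts.update(w for w in words if w != target)
    return counts

cat_context = context_counts("cat", corpus)

# "meow" and "orange" show up around "cat"; "bark" and "blue" never do.
print(cat_context["meow"], cat_context["orange"])  # 2 2
print(cat_context["bark"], cat_context["blue"])    # 0 0
```

A real embedding compresses this kind of contextual signal into a dense vector of numbers, but the counts already show why "cat" ends up closer to "meow" than to "bark."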
This is how transformers approximate the meanings of words, and how GPT-3 can write such human-like sentences. It relies in part on these embeddings to tell it how to assemble words into sentences, and sentences into paragraphs.
There's a parallel technique that can also be used for images. Instead of scanning text for word usage patterns, it scans images for visual patterns. It tabulates how often a cat, say, appears on a bed versus on a tree, and creates a "cat" embedding with this contextual information.
The insight of the UNC researchers was that they should use both embedding techniques on MS COCO. They converted the images into visual embeddings and the captions into word embeddings. What's really neat about these embeddings is that they can then be graphed in a three-dimensional space, and you can literally see how they are related to one another. Visual embeddings that are closely related to word embeddings will appear closer in the graph. In other words, the visual cat embedding should (in theory) overlap with the text-based cat embedding. Pretty cool.
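"Closer in the graph" can be made concrete with a similarity score. The sketch below uses cosine similarity, a common way to compare embeddings, though the article does not say which measure the researchers used. The three-dimensional vectors and their names are hypothetical values chosen by hand, not output from any real model.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings placed by hand in one shared 3-D space.
word_embeddings = {"cat": [0.9, 0.1, 0.2]}
visual_embeddings = {
    "cat_photo": [0.8, 0.2, 0.1],
    "suitcase_photo": [0.2, 0.9, 0.2],
}

sim_match = cosine_similarity(word_embeddings["cat"], visual_embeddings["cat_photo"])
sim_mismatch = cosine_similarity(word_embeddings["cat"], visual_embeddings["suitcase_photo"])

# The cat photo should score noticeably higher than the suitcase photo.
print(f"cat vs. cat photo:      {sim_match:.2f}")
print(f"cat vs. suitcase photo: {sim_mismatch:.2f}")
```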
You can see where this is going. Once the embeddings are all graphed and compared and related to one another, it's easy to start matching images (vokens) with words (tokens). And remember, because the images and words are matched based on their embeddings, they're also matched based on context. This is useful when one word can have totally different meanings. The technique successfully handles that by finding different vokens for each instance of the word.
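One plausible way to picture the matching step is a nearest-neighbor lookup: each occurrence of a word gets its own context-dependent vector, and the image with the highest score against that vector becomes its voken. This is a sketch under that assumption, not the UNC researchers' actual code. The word "bat," the sentences, the image names, and all vectors are made up for illustration.

```python
def dot(a, b):
    """Inner product, used here as a simple relevance score."""
    return sum(x * y for x, y in zip(a, b))

def nearest_voken(token_vec, image_vecs):
    """Return the name of the image whose embedding scores highest for this token."""
    return max(image_vecs, key=lambda name: dot(token_vec, image_vecs[name]))

# Hypothetical image embeddings in a shared 2-D space.
image_vecs = {
    "photo_of_flying_bat": [0.9, 0.1],
    "photo_of_baseball_bat": [0.1, 0.9],
}

# The same word gets a different (hypothetical) vector in each sentence,
# because the surrounding context differs.
contextual_tokens = [
    ("bat", "a bat flew out of the cave", [0.8, 0.2]),
    ("bat", "he swung the bat at the ball", [0.2, 0.8]),
]

matches = [nearest_voken(vec, image_vecs) for _, _, vec in contextual_tokens]
for (word, sentence, _), voken in zip(contextual_tokens, matches):
    print(f"{word!r} in {sentence!r} -> {voken}")
```

Because the lookup runs per occurrence rather than per dictionary entry, the two uses of the same word land on two different images.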
For example: