This could lead to the next big breakthrough in common sense AI – MIT Technology Review
AI models that can parse both language and visual input also have very practical uses. If we want to build robotic assistants, for example, they need computer vision to navigate the world and language to communicate about it to humans.
But combining both types of AI is easier said than done. It isn't as simple as stapling together an existing language model with an existing object recognition system. It requires training a new model from scratch with a data set that includes text and images, otherwise known as a visual-language data set.
The most common approach for curating such a data set is to compile a collection of images with descriptive captions. A picture like the one below, for example, would be captioned "An orange cat sits in the suitcase ready to be packed." This differs from typical image data sets, which would label the same picture with only one noun, like "cat." A visual-language data set can therefore teach an AI model not just how to recognize objects but how they relate to and act on one another, using verbs and prepositions.
But you can see why this data curation process would take forever. This is why the visual-language data sets that exist are so puny. A popular text-only data set like English Wikipedia (which indeed includes nearly all the English-language Wikipedia entries) might contain nearly 3 billion words. A visual-language data set like Microsoft Common Objects in Context, or MS COCO, contains only 7 million. It's simply not enough data to train an AI model for anything useful.
Vokenization gets around this problem, using unsupervised learning methods to scale the tiny amount of data in MS COCO to the size of English Wikipedia. The resultant visual-language model outperforms state-of-the-art models in some of the hardest tests used to evaluate AI language comprehension today.
"You don't beat state of the art on these tests by just trying a little bit," says Thomas Wolf, the cofounder and chief science officer of the natural-language processing startup Hugging Face, who was not part of the research. "This is not a toy test. This is why this is super exciting."
Let's first sort out some terminology. What on earth is a "voken"?
In AI speak, the words that are used to train language models are known as tokens. So the UNC researchers decided to call the image associated with each token in their visual-language model a voken. Vokenizer is what they call the algorithm that finds vokens for each token, and vokenization is what they call the whole process.
The point of this isn't just to show how much AI researchers love making up words. (They really do.) It also helps break down the basic idea behind vokenization. Instead of starting with an image data set and manually writing sentences to serve as captions, a very slow process, the UNC researchers started with a language data set and used unsupervised learning to match each word with a relevant image (more on this later). This is a highly scalable process.
The unsupervised learning technique, here, is ultimately the contribution of the paper. How do you actually find a relevant image for each word?
Let's go back for a moment to GPT-3. GPT-3 is part of a family of language models known as transformers, which represented a major breakthrough in applying unsupervised learning to natural-language processing when the first one was introduced in 2017. Transformers learn the patterns of human language by observing how words are used in context and then creating a mathematical representation of each word, known as a "word embedding," based on that context. The embedding for the word "cat" might show, for example, that it is frequently used around the words "meow" and "orange" but less often around the words "bark" or "blue."
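To make the idea of "learning a word from its context" concrete, here is a minimal sketch in Python. It is not how a transformer works internally: it simply counts which words appear near a target word, a far cruder stand-in for a learned embedding. The function name `cooccurrence_vector`, the window size, and the three-sentence corpus are all assumptions made up for illustration.

```python
from collections import Counter

def cooccurrence_vector(sentences, target, window=2):
    """Count the words that appear within `window` positions of `target`.

    A toy stand-in for a word embedding: real transformers learn dense
    vectors, but the underlying signal is still 'which words keep company'.
    """
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            if word != target:
                continue
            start = max(0, i - window)
            end = min(len(words), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[words[j]] += 1
    return counts

# Hypothetical mini-corpus, invented for this example.
corpus = [
    "the orange cat says meow",
    "an orange cat sleeps in the suitcase",
    "the dog will bark at the blue car",
]

cat_context = cooccurrence_vector(corpus, "cat")
print(cat_context["orange"], cat_context["meow"], cat_context["bark"])
```

In this toy corpus "cat" shows up near "orange" and "meow" but never near "bark" or "blue," which is the kind of pattern the paragraph above describes.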
This is how transformers approximate the meanings of words, and how GPT-3 can write such human-like sentences. It relies in part on these embeddings to tell it how to assemble words into sentences, and sentences into paragraphs.
There's a parallel technique that can also be used for images. Instead of scanning text for word usage patterns, it scans images for visual patterns. It tabulates how often a cat, say, appears on a bed versus on a tree, and creates a "cat" embedding with this contextual information.
The insight of the UNC researchers was that they should use both embedding techniques on MS COCO. They converted the images into visual embeddings and the captions into word embeddings. What's really neat about these embeddings is that they can then be graphed in a three-dimensional space, and you can literally see how they are related to one another. Visual embeddings that are closely related to word embeddings will appear closer in the graph. In other words, the visual cat embedding should (in theory) overlap with the text-based cat embedding. Pretty cool.
You can see where this is going. Once the embeddings are all graphed and compared and related to one another, it's easy to start matching images (vokens) with words (tokens). And remember, because the images and words are matched based on their embeddings, they're also matched based on context. This is useful when one word can have totally different meanings. The technique successfully handles that by finding different vokens for each instance of the word.
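The matching step can be sketched as a nearest-neighbor lookup: for each token embedding, pick the image embedding that lies closest to it. The Python below is a minimal illustration under stated assumptions, not the researchers' vokenizer. The three-dimensional vectors, the image file names, the choice of cosine similarity, and the example word "bank" are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_voken(token_vec, image_vecs):
    """Return the name of the image whose embedding is most similar to the token's."""
    return max(image_vecs, key=lambda name: cosine(token_vec, image_vecs[name]))

# Hypothetical image embeddings (made-up numbers and file names).
image_vecs = {
    "cat_photo.jpg": [0.9, 0.1, 0.0],
    "river_photo.jpg": [0.0, 0.9, 0.2],
    "money_photo.jpg": [0.1, 0.0, 0.9],
}

# Hypothetical contextual embeddings for the same word in two different sentences.
bank_near_river = [0.1, 0.8, 0.3]
bank_near_money = [0.2, 0.1, 0.8]

print(nearest_voken(bank_near_river, image_vecs))
print(nearest_voken(bank_near_money, image_vecs))
```

Because each occurrence of the word gets its own context-dependent vector, the two occurrences land near different images, which is how one word can end up with different vokens.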