This could lead to the next big breakthrough in common sense AI – MIT Technology Review
AI models that can parse both language and visual input also have very practical uses. If we want to build robotic assistants, for example, they need computer vision to navigate the world and language to communicate about it to humans.
But combining both types of AI is easier said than done. It isn't as simple as stapling together an existing language model with an existing object recognition system. It requires training a new model from scratch with a data set that includes text and images, otherwise known as a visual-language data set.
The most common approach for curating such a data set is to compile a collection of images with descriptive captions. A picture of an orange cat sitting in a suitcase, for example, might be captioned "An orange cat sits in the suitcase ready to be packed." This differs from typical image data sets, which would label the same picture with only one noun, like "cat." A visual-language data set can therefore teach an AI model not just how to recognize objects but how they relate to and act on one another, using verbs and prepositions.
But you can see why this data curation process would take forever. This is why the visual-language data sets that exist are so puny. A popular text-only data set like English Wikipedia (which includes nearly all English-language Wikipedia entries) might contain nearly 3 billion words. A visual-language data set like Microsoft Common Objects in Context, or MS COCO, contains only 7 million. It's simply not enough data to train an AI model for anything useful.
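To put the gap in concrete terms, the division below uses the article's rounded word counts (nearly 3 billion versus 7 million). Because the Wikipedia figure is an upper bound, the true ratio is a little lower, but it is still on the order of 400 to 1.

```python
# Back-of-the-envelope size comparison using the rounded figures quoted above.
english_wikipedia_words = 3_000_000_000  # "nearly 3 billion" (upper bound)
ms_coco_words = 7_000_000                # "only 7 million"

ratio = english_wikipedia_words / ms_coco_words
print(f"English Wikipedia is roughly {ratio:.0f}x larger than MS COCO")
```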
Vokenization gets around this problem, using unsupervised learning methods to scale the tiny amount of data in MS COCO to the size of English Wikipedia. The resultant visual-language model outperforms state-of-the-art models in some of the hardest tests used to evaluate AI language comprehension today.
"You don't beat state of the art on these tests by just trying a little bit," says Thomas Wolf, the cofounder and chief science officer of the natural-language processing startup Hugging Face, who was not part of the research. "This is not a toy test. This is why this is super exciting."
Let's first sort out some terminology. What on earth is a voken?
In AI speak, the words that are used to train language models are known as tokens. So the UNC researchers decided to call the image associated with each token in their visual-language model a voken. Vokenizer is what they call the algorithm that finds vokens for each token, and vokenization is what they call the whole process.
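Put as a data structure, vokenization turns a sentence into a list of (token, voken) pairs. The sketch below only shows that shape. `VokenPair`, `vokenize`, and the lookup-table vokenizer are hypothetical names invented for illustration, and a real vokenizer chooses images by comparing embeddings, not from a fixed table.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class VokenPair:
    token: str  # a word from the language data set
    voken: str  # ID of the image the vokenizer matched to that word


# A vokenizer sees the whole sentence plus the position of one token,
# because the image it picks can depend on the surrounding words.
Vokenizer = Callable[[List[str], int], str]


def vokenize(words: List[str], vokenizer: Vokenizer) -> List[VokenPair]:
    """Vokenization: pair every token in a sentence with a voken."""
    return [VokenPair(word, vokenizer(words, i)) for i, word in enumerate(words)]


# Hypothetical placeholder vokenizer: a fixed lookup table that ignores context.
# It only illustrates the input and output types, not how images are chosen.
TOY_TABLE = {"cat": "img_0001.jpg", "suitcase": "img_0002.jpg"}


def lookup_vokenizer(words: List[str], i: int) -> str:
    return TOY_TABLE.get(words[i], "img_default.jpg")


for pair in vokenize("an orange cat sits in the suitcase".split(), lookup_vokenizer):
    print(pair.token, "->", pair.voken)
```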
The point of this isn't just to show how much AI researchers love making up words. (They really do.) It also helps break down the basic idea behind vokenization. Instead of starting with an image data set and manually writing sentences to serve as captions (a very slow process), the UNC researchers started with a language data set and used unsupervised learning to match each word with a relevant image (more on this later). This is a highly scalable process.
This unsupervised learning technique is ultimately the paper's main contribution. So how do you actually find a relevant image for each word?
Let's go back for a moment to GPT-3. GPT-3 is part of a family of language models known as transformers, which represented a major breakthrough in applying unsupervised learning to natural-language processing when the first one was introduced in 2017. Transformers learn the patterns of human language by observing how words are used in context and then creating a mathematical representation of each word, known as a word embedding, based on that context. The embedding for the word "cat" might show, for example, that it is frequently used around the words "meow" and "orange" but less often around the words "bark" or "blue."
This is how transformers approximate the meanings of words, and how GPT-3 can write such human-like sentences. It relies in part on these embeddings to tell it how to assemble words into sentences, and sentences into paragraphs.
There's a parallel technique that can also be used for images. Instead of scanning text for word usage patterns, it scans images for visual patterns. It tabulates how often a cat, say, appears on a bed versus on a tree, and creates a "cat" embedding with this contextual information.
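The tabulation idea can be sketched the same way on the image side. Here, hypothetical object annotations stand in for images, and "context" is simplified to "appears in the same picture" rather than "sits on." Real visual embeddings are produced by a neural network that reads pixels directly, so treat this only as an illustration of the counting intuition.

```python
from collections import Counter

# Hypothetical annotations: each set lists the objects found in one image.
images = [
    {"cat", "bed"},
    {"cat", "bed", "pillow"},
    {"cat", "tree"},
    {"dog", "tree"},
    {"cat", "suitcase"},
]


def visual_context(target, annotated_images):
    """Count how often each other object shares an image with `target`."""
    counts = Counter()
    for objects in annotated_images:
        if target in objects:
            counts.update(objects - {target})
    return counts


cat_context = visual_context("cat", images)
print(cat_context["bed"], cat_context["tree"])  # 2 1
```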
The insight of the UNC researchers was that they should use both embedding techniques on MS COCO. They converted the images into visual embeddings and the captions into word embeddings. What's really neat about these embeddings is that they can then be projected into a three-dimensional space and graphed, so you can literally see how they are related to one another. Visual embeddings that are closely related to word embeddings will appear closer together in the graph. In other words, the visual "cat" embedding should (in theory) overlap with the text-based "cat" embedding. Pretty cool.
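Closeness in an embedding space is commonly measured with cosine similarity. This minimal sketch assumes the text and image vectors already live in one shared space (in practice a model has to be trained on paired data, such as MS COCO's captions and images, to put them there). The three-dimensional vectors are invented for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up 3-D vectors, assumed to already share one embedding space.
text_embeddings = {"cat": (0.9, 0.1, 0.2), "dog": (0.1, 0.9, 0.3)}
visual_cat = (0.8, 0.2, 0.1)  # embedding of a photo of a cat

for word, vec in text_embeddings.items():
    print(word, round(cosine_similarity(visual_cat, vec), 3))
```

The cat photo scores much higher against the text "cat" vector than against "dog," which is what "closer in the graph" means numerically.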
You can see where this is going. Once the embeddings are all graphed and compared and related to one another, it's easy to start matching images (vokens) with words (tokens). And remember, because the images and words are matched based on their embeddings, they're also matched based on context. This is useful when one word can have totally different meanings. The technique successfully handles that by finding different vokens for each instance of the word.
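The matching step can be sketched as a nearest-neighbor search: compute an embedding for a token in its sentence, then pick the image whose embedding is most similar. Everything here is a toy assumption rather than the researchers' implementation: the ambiguous word "bank," the two-dimensional vectors, the image file names, and the crude contextualization rule (half the word's own vector, half the average of the other words), which stands in for what a transformer learns.

```python
import math

# Hypothetical 2-D word vectors: axis 0 ~ "finance", axis 1 ~ "rivers".
WORD_VECS = {
    "bank": (0.5, 0.5),  # ambiguous on its own
    "deposit": (1.0, 0.0),
    "money": (1.0, 0.0),
    "fishing": (0.0, 1.0),
    "river": (0.0, 1.0),
}
ZERO = (0.0, 0.0)  # words not listed (e.g. "the") carry no signal here

# Hypothetical image (voken) embeddings in the same 2-D space.
IMAGE_VECS = {
    "img_bank_building.jpg": (1.0, 0.1),
    "img_riverbank.jpg": (0.1, 1.0),
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.hypot(*u) * math.hypot(*v)
    return dot / norms if norms else 0.0


def contextual_embedding(words, i):
    """Toy contextualization: half the word's own vector, half the mean of
    the other words in the sentence."""
    own = WORD_VECS.get(words[i], ZERO)
    others = [WORD_VECS.get(w, ZERO) for j, w in enumerate(words) if j != i]
    mean = tuple(sum(dim) / len(others) for dim in zip(*others)) if others else ZERO
    return tuple(0.5 * a + 0.5 * b for a, b in zip(own, mean))


def vokenize_token(words, i):
    """Return the image whose embedding is closest (by cosine) to token i in context."""
    query = contextual_embedding(words, i)
    return max(IMAGE_VECS, key=lambda img: cosine(query, IMAGE_VECS[img]))


for sentence in ["deposit money at the bank", "fishing on the river bank"]:
    words = sentence.split()
    print(sentence, "->", vokenize_token(words, words.index("bank")))
```

Because the query vector for "bank" shifts toward its neighbors, the same word is matched with a different image in each sentence.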
For example: