This could lead to the next big breakthrough in common sense AI – MIT Technology Review
AI models that can parse both language and visual input also have very practical uses. If we want to build robotic assistants, for example, they need computer vision to navigate the world and language to communicate about it to humans.
But combining both types of AI is easier said than done. It isn't as simple as stapling together an existing language model with an existing object recognition system. It requires training a new model from scratch with a data set that includes text and images, otherwise known as a visual-language data set.
The most common approach for curating such a data set is to compile a collection of images with descriptive captions. A picture like the one below, for example, would be captioned "An orange cat sits in the suitcase ready to be packed." This differs from typical image data sets, which would label the same picture with only one noun, like "cat." A visual-language data set can therefore teach an AI model not just how to recognize objects but how they relate to and act on one another, using verbs and prepositions.
But you can see why this data curation process would take forever. This is why the visual-language data sets that exist are so puny. A popular text-only data set like English Wikipedia (which indeed includes nearly all the English-language Wikipedia entries) might contain nearly 3 billion words. A visual-language data set like Microsoft Common Objects in Context, or MS COCO, contains only 7 million. It's simply not enough data to train an AI model for anything useful.
"Vokenization" gets around this problem, using unsupervised learning methods to scale the tiny amount of data in MS COCO to the size of English Wikipedia. The resultant visual-language model outperforms state-of-the-art models in some of the hardest tests used to evaluate AI language comprehension today.
"You don't beat state of the art on these tests by just trying a little bit," says Thomas Wolf, the cofounder and chief science officer of the natural-language processing startup Hugging Face, who was not part of the research. "This is not a toy test. This is why this is super exciting."
Let's first sort out some terminology. What on earth is a "voken"?
In AI speak, the words that are used to train language models are known as tokens. So the UNC researchers decided to call the image associated with each token in their visual-language model a voken. Vokenizer is what they call the algorithm that finds vokens for each token, and vokenization is what they call the whole process.
The point of this isn't just to show how much AI researchers love making up words. (They really do.) It also helps break down the basic idea behind vokenization. Instead of starting with an image data set and manually writing sentences to serve as captions (a very slow process), the UNC researchers started with a language data set and used unsupervised learning to match each word with a relevant image (more on this later). This is a highly scalable process.
The unsupervised learning technique, here, is ultimately the contribution of the paper. How do you actually find a relevant image for each word?
Let's go back for a moment to GPT-3. GPT-3 is part of a family of language models known as transformers, which represented a major breakthrough in applying unsupervised learning to natural-language processing when the first one was introduced in 2017. Transformers learn the patterns of human language by observing how words are used in context and then creating a mathematical representation of each word, known as a "word embedding," based on that context. The embedding for the word "cat" might show, for example, that it is frequently used around the words "meow" and "orange" but less often around the words "bark" or "blue."
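To make the idea of "meaning from context" concrete, here is a minimal sketch that counts which words appear near "cat" in a tiny corpus. It is a deliberate simplification: transformers learn dense numerical vectors rather than raw counts, and the corpus, the window size, and the function name `cooccurrence_embeddings` are all invented for illustration.

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration only.
corpus = [
    "the orange cat says meow",
    "the cat says meow",
    "an orange cat sleeps",
    "the dog likes to bark",
    "the blue sky",
]


def cooccurrence_embeddings(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    contexts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            start = max(0, i - window)
            end = min(len(words), i + window + 1)
            for j in range(start, end):
                if j != i:  # a word is not its own context
                    contexts[word][words[j]] += 1
    return contexts


embeddings = cooccurrence_embeddings(corpus)
cat = embeddings["cat"]

# "cat" shows up near "meow" and "orange", never near "bark" or "blue".
print(cat["meow"], cat["orange"], cat["bark"], cat["blue"])  # prints: 2 2 0 0
```

Each word's counter acts as a crude, count-based embedding: words used in similar contexts end up with similar counters.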
This is how transformers approximate the meanings of words, and how GPT-3 can write such human-like sentences. It relies in part on these embeddings to tell it how to assemble words into sentences, and sentences into paragraphs.
There's a parallel technique that can also be used for images. Instead of scanning text for word usage patterns, it scans images for visual patterns. It tabulates how often a cat, say, appears on a bed versus on a tree, and creates a "cat" embedding with this contextual information.
The insight of the UNC researchers was that they should use both embedding techniques on MS COCO. They converted the images into visual embeddings and the captions into word embeddings. What's really neat about these embeddings is that they can then be graphed in a three-dimensional space, and you can literally see how they are related to one another. Visual embeddings that are closely related to word embeddings will appear closer in the graph. In other words, the visual cat embedding should (in theory) overlap with the text-based cat embedding. Pretty cool.
You can see where this is going. Once the embeddings are all graphed and compared and related to one another, it's easy to start matching images (vokens) with words (tokens). And remember, because the images and words are matched based on their embeddings, they're also matched based on context. This is useful when one word can have totally different meanings. The technique successfully handles that by finding different vokens for each instance of the word.
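The matching step can be sketched as a nearest-neighbor search in the shared embedding space. This is only an illustration under stated assumptions, not the researchers' actual vokenizer: the three-dimensional vectors, the image file names, the example sentences, and the choice of cosine similarity as the closeness measure are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical visual embeddings, one per image (values invented).
image_embeddings = {
    "cat_in_suitcase.jpg": [0.9, 0.1, 0.0],
    "river_bank.jpg": [0.0, 0.9, 0.2],
    "bank_building.jpg": [0.1, 0.1, 0.9],
}

# Hypothetical contextual word embeddings, keyed by (word, sentence).
# The same word gets a different vector in each sentence it appears in.
token_embeddings = {
    ("bank", "she sat on the bank of the river"): [0.1, 0.8, 0.3],
    ("bank", "he deposited cash at the bank"): [0.0, 0.2, 0.9],
    ("cat", "an orange cat sits in the suitcase"): [0.8, 0.2, 0.1],
}


def find_voken(token_vec, images):
    """Return the name of the image whose embedding is closest to the token's."""
    return max(images, key=lambda name: cosine(token_vec, images[name]))


vokens = {
    key: find_voken(vec, image_embeddings)
    for key, vec in token_embeddings.items()
}

for (word, sentence), image in vokens.items():
    print(f"{word!r} in {sentence!r} -> {image}")
```

Because each occurrence of "bank" carries its own context-dependent vector, the two occurrences land on different images, which is the behavior the paragraph above describes.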
For example: