Reinforcement learning for the real world – TechTalks
This article is part of our reviews of AI research papers, a series of posts that explore the latest findings in artificial intelligence.
Labor- and data-efficiency remain two of the key challenges of artificial intelligence. In recent decades, researchers have shown that big data and machine learning algorithms reduce the need to provide AI systems with prior rules and knowledge. But machine learning, and more recently deep learning, have presented their own challenges, which require manual labor, albeit of a different nature.
Creating AI systems that can genuinely learn on their own with minimal human guidance remains a holy grail and a great challenge. According to Sergey Levine, assistant professor at the University of California, Berkeley, a promising direction of research for the AI community is self-supervised offline reinforcement learning.
This is a variation of the RL paradigm that is very close to how humans and animals learn to reuse previously acquired data and skills, and it can be a great boon for applying AI to real-world settings. In a paper titled "Understanding the World Through Action" and a talk at the NeurIPS 2021 conference, Levine explained how self-supervised learning objectives and offline RL can help create generalized AI systems that can be applied to various tasks.
One common argument in favor of machine learning algorithms is their ability to scale with the availability of data and compute resources. Decades of work on developing symbolic AI systems have produced limited results. These systems require human experts and engineers to manually provide the rules and knowledge that define the behavior of the AI system.
The problem is that in some applications, the rules can be virtually limitless, while in others, they can't be explicitly defined.
In contrast, machine learning models can derive their behavior from data, without the need for explicit rules and prior knowledge. Another advantage of machine learning is that it can glean its own solutions from its training data, which are often more accurate than knowledge engineered by humans.
But machine learning faces its own challenges. Most ML applications are based on supervised learning and require training data to be manually labeled by human annotators. Data annotation poses severe limits to the scaling of ML models.
More recently, researchers have been exploring unsupervised and self-supervised learning, ML paradigms that obviate the need for manual labels. These approaches have helped overcome the limits of machine learning in some applications such as language modeling and medical imaging. But they're still faced with challenges that prevent their use in more general settings.
"Current methods for learning without human labels still require considerable human insight (which is often domain-specific!) to engineer self-supervised learning objectives that allow large models to acquire meaningful knowledge from unlabeled datasets," Levine writes.
Levine writes that the next objective should be to create AI systems that don't require manual labeling or the manual design of self-supervised objectives. These models should be able to distill a deep and meaningful understanding of the world and perform downstream tasks with robustness, generalization, and even a degree of common sense.
Reinforcement learning is inspired by intelligent behavior in animals and humans. Reinforcement learning pioneer Richard Sutton describes RL as the first computational theory of intelligence. An RL agent develops its behavior by interacting with its environment, weighing the punishments and rewards of its actions, and developing policies that maximize rewards.
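The loop described above can be sketched in a few lines of code. Below is a minimal tabular Q-learning example on a toy 5-state corridor, where the agent starts at state 0 and is rewarded only for reaching state 4. The environment, reward, and hyperparameters are illustrative inventions for this sketch, not anything from Levine's work.

```python
import random

N_STATES = 5
ACTIONS = [1, -1]                        # move right / move left
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1    # learning rate, discount, exploration

# Value estimates for every (state, action) pair, initially zero.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Deterministic toy environment: clamp to the corridor, reward the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

random.seed(0)
for episode in range(200):
    state, done = 0, False
    while not done:
        # epsilon-greedy: occasionally explore, otherwise exploit estimates
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward, done = step(state, action)
        # temporal-difference update: nudge Q toward reward + discounted future value
        target = reward + GAMMA * max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        state = next_state

# The greedy policy that falls out of the learned values moves right toward the reward.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)}
```

This captures the essence of the paradigm: behavior is not programmed in, it emerges from trial, error, and a reward signal.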
RL, and more recently deep RL, have proven to be particularly efficient at solving complicated problems such as playing games and training robots. And theres reason to believe reinforcement learning can overcome the limits of current ML systems.
But before it does, RL must overcome its own set of challenges that limit its use in real-world settings.
"We could think of modern RL research as consisting of three threads: (1) getting good results in simulated benchmarks (e.g., video games); (2) using simulation + transfer; (3) running RL in the real world," Levine told TechTalks. "I believe that ultimately (3) is the most important thing, because that's the most promising approach to solve problems that we can't solve today."
Games are simple environments. Board games such as chess and Go are closed worlds with deterministic environments. Even games such as StarCraft and Dota, which are played in real time and have a near-unlimited number of states, are much simpler than the real world. Their rules don't change. This is partly why game-playing AI systems have found very few applications in the real world.
On the other hand, physics simulators have seen tremendous advances in recent years. One of the popular methods in fields such as robotics and self-driving cars has been to train reinforcement learning models in simulated environments and then finetune the models with real-world experience. But as Levine explained, this approach is limited too, because the domains where we most need learning (the ones where humans far outperform machines) are also the ones that are hardest to simulate.
"This approach is only effective at addressing tasks that can be simulated, which is bottlenecked by our ability to create lifelike simulated analogues of the real world and to anticipate all the possible situations that an agent might encounter in reality," Levine said.
"One of the biggest challenges we encounter when we try to do real-world RL is generalization," Levine said.
For example, in 2016, Levine was part of a team that constructed an arm farm at Google with 14 robots all learning concurrently from their shared experience. They collected more than half a million grasp attempts, and it was possible to learn effective grasping policies in this way.
"But we can't repeat this process for every single task we want robots to learn with RL," he said. "Therefore, we need more general-purpose approaches, where a single ever-growing dataset is used as the basis for a general understanding of the world on which more specific skills can be built."
In his paper, Levine points to two key obstacles in reinforcement learning. First, RL systems require manually defined reward functions or goals before they can learn the behaviors that accomplish those goals. And second, reinforcement learning requires online experience and is not data-driven, which makes it hard to train RL systems on large existing datasets. Most recent accomplishments in RL have relied on engineers at very wealthy tech companies using massive compute resources to generate immense amounts of experience instead of reusing available data.
Therefore, RL systems need solutions that can learn from past experience and repurpose what they have learned in more general ways. Moreover, they should be able to handle the continuity of the real world. Unlike simulated environments, you can't reset the real world and start everything from scratch. You need learning systems that can quickly adapt to the constant and unpredictable changes in their environment.
In his NeurIPS talk, Levine compares real-world RL to Robinson Crusoe, the story of a man who is stranded on an island and learns to deal with unknown situations through inventiveness and creativity, using his knowledge of the world and continued exploration of his new habitat.
"RL systems in the real world have to deal with a lifelong learning problem, evaluate objectives and performance based entirely on realistic sensing without access to privileged information, and must deal with real-world constraints, including safety," Levine said. "These are all things that are typically abstracted away in widely used RL benchmark tasks and video game environments."
However, RL does work in more practical real-world settings, Levine says. For example, in 2018, he and his colleagues demonstrated an RL-based robotic grasping system that attained state-of-the-art results using raw sensory perception. In contrast to static behaviors that choose a grasp point and then execute the desired grasp, in their method, the robot continuously updated its grasp strategy based on the most recent observations to optimize long-horizon grasp success.
"To my knowledge, this is still the best existing system for grasping from monocular RGB images," Levine said. "But this sort of thing requires algorithms that are somewhat different from those that perform best in simulated video game settings: it requires algorithms that are adept at utilizing and reusing previously collected data, algorithms that can train large models that generalize, and algorithms that can support large-scale real-world data collection."
Levine's reinforcement learning solution includes two key components: unsupervised/self-supervised learning and offline learning.
In his paper, Levine describes self-supervised reinforcement learning as a system that can learn "behaviors that control the world in meaningful ways" and provides "some mechanism to learn to control [the world] in as many ways as possible."
Basically, this means that instead of being optimized for a single goal, the RL agent should be able to achieve many different goals by computing counterfactuals, learning causal models, and obtaining a deep understanding of how actions affect its environment in the long term.
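One well-known way to get such a multi-goal, self-supervised signal without hand-designing rewards is hindsight relabeling (as in Hindsight Experience Replay): every state a trajectory actually reached is retroactively treated as a goal the agent "intended" to reach. The sketch below illustrates the idea on a toy 1-D world; the environment and all function names are my own illustrative inventions, not the method from Levine's paper.

```python
import random

def collect_trajectory(policy, env_step, start_state, horizon=10):
    """Roll out a policy and record (state, action, next_state) tuples."""
    traj, state = [], start_state
    for _ in range(horizon):
        action = policy(state)
        next_state = env_step(state, action)
        traj.append((state, action, next_state))
        state = next_state
    return traj

def hindsight_relabel(traj):
    """Label each transition with a goal the trajectory later achieved."""
    labeled = []
    for i, (state, action, next_state) in enumerate(traj):
        # pick a state actually reached later on as the retroactive goal
        _, _, achieved = traj[random.randrange(i, len(traj))]
        reward = 1.0 if next_state == achieved else 0.0  # sparse goal-reaching reward
        labeled.append((state, action, achieved, reward))
    return labeled

# Toy 1-D world: actions shift the agent along an integer line.
random.seed(0)
traj = collect_trajectory(policy=lambda s: 1,
                          env_step=lambda s, a: s + a,
                          start_state=0)
batch = hindsight_relabel(traj)
```

The point is that no human labeled anything: the relabeled batch turns unlabeled experience into supervision for reaching many different goals, which is the flavor of self-supervised objective the paragraph above describes.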
However, creating self-supervised RL models that can solve various goals would still require a massive amount of experience. To address this challenge, Levine proposes offline reinforcement learning, which makes it possible for models to continue learning from previously collected data without the need for continued online experience.
"Offline RL can make it possible to apply self-supervised or unsupervised RL methods even in settings where online collection is infeasible, and such methods can serve as one of the most powerful tools for incorporating large and diverse datasets into self-supervised RL," he writes.
The combination of self-supervised and offline RL can help create agents that build reusable blocks for learning new tasks and continue learning with little need for new data.
This is very similar to how we learn in the real world. For example, when you want to learn basketball, you use basic skills you learned in the past, such as walking, running, jumping, and handling objects. You use these capabilities to develop new skills such as dribbling, crossovers, jump shots, free throws, layups, straight and bounce passes, Euro steps, and dunks (if you're tall enough). These skills build on each other and help you reach the bigger goal, which is to outscore your opponent. At the same time, you can learn from offline data by reflecting on your past experience and thinking about counterfactuals (e.g., what would have happened if you had passed to an open teammate instead of taking a contested shot). You can also learn by processing other data, such as videos of yourself and your opponents. In fact, on-court experience is just part of your continuous learning.
In a paper, Yevgen Chebotar, one of Levine's colleagues, shows how self-supervised offline RL can learn policies for fairly general robotic manipulation skills, directly reusing data collected for another project.
"This system was able to reach a variety of user-specified goals, and also act as a general-purpose pretraining procedure (a kind of BERT for robotics) for other kinds of tasks specified with conventional reward functions," Levine said.
One of the great benefits of offline and self-supervised RL is learning from real-world data instead of simulated environments.
"Basically, it comes down to this question: is it easier to create a brain, or is it easier to create the universe? I think it's easier to create a brain, because it is part of the universe," he said.
This is, in fact, one of the great challenges engineers face when creating simulated environments. For example, Levine says, effective simulation for autonomous driving requires simulating other drivers, which requires having an autonomous driving system, which requires simulating other drivers, which requires having an autonomous driving system, etc.
"Ultimately, learning from real data will be more effective because it will simply be much easier and more scalable, just as we've seen in supervised learning domains in computer vision and NLP, where no one worries about using simulation," he said. "My perspective is that we should figure out how to do RL in a scalable and general-purpose way using real data, and this will spare us from having to expend inordinate amounts of effort building simulators."