Reinforcement learning for the real world – TechTalks
This article is part of ourreviews of AI research papers, a series of posts that explore the latest findings in artificial intelligence.
Labor- and data-efficiency remain two of the key challenges of artificial intelligence. In recent decades, researchers have proven that big data and machine learning algorithms reduce the need for providing AI systems with prior rules and knowledge. But machine learningand more recently deep learninghave presented their own challenges, which require manual labor albeit of different nature.
Creating AI systems that can genuinely learn on their own with minimal human guidance remain a holy grail and a great challenge. According to Sergey Levine, assistant professor at the University of California, Berkeley, a promising direction of research for the AI community is self-supervised offline reinforcement learning.
This is a variation of the RL paradigm that is very close to how humans and animals learn to reuse previously acquired data and skills, and it can be a great boon for applying AI to real-world settings. In a paper titled Understanding the World Through Action and a talk at the NeurIPS 2021 conference, Levine explained how self-supervised learning objectives and offline RL can help create generalized AI systems that can be applied to various tasks.
One common argument in favor of machine learning algorithms is their ability to scale with the availability of data and compute resources. Decades of work on developing symbolic AI systems have produced limited results. These systems require human experts and engineers to manually provide the rules and knowledge that define the behavior of the AI system.
The problem is that in some applications, the rules can be virtually limitless, while in others, they cant be explicitly defined.
In contrast, machine learning models can derive their behavior from data, without the need for explicit rules and prior knowledge. Another advantage of machine learning is that it can glean its own solutions from its training data, which are often more accurate than knowledge engineered by humans.
But machine learning faces its own challenges. Most ML applications are based on supervised learning and require training data to be manually labeled by human annotators. Data annotation poses severe limits to the scaling of ML models.
More recently, researchers have been exploring unsupervised and self-supervised learning, ML paradigms that obviate the need for manual labels. These approaches have helped overcome the limits of machine learning in some applications such as language modeling and medical imaging. But theyre still faced with challenges that prevent their use in more general settings.
Current methods for learning without human labels still require considerable human insight (which is often domain-specific!) to engineer self-supervised learning objectives that allow large models to acquire meaningful knowledge from unlabeled datasets, Levine writes.
Levine writes that the next objective should be to create AI systems that dont require manual labeling or the manual design of self-supervised objectives. These models should be able to distill a deep and meaningful understanding of the world and can perform downstream tasks with robustness generalization, and even a degree of common sense.
Reinforcement learning is inspired by intelligent behavior in animals and humans. Reinforcement learning pioneer Richard Sutton describes RL as the first computational theory of intelligence. An RL agent develops its behavior by interacting with its environment, weighing the punishments and rewards of its actions, and developing policies that maximize rewards.
RL, and more recently deep RL, have proven to be particularly efficient at solving complicated problems such as playing games and training robots. And theres reason to believe reinforcement learning can overcome the limits of current ML systems.
But before it does, RL must overcome its own set of challenges that limit its use in real-world settings.
We could think of modern RL research as consisting of three threads: (1) getting good results in simulated benchmarks (e.g., video games); (2) using simulation+ transfer; (3) running RL in the real world, Levine told TechTalks. I believe that ultimately (3) is the most importantthing, because thats the most promising approach to solve problems that we cant solve today.
Games are simple environments. Board games such as chess and go are closed worlds with deterministic environments. Even games such as StarCraft and Dota, which are played in real-time and have near unlimited states, are much simpler than the real world. Their rules dont change. This is partly why game-playing AI systems have found very few applications in the real world.
On the other hand, physics simulators have seen tremendous advances in recent years. One of the popular methods in fields such as robotics and self-driving cars has been to train reinforcement learning models in simulated environments and then finetune the models with real-world experience. But as Levine explained, this approach is limited too because the domains where we most need learningthe ones where humans far outperform machinesare also the ones that are hardest to simulate.
This approach is only effective at addressing tasks that can be simulated, which is bottlenecked by our ability to create lifelike simulated analogues of the real world and to anticipate all the possible situations that an agent might encounter in reality, Levine said.
One of the biggest challenges we encounter when we try to do real-world RL is generalization, Levine said.
For example, in 2016, Levine was part of a team that constructed an arm farm at Google with 14 robots all learning concurrently from their shared experience. They collected more than half a million grasp attempts, and it was possible to learn effective grasping policies in this way.
But we cant repeat this process for every single task we want robots to learn with RL, he says. Therefore, we need more general-purpose approaches, where a single ever-growing dataset is used as the basis for a general understanding of the world on which more specific skills can be built.
In his paper, Levine points to two key obstacles in reinforcement learning. First, RL systems require manually defined reward functions or goals before they can learn the behaviors that help accomplish those goals. And second, reinforcement learning requires online experience and is not data-driven, which makes it hard to train them on large datasets. Most recent accomplishments in RL have relied on engineers at very wealthy tech companies using massive compute resources to generate immense experiences instead of reusing available data.
Therefore, RL systems need solutions that can learn from past experience and repurpose their learnings in more generalized ways. Moreover, they should be able to handle the continuity of the real world. Unlike simulated environments, you cant reset the real world and start everything from scratch. You need learning systems that can quickly adapt to the constant and unpredictable changes to their environment.
In his NeurIPS talk, Levine compares real-world RL to the story of Robinson Crusoe, the story of a man who is stranded on an island and learns to deal with unknown situations through inventiveness and creativity, using his knowledge of the world and continued exploration in his new habitat.
RL systems in the real world have to deal with a lifelong learning problem, evaluate objectives and performance based entirely on realistic sensing without access to privileged information, and must deal with real-world constraints, including safety, Levine said. These are all things that are typically abstracted away in widely used RL benchmark tasks and video game environments.
However, RL does work in more practical real-world settings, Levine says. For example, in 2018, he and his colleagues an RL-based robotic grasping system attain state-of-the-art results with raw sensory perception. In contrast to static learning behaviors that choose a grasp point and then execute the desired grasp, in their method, the robot continuously updated its grasp strategy based on the most recent observations to optimize long-horizon grasp success.
To my knowledge this is still the best existing system for grasping from monocular RGB images, Levine said. But this sort of thing requires algorithms that are somewhat different from those that perform best in simulated video game settings: it requires algorithms that are adept at utilizing and reusing previously collected data, algorithms that can train large models that generalize, and algorithms that can support large-scale real-world data collection.
Levines reinforcement learning solution includes two key components: unsupervised/self-supervised learning and offline learning.
In his paper, Levine describes self-supervised reinforcement learning as a system that can learn behaviors that control the world in meaningful ways and provides some mechanism to learn to control [the world] in as many ways as possible.
Basically, this means that instead of being optimized for a single goal, the RL agent should be able to achieve many different goals by computing counterfactuals, learning causal models, and obtaining a deep understanding of how actions affect its environment in the long term.
However, creating self-supervised RL models that can solve various goals would still require a massive amount of experience. To address this challenge, Levine proposes offline reinforcement learning, which makes it possible for models to continue learning from previously collected data without the need for continued online experience.
Offline RL can make it possible to apply self-supervised or unsupervised RL methods even in settings where online collection is infeasible, and such methods can serve as one of the most powerful tools for incorporating large and diverse datasets into self-supervised RL, he writes.
The combination of self-supervised and offline RL can help create agents that can create building blocks for learning new tasks and continue learning with little need for new data.
This is very similar to how we learn in the real world. For example, when you want to learn basketball, you use basic skills you learned in the past such as walking, running, jumping, handling objects, etc. You use these capabilities to develop new skills such as dribbling, crossovers, jump shots, free throws, layups, straight and bounce passes, eurosteps, dunks (if youre tall enough), etc. These skills build on each other and help you reach the bigger goal, which is to outscore your opponent. At the same time, you can learn from offline data by reflecting on your past experience and thinking about counterfactuals (e.g., what would have happened if you passed to an open teammate instead of taking a contested shot). You can also learn by processing other data such as videos of yourself and your opponents. In fact, on-court experience is just part of your continuous learning.
Ina paper, Yevgen Chetobar, one of Levines colleagues, shows how self-supervised offline RL can learn policies for fairly general robotic manipulation skills, directly reusing data that they had collected for another project.
This system was able to reach a variety of user-specified goals, and also act as a general-purpose pretraining procedure (a kind of BERT for robotics) for other kinds of tasks specified with conventional reward functions, Levine said.
One of the great benefits of offline and self-supervised RL is learning from real-world data instead of simulated environments.
Basically, it comes down to this question: is it easier to create a brain, or is it easier to create the universe? I think its easier to create a brain, because it is part of the universe, he said.
This is, in fact, one of the great challenges engineers face when creating simulated environments. For example, Levine says, effective simulation for autonomous driving requires simulating other drivers, which requires having an autonomous driving system, which requires simulating other drivers, which requires having an autonomous driving system, etc.
Ultimately, learning from real data will be more effective because it will simply be much easier and more scalable, just as weve seen in supervised learning domains in computer vision and NLP, where no one worries about using simulation, he said. My perspective is that we should figure out how to do RL in a scalable and general-purpose way using real data, and this will spare us from having to expend inordinate amounts of effort building simulators.
See the article here:
Reinforcement learning for the real world - TechTalks
- Between rain and snow, machine learning finds nine precipitation types - Phys.org - October 9th, 2025 [October 9th, 2025]
- Between rain and snow, machine learning finds 9 precipitation types - Michigan Engineering News - October 9th, 2025 [October 9th, 2025]
- Machine learning optimizes nanoparticle design for drug delivery to the brain - Physics World - October 9th, 2025 [October 9th, 2025]
- Development and validation of a machine learning-based prediction model for prolonged length of stay after laparoscopic gastrointestinal surgery: a... - October 9th, 2025 [October 9th, 2025]
- G Sachs: Stock Mkt Not in Bubble Yet; Machine Learning/ AI Expected to Spawn New Wave of Superstars - AASTOCKS.com - October 9th, 2025 [October 9th, 2025]
- AI and Machine Learning - See.Sense works with City of Sydney to develop AI dashboard - Smart Cities World - October 9th, 2025 [October 9th, 2025]
- Machine Learning Used to Predict Live Birth Outcomes in Fresh Embryo Transfers - geneonline.com - October 9th, 2025 [October 9th, 2025]
- RIT researchers use machine learning to better understand the pathways of disease - Rochester Institute of Technology - October 7th, 2025 [October 7th, 2025]
- Leveraging machine learning to predict mosquito bed net utilization among women of reproductive age in sub-Saharan Africa - Malaria Journal - October 7th, 2025 [October 7th, 2025]
- Machine learning-based radiomics using magnetic resonance images for prediction of clinical complete response to neoadjuvant chemotherapy in patients... - October 7th, 2025 [October 7th, 2025]
- Machine Learning Self Driving Cars: The Technology Driving the Future of Mobility - SpeedwayMedia.com - October 7th, 2025 [October 7th, 2025]
- Investigating the relationship between blood factors and HDL-C levels in the bloodstream using machine learning methods - Journal of Health,... - October 7th, 2025 [October 7th, 2025]
- AI in the fast lane: F1 teams Alpine, Audi use machine learning as force multiplier - The Business Times - October 7th, 2025 [October 7th, 2025]
- Future Scope of Machine Learning in Healthcare Market Set to Witness Significant Growth by 2025-2032 - openPR.com - October 7th, 2025 [October 7th, 2025]
- AI and Machine Learning - AI readiness and adoption toolkit launched - Smart Cities World - October 4th, 2025 [October 4th, 2025]
- Machine Learning Model UmamiPredict Developed to Forecast Savory Taste of Molecules and Peptides - geneonline.com - October 4th, 2025 [October 4th, 2025]
- Machine Learning Boosts Crop Yield Predictions in Senegal - Bioengineer.org - October 4th, 2025 [October 4th, 2025]
- Machine learning-driven stability analysis of eco-friendly superhydrophobic graphene-based coatings on copper substrate - Nature - October 4th, 2025 [October 4th, 2025]
- Integrated machine learning analysis of proteomic and transcriptomic data identifies healing associated targets in diabetic wound repair - Nature - October 4th, 2025 [October 4th, 2025]
- Development and evaluation of a machine learning prediction model for short-term mortality in patients with diabetes or hyperglycemia at emergency... - October 4th, 2025 [October 4th, 2025]
- Fast and robust mixed gas identification and recognition using tree-based machine learning and sensor array response - Nature - October 4th, 2025 [October 4th, 2025]
- Estimation of sexual dimorphism of adult human mandibles of South Indian origin using non-metric parameters and machine learning classification... - October 4th, 2025 [October 4th, 2025]
- Cloud-Based Machine Learning Platforms Technologies Market Growth and Future Prospects - Precedence Research - October 4th, 2025 [October 4th, 2025]
- Machine Learning Framework Developed to Optimize Phosphorus Recovery in Hydrothermal Treatment of Livestock Manure - geneonline.com - October 4th, 2025 [October 4th, 2025]
- Unifying machine learning and interpolation theory via interpolating neural networks - Nature - October 2nd, 2025 [October 2nd, 2025]
- Anna: an open-source platform for real-time integration of machine learning classifiers with veterinary electronic health records - BMC Veterinary... - October 2nd, 2025 [October 2nd, 2025]
- The Future of Liver Health: Can Human Models and Machine Learning Reduce Disease Rates? - Technology Networks - October 2nd, 2025 [October 2nd, 2025]
- Machine Learning Radiomics Predicts Pancreatic Cancer Invasion - Bioengineer.org - October 2nd, 2025 [October 2nd, 2025]
- Next-generation COVID-19 detection using a metasurface biosensor with machine learning-enhanced refractive index sensing - Nature - October 2nd, 2025 [October 2nd, 2025]
- Machine learning-based models for screening of anemia and leukemia using features of complete blood count reports - Nature - October 2nd, 2025 [October 2nd, 2025]
- Estimating the peak age of chess players through statistical and machine learning techniques - Nature - October 2nd, 2025 [October 2nd, 2025]
- Optimizing water quality index using machine learning: a six-year comparative study in riverine and reservoir systems - Nature - October 2nd, 2025 [October 2nd, 2025]
- Physics-informed machine learning-based real-time long-horizon temperature fields prediction in metallic additive manufacturing - Nature - October 2nd, 2025 [October 2nd, 2025]
- The Silicon Revolution: How AI and Machine Learning Are Forging the Future of Semiconductor Manufacturing - FinancialContent - October 2nd, 2025 [October 2nd, 2025]
- Machine learning model for differentiating Pneumocystis jirovecii pneumonia from colonization and analyzing mortality risk in non-HIV patients using... - October 2nd, 2025 [October 2nd, 2025]
- Radiomics and Machine Learning Applied to CECT Scans Show Potential in Predicting Perineural Invasion in Pancreatic Cancer - geneonline.com - October 2nd, 2025 [October 2nd, 2025]
- Machine learning and response surface optimization to enhance diesel engine performance using milk scum biodiesel with alumina nanoparticles - Nature - October 2nd, 2025 [October 2nd, 2025]
- Landmark Patent Appeal Decision Strengthens Protection for AI and Machine Learning Innovations - The National Law Review - October 2nd, 2025 [October 2nd, 2025]
- Machine learning researchers and industry leaders gathering at Santa Clara University - Stories - News & Events - Santa Clara University - September 30th, 2025 [September 30th, 2025]
- Building better batteries with amorphous materials and machine learning - Tech Xplore - September 30th, 2025 [September 30th, 2025]
- Machine Learning-Supported Fragment Hit Expansion in Absence of X-Ray Structures - Evotec - September 30th, 2025 [September 30th, 2025]
- Machine learning model predicts which radiotherapy patients are most vulnerable to adverse side effects - Health Imaging - September 30th, 2025 [September 30th, 2025]
- How AI and Machine Learning Are Revolutionizing Laser Welding - Downbeach - September 30th, 2025 [September 30th, 2025]
- What if A.I. Doesnt Get Much Better Than This? - Machine Learning Week 2025 - September 30th, 2025 [September 30th, 2025]
- Sex estimation from the sternum in Turkish population using various machine learning methods and deep neural networks - SpringerOpen - September 30th, 2025 [September 30th, 2025]
- Predictive AI Must Be Valuated But Rarely Is. Heres How To Do It - Machine Learning Week 2025 - September 30th, 2025 [September 30th, 2025]
- Interpretable machine learning incorporating major lithology for regional landslide warning in northern and eastern Guangdong - Nature - September 28th, 2025 [September 28th, 2025]
- Building Machine Learning Application with Django - KDnuggets - September 28th, 2025 [September 28th, 2025]
- Evaluating the use of body mass index change as a proxy for anorexia nervosa recovery: a machine learning perspective - Journal of Eating Disorders - September 28th, 2025 [September 28th, 2025]
- Prediction of cutting parameters and reduction of output parameters using machine learning in milling of Inconel 718 alloy - Nature - September 28th, 2025 [September 28th, 2025]
- How AI and machine learning are changing both retail and online casino experiences - Retail Technology Innovation Hub - September 28th, 2025 [September 28th, 2025]
- Machine learning and cell imaging combine to predict effectiveness of multiple sclerosis medication - Medical Xpress - September 25th, 2025 [September 25th, 2025]
- IC combines machine learning and analogue inferencing - Electronics Weekly - September 25th, 2025 [September 25th, 2025]
- ODU Awarded $2.3M NIH Grant to Improve Detection of Brain Tumor Recurrence with AI and Machine Learning - Old Dominion University - September 25th, 2025 [September 25th, 2025]
- Development of a machine learning-based depression risk identification tool for older adults with asthma - BMC Psychiatry - September 25th, 2025 [September 25th, 2025]
- AI and Machine Learning Uses in Neuroscience Drug Discovery, Upcoming Webinar Hosted by Xtalks - PR Newswire - September 25th, 2025 [September 25th, 2025]
- Error-controlled non-additive interaction discovery in machine learning models - Nature - September 23rd, 2025 [September 23rd, 2025]
- AI, Machine Learning Will Drive Market Data Consumption - Markets Media - September 23rd, 2025 [September 23rd, 2025]
- Machine Learning Model May Optimize Treatment Selection and Survival in HCC - Targeted Oncology - September 23rd, 2025 [September 23rd, 2025]
- From pixels to pumps: Machine learning and satellite imagery help map irrigation - Phys.org - September 23rd, 2025 [September 23rd, 2025]
- CMU physicist challenges what we know about particle physics with machine learning - The Tartan - September 23rd, 2025 [September 23rd, 2025]
- Hire Python Developers to Leverage the Power of Machine Learning & AI - WebWire - September 23rd, 2025 [September 23rd, 2025]
- AI-Powered Biology Careers in 2025: Opportunities with Machine Learning Skills - BioTecNika - September 23rd, 2025 [September 23rd, 2025]
- Machine learning and predictingstock price movements on NGX - Businessamlive - September 23rd, 2025 [September 23rd, 2025]
- Building a Hybrid Rule-Based and Machine Learning Framework to Detect and Defend Against Jailbreak Prompts in LLM Systems - MarkTechPost - September 21st, 2025 [September 21st, 2025]
- Development of a novel machine learning-based adaptive resampling algorithm for nuclear data processing - Nature - September 19th, 2025 [September 19th, 2025]
- Autobot platform uses machine learning to rapidly find best ways to make advanced materials - Tech Xplore - September 19th, 2025 [September 19th, 2025]
- 5 Key Takeaways | The Law of the Machine (Learning): Solving Complex AI Challenges - JD Supra - September 17th, 2025 [September 17th, 2025]
- Spectral and Machine Learning Approach Enhances Efficiency of Grape Embryo Rescue | Newswise - Newswise - September 17th, 2025 [September 17th, 2025]
- Helpful Reminders for Patent Eligibility of AI, Machine Learning, and Other Software-Related Inventions - JD Supra - September 17th, 2025 [September 17th, 2025]
- Opening the black box of machine learning-controlled plasma treatments - AIP.ORG - September 17th, 2025 [September 17th, 2025]
- Post-compilation Circuit Scaling for Quantum Machine Learning Models Reveals Resource Trends and Topology Impacts - Quantum Zeitgeist - September 17th, 2025 [September 17th, 2025]
- Machine-learning tool gives doctors a more detailed 3D picture of fetal health - Medical Xpress - September 17th, 2025 [September 17th, 2025]
- Portable Electronic Nose with Machine Learning Enhances VOC Detection in Forensic Science - Chromatography Online - September 15th, 2025 [September 15th, 2025]
- Developing a predictive model for breast cancer detection using radiomics-based mammography and machine learning - SpringerOpen - September 13th, 2025 [September 13th, 2025]
- and correlation of drug solubility via hybrid machine learning and gradient based optimization - Nature - September 11th, 2025 [September 11th, 2025]
- Rice-Houston Methodist partnership uses machine learning to reveal hidden patient groups in common heart valve disease - Rice University - September 11th, 2025 [September 11th, 2025]
- Amazon Uses Machine Learning to Tell Sellers if FBA Is a Good Fit - EcommerceBytes - September 11th, 2025 [September 11th, 2025]
- Eli Lilly Launches AI, Machine Learning Platform Called TuneLab For Biotech Companies - Stocktwits - September 11th, 2025 [September 11th, 2025]
- How AI and Machine Learning are Shaping the Future of Mobile Apps - indiatechnologynews.in - September 11th, 2025 [September 11th, 2025]