Reinforcement learning: The next great AI tech moving from the lab to the real world – VentureBeat
Join Transform 2021 for the most important themes in enterprise AI & Data. Learn more.
Reinforcement learning (RL) is a powerful type of artificial intelligence technology that can be used to learn strategies to optimally control large, complex systems such as manufacturing plants, traffic control systems (road/train/aircraft), financial portfolios, robots, etc. It is currently transitioning from research labs to highly impactful, real world applications. For example, self-driving car companies like Wayveand Waymoare using reinforcement learning to develop the control systems for their cars.
AI systems that are typically used in industry perform pattern recognition to make a prediction. For instance, they may recognize patterns in images to detect faces (face detection), or recognize patterns in sales data to predict a change in demand (demand forecasting), and so on. Reinforcement learning methods, on the other hand, are used to make optimal decisions or take optimal actions in applications where there is a feedback loop. An example where both traditional AI methods and RL may be used, but for different purposes, will make the distinction clearer.
Say we are using AI to help operate a manufacturing plant. Pattern recognition may be used for quality assurance, where the AI system uses images and scans of the finished product to detect any imperfections or flaws. An RL system, on the other hand, would compute and execute the strategy for controlling the manufacturing process itself (by, for example, deciding which lines to run, controlling machines/robots, deciding which product to manufacture, and so on). The RL system will also try to ensure that the strategy is optimal in that it maximizes some metric of interest such as the output volume while maintaining a certain level of product quality. The problem of computing the optimal control strategy, which RL solves, is very difficult for some subtle reasons (often much more difficult than pattern recognition).
In computing the optimal strategy, or policy in RL parlance, the main challenge an RL learning algorithm faces is the so-called temporal credit assignment problem. That is, the impact of an action (e.g. run line 1 on Wednesday) in a given system state (e.g. current output level of machines, how busy each line is, etc.) on the overall performance (e.g. total output volume) is not known until after (potentially) a long time. To make matters worse, the overall performance also depends on all the actions that are taken subsequent to the action being evaluated. Together, this implies that, when a candidate policy is executed for evaluation, it is difficult to know which actions were the good ones and which were the bad ones in other words, it is very difficult to assign credit to the different actions appropriately. The large number of potential system states in these complex problems further exacerbates the situation via the dreaded curse of dimensionality. A good way to get an intuition for how an RL system solves all these problems at the same time is by looking at the recent spectacular successes they have had in the lab.
Many of the recent, prominent demonstrations of the power of RL come from applying them to board games and video games. The first RL system to impress the global AI community was able to learn to outplay humans in different Atari games when only given as input the images on screen and the scores received by playing the game. This was created in 2013 by London-based AI research lab Deepmind (now part of Alphabet Inc.). The same lab later created a series of RL systems (or agents), starting with the AlphaGo agent, which were able to defeat the top players in the world in the board game Go. These impressive feats, which occurred between 2015 and 2017, took the world by storm because Go is a very complex game, with millions of fans and players around the world, that requires intricate, long-term strategic thinking involving both the local and global board configurations.
Subsequently, Deepmind and the AI research lab OpenAI have released systems for playing the video games Starcraft and DOTA 2 that can defeat the top human players around the world. These games are challenging because they require strategic thinking, resource management, and control and coordination of multiple entities within the game.
All the agents mentioned above were trained by letting the RL algorithm play the games many many times (e.g. millions or more) and learning which policies work and which do not against different kinds of opponents and players. The large number of trials were possible because these were all games running on a computer. In determining the usefulness of various policies, the RL algorithm often employed a complex mix of ideas. These include hill climbing in policy space, playing against itself, running leagues internally amongst candidate policies or using policies used by humans as a starting point and properly balancing exploration of the policy space vs. exploiting the good policies found so far. Roughly speaking, the large number of trials enabled exploring many different game states that could plausibly be reached, while the complex evaluation methods enabled the AI system to determine which actions are useful in the long term, under plausible plays of the games, in these different states.
A key blocker in using these algorithms in the real world is that it is not possible to run millions of trials. Fortunately, a workaround immediately suggests itself: First, create a computer simulation of the application (a manufacturing plant simulation, or market simulation etc.), then learn the optimal policy in the simulation using RL algorithms, and finally adapt the learned optimal policy to the real world by running it a few times and tweaking some parameters. Famously, in a very compelling 2019 demo, OpenAI showed the effectiveness of this approach by training a robot arm to solve the Rubiks cube puzzle one-handed.
For this approach to work, your simulation has to represent the underlying problem with a high degree of accuracy. The problem youre trying to solve also has to be closed in a certain sense there cannot be arbitrary or unseen external effects that may impact the performance of the system. For example, the OpenAI solution would not work if the simulated robot arm was too different from the real robot arm or if there were attempts to knock the Rubiks cube out of the real robot arm (though it may naturally be or be explicitly trained to be robust to certain kinds of obstructions and interferences).
These limitations will sound acceptable to most people. However, in real applications it is tricky to properly circumscribe the competence of an RL system, and this can lead to unpleasant surprises. In our earlier manufacturing plant example, if a machine is replaced with one that is a lot faster or slower, it may change the plant dynamics enough that it becomes necessary to retrain the RL system. Again, this is not unreasonable for any automated controller, but stakeholders may have far loftier expectations from a system that is artificially intelligent, and such expectations will need to be managed.
Regardless, at this point in time, the future of reinforcement learning in the real world does seem very bright. There are many startups offering reinforcement learning products for controlling manufacturing robots (Covariant, Osaro, Luffy), managing production schedules (Instadeep), enterprise decision making (Secondmind), logistics (Dorabot), circuit design (Instadeep), controlling autonomous cars (Wayve, Waymo, Five AI), controlling drones (Amazon), running hedge funds (Piit.ai), and many other applications that are beyond the reach of pattern recognition based AI systems.
Each of the Big Tech companies has made heavy investments in RL research e.g. Google acquiring Deepmind for a reported 400 million (approx $525 million) in 2015. So it is reasonable to assume that RL is either already in use internally at these companies or is in the pipeline; but theyre keeping the details pretty quiet for competitive advantage reasons.
We should expect to see some hiccups as promising applications for RL falter, but it will likely claim its place as a technology to reckon with in the near future.
M M Hassan Mahmud is a Senior AI and Machine Learning Technologist at Digital Catapult, with a background in machine learning within academia and industry.
Original post:
Reinforcement learning: The next great AI tech moving from the lab to the real world - VentureBeat
- Google's AlphaGo retires from competition after beating world number one 3 - 0 - HardwareZone Singapore - June 29th, 2025 [June 29th, 2025]
- Google's AlphaGo AI just beat the number one ranked Go player in the world - HardwareZone Singapore - June 29th, 2025 [June 29th, 2025]
- It was November 2015. There were two world competitions. It was four months before AlphaGo, made by - - June 22nd, 2025 [June 22nd, 2025]
- The rise of Generative AI: from AlphaGo to ChatGPT - imd.org - June 1st, 2025 [June 1st, 2025]
- With the effect of Lee Se-dol, a former Go player who beat AlphaGo, "Devils Plan 2" became the secon.. - - May 14th, 2025 [May 14th, 2025]
- Chinese teams AI paper paved the way for ChatGPT. Greater glory awaits by 2030 - South China Morning Post - April 21st, 2025 [April 21st, 2025]
- AI scholars win Turing Prize for technique that made possible AlphaGo's chess triumph - ZDNet - March 9th, 2025 [March 9th, 2025]
- The evolution of AI: From AlphaGo to AI agents, physical AI, and beyond - MIT Technology Review - March 1st, 2025 [March 1st, 2025]
- AlphaGo led Lee 4-1 in March 2016. One round Lee Se-dol won remains the last round in which a man be.. - - December 5th, 2024 [December 5th, 2024]
- Koreans picked Google Artificial Intelligence (AI) AlphaGo as an image that comes to mind when they .. - MK - - March 16th, 2024 [March 16th, 2024]
- DeepMind AI rivals the world's smartest high schoolers at geometry - Ars Technica - January 20th, 2024 [January 20th, 2024]
- Why top AI talent is leaving Google's DeepMind - Sifted - November 20th, 2023 [November 20th, 2023]
- Who Is Ilya Sutskever, Meet The Man Who Fired Sam Altman - Dataconomy - November 20th, 2023 [November 20th, 2023]
- Microsoft's LLM 'Everything Of Thought' Method Improves AI ... - AiThority - November 20th, 2023 [November 20th, 2023]
- Absolutely, here's an article on the impact of upcoming technology - Medium - November 20th, 2023 [November 20th, 2023]
- AI: Elon Musk and xAI | Formtek Blog - Formtek Blog - November 20th, 2023 [November 20th, 2023]
- Rise of the Machines Exploring the Fascinating Landscape of ... - TechiExpert.com - November 20th, 2023 [November 20th, 2023]
- What can the current EU AI approach do to overcome the challenges ... - Modern Diplomacy - November 20th, 2023 [November 20th, 2023]
- If I had to pick one AI tool... this would be it. - Exponential View - November 20th, 2023 [November 20th, 2023]
- For the first time, AI produces better weather predictions -- and it's ... - ZME Science - November 20th, 2023 [November 20th, 2023]
- Understanding the World of Artificial Intelligence: A Comprehensive ... - Medium - October 17th, 2023 [October 17th, 2023]
- On AI and the soul-stirring char siu rice - asianews.network - October 17th, 2023 [October 17th, 2023]
- Nvidias Text-to-3D AI Tool Debuts While Its Hardware Business Hits Regulatory Headwinds - Decrypt - October 17th, 2023 [October 17th, 2023]
- One step closer to the Matrix: AI defeats human champion in Street ... - TechRadar - October 17th, 2023 [October 17th, 2023]
- The Vanishing Frontier - The American Conservative - October 17th, 2023 [October 17th, 2023]
- Alphabet: The complete guide to Google's parent company - Android Police - October 17th, 2023 [October 17th, 2023]
- How AI and ML Can Drive Sustainable Revenue Growth by Waleed ... - Digital Journal - October 9th, 2023 [October 9th, 2023]
- The better the AI gets, the harder it is to ignore - BSA bureau - October 9th, 2023 [October 9th, 2023]
- What If the Robots Were Very Nice While They Took Over the World? - WIRED - September 27th, 2023 [September 27th, 2023]
- From Draughts to DeepMind (Scary Smart) | by Sud Alogu | Aug, 2023 - Medium - August 5th, 2023 [August 5th, 2023]
- The Future of Competitive Gaming: AI Game Playing AI - Fagen wasanni - August 5th, 2023 [August 5th, 2023]
- AI's Transformative Impact on Industries - Fagen wasanni - August 5th, 2023 [August 5th, 2023]
- Analyzing the impact of AI in anesthesiology - INDIAai - August 5th, 2023 [August 5th, 2023]
- Economic potential of generative AI - McKinsey - June 20th, 2023 [June 20th, 2023]
- The Intersection of Reinforcement Learning and Deep Learning - CityLife - June 20th, 2023 [June 20th, 2023]
- Chinese AI Giant SenseTime Unveils USD559 Robot That Can Play ... - Yicai Global - June 20th, 2023 [June 20th, 2023]
- Cyber attacks on AI a problem for the future - Verdict - June 20th, 2023 [June 20th, 2023]
- Taming AI to the benefit of humans - Asia News NetworkAsia News ... - asianews.network - May 20th, 2023 [May 20th, 2023]
- Evolutionary reinforcement learning promises further advances in ... - EurekAlert - May 20th, 2023 [May 20th, 2023]
- Commentary: AI's successes - and problems - stem from our own ... - CNA - May 20th, 2023 [May 20th, 2023]
- Machine anxiety: How to reduce confusion and fear about AI technology - Thaiger - May 20th, 2023 [May 20th, 2023]
- We need more than ChatGPT to have true AI. It is merely the first ingredient in a complex recipe - Freethink - May 20th, 2023 [May 20th, 2023]
- Taming AI to the benefit of humans - Opinion - Chinadaily.com.cn - China Daily - May 16th, 2023 [May 16th, 2023]
- To understand AI's problems look at the shortcuts taken to create it - EastMojo - May 16th, 2023 [May 16th, 2023]
- Terence Tao Leads White House's Generative AI Working Group ... - Pandaily - May 16th, 2023 [May 16th, 2023]
- Why we should be concerned about advanced AI - Epigram - May 16th, 2023 [May 16th, 2023]
- Purdue President Chiang to grads: Let Boilermakers lead in ... - Purdue University - May 16th, 2023 [May 16th, 2023]
- 12 shots at staying ahead of AI in the workplace - pharmaphorum - May 16th, 2023 [May 16th, 2023]
- Hypotheses and Visions for an Intelligent World - Huawei - May 16th, 2023 [May 16th, 2023]
- Cloud storage is the key to unlocking AI's full potential for businesses - TechRadar - May 16th, 2023 [May 16th, 2023]
- The Quantum Frontier: Disrupting AI and Igniting a Patent Race - Lexology - April 19th, 2023 [April 19th, 2023]
- Putin and Xi seek to weaponize Artificial Intelligence against America - FOX Bangor/ABC 7 News and Stories - April 19th, 2023 [April 19th, 2023]
- The Future of Generative Large Language Models and Potential ... - JD Supra - April 19th, 2023 [April 19th, 2023]
- A Chatbot Beat the SAT. What Now? - The Atlantic - March 23rd, 2023 [March 23rd, 2023]
- Exclusive: See the cover for Benjamn Labatut's new novel, The ... - Literary Hub - March 23rd, 2023 [March 23rd, 2023]
- These companies are creating ChatGPT alternatives - Tech Monitor - March 23rd, 2023 [March 23rd, 2023]
- Google's AlphaGo AI Beats Human Go Champion | PCMag - February 24th, 2023 [February 24th, 2023]
- AlphaGo: using machine learning to master the ancient game of Go - Google - February 10th, 2023 [February 10th, 2023]
- AI Behind AlphaGo: Machine Learning and Neural Network - February 10th, 2023 [February 10th, 2023]
- Google AlphaGo: How a recreational program will change the world - February 10th, 2023 [February 10th, 2023]
- Computer Go - Wikipedia - November 22nd, 2022 [November 22nd, 2022]
- AvataGo's Metaverse AR Environment will be Your Eternal Friend - Digital Journal - September 17th, 2022 [September 17th, 2022]
- This AI-Generated Artwork Won 1st Place At Fine Arts Contest And Enraged Artists - Bored Panda - September 3rd, 2022 [September 3rd, 2022]
- The best performing from AI in blockchain games, a new DRL model published by rct AI based on training AI in Axie Infinity, AI surpasses the real... - September 3rd, 2022 [September 3rd, 2022]
- Three Methods Researchers Use To Understand AI Decisions - RTInsights - August 20th, 2022 [August 20th, 2022]
- What is my chatbot thinking? Nothing. Here's why the Google sentient bot debate is flawed - Diginomica - August 7th, 2022 [August 7th, 2022]
- Opinion: Can AI be creative? - Los Angeles Times - August 2nd, 2022 [August 2nd, 2022]
- AI predicts the structure of all known proteins and opens a new universe for science - EL PAS USA - August 2nd, 2022 [August 2nd, 2022]
- What is Ethereum Gray Glacier? Should you be worried? - Cryptopolitan - June 24th, 2022 [June 24th, 2022]
- How AI and human intelligence will beat cancer - VentureBeat - June 19th, 2022 [June 19th, 2022]
- Race-by-race tips and preview for Newcastle on Monday - Sydney Morning Herald - June 19th, 2022 [June 19th, 2022]
- A gentle introduction to model-free and model-based reinforcement learning - TechTalks - June 13th, 2022 [June 13th, 2022]
- The role of 'God' in the 'Matrix' - Analytics India Magazine - June 3rd, 2022 [June 3rd, 2022]
- The Powerful New AI Hardware of the Future - CDOTrends - June 3rd, 2022 [June 3rd, 2022]
- The 50 Best Documentaries of All Time 24/7 Wall St. - 24/7 Wall St. - June 3rd, 2022 [June 3rd, 2022]
- How Could AI be used in the Online Casino Industry - Rebellion Research - April 12th, 2022 [April 12th, 2022]
- 5 Times Artificial Intelligence Have Busted World Champions - Analytics Insight - April 2nd, 2022 [April 2nd, 2022]
- The Guardian view on bridging human and machine learning: its all in the game - The Guardian - April 2nd, 2022 [April 2nd, 2022]
- How to Strengthen America's Artificial Intelligence Innovation - The National Interest - April 2nd, 2022 [April 2nd, 2022]
- Why it's time to address the ethical dilemmas of artificial intelligence - Economic Times - April 2nd, 2022 [April 2nd, 2022]