Reinforcement learning: The next great AI tech moving from the lab to the real world – VentureBeat
Reinforcement learning (RL) is a powerful type of artificial intelligence technology that can be used to learn strategies to optimally control large, complex systems such as manufacturing plants, traffic control systems (road, rail, and air), financial portfolios, and robots. It is currently transitioning from research labs to highly impactful, real-world applications. For example, self-driving car companies like Wayve and Waymo are using reinforcement learning to develop the control systems for their cars.
AI systems that are typically used in industry perform pattern recognition to make a prediction. For instance, they may recognize patterns in images to detect faces (face detection), or recognize patterns in sales data to predict a change in demand (demand forecasting), and so on. Reinforcement learning methods, on the other hand, are used to make optimal decisions or take optimal actions in applications where there is a feedback loop. An example where both traditional AI methods and RL may be used, but for different purposes, will make the distinction clearer.
Say we are using AI to help operate a manufacturing plant. Pattern recognition may be used for quality assurance, where the AI system uses images and scans of the finished product to detect any imperfections or flaws. An RL system, on the other hand, would compute and execute the strategy for controlling the manufacturing process itself (by, for example, deciding which lines to run, controlling machines/robots, deciding which product to manufacture, and so on). The RL system will also try to ensure that the strategy is optimal in that it maximizes some metric of interest such as the output volume while maintaining a certain level of product quality. The problem of computing the optimal control strategy, which RL solves, is very difficult for some subtle reasons (often much more difficult than pattern recognition).
In computing the optimal strategy, or policy in RL parlance, the main challenge an RL learning algorithm faces is the so-called temporal credit assignment problem. That is, the impact of an action (e.g. run line 1 on Wednesday) in a given system state (e.g. current output level of machines, how busy each line is, etc.) on the overall performance (e.g. total output volume) is not known until after a potentially long time. To make matters worse, the overall performance also depends on all the actions that are taken subsequent to the action being evaluated. Together, this implies that, when a candidate policy is executed for evaluation, it is difficult to know which actions were the good ones and which were the bad ones. In other words, it is very difficult to assign credit to the different actions appropriately. The large number of potential system states in these complex problems further exacerbates the situation via the dreaded curse of dimensionality. A good way to get an intuition for how an RL system solves all these problems at the same time is by looking at the recent spectacular successes RL systems have had in the lab.
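To make the credit assignment problem concrete, here is a minimal sketch in Python. It is not taken from any system described in this article. It uses the discounted return, one standard device in RL for spreading a delayed reward back over earlier actions. The five-day reward sequence and the discount factor `gamma` are made-up values for illustration only.

```python
def discounted_returns(rewards, gamma):
    """Compute the return-to-go G_t = r_t + gamma * G_{t+1} for every step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical plant week: no feedback at all until total output is
# measured on day 5 (assumed numbers, not real data).
rewards = [0.0, 0.0, 0.0, 0.0, 100.0]
returns = discounted_returns(rewards, gamma=0.5)

for day, credit in enumerate(returns, start=1):
    print(f"day {day}: credit {credit}")
```

Every earlier day receives some share of the final reward, whether or not the action taken that day actually helped. This is why a single run says little about which individual actions were good, and why RL algorithms need many runs to separate them.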
Many of the recent, prominent demonstrations of the power of RL come from applying it to board games and video games. The first RL system to impress the global AI community was able to learn to outplay humans in different Atari games when only given as input the images on screen and the scores received by playing the game. This was created in 2013 by London-based AI research lab DeepMind (now part of Alphabet Inc.). The same lab later created a series of RL systems (or agents), starting with the AlphaGo agent, which were able to defeat the top players in the world in the board game Go. These impressive feats, which occurred between 2015 and 2017, took the world by storm because Go is a very complex game, with millions of fans and players around the world, that requires intricate, long-term strategic thinking involving both the local and global board configurations.
Subsequently, DeepMind and the AI research lab OpenAI have released systems for playing the video games StarCraft II and Dota 2 that can defeat the top human players around the world. These games are challenging because they require strategic thinking, resource management, and control and coordination of multiple entities within the game.
All the agents mentioned above were trained by letting the RL algorithm play the games many, many times (e.g. millions or more) and learning which policies work and which do not against different kinds of opponents and players. The large number of trials was possible because these were all games running on a computer. In determining the usefulness of various policies, the RL algorithm often employed a complex mix of ideas. These include hill climbing in policy space, playing against itself, running leagues internally amongst candidate policies, using policies used by humans as a starting point, and properly balancing exploration of the policy space against exploitation of the good policies found so far. Roughly speaking, the large number of trials enabled exploring many different game states that could plausibly be reached, while the complex evaluation methods enabled the AI system to determine which actions are useful in the long term, under plausible plays of the games, in these different states.
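The balance between exploration and exploitation can be illustrated with a much simpler problem than the games above. The sketch below uses an epsilon-greedy rule on a multi-armed bandit. It is a textbook illustration, not the method used by any of the agents mentioned. The arm payoffs in `true_means`, the exploration rate `epsilon`, and the number of steps are all assumed values.

```python
import random


def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Repeatedly pick one of several options ("arms") with noisy payoffs.

    With probability epsilon a random arm is tried (exploration);
    otherwise the arm with the best estimate so far is used (exploitation).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms       # how often each arm was tried
    estimates = [0.0] * n_arms  # running average payoff per arm

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], 1.0)  # noisy payoff
        counts[arm] += 1
        # Incremental update of the running average for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return counts, estimates


# Hypothetical payoffs: the third arm is the best on average.
counts, estimates = epsilon_greedy_bandit([0.0, 0.5, 2.0])
print("pulls per arm:", counts)
print("estimated payoffs:", [round(e, 2) for e in estimates])
```

Setting `epsilon` to 0 would risk locking onto whichever arm happened to look good early. Setting it to 1 would never use what has been learned. The systems described above face the same tension, but over entire policies rather than single choices.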
A key blocker in using these algorithms in the real world is that it is not possible to run millions of trials. Fortunately, a workaround immediately suggests itself: first, create a computer simulation of the application (a manufacturing plant simulation, a market simulation, etc.), then learn the optimal policy in the simulation using RL algorithms, and finally adapt the learned optimal policy to the real world by running it a few times and tweaking some parameters. Famously, in a very compelling 2019 demo, OpenAI showed the effectiveness of this approach by training a robot arm to solve the Rubik's Cube puzzle one-handed.
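The three-step workflow can be sketched with a toy example. Everything here is hypothetical: the "plant" is a single formula, the "policy" is a single setpoint, and a simple hill-climbing search stands in for a real RL algorithm. The point is only the shape of the process: many cheap trials in a simulator, then a handful of expensive trials on the real system.

```python
def plant_output(setpoint, optimum):
    """Hypothetical plant: output is highest when setpoint equals optimum."""
    return -(setpoint - optimum) ** 2


def hill_climb(evaluate, start, step, trials):
    """Keep the current setpoint unless a neighbouring one scores better."""
    best, best_score = start, evaluate(start)
    for _ in range(trials):
        for candidate in (best - step, best + step):
            score = evaluate(candidate)
            if score > best_score:
                best, best_score = candidate, score
    return best


SIM_OPTIMUM = 10.0   # what the (imperfect) simulator believes
REAL_OPTIMUM = 10.6  # the real plant is slightly different

# Steps 1 and 2: learn in simulation, where trials are cheap.
sim_policy = hill_climb(
    lambda s: plant_output(s, SIM_OPTIMUM), start=0.0, step=0.5, trials=1000
)

# Step 3: adapt on the real plant with only a few trials and small tweaks.
real_policy = hill_climb(
    lambda s: plant_output(s, REAL_OPTIMUM), start=sim_policy, step=0.2, trials=5
)

print("policy from simulation:", sim_policy)
print("policy after real-world tuning:", round(real_policy, 2))
```

Five real-world trials are enough here only because the simulator was already close to reality. If `REAL_OPTIMUM` were far from `SIM_OPTIMUM`, the few real trials could not close the gap.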
For this approach to work, your simulation has to represent the underlying problem with a high degree of accuracy. The problem you're trying to solve also has to be closed in a certain sense: there cannot be arbitrary or unseen external effects that may impact the performance of the system. For example, the OpenAI solution would not work if the simulated robot arm was too different from the real robot arm or if there were attempts to knock the Rubik's Cube out of the real robot arm (though it may naturally be, or be explicitly trained to be, robust to certain kinds of obstructions and interferences).
These limitations will sound acceptable to most people. However, in real applications it is tricky to properly circumscribe the competence of an RL system, and this can lead to unpleasant surprises. In our earlier manufacturing plant example, if a machine is replaced with one that is a lot faster or slower, it may change the plant dynamics enough that it becomes necessary to retrain the RL system. Again, this is not unreasonable for any automated controller, but stakeholders may have far loftier expectations from a system that is artificially intelligent, and such expectations will need to be managed.
Regardless, at this point in time, the future of reinforcement learning in the real world does seem very bright. There are many startups offering reinforcement learning products for controlling manufacturing robots (Covariant, Osaro, Luffy), managing production schedules (InstaDeep), enterprise decision making (Secondmind), logistics (Dorabot), circuit design (InstaDeep), controlling autonomous cars (Wayve, Waymo, Five AI), controlling drones (Amazon), running hedge funds (Piit.ai), and many other applications that are beyond the reach of AI systems based on pattern recognition.
Each of the Big Tech companies has made heavy investments in RL research; for example, Google acquired DeepMind for a reported £400 million (approx. $525 million) in 2014. So it is reasonable to assume that RL is either already in use internally at these companies or is in the pipeline, but they're keeping the details pretty quiet for competitive reasons.
We should expect to see some hiccups as promising applications for RL falter, but it will likely claim its place as a technology to reckon with in the near future.
M M Hassan Mahmud is a Senior AI and Machine Learning Technologist at Digital Catapult, with a background in machine learning within academia and industry.