Reinforcement learning: The next great AI tech moving from the lab to the real world – VentureBeat


Reinforcement learning (RL) is a powerful type of artificial intelligence technology that can be used to learn strategies to optimally control large, complex systems such as manufacturing plants, traffic control systems (road/train/aircraft), financial portfolios, and robots. It is currently transitioning from research labs to highly impactful, real-world applications. For example, self-driving car companies like Wayve and Waymo are using reinforcement learning to develop the control systems for their cars.

AI systems that are typically used in industry perform pattern recognition to make a prediction. For instance, they may recognize patterns in images to detect faces (face detection), or recognize patterns in sales data to predict a change in demand (demand forecasting), and so on. Reinforcement learning methods, on the other hand, are used to make optimal decisions or take optimal actions in applications where there is a feedback loop. An example where both traditional AI methods and RL may be used, but for different purposes, will make the distinction clearer.

Say we are using AI to help operate a manufacturing plant. Pattern recognition may be used for quality assurance, where the AI system uses images and scans of the finished product to detect any imperfections or flaws. An RL system, on the other hand, would compute and execute the strategy for controlling the manufacturing process itself (for example, by deciding which lines to run, controlling machines and robots, and choosing which product to manufacture). The RL system will also try to ensure that the strategy is optimal, in the sense that it maximizes some metric of interest, such as output volume, while maintaining a certain level of product quality. Computing this optimal control strategy, which is the problem RL solves, is very difficult for some subtle reasons, and often much more difficult than pattern recognition.
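The feedback loop that separates RL from pattern recognition can be sketched in a few lines of Python. `ToyPlant` below is a hypothetical, deliberately tiny stand-in for the plant: its state is a finished-goods inventory, its only action is whether to run the line, and the prices, costs, and random demand are illustrative assumptions rather than anything taken from a real plant.

```python
import random


class ToyPlant:
    """Hypothetical one-line plant; the state is the finished-goods inventory."""

    def __init__(self, capacity=5, seed=0):
        self.capacity = capacity
        self.rng = random.Random(seed)
        self.inventory = 0

    def reset(self):
        self.inventory = 0
        return self.inventory

    def step(self, run_line):
        """Apply one control decision and return (next_state, reward)."""
        running_cost = 1.0 if run_line else 0.0
        if run_line:
            self.inventory = min(self.capacity, self.inventory + 1)
        demand = self.rng.randint(0, 1)  # assumed: 0 or 1 units of demand per period
        sold = min(demand, self.inventory)
        self.inventory -= sold
        # Assumed metric: revenue minus running cost minus a holding cost on leftover stock.
        reward = 3.0 * sold - running_cost - 0.5 * self.inventory
        return self.inventory, reward


def run_episode(env, policy, steps=20):
    """Closed feedback loop: observe the state, act, receive a reward, repeat."""
    state = env.reset()
    total = 0.0
    for _ in range(steps):
        action = policy(state)            # the controller decides...
        state, reward = env.step(action)  # ...the plant responds...
        total += reward                   # ...and the outcome feeds back
    return total
```

Each action changes the state the controller sees next, which is why a prediction-only model is not enough here. A call such as `run_episode(ToyPlant(seed=1), lambda s: s < 2)` scores one candidate policy; an RL algorithm's job is to search for the policy that makes this total as large as possible.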

In computing the optimal strategy, or policy in RL parlance, the main challenge an RL algorithm faces is the so-called temporal credit assignment problem. That is, the impact of an action (e.g. run line 1 on Wednesday) in a given system state (e.g. the current output level of the machines, how busy each line is, etc.) on the overall performance (e.g. total output volume) may not be known until much later. To make matters worse, the overall performance also depends on all the actions taken after the action being evaluated. Together, this implies that when a candidate policy is executed for evaluation, it is difficult to know which actions were the good ones and which were the bad ones; in other words, it is very difficult to assign credit to the different actions appropriately. The large number of potential system states in these complex problems further exacerbates the situation via the dreaded curse of dimensionality. A good way to build intuition for how an RL system tackles all these problems at once is to look at the spectacular successes RL has recently had in the lab.
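One standard way to attack temporal credit assignment is to credit each action with the discounted sum of the rewards that followed it, G_t = r_t + gamma * G_{t+1}, and to average that over many executions (a Monte Carlo estimate). The sketch below shows the idea; the discount factor `gamma`, the `(state, action, reward)` trajectory format, and the weekday states and action labels are illustrative assumptions, not details from the article.

```python
from collections import defaultdict


def discounted_returns(rewards, gamma=0.9):
    """Return G_t = r_t + gamma * G_{t+1} for every time step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def monte_carlo_action_values(episodes, gamma=0.9):
    """Average, over episodes, the return that followed each (state, action) pair.

    Each episode is a list of (state, action, reward) tuples.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for episode in episodes:
        rewards = [reward for _, _, reward in episode]
        for (state, action, _), g in zip(episode, discounted_returns(rewards, gamma)):
            totals[(state, action)] += g
            counts[(state, action)] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Two hypothetical executions (states simplified to weekdays); the payoff only shows up on Friday.
episodes = [
    [("wed", "run_line_1", 0), ("thu", "idle", 0), ("fri", "ship", 10)],
    [("wed", "idle", 0), ("thu", "idle", 0), ("fri", "ship", 2)],
]
values = monte_carlo_action_values(episodes, gamma=0.5)
print(values[("wed", "run_line_1")], values[("wed", "idle")])  # 2.5 0.5
```

Here the Wednesday decision to run line 1 earns credit for a reward that only arrived on Friday. With realistic noise, many executions are needed before good and bad actions separate, and the number needed grows quickly with the number of states, which is where the curse of dimensionality bites.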

Many of the recent, prominent demonstrations of the power of RL come from applying it to board games and video games. The first RL system to impress the global AI community learned to outplay humans at several Atari games when given as input only the on-screen images and the scores received while playing. It was created in 2013 by the London-based AI research lab DeepMind (now part of Alphabet Inc.). The same lab later created a series of RL systems (or agents), starting with the AlphaGo agent, which were able to defeat the top players in the world at the board game Go. These impressive feats, which occurred between 2015 and 2017, took the world by storm because Go, which has millions of fans and players around the world, is a very complex game that requires intricate, long-term strategic thinking involving both the local and global board configurations.

Subsequently, DeepMind and the AI research lab OpenAI released systems for playing the video games StarCraft and Dota 2 that can defeat the top human players around the world. These games are challenging because they require strategic thinking, resource management, and the control and coordination of multiple entities within the game.

All the agents mentioned above were trained by letting the RL algorithm play the games many, many times (e.g. millions of times or more) and learning which policies work and which do not against different kinds of opponents and players. This large number of trials was possible because these were all games running on a computer. To determine the usefulness of various policies, the RL algorithms often employed a complex mix of ideas. These include hill climbing in policy space, playing against themselves (self-play), running internal leagues among candidate policies, using policies played by humans as a starting point, and properly balancing exploration of the policy space against exploitation of the good policies found so far. Roughly speaking, the large number of trials enabled the exploration of many different game states that could plausibly be reached, while the complex evaluation methods enabled the AI system to determine which actions are useful in the long term, under plausible plays of the games, in these different states.
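The specialised techniques listed above (self-play, leagues, human-policy initialisation) are well beyond a short example. As a swapped-in, much simpler illustration of learning from many cheap trials while balancing exploration against exploitation, the sketch below runs tabular Q-learning with epsilon-greedy exploration on a hypothetical six-state "chain" where a reward arrives only at the far end. The environment and all hyperparameters (`alpha`, `gamma`, `epsilon`, episode counts, `seed`) are illustrative assumptions.

```python
import random


def q_learning_chain(n_states=6, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, max_steps=50, seed=0):
    """Tabular Q-learning on a hypothetical chain: reward 1 only on reaching the last state."""
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]; 0 = left, 1 = right

    for _ in range(episodes):  # many cheap trials, as with simulated games
        state = 0
        for _ in range(max_steps):
            # Explore with probability epsilon (and break ties at random); otherwise exploit.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            done = next_state == goal
            reward = 1.0 if done else 0.0
            # Bootstrapped target: immediate reward plus discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each state (ties go to 'right')."""
    return [0 if left > right else 1 for left, right in q]


q = q_learning_chain()
print(greedy_policy(q)[:-1])  # greedy action in each non-goal state (1 = move right)
```

With `epsilon=0.1`, roughly one decision in ten is made at random, which is what lets the agent stumble onto the distant reward in the first place; the remaining decisions exploit the current estimates. After enough episodes the greedy policy should prefer "right" in every non-goal state, even though no intermediate step is ever rewarded directly.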

A key blocker in using these algorithms in the real world is that it is not possible to run millions of trials. Fortunately, a workaround immediately suggests itself: first, create a computer simulation of the application (of a manufacturing plant, a market, etc.); then learn the optimal policy in the simulation using RL algorithms; and finally adapt the learned policy to the real world by running it a few times and tweaking some parameters. Famously, in a very compelling 2019 demo, OpenAI showed the effectiveness of this approach by training a robot arm to solve a Rubik's Cube one-handed.
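The simulate-then-adapt workflow can be sketched on a toy problem. Everything below is a hypothetical assumption for illustration: a one-parameter proportional controller (`gain`) that fills a buffer to a target level, a simulator that assumes `machine_rate=1.0`, and a second simulator with `machine_rate=0.8` standing in for the "real" plant. Learning in simulation borrows the hill-climbing-in-policy-space idea mentioned above; this is not how OpenAI's system was built.

```python
def rollout_cost(gain, machine_rate, target=10.0, steps=30):
    """Hypothetical buffer-level control task; returns the summed squared tracking error."""
    level, cost = 0.0, 0.0
    for _ in range(steps):
        feed = gain * (target - level)  # policy: proportional controller with one parameter
        level += machine_rate * feed    # dynamics depend on how fast the machine is
        cost += (target - level) ** 2
    return cost


def hill_climb(cost_fn, start, step=0.1, iters=200):
    """Local search in a one-dimensional policy space, halving the step when stuck."""
    best, best_cost = start, cost_fn(start)
    for _ in range(iters):
        improved = False
        for candidate in (best - step, best + step):
            c = cost_fn(candidate)
            if c < best_cost:
                best, best_cost, improved = candidate, c, True
        if not improved:
            step /= 2
    return best


# 1) Learn in simulation, where trials are cheap (the simulator assumes machine_rate=1.0).
sim_gain = hill_climb(lambda g: rollout_cost(g, machine_rate=1.0), start=0.0)

# 2) Adapt with only a handful of "real" runs. A second simulator with machine_rate=0.8
#    stands in for the real plant here, i.e. a machine slower than the simulator assumed.
real_rate = 0.8
candidates = [sim_gain * s for s in (0.8, 0.9, 1.0, 1.1, 1.2)]
real_gain = min(candidates, key=lambda g: rollout_cost(g, real_rate))
print(round(sim_gain, 3), round(real_gain, 3))  # 1.0 1.2
```

Because the "real" machine is slower than the simulator assumed, the simulation-optimal gain undershoots, and five real runs are enough to nudge it upward. Light tweaking like this only works when the mismatch is small and the real system behaves in ways the simulator can represent.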

For this approach to work, your simulation has to represent the underlying problem with a high degree of accuracy. The problem you're trying to solve also has to be closed in a certain sense: there cannot be arbitrary or unseen external effects that may impact the performance of the system. For example, the OpenAI solution would not work if the simulated robot arm were too different from the real robot arm, or if there were attempts to knock the Rubik's Cube out of the real robot arm (though the system may be naturally robust, or explicitly trained to be robust, to certain kinds of obstructions and interferences).

These limitations will sound acceptable to most people. However, in real applications it is tricky to properly circumscribe the competence of an RL system, and this can lead to unpleasant surprises. In our earlier manufacturing plant example, if a machine is replaced with one that is a lot faster or slower, it may change the plant dynamics enough that the RL system has to be retrained. Again, this is not unreasonable for any automated controller, but stakeholders may have far loftier expectations of a system that is artificially intelligent, and such expectations will need to be managed.

Regardless, at this point in time, the future of reinforcement learning in the real world does seem very bright. Many companies, from startups to tech giants, offer reinforcement learning products for controlling manufacturing robots (Covariant, Osaro, Luffy), managing production schedules (InstaDeep), enterprise decision making (Secondmind), logistics (Dorabot), circuit design (InstaDeep), controlling autonomous cars (Wayve, Waymo, Five AI), controlling drones (Amazon), running hedge funds (Piit.ai), and many other applications that are beyond the reach of AI systems based on pattern recognition.

Each of the Big Tech companies has made heavy investments in RL research; Google, for example, acquired DeepMind for a reported £400 million (approx. $525 million) in 2014. So it is reasonable to assume that RL is either already in use internally at these companies or is in the pipeline, but they're keeping the details pretty quiet for competitive advantage reasons.

We should expect to see some hiccups as promising applications for RL falter, but it will likely claim its place as a technology to reckon with in the near future.

M M Hassan Mahmud is a Senior AI and Machine Learning Technologist at Digital Catapult, with a background in machine learning within academia and industry.

April 2nd, 2021
