Archive for the ‘Alphago’ Category

Monte Carlo Tree Search Tutorial | DeepMind AlphaGo

Introduction

A best-of-five game series, $1 million in prize money, a high-stakes shootout. Between 9 and 15 March 2016, the second-highest-ranked Go player, Lee Sedol, took on a computer program named AlphaGo.

AlphaGo emphatically outplayed and outclassed Mr. Sedol, winning the series 4-1. Designed by Google's DeepMind, the program has spawned many other developments in AI, including AlphaGo Zero. These breakthroughs are widely considered stepping stones towards Artificial General Intelligence (AGI).

In this article, I will introduce you to the algorithm at the heart of AlphaGo: Monte Carlo Tree Search (MCTS). This algorithm has one main purpose: given the state of a game, choose the most promising move.

To give you some context behind AlphaGo, we'll first briefly look at the history of game-playing AI programs. Then we'll see the components of AlphaGo, the game tree concept, and a few tree search algorithms, and finally dive into how the MCTS algorithm works.

AI is a vast and complex field. But before AI officially became a recognized body of work, early pioneers in computer science wrote game-playing programs to test whether computers could solve tasks that require human-level intelligence.

To give you a sense of where game-playing AI started and its journey to date, I have put together the key historical developments below:

And this is just skimming the surface! There are plenty of other examples where AI programs exceeded expectations. But this should give you a fair idea of where we stand today.

The core components of AlphaGo comprise its policy networks, a value network, and Monte Carlo Tree Search (MCTS).

In this blog, we will focus only on the workings of Monte Carlo Tree Search. MCTS helps AlphaGo and AlphaGo Zero smartly explore and reach interesting/good states in a finite time period, which in turn helps the AI reach human-level performance.

Its application extends beyond games. MCTS can theoretically be applied to any domain that can be described in terms of {state, action} pairs where simulation can be used to forecast outcomes. Don't worry if this sounds too complex right now; we'll break down all these concepts in this article.

Game trees are the most well-known data structure for representing a game. The concept is actually pretty straightforward.

Each node of a game tree represents a particular state in the game. On performing a move, one makes a transition from a node to one of its children. The nomenclature is very similar to decision trees, wherein the terminal nodes are called leaf nodes.

For example, in a game tree for tic-tac-toe, each move from the root is equivalent to putting a cross at a different position. Each of these states then branches further as a zero is put at each remaining position to generate new states. This process goes on until a leaf node is reached, where the win-loss result becomes clear.
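To make the idea concrete, here is a minimal sketch of a game-tree node in Python. The field names and board representation are illustrative only, not taken from any particular library:

class GameTreeNode:
    """One game state; the children are the states reachable in a single move."""
    def __init__(self, board, player_to_move):
        self.board = board                    # e.g. a 3x3 grid of 'X', 'O' or None
        self.player_to_move = player_to_move  # whose turn it is in this state
        self.children = []                    # one child node per legal move

Building the full tree means recursively creating a child for every legal move until every branch ends in a decided game.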

Our primary objective in designing these algorithms is to find the best path to follow in order to win the game. In other words, we look for a way of traversing the tree that finds the best nodes to achieve victory.

The majority of AI problems can be cast as search problems, which can be solved by finding the best plan, path, model or function.

Tree search algorithms can be seen as building a search tree:

The tree branches out because there are typically several different actions that can be taken in a given state. Tree search algorithms differ depending on which branches are explored and in what order.

Let's discuss a few tree search algorithms.

Uninformed search algorithms, as the name suggests, search a state space without any further information about the goal. They are considered basic computer science algorithms rather than a part of AI. Two basic algorithms that fall under this type of search are depth-first search (DFS) and breadth-first search (BFS). You can read more about them in this blog post.

The best-first search method explores a graph by expanding the most promising node chosen according to a specific rule. Its defining characteristic is that, unlike DFS or BFS (which blindly examine/expand a node without knowing anything about it), best-first search uses an evaluation function (sometimes called a heuristic) to determine which node is the most promising, and then examines that node.

For example, the A* algorithm keeps a list of open nodes, which are adjacent to already-explored nodes but have not yet been explored themselves. For each open node, an estimate of its distance from the goal is made. New nodes to explore are chosen on a lowest-cost basis, where the cost is the distance from the origin node plus the estimate of the distance to the goal.
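As a rough illustration, here is a minimal A* sketch in Python. The neighbors and heuristic callables are hypothetical, supplied by whoever defines the search problem; they are not part of any specific library:

import heapq
import itertools

def a_star(start, goal, neighbors, heuristic):
    # neighbors(n) yields (next_node, step_cost) pairs; heuristic(n) estimates
    # the remaining distance to the goal. Cost = distance so far + estimate.
    counter = itertools.count()   # tie-breaker so the heap never compares nodes
    open_list = [(heuristic(start), next(counter), 0, start, [start])]
    visited = set()
    while open_list:
        f, _, g, node, path = heapq.heappop(open_list)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt, step_cost in neighbors(node):
            if nxt not in visited:
                new_g = g + step_cost
                heapq.heappush(open_list,
                               (new_g + heuristic(nxt), next(counter), new_g, nxt, path + [nxt]))
    return None

If the heuristic always returns zero, this collapses into a plain uniform-cost search, which is what makes the estimate the "informed" part of the method.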

For single-player games, simple uninformed or informed search algorithms can be used to find a path to the optimal game state. What should we do for two-player adversarial games, where there is another player to account for? The actions of both players depend on each other.

For these games, we rely on adversarial search. This includes the actions of two (or more) adversarial players. The basic adversarial search algorithm is called Minimax.

This algorithm has been used very successfully for playing classic perfect-information two-player board games such as Checkers and Chess. In fact, it was (re)invented specifically for the purpose of building a chess-playing program.

The core loop of the Minimax algorithm alternates between player 1 and player 2, quite like the white and black players in chess. These are called the min player and the max player. All possible moves are explored for each player.

For each resulting state, all possible moves by the other player are also explored. This goes on until all possible move combinations have been tried out to the point where the game ends (with a win, loss or draw). The entire game tree is generated through this process, from the root node down to the leaves:

Each node is explored to find the moves that give us the maximum value or score.
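A bare-bones sketch of that loop in Python might look like the following. It assumes a hypothetical state object exposing is_terminal(), score(), legal_actions() and apply(), used purely for illustration:

def minimax(state, maximizing):
    # Scores are taken from the max player's point of view: the max player
    # picks the move with the highest score, the min player the lowest.
    if state.is_terminal():
        return state.score()
    scores = [minimax(state.apply(action), not maximizing)
              for action in state.legal_actions()]
    return max(scores) if maximizing else min(scores)

Because every branch is expanded all the way to a terminal state, the cost grows quickly with the number of moves available at each state, which sets up the problem discussed next.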

Games like tic-tac-toe, checkers and chess can arguably be solved using the minimax algorithm. However, things get tricky when there are a large number of potential actions at each state, because minimax explores all of the available nodes. It becomes frighteningly difficult to solve a complex game like Go in a finite amount of time.

Go has a branching factor of approximately 300, i.e. from each state there are around 300 possible actions, whereas chess typically has around 30 actions to choose from. Further, the positional nature of Go, which is all about surrounding the adversary, makes it very hard to correctly estimate the value of a given board state. For more information on the rules of Go, please refer to this link.

There are several other games with complex rules that minimax is ill-equipped to solve, including imperfect-information games such as Battleship and Poker, and non-deterministic games such as Backgammon and Monopoly. Monte Carlo Tree Search, introduced around 2006, provides a possible solution.

The basic MCTS algorithm is simple: a search tree is built, node by node, according to the outcomes of simulated playouts. The process can be broken down into four steps: selection, expansion, simulation (rollout) and backpropagation.

Before we delve deeper and understand tree traversal and node expansion, let's get familiar with a few terms.

UCB Value

UCB1, the upper confidence bound for a node, is given by the following formula:

UCB1(i) = V_i + C * sqrt(ln(N) / n_i)

where V_i is the average reward (value) of node i observed so far, n_i is the number of times node i has been visited, N is the number of times its parent node has been visited, and C is an exploration constant (commonly 2 or sqrt(2)). The first term favours nodes that have performed well, while the second term favours nodes that have rarely been visited.

What do we mean by a rollout? Starting from a node, we randomly choose an action at each step and simulate the game until it ends, then record the resulting reward; averaged over many rollouts, this gives an estimate of the node's value.
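In code, these two ideas are only a few lines each. This is a minimal sketch, not AlphaGo's implementation; the node fields (value, visits, parent) and the state interface (is_terminal(), legal_actions(), apply(), reward()) are assumptions made for illustration:

import math
import random

def ucb1(node, c=2.0):
    # Unvisited nodes get an infinite score so they are always tried first.
    if node.visits == 0:
        return float("inf")
    exploitation = node.value / node.visits
    exploration = c * math.sqrt(math.log(node.parent.visits) / node.visits)
    return exploitation + exploration

def rollout(state):
    # Play random moves until the game ends and return the final reward.
    while not state.is_terminal():
        state = state.apply(random.choice(state.legal_actions()))
    return state.reward()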

Flowchart for Monte Carlo Tree Search

Tree Traversal & Node Expansion

You start with S0, the initial state. If the current node is not a leaf node, we calculate the UCB1 value of each child and move to the child that maximises it. We keep doing this until we reach a leaf node.

Next, we ask how many times this leaf node has been sampled. If it has never been sampled before, we simply do a rollout (instead of expanding). However, if it has been sampled before, we add a new node (state) to the tree for each available action (which we are calling expansion here).

Your current node is now one of these newly created nodes. We then do a rollout from this step.
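Putting the pieces together, a single MCTS iteration (selection, expansion, rollout, backpropagation) can be sketched as follows. It reuses the ucb1 and rollout helpers above, keeps rewards from a single player's perspective for simplicity, and again assumes the hypothetical state interface rather than any real Go engine:

import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

def select(node):
    # Walk down the tree, always following the child with the highest
    # UCB1 value, until we reach a leaf (a node with no children yet).
    while node.children:
        node = max(node.children, key=ucb1)
    return node

def expand(node):
    # Add one child per available action, then pick one to roll out from.
    for action in node.state.legal_actions():
        node.children.append(Node(node.state.apply(action), parent=node))
    return random.choice(node.children)

def backpropagate(node, reward):
    # Push the rollout result back up to the root, updating the statistics
    # that UCB1 relies on.
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent

def mcts(root_state, iterations=1000):
    root = Node(root_state)
    for _ in range(iterations):
        leaf = select(root)
        # A leaf that has already been sampled gets expanded; a fresh leaf
        # is rolled out directly, exactly as described above.
        if leaf.visits > 0 and not leaf.state.is_terminal():
            leaf = expand(leaf)
        backpropagate(leaf, rollout(leaf.state))
    # The most-visited child of the root is the recommended move.
    return max(root.children, key=lambda child: child.visits)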

Let's do a complete walkthrough of the algorithm to truly ingrain this concept and understand it in a lucid manner.

Iteration 1:

Initial State

Rollout from S1

Post Backpropagation

MCTS works by running for a defined number of iterations, or until we run out of time. At the end, it tells us the best action to take at each step in order to obtain the maximum return.

Iteration 2:

Backpropagation from S2

Iteration 3:

Iteration 4:

That is the gist of the algorithm. We can perform more iterations for as long as required (or as long as is computationally feasible). The underlying idea is that the estimate of each node's value becomes more accurate as the number of iterations increases.

DeepMind's AlphaGo and AlphaGo Zero programs are far more complex, with various other facets that are outside the scope of this article. However, the Monte Carlo Tree Search algorithm remains at their heart. MCTS plays the primary role in making a complex game like Go tractable in a finite amount of time. Some open-source implementations of MCTS are linked below:

Implementation in Python

Implementation in C++

I expect reinforcement learning to make a lot of headway in 2019. It won't be surprising to see a lot more complex games being cracked by machines soon. This is a great time to learn reinforcement learning!

I would love to hear your thoughts and suggestions regarding this article and this algorithm in the comments section below. Have you used this algorithm before? If not, which game would you want to try it out on?


See the article here:
Monte Carlo Tree Search Tutorial | DeepMind AlphaGo

Lee Sedol – Wikipedia

South Korean Go player

Lee Sedol (Korean: 이세돌; born 2 March 1983), or Lee Se-dol, is a former South Korean professional Go player of 9 dan rank.[1] As of February 2016, he ranked second in international titles (18), behind only Lee Chang-ho (21). He is the fifth youngest (12 years 4 months) to become a professional Go player in South Korean history, behind Cho Hun-hyun (9 years 7 months), Lee Chang-ho (11 years 1 month), Cho Hye-yeon (11 years 10 months) and Choi Cheol-han (12 years 2 months). His nickname is "The Strong Stone" ("Ssen-dol"). In March 2016, he played a notable series of matches against AlphaGo that ended in a 1-4 loss.[2]

On 19 November 2019, Lee announced his retirement from professional play, stating that he could never be the top overall player of Go due to the increasing dominance of AI. Lee referred to them as being "an entity that cannot be defeated".[3]

Lee was born in South Korea in 1983 and studied at the Korea Baduk Association. He ranks second in international titles (18), behind only Lee Chang-ho (21). Despite this, he describes his opening play as "very weak".[4] In February 2013, Lee announced that he planned to retire within three years and move to the U.S. to promote Go.[5] He plays on Tygem as "gjopok".[6] He is known as 'Bigeumdo Boy' because he was born and grew up on Bigeumdo Island.[7]

He is married to Kim Hyun-jin, and has a daughter, Lee Hye-rim.[8] His older brother Lee Sang-hoon[ko] is also a 9 dan professional go player.[9]

This game was played between Lee Sedol and Hong Chang-sik during the 2003 KAT cup, on 23 April 2003. The game is notable for Lee's use of a broken ladder formation.

Normally playing out a broken ladder is a mistake, associated with beginner play, because the chasing stones are left appallingly weak. Between experts it should be decisive, leading to a lost game. Lee, playing black, defied the conventional wisdom, using the broken ladder to capture a large group of Hong's stones in the lower-right side of the board. White ultimately resigned.[10]

Starting March 9, 2016, Lee played a five-game match, broadcast live, against the computer program AlphaGo, developed by the London-based artificial intelligence firm Google DeepMind, for a $1 million match prize.[11][12][13] He said, "I have heard that Google DeepMind's AI is surprisingly strong and getting stronger, but I am confident that I can win, at least this time."[14] In an interview with Sohn Suk-hee of JTBC Newsroom on February 22, 2016,[15] he showed confidence in his chances again, while saying that even beating AlphaGo by 4-1 might allow the Google DeepMind team to claim a de facto victory and the defeat of him, or even of humanity. In this interview he pointed out the time rule for the match, which seemed well balanced so that both he and the AI would fairly undergo time pressure. In another interview with Yonhap News, Lee Se-dol said that he was confident of beating AlphaGo by a score of 5-0, or at least 4-1, and that he accepted the challenge in only five minutes. He also stated, "Of course, there would have been many updates in the last four or five months, but that isn't enough time to challenge me."[16]

On March 9, Lee played black and lost the first game by resignation.[17] On March 10, he played white and lost the second game by resignation.[18] On March 12, he played black and lost the third game as well.[19] On March 13, he played white and won the fourth game, following an unexpected move at White 78, described as "a brilliant tesuji" and by Gu Li 9 dan as a "divine move" that was completely unforeseen by him. GoGameGuru commented that this game was "a masterpiece for Lee Sedol and will almost certainly become a famous game in the history of Go".[20] Lee commented after the victory that he considered AlphaGo strongest when playing white (second). For this reason, and because he thought winning a second time with black would be more valuable than winning with white, he requested that he play black in the final fifth game, which is considered more risky under Chinese Go rules.[21] On March 15, he played black and lost the fifth game, losing the Go series 1-4.[22]

After his fourth-match victory, Lee was overjoyed: "I don't think I've ever felt so good after winning just one match. I remember when I said I will win all or lose just one game in the beginning. If this had really happened I won 3 rounds and lost this round it would have had a great bearing on my reputation. However, since I won after losing 3 games in a row, I am so happy. I will never exchange this win for anything in the world."[23] He added: "I, Lee Se-dol, lost, but mankind did not."[21] After the last match, however, Lee was saddened: "I failed. I feel sorry that the match is over and it ended like this. I wanted it to end well." He also confessed that "As a professional Go player, I never want to play this kind of match again. I endured the match because I accepted it."[24]

Lee Sedol turned pro in 1995 as 1 dan, and reached 9 dan in 2003.[25]

Ranks #3 in total number of titles in Korea and #2 in international titles.

Go here to see the original:
Lee Sedol - Wikipedia

Object Detection Tutorial using TensorFlow | Real-Time …

Creating accurate machine learning models capable of identifying and localizing multiple objects in a single image has remained a core challenge in computer vision. But with recent advancements in deep learning, object detection applications are easier to develop than ever before. TensorFlow's Object Detection API is an open-source framework built on top of TensorFlow that makes it easy to construct, train and deploy object detection models.

So guys, in this Object Detection Tutorial, Ill be covering the following topics:

You can go through this real-time object detection video lecture, where our Deep Learning Training expert discusses how to detect an object in real time using TensorFlow.

This Edureka video will provide you with detailed and comprehensive knowledge of TensorFlow object detection and how it works. It will also provide you with the details of how to use TensorFlow to detect objects with deep learning methods.

Object detection is the process of finding real-world object instances like cars, bikes, TVs, flowers, and humans in still images or videos. It allows for the recognition, localization, and detection of multiple objects within an image, which provides us with a much better understanding of the image as a whole. It is commonly used in applications such as image retrieval, security, surveillance, and advanced driver assistance systems (ADAS).

Object Detection can be done via multiple ways:

In this Object Detection Tutorial, we'll focus on deep learning object detection, as TensorFlow uses deep learning for computation.

Let's move forward with our Object Detection Tutorial and understand its various applications in the industry.

A deep learning facial recognition system called DeepFace has been developed by a group of researchers at Facebook; it identifies human faces in a digital image very effectively. Google uses its own facial recognition system in Google Photos, which automatically segregates all the photos based on the person in the image. There are various components involved in facial recognition, such as the eyes, nose, mouth and eyebrows.

Object detection can also be used for people counting, for example to analyze store performance or crowd statistics during festivals. These cases tend to be more difficult as people move out of the frame quickly.

It is a very important application, as this capability can serve multiple purposes during crowd gatherings.

Object detection is also used in industrial processes to identify products. Finding a specific object through visual inspection is a basic task that is involved in multiple industrial processes like sorting, inventory management, machining, quality management, packaging etc.

Inventory management can be very tricky, as items are hard to track in real time. Automatic object counting and localization helps improve inventory accuracy.

Self-driving cars are the future, there's no doubt about that. But the working behind them is very tricky, as they combine a variety of techniques to perceive their surroundings, including radar, laser light, GPS, odometry, and computer vision.

Advanced control systems interpret sensory information to identify appropriate navigation paths as well as obstacles, and once the image sensor detects any sign of a living being in its path, the car automatically stops. This happens at a very fast rate and is a big step towards driverless cars.

Object detection plays a very important role in security, be it Apple's Face ID or the retina scans seen in all the sci-fi movies.

It is also used by governments to access security feeds and match them against their existing databases to find criminals or to detect a robber's vehicle.

The applications are limitless.

Every Object Detection Algorithm has a different way of working, but they all work on the same principle.

Feature extraction: they extract features from the input images at hand and use these features to determine the class of the image, be it through MATLAB, OpenCV, Viola-Jones or deep learning.

Now that you have understood the basic workflow of object detection, let's move ahead in this Object Detection Tutorial and understand what TensorFlow is and what its components are.

TensorFlow is Google's open-source machine learning framework for dataflow programming across a range of tasks. Nodes in the graph represent mathematical operations, while the graph edges represent the multi-dimensional data arrays (tensors) communicated between them.

Tensors are just multidimensional arrays, an extension of 2-dimensional tables to data with a higher dimension. There are many features of TensorFlow that make it appropriate for deep learning. So, without wasting any time, let's see how we can implement object detection using TensorFlow.
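A tiny sketch of what that means in practice, using TensorFlow 2's eager mode so the graph is built behind the scenes:

import tensorflow as tf

scalar = tf.constant(3.0)                       # rank-0 tensor
vector = tf.constant([1.0, 2.0, 3.0])           # rank-1 tensor
matrix = tf.constant([[1.0, 2.0], [3.0, 4.0]])  # rank-2 tensor

print(scalar.shape, vector.shape, matrix.shape)  # (), (3,), (2, 2)

# Operations such as matrix multiplication are the "nodes"; the tensors
# flowing between them are the "edges" of the dataflow graph.
print(tf.matmul(matrix, matrix))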

After the environment is set up, you need to go to the object_detection directory and then create a new Python file. You can use Spyder or Jupyter to write your code.

Next, we will download a model which is trained on the COCO dataset. COCO stands for Common Objects in Context; this dataset contains around 330K labeled images. The model selection is important, as you need to make a trade-off between speed and accuracy. Depending upon your requirement and the system memory, the correct model must be selected.

The directory models > research > object_detection > g3doc > detection_model_zoo lists all the available models along with their different speeds and accuracies (mAP).
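As a rough sketch, downloading and unpacking one of the COCO-trained models from the zoo might look like this. The model name and URL pattern below are assumptions based on how older model zoo archives were hosted, so check the model zoo page for the exact link for the model you pick:

import tarfile
import urllib.request

MODEL_NAME = "ssd_mobilenet_v1_coco_2017_11_17"   # a fast, lower-accuracy model
MODEL_FILE = MODEL_NAME + ".tar.gz"
DOWNLOAD_BASE = "http://download.tensorflow.org/models/object_detection/"

# Fetch the archive and pull out only the frozen inference graph,
# which is the file we later load for detection.
urllib.request.urlretrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
with tarfile.open(MODEL_FILE) as tar:
    for member in tar.getmembers():
        if "frozen_inference_graph.pb" in member.name:
            tar.extract(member)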

Now, let's move ahead in our Object Detection Tutorial and see how we can detect objects in a live video feed.

For this demo, we will use the same code, but we'll make a few tweaks: instead of reading still images from disk, we use OpenCV and the camera module to take the live feed from the webcam and run detection on it, as sketched below.

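A minimal sketch of the tweaked loop with OpenCV follows. Note that run_inference() is only a placeholder standing in for the detection call built earlier in the tutorial (feeding a frame through the loaded model and drawing the boxes); it is not a real function from TensorFlow or OpenCV:

import cv2

cap = cv2.VideoCapture(0)   # use the default webcam instead of still images

while True:
    ret, frame = cap.read()
    if not ret:
        break

    frame_with_boxes = run_inference(frame)   # placeholder for the detection step

    # Show the annotated frame in a window named Object_Detection, resized to 800x600.
    cv2.imshow("Object_Detection", cv2.resize(frame_with_boxes, (800, 600)))

    # Wait up to 25 ms between frames; press 'q' to close the window and stop.
    if cv2.waitKey(25) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()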

This code uses OpenCV, which in turn uses the camera object initialized earlier to open a new window named Object_Detection of size 800x600. It waits 25 milliseconds for the camera to show images; otherwise, it closes the window.

Now with this, we come to the end of this Object Detection Tutorial. I hope you guys enjoyed this article and understood the power of TensorFlow, and how easy it is to detect objects in images and video feeds. So, if you have read this, you are no longer a newbie to object detection and TensorFlow. Try out these examples and let me know if there are any challenges you face while deploying the code.

Now that you have understood the basics of object detection, check out the AI and Deep Learning With TensorFlow course by Edureka, a trusted online learning company with a network of more than 250,000 satisfied learners spread across the globe. This certification training is curated by industry professionals as per industry requirements and demands. You will master concepts such as the SoftMax function, autoencoder neural networks and Restricted Boltzmann Machines (RBM), and work with libraries like Keras and TFLearn.

Got a question for us? Please mention it in the comments section of Object Detection Tutorial and we will get back to you.

Go here to read the rest:
Object Detection Tutorial using TensorFlow | Real-Time ...

KataGo Distributed Training

About This Run

KataGo is a strong open-source self-play-trained Go engine, with many improvements to accelerate learning (see the arXiv paper and further techniques since). It can predict score and territory, play handicap games reasonably, and handle many board sizes and rules, all with the same neural net.

This site hosts KataGo's first public distributed training run! With the help of volunteers, we are attempting to resume training from the end of KataGo's previous official run ("g170"), which ended in June 2020, and see how much further we can go. If you would like to contribute, see below!

If you simply want to run KataGo, the latest releases are here and you can download the latest networks from here. You very likely want a GUI as well, because the engine alone is command-line-only. Some possible GUIs include KaTrain, Lizzie, and q5Go; more can be found by searching online.

Contributors are much appreciated! If you'd like to contribute your spare GPU cycles to generate training data for the run, the steps are:

First, create an account on this site, picking a username and secure password. Make sure to verify your email so that the site considers your account fully active. Note: the username you pick will be publicly visible in statistics and on the games you contribute.

Then pick one of the following methods.

Likely easiest method, for a home desktop computer:

Command-line method: if you are running on a remote server, have already set up KataGo for other things, want a command line that will work in the background without any GUI, or want slightly more flexibility to configure things:

Either way, once some games are finished, you can view the results at https://katagotraining.org/contributions/ - scroll down and find your username! If anything looks unusual or buggy about the games, or KataGo is behaving weirdly on your machine, please let us know so we can avoid uploading and training on bad data. Or, if you encounter any error messages, feel free to ask for help on KataGo's GitHub or the Discord chat.

For advanced users: instead of downloading a release, you can also build KataGo from source. If you do so, use the stable branch, NOT the master branch. The example config can be found in cpp/configs/contribute_example.cfg.

And if you're interested in contributing to development via coding, or have a cool idea for a tool, check out KataGo's GitHub or this website's GitHub, and/or the Discord chat where various devs hang out. If you want to test a change that affects the distributed client and need a test server to experiment with modified versions of KataGo, one is available at test.katagodistributed.org; contact lightvector or tychota in Discord for a testing account.

In the last week, 75 distinct users have uploaded 7,507,800 rows of training data, 147,272 new training games, and 3,006 new rating games.

In the last 24h, 44 distinct users have uploaded 942,623 rows of training data, 18,387 new training games, and 381 new rating games.

Look up and view games for this run here.

Latest network: kata1-b40c256-s10452530432-d2547930297

Strongest confidently-rated network: kata1-b40c256-s10336005120-d2519775087


Continue reading here:
KataGo Distributed Training

Artificial intelligence is smart, but does it play well with others? – MIT News

When it comes to games such as chess or Go, artificial intelligence (AI) programs have far surpassed the best players in the world. These "superhuman" AIs are unmatched competitors, but perhaps harder than competing against humans is collaborating with them. Can the same technology get along with people?

In a new study, MIT Lincoln Laboratory researchers sought to find out how well humans could play the cooperative card game Hanabi with an advanced AI model trained to excel at playing with teammates it has never met before. In single-blind experiments, participants played two series of the game: one with the AI agent as their teammate, and the other with a rule-based agent, a bot manually programmed to play in a predefined way.

The results surprised the researchers. Not only were the scores no better with the AI teammate than with the rule-based agent, but humans consistently hated playing with their AI teammate. They found it to be unpredictable, unreliable, and untrustworthy, and felt negatively even when the team scored well. A paper detailing this study has been accepted to the 2021 Conference on Neural Information Processing Systems (NeurIPS).

"It really highlights the nuanced distinction between creating AI that performs objectively well and creating AI that is subjectively trusted or preferred," says Ross Allen, co-author of the paper and a researcher in the Artificial Intelligence Technology Group. "It may seem those things are so close that there's not really daylight between them, but this study showed that those are actually two separate problems. We need to work on disentangling those."

Humans hating their AI teammates could be of concern for researchers designing this technology to one day work with humans on real challenges like defending from missiles or performing complex surgery. This dynamic, called teaming intelligence, is a next frontier in AI research, and it uses a particular kind of AI called reinforcement learning.

A reinforcement learning AI is not told which actions to take, but instead discovers which actions yield the most numerical "reward" by trying out scenarios again and again. It is this technology that has yielded the superhuman chess and Go players. Unlike rule-based algorithms, these AIs aren't programmed to follow "if/then" statements, because the possible outcomes of the human tasks they're slated to tackle, like driving a car, are far too many to code.

"Reinforcement learning is a much more general-purpose way of developing AI. If you can train it to learn how to play the game of chess, that agent won't necessarily go drive a car. But you can use the same algorithms to train a different agent to drive a car, given the right data," Allen says. "The sky's the limit in what it could, in theory, do."

Bad hints, bad plays

Today, researchers are using Hanabi to test the performance of reinforcement learning models developed for collaboration, in much the same way that chess has served as a benchmark for testing competitive AI for decades.

The game of Hanabi is akin to a multiplayer form of Solitaire. Players work together to stack cards of the same suit in order. However, players may not view their own cards, only the cards that their teammates hold. Each player is strictly limited in what they can communicate to their teammates to get them to pick the best card from their own hand to stack next.

The Lincoln Laboratory researchers did not develop either the AI or rule-based agents used in this experiment. Both agents represent the best in their fields for Hanabi performance. In fact, when the AI model was previously paired with an AI teammate it had never played with before, the team achieved the highest-ever score for Hanabi play between two unknown AI agents.

"That was an important result," Allen says. "We thought, if these AI that have never met before can come together and play really well, then we should be able to bring humans that also know how to play very well together with the AI, and they'll also do very well. That's why we thought the AI team would objectively play better, and also why we thought that humans would prefer it, because generally we'll like something better if we do well."

Neither of those expectations came true. Objectively, there was no statistical difference in the scores between the AI and the rule-based agent. Subjectively, all 29 participants reported in surveys a clear preference toward the rule-based teammate. The participants were not informed which agent they were playing with for which games.

"One participant said that they were so stressed out at the bad play from the AI agent that they actually got a headache," says Jaime Pena, a researcher in the AI Technology and Systems Group and an author on the paper. "Another said that they thought the rule-based agent was dumb but workable, whereas the AI agent showed that it understood the rules, but that its moves were not cohesive with what a team looks like. To them, it was giving bad hints, making bad plays."

Inhuman creativity

This perception of AI making "bad plays" links to surprising behavior researchers have observed previously in reinforcement learning work. For example, in 2016, when DeepMind's AlphaGo first defeated one of the world's best Go players, one of the most widely praised moves made by AlphaGo was move 37 in game 2, a move so unusual that human commentators thought it was a mistake. Later analysis revealed that the move was actually extremely well calculated, and was described as genius.

Such moves might be praised when an AI opponent performs them, but they're less likely to be celebrated in a team setting. The Lincoln Laboratory researchers found that strange or seemingly illogical moves were the worst offenders in breaking humans' trust in their AI teammate in these closely coupled teams. Such moves not only diminished players' perception of how well they and their AI teammate worked together, but also how much they wanted to work with the AI at all, especially when any potential payoff wasn't immediately obvious.

"There was a lot of commentary about giving up, comments like 'I hate working with this thing,'" adds Hosea Siu, also an author of the paper and a researcher in the Control and Autonomous Systems Engineering Group.

Participants who rated themselves as Hanabi experts, which the majority of players in this study did, more often gave up on the AI player. Siu finds this concerning for AI developers, because key users of this technology will likely be domain experts.

"Let's say you train up a super-smart AI guidance assistant for a missile defense scenario. You aren't handing it off to a trainee; you're handing it off to your experts on your ships who have been doing this for 25 years. So, if there is a strong expert bias against it in gaming scenarios, it's likely going to show up in real-world ops," he adds.

Squishy humans

The researchers note that the AI used in this study wasn't developed for human preference. But that's part of the problem: not many are. Like most collaborative AI models, this model was designed to score as high as possible, and its success has been benchmarked by its objective performance.

If researchers don't focus on the question of subjective human preference, "then we won't create AI that humans actually want to use," Allen says. "It's easier to work on AI that improves a very clean number. It's much harder to work on AI that works in this mushier world of human preferences."

Solving this harder problem is the goal of the MeRLin (Mission-Ready Reinforcement Learning) project, under which this experiment was funded in Lincoln Laboratory's Technology Office, in collaboration with the U.S. Air Force Artificial Intelligence Accelerator and the MIT Department of Electrical Engineering and Computer Science. The project is studying what has prevented collaborative AI technology from leaping out of the game space and into messier reality.

The researchers think that the ability for the AI to explain its actions will engender trust. This will be the focus of their work for the next year.

"You can imagine we rerun the experiment, but after the fact and this is much easier said than done the human could ask, 'Why did you do that move, I didn't understand it?" If the AI could provide some insight into what they thought was going to happen based on their actions, then our hypothesis is that humans would say, 'Oh, weird way of thinking about it, but I get it now,' and they'd trust it. Our results would totally change, even though we didn't change the underlying decision-making of the AI," Allen says.

Like a huddle after a game, this kind of exchange is often what helps humans build camaraderie and cooperation as a team.

"Maybe it's also a staffing bias. Most AI teams dont have people who want to work on these squishy humans and their soft problems," Siu adds, laughing. "It's people who want to do math and optimization. And that's the basis, but that's not enough."

Mastering a game such as Hanabi between AI and humans could open up a universe of possibilities for teaming intelligence in the future. But until researchers can close the gap between how well an AI performs and how much a human likes it, the technology may well remain at machine versus human.

See the original post:
Artificial intelligence is smart, but does it play well with others? - MIT News