Archive for the ‘Alphazero’ Category

It’s Called Artificial Intelligence, but What Is Intelligence? – WIRED

Elizabeth Spelke, a cognitive psychologist at Harvard, has spent her career testing the world's most sophisticated learning system: the mind of a baby.

Gurgling infants might seem like no match for artificial intelligence. They are terrible at labeling images, hopeless at mining text, and awful at videogames. Then again, babies can do things beyond the reach of any AI. By just a few months old, they've begun to grasp the foundations of language, such as grammar. They've started to understand how the physical world works and how to adapt to unfamiliar situations.

Yet even experts like Spelke don't understand precisely how babies (or adults, for that matter) learn. That gap points to a puzzle at the heart of modern artificial intelligence: We're not sure what to aim for.

Consider one of the most impressive examples of AI, AlphaZero, a program that plays board games with superhuman skill. After playing thousands of games against itself at hyperspeed, and learning from winning positions, AlphaZero independently discovered several famous chess strategies and even invented new ones. It certainly seems like a machine eclipsing human cognitive abilities. But AlphaZero needs to play millions more games than a person during practice to learn a game. Most tellingly, it cannot take what it has learned from the game and apply it to another area.

To some members of the AI priesthood, that calls for a new approach. "What makes human intelligence special is its adaptability: its power to generalize to never-seen-before situations," says François Chollet, a well-known AI engineer and the creator of Keras, a widely used framework for deep learning. In a November research paper, he argued that it's misguided to measure machine intelligence solely according to its skills at specific tasks. "Humans don't start out with skills; they start out with a broad ability to acquire new skills," he says. "What a strong human chess player is demonstrating isn't the ability to play chess per se, but the potential to acquire any task of a similar difficulty. That's a very different capability."

Chollet posed a set of problems designed to test an AI program's ability to learn in a more generalized way. Each problem requires arranging colored squares on a grid based on just a few prior examples. It's not hard for a person. But modern machine-learning programs, trained on huge amounts of data, cannot learn from so few examples. As of late April, more than 650 teams had signed up to tackle the challenge; the best AI systems were getting about 12 percent correct.
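
To make this concrete, here is a minimal sketch of such a task in Python. The grids and the transformation rule are invented for illustration; real tasks are distributed as JSON files in Chollet's ARC repository with the same train/test structure.

```python
# An ARC-style task: a few demonstration input/output grid pairs, plus a
# test input whose rule the solver must infer. Values are color indices.
task = {
    "train": [
        {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
        {"input": [[2, 0], [0, 2]], "output": [[0, 2], [2, 0]]},
    ],
    "test": [
        {"input": [[3, 0], [0, 3]]},
    ],
}

def solve(grid):
    """The hidden rule of this toy task: mirror each row left-to-right."""
    return [list(reversed(row)) for row in grid]

# A human infers the rule from two examples; current systems mostly cannot.
for pair in task["train"]:
    assert solve(pair["input"]) == pair["output"]
print(solve(task["test"][0]["input"]))  # [[0, 3], [3, 0]]
```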

It isn't yet clear how humans solve these problems, but Spelke's work offers a few clues. For one thing, it suggests that humans are born with an innate ability to quickly learn certain things, like what a smile means or what happens when you drop something. It also suggests we learn a lot from each other. One recent experiment showed that 3-month-olds appear puzzled when someone grabs a ball in an inefficient way, suggesting that they already appreciate that people cause changes in their environment. Even the most sophisticated and powerful AI systems on the market can't grasp such concepts. A self-driving car, for instance, cannot intuit from common sense what will happen if a truck spills its load.

Josh Tenenbaum, a professor in MIT's Center for Brains, Minds & Machines, works closely with Spelke and uses insights from cognitive science as inspiration for his programs. He says much of modern AI misses the bigger picture, likening it to a Victorian-era satire about a two-dimensional world inhabited by simple geometrical people. "We're sort of exploring Flatland, only some dimensions of basic intelligence," he says. Tenenbaum believes that, just as evolution has given the human brain certain capabilities, AI programs will need a basic understanding of physics and psychology in order to acquire and use knowledge as efficiently as a baby. And to apply this knowledge to new situations, he says, they'll need to learn in new ways: for example, by drawing causal inferences rather than simply finding patterns. "At some point, you know, if you're intelligent, you realize maybe there's something else out there," he says.

Original post:
It's Called Artificial Intelligence, but What Is Intelligence? - WIRED

AlphaZero – Wikipedia

Game-playing artificial intelligence

AlphaZero is a computer program developed by artificial intelligence research company DeepMind to master the games of chess, shogi and go. This algorithm uses an approach similar to AlphaGo Zero.

On December 5, 2017, the DeepMind team released a preprint introducing AlphaZero, which within 24 hours of training achieved a superhuman level of play in these three games by defeating world-champion programs Stockfish, elmo, and the 3-day version of AlphaGo Zero. In each case it made use of custom tensor processing units (TPUs) that the Google programs were optimized to use.[1] AlphaZero was trained solely via "self-play" using 5,000 first-generation TPUs to generate the games and 64 second-generation TPUs to train the neural networks, all in parallel, with no access to opening books or endgame tables. After four hours of training, DeepMind estimated AlphaZero was playing at a higher Elo rating than Stockfish 8; after 9 hours of training, the algorithm defeated Stockfish 8 in a time-controlled 100-game tournament (28 wins, 0 losses, and 72 draws).[1][2][3] The trained algorithm played on a single machine with four TPUs.

DeepMind's paper on AlphaZero was published in the journal Science on 7 December 2018.[4] In 2019 DeepMind published a new paper detailing MuZero, a new algorithm able to generalise AlphaZero's work, playing both Atari and board games without knowledge of the rules or representations of the game.[5]

AlphaZero (AZ) is a more generalized variant of the AlphaGo Zero (AGZ) algorithm, and is able to play shogi and chess as well as Go; the preprint details several algorithmic differences between AZ and AGZ.[1]

In Monte Carlo tree search, AlphaZero examines just 80,000 positions per second in chess and 40,000 in shogi, compared to 70 million for Stockfish and 35 million for elmo. AlphaZero compensates for the lower number of evaluations by using its deep neural network to focus much more selectively on the most promising variations.[1]
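
The selectivity comes from the rule the search uses to choose moves. In the AlphaZero paper, each simulation descends the tree by picking the action that maximizes a prior-weighted upper confidence bound, shown here in simplified form (the paper's exploration constant actually grows slowly with the parent's visit count):

$$ a_t = \arg\max_a \left( Q(s,a) + C \cdot P(s,a)\,\frac{\sqrt{N(s)}}{1 + N(s,a)} \right) $$

Here $Q(s,a)$ is the mean evaluation of the move so far, $P(s,a)$ is the neural network's prior probability for it, and $N(\cdot)$ counts visits. Moves the network considers promising get explored first, and moves whose evaluations stay poor are quickly starved of further visits.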

AlphaZero was trained solely via self-play, using 5,000 first-generation TPUs to generate the games and 64 second-generation TPUs to train the neural networks. In parallel, the in-training AlphaZero was periodically matched against its benchmark (Stockfish, elmo, or AlphaGo Zero) in brief one-second-per-move games to determine how well the training was progressing. DeepMind judged that AlphaZero's performance exceeded the benchmark after around four hours of training for Stockfish, two hours for elmo, and eight hours for AlphaGo Zero.[1]

In AlphaZero's chess tournament against Stockfish 8 (2016 TCEC world champion), each program was given one minute per move. Stockfish was allocated 64 threads and a hash size of 1 GB,[1] a setting that Stockfish's Tord Romstad later criticized as suboptimal.[6][note 1] AlphaZero was trained on chess for a total of nine hours before the tournament. During the tournament, AlphaZero ran on a single machine with four application-specific TPUs. In 100 games from the normal starting position, AlphaZero won 25 games as White, won 3 as Black, and drew the remaining 72.[8] In a series of twelve 100-game matches (with unspecified time and resource constraints) against Stockfish starting from the 12 most popular human openings, AlphaZero won 290, drew 886, and lost 24.[1]

AlphaZero was trained on shogi for a total of two hours before the tournament. In 100 shogi games against elmo (World Computer Shogi Championship 27 summer 2017 tournament version with YaneuraOu 4.73 search), AlphaZero won 90 times, lost 8 times and drew twice.[8] As in the chess games, each program got one minute per move, and elmo was given 64 threads and a hash size of 1GB.[1]

After 34 hours of self-learning of Go, AlphaZero defeated the 3-day version of AlphaGo Zero, winning 60 games and losing 40.[1][8]

DeepMind stated in its preprint, "The game of chess represented the pinnacle of AI research over several decades. State-of-the-art programs are based on powerful engines that search many millions of positions, leveraging handcrafted domain expertise and sophisticated domain adaptations. AlphaZero is a generic reinforcement learning algorithm originally devised for the game of go that achieved superior results within a few hours, searching a thousand times fewer positions, given no domain knowledge except the rules."[1] DeepMind's Demis Hassabis, a chess player himself, called AlphaZero's play style "alien": It sometimes wins by offering counterintuitive sacrifices, like offering up a queen and bishop to exploit a positional advantage. "It's like chess from another dimension."[9]

Given the difficulty in chess of forcing a win against a strong opponent, the +28 −0 =72 result is a significant margin of victory. However, some grandmasters, such as Hikaru Nakamura and Komodo developer Larry Kaufman, downplayed AlphaZero's victory, arguing that the match would have been closer if the programs had had access to an opening database (since Stockfish was optimized for that scenario).[10] Romstad additionally pointed out that Stockfish is not optimized for rigidly fixed-time moves and that the version used was a year old.[6][11]
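
For a sense of what that margin means, a match score can be converted into an approximate Elo gap with the standard logistic model. A minimal sketch in Python (the formula is the usual Elo expectation, not anything from DeepMind's paper):

```python
import math

def elo_gap(wins, losses, draws):
    """Elo difference implied by a match score, from the Elo expectation
    E = 1 / (1 + 10 ** (-d / 400)) solved for d. Assumes 0 < score < 1."""
    score = (wins + 0.5 * draws) / (wins + losses + draws)
    return -400 * math.log10(1 / score - 1)

# +28 -0 =72 is a 64% score, roughly a 100-point Elo advantage.
print(round(elo_gap(28, 0, 72)))  # -> 100
```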

Similarly, some shogi observers argued that the elmo hash size was too low, that the resignation settings and the "EnteringKingRule" settings (cf. shogi Entering King) may have been inappropriate, and that elmo is already obsolete compared with newer programs.[12][13]

Papers headlined that the chess training took only four hours: "It was managed in little more than the time between breakfast and lunch."[2][14] Wired hyped AlphaZero as "the first multi-skilled AI board-game champ".[15] AI expert Joanna Bryson noted that Google's "knack for good publicity" was putting it in a strong position against challengers. "It's not only about hiring the best programmers. It's also very political, as it helps make Google as strong as possible when negotiating with governments and regulators looking at the AI sector."[8]

Human chess grandmasters generally expressed excitement about AlphaZero. Danish grandmaster Peter Heine Nielsen likened AlphaZero's play to that of a superior alien species.[8] Norwegian grandmaster Jon Ludvig Hammer characterized AlphaZero's play as "insane attacking chess" with profound positional understanding.[2] Former champion Garry Kasparov said "It's a remarkable achievement, even if we should have expected it after AlphaGo."[10][16]

Grandmaster Hikaru Nakamura was less impressed, and stated "I don't necessarily put a lot of credibility in the results simply because my understanding is that AlphaZero is basically using the Google supercomputer and Stockfish doesn't run on that hardware; Stockfish was basically running on what would be my laptop. If you wanna have a match that's comparable you have to have Stockfish running on a supercomputer as well."[7]

Top US correspondence chess player Wolff Morrow was also unimpressed, claiming that AlphaZero would probably not make the semifinals of a fair competition such as TCEC where all engines play on equal hardware. Morrow further stated that although he might not be able to beat AlphaZero if AlphaZero played drawish openings such as the Petroff Defence, AlphaZero would not be able to beat him in a correspondence chess game either.[17]

Motohiro Isozaki, the author of YaneuraOu, noted that although AlphaZero did comprehensively beat elmo, AlphaZero's shogi rating stopped growing at a point at most 100–200 Elo higher than elmo. This gap is not that large, and elmo and other shogi software should be able to catch up in 1–2 years.[18]

DeepMind addressed many of the criticisms in their final version of the paper, published in December 2018 in Science.[4] They further clarified that AlphaZero was not running on a supercomputer; it was trained using 5,000 tensor processing units (TPUs), but only ran on four TPUs and a 44-core CPU in its matches.[19]

In the final results, Stockfish version 8 ran under the same conditions as in the TCEC superfinal: 44 CPU cores, Syzygy endgame tablebases, and a 32 GB hash size. Instead of a fixed time control of one minute per move, both engines were given 3 hours plus 15 seconds per move to finish the game. In a 1000-game match, AlphaZero won with a score of 155 wins to 6 losses, with the rest drawn. DeepMind also played a series of games using the TCEC opening positions; AlphaZero also won convincingly.

Similar to Stockfish, Elmo ran under the same conditions as in the 2017 CSA championship. The version of Elmo used was WCSC27 in combination with YaneuraOu 2017 Early KPPT 4.79 64AVX2 TOURNAMENT. Elmo operated on the same hardware as Stockfish: 44 CPU cores and a 32 GB hash size. AlphaZero won 98.2% of games when playing black (which plays first in shogi) and 91.2% overall.

Human grandmasters were generally impressed with AlphaZero's games against Stockfish.[20] Former world champion Garry Kasparov said it was a pleasure to watch AlphaZero play, especially since its style was open and dynamic like his own.[21][22]

In the chess community, Komodo developer Mark Lefler called it a "pretty amazing achievement", but also pointed out that the data was old, since Stockfish had gained a lot of strength since the release of Stockfish 8 (the version AlphaZero faced). Fellow developer Larry Kaufman said AlphaZero would probably lose a match against the latest version of Stockfish, Stockfish 10, under Top Chess Engine Championship (TCEC) conditions. Kaufman argued that the only advantage of neural-network-based engines was that they used a GPU, so if there was no regard for power consumption (e.g. in an equal-hardware contest where both engines had access to the same CPU and GPU) then anything the GPU achieved was "free". Based on this, he stated that the strongest engine was likely to be a hybrid with neural networks and standard alpha-beta search.[23]

AlphaZero inspired the computer chess community to develop Leela Chess Zero, using the same techniques as AlphaZero. Leela contested several championships against Stockfish, where it showed similar strength.[24]

In 2019 DeepMind published MuZero, a unified system that played excellent chess, shogi, and go, as well as games in the Atari Learning Environment, without being pre-programmed with their rules.[25][26]

The full text of Romstad's criticism, cited above, reads: "The match results by themselves are not particularly meaningful because of the rather strange choice of time controls and Stockfish parameter settings: The games were played at a fixed time of 1 minute/move, which means that Stockfish has no use of its time management heuristics (a lot of effort has been put into making Stockfish identify critical points in the game and decide when to spend some extra time on a move; at a fixed time per move, the strength will suffer significantly). The version of Stockfish used is one year old, was playing with far more search threads than have ever received any significant amount of testing, and had way too small hash tables for the number of threads. I believe the percentage of draws would have been much higher in a match with more normal conditions."[7]

The rest is here:
AlphaZero - Wikipedia

AlphaZero: How Intuition Demolished Logic – Intuition …

Modern civilization and the trappings of technology have led to the decline of our own intuition. Many of us have become unaware of its value or even its very existence. Intuition as a basis of complex computation is easily dismissed as an approach outside the conventional, and this lack of conventionality leads many researchers to ignore its potential.

"The intuitive mind is a sacred gift and the rational mind is a faithful servant. We have created a society that honors the servant and has forgotten the gift." – Albert Einstein

The research that I do in Artificial Intelligence (AI) revolves around the idea that advanced cognitive machines will use intuition as the substrate of their intelligence (see: artificial intuition). Our own human minds provide ample evidence for general intelligence. Humans are fundamentally intuition machines, and our rational (and conscious) selves are just a simulation layered on top of intuition-based machinery (see: cognitive stack). This is in stark contrast to Descartes' famous saying "I think, therefore I am" (Cogito ergo sum), which implies that our rational thinking is what separates us from the rest of biology. We thus have a cognitive bias to demand technologies and methodologies that are driven by logical machinery. This is indeed the reason for the multi-decade failure of Good Old Fashioned AI (GOFAI), which attempted to solve the problem of intelligence with formal logic as its starting point.

One of the counter-intuitive questions raised by intuition-based machines is: how can logical thought arise from them? Since 2012, we have seen the incredible advances of Deep Learning technology. Deep Learning networks are intuition machines. These systems learn to perform inference (or make predictions) by using induction. Deep Learning systems have been able to perform tasks that are usually reserved for biological brains. Tasks that have long been known to be difficult for conventional computing, such as facial and speech recognition, can be performed at superhuman levels by these machines.

Deep Learning networks, however, are incapable of performing logical tasks such as long division. One should not expect to be able to teach an animal (e.g. your dog) to perform multiplication, much less addition or subtraction. However, human brains are able to perform all sorts of logical problems. We have to ask, though: could a caveman do multiplication? Are we innately capable of advanced logical cognition, or is this capability something we learned as a consequence of our advanced civilization?

The big chasm that needs to be crossed to achieve more general artificial intelligence is what is known as the semantic gap. How do we fuse the capabilities of Deep Learning (sub-symbolic) systems with logical (symbolic) systems?

Human minds are capable of performing great feats of logical reasoning. How are our minds able to do this if our machinery is all intuition-based? I am going to make the assumption here that we don't have any innate logical machinery. It is unlikely that Homo sapiens have evolved this cognitive machinery in the short time we've existed on this planet. Therefore, to bridge the semantic gap, we need to bridge it using intuition-only mechanisms. What this means is that we don't need to perform a fusion of logical components with intuition components. All we ever need is intuition components.

Therefore we need to show ample evidence that complex logical thinking can be performed by an intuition machine.

This is where AlphaZero makes its revolutionary revelation. AlphaZero is the latest evolution of DeepMind's Go-playing program. I have written previously about AlphaGo Zero (different from AlphaZero) and how it was able to learn to master the game of Go from scratch (without human knowledge). 99% of Westerners have never played the game of Go and simply don't understand it at all. So the relevance of DeepMind's AlphaGo Zero achievement has been muted. We don't understand the enormity of the achievement. Go, however, has long been known to be a game of intuition. So it's somewhat (if ignorantly) unsurprising that an intuition machine (one based on Deep Learning) is able to master the game.

However, what DeepMind's new incarnation (AlphaZero) is able to do is play the game of chess. This of course may not be surprising to many, since chess has been dominated by computers ever since IBM's Deep Blue bested Kasparov in 1997. It may not be remarkable for the uninitiated that it took AlphaZero a few hours to master the game of chess from scratch. It may not be remarkable that AlphaZero was able to destroy the best chess-playing program (Stockfish) in 100 games.

What is truly remarkable is how AlphaZero played in dismantling its more logical opponent. To give you an idea, I will quote some impressions from the chess playing community.

"It approaches the Type B, human-like approach to machine chess dreamt of by Claude Shannon and Alan Turing instead of brute force." – Garry Kasparov

"I always wondered how it would be if a superior species landed on earth and showed us how they play chess. I feel now I know." – Peter Heine Nielsen

"It doesn't play like a human, and it doesn't play like a program. It plays in a third, almost alien, way." – Demis Hassabis (who also plays chess)

For those who understand chess play, it's probably best to watch the actual game play of AlphaZero versus Stockfish. What you will see is how an intuition-based system dismantles an opponent that is based on logic (that is, one that can't refuse a gambit). Below are games with expert commentary:

AlphaZero plays a very different game of chess. It is willing to sacrifice pieces in order to gain a positional advantage over its opponent. It plays a kind of chess judo, turning an opponent's eagerness for immediate gains against it. It sets up its opponent into what is known in chess as zugzwang, where every move one makes leads to a worse outcome. It seems to have a more holistic sense of the game, with all its pieces moving in a highly coordinated manner. AlphaZero maximizes its creativeness against a logical opponent that is unable to see beyond short-term gains. It plays a game of chess that is not only unimaginable, but would in the past have been placed on a pedestal for all to marvel at.

The paper about AlphaZero was presented at the recently concluded NIPS 2017 conference. It is an extremely short paper; the main body is only 7 pages long. It provides an interesting detail about how extensively AlphaZero evaluates board positions to decide on its moves.

AlphaZero searches just 80 thousand positions per second in chess, compared to 70 million for Stockfish.

The intuition machine uses roughly a thousand times fewer evaluations than its logical opponent (70 million versus 80 thousand per second is a factor of about 875).

What you are witnessing here with AlphaZero is validation of my original thesis about intuition machines and their ability to perform logical reasoning. This is the semantic gap being bridged. This is an extremely difficult AGI milestone being surmounted at a record pace. I doubt anyone in the AI community expected this kind of progress to be achieved so quickly. Yet it has happened, and the landscape has been changed forever.

Go here to see the original:
AlphaZero: How Intuition Demolished Logic - Intuition ...

AlphaGo Zero – Wikipedia

Artificial intelligence that plays Go

AlphaGo Zero is a version of DeepMind's Go software AlphaGo. AlphaGo's team published an article in the journal Nature on 19 October 2017, introducing AlphaGo Zero, a version created without using data from human games, and stronger than any previous version.[1] By playing games against itself, AlphaGo Zero surpassed the strength of AlphaGo Lee in three days by winning 100 games to 0, reached the level of AlphaGo Master in 21 days, and exceeded all the old versions in 40 days.[2]

Training artificial intelligence (AI) without datasets derived from human experts has significant implications for the development of AI with superhuman skills because expert data is "often expensive, unreliable or simply unavailable."[3] Demis Hassabis, the co-founder and CEO of DeepMind, said that AlphaGo Zero was so powerful because it was "no longer constrained by the limits of human knowledge".[4] David Silver, one of the first authors of DeepMind's papers published in Nature on AlphaGo, said that it is possible to have generalised AI algorithms by removing the need to learn from humans.[5]

Google later developed AlphaZero, a generalized version of AlphaGo Zero that could play chess and shogi in addition to Go. In December 2017, AlphaZero beat the 3-day version of AlphaGo Zero by winning 60 games to 40, and with 8 hours of training it outperformed AlphaGo Lee on an Elo scale. AlphaZero also defeated a top chess program (Stockfish) and a top shogi program (Elmo).[6][7]

AlphaGo Zero's neural network was trained using TensorFlow, with 64 GPU workers and 19 CPU parameter servers. Only four TPUs were used for inference. The neural network initially knew nothing about Go beyond the rules. Unlike earlier versions of AlphaGo, Zero only perceived the board's stones, rather than having some rare human-programmed edge cases to help recognize unusual Go board positions. The AI engaged in reinforcement learning, playing against itself until it could anticipate its own moves and how those moves would affect the game's outcome.[8] In the first three days AlphaGo Zero played 4.9 million games against itself in quick succession.[9] It appeared to develop the skills required to beat top humans within just a few days, whereas the earlier AlphaGo took months of training to achieve the same level.[10]

For comparison, the researchers also trained a version of AlphaGo Zero using human games, AlphaGo Master, and found that it learned more quickly, but actually performed more poorly in the long run.[11] DeepMind submitted its initial findings in a paper to Nature in April 2017, which was then published in October 2017.[1]

The hardware cost for a single AlphaGo Zero system in 2017, including the four TPUs, has been quoted as around $25 million.[12]

According to Hassabis, AlphaGo's algorithms are likely to be of the most benefit to domains that require an intelligent search through an enormous space of possibilities, such as protein folding or accurately simulating chemical reactions.[13] AlphaGo's techniques are probably less useful in domains that are difficult to simulate, such as learning how to drive a car.[14] DeepMind stated in October 2017 that it had already started active work on attempting to use AlphaGo Zero technology for protein folding, and stated it would soon publish new findings.[15][16]

AlphaGo Zero was widely regarded as a significant advance, even when compared with its groundbreaking predecessor, AlphaGo. Oren Etzioni of the Allen Institute for Artificial Intelligence called AlphaGo Zero "a very impressive technical result" in "both their ability to do it, and their ability to train the system in 40 days, on four TPUs".[8] The Guardian called it a "major breakthrough for artificial intelligence", citing Eleni Vasilaki of Sheffield University and Tom Mitchell of Carnegie Mellon University, who called it "an impressive feat" and "an outstanding engineering accomplishment" respectively.[14] Mark Pesce of the University of Sydney called AlphaGo Zero "a big technological advance" taking us into "undiscovered territory".[17]

Gary Marcus, a psychologist at New York University, has cautioned that for all we know, AlphaGo may contain "implicit knowledge that the programmers have about how to construct machines to play problems like Go" and will need to be tested in other domains before being sure that its base architecture is effective at much more than playing Go. In contrast, DeepMind is "confident that this approach is generalisable to a large number of domains".[9]

In response to the reports, South Korean Go professional Lee Sedol said, "The previous version of AlphaGo wasn't perfect, and I believe that's why AlphaGo Zero was made." On the potential for AlphaGo's development, Lee said he will have to wait and see but also said it will affect young Go players. Mok Jin-seok, who directs the South Korean national Go team, said the Go world has already been imitating the playing styles of previous versions of AlphaGo and creating new ideas from them, and he is hopeful that new ideas will come out from AlphaGo Zero. Mok also added that general trends in the Go world are now being influenced by AlphaGo's playing style. "At first, it was hard to understand and I almost felt like I was playing against an alien. However, having had a great amount of experience, I've become used to it," Mok said. "We are now past the point where we debate the gap between the capability of AlphaGo and humans. It's now between computers." Mok has reportedly already begun analyzing the playing style of AlphaGo Zero along with players from the national team. "Though having watched only a few matches, we received the impression that AlphaGo Zero plays more like a human than its predecessors," Mok said.[18] Chinese Go professional Ke Jie commented on the remarkable accomplishments of the new program: "A pure self-learning AlphaGo is the strongest. Humans seem redundant in front of its self-improvement."[19]

On 5 December 2017, the DeepMind team released a preprint on arXiv introducing AlphaZero, a program that generalizes AlphaGo Zero's approach and that achieved, within 24 hours, a superhuman level of play in chess, shogi, and Go, defeating the world-champion programs Stockfish and Elmo and the 3-day version of AlphaGo Zero.[6]

AlphaZero (AZ) is a more generalized variant of the AlphaGo Zero (AGZ) algorithm, and is able to play shogi and chess as well as Go; the preprint details several algorithmic differences between AZ and AGZ.[6]

An open-source program, Leela Zero, based on the ideas from the AlphaGo papers, is available. It uses GPUs instead of the TPUs that recent versions of AlphaGo rely on.

Read this article:
AlphaGo Zero - Wikipedia

How to build your own AlphaZero AI using Python and Keras

Connect4

The game that our algorithm will learn to play is Connect4 (or Four In A Row). Not quite as complex as Go, but there are still 4,531,985,219,092 game positions in total.

The game rules are straightforward. Players take it in turns to drop a piece of their colour into the top of any available column. The first player to get four of their colour in a row, whether vertically, horizontally, or diagonally, wins. If the entire grid is filled without a four-in-a-row being created, the game is drawn.

Here's a summary of the key files that make up the codebase:

This file (game.py) contains the game rules for Connect4.

Each square is allocated a number from 0 to 41, as follows:
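
Assuming row-major numbering from the top-left corner (consistent with the action-38 example below, since square 38 sits in the bottom row of the centre column), the layout is:

```
 0  1  2  3  4  5  6
 7  8  9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30 31 32 33 34
35 36 37 38 39 40 41
```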

The game.py file gives the logic behind moving from one game state to another, given a chosen action. For example, given the empty board and action 38, the takeAction method returns a new game state, with the starting player's piece at the bottom of the centre column.

You can replace the game.py file with any game file that conforms to the same API, and the algorithm will, in principle, learn strategy through self-play, based on the rules you have given it.
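
As a rough illustration, a minimal game state conforming to that idea might look like the sketch below. Only the takeAction method is named in the article; every other name and the exact return shape are assumptions.

```python
class GameState:
    """A toy Connect4 state: 42 squares, 0 = empty, +1/-1 = the two players."""
    def __init__(self, board=None, player=1):
        self.board = board or [0] * 42
        self.player = player

    def allowed_actions(self):
        # A column is playable while its top square is empty; the piece
        # falls to the lowest empty square in that column.
        actions = []
        for col in range(7):
            if self.board[col] == 0:
                row = max(r for r in range(6) if self.board[r * 7 + col] == 0)
                actions.append(row * 7 + col)
        return actions

    def takeAction(self, action):
        board = list(self.board)
        board[action] = self.player
        return GameState(board, -self.player)

state = GameState().takeAction(38)  # starting player drops into the centre
print(state.board[38], 38 in GameState().allowed_actions())  # 1 True
```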

This contains the code that starts the learning process. It loads the game rules and then iterates through the main loop of the algorithm, which consists of three stages:

1. Self-play, in which the best current network generates game data
2. Retraining the neural network on that data
3. Evaluating the retrained network against the current best

There are two agents involved in this loop, the best_player and the current_player.

The best_player contains the best-performing neural network and is used to generate the self-play memories. The current_player then retrains its neural network on these memories and is then pitted against the best_player. If it wins, the neural network inside the best_player is switched for the neural network inside the current_player, and the loop starts again.
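
The control flow can be sketched end to end with toy stand-ins. Everything below is illustrative scaffolding rather than the repository's code; the 1.3 promotion threshold in particular is an assumed value.

```python
import random

class ToyAgent:
    """Stand-in for the real Agent: a single number plays the role of skill."""
    def __init__(self, strength=0.5):
        self.strength = strength
    def replay(self, memories):
        self.strength = min(1.0, self.strength + 0.05)  # "retraining"

def play_matches(p1, p2, n):
    wins = sum(random.random() < p1.strength / (p1.strength + p2.strength)
               for _ in range(n))
    return wins, n - wins

best, current, memory = ToyAgent(), ToyAgent(), []

for iteration in range(5):
    memory.extend(["self-play position"] * 30)      # 1. best_player self-play
    current.replay(memory)                          # 2. retrain current_player
    wins, losses = play_matches(current, best, 20)  # 3. evaluation match
    if wins > 1.3 * losses:                         # promotion threshold (assumed)
        best.strength = current.strength
        print(f"iteration {iteration}: current_player promoted")
```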

This contains the Agent class (a player in the game). Each player is initialised with its own neural network and Monte Carlo Search Tree.

The simulate method runs the Monte Carlo Tree Search process. Specifically, the agent moves to a leaf node of the tree, evaluates the node with its neural network and then backfills the value of the node up through the tree.

The act method repeats the simulation multiple times to understand which move from the current position is most favourable. It then returns the chosen action to the game, to enact the move.

The replay method retrains the neural network, using memories from previous games.
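
Taken together, a skeleton of the Agent class might read as follows. The constructor arguments and internals are assumptions based on the descriptions above; moveToLeaf and backFill are the MCTS methods described later.

```python
class Agent:
    """A player: a neural network paired with a Monte Carlo Search Tree."""
    def __init__(self, name, neural_net, mcts):
        self.name = name
        self.nn = neural_net   # evaluates leaf positions
        self.mcts = mcts       # search tree rooted at the current position

    def simulate(self):
        """One search pass: descend to a leaf, evaluate it with the network,
        then backfill that value up through the visited edges."""
        leaf, path = self.mcts.moveToLeaf()
        value = self.nn.predict(leaf.state)
        self.mcts.backFill(path, value)

    def act(self, n_simulations=50):
        """Repeat the simulation many times, then play the most-visited move."""
        for _ in range(n_simulations):
            self.simulate()
        best_edge = max(self.mcts.root.edges, key=lambda e: e.N)
        return best_edge.action

    def replay(self, memories):
        """Retrain the network on positions sampled from previous games."""
        self.nn.fit(memories)
```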

This file contains the Residual_CNN class, which defines how to build an instance of the neural network.

It uses a condensed version of the neural network architecture in the AlphaGo Zero paper: a convolutional layer, followed by many residual layers, then splitting into a value and policy head.

The depth and number of convolutional filters can be specified in the config file.

The Keras library is used to build the network, with a TensorFlow backend.
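
A condensed sketch of such a network in Keras might look like the following. The filter counts, the two-plane Connect4 input shape, and the head sizes are illustrative choices, not the repository's exact configuration.

```python
from tensorflow.keras import layers, models

def conv_block(x, filters=64):
    x = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    return layers.LeakyReLU()(x)

def residual_block(x, filters=64):
    shortcut = x
    y = conv_block(x, filters)
    y = layers.Conv2D(filters, 3, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    return layers.LeakyReLU()(layers.add([shortcut, y]))

def build_residual_cnn(input_shape=(6, 7, 2), n_residual=5, n_actions=42):
    inputs = layers.Input(shape=input_shape)  # one board plane per player
    x = conv_block(inputs)
    for _ in range(n_residual):
        x = residual_block(x)

    # Value head: a scalar in [-1, 1] estimating the game outcome.
    v = layers.Flatten()(conv_block(x, filters=1))
    v = layers.Dense(20, activation="relu")(v)
    value = layers.Dense(1, activation="tanh", name="value_head")(v)

    # Policy head: one logit per square; illegal moves are masked in the loss.
    p = layers.Flatten()(conv_block(x, filters=2))
    policy = layers.Dense(n_actions, name="policy_head")(p)

    return models.Model(inputs, [value, policy])

model = build_residual_cnn()
model.summary()
```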

To view individual convolutional filters and densely connected layers in the neural network, run the following inside the run.ipynb notebook:

This contains the Node, Edge and MCTS classes, that constitute a Monte Carlo Search Tree.

The MCTS class contains the moveToLeaf and backFill methods previously mentioned, and instances of the Edge class store the statistics about each potential move.
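
A minimal sketch of the three classes, with internals assumed; the Q+U selection used in moveToLeaf follows the AlphaZero paper's rule in simplified form.

```python
import math

class Node:
    def __init__(self, state):
        self.state = state
        self.edges = []  # Edge objects leading to child Nodes

class Edge:
    def __init__(self, child, prior, action):
        self.child, self.action = child, action
        self.P, self.N, self.W = prior, 0, 0.0  # prior, visit count, total value

class MCTS:
    def __init__(self, root, c_puct=1.5):
        self.root, self.c_puct = root, c_puct

    def moveToLeaf(self):
        """Descend by the Q+U rule until an unexpanded node, recording the
        path of edges so backFill can retrace it."""
        node, path = self.root, []
        while node.edges:
            total = sum(e.N for e in node.edges)
            edge = max(node.edges, key=lambda e:
                       (e.W / e.N if e.N else 0.0)
                       + self.c_puct * e.P * math.sqrt(total + 1) / (1 + e.N))
            path.append(edge)
            node = edge.child
        return node, path

    def backFill(self, path, value):
        """Propagate the leaf evaluation back up the visited edges."""
        for edge in reversed(path):
            edge.N += 1
            edge.W += value
            value = -value  # flip perspective at each ply
```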

This is where you set the key parameters that influence the algorithm.
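
A config.py along these lines might read as follows; the parameter names and values are illustrative assumptions rather than the article's tuned settings.

```python
# config.py (sketch): the main knobs of the training loop.
EPISODES = 30            # self-play games generated per iteration
MCTS_SIMS = 50           # simulations per move (more = stronger, slower)
MEMORY_SIZE = 30000      # positions retained for retraining
BATCH_SIZE = 256         # minibatch size during retraining
TRAINING_LOOPS = 10      # gradient passes per iteration
EVAL_EPISODES = 20       # games in the best_player vs current_player match
SCORING_THRESHOLD = 1.3  # win ratio required to promote current_player

# Depth and filter count of the residual tower (see the model sketch above).
HIDDEN_CNN_LAYERS = [{"filters": 75, "kernel_size": (4, 4)} for _ in range(6)]
```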

Adjusting these variables will affect the running time, neural network accuracy, and overall success of the algorithm. The above parameters produce a high-quality Connect4 player, but take a long time to do so. To speed the algorithm up, try the following parameters instead.

Contains the playMatches and playMatchesBetweenVersions functions that play matches between two agents.

To play against your creation, run the following code (it's also in the run.ipynb notebook)

When you run the algorithm, all model and memory files are saved in the run folder, in the root directory.

To restart the algorithm from this checkpoint later, transfer the run folder to the run_archive folder, attaching a run number to the folder name. Then, enter the run number, model version number and memory version number into the initialise.py file, corresponding to the location of the relevant files in the run_archive folder. Running the algorithm as usual will then start from this checkpoint.
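
The checkpoint pointer might then look something like this; the variable names are assumptions based on the description above.

```python
# initialise.py (sketch): point the run at an archived checkpoint.
INITIAL_RUN_NUMBER = 1       # which run_archive folder to resume from
INITIAL_MODEL_VERSION = 49   # model checkpoint to load
INITIAL_MEMORY_VERSION = 49  # memory file to load
```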

An instance of the Memory class stores the memories of previous games, that the algorithm uses to retrain the neural network of the current_player.

This file contains a custom loss function that masks predictions from illegal moves before passing them to the cross-entropy loss function.
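
One common way to implement such masking, sketched here in TensorFlow (the details are assumptions, not the repository's exact code), is to push the logits of illegal moves toward negative infinity so they vanish from the softmax:

```python
import tensorflow as tf

def masked_policy_loss(y_true, y_pred):
    """y_true: MCTS visit probabilities, exactly zero on illegal moves.
    y_pred: raw policy logits. (Simplification: a legal move the search
    never visited would be masked too.)"""
    illegal = tf.equal(y_true, 0.0)
    masked_logits = tf.where(illegal, tf.fill(tf.shape(y_pred), -1e9), y_pred)
    return tf.nn.softmax_cross_entropy_with_logits(labels=y_true,
                                                   logits=masked_logits)
```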

The locations of the run and run_archive folders.

Log files are saved to the log folder inside the run folder.

To turn on logging, set the values of the logger_disabled variables to False inside this file.

Viewing the log files will help you to understand how the algorithm works and see inside its mind. For example, here is a sample from the logger.mcts file.

Equally from the logger.tourney file, you can see the probabilities attached to each move, during the evaluation phase:

Training over a couple of days produces the following chart of loss against mini-batch iteration number:

The top line is the error in the policy head (the cross entropy of the MCTS move probabilities, against the output from the neural network). The bottom line is the error in the value head (the mean squared error between the actual game value and the neural network's prediction of the value). The middle line is an average of the two.
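
These two curves correspond to the two data terms of the loss function in the AlphaGo Zero paper, which, including its L2 regularization term, reads

$$ \ell = (z - v)^2 \;-\; \pi^{\top} \log \mathbf{p} \;+\; c\,\lVert \theta \rVert^2 $$

where $z$ is the actual game outcome, $v$ the value-head prediction, $\pi$ the MCTS move probabilities, $\mathbf{p}$ the policy-head output, and $\theta$ the network weights.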

Clearly, the neural network is getting better at predicting the value of each game state and the likely next moves. To show how this results in stronger and stronger play, I ran a league between 17 players, ranging from the 1st iteration of the neural network, up to the 49th. Each pairing played twice, with both players having a chance to play first.

Here are the final standings:

Clearly, the later versions of the neural network are superior to the earlier versions, winning most of their games. It also appears that the learning hasn't yet saturated; with further training time, the players would continue to get stronger, learning more and more intricate strategies.

As an example, one clear strategy that the neural network has favoured over time is grabbing the centre column early. Observe the difference between the first version of the algorithm and, say, the 30th version:

1st neural network version

30th neural network version

This is a good strategy, as many lines require the centre column; claiming it early ensures your opponent cannot take advantage of this. This has been learnt by the neural network, without any human input.

There is a game.py file for a game called Metasquares in the games folder. This involves placing X and O markers in a grid to try to form squares of different sizes. Larger squares score more points than smaller squares and the player with the most points when the grid is full wins.

If you switch the Connect4 game.py file for the Metasquares game.py file, the same algorithm will learn how to play Metasquares instead.

Hopefully you found this article useful. Let me know in the comments below if you find any typos or have questions about anything in the codebase or article, and I'll get back to you as soon as possible.

Excerpt from:
How to build your own AlphaZero AI using Python and Keras