Archive for the ‘Alphazero’ Category

Facebook’s New Algorithm Can Play Poker And Beat Humans At It – Digital Information World

Have you ever imagined an AI playing poker against you? Facebook is set to make that a reality with its new general AI framework, Recursive Belief-based Learning (ReBeL), which can outperform humans at poker while using less domain knowledge than previous AI poker systems.

With ReBeL, Facebook is also aiming at multi-agent interactions, meaning general algorithms that can be deployed at large scale and in multi-agent settings. Potential applications range from auctions, negotiations, and cybersecurity to the operation of self-driving cars and trucks.

Facebook's plan to combine reinforcement learning with search for AI model training could lead to remarkable advances. Reinforcement learning involves agents learning to achieve goals by maximizing rewards, while search is the process of navigating from a start state to a goal state.

One example is DeepMind's AlphaZero, which used a similar combination to deliver state-of-the-art performance in board games like chess, shogi, and Go. The combination falls short, however, when applied to games like poker, because of their imperfect information: the value of any given action depends on the probability that it is chosen and, more generally, on the entire playing strategy.

Hence, proposing ReBeL as a solution to the problem, Facebook researchers have expanded the notion of game state to include each agent's belief about the state it is in while playing, taking into account common knowledge and the policies of the other players.

In operation, ReBeL trains two AI models: a value network and a policy network. Reinforcement learning is combined with search during self-play, which has produced a flexible algorithm with the potential to beat human players.

At a high level, ReBeL operates on public belief states rather than world states. If that surprises you, public belief states exist to generalize the notion of state value to games with imperfect information like poker. A PBS is best regarded as a common-knowledge probability distribution over a finite sequence of possible actions and states, which is also called a history.

Now, in perfect-information games, PBSs can be distilled down to histories, just as they distill down to world states in two-player zero-sum games. Not to forget that a PBS is, in effect, the array of decisions a player could make, and their possible outcomes, given a particular hand.

As soon as ReBeL starts work on a new game, it creates a subgame that is identical to the original one, except that it is rooted at the initial PBS. The algorithm then runs repeated iterations of an equilibrium-finding algorithm, taking advantage of the trained value network to estimate values at every iteration. Through reinforcement learning, the values discovered are added back to the value network as training examples, and the policies in the subgame are also added as examples for the policy network. The process then repeats, with each new PBS becoming the root of the next subgame, until a certain accuracy threshold is met.
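To make that loop concrete, here is a minimal sketch of the training cycle in Python-style pseudocode. Every name in it (build_subgame, solve_subgame, the network objects and their methods) is a hypothetical stand-in chosen for illustration, not Facebook's actual API:

```python
# Illustrative sketch of the ReBeL training loop described above.
# All identifiers here are hypothetical stand-ins, not Facebook's API.

def rebel_self_play(initial_pbs, value_net, policy_net, solver_iters):
    """Play one self-play episode starting from a public belief state."""
    pbs = initial_pbs
    while not pbs.is_terminal():
        # Build a subgame identical to the original game,
        # except that it is rooted at the current PBS.
        subgame = build_subgame(root=pbs)

        # Run an equilibrium-finding algorithm for several iterations,
        # using the trained value network to estimate leaf values.
        policy = solve_subgame(subgame, value_net, iterations=solver_iters)

        # The solved root value becomes a training example for the value
        # network; the subgame policy optionally trains the policy network.
        value_net.add_example(pbs, subgame.root_value(policy))
        policy_net.add_example(pbs, policy)

        # Sample an action and advance: the resulting PBS becomes
        # the root of the next subgame.
        pbs = pbs.transition(policy.sample_action())
```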

The researchers also benchmarked ReBeL, as part of their experiments, on games of heads-up no-limit Texas hold'em poker, Liar's Dice, and turn endgame hold'em. They used up to 128 PCs with eight graphics cards each to generate the simulated game data, and they randomized the bet and stack sizes (ranging from 5,000 to 25,000 chips) to test its abilities.

ReBeL was also tested against Dong Kim, one of the best heads-up poker players in the world. ReBeL played faster than two seconds per hand across 7,500 hands and never took more than five seconds for any decision. Overall, ReBeL scored 165 thousandths of a big blind per game, a strong result compared with the 147 thousandths achieved by Libratus, the social media giant's previous poker-playing system.

To prevent cheating, Facebook has decided not to release ReBeL's codebase for poker. The company has open-sourced only the Liar's Dice implementation, which according to the researchers is easier to understand and adjust.

Photo: Josh Edelson / Agence France-Presse / Getty Images



Survival of the Fattest: Macheide and Superman – TheArticle

When it comes to theories of evolution, there are broadly three sensible options: if you are British, especially if you are a National Treasure, such as Sir David Attenborough, then Charles Darwin is your natural selection as the explanation for the evolutionary process. If, however, you are French, then it boils down to two choices: Lamarckian gradualism or Cuvierian catastrophism.

In the first half of the 19th century, the French naturalist Baron Georges Cuvier (1769-1832) developed his theory of catastrophes. In fact, in keeping with the spirit of his times, this Master of Disaster preferred the term "revolution" to "catastrophe". Cuvier was immensely fat and thereby earned his nickname "The Mammoth", coincidentally a pachydermic palaeontological research sphere in which he excelled. Britain's greatest chess player, Nigel Short, once reduced me to helpless laughter by describing a certain, strikingly rotund, Soviet chess grandmaster as "spherical". By all accounts, Baron Cuvier was certainly in that league.

According to Cuvier, the fossil record demonstrates that species, both plant and animal, are destroyed time and time again by volcanic eruptions, meteoric bombardments, giant deluges and countless other shocks and natural cataclysms. In Cuvier's interpretation of the life cycle of the planet, new species evolve only after each catastrophe or revolution has been completed. Obvious examples are the Permian and Cretaceous mass extinctions, also the igneous convulsions of the Deccan Traps from the early Palaeogene geological period.

A trenchant critic of Cuvier's theory of cataclysms was his one-time mentor and fellow French academic, Jean-Baptiste Lamarck (1744-1829), who held that all living things had originated from simple organisms, and were thus inextricably related to each other. So far, so good.

The varying species were, according to Lamarck, simply the outcome of disparate environmental conditions: in Lamarck's view, intensive use of certain body parts would result in their reinforcement and, ipso facto, singular growth; their neglect, on the contrary, would lead to a reversal of their development, and eventual disappearance. The properties thus acquired, in the growth scenario, would be passed on to offspring.

Lamarck illustrated his theory by using the example of the evolution of giraffes. Giraffes must clearly have once had short-necked ancestors whose goal was to reach for the juiciest and highest leaves of certain trees. By constantly stretching their necks, those necks grew longer and longer, a property which they passed on to their offspring. The extended neck of the giraffe is said by Lamarck to have come into existence in this manner over many generations.

Unfortunately, Lamarck's explanation sounds too close to the controversial theory of the inheritance of acquired characteristics. The dramatic story about the viability of this explanation of evolution, with its various pros and cons, is related in Arthur Koestler's book The Case of the Midwife Toad (1971). Austrian scientist Paul Kammerer (1880-1926) sought to justify the theory that organisms may pass to their offspring characteristics acquired in their lifetime. Tragically, Kammerer committed suicide when his experiments were found to have been fraudulent, with Indian ink injected into certain parts of the anatomy of the amphibian in question, which, to add insult to injury, actually turned out to be a frog and not a toad at all.

If one is not convinced by Cuvier or Lamarck, that leaves only one winner, namely the British scientist who championed the survival of the fittest by the process of natural selection.

Charles Darwin (1809-1882) was convinced that features, such as the long neck of Artiodactyl ruminants, are the result of natural selection, according to conditions of existence. Accordingly, certain giraffes would have had longer necks through pure chance, and thereby enjoyed an advantage over other members of their species, in being able to reach formerly inaccessible sources of food. The animals passed on this accidental by-product of nature to their offspring, who, for their part, were better able to survive periods of food scarcity. Over geological time, the longer-necked giraffes survived and flourished. The shorter necks, unable to compete, died out.

Charles was influenced by his versatile grandfather Erasmus Darwin (1731-1802), a proto-evolutionist, physician, poet and slave trade abolitionist, whose surviving portraits indicate, like Cuvier, a man who, if not exactly spherical, was certainly of impressive girth.

Chess, too, has developed its own theory of evolution, principally through the work of the chess champion and philosopher Emanuel Lasker. Lasker was an intellectual titan, long-reigning World Chess Champion and friend of Albert Einstein (who wrote the foreword to Dr J. Hannak's biography of Lasker). Lasker developed his own concept of evolution, which he construed as strictly teleological. His independence of thought was admirable and even led him to challenge Einstein about the theory of relativity, much as Goethe had challenged Newton over light and colour in his Zur Farbenlehre.

Lasker (1868-1941) was one of the most dominant champions, and he is still regarded as one of the strongest players ever to grace a chessboard. In 1906 he published a booklet titled Kampf (Struggle), in which he attempted to create a general theory, relevant to all competitive activities, including chess, business and warfare. Lasker's philosophy can be summed up in this quotation from his writings: "By some ardent enthusiasts chess has been elevated into a science or an art. It is neither; but its principal characteristic seems to be what human nature mostly delights in: a fight."

Lasker advanced his idea of evolution based on struggle, by postulating the possibility of the macheide, meaning "son of battle", a being whose attributes are so sharpened by evolutionary struggle, that it always chooses the best and most efficient method of perpetuating its own success.

On the chessboard, for example, the macheide would always make the best move, which would lead (as one chess master remarked) to the sad result that after the first game between two macheides, chess would cease to exist.

What the macheide chiefly invokes to my mind is comparison with the evolutionary thought experiments of a slightly earlier German philosopher, Friedrich Nietzsche (1844-1900). In a particularly freezing winter at the end of 1969, after the end of the official term, I decided to stay on in my rooms at Trinity College Cambridge, adjacent to those of my fellow student H.R.H. The Prince of Wales, and read all of Nietzsche's works in the original German. What follows are the conclusions I reached from my snowbound ivory tower.

In my mind, Jackboot or Genius? was the chief question. Nietzsche has suffered from a bad press in England. This is in part due to his unfortunate, though unintentional, and certainly unpremeditated association with 1930s National Socialism in Germany, and consequently to an excoriating onslaught by Bertrand Russell in his A History of Western Philosophy.

Published soon after the Second World War, Russell's book unequivocally identified Nietzsche as a potent source for the ideological inspiration underpinning the beliefs of Adolf Hitler and the tenets of the Nazi party itself. Tellingly adopting the homely simile of a cricket match (what could be more English), Russell claimed that Nietzsche and his followers had had their innings, but were now to be ineluctably swept from the field.

Russell was not only misguided in savaging Nietzsche's reputation, he was also motivated by a crudely political agenda, which deliberately distorted the subtle message behind Nietzsche's philosophy. Russell was presumably blissfully unaware of the irony that during World War I, Hitler had formed his own cricket team to play against British prisoners of war.

Nietzsche expressed himself in metaphors, and his chief metaphor was the assertion that God is dead. In the Old Testament it is written that "without vision the people perish". This is normally interpreted either as prophecy or as an aspirational nostrum, centring on goal-driven targets. As Virgil put it in The Aeneid, book V: "Hos successus alit: possunt, quia posse videntur." "For those will conquer who believe they can," in Poet Laureate John Dryden's concise translation.

From its context in the Book of Proverbs, however, it is clear, to me at least, that the word "vision" is in fact a mistranslation for "supervision". The powerful Biblical idea which I believe is expressed here, is that human society cannot function properly unless it is well and firmly governed; and if the celestial governor has abdicated responsibility, or even worse, perished, then the world is in deep trouble.

Nietzsche did indeed fear that his world was in precisely that kind of deep trouble. With the erosion of belief in any kind of just or ruling deity, swept aside by 18th- and 19th-century advances in science and the creed of pseudo-rationality which reached its apogee with the French Revolution, what was to prevent great and terrible wars, horrific injustices and man's inhumanity to man from running rampant?

Indeed, with two world wars, a Nazi-orchestrated genocidal holocaust, and a couple of thermonuclear devices dropped in anger, all within the next half century to come, who could deny Nietzsche's prophecy?

Nietzsche's second great metaphor was the Eternal Recurrence (Ewige Wiederkehr). Nietzsche proposed a model of the universe that repeats itself identically in infinite iterations of recycling. What has happened now has happened an infinite number of times before and will continue to replicate itself infinitely into the future. Whether this theory is scientifically tenable is not the point: the metaphor is designed to convey meaninglessness and futility on a cosmic scale. What meaning can there be in an everlastingly recyclable universe, with no God supervising what is just and what is right? Evidently none: all is smoke, mirrors and pygmy hominid delusions of fake grandeur and puffed-up self-importance.

Enter the Ubermensch, or Superman! Nietzsche's Superman, his prototype version of Lasker's Macheide, is not some jack-booted fascist, oppressing lesser breeds who have failed a crudely misinterpreted test of Darwin's aphoristic formulation "Survival of the Fittest", but that person who can, with full spiritual conviction, continue to act and function as if there were meaning, even in a meaningless universe. It is the power of the brain, not of the boot, which Nietzsche extols, and Russell was surely aware of the stickiness of his wicket when he hypocritically consigned Nietzsche to the historical scrapheap of rejected and defeated Nazi detritus.

The tarnishing of Nietzsche by association with the Nazis was originally the work of his sister. Elizabeth Foerster-Nietzsche was an opportunistic right-wing fanatic who had tried, and failed, to establish a German extremist colony in the jungles of South America. It was she who took control of Nietzsche's literary heritage after his death and perverted the message of the Superman to fit the deranged fantasies of the new would-be world colossus, Adolf Hitler. The Great Dictator often visited the Nietzsche Museum to greet the sister of the Great Philosopher, but it is doubtful whether Hitler ever read a word of Nietzsche's, or would have understood it, even if he had.

The influences on Nietzsche are manifold and complex. In terms, though, of literary antecedents, it seems to me that the grand soliloquy in Shakespeare's Macbeth ("Tomorrow and tomorrow and tomorrow, creeps in this petty pace from day to day"), where the eponymous anti-hero fantasises on the futility of existence, but nevertheless decides to soldier on, must have played some part in Nietzsche's philosophical grounding. Even more likely is that Goethe's Faust epic, predicated on the morally, scientifically and religiously variegated career of a man who achieves salvation through personal striving ("Wer immer strebend sich bemueht, den koennen wir erloesen"), must have underpinned Nietzsche's concept of the Superman, as one who imbues existence with meaning, solely by the exertion of iron will and personal effort.

Nietzsche lived alone for much of his life, and died in mental isolation, immobile, insensate, and cared for by a sister who loathed him but loved the potential wealth and prestige concealed in his writings. The time has come, therefore, for those who have, at least, made some effort to understand him, to refurbish his reputation, and ensure that this, at any rate, no longer stands alone against artificially manufactured tides of prejudice and wilful misinterpretation.

Chess has now acquired its own evolutionary Macheide and Superman in the shape of Demis Hassabis's AlphaZero program. How long did it take for this to evolve into a chess super-brain? Just eight hours, during the course of which it grew in strength by playing billions of games against itself at the speed of light. This sounds more like a revolution of the rotund Baron Cuvier to me, rather than the alternative species of evolutionary gradualism promulgated by Darwin or Lamarck.

Take this week's game, AlphaZero (computer) vs Stockfish (computer), to which I also referred in my column, "Arise Sir Demis". Far from killing off chess, this reborn Macheide of the sixty-four squares has opened up remarkable new vistas for creative potential. Is it a surprise, therefore, that the words "Macheide" and "machine" appear to share the same root in Ancient Greek?

AlphaZero wins by breaking all human rules. It invests material for vague compensation; its queen dashes around the board, with illusory aimlessness, even visiting that Ultima Thule of the chessboard, h1, which one would have thought the least promising square from which a queen might launch an attack. Finally, while seriously in arrears on material, it positively encourages the exchange of pieces, ostensibly a suicidal decision. The final diabolical blow comes in this variation:

Diagram after White's 36th move.

Here is the critical diagram for the AlphaZero win. Black, who has been a knight ahead for a long time, developed the extra piece with Black's 36th move, ...Nd7, whereupon the following move, White's 37th, Rd1, pinning the knight, wins for White. But what if Black defends, instead of apparently blundering, by choosing an alternative and seemingly much safer knight move?

1...Na6

But now comes the supremely cunning manoeuvre:

2.Qe5+ Qf6

And the death blow.

3.Rh7+

Winning Black's queen and thus winning the game.

Evolution on the chessboard? If the Macheide or Uebermensch comes in the form of an AI program, capable of such rich beauty and astonishing depth, then I am all for it.

Game Changer (published by New in Chess) is the book by Matthew Sadler and Natasha Regan, which I extolled earlier this year. It is unrivalled as an account of the adventure of the creation of AlphaZero and it has gone on, deservedly, to scoop the Book of the Year awards from both the English and World Chess Federations.


Facebook develops AI algorithm that learns to play poker on the fly – VentureBeat

Facebook researchers have developed a general AI framework called Recursive Belief-based Learning (ReBeL) that they say achieves better-than-human performance in heads-up, no-limit Texas hold'em poker while using less domain knowledge than any prior poker AI. They assert that ReBeL is a step toward developing universal techniques for multi-agent interactions, in other words, general algorithms that can be deployed in large-scale, multi-agent settings. Potential applications run the gamut from auctions, negotiations, and cybersecurity to self-driving cars and trucks.

Combining reinforcement learning with search at AI model training and test time has led to a number of advances. Reinforcement learning is where agents learn to achieve goals by maximizing rewards, while search is the process of navigating from a start to a goal state. For example, DeepMind's AlphaZero employed reinforcement learning and search to achieve state-of-the-art performance in the board games chess, shogi, and Go. But the combinatorial approach suffers a performance penalty when applied to imperfect-information games like poker (or even rock-paper-scissors), because it makes a number of assumptions that don't hold in these scenarios. The value of any given action depends on the probability that it's chosen, and more generally, on the entire play strategy.

The Facebook researchers propose that ReBeL offers a fix. ReBeL builds on work in which the notion of game state is expanded to include the agents' beliefs about what state they might be in, based on common knowledge and the policies of other agents. ReBeL trains two AI models, a value network and a policy network, for the states through self-play reinforcement learning. It uses both models for search during self-play. The result is a simple, flexible algorithm the researchers claim is capable of defeating top human players at large-scale, two-player imperfect-information games.

At a high level, ReBeL operates on public belief states rather than world states (i.e., the state of a game). Public belief states (PBSs) generalize the notion of state value to imperfect-information games like poker; a PBS is a common-knowledge probability distribution over a finite sequence of possible actions and states, also called a history. (Probability distributions are specialized functions that give the probabilities of occurrence of different possible outcomes.) In perfect-information games, PBSs can be distilled down to histories, which in two-player zero-sum games effectively distill to world states. A PBS in poker is the array of decisions a player could make and their outcomes given a particular hand, a pot, and chips.
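To make the idea concrete, here is a toy sketch of the belief distribution at the heart of a PBS. The public history, the hands, and the probabilities are all invented for illustration; only the structure (a common-knowledge distribution over private possibilities, conditioned on what everyone has observed) reflects the definition above:

```python
# Toy illustration of a public belief state (PBS): from the public
# betting history alone, neither player knows the other's hole cards,
# so the state is a common-knowledge probability distribution over the
# possible private hands. All values here are made up.

public_history = ("deal", "bet 100", "call")  # actions everyone observed

# Probability distribution over player 1's possible private hands,
# conditioned on the public history (probabilities must sum to 1).
belief_over_hands = {
    ("Ah", "Ad"): 0.05,
    ("Kh", "Qh"): 0.20,
    ("9c", "9d"): 0.30,
    ("7s", "2c"): 0.45,
}
assert abs(sum(belief_over_hands.values()) - 1.0) < 1e-9
```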

Above: Poker chips.

Image Credit: Flickr: Sean Oliver

ReBeL generates a subgame at the start of each game that's identical to the original game, except it's rooted at an initial PBS. The algorithm wins it by running iterations of an equilibrium-finding algorithm and using the trained value network to approximate values on every iteration. Through reinforcement learning, the values are discovered and added as training examples for the value network, and the policies in the subgame are optionally added as examples for the policy network. The process then repeats, with the PBS becoming the new subgame root, until accuracy reaches a certain threshold.
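The article does not say which equilibrium-finding algorithm ReBeL runs inside each subgame, but a classic representative of the family is regret matching. The sketch below applies it in self-play to rock-paper-scissors, the imperfect-information toy game mentioned earlier; over many iterations, each player's average strategy converges toward the game's equilibrium of one third per move:

```python
import random

# Regret matching in self-play on rock-paper-scissors: one standard
# equilibrium-finding method, shown purely as an illustration (not the
# specific solver ReBeL uses).

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # PAYOFF[mine][theirs]

def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [1.0 / ACTIONS] * ACTIONS

regrets = [[0.0] * ACTIONS for _ in range(2)]
strategy_sums = [[0.0] * ACTIONS for _ in range(2)]

for _ in range(20000):
    strategies = [strategy_from_regrets(r) for r in regrets]
    moves = [random.choices(range(ACTIONS), weights=s)[0] for s in strategies]
    for p in range(2):
        mine, theirs = moves[p], moves[1 - p]
        earned = PAYOFF[mine][theirs]
        for a in range(ACTIONS):
            # Regret of action a: what it would have earned minus what we earned.
            regrets[p][a] += PAYOFF[a][theirs] - earned
        strategy_sums[p] = [s + x for s, x in zip(strategy_sums[p], strategies[p])]

# The *average* strategy over iterations approaches the Nash equilibrium,
# which for rock-paper-scissors is (1/3, 1/3, 1/3).
for p in range(2):
    total = sum(strategy_sums[p])
    print([round(s / total, 3) for s in strategy_sums[p]])
```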

In experiments, the researchers benchmarked ReBeL on games of heads-up no-limit Texas hold'em poker, Liar's Dice, and turn endgame hold'em, which is a variant of no-limit hold'em in which both players check or call for the first two of four betting rounds. The team used up to 128 PCs with eight graphics cards each to generate simulated game data, and they randomized the bet and stack sizes (from 5,000 to 25,000 chips) during training. ReBeL was trained on the full game and had $20,000 to bet against its opponent in endgame hold'em.

The researchers report that against Dong Kim, who's ranked as one of the best heads-up poker players in the world, ReBeL played faster than two seconds per hand across 7,500 hands and never needed more than five seconds for a decision. In aggregate, they said it scored 165 (with a standard deviation of 69) thousandths of a big blind (forced bet) per game against the humans it played, compared with Facebook's previous poker-playing system, Libratus, which maxed out at 147 thousandths.

For fear of enabling cheating, the Facebook team decided against releasing the ReBeL codebase for poker. Instead, they open-sourced their implementation for Liar's Dice, which they say is also easier to understand and can be more easily adjusted. "We believe it makes the game more suitable as a domain for research," they wrote in a preprint paper. "While AI algorithms already exist that can achieve superhuman performance in poker, these algorithms generally assume that participants have a certain number of chips or use certain bet sizes. Retraining the algorithms to account for arbitrary chip stacks or unanticipated bet sizes requires more computation than is feasible in real time. However, ReBeL can compute a policy for arbitrary stack sizes and arbitrary bet sizes in seconds."


AlphaZero | Papers With Code

Convex Regularization in Monte-Carlo Tree Search. Tuan Dam, Carlo D'Eramo, Jan Peters, Joni Pajarinen. 2020-07-01
Aligning Superhuman AI and Human Behavior: Chess as a Model System. Reid McIlroy-Young, Siddhartha Sen, Jon Kleinberg, Ashton Anderson. 2020-06-02
Think Neither Too Fast Nor Too Slow: The Computational Trade-off Between Planning And Reinforcement Learning. Thomas M. Moerland, Anna Deichler, Simone Baldi, Joost Broekens, Catholijn M. Jonker. 2020-05-15
Neural Machine Translation with Monte-Carlo Tree Search. Jerrod Parker, Jerry Zikun Chen. 2020-04-27
Warm-Start AlphaZero Self-Play Search Enhancements. Hui Wang, Mike Preuss, Aske Plaat. 2020-04-26
Accelerating and Improving AlphaZero Using Population Based Training. Ti-Rong Wu, Ting-Han Wei, I-Chen Wu. 2020-03-13
Learning to Resolve Alliance Dilemmas in Many-Player Zero-Sum Games. Edward Hughes, Thomas W. Anthony, Tom Eccles, Joel Z. Leibo, David Balduzzi, Yoram Bachrach. 2020-02-27
Polygames: Improved Zero Learning. Tristan Cazenave, Yen-Chi Chen, Guan-Wei Chen, Shi-Yu Chen, Xian-Dong Chiu, Julien Dehos, Maria Elsa, Qucheng Gong, Hengyuan Hu, Vasil Khalidov, Cheng-Ling Li, Hsin-I Lin, Yu-Jin Lin, Xavier Martinet, Vegard Mella, Jeremy Rapin, Baptiste Roziere, Gabriel Synnaeve, Fabien Teytaud, Olivier Teytaud, Shi-Cheng Ye, Yi-Jun Ye, Shi-Jim Yen, Sergey Zagoruyko. 2020-01-27
Three-Head Neural Network Architecture for AlphaZero Learning. Anonymous. 2020-01-01
Self-Play Learning Without a Reward Metric. Dan Schmidt, Nick Moran, Jonathan S. Rosenfeld, Jonathan Rosenthal, Jonathan Yedidia. 2019-12-16
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model. Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver. 2019-11-19
Multiplayer AlphaZero. Nick Petosa, Tucker Balch. 2019-10-29
Exploring the Performance of Deep Residual Networks in Crazyhouse Chess. Sun-Yu Gordon Chi. 2019-08-25
Performing Deep Recurrent Double Q-Learning for Atari Games. Felipe Moreno-Vera. 2019-08-16
Multiple Policy Value Monte Carlo Tree Search. Li-Cheng Lan, Wei Li, Ting-Han Wei, I-Chen Wu. 2019-05-31
Learning Compositional Neural Programs with Recursive Tree Search and Planning. Thomas Pierrot, Guillaume Ligner, Scott Reed, Olivier Sigaud, Nicolas Perrin, Alexandre Laterre, David Kas, Karim Beguir, Nando de Freitas. 2019-05-30
Deep Policies for Width-Based Planning in Pixel Domains. Miquel Junyent, Anders Jonsson, Vicenç Gómez. 2019-04-12
Improved Reinforcement Learning with Curriculum. Joseph West, Frederic Maire, Cameron Browne, Simon Denman. 2019-03-29
Hyper-Parameter Sweep on AlphaZero General. Hui Wang, Michael Emmerich, Mike Preuss, Aske Plaat. 2019-03-19
α-Rank: Multi-Agent Evaluation by Evolution. Shayegan Omidshafiei, Christos Papadimitriou, Georgios Piliouras, Karl Tuyls, Mark Rowland, Jean-Baptiste Lespiau, Wojciech M. Czarnecki, Marc Lanctot, Julien Perolat, Remi Munos. 2019-03-04
Accelerating Self-Play Learning in Go. David J. Wu. 2019-02-27
ELF OpenGo: An Analysis and Open Reimplementation of AlphaZero. Yuandong Tian, Jerry Ma, Qucheng Gong, Shubho Sengupta, Zhuoyuan Chen, James Pinkerton, C. Lawrence Zitnick. 2019-02-12
The Entropy of Artificial Intelligence and a Case Study of AlphaZero from Shannon's Perspective. Bo Zhang, Bin Chen, Jin-lin Peng. 2018-12-14
Assessing the Potential of Classical Q-learning in General Game Playing. Hui Wang, Michael Emmerich, Aske Plaat. 2018-10-14
ExIt-OOS: Towards Learning from Planning in Imperfect Information Games. Andy Kitchen, Michela Benedetti. 2018-08-30
Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization. Alexandre Laterre, Yunguan Fu, Mohamed Khalil Jabri, Alain-Sam Cohen, David Kas, Karl Hajjar, Torbjorn S. Dahl, Amine Kerkeni, Karim Beguir. 2018-07-04
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis. 2017-12-05


AlphaZero learns to play the game at the highest level

A group of scientists from DeepMind and University College London has developed an artificial intelligence able to teach itself to play, and improve at, three challenging board games. In their paper, published in the journal Science, the researchers describe their new system and explain why they think it is a big step toward the development of future AI systems.

Twenty years have passed since the supercomputer Deep Blue defeated world chess champion Garry Kasparov and showed the world how far computation in the field of AI had advanced. Since then computers have become smarter, and today they beat people at games such as chess, shogi and Go. However, each of those programs was tuned specifically to become a master of a single game. In their new work, the researchers describe the creation of an artificial intelligence that is not only good at several games, but also teaches itself how to improve.

The new system, called AlphaZero, is a reinforcement learning system: it learns by repeatedly playing the game and drawing on its own experience, which is of course very similar to the way people learn. Given a basic set of rules, the computer plays the game against itself; it does not even need partners. It plays itself many times, noting the good and victorious moves, and over time it gets better and better, surpassing not only people but also other AI systems designed for board games. The system also uses a technique called Monte Carlo tree search; the combination of the two technologies allows it to learn how to improve at the game. The scientists tested the program's strength by providing it with large computing capacity: 5,000 tensor processing units paired with a large supercomputer.
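For a flavour of how the Monte Carlo tree search part works, the sketch below shows the move-selection rule reported in the AlphaZero papers (a PUCT variant): each candidate move is scored by its average result so far plus an exploration bonus weighted by the neural network's prior probability for that move. The function names, the node attributes, and the constant are illustrative, not DeepMind's code:

```python
import math

# Sketch of the PUCT selection rule used in AlphaZero-style Monte Carlo
# tree search: the search repeatedly descends into the child node with
# the highest score, balancing known-good moves against unexplored ones.

def puct_score(value_sum, visits, prior, parent_visits, c=1.5):
    """Score one candidate move at a tree node."""
    q = value_sum / visits if visits > 0 else 0.0             # average result so far
    u = c * prior * math.sqrt(parent_visits) / (1 + visits)   # exploration bonus
    return q + u

def select_move(children, parent_visits):
    """Pick the child (move) with the highest PUCT score.

    Each child is assumed to expose value_sum, visits, and prior
    (hypothetical attributes for this illustration)."""
    return max(
        children,
        key=lambda ch: puct_score(ch.value_sum, ch.visits, ch.prior, parent_visits),
    )
```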

So far, AlphaZero has mastered chess, shogi and Go; the next step will be popular video games. As for performance, AlphaZero beat the legendary AlphaGo after just 30 hours of training, for example.

