How does AlphaZero learn chess? - Chess.com

AlphaZero's learning process is, to some extent, similar to that of humans. A new paper from DeepMind, with a contribution from the 14th world chess champion Vladimir Kramnik, provides strong evidence for the existence of human-understandable concepts in AlphaZero's network, even though AlphaZero has never seen a single human chess game.

How does AlphaZero learn chess? Why does it play certain moves? What value does it assign to concepts such as king safety or piece activity? How does it learn openings, and how does that differ from the way humans developed opening theory?

Questions like these are addressed in a fascinating new paper from DeepMind, titled Acquisition of Chess Knowledge in AlphaZero. It was written by Thomas McGrath, Andrei Kapishnikov, Nenad Tomasev, Adam Pearce, Demis Hassabis, Been Kim and Ulrich Paquet, in collaboration with Kramnik. This is the second collaboration between DeepMind and Kramnik, following last year's research in which they used AlphaZero to explore the design of chess variants with new rules.

Encoding human conceptual knowledge

In their latest paper, the researchers tried a method of encoding human conceptual knowledge, to determine how well the AlphaZero network represents human chess concepts. Examples of such concepts are the bishop pair, material (im)balance, and king safety or activity. What these concepts have in common is that they are pre-specified functions, each encapsulating a particular piece of domain-specific knowledge.

Some of these concepts were carried over from Stockfish 8's evaluation function, such as material, imbalance, mobility, king safety, threats, passed pawns, and space. Stockfish 8 uses them as sub-functions whose individual scores are combined into an "overall" evaluation, reported as a continuous value such as "0.25" (a slight advantage for White) or "-1.48" (a large advantage for Black). Note that newer versions of Stockfish have adopted neural networks of their own, but those were not used for this paper.
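To make this concrete, a classical evaluation of this kind is essentially a sum of sub-function scores. The sketch below is a toy model, not Stockfish's actual code, and the numbers are invented; it only illustrates how individual concept scores combine into one continuous overall rating:

```python
# Toy sketch of a classical chess evaluation (NOT Stockfish's real code).
# Each sub-function returns a score in pawns from White's point of view;
# the position and its values below are invented for illustration.
def material(pos):     return pos["material"]
def king_safety(pos):  return pos["king_safety"]
def passed_pawns(pos): return pos["passed_pawns"]

SUB_FUNCTIONS = [material, king_safety, passed_pawns]

def evaluate(pos):
    """Overall score: the sum of sub-function scores. A result like
    +0.25 means a slight White advantage; -1.48 a large Black advantage."""
    return sum(f(pos) for f in SUB_FUNCTIONS)

# A hypothetical position: White is a pawn up but has a shaky king.
pos = {"material": 1.0, "king_safety": -0.6, "passed_pawns": 0.1}
print(evaluate(pos))  # prints a small positive score: White slightly better
```

Because each sub-score is computed separately, a probe can ask whether AlphaZero's network internally represents any one of these concepts on its own, not just the final sum.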

A third category encompasses more specific, lower-level features, such as the existence of forks and pins or the contesting of open files, as well as a set of features describing pawn structure.

After establishing this wide range of human concepts, the researchers' next step was to try to find them in AlphaZero's network, for which they used a sparse linear regression model. They then visualized the learning of human concepts using what they call "what-when-where" plots: what concept is learned, when in training time, and where in the network.
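The probing idea can be sketched as follows. This is an illustrative toy example, not DeepMind's code: we invent random "activations", plant a concept linearly in a few units, then fit an L1-regularized (sparse) linear probe. If the probe explains most of the variance with few non-zero weights, that is evidence the layer linearly represents the concept.

```python
# Toy sketch of sparse linear probing for a concept (NOT DeepMind's code).
# Assumes that for a set of positions we have one layer's activations and
# a scalar concept value (e.g. a "material" sub-score) per position.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

n_positions, n_units = 500, 64                      # invented toy sizes
activations = rng.normal(size=(n_positions, n_units))

# Pretend the concept is encoded linearly in three units, plus noise.
true_weights = np.zeros(n_units)
true_weights[[3, 17, 42]] = [1.0, -0.5, 2.0]
concept_values = activations @ true_weights + 0.01 * rng.normal(size=n_positions)

# The L1 penalty keeps the probe sparse: a high R^2 achieved with few
# active weights suggests the concept is linearly decodable from the layer.
probe = Lasso(alpha=0.01).fit(activations, concept_values)
r2 = probe.score(activations, concept_values)
n_active = int(np.sum(probe.coef_ != 0))
print(f"R^2 = {r2:.3f} with {n_active} non-zero weights out of {n_units}")
```

Repeating such a fit for every concept, every training checkpoint, and every layer yields exactly the "what-when-where" picture the paper describes.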

According to the researchers, AlphaZero does develop close representations of a number of human concepts during training, including accurate position evaluation, potential threats and their consequences, and position-specific features.

An interesting result concerns material imbalance. As demonstrated in Matthew Sadler and Natasha Regan's award-winning book, Game Changer: AlphaZero's Groundbreaking Chess Strategies and the Promise of AI (New In Chess, 2019), AlphaZero appears to view material imbalance differently than Stockfish 8. The paper gives empirical evidence that this is indeed the case: AlphaZero initially "follows" Stockfish 8's material evaluation as it learns, before diverging from it at some point.

Piece values & material

The next step for the researchers was to relate human concepts to AlphaZero's value function. One of the first concepts they looked at was the value of the pieces, a concept every beginner learns when starting to play chess. Typical values are nine for a queen, five for a rook, three for a bishop or knight, and one for a pawn. The left figure below (taken from the paper) shows the evolution of the piece weights during AlphaZero's training, with the values converging toward the commonly accepted ones.

The image on the right shows that during AlphaZero's training, material becomes increasingly important in the early stages of learning (consistent with human learning) but plateaus at a certain point; the values of more subtle concepts such as mobility and king safety then become more important, while the importance of material actually decreases.

AlphaZero's training vs. historical human knowledge


Another part of the paper is devoted to comparing AlphaZero's training with the progression of human knowledge throughout history. The researchers point out that there is a stark difference between the progression of AlphaZero's move preferences over the course of its training stages and what is known about the progression of human understanding of chess since the 15th century:

AlphaZero starts out with a uniform opening book, allowing it to explore all options equally, and narrows down plausible options greatly over time. Human games recorded over the past five centuries indicate an opposite pattern: an overwhelming initial preference for 1.e4, with an expansion of plausible options over time.

The researchers compared the games AlphaZero plays against itself with a large sample from ChessBase's Mega Database, ranging from games played in 1475 through the 21st century.

Initially, humans played 1.e4 almost exclusively; 1.d4 became more popular in the early 20th century, soon followed by the growing popularity of more flexible systems such as 1.c4 and 1.Nf3. AlphaZero, on the other hand, tries a wide range of opening moves early in its training before it begins to favor the "main" moves.

The Berlin Variation of the Ruy Lopez

A more specific example concerns the Berlin Variation of the Ruy Lopez (the move 3...Nf6 after 1.e4 e5 2.Nf3 Nc6 3.Bb5), which only became popular at the top level at the start of the 21st century, after Kramnik successfully used it in his World Championship match against Garry Kasparov in 2000. Before that, it was considered somewhat passive and slightly better for White, with the move 3...a6 being preferred.

The researchers write:

Looking back, it took some time for human opening theory to fully appreciate the advantages of the Berlin Defense and to establish effective ways of playing the position as Black. AlphaZero, on the other hand, develops a preference for this line quite quickly, after having mastered the basic concepts of the game. This already highlights a noticeable difference between how openings evolve for humans and for the machine.

Remarkably, when different versions of AlphaZero are trained from scratch, half of them strongly prefer 3...a6, while the other half strongly prefer 3...Nf6! This is interesting because it means there is no "one-size-fits-all" strong chess player. The following table shows the preferences of four different AlphaZero neural networks:

          AZ version 1   AZ version 2   AZ version 3   AZ version 4
3...Nf6   5.50%          92.80%         88.90%         7.70%
3...a6    89.20%         2.00%          4.60%          85.80%
3...Bc5   0.70%          0.80%          1.30%          1.30%

AlphaZero's network preferences after 1.e4 e5 2.Nf3 Nc6 3.Bb5, for four different training runs of the system (four different versions of AlphaZero). Preferences are measured after one million training steps. Sometimes AlphaZero converges to a player that prefers 3...a6, and sometimes to a player that prefers to respond with 3...Nf6.

In a similar vein, AlphaZero develops its own "theory" across a much wider range of openings during its training. At some point, 1.d4 and 1.e4 are discovered to be good first moves and are quickly adopted. Similarly, AlphaZero's preferred continuation after 1.e4 e5 is settled within another short time window. The figure below illustrates how 2.d4 and 2.Nf3 are both quickly learned as reasonable moves for White, but 2.d4 is then abandoned almost as quickly in favor of 2.Nf3 as the standard continuation.

Kramnik's qualitative assessment

Kramnik's contribution to the paper is a qualitative assessment, an attempt to identify themes and differences in AlphaZero's playing style at different stages of its training. The 14th world champion was given sample games from four different training stages to review.

According to Kramnik, early in its training AlphaZero has "a crude understanding of material value and fails to properly evaluate material in complex positions. This leads to potentially undesirable exchange sequences, and ultimately to games lost for material reasons." In the second stage, AlphaZero appears to have "a strong understanding of material value, allowing it to exploit the weaknesses in material evaluation" of the first version.

In the third stage, Kramnik believes that AlphaZero has a better understanding of king safety in unbalanced positions. This manifests as the second version "potentially underestimating the third version's long-range attacks and sacrifices, as well as the second version overestimating its own attacks, resulting in losing positions."

In its fourth stage of training, AlphaZero has a "much deeper understanding" of which attacks will succeed and which will fail. Kramnik remarks that it sometimes accepts the sacrifices played by the "third version," defends accurately, keeps the material advantage, and ends up winning.

Another point made by Kramnik, echoing how humans learn chess, is that tactical skills seem to precede positional skills in AlphaZero's learning. By generating self-play games from distinct opening sets (e.g., the Berlin and the Queen's Gambit Declined in the "positional" set, the Najdorf and the King's Indian in the "tactical" set), the researchers gather circumstantial evidence for this, but note that further work is needed to understand the order in which skills are acquired.

Implications outside of chess

For a long time it was believed that machine learning systems learn uninterpretable representations that have little in common with the human understanding of the domain they are trained on. In other words, what the AI teaches itself and what it learns is mostly gibberish to humans.

In their latest paper, the researchers provided strong evidence for the existence of human-understandable concepts in an AI system that has not been exposed to human-generated data. AlphaZero's network shows the use of human concepts, even though AlphaZero has never seen a human chess game.

This could have implications outside of the chess world. The researchers conclude:

The fact that human concepts can be found even in a superhuman self-trained system expands the range of systems in which we should expect to find human-understandable concepts. We believe that the ability to find human-understandable concepts in the AZ network indicates that closer examination will reveal more.

Co-author Nenad Tomasev commented to Chess.com that, for him personally, it was really interesting to consider whether there is such a thing as a "natural" progression of chess theory:

Even in the human context - if we were to "start over" and go back in time - would chess theory have evolved in the same way? There have been a number of important schools of thought regarding the overall understanding of chess principles and middlegame positions: the importance of dynamism versus structure, material versus sacrificial attacks, material imbalance, the importance of space versus the hypermodern school, which invites overextension in order to counterattack, etc. This also influenced the openings that were played. Looking at this progression, what remains uncertain is whether it would happen the same way again. Perhaps certain elements of chess knowledge and certain perspectives are simply easier and more natural for the human mind to grasp and formulate? Maybe the process of refining and expanding this knowledge has a linear trajectory, or maybe not? We can't really replay history, so we can only speculate about what the answer might be.

However, when it comes to AlphaZero, we can retrain it many times and compare the results to what we've seen previously in human play. So we can use AlphaZero as a petri dish for this question, looking at how it acquires knowledge about the game. It turns out that there are both similarities and dissimilarities in the way it constructs its understanding of the game compared to human history. Also, while there is some level of stability (results agree between different training runs), it is by no means absolute (sometimes the training progression looks a little different, and different opening lines end up being preferred).

This is by no means a definitive answer to what is, to me personally, a fascinating question. There is still a lot to think about. However, we hope that our results offer some interesting perspective and allow us to start thinking a little more deeply about how we learn, grow, and improve - about the very nature of intelligence and how it changes from a blank slate to a deep understanding of a very complex domain like chess.

Kramnik commented for Chess.com:

“There are two main things that we can try to find out with this work. The first is: how does AlphaZero learn chess, how does it improve? If one day we come to fully understand it, maybe we can apply it to the process of human learning.

Secondly, I think it's quite fascinating to discover that there are certain patterns that AlphaZero finds meaningful that actually make little sense to humans. This is my impression. This is a subject for further investigation; in fact, I think it could easily be shown that we are missing some very important patterns in chess, because after all AlphaZero is so strong that if it uses those patterns, I guess they make sense. It's also a very interesting and fascinating subject to explore whether our way of learning chess, of getting better at chess, is actually quite limited. We can expand it a bit with the help of AlphaZero, by understanding how it sees chess."