Based on a discussion of surveys by M. Kandori (on evolution) and R. Marimon (on learning) presented at the 1995 World Econometric Society Meetings in Tokyo, Japan. Extensive references to the research discussed below can be found in those surveys.

Remarks on Evolution and Learning

David K. Levine

September 14, 1995

Kandori presented an extensive survey on evolution; Marimon, a survey on learning. In this discussion I want to try to connect the two topics. To begin with, any model of learning must face a basic problem of causality. If other people are learning or evolving, a patient player should try to teach them to play the way he wants them to. This issue of causality is related to what Marimon calls "feedback." Ordinarily the issue of causality is finessed by assuming a large population that interacts anonymously, so that no single player can influence what others learn. Fortunately, this is the most relevant case in economics. However, I should emphasize the tremendous scope for research in the non-anonymous case; some work in this direction is discussed by Marimon when he talks about the pattern recognition problem.

By their nature, evolutionary models deal only with the large anonymous case, so I will focus the remainder of the discussion on this case. The first half of Kandori's survey focuses on evolutionarily stable strategies and related notions. Two major dynamics are discussed: the replicator dynamic and the best-response dynamic. The difference between the two helps demonstrate the difference between evolutionary and learning models. The evolutionary model deals directly with the behavior of entire populations, without inquiring why individuals in the population behave the way they do, or whether their behavior makes sense. The replicator dynamic is prototypical of such a model: it assumes that any strategy doing better than average grows, even if it does only slightly better than average. From an individual perspective this does not make a great deal of sense: if a player is going to change his strategy, why switch to something mediocre? Even from this crude perspective, the best-response dynamic seems more sensible, since at least players switch to the best available strategy.
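To make the contrast concrete, here is a minimal Python sketch, assuming a single population matched at random, a payoff matrix A (A[i][j] is the payoff to strategy i against strategy j), and a simple Euler discretization with step dt. The function names and the example game are hypothetical illustrations, not taken from Kandori's survey. In the example, strategy 0 is always best, strategy 1 is mediocre but above the population average, and strategy 2 is worst.

```python
def payoffs(A, x):
    """Expected payoff of each pure strategy against population state x (random matching)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def replicator_step(A, x, dt):
    """One Euler step of the replicator dynamic: dx_i/dt = x_i * (u_i - average payoff)."""
    u = payoffs(A, x)
    avg = sum(xi * ui for xi, ui in zip(x, u))
    return [xi + dt * xi * (ui - avg) for xi, ui in zip(x, u)]


def best_response_step(A, x, dt):
    """One Euler step of the best-response dynamic: dx/dt = BR(x) - x (ties go to the lowest index)."""
    u = payoffs(A, x)
    best = max(range(len(u)), key=lambda i: u[i])
    return [xi + dt * ((1.0 if i == best else 0.0) - xi) for i, xi in enumerate(x)]


# Hypothetical example: payoffs do not depend on the opponent, to isolate the point.
A = [[3, 3, 3],   # strategy 0: always best
     [2, 2, 2],   # strategy 1: above average, but not best
     [0, 0, 0]]   # strategy 2: worst
x0 = [1 / 3, 1 / 3, 1 / 3]

print(replicator_step(A, x0, 0.1)[1] > x0[1])     # True: the mediocre strategy grows
print(best_response_step(A, x0, 0.1)[1] < x0[1])  # True: the mediocre strategy shrinks
```

Starting from equal shares, the replicator step raises the share of strategy 1 because its payoff (2) exceeds the population average (5/3), while the best-response step shrinks it because everyone who adjusts moves toward strategy 0.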

It is possible to defend the replicator dynamic. The most sensible argument is that information is local, in the sense that new players inherit information from a limited number of old players. However, the exact information received from old players matters. If, for example, new players learn how the old players' opponents played, they should construct an estimate of the population frequency of opponents' play and play a best response to it, rather than simply adopt the best strategy that was used by the old players.

The point is not that there is no rationalization of the replicator dynamic. The issue is really about the right way of doing research. Do we start with the replicator dynamic, and argue backwards to try to find stories that give rise to it? Or do we take a more positive approach, and begin by specifying the primitives of the economy: the players in the game, their information, their behavior, and so forth, and then try to derive conclusions about the resulting dynamics. My sense is that economics in the last 20 years has moved strongly away from the former strategy to the latter, and with good reason.

By way of contrast, the best-response dynamic arises rather naturally from a particular economic model of learning, and in my view it is more important and significant for economics. We can begin from a learning perspective by observing that fictitious play (or some slightly randomized version of it) has good properties as a method of learning from an individual perspective: players do as well as if they had known the frequency of opponents' play in advance. When all players learn this way, then in the long run, if we measure time appropriately, the dynamical system asymptotes to the best-response dynamic. While the standard view in evolutionary theory is that the replicator and best-response dynamics are on a more or less equal footing, from the perspective of an economist I find the best-response dynamic far more interesting.
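A minimal sketch of classic (unsmoothed) fictitious play in Python, assuming two players with known payoff matrices and integer "prior" counts of the opponent's past actions. The smoothed or randomized variant that has the good individual properties mentioned above would replace the deterministic argmax with a probabilistic choice; this sketch does not do that. Function names and the example game are hypothetical.

```python
def best_reply(U, counts):
    """Best reply to the empirical frequencies `counts` of the opponent's past actions.
    U[i][j] is this player's payoff from action i against opponent action j."""
    total = sum(counts)
    beliefs = [c / total for c in counts]
    expected = [sum(u * b for u, b in zip(row, beliefs)) for row in U]
    return max(range(len(expected)), key=lambda i: expected[i])


def fictitious_play(A, B, rounds, prior1, prior2):
    """Classic two-player fictitious play.
    A[i][j], B[i][j]: payoffs to players 1 and 2 when 1 plays i and 2 plays j.
    prior1: player 1's initial counts of player 2's actions; prior2: the reverse."""
    counts_seen_by_1 = list(prior1)
    counts_seen_by_2 = list(prior2)
    B_T = [list(col) for col in zip(*B)]  # player 2's payoffs indexed [own action][opponent action]
    history = []
    for _ in range(rounds):
        a1 = best_reply(A, counts_seen_by_1)
        a2 = best_reply(B_T, counts_seen_by_2)
        counts_seen_by_1[a2] += 1  # each player records what the other just did
        counts_seen_by_2[a1] += 1
        history.append((a1, a2))
    return history


# Hypothetical coordination game: (0, 0) pays 2 each, (1, 1) pays 1 each, miscoordination pays 0.
A = [[2, 0], [0, 1]]
B = [[2, 0], [0, 1]]
print(fictitious_play(A, B, 5, [1, 3], [1, 3]))  # [(1, 1), (1, 1), (1, 1), (1, 1), (1, 1)]
print(fictitious_play(A, B, 5, [3, 1], [3, 1]))  # [(0, 0), (0, 0), (0, 0), (0, 0), (0, 0)]
```

Each player simply best responds to the historical frequency of the other's play. The example also shows that where play settles can depend on initial beliefs: pessimistic priors lock in the inefficient equilibrium.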

I should point out that many results are robust to the particular dynamic, since in all cases we agree that strategies that are doing poorly should die out. This offsets the criticism somewhat. For this reason, and because much of the discussion revolves around the best-response dynamic, many of the results Kandori discusses are quite relevant for economics, even though evolutionarily stable strategies have not established themselves as being of great interest to economists. Of particular importance in Kandori's discussion is the issue of single versus multiple populations, as this arises in any large anonymous model, regardless of the behavior of the players.

The second half of Kandori's survey moves away from deterministic models to examine explicitly random models. In fact, most of these models examine the best-response dynamic. More to the point, I would argue that random behavior is very important in economics. To Kandori's discussion I would add the importance of considering the sources of randomness. I would not view random behavior as a "mutation." There are, however, three strong rationales for players playing randomly. First, there is the random utility model used in the theory of purification of mixed-strategy equilibria; this model has also been widely used in empirical work. Second, in an extensive-form setting, patient players must experiment with different strategies to explore how their opponents will respond in different parts of the game tree. Finally, random play can serve as insurance against manipulation by a clever opponent.
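One common way to make the random utility model concrete, and the standard one in empirical discrete-choice work, assumes an independent extreme-value (Gumbel) shock to each action's payoff; choice probabilities then take the logit form. The sketch below is a minimal Python illustration under that assumption (the scale parameter lam, the function names, and the example payoffs are hypothetical): it computes the closed form and checks it by simulating players who best respond to their own perturbed payoffs, so that the randomness comes from an explicit model of behavior.

```python
import math
import random


def logit_probs(utilities, lam):
    """Closed-form choice probabilities when each action's payoff is perturbed by
    an independent Gumbel (type-I extreme value) shock with scale lam."""
    m = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp((u - m) / lam) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def simulate_random_utility(utilities, lam, draws, seed=0):
    """Monte Carlo check: each draw perturbs every payoff, then picks the best perturbed action."""
    rng = random.Random(seed)
    counts = [0] * len(utilities)
    for _ in range(draws):
        # If E ~ Exponential(1), then -log(E) is a standard Gumbel draw.
        perturbed = [u - lam * math.log(rng.expovariate(1.0)) for u in utilities]
        counts[max(range(len(perturbed)), key=lambda i: perturbed[i])] += 1
    return [c / draws for c in counts]


# Hypothetical expected payoffs of two actions given the player's beliefs.
u = [1.0, 0.0]
print(logit_probs(u, 1.0))                       # approx [0.731, 0.269]
print(simulate_random_utility(u, 1.0, 100_000))  # close to the closed form
```

Other shock distributions give other choice rules; the logit form is a consequence of the Gumbel assumption, not of random utility as such.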

Once again, the point I would like to emphasize is that the particular source of randomness matters in modeling. In other words, I do not think it reasonable to say "there are three good reasons why play might be random, and this justifies my sticking a random error term into the model, without reference to any particular model of random play." One of the primitives of the model should be the behavior of players, and this may be random if there is good reason.

I want to conclude by asking what we might hope to learn from theories of learning and evolution, and what we have learned so far. I think it is unrealistic to expect to learn much that is useful about the details of dynamic adjustment procedures, and indeed neither Kandori nor Marimon emphasizes such results. What we can hope to do is to learn more about equilibrium. In Marimon's terminology, what we can hope to have is a theory of which equilibria are learnable. I would emphasize two questions about equilibrium:

1. Can we narrow down (refine) the range of equilibria that learning procedures will converge to?
2. Can we broaden the range of equilibria to allow for incomplete information and non-convergence?

I turn first to the issue of narrowing the range of equilibria. As Kandori has emphasized, traditional refinements based on backward and forward induction cannot select between strict equilibria. Considerations from "mutation" models can. Here I would emphasize the results that have proven to be extremely robust, especially the strong criterion of ½-dominance. This has the implication that inefficient equilibria may arise if they are less risky than efficient equilibria. Offsetting this somewhat is the fact that, in general, increasing the efficiency of an equilibrium improves its risk dominance as well.
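In a symmetric 2x2 game with two strict equilibria, ½-dominance reduces to Harsanyi-Selten risk dominance, and both can be checked directly. The following minimal Python sketch is restricted, as an assumption, to that case; the stag-hunt payoffs and function names are hypothetical. It shows both the selection of the inefficient equilibrium and the offsetting effect of raising the efficient payoff.

```python
def is_half_dominant(U, s):
    """True if pure strategy s (0 or 1) is a strict best reply whenever the opponent
    plays s with probability at least 1/2. U[i][j] is the payoff from i against j."""
    t = 1 - s

    def advantage(p):
        # Expected payoff of s minus that of t when the opponent plays s with probability p.
        return (p * U[s][s] + (1 - p) * U[s][t]) - (p * U[t][s] + (1 - p) * U[t][t])

    # The advantage is linear in p, so it is positive on [1/2, 1] iff positive at both ends.
    return advantage(0.5) > 0 and advantage(1.0) > 0


def risk_dominant(U):
    """Harsanyi-Selten risk dominance in a symmetric 2x2 coordination game:
    the equilibrium with the larger deviation loss wins (None on a tie)."""
    loss0 = U[0][0] - U[1][0]
    loss1 = U[1][1] - U[0][1]
    if loss0 == loss1:
        return None
    return 0 if loss0 > loss1 else 1


# Hypothetical stag hunt: 0 = stag, 1 = hare. (stag, stag) is efficient but risky.
STAG_HUNT = [[4, 0], [3, 3]]
print(risk_dominant(STAG_HUNT), is_half_dominant(STAG_HUNT, 1))  # 1 True

# Raising the payoff of the efficient equilibrium also raises its risk dominance.
RICHER = [[7, 0], [3, 3]]
print(risk_dominant(RICHER), is_half_dominant(RICHER, 0))  # 0 True
```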

I would like to emphasize one weakness of current models in this area: the length of the long run in the mutation model. In the large anonymous population case, the time to convergence is too long to be interesting, since escaping the basin of an equilibrium requires a number of simultaneous mutations proportional to the population size. Kandori points out that in a model of local interaction the rate of convergence is much faster, although the equilibrium selected is similar. However, this local interaction model runs into the problem of causality discussed above: if you interact with only a few neighbors, shouldn't you try to teach them to play the game the way you want them to?
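The scale of the problem can be seen with a back-of-the-envelope calculation for the large anonymous population case. The minimal Python sketch below assumes a hypothetical stag hunt, a hypothetical per-player mutation probability eps, and players who best respond to the current population fraction (ignoring self-matching). It counts the simultaneous mutations k needed to tip the population out of each equilibrium; such an event has probability of order eps**k, so, up to factors that do not depend on eps, the expected wait is of order eps**-k, and k grows in proportion to N.

```python
def mutations_to_leave(U, start, N):
    """Smallest number of simultaneous mutants that makes the other strategy a strict
    best reply when everyone currently plays `start` (0 or 1). Assumes integer payoffs,
    two strict equilibria, and that players best respond to the population fraction
    (self-matching ignored)."""
    loss0 = U[0][0] - U[1][0]  # loss from leaving (0, 0)
    loss1 = U[1][1] - U[0][1]  # loss from leaving (1, 1)
    # The other strategy becomes a strict best reply once its share k/N exceeds
    # loss_start / (loss0 + loss1); integer arithmetic avoids rounding issues.
    loss_start = loss0 if start == 0 else loss1
    return (N * loss_start) // (loss0 + loss1) + 1


STAG_HUNT = [[4, 0], [3, 3]]  # hypothetical: 0 = stag, 1 = hare
eps = 0.01                     # hypothetical mutation probability per player per period
for N in (10, 20, 50):
    k_hare = mutations_to_leave(STAG_HUNT, 1, N)
    k_stag = mutations_to_leave(STAG_HUNT, 0, N)
    # Leading-order waiting times eps**-k (constants ignored).
    print(f"N={N}: leave all-hare needs {k_hare} mutants (wait ~{eps ** -k_hare:.0e}), "
          f"leave all-stag needs {k_stag} (wait ~{eps ** -k_stag:.0e})")
```

With N = 20 and eps = 0.01, leaving all-hare takes 16 simultaneous mutations, roughly 10^32 periods, against 6 mutations, roughly 10^12 periods, for leaving all-stag: the risk-dominant equilibrium is selected, but on a time scale of no practical interest.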

Finally, let me emphasize that while one accomplishment of the theory of learning and evolution has been to enhance our understanding of the importance of risk dominance, another has been to suggest weaker criteria for equilibrium. There is an extensive discussion of this in Marimon. The highlights are that in extensive-form games we may not expect, in the short run, to see players rationally anticipating opponents' play off the equilibrium path, which leads to a theory of self-confirming or subjective equilibrium. More broadly, if the learning procedure does not converge at all, but has some weak properties such as universal consistency or calibration, the dynamic path must remain near either the set of marginal best-response distributions (universal consistency) or the set of correlated equilibria (calibration).
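Universal consistency can be stated in terms of regret: along any sequence of opponents' play, a player's average payoff should eventually come within some small epsilon of what the best fixed action would have earned in hindsight, that is, of the payoff of a best reply to the empirical marginal distribution of opponents' play. A minimal Python sketch of the regret computation follows; it only measures regret along a given history and is not itself a consistent learning procedure. The function name and the example history are hypothetical.

```python
def average_regret(U, my_actions, opp_actions):
    """Time-averaged external regret along a history: the average payoff of the best
    fixed action against the realized opponent sequence, minus the realized average.
    U[i][j] is the player's payoff from action i against opponent action j."""
    T = len(my_actions)
    realized = sum(U[a][b] for a, b in zip(my_actions, opp_actions)) / T
    # The best fixed action in hindsight is a best reply to the empirical marginal of opponents' play.
    best_fixed = max(sum(U[i][b] for b in opp_actions) / T for i in range(len(U)))
    return best_fixed - realized


# Hypothetical matching-pennies history: the player (who wants to match) always mismatches.
U = [[1, -1], [-1, 1]]
print(average_regret(U, [1, 0, 1, 0], [0, 1, 0, 1]))  # 1.0
```

A universally consistent procedure keeps this quantity asymptotically at most epsilon against every opponent sequence, even when play itself never converges.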