Robert Axelrod, The Evolution of Cooperation


Preface

TIT FOR TAT was the most successful strategy in Axelrod's computer tournaments for the iterated Prisoner's Dilemma.

It starts by cooperating, and then does whatever the other player did in the previous move.

Chapter 1

For me, a typical case of the emergence of cooperation is the development of patterns of behaviour in a legislative body such as the United States Senate. Each senator has an incentive to appear effective to his or her constituents, even at the expense of conflicting with other senators who are trying to appear effective to THEIR constituents. But this is hardly a situation of completely opposing interests, a zero-sum game. On the contrary, there are many opportunities for mutually rewarding activities by two senators. These mutually rewarding actions have led to the creation of an elaborate set of norms, or folkways, in the Senate. Among the most important of these is the norm of reciprocity – a folkway which involves helping out a colleague and getting repaid in kind. It includes vote trading but extends to so many types of mutually rewarding behaviour that it “is not an exaggeration to say that reciprocity is a way of life in the Senate”. p. 5.

Cooperation theory in this book studies individuals who pursue their own self-interest without the aid of a central authority to force them to cooperate.

Prisoner's dilemma is characterized by the payoff structure:

T – Temptation to defect (defecting, while the other player cooperates)
R – Reward for mutual cooperation
P – Punishment for mutual defection
S – Sucker's payoff (cooperating, while the other player defects)

T > R > P > S
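The ordering above can be checked mechanically. The sketch below is only an illustration: the function name is made up for these notes, the numeric values are the ones used in the book's tournament (T=5, R=3, P=1, S=0), and the second condition, R > (T + S) / 2, comes from the book's definition rather than from the list above.

```python
def is_prisoners_dilemma(T, R, P, S):
    """Return True if the four payoffs define a Prisoner's Dilemma.

    Hypothetical helper written for these notes, not code from the book.
    """
    # Ordering of the four payoffs, as stated in the notes.
    ordering = T > R > P > S
    # Extra condition from the book's definition: taking turns exploiting
    # each other must not pay better than steady mutual cooperation.
    no_gain_from_alternating = R > (T + S) / 2
    return ordering and no_gain_from_alternating


print(is_prisoners_dilemma(5, 3, 1, 0))   # tournament payoffs
print(is_prisoners_dilemma(10, 3, 1, 0))  # temptation so large that alternating pays
```

With the tournament payoffs both conditions hold. In the second call the ordering still holds, but the alternation condition fails.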


The book focuses on the iterated Prisoner's Dilemma, in which the same two players meet again and again.

Discount parameter

A payoff in the next move counts less than a payoff in the current move. The discount parameter w (0 < w < 1) tells how important future payoffs are: a payoff n moves ahead is weighted by w to the power n.

This parameter is used for calculating the cumulative value of a sequence of actions.
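A minimal sketch of that calculation, assuming the payoff on move n (counting from 0) is weighted by w**n. The function name and the numbers are illustrative only.

```python
def discounted_value(payoffs, w):
    """Cumulative value of a sequence of per-move payoffs under discount w."""
    return sum(payoff * w ** move for move, payoff in enumerate(payoffs))


# Mutual cooperation on every move, with R = 3 and w = 0.9.
# A long finite sequence approximates the infinite sum R / (1 - w).
R, w = 3, 0.9
approx = discounted_value([R] * 1000, w)
closed_form = R / (1 - w)
print(round(approx, 6), round(closed_form, 6))
```

The closed form R / (1 - w) is the geometric-series limit of the same sum, and it is what the stability arguments in Chapter 3 rely on.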



Chapter 3

Collective stability. Imagine a society in which every individual uses TIT FOR TAT. TIT FOR TAT is a strategy that yields a certain payoff when it meets another TIT FOR TAT. A new strategy is said to invade a native strategy if the newcomer gets a higher payoff with a native than a native gets with another native. A strategy is collectively stable if no strategy can invade it.

Note: Appendix B of the book shows which collectively stable strategies exist for the iterated Prisoner's Dilemma. We will need it for analyzing Russia's virtual economy.

No rule can invade TIT FOR TAT if the discount parameter is sufficiently large.

TIT FOR TAT has a memory of one move, so only two challengers need to be checked against it: DC (alternate defection and cooperation) and DD (always defect, i.e. ALL D).

DC gets T on the first move and the sucker's payoff S on the second (retaliation by TIT FOR TAT).

DD gets T on the first move and P on the second (punishment for mutual defection).

That means DC or DD can invade TIT FOR TAT if the future is not important enough.

TIT FOR TAT is collectively stable if and only if w is large enough. This critical value of w is a function of the four payoff parameters, T, R, P and S.
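The critical value can be worked out by comparing discounted scores against a TIT FOR TAT native. The sketch below assumes an infinitely repeated game with discount w and the tournament payoffs T=5, R=3, P=1, S=0. The function names are made up for these notes, and the closed forms are geometric sums of the move sequences described above.

```python
def score_against_tft(rule, T, R, P, S, w):
    """Discounted score of `rule` when it plays a TIT FOR TAT native."""
    if rule == "TFT":    # mutual cooperation on every move: R, R, R, ...
        return R / (1 - w)
    if rule == "ALL D":  # T once, then mutual defection: T, P, P, ...
        return T + w * P / (1 - w)
    if rule == "DC":     # alternating exploitation and being punished: T, S, T, S, ...
        return (T + w * S) / (1 - w ** 2)
    raise ValueError(rule)


def critical_w(T, R, P, S):
    """Smallest w at which neither ALL D nor DC scores more than a native."""
    return max((T - R) / (T - P), (T - R) / (R - S))


T, R, P, S = 5, 3, 1, 0
print(critical_w(T, R, P, S))
for w in (0.4, 0.9):
    scores = {rule: round(score_against_tft(rule, T, R, P, S, w), 3)
              for rule in ("TFT", "ALL D", "DC")}
    print(w, scores)
```

With these payoffs the threshold is 2/3. At w = 0.4 both challengers score more than a native does, while at w = 0.9 neither does.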


One specific implication is that if the other player is unlikely to be around much longer because of apparent weakness, then the perceived value of w falls and the reciprocity of TIT FOR TAT is no longer stable. We have Caesar's explanation of why Pompey's allies stopped cooperating with him. “They regarded his [Pompey's] prospects as hopeless and acted according to the common rule by which a man's friends become his enemies in adversity”. p. 59.


Live and let live. In the trench warfare of World War I, the French fired two shots for every one that came over, but never fired first. The soldiers behaved in this way between large battles (in those, they obeyed their commands).


Any strategy that may be the first to cooperate can be collectively stable only when w is sufficiently large.

The reason is that for a strategy to be collectively stable it must protect itself from invasion by any challenger, including the strategy that always defects. If the native strategy ever cooperates, ALL D will get T on that move. On the other hand, the population average among the natives can be no greater than R per move. So in order for the population average to be no less than the score of the challenging ALL D, the interaction must last long enough for the gain from the temptation to be nullified over future moves.

For a nice strategy (one that is never the first to defect) to be collectively stable, it must be provoked by the very first defection of the other player.

The reason is simple enough. If a nice strategy were not provoked by a defection on move n, then it would not be collectively stable because it could be invaded by a rule that defected only on move n.

ALL D (defects always) is always collectively stable.

But ALL D can be invaded if the newcomers arrive in small clusters (and not one by one).

A TIT FOR TAT newcomer then meets both ALL D natives and other TIT FOR TAT newcomers. The gains from cooperation between two TIT FOR TATs can be higher than the losses TIT FOR TAT suffers when it cooperates with ALL D on the first move.

Pairing of the individuals must not be random (if it is random, a rare TIT FOR TAT has almost no chance of meeting another TIT FOR TAT).

A strategy is maximally discriminating if it will eventually cooperate even if the other has never cooperated yet, and once it cooperates will never cooperate again with ALL D but will always cooperate with another player using the same strategy as it uses.

The strategies that can invade ALL D in a cluster with the smallest value of p (the proportion of a newcomer's interactions that are with other members of the cluster) are those that are maximally discriminating, such as TIT FOR TAT.
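A rough calculation of how small p can be for a TIT FOR TAT cluster. The sketch assumes the tournament payoffs (T=5, R=3, P=1, S=0), w = 0.9, and that the newcomers are so few that an ALL D native effectively meets only other natives. The function name is invented for these notes.

```python
def min_cluster_share(R, P, S, w):
    """Smallest share p of within-cluster interactions at which a
    TIT FOR TAT newcomer outscores an ALL D native."""
    tft_vs_tft = R / (1 - w)           # mutual cooperation forever
    tft_vs_alld = S + w * P / (1 - w)  # exploited once, then mutual defection
    alld_vs_alld = P / (1 - w)         # what a native earns with another native
    # Invasion needs: p * tft_vs_tft + (1 - p) * tft_vs_alld > alld_vs_alld
    return (alld_vs_alld - tft_vs_alld) / (tft_vs_tft - tft_vs_alld)


p_min = min_cluster_share(R=3, P=1, S=0, w=0.9)
print(round(p_min, 4))  # about 0.0476, i.e. 1/21
```

Under these assumptions a newcomer needs only about 5 percent of its interactions to be with other members of the cluster.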

If a nice strategy cannot be invaded by a single individual, it cannot be invaded by any cluster of individuals together.

Chapter 4

The live-and-let-live system was destroyed by the institution of raids (attacking the enemy in his own trenches). Large battles and raids could be monitored by the high command, so no cooperation could be established in them.

So long as the interactions between small military units were not monitored by the high command, live and let live functioned properly.

Note: monitoring is a way to destroy cooperation.

The cooperation between troops was supported by demonstrations of retaliatory ability – the troops shot sporadically (without hurting the enemy) in order to show that they HAD the power to kill, but didn't use it. -> this was a RITUAL (use of small arms and deliberately harmless use of artillery at the same time of day/week, predictable for the enemy).

These rituals conveyed two messages: a) aggression (to the high command)
b) peace (to the enemy)

Chapter 5

Evolutionarily stable strategy

A strategy is evolutionarily stable if a population of individuals using that strategy cannot be invaded by a rare mutant adopting a different strategy.

We ignore this chapter because we are not interested in biology.


Chapter 6 – How to choose effectively

Follow four rules when you are in a Prisoner's Dilemma:

1) Don't be envious
2) Don't be the first to defect
3) Reciprocate both cooperation and defection
4) Don't be too clever




Don't be envious

People tend to resort to the standard of comparison that they have available – and this standard is often the success of the other player relative to their own success. This standard leads to envy. And envy leads to attempts to rectify any advantage the other player has attained. In this form of Prisoner's Dilemma, rectification of the other's advantage can only be done by defection. But defection leads to more defection and to mutual punishment. So envy is self-destructive. p. 111.

Asking how well you are doing compared to how well the other player is doing is not a good standard unless your goal is to destroy the other player.

A better standard of comparison is how well you are doing relative to how well someone else could be doing in your shoes.

TIT FOR TAT won the tournament because it did well in its interactions with a wide variety of other strategies. On average, it did better than any other rule with the other strategies in the tournament. Yet TIT FOR TAT never once scored better in a game than the other player!
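The point that TIT FOR TAT never outscores its partner can be seen in a toy round-robin. This is not the book's tournament: the five rules, the 200-move game length and all function names are assumptions made for illustration, and the ranking in such a small field depends on which rules are entered.

```python
T, R, P, S = 5, 3, 1, 0
# (my move, other's move) -> (my payoff, other's payoff)
PAYOFF = {("C", "C"): (R, R), ("C", "D"): (S, T),
          ("D", "C"): (T, S), ("D", "D"): (P, P)}

def tit_for_tat(mine, theirs):
    return "C" if not theirs else theirs[-1]   # cooperate first, then echo

def all_d(mine, theirs):
    return "D"

def all_c(mine, theirs):
    return "C"

def grudger(mine, theirs):
    return "D" if "D" in theirs else "C"       # permanent retaliation

def alternator(mine, theirs):
    return "D" if len(mine) % 2 == 0 else "C"  # D, C, D, C, ...

def play(a, b, rounds=200):
    """Play one game and return the two undiscounted total scores."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a, move_b = a(hist_a, hist_b), b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        hist_a.append(move_a)
        hist_b.append(move_b)
        score_a += pay_a
        score_b += pay_b
    return score_a, score_b

def run_tournament(rules):
    """Every rule meets every rule (including its twin) once."""
    totals = {rule.__name__: 0 for rule in rules}
    tft_ever_ahead = False
    for a in rules:
        for b in rules:
            score_a, score_b = play(a, b)
            totals[a.__name__] += score_a
            if a is tit_for_tat and score_a > score_b:
                tft_ever_ahead = True
    return totals, tft_ever_ahead

totals, tft_ever_ahead = run_tournament(
    [tit_for_tat, all_d, all_c, grudger, alternator])
print(totals)
print("TIT FOR TAT ever beat its partner:", tft_ever_ahead)
```

In every single game TIT FOR TAT scores the same as or less than its partner, yet its total across the field is well above that of ALL D, which does beat individual partners.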

So in a non-zero-sum world you do not have to do better than the other player to do well for yourself.

Congress provides a good example. Members of Congress can cooperate with each other without providing threats to each other's standing at home. The main threat to a legislator is not the relative success of another legislator from another part of the country, but from someone who might mount a challenge in the home district. Thus there is not much point in begrudging a fellow legislator the success that comes from mutual cooperation.

Don't be the first to defect

It pays to defect if the future has no value. Gypsies (people who move from one place to another) do not pay doctors' bills (because there are plenty of doctors in town), but pay the bills of the garbage service (because there is only one garbage collection service in town, and because they visit the town often).

Short interactions are not the only condition which would make it pay to be the first to defect.

The other possibility is that cooperation will simply not be reciprocated.

Don't be too clever

The very sophisticated rules did not do better than the simple ones (in the computer tournament).

Some of these complex rules did not take into account that MY decision affects the decision of the other player.

Permanent retaliation is also too clever (too harsh) a strategy.

Probabilistic strategies may be so complex that the other player will regard them as random, chaotic decisions. A good strategy must be understood by the other player.

Of course, in many human situations a person using a complex rule can explain the reasons for each choice to the other player. Nevertheless, the same problem arises. The other player may be dubious about the reasons offered when they are so complicated that they appear to be made up especially for that occasion. In such circumstances, the other player may well doubt that there is any responsiveness worth fostering. The other player may thus regard a rule that appears to be unpredictable as unreformable. This conclusion will naturally lead to defection.

Once again, there is an important contrast between a zero-sum game like chess and a non-zero-sum game like the iterated Prisoner's Dilemma. In chess, it is useful to keep the other player guessing about your intentions. The more the other player is in doubt, the less efficient will be his or her strategy. Keeping one's intentions hidden is useful in a zero-sum setting where any inefficiency in the other player's behaviour will be to your benefit.

Chapter 7 – How to promote cooperation

Enlarge the shadow of the future

A wedding is a public act designed to celebrate and promote the durability of a relationship.

Make interactions more frequent




A good way to increase the frequency of interactions between two given individuals is to keep others away. Any form of specialization tending to restrict interactions to only a few others would tend to make the interactions with those few more frequent. This is one reason why cooperation emerges more readily in small towns than in large cities. Frequent interactions help promote stable cooperation.

Hierarchy and organization are especially effective at concentrating the interactions between specific individuals.

Concentrating the interactions so that each individual meets often with only a few others has another benefit besides making cooperation more stable. It also helps get cooperation going. Even a small cluster of individuals can invade a large population of meanies (ALL D). The members of the cluster must have a nontrivial proportion of their interactions with each other, even though the majority of their interactions may be with the general population.

Concentrating the interactions is one way to make two individuals meet more often. In a bargaining context, another way to make their interactions more frequent is to break down the issues into small pieces. Example: arms control or disarmament treaty. Henry Kissinger arranged for the Israeli disengagement from the Sinai after the 1973 war to proceed in stages that were coordinated with Egyptian moves leading to normal relationships with Israel. Businesses prefer to ask for payment for large orders in phases, as the deliveries are made, rather than to wait for a lump sum at the end.

Change the payoffs

“There ought to be a law against this sort of thing”.

Paying taxes – the costs are direct, while the benefits are diffuse. The government changes the payoffs of not paying taxes by punishing evaders with jail.

Original prisoner's dilemma with two gangsters: If they belonged to an organized gang, they could anticipate being punished for squealing. This might lower the payoffs for double-crossing their partner so much that neither would confess – and both would get the relatively light sentence that resulted from the mutual cooperation of their silence.

Even a small transformation of the payoffs might help make cooperation based on reciprocity stable, despite the fact that the interaction is still a Prisoner's Dilemma. The reason is that the conditions for stability of cooperation are reflected in the relationship between the discount parameter w and the four outcome payoffs, T, R, S and P. What is needed is for w to be large enough relative to these payoffs.
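A small illustration of how a payoff change moves the stability threshold. The sketch starts from the tournament payoffs T=5, R=3, P=1, S=0 and assumes a hypothetical penalty of 1 that applies only to the temptation payoff (exploiting a cooperating partner). The threshold is the level of w above which neither ALL D nor the alternating DC rule can invade TIT FOR TAT.

```python
def stability_threshold(T, R, P, S):
    """Smallest discount parameter w at which TIT FOR TAT is collectively stable."""
    return max((T - R) / (T - P), (T - R) / (R - S))


before = stability_threshold(T=5, R=3, P=1, S=0)
penalty = 1  # hypothetical sanction for exploiting a cooperating partner
after = stability_threshold(T=5 - penalty, R=3, P=1, S=0)

# The game is still a Prisoner's Dilemma after the change (4 > 3 > 1 > 0),
# but reciprocity now needs a much smaller shadow of the future.
print(round(before, 3), round(after, 3))
```

Under these assumptions the required w drops from 2/3 to 1/3 although the ordering of the payoffs is unchanged.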

Teach people to care about each other

Be altruistic to everyone at first, but then only to those who reciprocate.

Teach reciprocity

The trouble with TIT FOR TAT is that once a feud gets started, it can continue indefinitely. Indeed, many feuds seem to have just this property. For example, in Albania and the Middle East, a feud between families sometimes goes on for decades as one injury is repaid by another, and each retaliation is the start of the next cycle. A better strategy might be to return only nine-tenths of a tit for a tat.
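The echo effect, and the way slightly softer retaliation damps it, can be sketched in a few lines. The model is an assumption made for these notes: two reciprocating players, one accidental defection on move 0, and "nine-tenths of a tit" approximated deterministically by forgiving every tenth defection received. The book does not prescribe this exact rule.

```python
def simulate_feud(forgive_every=None, rounds=100):
    """Count the moves on which at least one of two reciprocators defects.

    Player A defects once by accident on move 0. With forgive_every=None both
    play plain TIT FOR TAT; otherwise each player lets every
    `forgive_every`-th defection it receives go unpunished.
    """
    last = {"A": "C", "B": "C"}   # each player's previous move
    received = {"A": 0, "B": 0}   # defections each player has suffered
    moves_with_defection = 0
    for move in range(rounds):
        choice = {}
        for me, other in (("A", "B"), ("B", "A")):
            if move == 0:
                choice[me] = "D" if me == "A" else "C"  # the accident
            elif last[other] == "D":
                received[me] += 1
                forgiven = (forgive_every is not None
                            and received[me] % forgive_every == 0)
                choice[me] = "C" if forgiven else "D"
            else:
                choice[me] = "C"
        if "D" in choice.values():
            moves_with_defection += 1
        last = choice
    return moves_with_defection


print(simulate_feud())                   # plain TIT FOR TAT
print(simulate_feud(forgive_every=10))   # forgive one defection in ten
```

With plain TIT FOR TAT the single accident echoes back and forth for the whole game. With one defection in ten forgiven, the feud stops once the first forgiveness occurs.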

Improve recognition abilities

In human affairs, limits on the scope of cooperation are often due to the inability to recognize the identity or the actions of the other players. This problem is especially acute for achievement of effective international control of nuclear weapons. The difficulty here is verification: knowing with an adequate degree of confidence what move the other player has actually made.

Chapter 8 – The social structure of cooperation

Four factors that can give rise to interesting types of social structure are examined: labels, reputation, regulation and territoriality.

Label = fixed characteristic of a player, which can be observed by the other player
Reputation = information about the strategy that the player used in the past (and/or is likely to use now)
Regulation = relationship between a government and the governed
Territoriality = players interact with their neighbors rather than with just anyone

Labels, stereotypes and status hierarchies

Labels lead other people to form expectations about how a new (unknown) player will behave. They assume that he belongs to a group because he carries its label.

Self-confirming stereotypes. Consider a population that consists of blue and green individuals. Each individual uses TIT FOR TAT (and so is nice, never the first to defect) with members of the same group, but is not nice to members of the other group.

Stereotypes can be stable even when they are not based on any objective differences.

This kind of stereotyping has two unfortunate consequences: one obvious and one more subtle. Obvious: everyone is doing worse than necessary because mutual cooperation between the groups could have raised everyone's score. Subtle: if one of the groups is larger than the other, then the smaller group suffers more than the larger. Therefore minorities often seek defensive isolation.

The minority suffers more because its members interact most of the time with the other (non-cooperating) group, while members of the majority interact most of the time with their own group (which results in cooperation).
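The size effect can be made concrete with a back-of-the-envelope calculation. The sketch assumes that encounters are random, so the share of a player's interactions with its own group equals that group's share of the population, that within-group pairs cooperate on every move, and that cross-group pairs defect on every move. The payoffs R=3, P=1 and w=0.9 are illustrative values.

```python
def average_score(own_group_share, R=3, P=1, w=0.9):
    """Expected discounted score of a player whose group makes up
    `own_group_share` of the population (random mixing assumed)."""
    within_group = R / (1 - w)   # steady mutual cooperation
    across_groups = P / (1 - w)  # steady mutual defection
    return (own_group_share * within_group
            + (1 - own_group_share) * across_groups)


minority = average_score(0.2)
majority = average_score(0.8)
print(round(minority, 2), round(majority, 2))
```

Under these assumptions a member of a 20 percent minority earns roughly half of what a member of the 80 percent majority earns, although both groups behave in exactly the same way.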

Labels can support status hierarchies.

Bullies

Suppose everyone uses the following strategy when meeting someone beneath them: alternate defection and cooperation unless the other player defects once, in which case never cooperate again. This is being a bully in that you are often defecting, but never tolerating a defection from the other player. And suppose that everyone uses the following strategy when meeting someone above them: cooperate unless the other defects twice in a row, in which case never cooperate again. This is being meek in that you are tolerating being a sucker on alternating moves, but it is also being provocable in that you are not tolerating more than a certain amount of exploitation.

This pattern of behaviour sets up a status hierarchy based on the observable characteristic. The people near the top do well because they can lord it over nearly everyone. Conversely, the people near the bottom are doing poorly because they are being meek to almost everyone. It is easy to see why someone near the top is happy with the social structure, but is there anything someone near the bottom can do about it acting alone?

Actually there isn't. The reason is that when the discount parameter is high enough, it would be better to take one's medicine every other move from the bully than to defect and face unending punishment. Therefore a person at the bottom of the social structure is trapped. He or she is doing poorly, but would do even worse by trying to buck the system.
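The trap can be checked numerically. The sketch assumes the payoffs T=5, R=3, P=1, S=0, a bully who starts with a defection and then alternates, and a long finite horizon standing in for an open-ended game. The function name and the parameter revolt_at are made up for this illustration.

```python
T, R, P, S = 5, 3, 1, 0
# (my move, bully's move) -> my payoff
PAYOFF = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}

def low_status_score(revolt_at, w, rounds=500):
    """Discounted score of a low-status player facing a bully.

    revolt_at=None means staying meek (always cooperating); otherwise the
    player defects from move `revolt_at` onward. The bully alternates D and C
    until it sees one defection, and never cooperates again after that.
    """
    total = 0.0
    bully_provoked = False
    for move in range(rounds):
        if bully_provoked:
            bully = "D"
        else:
            bully = "D" if move % 2 == 0 else "C"
        me = "D" if revolt_at is not None and move >= revolt_at else "C"
        total += w ** move * PAYOFF[(me, bully)]
        if me == "D":
            bully_provoked = True
    return total


for w in (0.9, 0.3):
    meek = low_status_score(None, w)
    best_revolt = max(low_status_score(k, w) for k in range(10))
    print(w, round(meek, 3), round(best_revolt, 3))
```

With w = 0.9 staying meek beats revolting at any of the first ten moves. With w = 0.3 an immediate revolt pays, which is why the trap depends on the discount parameter being high enough.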

The futility of isolated revolt is a consequence of the immutability of the other players' strategies. A revolt by a low-status player would actually hurt both sides. If the higher-status players might alter their behaviour under duress, then this fact should be taken into account by a lower-status player contemplating the revolt. But this consideration leads the higher-status players to be concerned with their reputation for firmness. To study this type of phenomenon, one needs to look at the dynamics of reputations.

Reputation and deterrence

How much value does the information about the other's strategy have?

What is the value (or cost) of other players knowing YOUR strategy? It is bad when others know that you are exploitable. If you use TIT FOR TAT, it is good for you when other players know that you use this strategy.

Best reputation = reputation of being a bully (squeezing the most out of the other player while not tolerating any defections at all from the other). The way to squeeze the most out of the other is to defect so often that the other player just barely prefers cooperating all the time to defecting all the time.



And the best way to encourage cooperation from the other is to be known as someone who will never cooperate again if the other defects even once.

To become known as a bully you have to defect a lot, which provokes retaliation from many players (until the reputation is established).

The other player may also want to establish a certain reputation.

Each side has an incentive to pretend not to be noticing what the other player is trying to do. This is not possible with TIT FOR TAT due to its simplicity.

One purpose of having a reputation is to enable you to achieve deterrence by means of a credible threat. You try to commit yourself to a response that you really would not want to make if the occasion actually arose.

Vietnam had just such a meaning to the American government. Definition of US aims in Vietnam:
70 percent – to avoid humiliating US defeat (to our reputation as a guarantor)
20 percent – to keep SVN (and adjacent) territory from Chinese hands
10 percent – to permit the people of SVN to enjoy a better, freer way of life

The Government and the Governed

Examples:

*) A state that prosecutes violations of its law (these investigations cost much more than the actual violation of the law)
*) An empire and its provinces
*) A monopoly that prevents other firms from entering the market (price wars)

The government is securing its reputation (of firmness).

Territoriality

Individuals interact more with their neighbors than with those who are far away.

Territoriality can take several forms:
*) geography and physical space
*) abstract space of characteristics (market niche, political values)

Individuals can imitate their neighbors.

Colonization – successful strategies spread from place to place.

Chapter 9

In business, cooperation is based upon the expectation that one has to deal with each other for a long time. If something comes up you get the other man on the telephone and deal with the problem. You don't read legalistic contract clauses at each other if you ever want to do business again. This attitude is so well established that when a large manufacturer of packaging materials inspected its records it found that it had failed to create legally binding contracts in two-thirds of the orders from its customers. The fairness of the transactions is guaranteed by the anticipation of mutually rewarding transactions in the future, not by a legal threat.

Perhaps the most common type of business contracts case fought all the way to the appellate courts is an action for a wrongful termination of a dealer's franchise by a parent company. Once a franchise is ended, there is no prospect for further mutually rewarding transactions between the franchiser and the parent company. Cooperation ends, and costly court battles are often the result.