For our team’s final project we decided to build off of the in class lecture on the Prisoner’s Dilemma. We were excited by the in class battle of strategies, where the teams that scored the most points against the other teams won. When deciding what direction to take our project we decided to use the Genetic Programming we leaned in class and apply it to creating strategies for Prisoner’s Dilemma. In class we learned about Genetic Algorithms leading to the development of successful strategies in checkers, and we decided that a similar approach could be used for Prisoner’s Dilemma.
Prisoner’s Dilemma is explained thoroughly at http://en.wikipedia.org/wiki/Prisoner's_dilemma, for our game the payoffs are: (1, 1) (5, 0) (0, 5) (3, 3).
For the purposes of our project we decided that a game will consist of 20 rounds where the prisoner can decide to cooperate or defect. On the first round the prisoner will not know what the other prisoner is doing, on the second round the prisoner will know what the other prisoner did in the first round, on the third he will know the results of the first and second round, and on the fourth he will know the results of the first, second, and third round. For every round after that through 20, the prisoner will only remember the results of the most recent three rounds.
The strategy employed by the prisoner will be hard coded before the start of the game, so a strategy describe the action (cooperate or defect) of the prisoner for the 1st, 2nd, 3rd, and 4th-20th rounds depending on the actions of the other prison in the previous rounds. This strategy will be stored in a binary string that has a 0 or 1 (cooperate or defect) for the 1st round, 2nd round depending on the 1st, 3rd depending on the 1st and 2nd, and the action to take for the rest of the rounds depending on the past three rounds.
Our program creates 100 random binary strings that represent strategies and then those strategies are placed on a 2-dimensional grid, represented by colored squares. Each square or strategy plays a 20 round Prisoner’s Dilemma with the squares bordering it. The total points accumulated in the games it plays are stored and then compared to all the other strategies. Top performing strategies are kept, and new strategies are created by the mutation and cross-over of the top strategies.
Lastly each square is given a “niceness” value depending on the total number of scenarios the strategy will cooperate in. This number then corresponds to a color which the strategy’s square is colored in on the grid; nicer strategies are lighter colors while meaner strategies (more likely to defect) are darker colors. This creates a visualization so that users of the program can see dominant strategies win and take over parts of the grid (with their color) as the program runs.
I think our group project was a great success and really showed off how much we had learned about genetic algorithms as well as the Prisoner’s Dilemma. I think that one weakness of our final project was that after a while most of the strategies become similar and therefore are playing predominately similar versions of themselves. An easy fix to this would be to randomly insert newly generated strategies into the grid every so often to test the winning strategies against vastly different strategies.
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment