Today I'll show how to build a Genetic Algorithm (GA) for a neural network so that it can learn to play different games. I tried it on Pong and Flappy Bird, and it performed very well. If you haven't yet, I recommend reading the first article, "Creating a simple and efficient genetic algorithm for a neural network with Python and NumPy", since this article builds on a modified version of the code shown there.
I split the code into two scripts: in one the neural network plays the game, in the other it learns and makes decisions (the genetic algorithm itself). The game code is a function that returns a fitness value, which is needed to rank the neural networks, for example by how long a network lasted or how many points it earned. The code for the games (there are two of them) is therefore at the end of the article. The genetic algorithms for Pong and for Flappy Bird differ only in their parameters. Starting from the script I wrote and described in the previous article, I created a heavily modified genetic algorithm for Pong; I will describe it in the most detail, since it is what I relied on when creating the GA for Flappy Bird.
First, we import the modules and define the lists and variables:
import numpy as np
import random
import ANNPong as anp  # the script with the Pong game
import pygame as pg
import sys
from pygame.locals import *

pg.init()

listNet = {}     # dictionary: neural network -> its fitness value
NewNet = []      # the best networks of the current era
goodNet = []     # the population to evaluate in the next era
timeNN = 0       # fitness value returned by the game
moveRight = False
moveLeft = False
epoch = 0        # epoch counter

mainClock = pg.time.Clock()
WINDOWWIDTH = 800
WINDOWHEIGHT = 500
windowSurface = pg.display.set_mode((WINDOWWIDTH, WINDOWHEIGHT), 0, 32)
pg.display.set_caption('ANN Pong')
ANNPong - the script with the game.
listNet, NewNet, goodNet - the collections of neural networks; listNet is a dictionary mapping a network to its fitness, the other two are lists (we'll go into more detail later).
timeNN - the fitness value.
moveRight, moveLeft - the flags through which the neural network chooses where to move.
epoch - the epoch counter.
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

class Network:
    def __init__(self):
        # weights and biases of a 6-12-6-3 network, randomly initialized
        self.H1 = np.random.randn(6, 12)
        self.H2 = np.random.randn(12, 6)
        self.O1 = np.random.randn(6, 3)
        self.BH1 = np.random.randn(12)
        self.BH2 = np.random.randn(6)
        self.BO1 = np.random.randn(3)
        self.epoch = 0  # generation zero appears in epoch 0

    def predict(self, x, first, second):
        nas = sigmoid(x @ self.H1 + self.BH1)
        nas = sigmoid(nas @ self.H2 + self.BH2)
        nas = sigmoid(nas @ self.O1 + self.BO1)
        # the largest of the three outputs decides the move
        if nas[0] > nas[1] and nas[0] > nas[2]:
            first, second = True, False
        elif nas[1] > nas[0] and nas[1] > nas[2]:
            first, second = False, True
        else:
            first, second = False, False
        return first, second
class Network1:
    def __init__(self, H1, H2, O1, BH1, BH2, BO1, ep):
        self.H1 = H1
        self.H2 = H2
        self.O1 = O1
        self.BH1 = BH1
        self.BH2 = BH2
        self.BO1 = BO1
        self.epoch = ep

    def predict(self, x, first, second):
        nas = sigmoid(x @ self.H1 + self.BH1)
        nas = sigmoid(nas @ self.H2 + self.BH2)
        nas = sigmoid(nas @ self.O1 + self.BO1)
        if nas[0] > nas[1] and nas[0] > nas[2]:
            first, second = True, False
        elif nas[1] > nas[0] and nas[1] > nas[2]:
            first, second = False, True
        else:
            first, second = False, False
        return first, second
The sigmoid is used as the activation function.
In the Network class we define the parameters of the neural network, and the predict method tells us where to move in the game (nas is short for "network answer"). The epoch attribute records the era in which the network appeared: it is 0 for the randomly created generation zero, while in the Network1 class it is passed in as a constructor argument.
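To make the interface concrete, here is a minimal usage sketch (my own illustration, not part of the training script): the six input values are placeholders, in the real game they come from the ball and paddle coordinates.

# Minimal sketch: feed a 6-value observation to a fresh network
# and read back the two movement flags.
net = Network()
observation = np.random.randn(6)   # placeholder for the real game state
moveRight, moveLeft = net.predict(observation, moveRight, moveLeft)
print(moveRight, moveLeft)         # at most one of the flags is True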
for _ in range(1000):
    net = Network()
    timeNN = anp.NNPong(net)
    listNet.update({net: timeNN})

listNet = dict(sorted(listNet.items(), key=lambda item: item[1]))
NewNet = list(listNet.keys())[:10]
listNet = {}
goodNet = NewNet

print(str(epoch) + " epoch")
for net in NewNet:
    anp.NPong(net)
    print(net.epoch)
    print('next')
print('that is all')
Here we run 1000 neural networks with randomly created weights, select the 10 worst of them, so that the genetic algorithm takes on all the work of raising them ))), and show them on the screen.
In more detail: the fitness value returned by the game code is written to timeNN, and the network together with its timeNN is added to listNet. After the loop we sort the dictionary by fitness, copy the networks from listNet into NewNet, and keep only ten of them.
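As a quick illustration of that ranking step (toy names and values, not real fitness numbers):

# Toy demonstration of sorting networks by fitness, worst first.
scores = {'netA': 120, 'netB': 45, 'netC': 300}
ranked = dict(sorted(scores.items(), key=lambda item: item[1]))
print(list(ranked.keys()))   # -> ['netB', 'netA', 'netC']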
for _ in range(990):
    parent1 = random.choice(NewNet)
    parent2 = random.choice(NewNet)
    # crossover: the first rows of each matrix come from one parent,
    # the rest from the other; the random factor works as the mutation
    ch1H = np.vstack((parent1.H1[:3], parent2.H1[3:])) * random.uniform(-2, 2)
    ch2H = np.vstack((parent1.H2[:6], parent2.H2[6:])) * random.uniform(-2, 2)
    ch1O = np.vstack((parent1.O1[:3], parent2.O1[3:])) * random.uniform(-2, 2)
    chB1 = parent1.BH1 * random.uniform(-2, 2)
    chB2 = parent2.BH2 * random.uniform(-2, 2)
    chB3 = parent2.BO1 * random.uniform(-2, 2)
    child = Network1(ch1H, ch2H, ch1O, chB1, chB2, chB3, 1)
    goodNet.append(child)

NewNet = []
This is where the crossing and mutation happen (these steps were described in more detail in the first article).
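If the row split is hard to picture, here is a tiny worked example (my own illustration, using the assumed 6x12 shape of H1): the child takes rows 0-2 from one parent and rows 3-5 from the other before the mutation factor is applied.

# Illustration of the row-wise crossover used above.
p1 = np.ones((6, 12))       # stands in for parent1.H1
p2 = np.zeros((6, 12))      # stands in for parent2.H1
child = np.vstack((p1[:3], p2[3:]))
print(child[:, 0])          # -> [1. 1. 1. 0. 0. 0.]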
while True:
    epoch += 1
    print(str(epoch) + " epoch")
    for s in goodNet:
        timeNN = anp.NNPong(s)
        listNet.update({s: timeNN})
    listNet = dict(sorted(listNet.items(), key=lambda item: item[1], reverse=True))
    # keep every network that matched the best fitness of this era
    best = list(listNet.values())[0]
    for i in listNet:
        if listNet.get(i) == best:
            NewNet.append(i)
    goodNet = list(NewNet)
    listNet = {}
    try:
        for i in range(8):
            print(NewNet[i].epoch)
            anp.NPong(NewNet[i])
            print('next')
    except IndexError:
        print('that is all')
    for _ in range(1000 - len(NewNet)):
        parent1 = random.choice(NewNet)
        parent2 = random.choice(NewNet)
        ch1H = np.vstack((parent1.H1[:3], parent2.H1[3:])) * random.uniform(-2, 2)
        ch2H = np.vstack((parent1.H2[:6], parent2.H2[6:])) * random.uniform(-2, 2)
        ch1O = np.vstack((parent1.O1[:3], parent2.O1[3:])) * random.uniform(-2, 2)
        chB1 = parent1.BH1 * random.uniform(-2, 2)
        chB2 = parent2.BH2 * random.uniform(-2, 2)
        chB3 = parent2.BO1 * random.uniform(-2, 2)
        child = Network1(ch1H, ch2H, ch1O, chB1, chB2, chB3, epoch)
        goodNet.append(child)
    print(len(NewNet))
    print(len(goodNet))
    NewNet = []
Here we are largely repeating ourselves, so I will only explain what has not been said before. We take the first network in the sorted list, that is, one of the best of the era, and compare its result with the rest, since very often several AIs achieve exactly the same success, and all of these equal leaders take part in the crossing and mutation. We wrap the display code in try/except because there may be fewer than eight leaders in an era. We also carry the leaders into the next era unchanged: the descendants may turn out worse than their ancestors, and this keeps the population from degrading.
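The tie-keeping selection can also be written more compactly; an equivalent sketch of the logic above (my own rewrite):

# Compact version of the selection step above.
best = max(listNet.values())
NewNet = [net for net, fitness in listNet.items() if fitness == best]
goodNet = list(NewNet)   # the leaders move on unchanged (elitism)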
That's all for the first script!
Let's move on to the game code. Here I will only explain what concerns the AI training (I will post a link to the drive with the full code).
In the Pong game the neural network plays twice per run: the first time the ball bounces to the left, the second time to the right.
*whGo is a variable in the code (short for "where to go").
We return the elapsed time as the fitness value. The game has two almost identical functions; in the second one we additionally draw everything on the screen, which is needed so that we can watch the progress after each era. The game counts as completed when the network lasts more than 8000 updates in the first, headless function.
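In broad strokes, the headless version behaves like this hypothetical stub (the names inside are placeholders; the real ANNPong code runs the actual game loop):

# Hypothetical stand-in for anp.NNPong, for illustration only.
def NNPong_stub(net):
    first, second = False, False
    time = 0
    while time < 8000:                  # more than 8000 updates = game beaten
        state = np.random.randn(6)      # placeholder for the real game state
        first, second = net.predict(state, first, second)
        # ...move the paddle, update the ball, break if the ball is missed...
        time += 1
    return time                         # survival time is the fitness value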
After months of work and improvements I managed to create a learning algorithm for Pong, but to be sure, I decided to test the AI not on my own game but on one created by another person (a test for omnivorousness))). I chose a Flappy Bird written in pygame from this video: https://youtu.be/7IqrZb0Sotw?feature=shared
I changed the game slightly for the neural network. For example, I added variables for the distance from the bird to the pipes: for each pair of pipes we need the height of each pipe (y) and the distance to it along x, and there were never more than three pairs of pipes on the screen, so that makes three values times three pairs, nine in total (a rough sketch of assembling them follows below). Also, after a collision the game function restarts itself and passes which restart it was in its third parameter, called rep: when rep reached three, the game returned the fitness value to the genetic algorithm, and when it was zero, the time variable was reset to 0. And instead of writing two very similar functions, I simply check the showNN variable: if it is True, the screen is updated. I also modified the training code, shown after the sketch.
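A rough reconstruction of such an input vector (my own illustration; the names are hypothetical):

# Hypothetical illustration: 3 values per pipe pair x 3 pairs = 9 inputs.
def build_inputs(bird_x, bird_y, pipe_pairs):
    # pipe_pairs: up to three (pipe_x, top_y, bottom_y) tuples
    inputs = []
    for pipe_x, top_y, bottom_y in pipe_pairs[:3]:
        inputs += [pipe_x - bird_x, top_y - bird_y, bottom_y - bird_y]
    while len(inputs) < 9:   # pad if fewer than three pairs are on screen
        inputs.append(0)
    return np.array(inputs)

And the modified training code itself: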
while True:
    for event in pg.event.get():
        if event.type == KEYDOWN:
            if event.key == K_1:
                showNN = True  # press 1 to watch the current generation play
    epoch += 1
    print(str(epoch) + " epoch")
    if epoch < 10:
        for s in goodNet:
            timeNN = anp.NPong(s, False, 0, 0)
            listNet.update({s: timeNN})
    else:
        for s in goodNet:
            timeNN = anp.NPong(s, False, 0, 1)
            listNet.update({s: timeNN})
After the tenth epoch, thanks to the last parameter, which we change to one (in the game code I called this parameter varRe, from "variant of return"), the game returns not the time but the number of pipes passed before the collision; the neural network learns better this way.
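The end of the game function could then branch roughly like this (a sketch under my assumptions, not the exact game code):

# Hypothetical sketch: varRe selects which fitness measure is returned.
def finish(time, pipes_passed, varRe):
    if varRe == 0:
        return time            # early epochs: survival time
    return pipes_passed        # later epochs: pipes passed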
howALot = 1000 - len(NewNet)
if howALot < 40:
    howALot = 40
These three lines are needed for the case when very, very many AIs in the previous era achieved the same result: howALot is the number of children to breed, and without a lower bound of 40 the algorithm could stop learning, since it would have almost nothing new to learn from :-).
Afterwards I updated and accelerated my GA for Flappy Bird: all the birds are now launched simultaneously in a single game instance, so training went from ~3-5 hours down to 5-10 minutes on a CPU, that is, up to ~50 times faster! How it works I suggest you see for yourself: a small, useful repetition of what has already been covered!
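The idea behind the speedup is simple: instead of one full game per network, one shared game loop updates every surviving bird and records each bird's score when it dies. A sketch under my assumptions (the game-update details are placeholders):

# Hypothetical sketch: evaluate the whole population in one shared game loop.
birds = [{'net': net, 'alive': True, 'score': 0} for net in goodNet]
frame = 0
while any(b['alive'] for b in birds) and frame < 8000:
    frame += 1
    # ...move the pipes once per frame, shared by every bird...
    for b in birds:
        if not b['alive']:
            continue
        # ...ask b['net'].predict() whether to flap, then move this bird...
        # ...set b['alive'] = False on collision...
        b['score'] += 1        # or count pipes passed, as above
listNet = {b['net']: b['score'] for b in birds}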
That's all! If you have any questions, write them in the comments. Bye!
There is still a lot of new material ahead, and this is the foundation for it: I am now working on implementing a full AI with the help of evolutionary algorithms in an artificial environment. It will be interesting!