Trex Game with CNN

Erol

In this article, I build a model that plays the Trex Game. Note that this is not a Reinforcement Learning (RL) model. It is a simple CNN that looks at a screenshot of the game and predicts which keyboard action to take.

Content:

1) Getting Data
2) Train a CNN Model
3) Test the Trained Model in Game

1) Getting Data

First, I collect the data. For that, I use the following libraries:

import keyboard
import uuid
import time
from PIL import Image
from mss import mss

Using the keyboard library, I record which key I pressed, for example "up", "down" or "right".
The library must be installed first:

pip install keyboard

With the mss library, I can capture the game screen. It also lets me grab only a chosen region of the screen, so the model sees just the part of the game that matters. Later, when I run the trained model and switch to the game window, it will predict actions from these same screen captures. Like the others, this library must be installed first:

pip install mss

After importing the libraries, I set the coordinates of the game area. The screen contains a lot of irrelevant content, so I crop it away using coordinates that I measured in Paint :)


mon = {"top":370,
       "left":700,
       "width":200,
       "height":145}
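
If Paint gives you the top-left and bottom-right corners of the game area, the region dictionary can be derived from them. The helper below is a hypothetical convenience (region_from_corners is not part of mss) and only sketches the arithmetic; the corner values are assumed for illustration and chosen so that the result matches the region above.

```python
def region_from_corners(left, top, right, bottom):
    """Build an mss-style region dict from two corner points (in pixels).

    Hypothetical helper for illustration; it only does the arithmetic.
    """
    if right <= left or bottom <= top:
        raise ValueError("bottom-right corner must lie below and right of top-left")
    return {"top": top, "left": left, "width": right - left, "height": bottom - top}


# Corner values assumed for illustration; they reproduce the region used in this article.
mon_example = region_from_corners(left=700, top=370, right=900, bottom=515)
print(mon_example)
```

Your own coordinates will differ depending on screen resolution and where the browser window sits.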

With mss, I can easily capture only the region I want the model to see. For that, I create an mss instance:

sct = mss()

Now, I will create a function that I will use for recording.

i = 0  # running count of saved frames

def record_screen(record_id, key):
    global i  # the counter is shared with the code outside the function

    i += 1
    print(f"{key}, {i}")  # key: the pressed key, i: running count of saved frames
    img = sct.grab(mon)  # capture only the game region
    im = Image.frombytes("RGB", img.size, img.rgb)
    im.save(f"data/img/{key}_{record_id}_{i}.png")  # the data/img folder must already exist

After the recording function, I need another function for stopping. When I want to stop recording, I press "esc", which calls the exit function:

is_exit = False

def exit():
    global is_exit
    is_exit = True

keyboard.add_hotkey("esc", exit)

Finally, I write the main recording loop. Each run gets a unique record_id, and a screenshot is saved whenever one of the three keys is pressed:

record_id = uuid.uuid4()  # unique id for this recording session
while True:

    if is_exit: break

    try:
        # keyboard.KEY_UP / KEY_DOWN are event-type constants whose values
        # happen to be "up" / "down"; plain key names are clearer
        if keyboard.is_pressed("up"):
            record_screen(record_id=record_id, key="up")
            time.sleep(0.1)

        elif keyboard.is_pressed("down"):
            record_screen(record_id=record_id, key="down")
            time.sleep(0.1)

        elif keyboard.is_pressed("right"):
            record_screen(record_id=record_id, key="right")
            time.sleep(0.1)

    except RuntimeError: continue
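
The if/elif chain above checks the keys in a fixed priority order, so at most one screenshot is saved per pass through the loop. Here is a minimal sketch of that selection logic, with the key check passed in as a function so it can be tried without the keyboard library. The names first_pressed and held are illustrative and not part of the project.

```python
def first_pressed(is_pressed, keys=("up", "down", "right")):
    """Return the first key in `keys` that is currently pressed, else None."""
    for key in keys:
        if is_pressed(key):
            return key
    return None


# Stand-in for keyboard.is_pressed: pretend "down" and "right" are both held.
held = {"down", "right"}
print(first_pressed(lambda k: k in held))
```

With both keys held, only "down" is reported, because it comes earlier in the chain.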

Now I am ready to collect data while playing the game.


2) Train a CNN Model

In this section, I train a CNN model on the data I collected. The model predicts a keyboard action from an image. For example, if the image shows a cactus, the action should be "UP"; if it shows a bird, the action should be "DOWN" or "UP". I start by importing the libraries:

import glob
import os
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
from PIL import Image
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.model_selection import train_test_split
import seaborn as sns
import matplotlib.pyplot as plt

After the imports, I collect the paths of the images:

imgs = glob.glob("data/img/*.png")

I will resize the images to this size:

width = 125
height = 50

Before training, I need to preprocess the images (resize and normalization) and extract the labels.
The labels are the keyboard actions "up", "down" and "right", which are stored in the file names.
First I need the file name, which this line gives me:

filename = os.path.basename(img)

OUTPUT: "down_022f78bc-435f-4978-8524-ff1ea1a40d9a_1.png"

Then I split the file name on "_"; the first element is the label.

label = filename.split("_")[0]

OUTPUT: "down"
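
The naming and parsing steps can be checked together without any images. This is a small sketch, assuming the same <key>_<record_id>_<index>.png scheme used when recording; make_filename and label_from_path are illustrative helper names, not functions from the project.

```python
import os
import uuid


def make_filename(key, record_id, index):
    """Mirror the naming scheme used when saving: <key>_<record_id>_<index>.png"""
    return f"{key}_{record_id}_{index}.png"


def label_from_path(path):
    """Recover the label: the text before the first underscore of the file name."""
    return os.path.basename(path).split("_")[0]


record_id = uuid.uuid4()  # a UUID string contains hyphens but no underscores
path = os.path.join("data", "img", make_filename("down", record_id, 1))
print(label_from_path(path))
```

Splitting on "_" is safe here because neither the key names nor the UUID contain an underscore.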

You can check the record_screen function in the first section to see how the images are named when they are saved.
After extracting the label, I convert the image to grayscale and resize it:

im = np.array(Image.open(img).convert("L").resize((width, height)))

Here are all the steps described above, put together:

X = [] # images ("cactus", "bird")
y = [] # labels ("up", "right", "down")

for img in imgs:

    filename = os.path.basename(img)
    label = filename.split("_")[0] # up, down, right
    im = np.array(Image.open(img).convert("L").resize((width, height)))
    im = im/255 # normalization
    X.append(im)
    y.append(label)

To split the data into train and test sets, I first convert it to an array:

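X = np.array(X)
# Each image array has shape (height, width); this reshape reinterprets the same
# pixel sequence as (width, height, 1). The same reshape is used at prediction time.
X = X.reshape(X.shape[0], width, height, 1)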
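
One detail worth knowing about this reshape: PIL's resize((width, height)) produces an image whose NumPy array has shape (height, width). Reshaping such an array to (width, height) keeps the values in the same order, so it is not the same as transposing. The sketch below uses a tiny made-up array instead of a real screenshot to show the difference. Because the article applies the same reshape during training and during prediction, the model still sees consistently arranged input.

```python
import numpy as np

height, width = 2, 3  # tiny stand-in sizes, not the real 50 x 125

# A fake grayscale image as PIL would hand it to NumPy: shape (height, width)
img = np.arange(height * width).reshape(height, width)

reshaped = img.reshape(width, height)  # same value order, re-cut into rows of length `height`
transposed = img.T                     # rows and columns actually swapped

print(img.shape, reshaped.shape, transposed.shape)
print(np.array_equal(reshaped, transposed))
```

If you prefer the tensor to keep the picture's row and column layout, one option would be to use (height, width, 1) everywhere, including the model's input_shape and the prediction code.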

Now I apply Label Encoding and then One Hot Encoding to y (the labels).
Label Encoding first turns the labels into integers. LabelEncoder assigns them in alphabetical order, so:
DOWN --> 0
RIGHT --> 1
UP --> 2
After Label Encoding, I apply One Hot Encoding,
which turns each integer into a binary vector:
0 --> 100
1 --> 010
2 --> 001
To do that, I define a function:

def one_hot_labels(values):

    # Label Encoding -> One Hot Encoding

    label_encoder = LabelEncoder()
    integer_encoded = label_encoder.fit_transform(values)
    integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)

    # The sparse=False argument was renamed to sparse_output in newer
    # scikit-learn releases, so I convert the default sparse result instead
    onehot_encoder = OneHotEncoder()
    onehot_encoded = onehot_encoder.fit_transform(integer_encoded).toarray()

    return onehot_encoded

# One Hot Encoding
Y = one_hot_labels(y)
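
To see what the two encoding steps produce, here is a NumPy-only sketch of the same idea. It is not the scikit-learn code used in this article, just an equivalent for illustration: np.unique returns the classes in sorted order, as LabelEncoder does, and one_hot_numpy is an illustrative name.

```python
import numpy as np


def one_hot_numpy(values):
    """Return (classes, one_hot), with classes in sorted order."""
    classes, integer_encoded = np.unique(np.asarray(values), return_inverse=True)
    integer_encoded = integer_encoded.reshape(-1)  # make sure the indices are 1-D
    one_hot = np.eye(len(classes))[integer_encoded]  # pick one identity row per label
    return classes, one_hot


classes, encoded = one_hot_numpy(["up", "down", "right", "down"])
print(classes)
print(encoded)
```

The column order of the one-hot vectors follows the sorted class names, which is why index 0 means down, 1 means right and 2 means up later on.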

Using X and Y, I split the data with a test size of 0.25: 75% is used for training and the remaining 25% for testing.

# train test split
train_X, test_X, train_y, test_y = train_test_split(X, Y , test_size = 0.25, random_state = 2)

After splitting the data, I create the Convolutional Neural Network:

# CNN Model
model = Sequential()
model.add(Conv2D(32, kernel_size = (3,3), activation = "relu", input_shape = (width, height, 1)))
model.add(Conv2D(64, kernel_size = (3,3), activation = "relu"))
model.add(MaxPooling2D(pool_size = (2,2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation = "relu"))
model.add(Dropout(0.4))
model.add(Dense(3, activation = "softmax"))

model.compile(loss="categorical_crossentropy",
              optimizer="Adam",
              metrics=["acc"])

# training            
model.fit(train_X, train_y, epochs = 35, batch_size = 64)              
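
To get a feeling for the size of this network, the layer shapes and parameter counts can be worked out by hand. The sketch below assumes the Keras defaults for the layers above (3x3 convolutions without padding and with stride 1, 2x2 pooling that rounds down); conv_out and cnn_summary are illustrative names.

```python
def conv_out(size, kernel=3):
    """Output length of an unpadded convolution with stride 1."""
    return size - kernel + 1


def cnn_summary(width=125, height=50):
    """Trace shapes and count trainable parameters for the layers above."""
    params = {}
    # Conv2D(32, 3x3) on 1 input channel: (weights per filter + bias) * filters
    d0, d1 = conv_out(width), conv_out(height)
    params["conv1"] = (3 * 3 * 1 + 1) * 32
    # Conv2D(64, 3x3) on 32 input channels
    d0, d1 = conv_out(d0), conv_out(d1)
    params["conv2"] = (3 * 3 * 32 + 1) * 64
    # MaxPooling2D(2x2) halves each spatial dimension, rounding down
    d0, d1 = d0 // 2, d1 // 2
    flat = d0 * d1 * 64  # size of the Flatten output
    params["dense1"] = (flat + 1) * 128
    params["dense2"] = (128 + 1) * 3
    return (d0, d1, 64), flat, params


pooled_shape, flat, params = cnn_summary()
print(pooled_shape, flat, params, sum(params.values()))
```

Almost all of the parameters sit in the first Dense layer, because the flattened feature map is large.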

After training, I evaluate the train and test scores:

score_train = model.evaluate(train_X, train_y)  # returns [loss, accuracy]
print(f"Training Score: {score_train[1]*100:.2f}%")

score_test = model.evaluate(test_X, test_y)
print(f"Test Score: {score_test[1]*100:.2f}%")

To use the trained model in the game, I need to save it.

# save the architecture (JSON) and the weights
with open("trex_model.json", "w") as f:
    f.write(model.to_json())
model.save_weights("trex_weight.h5")

And finally, I have a trained CNN model.


3) Test the Trained Model in Game

In this section, I test my model in the game in real time.

First, I import the libraries:

from keras.models import model_from_json
import numpy as np
from PIL import Image
import keyboard
import time
import os
from mss import mss

For the screen capture, I set up the same region and image size as before:

mon = {"top":370,
       "left":700,
       "width":200,
       "height":145}

sct = mss()

# size of images
width = 125
height = 50

Now I can load my trained CNN model:

# load model
with open("trex_model.json", "r") as f:
    model = model_from_json(f.read())
model.load_weights("trex_weight.h5")

I define the labels again, because the model outputs a probability for each class and the action is chosen from this list according to that output. I do not touch the keyboard; the model does.

#down:0, right:1, up:2
labels = ["Down", "Right", "Up"]

Next, I initialize the timing variables and start the main loop:

framerate_time = time.time()
counter = 0
i = 0
delay = 0.4
key_down_pressed = False
while True:

    img = sct.grab(mon)
    im = Image.frombytes("RGB", img.size, img.rgb)
    im2 = np.array(im.convert("L").resize((width, height)))
    im2 = im2 / 255  # normalization

    X = np.array([im2])
    X = X.reshape(X.shape[0], width, height, 1)
    r = model.predict(X)

    result = np.argmax(r)

    if result == 0: #down: 0
        keyboard.press(keyboard.KEY_DOWN)
        key_down_pressed = True

    elif result == 2: #up: 2

        if key_down_pressed:
            keyboard.release(keyboard.KEY_DOWN)
            time.sleep(delay)

        keyboard.press(keyboard.KEY_UP)

        if i < 1500:
            time.sleep(0.3)

        elif i < 5000:  # also covers i == 1500
            time.sleep(0.2)

        else:
            time.sleep(0.17)

        keyboard.press(keyboard.KEY_DOWN)
        keyboard.release(keyboard.KEY_DOWN)

    counter += 1

    if (time.time() - framerate_time) > 1:

        counter = 0
        framerate_time = time.time()

        if i <= 1500:
            delay -= 0.003

        else:
            delay -= 0.005

        if delay < 0:
            delay = 0

        print("----------------")
        print(f"Down: {r[0][0]}\nRight: {r[0][1]}\nUp: {r[0][2]}")

        i += 1

Now, I need to explain what this while loop does.

First, I grab the region defined by mon and convert it to an image. After converting, I turn it to grayscale, resize it and normalize it:

    img = sct.grab(mon)
    im = Image.frombytes("RGB", img.size, img.rgb)
    im2 = np.array(im.convert("L").resize((width, height)))
    im2 = im2 / 255  # normalization

After these steps, I turn the image into an array and reshape it to the input shape the model expects.
np.argmax() returns the index of the maximum value. I use it to find the label with the highest probability.

    X = np.array([im2])
    X = X.reshape(X.shape[0], width, height, 1)
    r = model.predict(X)

    result = np.argmax(r)
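
As a small illustration of this step, here is how an output vector maps to a label. The probabilities are made up for the example; only the index order (down 0, right 1, up 2) comes from the article.

```python
import numpy as np

labels = ["Down", "Right", "Up"]  # index order: down 0, right 1, up 2

# Made-up prediction for a single frame, shaped (1, 3) like the output for one image
r = np.array([[0.10, 0.05, 0.85]])

result = int(np.argmax(r))  # index of the largest value in the flattened array
print(result, labels[result])
```

Without an axis argument, np.argmax works on the flattened array, which is fine here because only one image is predicted at a time.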

If the result is 0, it means DOWN: using the keyboard library, the model presses DOWN.
After that, I set key_down_pressed = True.

    if result == 0: #down: 0
        keyboard.press(keyboard.KEY_DOWN)
        key_down_pressed = True

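If the result is 2, it means UP, and the model presses UP, again using the keyboard library. If DOWN is still held, it is released first.
The thresholds such as i < 1500 are arbitrary numbers that I picked. As the loop is written, i is incremented about once per second, so it counts elapsed seconds rather than individual frames.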

    elif result == 2: #up: 2

        if key_down_pressed:
            keyboard.release(keyboard.KEY_DOWN)
            time.sleep(delay)

        keyboard.press(keyboard.KEY_UP)

        if i < 1500:
            time.sleep(0.3)

        elif i < 5000:  # also covers i == 1500
            time.sleep(0.2)

        else:
            time.sleep(0.17)

After that, I press DOWN so the dinosaur comes back down, but I must release it right away because, if I don't, the dinosaur stays ducked.

        keyboard.press(keyboard.KEY_DOWN)
        keyboard.release(keyboard.KEY_DOWN)
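
The wait between pressing UP and pressing DOWN gets shorter as the counter grows. Pulled out into a function, the rule looks like the sketch below. jump_hold_time is an illustrative name, and the sketch assumes that the middle band is meant to start at exactly i = 1500.

```python
def jump_hold_time(i):
    """Seconds to wait after pressing UP, based on the loop counter i."""
    if i < 1500:
        return 0.3
    elif i < 5000:  # assumption: i == 1500 belongs to the middle band
        return 0.2
    return 0.17


for i in (0, 1500, 4999, 5000):
    print(i, jump_hold_time(i))
```

Shorter waits make the dinosaur land sooner, which presumably matters as the game speeds up.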

Here I update delay: about once per second it shrinks a little, and it never goes below 0.

    counter += 1

    if (time.time() - framerate_time) > 1:

        counter = 0
        framerate_time = time.time()

        if i <= 1500:
            delay -= 0.003 # 3 msec

        else:
            delay -= 0.005 # 5 msec

        if delay < 0:
            delay = 0

        print("----------------")
        print(f"Down: {r[0][0]}\nRight: {r[0][1]}\nUp: {r[0][2]}")

        i += 1
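
To see how quickly the delay disappears, the update rule can be run on its own. This is a sketch of the same arithmetic without the game loop; next_delay is an illustrative name, and each call stands for one of the once-per-second updates.

```python
def next_delay(delay, i):
    """Return the delay after one more once-per-second update."""
    step = 0.003 if i <= 1500 else 0.005  # seconds removed per update
    return max(0.0, delay - step)         # never go below zero


delay = 0.4
ticks = 0
while delay > 0:
    delay = next_delay(delay, ticks)
    ticks += 1
print(ticks)
```

Starting from 0.4, the delay reaches zero after 134 updates, so after a little more than two minutes of play the extra pause before a jump is gone.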



You can find the project here:
https://github.com/ierolsen/Trex-Game-with-CNN
