Hey friends! Welcome to the Season Finale of The Adventures of Blink! Season 2 ends with today's post... if you're just now discovering me, you probably want to go back and start with S2e1 and work your way forward so this will make more sense!
A brief aside before we get to work
Thank you, fellow adventurers!
By the time this episode releases, this blog will pass TEN THOUSAND followers. Now I know for a fact that doesn't mean there are 10k of you actively adventuring with me (view counts are much, much lower than that - metrics are weird! 🤪), but I am grateful to have that level of exposure within the community! If you're a regular reader and you've enjoyed the series, I would love to hear from you - whether that's a comment or a ❤️ on the blog, or a comment or 👍🏻 on the YouTube channel... I started The Adventures of Blink hoping to have some two-way conversation. Come interact and let's be friends!
The Finale: AI Integration
I mean, it's 2024. If you haven't done AI integration, have you even created a product? 🤣 I chose to do this because it's a fun, practical use for an LLM: Hangman is a game where you have to guess the contents of a phrase. But if you just have a database of phrases... your game has finite replayability. At some point you'll get a phrase from the database that you've played before, and the game loses some of its fun value.
Adding AI here increases that replayability - because you can create a non-deterministic expansion of your data. We don't know what the LLM is going to say in response to our query each time - so we can continually add on to our game board collection.
Architecture of our AI add-on
We're going to revisit a tool that we explored back in season 1: Ollama. As it turns out, ollama has been published to DockerHub as a container you can download! This will fit perfectly into our Hangman architecture - we just spin up a separate container to hold our LLM and then write some code to call it when we need it to give us a new game board.
Putting a Llama in a Container
We can't just put the ollama container into our code, though... because ollama by itself doesn't do much. You have to load a model into it, remember?
In the non-Docker version, we run
# at the time of this writing, 3.2 is the
# latest version of the llama model
ollama pull llama3.2
in order to add our model so that Ollama can use it... so we're going to need to tell our container to do some work on startup.
Here's our Dockerfile for Ollama:
# Start with the official Ollama base image
FROM ollama/ollama:latest
# We needed curl available in the container
# for our next step...
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*
# Copy a startup script into the container
COPY start_model.sh /usr/local/bin/start_model.sh
RUN chmod +x /usr/local/bin/start_model.sh
# Expose the necessary port for Ollama
EXPOSE 11434
# Run the startup script when the container starts
ENTRYPOINT ["/usr/local/bin/start_model.sh"]
Notice that we're starting our container with a script, start_model.sh. Here's what that entails:
#!/bin/bash

echo "Starting Ollama serve in the background..."
ollama serve &
serve_pid=$!

# Make sure the background process is actually running
if ! kill -0 "$serve_pid" 2>/dev/null; then
    echo "Error starting Ollama server"
    exit 1
fi

echo "Waiting for Ollama server to be ready..."
wait_time=5
max_retries=10
while true; do
    if curl -s http://localhost:11434/ > /dev/null; then
        break
    fi
    echo "Ollama server not ready, retrying in $wait_time seconds..."
    sleep $wait_time
    wait_time=$((wait_time * 2))
    ((max_retries--))
    if [ $max_retries -eq 0 ]; then
        echo "Ollama server failed to start"
        exit 1
    fi
done
echo "Ollama server is ready."

# Pull the model if it isn't already available
if ! ollama list | grep -q "llama3.2"; then
    echo "Model llama3.2 not found. Downloading..."
    ollama pull llama3.2
    if [ $? -ne 0 ]; then
        echo "Error downloading llama3.2 model"
        exit 1
    fi
    echo "Model llama3.2 download complete."
else
    echo "Model llama3.2 already downloaded."
fi

# Forward shutdown signals to the Ollama server process
trap 'kill -SIGTERM $serve_pid' SIGINT SIGTERM

echo "Ollama server is now running in the background."

# Keep the container alive until it's stopped
while true; do
    sleep 60
done
When the container starts, the script checks whether the llama3.2 model is already downloaded; if it isn't, it pulls it. Then an infinite loop keeps the container active until it's manually stopped.
This creates a container that we can add to our docker-compose.yml:
llm:
  build:
    context: ./llm
    dockerfile: Dockerfile
  container_name: ollama-container
  restart: unless-stopped
  env_file:
    - .env
  ports:
    - "11434:11434"
  volumes:
    - ./llm:/app
And now you can reach your ollama on its standard port of 11434 to interact with the model: you can request embeddings of input text, or you can simply interact by sending it a prompt and getting the response.
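If you want to poke at the container before wiring up any game code, a quick sanity check might look like this (a minimal sketch, assuming the container is running locally and the llama3.2 pull has finished - the prompt text is just an example):

import json
import requests

# Talk to the Ollama container directly on its standard port.
OLLAMA_URL = "http://localhost:11434/api/generate"

payload = {
    "model": "llama3.2",
    "prompt": "Give me a short phrase that would make a good Hangman puzzle."
}

# Ollama streams back newline-delimited JSON chunks; accumulate the
# "response" pieces until the chunk marked "done" arrives.
answer = ""
with requests.post(OLLAMA_URL, json=payload, stream=True) as response:
    for line in response.iter_lines():
        if line:
            chunk = json.loads(line.decode("utf-8"))
            answer += chunk.get("response", "")
            if chunk.get("done", False):
                break

print(answer)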
APIs, APIs everywhere
Architecturally, we should treat the LLM like we treated the database... it's a backend component that feeds information to the frontend app. Thus we should build an API for it, just as we did with our MongoDB.
Since we used Flask for our other API, I copy-pasted the setup from that one and changed a few things:
from flask import Flask, jsonify
import requests
from prometheus_client import generate_latest, Counter, Histogram
import os, json

app = Flask(__name__)

# Ollama connection configuration
llm_uri = f"{os.getenv('LLM_URI')}/api/generate"
model_id = os.getenv("MODEL_ID")

REQUEST_COUNT = Counter('llm_requests_total', 'Total number of requests to the llm')
REQUEST_LATENCY = Histogram('llm_request_latency_seconds', 'Latency of requests to the llm')

@app.route('/metrics')
def metrics():
    return generate_latest()

# /getpuzzle is our LLM endpoint - we pass in the prompt
# (which in hangman is a constant value for now) and then
# it responds with a puzzle and its associated hint.
@app.route('/getpuzzle', methods=['GET'])
def get_puzzle():
    REQUEST_COUNT.inc()
    with REQUEST_LATENCY.time():
        try:
            prompt = "Suggest a Hangman puzzle I can use to defeat my friend. You may choose from these categories: 'thing', 'place', 'phrase', or 'food and drink'. Your puzzle must be more than three words long and less than ten words long. You are required to respond with only the completed puzzle solution. What puzzle string should I use?"
            payload = {
                "model": model_id,
                "prompt": prompt
            }
            response = requests.post(llm_uri, json=payload)

            # Accumulate chunks for the puzzle solution
            full_response = ""
            for line in response.iter_lines():
                if line:
                    chunk = json.loads(line.decode('utf-8'))
                    full_response += chunk.get("response", "")
                    if chunk.get("done", False):
                        break

            # Parse the accumulated response as the puzzle solution
            result = full_response.strip().replace("\"", "").replace("'", "")

            # Prepare a second request for the category hint
            payload_2 = {
                "model": model_id,
                "prompt": f"Given the possible categories of 'thing', 'place', 'phrase', or 'food and drink', What would be the most relevant category for the following hangman puzzle: << {result} >> ? You are required to answer with only the category."
            }
            response_2 = requests.post(llm_uri, json=payload_2)

            full_response_2 = ""
            for line in response_2.iter_lines():
                if line:
                    chunk = json.loads(line.decode('utf-8'))
                    full_response_2 += chunk.get("response", "")
                    if chunk.get("done", False):
                        break

            # Parse the accumulated response as the category hint
            result_2 = full_response_2.strip().replace("\"", "").replace("'", "")

            # Create the final JSON response
            final_response = {
                "hint": result_2,
                "phrase": result
            }
            return jsonify(final_response), 200

        except json.JSONDecodeError as e:
            print("Error decoding JSON:", e)
            return jsonify({"error": "JSON decoding error"}), 500
        except Exception as e:
            print('Unable to connect to ollama port:', e)
            return jsonify({"error": str(e)}), 500

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5002)
You'll notice that we actually make two Ollama calls in this endpoint. Why? In my original testing, I found that the model struggled with combining the activities of (a) generating the data and (b) formatting it exactly how I wanted it. So I simplified it - I asked the model for the two bits of data I wanted (a puzzle and a hint) and then assembled them manually into the proper response format. This way I can ensure that the data I'm using is well-formatted and usable by the game.
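For a sense of the contract this endpoint exposes, here's a rough sketch of calling it once the container is up (assuming it's reachable on port 5002; the hint and phrase values shown in the comment are purely illustrative):

import requests

# Quick smoke test for the LLM API container.
resp = requests.get("http://localhost:5002/getpuzzle")
resp.raise_for_status()

puzzle = resp.json()
# Expected shape (the actual values change on every call):
# {"hint": "food and drink", "phrase": "a tall glass of cold lemonade"}
print(puzzle["hint"], "->", puzzle["phrase"])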
Also of note: I built this API to run in its own Docker container, just like the database API. I'm not providing the code here because it's almost exactly the same as the database API's container - I changed the port number and what folder it builds from. It's all in the git repository if you'd like to see!
Some Glue
Just like with our database API, we want to have an interface within our code to centralize the use of the API so we loosely couple our app to the LLM. Here's what that looks like:
import requests

class LLM_Integration:
    def __init__(self):
        ROOT_URL = 'http://localhost:5002'
        self.puzzle_url = f"{ROOT_URL}/getpuzzle"

    def getpuzzle(self):
        response = requests.get(self.puzzle_url)
        if response.status_code == 200:
            return response.json()
        else:
            return response
We can create an instance of this class in any place that needs it, and if the API's address ever changes, we only have to update our constant values in one spot.
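As a quick illustration, using the wrapper might look like this (a sketch - the module name llm_integration is just a placeholder for wherever the class lives in the repo, and it assumes the LLM API container is up on port 5002):

from llm_integration import LLM_Integration  # hypothetical module name for the class above

llm = LLM_Integration()
puzzle = llm.getpuzzle()

# On success, puzzle is the parsed JSON from the API,
# e.g. {"hint": "...", "phrase": "..."}
print(puzzle)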
Now we're ready to add AI to our game!
With all of this in place, we're ready to wire the AI feature into our actual game experience!
I elected to add it as a new button on the phrase editor screen. Here is the code we're adding:
# Add this button to the __init__ method of the
# WordEditorApp class, where the other buttons are defined.
self.ai_button = tk.Button(self, text="AI Generate Word", command=self.ai_word_popup)
self.ai_button.grid(row=2, column=0)

# Then add this method to the class to handle the button click.
# (self.llm is assumed to be an LLM_Integration instance created in __init__.)
def ai_word_popup(self):
    """Popup for AI Generation of a new word"""
    response = self.llm.getpuzzle()
    hint = response['hint']
    phrase = response['phrase']
    self.edit_popup("Add Phrase", word=phrase, hint=hint, save_callback=self.add_word)
The structure of our popups (basically everything goes through a single edit_popup method) makes this very easy to plug in - all we do is run the AI call, pull the phrase and hint out of the response, and pass them to edit_popup.
An Aside: Major Changes happened along the way
You might notice that the WordEditorApp class got a bit of an overhaul in between weeks here. When I tried to start up the app, I realized it wasn't working right and had to revisit how windows are defined in Tkinter. All of these changes were pushed up together in the S2E10 branch. Suffice it to say, there's always space for refactoring in your projects!
Wrapping up Season 2
Friends, this season has been an absolute delight for me as a programmer: working through these projects and showing you how I did it. I hope you've had a fun time, but more importantly, I hope you've started to get a feel for how some of these DevOps principles and tools work together to produce software. I'm grateful for each of you who has read or watched along, and I hope you have a holiday season full of warmth and happiness.
As for me, I'll be taking a couple of months off to relax, as well as plan out Season 3. Look for an announcement somewhere around the New Year. Take care, and I can't wait to see what you build!