Elastic D&D - Update 11 - Veverbot - Data Vectorization

Last week we talked about audio transcription changes. If you missed it, you can check that out here!

Veverbot

Veverbot is my own custom AI assistant that aims to help players get quick answers about what has happened in their campaign so far. This is absolutely a work in progress, but even this first iteration is very cool.

This is a fairly involved process, so today I will cover what needs to be done on the logging and Elastic configuration side in order for Veverbot to work.

Elastic Configuration

For Veverbot to work, we simply need to add/adjust the mappings in two component templates: one for the "dnd-notes-*" indices, and another for an index named "virtual_dm-questions_answers". The second index contains the questions that players ask Veverbot, as well as the responses that Veverbot provides back to the players.

dnd-notes-* component template

{
  "name": "dnd-notes",
  "component_template": {
    "template": {
      "mappings": {
        "properties": {
          "@timestamp": {
            "format": "strict_date_optional_time",
            "type": "date"
          },
          "session": {
            "type": "long"
          },
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "ignore_above": 256,
                "type": "keyword"
              }
            }
          },
          "finished": {
            "type": "boolean"
          },
          "message": {
            "type": "text",
            "fields": {
              "keyword": {
                "ignore_above": 256,
                "type": "keyword"
              }
            }
          },
          "type": {
            "type": "text",
            "fields": {
              "keyword": {
                "ignore_above": 256,
                "type": "keyword"
              }
            }
          },
          "message_vector": {
            "dims": 1536,
            "similarity": "cosine",
            "index": "true",
            "type": "dense_vector"
          }
        }
      }
    }
  }
}

virtual_dm-questions_answers component template

{
  "name": "virtual_dm-questions_answers",
  "component_template": {
    "template": {
      "mappings": {
        "properties": {
          "question_vector": {
            "dims": 1536,
            "similarity": "cosine",
            "index": "true",
            "type": "dense_vector"
          },
          "answer": {
            "type": "text",
            "fields": {
              "keyword": {
                "ignore_above": 256,
                "type": "keyword"
              }
            }
          },
          "question": {
            "type": "text",
            "fields": {
              "keyword": {
                "ignore_above": 256,
                "type": "keyword"
              }
            }
          },
          "answer_vector": {
            "dims": 1536,
            "similarity": "cosine",
            "index": "true",
            "type": "dense_vector"
          }
        }
      }
    }
  }
}
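If you wanted to register these by hand instead, a minimal sketch using the elasticsearch Python client might look like the following. The host, credentials, template names, and trimmed-down mappings here are assumptions for illustration only; in this project the templates are created for you, as noted below.

from elasticsearch import Elasticsearch

# placeholders: point this at your own cluster and credentials
es = Elasticsearch(
    "https://localhost:9200",
    basic_auth=("elastic", "changeme"),
    verify_certs=False,
)

# register the component template shown above (mappings trimmed for brevity)
es.cluster.put_component_template(
    name="dnd-notes",
    template={
        "mappings": {
            "properties": {
                "message_vector": {
                    "type": "dense_vector",
                    "dims": 1536,
                    "index": True,
                    "similarity": "cosine",
                }
            }
        }
    },
)

# compose it into an index template that matches the dnd-notes-* indices
es.indices.put_index_template(
    name="dnd-notes",
    index_patterns=["dnd-notes-*"],
    composed_of=["dnd-notes"],
)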

NOTE:

The mappings and templates are automatically created via the docker-compose file! This is simply educational; a user will not have to deal with creating any of this.

Logging

With the mappings in place, we can now ingest logs with a dense_vector field. If you recall, this step happens on the note input page of Streamlit and is applied to every note that gets sent to Elastic.

Audio Note

st.session_state["message_vector"] = api_get_vector_object(st.session_state.transcribed_text)

Text Note

st.session_state["message_vector"] = api_get_vector_object(st.session_state.log_message)

The function that gets called simply makes a GET request to the FastAPI service that was talked about in the week 9 blog post!

def api_get_vector_object(text):
    # returns a vector object (embedding) for the supplied text
    # requests is imported and fastapi_url is defined elsewhere in the app

    fastapi_endpoint = "/get_vector_object/"
    full_url = fastapi_url + fastapi_endpoint + text
    response = requests.get(full_url)

    try:
        message_vector = response.json()
    except ValueError:
        # response body was not valid JSON, print it for troubleshooting
        message_vector = None
        print(response.content)

    return message_vector

The API accepts the text as a path parameter, creates an embedding via OpenAI, and returns the vector from that embedding. The text-embedding-ada-002 model returns 1536-dimensional vectors, which is why the dense_vector mappings above set dims to 1536. This vector object is what will allow Veverbot to compare user questions to player notes and return an answer.

@app.get("/get_vector_object/{text}")
async def get_vector_object(text):
    import openai

    openai.api_key = "API_KEY"
    embedding_model = "text-embedding-ada-002"
    openai_embedding = openai.Embedding.create(input=text, model=embedding_model)

    return openai_embedding["data"][0]["embedding"]

The log is then indexed as normal, now with a dense_vector field. Next week we will look at how Veverbot uses that field to compare user questions to player notes and return an answer!
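For illustration, the document that ends up in Elastic might be built and indexed roughly like the sketch below. The index name, session number, example message, and client setup are placeholders; in the project this happens inside the Streamlit note input page.

from datetime import datetime, timezone

from elasticsearch import Elasticsearch

# placeholders: point this at your own cluster and credentials
es = Elasticsearch(
    "https://localhost:9200",
    basic_auth=("elastic", "changeme"),
    verify_certs=False,
)

message = "The party met a mysterious merchant on the road to the capital."

note = {
    "@timestamp": datetime.now(timezone.utc).isoformat(),
    "session": 11,
    "name": "example-note",
    "type": "text",
    "finished": True,
    "message": message,
    # the 1536-float embedding returned by api_get_vector_object() above
    "message_vector": api_get_vector_object(message),
}

es.index(index="dnd-notes-example", document=note)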

Closing Remarks

As previously stated, next week I will be talking about Veverbot from the Streamlit side. I will essentially walk through the user experience and what is happening in the background to produce the "conversation" that happens on the front end.

Check out the GitHub repo below. You can also find my Twitch account in the socials link, where I will be actively working on this during the week while interacting with whoever is hanging out!

GitHub Repo
Socials

Happy Coding,
Joe
