By now everyone knows and loves ChatGPT, and GenAI has taken the world by storm. But did you know that you can now build and run your own custom AI chatbot on your own machine?
YES! Let's take a look at the ingredients for this recipe.
Python
If you are someone looking to dig deep into AI/ML, you need to learn Python, the go-to programming language in this space. If you already know it, you are all set here; otherwise I would suggest going through a Python crash course, or whatever suits you best. Also make sure that you have Python 3 installed on your system.
Ollama
Ollama is an awesome open-source package that provides a really handy and easy way to run large language models locally. We will use it to download and run the 8B version of Llama3.
Gradio
Gradio is the fastest way to demo your machine learning model with a friendly web interface so that anyone can use it.
OK, now let's start!
Step 1: Installing Ollama
Download and install the Ollama package on your machine. Once installed, run the command below to pull the Llama3 8B version.
ollama pull llama3
By default this downloads the 8B version. If you want to run another version, like 70B, simply append it after the name, e.g. llama3:70b. Check out the complete list here.
Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
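Before moving on, you can sanity-check that the Ollama server is up and the model was pulled. Ollama exposes a small REST endpoint that lists local models; here is a minimal sketch in Python, assuming the default port 11434 (the helper names are just illustrative):

```python
import json
import urllib.request

def model_names(tags_payload):
    # /api/tags returns JSON like {"models": [{"name": "llama3:latest", ...}]}
    return [m["name"] for m in tags_payload.get("models", [])]

def list_local_models(base_url="http://localhost:11434"):
    # Ask the local Ollama server which models have been pulled so far.
    with urllib.request.urlopen(base_url + "/api/tags") as resp:
        return model_names(json.load(resp))
```

If `llama3` shows up in the returned list, the pull succeeded.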
Step 2: Creating a custom model from Llama3
Open up a code editor, create a file named Modelfile,
and paste the below content in it.
FROM llama3
## Set the Temperature
PARAMETER temperature 1
PARAMETER top_p 0.5
PARAMETER top_k 10
PARAMETER mirostat_tau 4.0
## Set the system prompt
SYSTEM """
You are a personal AI assistant named Ultron, created by Tony Stark. Answer and help with all the questions being asked.
"""
Parameters
Parameters dictate how your model responds.
temperature: The temperature of the model. Increasing the temperature will make the model answer more creatively. (Default: 0.8)
top_p: Works together with top_k. A higher value leads to more diverse text, while a lower value results in more focused and conservative text. (Default: 0.9)
top_k: Reduces the probability of generating nonsense. A higher value gives more diverse answers, while a lower value is more conservative. (Default: 40)
mirostat_tau: Controls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text. (Default: 5.0)
Check out all the available parameters and their purpose here.
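Note that the Modelfile values are only defaults: the generate API also accepts an options object that overrides these parameters for a single request. A small sketch of building such a request body (the model name, prompt, and values here are just placeholders):

```python
import json

def build_generate_payload(model, prompt, **options):
    # The /api/generate endpoint accepts an "options" dict that overrides
    # the Modelfile PARAMETER values (temperature, top_k, ...) per request.
    payload = {"model": model, "prompt": prompt, "stream": False}
    if options:
        payload["options"] = options
    return json.dumps(payload)

body = build_generate_payload("ultron", "Hello!", temperature=0.2, top_k=10)
```

This lets you experiment with decoding settings without rebuilding the model.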
System prompt
Here you can play around and give any name and personality to your chatbot.
Now let's create the custom model from the Modelfile by running the commands below. Provide a name of your choice, e.g. ultron.
ollama create ultron -f ./Modelfile
ollama run ultron
You should see ultron running and ready to accept an input prompt. Ollama also provides a REST API for running and managing models, so when you run your model it is also available for use at the endpoint below.
http://localhost:11434/api/generate
We will be using this API to integrate with our Gradio chatbot UI.
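The app we build below sets stream to False for simplicity, so each request returns one JSON object. If you leave streaming on (the API's default), the endpoint instead returns newline-delimited JSON and you stitch the response fragments together yourself. A hedged sketch of that parsing step, fed here with a hand-written sample instead of a live response:

```python
import json

def join_streamed_response(ndjson_text):
    # With "stream": true, /api/generate emits one JSON object per line;
    # each chunk carries a "response" fragment until "done" is true.
    parts = []
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

sample = '{"response": "Hel", "done": false}\n{"response": "lo", "done": true}'
print(join_streamed_response(sample))  # prints "Hello"
```

Streaming is what makes the reply appear word by word, which feels much more responsive on slower machines.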
Step 3: Create the UI for the chatbot
Initialize a Python virtual environment by running the commands below.
python3 -m venv env
source env/bin/activate
Now install the required packages
pip install requests gradio
Now create a Python file app.py and paste the below code in it.
import requests
import json
import gradio as gr

model_api = "http://localhost:11434/api/generate"
headers = {"Content-Type": "application/json"}
history = []

def generate_response(prompt):
    history.append(prompt)
    final_prompt = "\n".join(history)  # append history
    data = {
        "model": "ultron",
        "prompt": final_prompt,
        "stream": False,
    }
    response = requests.post(model_api, headers=headers, data=json.dumps(data))
    if response.status_code == 200:  # successful
        data = json.loads(response.text)
        actual_response = data["response"]
        return actual_response
    else:
        print("error:", response.text)
        return "Error: " + response.text

interface = gr.Interface(
    title="Ultron: Your personal assistant",
    fn=generate_response,
    inputs=gr.Textbox(lines=4, placeholder="How can I help you today?"),
    outputs="text",
)
interface.launch(share=True)
Now let's launch the app by running your Python file: python3 app.py
Your chatbot will be live at the endpoint below, or something similar. Please note that the response time may vary according to your system's computing power.
http://127.0.0.1:7860/
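One limitation of the app above: history stores only your prompts, not the model's replies, so the model never sees its own earlier answers. A small variation keeps both sides of the conversation; generate_fn here is just a stand-in for the actual API call:

```python
def chat_turn(history, prompt, generate_fn):
    # Record both the user prompt and the model reply, so the model
    # sees the full transcript on the next turn.
    history.append("User: " + prompt)
    reply = generate_fn("\n".join(history))
    history.append("Assistant: " + reply)
    return reply

# A fake generator so the sketch runs without a live Ollama server:
# it just echoes the last transcript line back.
fake_generate = lambda transcript: "echo: " + transcript.splitlines()[-1]
history = []
chat_turn(history, "Hi", fake_generate)
chat_turn(history, "Bye", fake_generate)
```

Swap fake_generate for a function that POSTs to the Ollama endpoint and the chatbot gains real multi-turn memory.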
There you have it! Your own chatbot running locally on your machine; you can even turn off the internet and it will still work. Please share in the comments what other cool apps you are building with AI models.