CROUCHING TIGER, HIDDEN DRAGON
- It all began with my curiosity about PrivateGPT.
- As it turns out, the real MVPs behind it are 2 libraries - LangChain and Transformers.
- Here is how to build a super simple local chatbot using Transformers only.
PART 1) REQUIREMENTS
- Python: 3.9 ~ 3.10 seems to work just fine at the time of writing.
- Microsoft C++ Build Tools
- A decent NVIDIA graphics card.
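- P.S. A quick way to check that the card and drivers are detected (assuming the NVIDIA drivers are already installed) is to run this in a terminal - it should list the GPU and the supported CUDA version:
nvidia-smi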
PART 2) PROJECT SETUP
- Create a project folder. E.G.
C:\CHATBOT
- Open terminal, navigate to project folder.
cd C:\CHATBOT
- Create a virtual environment and activate it.
virtualenv venv
- Windows -
venv\Scripts\activate
- Linux/Mac -
source venv/bin/activate
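- P.S. If the virtualenv package is not installed yet, pip install virtualenv first - or use Python's built-in module to create the virtual environment instead:
python -m venv venv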
- Install transformers -
pip install transformers optimum auto-gptq
- Install PyTorch.
- Head over to PyTorch "Get Started Locally" and grab the correct pip install command for your system.
- Yes, the PyTorch builds for CPU and GPU support are different.
- E.G. For Windows with CUDA support -
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
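- P.S. To confirm that PyTorch was installed with CUDA support, a tiny sanity check like this helps (not part of the original steps, just a suggestion):
import torch
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))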
PART 3) SCRIPT
simple.py
# (A) LOAD MODULES
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# (B) MODEL + TOKENIZER
model_name = "TheBloke/Wizard-Vicuna-7B-Uncensored-GPTQ"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# (C) PIPE
pipe = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    do_sample=True,
    max_new_tokens=1000
)

# (D) RUN QUERY
while True:
    query = input("\nEnter a query: ")
    if query == "exit":
        break
    if query.strip() == "":
        continue
    print(pipe(query))
- (A) Load PyTorch and Transformers.
- (B) Load the model and tokenizer - we will use a simple Wizard-Vicuna model.
- (C) Put the model and tokenizer into a pipe.
- (D) Endless loop - pass a query into the pipe and get a response from the AI... Enter exit to stop.
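- P.S. The pipe returns a list of dictionaries rather than a plain string. If you only want the AI's reply, a small tweak to the last line works (an optional sketch, not part of the original script):
result = pipe(query)
print(result[0]["generated_text"])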
PART 4) RUN!
python simple.py
- Transformers will automatically download your selected model... So be warned, it will be a few gigabytes and will take some time.
- If you want to change where the model is downloaded, add these right at the very top:
import os
os.environ["TRANSFORMERS_CACHE"] = r"PATH\TO\MODELS"
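- Alternatively, from_pretrained() accepts a cache_dir argument, so the download location can also be set per model (just a sketch, reusing the same model_name as above):
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    cache_dir=r"PATH\TO\MODELS"
)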
USE A DIFFERENT AI MODEL
- Head over to Hugging Face, choose a model.
- GGML - Models optimized for CPU.
- GPTQ - Models optimized for GPU.
- GGUF - Newer version/replacement for GGML.
- CHAT - Models with "chat" in the name are tuned to do chat.
- CODE - Models to provide coding assistance.
- MATH - Models to do calculations.
- 7B 13B 34B 70B - Number of parameters. The more, the "smarter"... Technically speaking. But more parameters also = need more system resources.
- In any case, the transformers library seems to only support GPTQ models and some specific ones like meta-llama (at the time of writing).
- A few popular models/devs: E.G. TheBloke, whose quantized models are used in this tutorial.
- Once you have chosen a model, just replace model_name with the URL path/suffix. E.G. TheBloke/vicuna-7B-v1.5-GPTQ.
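- For example, to try that Vicuna model, only one line in simple.py needs to change (everything else stays the same):
model_name = "TheBloke/vicuna-7B-v1.5-GPTQ"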
THE END
Congrats! You have created a LOCAL AI chatbot with 30 lines of code. But AI is capable of a lot more than that - if you want to learn more, here is the detailed tutorial on my blog and the GIST.