- It all began with my curiosity about PrivateGPT.
- As it turns out, the real MVPs are two libraries: LangChain and Transformers.
- Here is how to build a super simple local chatbot using Transformers only.
- Python 3.9 ~ 3.10 seems to work just fine at the time of writing.
- Microsoft C++ Build Tools
- A decent Nvidia Graphics card.
- Create a project folder. E.G.
- Open terminal, navigate to project folder.
- Create a virtual environment and activate it.
- Windows -
- Linux/Mac -
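A sketch of the standard venv commands the labels above refer to (the folder name venv is arbitrary; use python instead of python3 on Windows):

```shell
# Create the virtual environment
python3 -m venv venv

# Activate it - Linux/Mac:
. venv/bin/activate

# Activate it - Windows (run this instead of the line above):
#   venv\Scripts\activate
```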
- Install transformers -
pip install transformers optimum auto-gptq
- Install PyTorch.
- Head over to PyTorch's "Get Started Locally" page and grab the correct pip install command.
- Yes, the CPU-only and GPU (CUDA) builds of PyTorch are different packages.
- E.G. For Windows with CUDA support -
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
# (A) LOAD MODULES
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# (B) MODEL + TOKENIZER
model_name = "TheBloke/Wizard-Vicuna-7B-Uncensored-GPTQ"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype = torch.float16,
    device_map = "auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# (C) PIPE
pipe = pipeline(
    task = "text-generation",
    model = model,
    tokenizer = tokenizer,
    do_sample = True,
    max_new_tokens = 1000
)

# (D) RUN QUERY
while True:
    query = input("\nEnter a query: ")
    if query == "exit":
        break
    if query.strip() == "":
        continue
    print(pipe(query))
- (A) Load PyTorch and Transformers.
- (B) Load the model and tokenizer - we will use a simple Wizard-Vicuna model.
- (C) Put the model and tokenizer into a text-generation pipe.
- (D) Endless loop - pass the user's query into the pipe and get a response from the AI... Enter exit to stop.
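A note on that final print: a text-generation pipeline returns a list of dicts, with the generated string under the "generated_text" key. A sketch of that shape (the reply text here is made up for illustration):

```python
# What pipe(query) returns, shape-wise (values are illustrative):
result = [{"generated_text": "Hi! How can I help you today?"}]

# To print just the reply instead of the raw list:
print(result[0]["generated_text"])
```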
- Transformers will automatically download your selected model on the first run... So be warned, that will be a few gigabytes and will take some time.
- If you want to change where the model is downloaded, add these right at the very top:
import os
os.environ["TRANSFORMERS_CACHE"] = r"PATH\TO\MODELS"
- Head over to Hugging Face, choose a model.
- GGML - Models optimized for CPU.
- GPTQ - Models optimized for GPU.
- GGUF - Newer version/replacement for GGML.
- CHAT - Models with "chat" in the name are tuned to do chat.
- CODE - Models that provide coding assistance.
- MATH - Models tuned to do calculations.
- 7B / 13B / 34B / 70B - Number of parameters. The more, the "smarter"... technically speaking. But more parameters also = more system resources needed.
- In any case, the Transformers library seems to only support GPTQ models and some specific ones like meta-llama (at the time of writing).
- A few popular models/devs: TheBloke, for example, packages a lot of GPTQ models - including the one used above.
- Once you have chosen a model, just replace model_name with the URL path/suffix - i.e. the part of the model page URL after huggingface.co/.
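For example, the model used in this tutorial lives at huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GPTQ, so model_name is just everything after the domain:

```python
# The Hugging Face model page URL for the model used above:
url = "https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GPTQ"

# model_name is the path part after the domain:
model_name = url.split("huggingface.co/")[1]
print(model_name)  # TheBloke/Wizard-Vicuna-7B-Uncensored-GPTQ
```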
Congrats! You have created a LOCAL AI chatbot in 30 lines of code. But AI is capable of a lot more than that - if you want to learn more, here is the detailed tutorial on my blog and the GIST.