Ever since the launch of ChatGPT, tech companies have been investing in AI and integrating it into their products, and it's no surprise that everyone is jumping on the AI bandwagon. But the majority of people using ChatGPT are not machine learning or AI experts. So if you are an average software engineer like me who is eager to learn more about this hype, then by the end of this post you too can be the cool kid in your friend group.
What are LLMs?
LLM stands for "Large Language Model." These are advanced artificial intelligence models designed to understand and generate human-like text. Large Language Models are trained on vast amounts of text data, which lets them learn the patterns, syntax, semantics, and context of language. As a result they can generate coherent and contextually relevant responses to input text, or prompts.
ChatGPT is also built on LLMs (GPT-3.5, GPT-4, etc.) by OpenAI, but the GPT models are not open source. To integrate them into your applications you have to use paid OpenAI API keys, and you can't download the weights to run or fine-tune them yourself. It's not just OpenAI that is building LLMs, though: other companies like Facebook (Meta) and Google are also building their own models in this AI arms race.
How do LLMs operate?
The foundational task behind the training of most cutting-edge LLMs is word prediction: given a sequence of words, predict the probability distribution of the next word.
For instance, when presented with the sequence "Listen to your ____," potential next words might include: heart, gut, body, parents, grandma, and so forth. This is typically represented as a probability distribution.
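To make this concrete, here is a minimal sketch of how a set of raw scores can be turned into a probability distribution over candidate next words. The candidate words are the ones from the "Listen to your ____" example; the scores are made-up numbers for illustration only, not the output of any real model, and `softmax` is a small helper written just for this example.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Candidate next words for "Listen to your ____".
candidates = ["heart", "gut", "body", "parents", "grandma"]
# Hypothetical scores; a real LLM computes one score for every token it knows.
scores = [3.0, 2.0, 1.5, 1.0, 0.5]

probabilities = softmax(scores)

# Print the candidates from most to least likely.
for word, p in sorted(zip(candidates, probabilities), key=lambda pair: -pair[1]):
    print(f"{word:>8}: {p:.2f}")

# Picking the most likely word is the simplest way to choose the next word.
next_word = candidates[probabilities.index(max(probabilities))]
print("Most likely next word:", next_word)
```

Real models usually sample from this distribution rather than always taking the top word, which is why the same prompt can produce different answers.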
Some prominent open source LLMs are:
Llama by Meta (Facebook)
Gemma by Google
Mistral by Mistral AI
A few key points about LLMs
When you come across LLMs you will often see names like Llama3 70B or Llama3 8B. The digits at the end are the number of parameters the model has, so in this case 70 billion and 8 billion. Yes, BILLIONS! That scale is a big part of why they are so good at natural language processing.
Model size
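The model size is the number of parameters in the LLM. The more parameters a model has, the more complex it is and the more patterns it can capture from its training data. Parameter count also drives the size on disk: for example, a Llama3 70B download is around 40GB while Llama3 8B is around 4.5GB (these figures are for compressed, quantized versions; the full-precision weights are several times larger). Larger models are also more computationally expensive to train and deploy, and you need high-performing GPUs to run them.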
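A rough back-of-the-envelope estimate shows where sizes like these come from: multiply the number of parameters by the number of bits used to store each one. The sketch below is only an approximation of the weights themselves; `estimated_size_gb` is a helper made up for this example, and the 16-bit and 4-bit settings are assumptions about common storage formats rather than anything stated by the model authors.

```python
def estimated_size_gb(num_parameters, bits_per_parameter):
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = num_parameters * bits_per_parameter / 8
    return total_bytes / 1e9


# Parameter counts taken from the model names.
models = {"Llama3 8B": 8e9, "Llama3 70B": 70e9}

for name, params in models.items():
    for bits in (16, 4):  # assumed precisions: 16-bit floats and 4-bit quantization
        size = estimated_size_gb(params, bits)
        print(f"{name} at {bits}-bit: ~{size:.0f} GB")
```

The 4-bit estimates land close to the 40GB and 4.5GB figures quoted above, which suggests those are quantized downloads. Real files tend to come out somewhat larger than this estimate, likely because quantized formats spend a little more than 4 bits per parameter on average.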
Training data
The training data is the dataset that the LLM is trained on. The quality and quantity of the training data have a significant impact on the performance of the model.
Hyperparameters
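Hyperparameters are settings that control how the LLM is trained, such as the learning rate, the batch size, and the number of training epochs. Unlike the model's parameters, they are not learned from the data; you choose them up front and adjust them to improve the performance of the model for your specific needs.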
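To get a feel for why these settings matter, here is a toy sketch that has nothing to do with a real LLM: plain gradient descent on the function f(x) = x², run with three different learning rates. The learning rate is used here as one common example of a hyperparameter, and the `minimize` function and the specific values are made up for illustration.

```python
def minimize(learning_rate, steps=20, start=10.0):
    """Run gradient descent on f(x) = x**2, whose gradient is 2 * x."""
    x = start
    for _ in range(steps):
        x = x - learning_rate * 2 * x  # step against the gradient
    return x


# The best possible answer is x = 0. Only the learning rate changes between runs.
for learning_rate in (0.01, 0.1, 1.1):
    final_x = minimize(learning_rate)
    print(f"learning_rate={learning_rate}: x ends at {final_x:.4f}")
```

With the smallest learning rate, x barely moves toward 0 in 20 steps. With the middle one, it gets close. With the largest one, every step overshoots and x moves further away instead. The same code gives very different results from one setting, which is the kind of trade-off hyperparameter tuning is about.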
Hugging Face
Hugging Face hosts an AI community, and it's a very cool platform where you can find tons of open source models in different categories. On Hugging Face you will find:
Models: Open source LLMs
Datasets: Publicly available datasets to custom train your LLM
Spaces: AI apps developed by the community and deployed on Hugging Face
Transformers Library
The Transformers library, developed by Hugging Face, is a powerful and versatile open source library. It provides APIs and tools to easily download and train pretrained models. It's a very extensive package that offers a ton of functionality out of the box, so definitely check it out.
To give you a glimpse of it, below is a Python code snippet that uses the "microsoft/codereviewer" model from Hugging Face to review a JavaScript file and print its suggestions.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

model_id = "microsoft/codereviewer"

# Download (or load from the local cache) the tokenizer and model weights once.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Reuse the loaded model and tokenizer instead of loading them a second time.
pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer)

def review_javascript_code(file_path):
    # Read the whole JavaScript file as a single input string.
    with open(file_path, "r") as file:
        code = file.read()
    # Generate one review suggestion of at most 512 tokens.
    result = pipe(code, max_length=512, num_return_sequences=1)[0]["generated_text"]
    print(result)

javascript_file_path = "test.js"
review_javascript_code(javascript_file_path)
What's Next?
In the next articles we will see how to configure and run an LLM locally on our own machine, and how to custom train one for our own specific tasks. Please share any other cool tools you have come across in this space.
Related:
Build and run your own AI chatbot