
0xkoji

Run Gemma on Google Colab Free tier

What is Gemma?

Gemma is a family of four new LLMs from Google, based on Gemini. It comes in two sizes, 2B and 7B parameters, each with base (pretrained) and instruction-tuned versions. All variants can run on various types of consumer hardware, even without quantization, and have a context length of 8K tokens.

https://huggingface.co/blog/gemma

In this post, we will try to run Gemma on the Google Colab free tier. To do that, we need to use a quantized model, since gemma-7b requires 18GB of GPU RAM, which is more than the free-tier GPU typically offers.
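To see roughly where a figure like 18GB comes from, here is a back-of-the-envelope sketch that counts only the bytes needed to store the weights. The parameter count of about 8.5 billion is an assumption (the "7B" label is a rounded name), and the estimate ignores activations, the KV cache, and quantization overhead, so real usage will be higher.

```python
# Rough, weights-only estimate of GPU memory at different precisions.
# ASSUMPTION: gemma-7b has roughly 8.5 billion parameters in total.
ASSUMED_GEMMA_7B_PARAMS = 8.5e9


def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Return the memory needed to hold the weights, in GB (10**9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    for label, bits in [("float16", 16), ("8-bit", 8), ("4-bit", 4)]:
        size = weight_memory_gb(ASSUMED_GEMMA_7B_PARAMS, bits)
        print(f"{label:>8}: {size:.2f} GB")
```

Under that assumption, float16 weights alone come to about 17GB, which is in line with the 18GB figure, while 4-bit weights need about a quarter of that.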

Requirements

  • Hugging Face account
  • Google account

Step 1. Get access to Gemma

Gemma is supported in Transformers 4.38, but first we need to request access to the model on Hugging Face.

https://huggingface.co/google/gemma-7b

Once access is granted, the page above will confirm that you have access to the model.

Step 2. Add HF_TOKEN to Google Colab

We need to add HF_TOKEN to Google Colab to access Gemma via Transformers.

First, we need to get an access token from Hugging Face.
https://huggingface.co/settings/tokens

Then click the key icon in the Google Colab sidebar, add a new secret named HF_TOKEN with the token as its value, and enable notebook access for it.
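Before loading the model, it can help to confirm that the notebook can actually read the secret. The sketch below is one way to do that: `resolve_hf_token` is a hypothetical helper name, and the fallback to an environment variable is an assumption added so the same code also runs outside Colab.

```python
import os
from typing import Optional


def resolve_hf_token(name: str = "HF_TOKEN") -> Optional[str]:
    """Return the token from Colab secrets, else from the environment, else None."""
    try:
        # This module is only importable inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is not None:
        try:
            token = userdata.get(name)
            if token:
                return token
        except Exception:
            # Raised when the secret is missing or notebook access is not enabled.
            pass

    # Fallback for non-Colab environments (an assumption, not part of the post).
    return os.environ.get(name) or None


if __name__ == "__main__":
    print("HF_TOKEN found" if resolve_hf_token() else "HF_TOKEN not found")
```

If this prints "HF_TOKEN not found" in Colab, check the secret's name and that notebook access is switched on.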

Step 3. Install packages

!pip install -U "transformers==4.38.1"
!pip install accelerate
!pip install -i https://pypi.org/simple/ bitsandbytes
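After installing, a quick sanity check of the installed versions can save a confusing error later. This is a minimal sketch using only the standard library; the helper names are hypothetical, and the version comparison is deliberately simple (numeric parts only), so treat it as a rough check rather than a full version parser.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version: str, width: int = 3) -> Tuple[int, ...]:
    """Turn '4.38.1' into (4, 38, 1), padding with zeros and ignoring suffixes."""
    parts = []
    for piece in version.split(".")[:width]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    parts += [0] * (width - len(parts))
    return tuple(parts)


def meets_minimum(version: str, minimum: str) -> bool:
    """Check whether `version` is at least `minimum`, comparing numeric parts."""
    return version_tuple(version) >= version_tuple(minimum)


if __name__ == "__main__":
    for package in ("transformers", "accelerate", "bitsandbytes"):
        print(f"{package}: {installed_version(package) or 'not installed'}")
    found = installed_version("transformers")
    if found is not None and not meets_minimum(found, "4.38"):
        print("transformers is older than 4.38, so Gemma may not load")
```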

Step 4. Write Python code to run Gemma

We can use the instruction-tuned gemma-7b-it model via Transformers, loaded with 4-bit quantization.

from transformers import pipeline
import torch

model = "google/gemma-7b-it"

# Load the model with 4-bit quantization so it fits in the free-tier GPU memory.
# The result is named `pipe` so it does not shadow the imported `pipeline` function.
pipe = pipeline(
    "text-generation",
    model=model,
    model_kwargs={
        "torch_dtype": torch.float16,
        "quantization_config": {"load_in_4bit": True}
    },
)

# Build the prompt with the model's chat template.
messages = [
    {"role": "user", "content": "Tell me about ChatGPT"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Sample up to 256 new tokens.
outputs = pipe(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95
)
# The generated text includes the prompt, so strip it before printing.
print(outputs[0]["generated_text"][len(prompt):])
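To make the `apply_chat_template` call less opaque, here is a hand-written approximation of the prompt string it produces for Gemma's instruction-tuned models. The function name `format_gemma_prompt` is hypothetical and the turn markers reflect my understanding of the Gemma chat format; the template shipped with the tokenizer is the authoritative version, so this is only an illustration.

```python
from typing import Dict, List


def format_gemma_prompt(
    messages: List[Dict[str, str]], add_generation_prompt: bool = True
) -> str:
    """Approximate the Gemma chat format for a list of role/content messages."""
    prompt = "<bos>"
    for message in messages:
        # Gemma labels the assistant's turns "model" rather than "assistant".
        role = "model" if message["role"] == "assistant" else message["role"]
        content = message["content"].strip()
        prompt += f"<start_of_turn>{role}\n{content}<end_of_turn>\n"
    if add_generation_prompt:
        # Open a model turn so generation continues as the model's reply.
        prompt += "<start_of_turn>model\n"
    return prompt


if __name__ == "__main__":
    print(format_gemma_prompt([{"role": "user", "content": "Tell me about ChatGPT"}]))
```

Because the pipeline returns the prompt followed by the completion, slicing the output with `[len(prompt):]` leaves only the model's reply.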

Result

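The following is the result of the above code.
As you can see, the output is unfortunately wrong: it claims that ChatGPT was developed by Google and is open source, when it was actually developed by OpenAI and is not open source. So at the moment, Gemma is either missing recent data or is simply not a good model. 🥲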

ChatGPT is a large language model (LLM) developed by Google. It is a conversational AI model that can engage in a wide range of topics and tasks, including:

Key Features:

  • Natural Language Processing (NLP): ChatGPT is able to understand and generate human-like text, including code, scripts, poems, articles, and more.
  • Information Retrieval: It can provide information on a vast number of topics, from history to science to technology.
  • Conversation: It can engage in natural language conversation, answer questions, and provide information.
  • Code Generation: It can generate code in multiple programming languages, including Python, Java, C++, and more.
  • Task Completion: It can complete a variety of tasks, such as writing stories, summarizing text, and translating languages.

Additional Information:

  • Large Language Model: ChatGPT is a large language model, trained on a massive amount of text data, making it able to learn complex relationships and patterns.
  • Transformer-Based: ChatGPT uses a transformer-based architecture, which allows it to process language more efficiently than traditional language models.
  • Open-Source: ChatGPT is open-sourced, meaning that anyone can contribute to its development
