Google Gemma first try

Kia Ora, everyone!

Yesterday, Google released Gemma, an open large language model built from the same research and technology as Gemini. Developers can now pull it from Hugging Face for research or development, such as fine-tuning the model, building RAG pipelines, and otherwise customizing the LLM's output.


Personally, I don't think Gemma is targeting ChatGPT; it looks more like a competitor to Llama 2, aimed at the market for locally deployed LLM assistants on low-spec machines and thin-and-light office laptops.
Without further ado, today we're deploying it online with a Kaggle Notebook (since my primary development environment is currently an M1 Mac Pro, which doesn't yet have the GPU acceleration support this workflow needs).


First, we find the Gemma homepage on Kaggle, then log in and accept the license for Gemma.
According to the information on the Kaggle page, Google has released four models: two 2B and two 7B (base and instruction-tuned variants). We choose 'New Notebook' at the top of the page.
After entering the Kaggle Notebook, we set the parameters of the Gemma model on the right side of the page:


After the setup is complete, we click 'Add Model' to see the model information and path for Gemma 2B (we will use it later). For GPU acceleration, Kaggle provides 30 hours of free usage per month.

If you don't see the 'Accelerator' option, your Kaggle account probably hasn't been phone-verified yet; once it is, the option will appear.
That completes our Notebook setup, so now we can install the required libraries.

First of all, the code in the first cell is generated automatically by Kaggle, and in my experience it's best to run it (even though we won't use those functions).
These are the three main libraries we need:

!pip install -U transformers accelerate bitsandbytes

After installing them, we can set the model to use different precisions on the GPU, such as 8-bit, 4-bit, or 16-bit.

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Compute dtype for the model weights (used as torch_dtype below)
dtype = torch.float16

# Set load_in_8bit=True (or load_in_4bit=True) to quantize the model on load
quantization_config = BitsAndBytesConfig(load_in_8bit=False)  # load_in_4bit
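If you want a smaller memory footprint instead, a 4-bit setup might look like the sketch below (my untested variant; it assumes your GPU is supported by bitsandbytes, and the compute dtype choice is just an example):

# Hypothetical 4-bit alternative: uses less memory than 8-bit,
# traded against some output quality
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # run the matmuls in fp16
)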

Note that before we start, we need to set the path to the model: on the right-hand side of the page, click the three dots next to 'Gemma V2' and select 'Copy directory path'. We assign it to 'model_id':

model_id = "your directory path"  # paste the path copied from the Kaggle sidebar

Because we need to use Hugging Face's tokenizer, we have to log in to the Hub for access:

!huggingface-cli login --token Your_huggingface_api
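Alternatively, you can log in from Python with the same token (a small sketch; huggingface_hub ships as a dependency of transformers):

from huggingface_hub import login

login(token="Your_huggingface_api")  # same placeholder token as above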

Remember to accept the license on HuggingFace's Gemma page as well.

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # spread the model across available GPU(s) automatically
    torch_dtype=dtype,
    quantization_config=quantization_config,
)
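Once everything has loaded, a quick sanity check could look like this (a minimal sketch; the prompt is just an example):

inputs = tokenizer("Write a haiku about Aotearoa.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))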

Take Care!

If you consistently get this error here:

ValueError: Tokenizer class GemmaTokenizer does not exist or is not currently imported.

Then most likely the Kaggle Notebook needs to be restarted once after transformers is updated.
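After the restart, you can quickly confirm the runtime picked up the new version (Gemma support was added in transformers 4.38):

import transformers
print(transformers.__version__)  # should print 4.38.0 or later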
When you see the familiar Hugging Face download progress bar, you've done it!


It's only been a day since Gemma was released, and I can already see a lot of incredibly talented developers on Kaggle making all kinds of attempts at RAG, fine-tuning, and more. Looking forward to seeing how LLMs keep evolving!
