
Elizabeth Fuentes L for AWS

Originally published at community.aws

How To Choose Your LLM



✅ Original Blog: Working With Your Live Data Using LangChain


Large language models (LLMs) can generate new stories, summarize texts, and even perform advanced tasks like reasoning and problem solving, which is not only impressive but also remarkable due to their accessibility and easy integration into applications. In this blog, I will provide you with the tools to understand how LLMs work and select the optimal one for your needs.


Generative Artificial Intelligence (Generative AI) made remarkable progress in 2022, pushing the boundaries with its ability to generate content mimicking human creativity in text, images, audio, and video.

The abilities of Generative AI stem from deep learning models (Fig. 1), which are trained using vast amounts of data. Deep learning models, after extensive training on billions of examples, become what are called "foundation models" (FM). Large language models (LLMs) are one kind of foundation model, offering generative capabilities like reasoning, problem-solving, and creative expression at a human level. They are capable of understanding language and performing complex tasks through natural conversation.

Fig 1. Where does Gen AI come from?

Over the past few decades, artificial intelligence has been steadily advancing. However, what makes recent advances in generative AI remarkable is its accessibility and easy integration into applications.

In this blog, I'll provide you with the tools to understand the workings of LLMs and select the optimal one for your needs.

How LLMs Work

There are many popular LLMs; some of the more advanced ones have been trained on far more data than others. The additional training empowers them to tackle complex tasks and engage in advanced conversations.

Nonetheless, their operation remains the same: users provide instructions or tasks in natural language, and the LLM generates a response based on what the model "thinks" could be the continuation of the prompt (Fig. 2).

Fig 2. How LLMs Work

The art of building a good prompt is called prompt engineering. It is a discipline with specific techniques for developing and refining prompts so that language models produce effective outputs, focusing on optimizing prompts for efficient and helpful responses.

With a well-designed prompt, the model's pre-trained abilities can be leveraged to serve novel queries within its scope. Two of the most well-known prompt engineering techniques are:

Zero-shot Learning:

Used for tasks that do not require prior examples for the model to understand what is being asked; for example, classification.

Example of Zero-shot Learning.
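As an illustration, here is a minimal sketch of what a zero-shot classification prompt can look like (the review text is invented for this example); it can be sent to a Bedrock model with the invoke_model pattern shown later in this post.

# Zero-shot: only the task description and the text to classify, no examples.
zero_shot_prompt = (
    "Classify the following customer review as POSITIVE, NEGATIVE or NEUTRAL.\n\n"
    'Review: "The checkout process was quick and the package arrived a day early."\n'
    "Sentiment:"
)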

Few-shot Learning:

Zero-shot capabilities refer to the ability of large language models to complete tasks they were not trained on. However, they still face limitations when performing complex tasks from only a short initial prompt without guidance. Few-shot Learning improves model performance on difficult tasks by incorporating demonstrations, also known as in-context learning.

📚 Tip: Put the LLM in the context of its role, for example: "You are a travel assistant".

Example of Few-shot Learning.
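Here is a minimal sketch of a few-shot prompt (the requests and labels are invented for illustration): a couple of labeled demonstrations precede the new input so the model can infer the pattern.

# Few-shot: labeled demonstrations are included before the new input.
few_shot_prompt = (
    "You are a travel assistant. Classify each request by the service it refers to.\n\n"
    'Request: "I need a room for two nights in Santiago." -> Service: HOTEL\n'
    'Request: "Is there a direct flight to Madrid next Friday?" -> Service: FLIGHT\n'
    'Request: "Can I rent a compact car at the airport?" -> Service:'
)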

Learn about prompt engineering:

How To Choose The Best LLM?

To make this decision, I am going to list some aspects that I consider to be most important:

The LLM's Mission In The Application

What need will the LLM solve in the application? The most commonly used functionalities are:

  • Summarize
  • Classification
  • Question Answering
  • Code generation
  • Content writing
  • Instruction following
  • Multilingual tasks
  • Embeddings: translating text into a vector representation (see the sketch right after this list).
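For the embeddings case, a minimal sketch using the Amazon Titan Embeddings model (amazon.titan-embed-text-v1) through Amazon Bedrock could look like this; it assumes that model is enabled in your account and region.

import boto3
import json

# Turn a text into a vector representation (embedding) with Amazon Titan Embeddings.
bedrock_runtime = boto3.client(service_name="bedrock-runtime", region_name="us-east-1")

response = bedrock_runtime.invoke_model(
    modelId="amazon.titan-embed-text-v1",
    contentType="application/json",
    accept="application/json",
    body=json.dumps({"inputText": "Hola Mundo"}),
)
embedding = json.loads(response["body"].read())["embedding"]
print(len(embedding))  # dimensionality of the returned vector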

As I mentioned before, there are advanced models capable of handling complex tasks and multitasking. For example, Llama-2-13b-chat is a powerful LLM for managing conversations, but only in English.

You can select a model that can satisfy all your requirements at once, or create decoupled applications with multiple specialized models for each task.

📚 Remember: Use prompt engineering to generate the desired outputs.

The Language

There are LLMs specialized in certain tasks, capable of handling one language or several. It's important to define whether your application will work in only one language or in several before choosing the LLM. For example, Titan Text Express is multilingual, unlike Titan Text Lite, which only supports English.

📚 Tip: If the LLM you need doesn't support the desired language, try using a multilingual LLM for translation or Amazon Translate before sending the prompt.
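For example, a minimal sketch with Amazon Translate could look like this (the Spanish question is just a sample input); the translated text then becomes the prompt for the English-only LLM.

import boto3

# Translate the user's text to English before sending it to an English-only LLM.
translate = boto3.client("translate", region_name="us-east-1")

result = translate.translate_text(
    Text="¿Cuál es la capital de Francia?",  # sample user input in Spanish
    SourceLanguageCode="auto",               # let the service detect the language
    TargetLanguageCode="en",
)
prompt = result["TranslatedText"]  # use this as the prompt for the English-only LLM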

Length Of Context Window

A context window refers to the length of text an AI model can handle and reply to at once; in most LLMs, this text is measured in tokens.

Fig 3. Context Window

Tokens are like the individual building blocks that make up words. For example:

  • In English, a single token is typically around 4 characters long.

  • A token is approximately 3/4 of a word.

  • 100 tokens equate to roughly 75 words.
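With these rules of thumb, a quick back-of-the-envelope estimate might look like this (the word count is just an example):

# Rough estimate for English text: ~100 tokens for every 75 words.
words = 600
estimated_tokens = round(words * 100 / 75)
print(estimated_tokens)  # ~800 tokens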

This code snippet shows how to determine the token count using Jurassic-2 Ultra with Amazon Bedrock.

import boto3
import json

# Amazon Bedrock runtime client
bedrock_runtime = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-east-1'
)

model_id = "ai21.j2-ultra-v1"
prompt = "Hola Mundo"

# Request body for Jurassic-2 Ultra, built with json.dumps instead of string concatenation
body = json.dumps({
    "prompt": prompt,
    "maxTokens": 200,
    "temperature": 0.7,
    "topP": 1,
    "stopSequences": [],
    "countPenalty": {"scale": 0},
    "presencePenalty": {"scale": 0},
    "frequencyPenalty": {"scale": 0}
})

kwargs = {
    "modelId": model_id,
    "contentType": "application/json",
    "accept": "*/*",
    "body": body
}

response = bedrock_runtime.invoke_model(**kwargs)

Breaking down the response:

response_body = json.loads(response.get("body").read())
completion = response_body.get("completions")[0].get("data").get("text")
print(completion)
Bonjourno! How can I assist you today?

Let's find out the token count in both the prompt input and the generated output (completion):

Prompt Input:

from pandas import json_normalize  # needed to flatten the token list into a DataFrame

tokens_prompt = response_body.get('prompt').get('tokens')
df_tokens_prompt = json_normalize(tokens_prompt)[["generatedToken.token"]]

Prompt Tokens count.

Generated Output:

tokens_completion = response_body.get("completions")[0].get('data')["tokens"]
df_tokens_completion = json_normalize(tokens_completion)[["generatedToken.token"]]

Completion Tokens count.

Pricing

Just as there are open-source LLMs, there are also paid ones; pricing depends on the provider, the modality, and the model, but all of them take the number of tokens into consideration.

Regarding the modalities of paid LLMs:

✅ Inference only: when you invoke the model as an API, the pricing corresponds to the number of incoming and outgoing tokens (Fig. 5). Amazon Bedrock is a fully managed service that offers the option to use LLMs through an API call, with a choice between On-Demand and Provisioned Throughput to save costs; see pricing here and pricing examples here.

Fig 5. Only Inference Modality
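To make the token-based pricing concrete, here is a minimal sketch; the per-1,000-token prices below are placeholders, not real Amazon Bedrock rates, so check the pricing page for your model.

# Hypothetical on-demand prices per 1,000 tokens (NOT real Amazon Bedrock rates).
price_per_1k_input_tokens = 0.0005
price_per_1k_output_tokens = 0.0015

input_tokens = 1200   # tokens sent in the prompt
output_tokens = 300   # tokens generated by the model

cost = (input_tokens / 1000) * price_per_1k_input_tokens + \
       (output_tokens / 1000) * price_per_1k_output_tokens
print(f"Estimated cost for this invocation: ${cost:.6f}")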

✅ Customization (fine-tuning): when it is necessary to fine-tune the model for a specific need (Fig. 6). In this modality, on top of the inference cost you must add the cost of the new training and the storage of the customized model. Amazon Bedrock also offers a mode for customization (fine-tuning).

Fig 6. Customization (fine-tuning) Modality

For those who need to experiment further, there is Amazon SageMaker JumpStart, which, among other functionalities, allows you to train and tune models before deployment from a Jupyter notebook. Amazon SageMaker JumpStart has these models available, and you can check the pricing here.

Comparison of LLMs and features

Take a look at this chart of some available Amazon Bedrock models for a broader perspective when making comparisons.

Comparison chart of some available Amazon Bedrock models.

Conclusion

Thank you for joining me in this read, where I explained how LLMs work and how to improve their responses using prompt engineering techniques. You also learned how to choose the best one for your application based on features such as:

  • The LLM’s mission in the application: what problem will the LLM help me to solve?

  • The language: Do I need the LLM to understand multiple languages?

  • Length Of Context Window: The amount of text in the input request and generated output.

  • Pricing: Knowing the cost of the LLM that fits my needs, and also asking myself: are the available LLMs sufficient for what I need? If not, do I need to do fine-tuning?

Finally, you saw what a comparison chart built with some of the available Amazon Bedrock models looks like.

🚀 Some links for you to continue learning and building:


¡Gracias!

🇻🇪🇨🇱 Dev.to Linkedin GitHub Twitter Instagram Youtube
Linktr


Top comments (1)

Ranjan Dailata

Great article on the LLM.

Careful consideration is required for the fine-tuning of models. It's recommended to think about alternate approaches such as RAG which can solve the problem. Fine-tuning is not a silver bullet, it has its complications in terms of building the instruction set and the heavy cost involved in accomplishing it.