
Sasmitha Manathunga


A Hands-on Guide to Prompt Engineering with ChatGPT and GPT-3

Introduction to ChatGPT and GPT-3

Unless you have been living under a rock for the past couple of weeks, chances are, you have heard about ChatGPT: the AI chatbot from OpenAI that is capable of natural human-like responses to virtually any question you can throw at it. It can write emails, solve coding problems, write poems, and even tell jokes.

ChatGPT is built on top of the GPT-3.5 family of OpenAI's large language models and is fine-tuned to interact in a conversational way. It can even answer follow-up questions, admit its mistakes and correct them, challenge prompts with false premises, and reject inappropriate requests.

OpenAI also provides a series of GPT-3 models that can understand and generate natural language and code for various tasks. The most powerful of these models is the text-davinci-003 model, which outperforms the other models in terms of the quality of responses. However, this model is the slowest and most expensive one.

You can fiddle around with the GPT-3 models at OpenAI Playground or with ChatGPT free of charge (for a limited time).

The GPT-3 models have a few parameters that can be tweaked to control the responses; however, they lack ChatGPT's ability to follow up on questions. These two tools suit different purposes depending on the task, and in this guide, we'll use both of them to learn about "Prompt Engineering."

What is Prompt Engineering?

If you have played around with ChatGPT (or the GPT-3 models), you probably have noticed that the quality of the responses depends on how you ask the question. Usually, detailed prompts with exact instructions give better results, but there is a certain "art" to engineering the perfect prompt. If you just started using ChatGPT and the GPT-3 models and have no idea what I am talking about, here is an example.

Note: I'll be using text-davinci-003 for this example because its temperature, i.e., the randomness of its output, can be changed. I set the temperature to 0, which is best suited for Q&A tasks. I'll explain this later in the article. You can use ChatGPT for this, but results may vary.

Let's first ask text-davinci-003 a simple question. (The first line of each block is the prompt; the rest is the completion.)

What is 8x9?
72

Obviously, it gives the correct answer. Now, let's ask it a harder question.

What is 696x87?
59952

Oops, it got it wrong. The answer should be 60,552. With a little "prompt engineering" we can extract the right answer.

What is 696x87? Let's think step by step.
Step 1: 696 x 80 = 55,680
Step 2: 696 x 7 = 4,872
Step 3: 55,680 + 4,872 = 60,552

Answer: 696 x 87 = 60,552
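The decomposition the model produced is just long multiplication split by place value, and we can verify each step directly in Python:

```python
# Check each step of the model's breakdown of 696 x 87.
step1 = 696 * 80          # multiply by the tens digit
step2 = 696 * 7           # multiply by the ones digit
answer = step1 + step2    # combine the partial products

print(step1, step2, answer)  # 55680 4872 60552
assert answer == 696 * 87
```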

We see that encouraging the model to explain its reasoning leads to more accurate results. This method doesn't give accurate answers every time, though. For example:

What is the result of 542x98? Let's think step by step.

Step 1: Multiply the hundreds: 5 x 9 = 45
Step 2: Multiply the tens: 4 x 8 = 32
Step 3: Multiply the ones: 2 x 8 = 16
Step 4: Add the results: 45 + 32 + 16 = 93

The result of 542 x 98 is 53,236.

Wow, it completely messed up. Let's give some examples to help it with the reasoning step.

Q: What is 696x87? Let’s think step by step.
A:
Step 1: 696 x 80 = 55,680
Step 2: 696 x 7 = 4,872
Step 3: 55,680 + 4,872 = 60,552
696 x 87 = 60,552

Q: What is 542x98? Let’s think step by step.
A:

Step 1: 542 x 90 = 48,780
Step 2: 542 x 8 = 4,336
Step 3: 48,780 + 4,336 = 53,116
542 x 98 = 53,116

Aha! It arrived at the answer with correct reasoning. This still isn't 100% accurate, but we've significantly improved the accuracy by changing how we ask the question.

What we've done just now is called prompt engineering: the process of crafting prompts that produce optimal results for our task.

Now, let's dig deeper into some useful techniques of prompt engineering.

Techniques for Crafting the Perfect Prompt and Getting Accurate Results

Let's explore some specific prompt formats and rules of thumb that work particularly well and align better with the tasks at hand, as well as some general guidelines to keep in mind when crafting a prompt.

1. Choose the Right Model

While you can use ChatGPT for pretty much any natural language task, the GPT-3 models provided by OpenAI have more flexibility and control over the resulting output completions. We won't go over everything here, but here are a few models for comparison.

  1. text-davinci-003: Most capable model of the GPT-3 series. Gives higher quality results, longer completions, and is better at instruction-following. However, this model requires more compute resources so it will cost more per request and will be slower than the rest.
  2. text-curie-001: Very capable and extremely fast with lower costs than text-davinci-003. Performs well on language translation, summarization, and complex classification tasks.
  3. text-ada-001: Fastest model of the series. Not as capable as the other models but can perform simple tasks at a fraction of the cost. Good for parsing text, finding keywords, and simple classification.
  4. code-davinci-002: Suitable for translating natural language to code.

2. Tweak the Temperature Setting

There are several settings you can change to control the behavior of the responses. The most important setting is the temperature. You can tweak this setting in all the models except ChatGPT.

You may have noticed that if you submit the exact same prompt multiple times, then the resulting completions can vary for each request. This is what the temperature setting controls, which is essentially how confident the model should be when making predictions. Lower temperatures will give more accurate and deterministic outputs while higher temperatures will provide more diverse responses.

For tasks where the output is well-defined, such as data extraction and Q&A, the temperature should be set to 0. The result of the math problem we discussed above is well-defined, and the steps to arrive at the answer don't vary much, so we set the temperature to 0.

On the other hand, for tasks where the answer is open-ended or subjective, a higher temperature may be more appropriate.

The temperature can be thought of as a way to control the randomness or diversity of the model's output.
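To make the idea concrete, here is a toy softmax-with-temperature sampler. This is a generic sketch of how temperature scaling works in principle, not OpenAI's actual decoding code, and the function name and scores are made up:

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=random):
    """Pick a token index from raw scores scaled by temperature."""
    if temperature == 0:
        # Temperature 0 collapses to greedy decoding: always the top score.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    # Higher temperature flattens the distribution -> more diverse picks.
    return rng.choices(range(len(logits)), weights=weights)[0]

scores = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
print(sample_with_temperature(scores, 0))  # 0 -- always the top token
```

At temperature 0 the same index comes back every time; at a high temperature all three tokens get picked with roughly equal frequency, which mirrors the varied completions you see in the Playground.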

3. Specificity is the Key

Giving specific instructions such as the length, format, style, context, tone, etc., of the response will yield better results.

❌ Not specific enough

Write about pollution

✅ Give detailed instructions.

Write a 500-word essay about pollution, focusing on its negative effects on the environment and human health. Use a formal academic tone and address the various forms of pollution. Organize the essay in an introduction, body, and conclusion format, and include a clear thesis statement in the introduction.

You can even ask the model how to make the prompt more specific:

I'm trying to get an AI chatbot to write an essay. Can you rewrite the following prompt to give brief, specific instructions such as the length, format, style, context, tone, etc., for its response:

Write about pollution

4. Prompt the Model to Explain Its Reasoning

One of the tricks we used to improve the accuracy of the responses was adding the magic words "Let's think step by step." Asking the model to give an explanation before arriving at the final answer can increase the chance that its final answer is accurate and consistent. I'll give text-davinci-003 a simple math question to demonstrate this.

❌ Without reasoning, the model responds with the wrong answer.

A juggler can juggle 16 balls. Half of the balls are golf balls, and half of the golf balls are blue. How many blue golf balls are there?

Eight blue golf balls.

✅ Using step-by-step reasoning, it arrives at the correct answer.

A juggler can juggle 16 balls. Half of the balls are golf balls, and half of the golf balls are blue. How many blue golf balls are there? Let's think step by step:

  1. There are 16 balls in total.
  2. Half of the balls are golf balls, so there are 8 golf balls.
  3. Half of the 8 golf balls are blue, so there are 4 blue golf balls.

Now, let's ask ChatGPT the same question. It provides the correct answer because it's fine-tuned to give verbose, detailed explanations:

A juggler can juggle 16 balls. Half of the balls are golf balls, and half of the golf balls are blue. How many blue golf balls are there?

There are 16 / 2 = 8 golf balls.
There are 8 / 2 = 4 blue golf balls.

5. Put Instructions at the Beginning and Separate the Instruction from the Context

The GPT-3 models may sometimes have difficulty telling the instruction apart from the context. Put the instruction at the beginning of the prompt and use """ to separate it from the context.

❌ Less effective

Use the following text to answer the given question.

{text here}

{question here}

✅ More effective

Use the following text to answer the given question.

Text:
"""
{text here}
"""

Question: {question here}
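This instruction-first, delimiter-fenced shape is easy to produce programmatically. A minimal sketch (the helper name and layout are just illustrative):

```python
def build_qa_prompt(instruction, context, question):
    """Put the instruction first and fence the context with triple quotes."""
    return (
        f"{instruction}\n\n"
        f'Text:\n"""\n{context}\n"""\n\n'
        f"Question: {question}"
    )

prompt = build_qa_prompt(
    "Use the following text to answer the given question.",
    "The Nile is about 6,650 km long.",
    "How long is the Nile?",
)
print(prompt)
```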

6. Give Examples to Show the Format of the Final Result

Providing several examples of how the final result should be formatted gives better results. Here's an example that extracts the essential ingredients from a recipe as a comma-separated list:

✅ Provide a couple of examples.

Extract the essential ingredients from the recipe given below.
Recipe:
"""
1.) Preheat your oven to 350°F (180°C). Grease and flour a 9-inch round cake pan.
{...}
8.) Serve the cake as is or top with your favorite frosting or glaze. Enjoy!
"""
Ingredients: flour, sugar, baking powder, baking soda, salt, milk, eggs, butter, vanilla extract.

Recipe:
"""
1.) In a medium bowl, whisk together the flour, sugar, baking powder, and salt.
{...}
8.) Serve the pancakes warm, optionally with butter and maple syrup.
"""
Ingredients: flour, sugar, baking powder, salt, egg, milk, butter.
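The pattern generalizes: pair each example input with its formatted output, then end with the new input and a bare label so the model fills in the rest. A minimal few-shot prompt builder (the function name and sample strings are illustrative):

```python
def build_few_shot_prompt(instruction, examples, new_input):
    """examples: list of (recipe_text, ingredients_line) pairs."""
    parts = [instruction]
    for recipe, ingredients in examples:
        parts.append(f'Recipe:\n"""\n{recipe}\n"""\nIngredients: {ingredients}')
    # End with the new recipe and a bare label for the model to complete.
    parts.append(f'Recipe:\n"""\n{new_input}\n"""\nIngredients:')
    return "\n\n".join(parts)

prompt = build_few_shot_prompt(
    "Extract the essential ingredients from the recipe given below.",
    [("Preheat your oven to 350°F...", "flour, sugar, eggs, butter")],
    "In a medium bowl, whisk together the flour and salt...",
)
print(prompt)
```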

7. Some General Guidelines to Improve Accuracy

We've discussed some specific techniques to get better results from our model. Now, here are some general guidelines you should follow when crafting prompts.

  • Make sure your prompt doesn't have spelling or grammar errors, or inconsistent formatting.
  • To improve the quality of the output, generate many completions and take the best one.
  • If a prompt doesn't give the intended response, try paraphrasing the prompt. You can even ask the model to rephrase the prompt for better instruction-following.
  • Break down complex tasks into simpler ones to make it easy for the model to understand.
  • Avoid generic statements when trying to solve a specific task. For example, rather than asking "Answer the following question" for a math problem ask it "Calculate the answer to the following problem."
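The "generate many completions and take the best one" tip can be automated by majority-voting over several sampled answers, sometimes called self-consistency. In this sketch the model is replaced by a deterministic stub, so everything here is purely illustrative:

```python
from collections import Counter
from itertools import cycle

# Stand-in for an LLM sampled at a nonzero temperature: it usually,
# but not always, returns the right product. Purely illustrative.
_SAMPLES = cycle(["60,552", "60,552", "59,952", "60,552"])

def fake_model(prompt):
    return next(_SAMPLES)

def majority_answer(prompt, n=8):
    """Sample n completions and keep the most common answer."""
    answers = [fake_model(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

print(majority_answer("What is 696x87?"))  # 60,552
```

With a real API you would make n calls at a moderate temperature and vote the same way; occasional wrong completions get outvoted by the majority.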

Failure of GPT-3 at Complex Reasoning Tasks

The methods we discussed can significantly improve the quality of the responses, however, they aren't perfect. There are some tasks that are just too complex for these types of models. For example, GPT-3 is not good at understanding and remembering the spatial relationships and locations of objects, a skill known as spatial reasoning. Here's a question that the model fails to answer correctly. (Use the techniques we discussed above to try and get the right answer. If you are successful, leave a comment below.)

There are four books stacked on top of each other on the surface of a table. The books are labeled A, B, C, and D. Book A is on top of book D and they are touching each other. Book B is touching the table surface. Book C is touching two books. Now the following sequence of events happens:
1.) Jack takes a book from the top.
2.) Anna takes a book from the top.
3.) Mary takes a book from the top.
4.) Jack puts his book on top of the stack of books.
5.) Anna puts her book on top of the stack of books.

Now, give the order of the books starting from the bottom to the top after the sequence of events.
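If you do coax an answer out of a model, here's a quick way to check it. The constraints pin down the initial order, and the events are a straightforward stack simulation (spoiler: this block computes the final order):

```python
# Initial stack, bottom to top, deduced from the constraints:
# B touches the table (bottom), A sits directly on top of D, and C
# touches exactly two books, so C must sit between B and D.
stack = ["B", "C", "D", "A"]

jack = stack.pop()   # 1.) Jack takes the top book
anna = stack.pop()   # 2.) Anna takes the top book
mary = stack.pop()   # 3.) Mary takes the top book (and keeps it)
stack.append(jack)   # 4.) Jack returns his book to the top
stack.append(anna)   # 5.) Anna returns her book to the top

print(stack)  # ['B', 'A', 'D']
```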

March 2023 Update: GPT-4 can correctly solve this question!

Final Thoughts

Prompt Engineering is an important aspect when it comes to how we interact with large language models like GPT-3. Although I've only covered the basics of Prompt Engineering, there are many more techniques to fine-tune prompts to get optimal results.

Prompt Engineering is similar to using Google Search because the quality of a query can greatly affect the search results we get.

While these techniques can be a little fuzzy and unreliable, there are many efforts to systematize the process of crafting optimal prompts.

As large language models such as GPT-3 will be integrated into many upcoming products, the ability to create effective prompts will be an important skill in the near future.


🙌 Hey! If you enjoy my content and want to show some love, feel free to buy me a coffee. Each cup helps me create more useful content for incredible developers like you!
