TL;DR
While fine-tuning is our ultimate weapon for teaching LLMs specific tasks, properly setting up the context in your prompts can help achieve results that are "Good Enough".
This article describes an experiment in teaching ChatGPT (GPT-4) a new programming language. If you feel so inclined, you can jump directly to the Conclusions.
The Article
"You can't teach new tricks to an old dog", they say. But this is not necessarily true.
In the world of LLMs, given a pre-trained model, you can teach it new things by fine-tuning it: you create datasets with the new material you want it to learn and train the model on them.
It's a rather slow and expensive process (although techniques like LoRA can make it faster and cheaper), and there is the risk of the model forgetting something it previously knew.
It can also be quite daunting to create the dataset, train the model, check the results, rinse and repeat.
Is there any other option?
Well, I thought, if I need something simple and very specific, couldn't I just set up the proper context before having the LLM fire up its neurons?
This article is about a couple of tests I've done on this idea; I'd love to get some suggestions and feedback.
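To be concrete, "setting up the context" just means putting the teaching material at the top of the conversation, before the actual question. My tests were done through the ChatGPT interface, but the same idea over the API would look roughly like this (a minimal sketch; `LESSON` and `QUESTION` are placeholders, not the actual prompts used below):

```python
# Sketch only: prepend the "lesson" as a system message, then ask the question.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

LESSON = "...the description of whatever you want the model to learn..."
QUESTION = "...a task that requires the new knowledge..."

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": LESSON},
        {"role": "user", "content": QUESTION},
    ],
)
print(response.choices[0].message.content)
```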
I wanted to be sure the LLM really learns something new, so I decided to teach it a simple programming language that I just invented for this purpose:
The Pushy programming language is a stack-based language. Each value on the stack is a floating-point number. It has the following instructions:
PSH val # push `val` on top of the stack
ADD # sum the first two values on top of the stack. Removes them from the stack and pushes the result on the stack. If the stack does not have enough elements, use 0 by default.
SUB # subtract the first value on the stack from the second value on the stack. Removes them and pushes the result on the stack. If the stack does not have enough elements, use 0 by default.
DIV # divide the second value on the stack by the first value on the stack. Removes them and pushes the result on the stack. If the stack does not have enough elements, use 0 by default.
MUL # multiply the first two values on top of the stack. Removes them and pushes the result on the stack. If the stack does not have enough elements, use 0 by default.
INT # calculate the integer part of the value on top of the stack. Removes it and pushes the result on the stack.
CMP # compare the first two elements on the stack and removes them. If they are equal, pushes 0; if the first is greater than the second, pushes -1; otherwise pushes 1.
DRP # Removes the first element on the stack, if there is one.
DUP # duplicate the first element on the stack
SWP # swap the first two elements on the stack
LBL trg # define a label `trg` to be used for changing the flow of execution
JMP trg # jump to the instruction following the label `trg`
JGT trg # if the top of the stack is greater than 0, jump to `trg`. Removes the top element from the stack.
JLT trg # if the top of the stack is smaller than 0, jump to `trg`. Removes the top element from the stack.
JEQ trg # if the top of the stack is equal to 0, jump to `trg`. Removes the top element from the stack.
PRT # prints the first value on the stack and removes it from the stack.
GET # gets a number from the user and pushes it on the stack
END # terminates the program.
Example 1:
# Prints the numbers from 10 to 1
PSH 10
LBL loop
DUP
PRT
PSH 1
SUB
DUP
JGT loop
END
Example 2:
# Check if a number is odd (leaves 1 on the stack if odd, 0 if even)
GET
PSH 2
DIV
DUP
INT
CMP
JEQ even
PSH 1
JMP odd
LBL even
PSH 0
LBL odd
END
Minimal, but Turing complete.
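To make these semantics concrete (and to double-check the model's answers later on), here is a small reference interpreter. It's my own sketch in Python, based purely on the spec above; it was never shown to the model, and the helper name `run_pushy` is mine. PRT output is collected and returned as a list, and GET reads from a list of test inputs.

```python
# A small reference interpreter for the Pushy spec above (a sketch, not shown to the model).

def run_pushy(source, inputs=()):
    lines = [line.split("#", 1)[0].strip() for line in source.splitlines()]
    program = [line.split() for line in lines if line]
    labels = {tok[1]: i for i, tok in enumerate(program) if tok[0] == "LBL"}
    stack, output, inputs = [], [], list(inputs)

    def pop():
        return stack.pop() if stack else 0.0  # missing operands default to 0

    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "PSH":
            stack.append(float(args[0]))
        elif op == "ADD":
            a, b = pop(), pop()              # a = first (top), b = second
            stack.append(b + a)
        elif op == "SUB":
            a, b = pop(), pop()
            stack.append(b - a)              # second minus first
        elif op == "MUL":
            a, b = pop(), pop()
            stack.append(b * a)
        elif op == "DIV":
            a, b = pop(), pop()
            stack.append(b / a)              # second divided by first
        elif op == "INT":
            stack.append(float(int(pop())))  # integer part (truncation)
        elif op == "CMP":
            a, b = pop(), pop()
            stack.append(0.0 if a == b else (-1.0 if a > b else 1.0))
        elif op == "DRP":
            if stack:
                stack.pop()
        elif op == "DUP":
            stack.append(stack[-1])
        elif op == "SWP":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == "JMP":
            pc = labels[args[0]] + 1         # continue right after the label
        elif op in ("JGT", "JLT", "JEQ"):
            top = pop()                      # conditional jumps consume the top value
            if (op == "JGT" and top > 0) or (op == "JLT" and top < 0) or (op == "JEQ" and top == 0):
                pc = labels[args[0]] + 1
        elif op == "PRT":
            output.append(pop())
        elif op == "GET":
            stack.append(float(inputs.pop(0)))
        elif op == "END":
            break
        # LBL needs no action at run time
    return output
```

Feeding Example 1 above to `run_pushy` should return the values 10.0 down to 1.0, one per PRT.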
I went to the current heavyweight champion of LLMs: GPT-4. I provided the context above and then asked:
Explain what this program does:
GET
PSH 2
DIV
INT
PRT
END
The result was not bad at all:
Let's break down the given program step-by-step:

GET: This will read a number from the input and push it onto the stack. Let's assume this number is N.
PSH 2: This pushes the value 2 onto the stack. Now the stack has N on the bottom and 2 on top.
DIV: This divides the second value on the stack (N) by the first value on the stack (2) and then pushes the result. The result is N/2. Now the stack has the value N/2 on top.
INT: This takes the integer part of the value on top of the stack. If N was a whole number, then N/2 will also be a whole number and there will be no change. If N was an odd number, then N/2 will be of the form X.5, and the INT instruction will push the value X (i.e., the integer part) onto the stack.
PRT: This prints the value on top of the stack. So, the program will output the integer division result of N by 2.
END: This terminates the program.

In summary, the program reads a number from the input, divides it by 2, and then prints the integer part of the division result. In essence, it's a floor division by 2 operation. If you input an even number, you'll get half that number as output. If you input an odd number, you'll get half that number rounded down. For example, if the input is 5, the program will print 2.
You can see that the model correctly understood how to handle the instructions I gave it. The summarization at the end is an interesting touch!
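Out of curiosity, that is exactly what the interpreter sketch from earlier reproduces (the constant name and the sample input are mine):

```python
# Assumes run_pushy() from the interpreter sketch above.
HALVE = """
GET
PSH 2
DIV
INT
PRT
END
"""
print(run_pushy(HALVE, inputs=[5]))  # [2.0], matching GPT-4's example
```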
Let's give it another task:
Write a program to print the first 10 numbers in ascending order.
Note that the context has an example that counts from 10 to 1. My intent was to check if the LLM gets confused by that and tries to please me by repeating the code that it has already seen.
But GPT-4 outsmarted me:
To print the first 10 numbers in ascending order using the Pushy language, we'll start by initializing the stack with the number 1 and then incrementing it in a loop until we reach 10:
PSH 1     # Initialize with 1
LBL start # Label for our loop
DUP       # Duplicate the top of the stack to have a value to print and another for the increment and comparison
PRT       # Print the current value
PSH 1     # Push 1 onto the stack for incrementation
ADD       # Add 1 to the current number to move to the next one
DUP       # Duplicate the current value for comparison
PSH 10    # Push 10 onto the stack to compare
CMP       # Compare the current number to 10
JLT start # If the current number is less than 10, jump to the start of our loop
END       # End the program
This program will print numbers from 1 to 10 in ascending order.
Not the best commenting style, but it showed that it put the instructions there for a reason.
Too bad there is a bug in the code: it prints the numbers from 1 to 9!
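The off-by-one is easy to confirm with the interpreter sketch from earlier (the constant name is mine; the program text is GPT-4's, reproduced verbatim):

```python
# Assumes run_pushy() from the interpreter sketch above.
COUNT_UP = """
PSH 1
LBL start
DUP
PRT
PSH 1
ADD
DUP
PSH 10
CMP
JLT start
END
"""
print(run_pushy(COUNT_UP))  # [1.0, 2.0, ..., 9.0] -- 10 never gets printed
```

So I asked GPT-4 to correct it: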
There is a bug that makes it print only from 1 to 9. Please correct it.
And it answered:
Apologies for that oversight. The issue arises from our conditional jump which checks if the number is less than 10. Instead, we should check if it's less than or equal to 10.
We can modify the program to fix this by using a two-step comparison. First, we check if the value is greater than 10. If it is not, then we jump back to the loop label.
Here's the corrected code:
Followed by the correct code. I liked the fact that it not only corrected it but explained the error and what the fix would be!
One final task:
Write a program that gets a value N from the input and calculates N mod 3.
And here is the answer:
To calculate N mod 3 in the Pushy language, we can use the division operation and integer truncation. The formula for modulo is:

N mod 3 = N - 3 * floor(N / 3)

Here's the program:

GET    # Get N from the user input
DUP    # Duplicate N for future use
PSH 3  # Push 3 onto the stack
DIV    # Divide N by 3
INT    # Get the integer part of the result
DUP    # Duplicate the integer part for multiplication
PSH 3  # Push 3 onto the stack
MUL    # Multiply the integer part by 3
SUB    # Subtract the result from the original N to get the modulo
PRT    # Print the result
END    # End the program

This program will take an input N, compute N mod 3, and then print the result.
Correct, and as a bonus I got an explanation of how to compute the modulo of a number :).
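As a side note, the identity it relied on is easy to sanity-check with a couple of lines of Python (the range here is arbitrary):

```python
import math

# The identity behind GPT-4's program: n mod 3 == n - 3 * floor(n / 3)
for n in range(100):
    assert n - 3 * math.floor(n / 3) == n % 3
```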
Conclusions
Of course, these examples are too few to draw any definitive conclusion, but they show that we can get some reasonable results with very little effort.
Note how we leveraged the knowledge that the LLM already had. We did not have to explain what a "stack" is, nor what "ascending" or "modulo" mean. I believe this is a key point in choosing which foundational model is best for a specific task.
Next steps will be to stress this approach further to find its limits with GPT-4 (as this will set the baseline anyone should compare against), and to try doing the same with an open-source LLM.