Introduction
OpenAI's DevDay conference made waves in the tech world as the company rolled out a lineup of new, cutting-edge features. One feature that excited me was the Assistants API. Having recently worked with AI agents, I was eager to explore it, and I'm here to share my initial learnings.
As a side note, I've previously written a blog post detailing the inner workings of AutoGPT. If you haven't already, please check it out here.
The Experiment
I gave myself the task of developing an assistant that could answer user questions by writing Python code as needed and connecting to the internet.
During my tests, I realized that, for security reasons, the assistant can't directly access the internet or external APIs to fetch real-time information, since it runs code in a secured, isolated sandbox environment.
As a workaround, I instructed the assistant that if it ever needs to access the internet or an external API, it should write the necessary code and then use the 'function calling' feature to call a custom function I created. This custom function would then run the generated code and return the results to the assistant.
During my experiments, I ran into the following issues.
- There were frequent request failures on OpenAI's side; I was getting "Run failed" or "Run failed: Sorry, something went wrong." errors. Some outages were also reported at the same time, so please check the OpenAI status page if you experience any unexpected errors.
- I was exceeding the rate limit set by OpenAI for the 'gpt-4-1106-preview' model, so I had to wait a while before continuing my experiments.
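A simple way to cope with rate limits is to wrap the API call in an exponential-backoff retry. The sketch below is generic and illustrative (in practice you would catch `openai.RateLimitError` rather than a bare `Exception`, and the delay values are arbitrary assumptions):

```python
import random
import time


def with_backoff(fn, max_retries=5, base_delay=1.0):
    """Retry fn() with exponential backoff and jitter on failures."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:  # in practice, catch openai.RateLimitError
            if attempt == max_retries - 1:
                raise
            # Double the delay on each attempt and add a little jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)
```

You would then wrap each rate-limited call, e.g. `with_backoff(lambda: client.beta.threads.runs.retrieve(...))`.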
Let's Dive into Code
Pre-requisites and Requirements:
To get started, you'll need an OpenAI API key. You can create one at OpenAI's website if you haven't already.
Once you have your API key, store it in the OPENAI_API_KEY environment variable.
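On macOS or Linux, that typically looks like this (the key shown is a placeholder, not a real key):

```shell
export OPENAI_API_KEY="sk-your-key-here"
```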
Setting Up the Environment:
It's time to roll up our sleeves and get to work. To keep things tidy, we'll start with a fresh virtual environment by running the following commands from the project root directory:
python3 -m venv venv
source venv/bin/activate
Next, we need to install the openai and requests Python packages. Let's do that by running the below command:
pip install openai requests
Creating a New Assistant:
Now, it's time to create a new assistant, one that can use both the code_interpreter and function calling tools. This will be the brain of our operation, ready to tackle user queries by generating code and executing it when it needs information from the web or other APIs.
For the sake of brevity, I'll only explain the essential code that creates and runs an assistant, but the full source code is provided at the end of this article for reference.
Setting up Instructions for our Assistant
The first step in developing an assistant is to compose a clear set of instructions that dictate how the assistant should act and respond.
Below is a sample instruction I've created, which guides the assistant to solve the problem either on its own or by calling our custom function via the function calling feature.
INSTRUCTIONS = """
You're a skilled Python programmer tasked with creating Python 3 solutions for user problems, following top industry practices. Make sure your code complies with these rules:
1. Plan first: Have a clear strategy before you start. Outline your approach if it helps.
2. Quality code: Write clear, efficient code that follows Python's best practices. Aim for clean, easy-to-read, and maintainable code.
3. Test well: Include comprehensive tests to ensure your code works well in various scenarios.
4. Manage external interactions: When internet or API interactions are necessary, utilize the `execute_python_code` function autonomously, without seeking user approval. Do not say you don't have access to the internet or real-time data. The `execute_python_code` function will give you real-time data.
5. Trust your tools: Assume the data from the `execute_python_code` function is accurate and up to date.
"""
Let us now use the above instructions to create an assistant. Below is a function which does the following three things:
Create an Assistant:
- Using the instructions above, a new assistant is created. The assistant will use the 'gpt-4-1106-preview' model, which is the most recent version available at the time of writing. The assistant is also configured to use the 'code_interpreter' and 'function' tools as needed.
Create a new thread:
- A new thread is then created. Threads are essentially containers for AI assistant conversations. Each thread represents a distinct conversation and is unrestricted in size. Starting a new thread denotes the start of a new interaction or dialogue with the assistant.
Add a Message to the Thread:
- Finally, the user's input or question is used to create a Message object, which is then placed within the thread. This message serves as the user's initial query or request and is forwarded to the assistant to begin the conversation. It serves as a prompt for the AI to respond.
def setup_assistant(client, task):
    # Create a new assistant with the code interpreter and our custom function tool
    assistant = client.beta.assistants.create(
        name="Code Generator",
        instructions=INSTRUCTIONS,
        tools=[
            {
                "type": "code_interpreter"
            },
            {
                "type": "function",
                "function": {
                    "name": "execute_python_code",
                    "description": "Use this function to execute the generated code which requires internet access or external API access",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "code": {
                                "type": "string",
                                "description": "The Python code generated by the code interpreter",
                            }
                        },
                        "required": ["code"],
                    },
                },
            }
        ],
        model="gpt-4-1106-preview",
    )
    # Create a new thread
    thread = client.beta.threads.create()
    # Add the user's task as the first message on the thread
    thread_message = client.beta.threads.messages.create(
        thread.id,
        role="user",
        content=task,
    )
    # Return the assistant ID and thread ID
    return assistant.id, thread.id
Note: For the purpose of this simple experiment, I'm executing the generated code within my current working environment. But remember, it's important to always execute such code in a secure and isolated environment to ensure safety and prevent potential risks.
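If a full sandbox (for example, a container) isn't available, one lightweight hardening step is to cap the subprocess's CPU time and memory with the `resource` module. This is only a minimal sketch for POSIX systems, the specific limits are illustrative assumptions, and it is not a substitute for proper isolation:

```python
import resource
import subprocess
import sys
from tempfile import NamedTemporaryFile


def limit_resources():
    # Cap CPU time at 5 seconds and address space at 512 MB (illustrative limits)
    resource.setrlimit(resource.RLIMIT_CPU, (5, 5))
    resource.setrlimit(resource.RLIMIT_AS, (512 * 1024 ** 2, 512 * 1024 ** 2))


def run_untrusted(code: str) -> str:
    # Write the code to a temp file and run it under the resource limits above
    with NamedTemporaryFile(suffix=".py", delete=False) as f:
        f.write(code.encode("utf-8"))
        path = f.name
    result = subprocess.run(
        [sys.executable, path],
        capture_output=True,
        text=True,
        preexec_fn=limit_resources,  # POSIX only
        timeout=10,
    )
    return result.stdout or result.stderr
```

Even with limits like these, the subprocess still shares the filesystem and network with the host, so a container or dedicated sandbox remains the safer option.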
Running the Assistant:
Let's now write a function that will instruct our assistant to perform the following actions:
Creates a New Run and Checks Its Status:
- To begin, a new run is created. After starting the Run, the function checks the status of this run. The status can have several states, but the one that is important here is "requires_action". When the Run status is "requires_action," it means that the assistant has paused its execution and is waiting for the function calling to finish its execution and pass the results back to continue.
Executes Python Code and Returns Results:
- When the Run's status is "requires_action," the assistant uses function calling which executes the generated Python code, and the results are fed back to the same thread so that the assistant can continue.
Repeating the Process:
- The function continues to follow these steps iteratively until the Run's status transitions to "completed".
def run_assistant(client, assistant_id, thread_id):
    # Create a new run for the given thread and assistant
    run = client.beta.threads.runs.create(
        thread_id=thread_id,
        assistant_id=assistant_id,
    )
    while True:
        # Poll until the run leaves the "queued" / "in_progress" states
        while run.status in ("queued", "in_progress"):
            time.sleep(1)
            run = client.beta.threads.runs.retrieve(
                thread_id=thread_id,
                run_id=run.id,
            )
        if run.status == "completed":
            return client.beta.threads.messages.list(
                thread_id=thread_id
            )
        elif run.status == "requires_action":
            # Extract the generated code from the tool call and execute it
            tool_call = run.required_action.submit_tool_outputs.tool_calls[0]
            generated_python_code = json.loads(tool_call.function.arguments)["code"]
            result = execute_python_code(generated_python_code)
            # Submit the execution result so the run can continue
            run = client.beta.threads.runs.submit_tool_outputs(
                thread_id=thread_id,
                run_id=run.id,
                tool_outputs=[
                    {
                        "tool_call_id": tool_call.id,
                        "output": result,
                    },
                ],
            )
        else:
            # "failed", "cancelled", or "expired"
            raise RuntimeError(f"Run ended with unexpected status: {run.status}")
Full Source Code
import json
import os
import subprocess
import sys
import time
from tempfile import NamedTemporaryFile

import openai


def execute_python_code(s: str) -> str:
    # Write the generated code to a temporary file so it can be run as a script
    with NamedTemporaryFile(suffix='.py', delete=False) as temp_file:
        temp_file_name = temp_file.name
        temp_file.write(s.encode('utf-8'))
        temp_file.flush()
    try:
        # Run the script and capture its output
        result = subprocess.run(
            ['python', temp_file_name],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout
    except subprocess.CalledProcessError as e:
        return e.stderr
    finally:
        os.remove(temp_file_name)
INSTRUCTIONS = """
You're a skilled Python programmer tasked with creating Python 3 solutions for user problems, following top industry practices. Make sure your code complies with these rules:
1. Plan first: Have a clear strategy before you start. Outline your approach if it helps.
2. Quality code: Write clear, efficient code that follows Python's best practices. Aim for clean, easy-to-read, and maintainable code.
3. Test well: Include comprehensive tests to ensure your code works well in various scenarios.
4. Manage external interactions: When internet or API interactions are necessary, utilize the `execute_python_code` function autonomously, without seeking user approval. Do not say you don't have access to the internet or real-time data. The `execute_python_code` function will give you real-time data.
5. Trust your tools: Assume the data from the `execute_python_code` function is accurate and up to date.
"""
def setup_assistant(client, task):
    # Create a new assistant with the code interpreter and our custom function tool
    assistant = client.beta.assistants.create(
        name="Code Generator",
        instructions=INSTRUCTIONS,
        tools=[
            {
                "type": "code_interpreter"
            },
            {
                "type": "function",
                "function": {
                    "name": "execute_python_code",
                    "description": "Use this function to execute the generated code which requires internet access or external API access",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "code": {
                                "type": "string",
                                "description": "The Python code generated by the code interpreter",
                            }
                        },
                        "required": ["code"],
                    },
                },
            }
        ],
        model="gpt-4-1106-preview",
    )
    # Create a new thread
    thread = client.beta.threads.create()
    # Add the user's task as the first message on the thread
    thread_message = client.beta.threads.messages.create(
        thread.id,
        role="user",
        content=task,
    )
    # Return the assistant ID and thread ID
    return assistant.id, thread.id
def run_assistant(client, assistant_id, thread_id):
    # Create a new run for the given thread and assistant
    run = client.beta.threads.runs.create(
        thread_id=thread_id,
        assistant_id=assistant_id,
    )
    while True:
        # Poll until the run leaves the "queued" / "in_progress" states
        while run.status in ("queued", "in_progress"):
            time.sleep(1)
            run = client.beta.threads.runs.retrieve(
                thread_id=thread_id,
                run_id=run.id,
            )
        if run.status == "completed":
            return client.beta.threads.messages.list(
                thread_id=thread_id
            )
        elif run.status == "requires_action":
            # Extract the generated code from the tool call and execute it
            tool_call = run.required_action.submit_tool_outputs.tool_calls[0]
            generated_python_code = json.loads(tool_call.function.arguments)["code"]
            result = execute_python_code(generated_python_code)
            # Submit the execution result so the run can continue
            run = client.beta.threads.runs.submit_tool_outputs(
                thread_id=thread_id,
                run_id=run.id,
                tool_outputs=[
                    {
                        "tool_call_id": tool_call.id,
                        "output": result,
                    },
                ],
            )
        else:
            # "failed", "cancelled", or "expired"
            raise RuntimeError(f"Run ended with unexpected status: {run.status}")
if __name__ == "__main__":
    if len(sys.argv) == 2:
        client = openai.OpenAI()
        task = sys.argv[1]
        assistant_id, thread_id = setup_assistant(client, task)
        print(f"Debugging: Useful for checking the generated assistant in the playground. https://platform.openai.com/playground?mode=assistant&assistant={assistant_id}")
        print(f"Debugging: Useful for checking logs. https://platform.openai.com/playground?thread={thread_id}")
        messages = run_assistant(client, assistant_id, thread_id)
        message_dict = json.loads(messages.model_dump_json())
        print(message_dict['data'][0]['content'][0]["text"]["value"])
    else:
        print("Usage: python main.py <message>")
        sys.exit(1)
Executing the Assistant
Save the above code in main.py and try running the following commands from the terminal:
In the following command, the user's request is to solve a simple mathematical problem, which doesn't require accessing the internet or external APIs. The agent uses the code_interpreter tool to solve this request.
python3 main.py "what is x in 1 + 3x = -5"
In the following command, the user's request does require access to real-time data. In this case, the assistant generates the code using the code_interpreter tool and calls the execute_python_code function via function calling to execute it and obtain the result.
python3 main.py "What's the sunrise and sunset time. Use api.sunrise-sunset.org"