Most LLMs and SLMs are not designed for calculations (reasoning models like OpenAI's o1 or o3 aside). Just imagine the following dialogue:
- Company: Today is Wednesday; you can return the delivery parcel within 24 hours.
- Client: Okay, let's do it on Tuesday.
Are you sure the next AI response will be correct? As a human, you understand that next Tuesday is six days away, while 24 hours is just one day. Most LLMs, however, cannot reliably handle such logic; their responses are non-deterministic.
This issue worsens as the context grows: with 30 rules and a conversation history of 30 messages, the AI loses focus and easily makes mistakes.
Common Use-Case
- You're developing an AI scheduling chatbot or AI agent for your company.
- The company has scheduling rules that are frequently updated.
- Before scheduling, the chatbot must validate customer input parameters.
- If validation fails, the chatbot must inform the customer.
What Can We Do?
Combine traditional code execution with LLMs. This idea is not new but remains underutilized:
- OpenAI integrates this feature into its Assistants API, but not into the Completions API.
- Google recently introduced code interpreter capabilities in Gemini 2.0 Flash.
Our Solution Tech Stack
- Docker (Podman)
- LangGraph.js
- Piston
Code Interpreter Sandbox
To run generated code securely, you need a sandbox. The most popular cloud code interpreters are e2b and the offerings from Google and OpenAI mentioned above.
However, I was looking for an open-source, self-hosted solution for flexibility and cost-effectiveness, which left two good options:
- Piston
- Jupyter
I chose Piston for its ease of deployment.
Piston Installation
It took me a while to figure out how to add a Python execution environment to Piston.
0. Enable cgroup v2
For Windows WSL, this article was helpful.
1. Run a Container
docker run --privileged -p 2000:2000 -v d:\piston:'/piston' --name piston_api ghcr.io/engineer-man/piston
2. Check Out the Piston Repository
git clone https://github.com/engineer-man/piston
3. Add Python Support
Run the following command:
node cli/index.js ppman install python
By default, this command uses your container API running on localhost:2000
to install Python.
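To double-check that the runtime landed, you can list the installed runtimes via Piston's public API (a quick sanity check I'd recommend):

curl http://localhost:2000/api/v2/runtimes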
Example Code Execution
Using the Piston Node.js Client:
import piston from "piston-client";

// Point the client at the self-hosted Piston container
const codeInterpreter = piston({ server: "http://localhost:2000" });

// Run a Python snippet inside the sandbox
const result = await codeInterpreter.execute('python', 'print("Hello World!")');
console.log(result);
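The result object mirrors Piston's execute response: the program's stdout, stderr, and exit code should be available under result.run (here, result.run.stdout would contain "Hello World!" plus a newline).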
AI Agent Implementation
We're going to use some advanced techniques:
- Graph and subgraph architecture
- Parallel node execution
- Qdrant for storage
- Observability via LangSmith
- GPT-4o-mini, a cost-efficient LLM
Refer to the LangSmith trace for a detailed overview of the flow:
https://smith.langchain.com/public/b3a64491-b4e1-423d-9802-06fcf79339d2/r
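To make the graph architecture concrete, here is a minimal LangGraph.js sketch. The node names, state fields, and wiring are my illustrative assumptions, not the exact code behind the trace:

import { StateGraph, Annotation, START, END } from "@langchain/langgraph";

// Shared state flowing through the graph (field names are assumptions)
const SchedulingState = Annotation.Root({
  userInput: Annotation<string>,
  pythonParametersExtractionMethod: Annotation<string>,
  pythonValidationMethod: Annotation<string>,
  validationErrors: Annotation<string[]>,
});

const graph = new StateGraph(SchedulingState)
  // Step 1: LLM generates the parameter-extraction method
  .addNode("extractParameters", async (state) => ({
    pythonParametersExtractionMethod: "...", // LLM call goes here
  }))
  // Step 2: LLM turns stored rules into a validation method
  .addNode("generateValidation", async (state) => ({
    pythonValidationMethod: "...", // LLM call goes here
  }))
  // Step 3: run both generated methods in the Piston sandbox
  .addNode("runSandbox", async (state) => ({
    validationErrors: [], // code interpreter call goes here
  }))
  // Fan out from START so both code-generation nodes run in parallel
  .addEdge(START, "extractParameters")
  .addEdge(START, "generateValidation")
  // Join edge: the sandbox node waits for both generation nodes
  .addEdge(["extractParameters", "generateValidation"], "runSandbox")
  .addEdge("runSandbox", END)
  .compile();

The two generation nodes run in the same superstep, which is the parallel node execution mentioned above; the join edge keeps the sandbox node from firing until both methods are ready.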
Step 1: Extract datetime-related scheduling parameters from user input
Example: "Tomorrow, last Friday, in 2 hours, at noon time."
We use the code interpreter to ensure reliable extraction, as LLMs can fail at this even when the current date and time are provided in the context.
Example Prompt for Python Code Generation:
Your task is to transform natural language text into Python code that extracts datetime-related scheduling parameters from user input.
## Instructions:
- You are allowed to use only the "datetime" and "calendar" libraries.
- You can define additional private helper methods to improve code readability and modularize validation logic.
- Do not include any import statements in the output.
- Assume all input timestamps are provided in the GMT+8 timezone. Adjust calculations accordingly.
- The output should be a single method definition with the following characteristics:
  - Method name: `getCustomerSchedulingParameters`
  - Arguments: None
  - Return: A JSON object with the keys:
    - `appointment_date`: The day of the month (integer or `None`).
    - `appointment_month`: The month of the year (integer or `None`).
    - `appointment_year`: The year (integer or `None`).
    - `appointment_time_hour`: The hour of the day in 24-hour format (integer or `None`).
    - `appointment_time_minute`: The minute of the hour (integer or `None`).
    - `duration_hours`: The duration of the appointment in hours (float or `None`).
    - `frequency`: The recurrence of the appointment. Can be `"Adhoc"`, `"Daily"`, `"Weekly"`, or `"Monthly"` (string or `None`).
- If a specific value is not found in the text, return `None` for that field.
- Focus only on extracting values explicitly mentioned in the input text; do not make assumptions.
- Do not include print statements or logging in the output.
## Example:
### Input:
"I want to book an appointment for next Monday at 2pm for 2.5 hours."
### Output:
def getCustomerSchedulingParameters():
    """Extracts and returns scheduling parameters from user input in GMT+8 timezone.
    Returns:
        A JSON object with the required scheduling parameters.
    """
    def _get_next_monday():
        """Helper function to calculate the date of the next Monday."""
        # No imports here: the wrapper script imports the datetime module,
        # so the module namespace must be used explicitly
        current_time = datetime.datetime.utcnow() + datetime.timedelta(hours=8)  # Adjust to GMT+8
        today = current_time.date()
        # Monday is 0; "or 7" skips today so "next Monday" is always in the future
        days_until_monday = (7 - today.weekday()) % 7 or 7
        return today + datetime.timedelta(days=days_until_monday)

    next_monday = _get_next_monday()
    return {
        "appointment_date": next_monday.day,
        "appointment_month": next_monday.month,
        "appointment_year": next_monday.year,
        "appointment_time_hour": 14,
        "appointment_time_minute": 0,
        "duration_hours": 2.5,
        "frequency": "Adhoc"
    }
### Notes:
Ensure the output is plain Python code without any formatting or additional explanations.
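For completeness, here is a minimal sketch (my assumption, not necessarily the exact production code) of sending this prompt to the model with LangChain's ChatOpenAI and capturing the generated method:

import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({ model: "gpt-4o-mini", temperature: 0 });

// codeGenerationPrompt holds the instructions above;
// userMessage is the customer's natural-language input (both names are illustrative)
const response = await model.invoke([
  ["system", codeGenerationPrompt],
  ["human", userMessage],
]);

// The prompt instructs the model to return plain Python code,
// so the content can be used directly as the extraction method
const pythonParametersExtractionMethod = response.content as string;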
Step 2: Fetch Rules from Storage
The scheduling rules are stored in Qdrant; the agent fetches them and asks the LLM to transform them into a Python validation method, as sketched below.
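A minimal sketch of this step, assuming the rules are stored as plain-text payloads in a Qdrant collection named "scheduling_rules" (the collection name and payload field are my assumptions):

import { QdrantClient } from "@qdrant/js-client-rest";
import { ChatOpenAI } from "@langchain/openai";

const qdrant = new QdrantClient({ url: "http://localhost:6333" });

// Fetch the rule documents from the collection
const { points } = await qdrant.scroll("scheduling_rules", {
  limit: 100,
  with_payload: true,
});
const rules = points.map((p) => p.payload?.text).join("\n");

// Ask the LLM to turn the rules into a single Python validation method
const model = new ChatOpenAI({ model: "gpt-4o-mini", temperature: 0 });
const response = await model.invoke([
  ["system",
    "Transform the following scheduling rules into a single Python method " +
    "validateCustomerSchedulingParameters(appointment_year, appointment_month, " +
    "appointment_date, appointment_time_hour, appointment_time_minute, " +
    "duration_hours, frequency) that returns a list of validation error " +
    "strings. Use only the datetime and calendar libraries; no imports."],
  ["human", rules],
]);
const pythonValidationMethod = response.content as string;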
Step 3: Run the Generated Code in the Sandbox
import { traceable } from "langsmith/traceable";

const pythonCodeToInvoke = `
import sys
import datetime
import calendar
import json
${state.pythonValidationMethod}
${state.pythonParametersExtractionMethod}
parameters = getCustomerSchedulingParameters()
validation_errors = validateCustomerSchedulingParameters(
    parameters["appointment_year"],
    parameters["appointment_month"],
    parameters["appointment_date"],
    parameters["appointment_time_hour"],
    parameters["appointment_time_minute"],
    parameters["duration_hours"],
    parameters["frequency"])
print(json.dumps({"validation_errors": validation_errors}))`;

// traceable() wraps the call for LangSmith observability; it returns a function, not a promise
const traceableCodeInterpreterFunction = traceable((pythonCodeToInvoke: string) =>
  codeInterpreter.execute('python', pythonCodeToInvoke, { args: [] }));
const result = await traceableCodeInterpreterFunction(pythonCodeToInvoke);
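The sandbox prints a single JSON object to stdout, so the validation result can be parsed back on the Node.js side (assuming the Piston response shape shown earlier):

const { validation_errors } = JSON.parse(result.run.stdout);

If the array is non-empty, the chatbot reports the errors to the customer, closing the loop described in the use case.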
Potential Improvements
- Implement an iterative loop so the LLM can debug and refine the generated Python code dynamically.
- Add a human in the loop for validation-method code generation.
- Cache generated code to avoid regenerating it on every request.
Final Thoughts
Deterministic code execution and token-based LLMs are highly complementary technologies, unlocking a new level of flexibility. This synergistic approach has a bright future: AWS's recently announced Automated Reasoning checks in Bedrock appear to offer a similar capability within their enterprise ecosystem, and Google and Microsoft will likely show us something similar soon.