Table of Contents
-
Introducing Calorie GPT: Track your Calories using LLMs - Part 1
- Table of Contents
- Introduction
- Building Calorie GPT with LangDB
- Step 1: Create a Prompt
- Step 2: Validating the Data
- Invalid Example
- Valid Example
- Step 3: Create a model with tools
- Step 3: Introducing Human-in-the-Loop Mechanism
- Step 4: Storing & Accessing Data
- Running queries against a user
- Weekly & Daily Statistics
- Upcoming Articles in the Series
NOTE: Original article can be found at https://app.langdb.ai/share/apps/319346f1-4616-42a9-bf35-57dc28d41c06
Introduction
I have always wanted to figure out my intake without having to fill up information manually. So here is a solution for all the lazy people out there. Calorie GPT will get the composition using LLM VIsion APIs. Through this we can also explore various ways of improving accuracy. This article will be the first of the series. I'm currently testing this for myself, with plans to make it available for general usage later on. In this article, we'll explore how to build this solution using LangDB, which allows to build full LLM applications just using SQL. Let's dive in.
At the end of the exercise we can do the following
1) Extract calorie information from images
2) Get uploads by user (THis could be integrated through an API)
3) Weekly and Daily Statistics
For example, here is a query to get weekly analytics.
WITH
(SELECT
toStartOfDay(toTimeZone(now(), 'Asia/Singapore')) AS today) AS today
SELECT
date,
total_calories,
groupArray((nutrient_name, total_amount, unit)) AS nutrients,
user_id
FROM (
SELECT
date,
SUM(total_calories) AS total_calories,
nutrient_name,
ceil(SUM(amount), 1) AS total_amount,
MAX(unit) AS unit,
user_id
FROM (
SELECT
toStartOfDay(toTimeZone(created_at, 'Asia/Singapore')) AS date,
calories_per_serving AS total_calories,
tupleElement(n, 1) AS nutrient_name,
tupleElement(n, 2) AS amount,
tupleElement(n, 3) AS unit,
user_id
FROM
food_activity
ARRAY JOIN nutrition_items AS n
WHERE
created_at >= today - INTERVAL 5 DAY
)
GROUP BY
date, nutrient_name, user_id
)
GROUP BY
date, total_calories, user_id
ORDER BY
date DESC
LIMIT 5;
date | total_calories | nutrients | user_id |
---|---|---|---|
2024-08-06 00:00:00 | 1268 | [["fat",43,"g"],["vitamin A",428.5,"mg"],["protein",67,"g"]] | hgOzDg4yXCOnk3leagMV7onEARO2 |
Building Calorie GPT with LangDB
Our journey begins with building a LangDB model that utilizes vision APIs to analyze food items and determine their nutritional content. To ensure accuracy and reliability, we'll experiment with various providers later in the series.
This model will be responsible for identifying the food items and extracting nutritional information such as protein, fat, vitamins, and total calories. We'll start with a single provider and validate the data to ensure it meets our required format.
We ll do the following
- Create a prompt with the instructions
- Create a validator
- Create a model with tools
Step 1: Create a Prompt
FIrst lets create a prompt and instruct to return JSON
in a format that we are interested.
Step 2: Validating the Data
Data validation is crucial to ensure the information returned by the LLMs is accurate and in the format we need. We'll create a validator that checks the data for consistency and correctness.
Here we can simply use SQL to validate our JSON response.
CREATE PROMPT nutrient_info_prompt (
system '
You are a model that processes images of meals to determine the nutrient content of each food item.
Upon receiving an image, analyze it to identify each distinct food item, estimate the quantity, and calculate the corresponding nutrient content.
= Always Return the result in a valid JSON format.
- Always use the test_calorie_json tool to check if the format is right.
- Only respond json and nothing else.
- Units can only be in mg and g.
[{"name": "item","quantity": <quantity>,"default_serving": <usual_serving>,"unit": <unit>,"nutrients": [ {"type": "protein", "amount": <amount>, "unit": "g"}, {"type": "fat", "amount": <amount>,"unit": "g"}, {"type": "vitamin A", "amount": <amount>,"unit": "mg"}],"calories_per_serving": <calories>}]
Example:
[{"name": "apple","quantity": 1,"default_serving": 10,"unit": "g","calories_per_serving": 95,"nutrients": [ {"type": "protein", "amount": 30, "unit": "g" }, {"type": "fat", "amount": 20, "unit": "g"}, {"type": "vitamin A", "amount": 0.54, "unit": "mg"}]}]
',
human $input
)
CREATE VIEW test_calorie_json (
json_data string "Input must be a JSON array string"
) AS(
SELECT
(COUNT(JSONExtractArrayRaw($json_data)) > 0) AS valid
WHERE
isValidJSON($json_data) AND JSONExtractString($json_data, '') IS NOT NULL
)
Lets test with one sample json output
Invalid Example
SELECT * from test_calorie_json('[{
"name": "apple",
"quantity": 1,
"default_serving": 10,
"unit": "g",
"calories_per_serving": 95,
}]')
valid |
---|
0 |
Valid Example
SELECT * from test_calorie_json('[{
"name": "apple",
"quantity": 1,
"default_serving": 10,
"unit": "g",
"calories_per_serving": 95,
"nutrients": [
{"type": "protein", "amount": 30, "unit": "g" },
{"type": "fat", "amount": 20, "unit": "g"},
{"type": "vitamin A", "amount": 0.54, "unit": "mg"}
]
}]')
valid |
---|
1 |
Step 3: Create a model with tools
CREATE MODEL calorie_tracker (
input
)
USING open_ai_provider(model_name='gpt-4o-mini')
PROMPT nutrient_info_prompt
TOOLS (
test_calorie_json comment 'tests for valid calorie response or returns an error'
)
We can already using this to model to get calories. Lets try with a couple of images.
CREATE TABLE calorie_images (
name String,
link String
) Engine = Memory()
INSERT into calorie_images(name, link) values
('Pasta', 'data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAQAAAAEACAYAAABccqhmAAAAAXNSR0IArs4c6QAAIABJREFUeF7kvWmTLOd13/nLPbMqa'),
('Burrito','data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAAkGBwgHBgkIBwgKCgkLDRYPDQwMDRsUFRAWIB0iIiAdHx8kKDQsJCYxJx8fLT0tMTU3Ojo6Iys')
Lets visualize the images using python
%export df%
select * from calorie_images
Variable df exported
%python%
from IPython import display
from base64 import b64decode
html = ''
for image_url in df['link']:
html = html + f'<img style="width:200px" src="{image_url}" />'
display.HTML(html)
Tracing
If we look at the traces, we can notice the following
- It took about
20
seconds to analyize the picture - First tool call failed with the wrong json
- Second call worked and responded with the json
Step 3: Introducing Human-in-the-Loop Mechanism
In the mobile app, we also introduce a human in the loop screen for the human to fix the numbers if LLMs are producing incorrect information.
Step 4: Storing & Accessing Data
Once the data is validated and refined, we'll store it in our database. We'll use the uploads
table for raw data storage and the food_activity
table for structured data. The food_activity
table schema includes fields for user ID, food item, nutritional information, total calories, and timestamps, enabling efficient tracking and querying.
CREATE TABLE IF NOT EXISTS uploads
(
upload_id UUID,
user_id String,
image Nullable(String),
created_at DateTime DEFAULT now()
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(created_at)
ORDER BY (created_at, upload_id);
CREATE TABLE IF NOT EXISTS food_activity
(
id UUID DEFAULT generateUUIDv4(),
upload_id UUID,
user_id String,
name String,
quantity Float32,
calories_per_serving UInt32,
unit String,
nutrition_items Array(Tuple(String, Float32, String)), -- Array of nutrients like protein, fat, vitamin A, etc., with their units
created_at DateTime DEFAULT now()
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(created_at)
ORDER BY (created_at, id);
Running queries against a user
Uploads by user and upload_id
SELECT
user_id,
u.upload_id,
COUNT(*) AS no_items,
SUM(f.calories_per_serving) AS total_calories,
u.created_at AS created_at,
groupArray((name,quantity,calories_per_serving,unit,nutrition_items)) AS items
FROM
food_activity AS f
JOIN
uploads AS u
ON
f.upload_id = u.upload_id
GROUP BY
u.upload_id, u.created_at, u.image, user_id
ORDER BY
u.created_at DESC;
user_id | u.upload_id | no_items | total_calories | created_at | items |
---|---|---|---|---|---|
hgOzDg4yXCOnk3leagMV7onEARO2 | a11a0b1e-3663-4a06-8704-0d4c39b48fa1 | 3 | 268 | 2024-08-06 14:14:49 | [["spaghetti",150,158,"g",[["protein",5,"g"],["fat",1,"g"],["vitamin A",0,"mg"]]],["fried egg",1,90,"unit",[["protein",6,"g"],["fat",7,"g"],["vitamin A",270,"mg"]]],["bell pepper",50,20,"g",[["protein",1,"g"],["fat",0,"g"],["vitamin A",157,"mg"]]]] |
hgOzDg4yXCOnk3leagMV7onEARO2 | c4a1a434-fba7-4b45-ab02-537d18265b72 | 1 | 500 | 2024-08-06 14:11:30 | [["chicken_burrito",1,500,"piece",[["protein",30,"g"],["fat",20,"g"],["vitamin A",0.5,"mg"]]]] |
hgOzDg4yXCOnk3leagMV7onEARO2 | db8d380b-3677-417f-a134-3d140208ddf6 | 1 | 500 | 2024-08-06 13:39:22 | [["chicken burrito",1,500,"g",[["protein",25,"g"],["fat",15,"g"],["vitamin A",1,"mg"]]]] |
Weekly & Daily Statistics
WITH
(SELECT
toStartOfDay(toTimeZone(now(), 'Asia/Singapore')) AS today) AS today
SELECT
date,
total_calories,
groupArray((nutrient_name, total_amount, unit)) AS nutrients,
user_id
FROM (
SELECT
date,
SUM(total_calories) AS total_calories,
nutrient_name,
ceil(SUM(amount), 1) AS total_amount,
MAX(unit) AS unit,
user_id
FROM (
SELECT
toStartOfDay(toTimeZone(created_at, 'Asia/Singapore')) AS date,
calories_per_serving AS total_calories,
tupleElement(n, 1) AS nutrient_name,
tupleElement(n, 2) AS amount,
tupleElement(n, 3) AS unit,
user_id
FROM
food_activity
ARRAY JOIN nutrition_items AS n
WHERE
created_at >= today - INTERVAL 5 DAY
)
GROUP BY
date, nutrient_name, user_id
)
GROUP BY
date, total_calories, user_id
ORDER BY
date DESC
LIMIT 5;
date | total_calories | nutrients | user_id |
---|---|---|---|
2024-08-06 00:00:00 | 1268 | [["fat",43,"g"],["vitamin A",428.5,"mg"],["protein",67,"g"]] | hgOzDg4yXCOnk3leagMV7onEARO2 |
Upcoming Articles in the Series
Our journey doesn't end here. In the next articles, we'll delve deeper into the following topics:
Evaluating Multiple Providers: We'll assess various vision API providers to measure accuracy and reliability, helping us choose the best options for our system.
Q&A: Using SQL generation and metadata tables we can power workflows of providing interesting insights to the user with personalisation
Integrating with React Native: Learn how to integrate Calorie GPT with React Native or other applications, making it accessible on mobile devices.
Incorporating RAG Techniques: We'll explore Retrieval-Augmented Generation (RAG) techniques to improve the accuracy of our models further.
Building a Copilot Experience: Discover how to enhance Calorie GPT with automatic notifications and personalized suggestions, modeling user behavior for a more interactive and supportive experience.
Stay tuned as we continue to refine and expand Calorie GPT, transforming the way we track and manage our daily calorie intake. With LangDB at the core, we're excited to bring you a powerful tool that combines the latest in AI and database technology for effortless calorie tracking.
Top comments (0)