DEV Community

loading...
Cover image for I analyzed my six years of food deliveries

I analyzed my six years of food deliveries

shubhamjain profile image Shubham Jain ・5 min read

Before food delivery apps came along, ordering food online was a pain. I started using Swiggy, the Indian food delivery app, in 2016 and because it made it way too easy to get food delivered, my orders just went through the roof. I thought it will be fun to analyze how much and what have I ordered.

There are two sections of this post: a) How I did it? b) The numbers and my analysis. Depending on your interest, you can skip either.

How I did it?

Swiggy doesn't have an API to get your personal data. However, it has a page where it lists your past orders. Trouble is, it will only show five orders at once, and you have to click "Show more orders" to go further back. Naturally, you have to automate it somehow.

Figuring out the source of data

Order Page XHR

Order Page XHR

I first logged in to the web version of the app since it will be comparatively easier to analyze the requests there. It seems the orders page makes an XHR request to the URL /dapi/order/all?order_id=$$ORDER_ID$$ to fetch the orders where $$ORDER_ID$$ is used for pagination.

Order Page Response

Order Page Response

The response is an array of JSON objects with a lot of fields.

We don't need to figure out which parts of response we need to use, because we can store it all and see to that later.

I will use Postgres to store this data because a) it just works and b) has great support for querying JSON. Installing Postgres locally is really simple on Macs via the Postgres app.

Here's the schema I will use:

CREATE TABLE swiggy (
    order_id text PRIMARY KEY,
    data jsonb
);
Enter fullscreen mode Exit fullscreen mode

Why do we just need two columns? Because order_id is enough to uniquely identify every order and in the data column we can dump the whole object. As I said earlier, Postgres makes it really easy to query JSON.

Getting the data

Let's get the data! I will be using bash for this. "BASH???? ARGHHH. WHY???", you may ask. I used to have a similar aversion towards bash, but in the past year, I have realized it makes it really easy to do certain things once you get the hang of it.

It's perfectly fine if you want to use a different approach as the basic ideas would be very much portable. Like, using axios instead of cURL.

First, the basic request. We need to figure out if a simple request for getting orders would work at all. The easiest way to do that is to "Copy as cURL" the request from the developer's console and run it in your terminal.

If it works, that's great! You can start removing the unneeded parameters, like headers, user agent, and tracking cookies. After I did the trimming, the basic request to get orders boiled down to this:

curl -sL "https://www.swiggy.com/dapi/order/all?order_id=$next_order_id" \
     -H "cookie: _session_tid=$SESSION_ID"
Enter fullscreen mode Exit fullscreen mode

where $SESSION_ID is the long parameter visible in the first screenshot.

Once the request works, we can parse the JSON and iterate over the objects using jq.

IFS=$'\n'
for order in $(echo "$all_orders" | jq -c '.data.orders[]' ); do
    echo "$order"
done;
Enter fullscreen mode Exit fullscreen mode

Why do we use IFS=$'\n'? Because by default bash will split tabs and spaces too, and various names in the response (like restaurant, delivery person) can have spaces.

Now, we just need to store it in the DB. And write code to fetch the next orders. In all, the whole script will be like this:

#!/bin/bash

# Your Swiggy Session ID, check screenshot #1 
SESSION_ID="__SESSION_ID__"

while true; do
    all_orders=$(curl -sL "https://www.swiggy.com/dapi/order/all?order_id=$next_order_id" \
    -H "cookie: _session_tid=$SESSION_ID")

    IFS=$'\n'
    for order in $(echo "$all_orders" | jq -c '.data.orders[]' ); do
        # We use $$ to wrap $order because it takes cares spaces in the data
        psql "postgres://shubhamjain@localhost/stats" -c "
            INSERT INTO swiggy (order_id, data) VALUES (
               '$(echo "$order" | jq -r '.order_id')',
                \$\$$order\$\$
            );
        "
    done;

    # Next order id would be order id of the last object in array
    next_order_id=$(echo "$all_orders" | jq '.[-1].order_id')
done;
Enter fullscreen mode Exit fullscreen mode

After running this, you should see something like this:

Shell Script Execution

At some point, it will start failing, which means it has reached the end and you can quit the script then.

Note: None of this is the ideal way to do things. I wasn't interested in programming properly, only hacking together something cool.

Six years in numbers

Cool! Now we have the data. Let's jump to answering questions.

Q.1: How much I have ordered?

A LOT! I knew I was ordering a ton of times, but I never put a number on it. Actually doing that, with this project, I was taken aback realizing the number of orders I have made and the money I have spent.

SELECT COUNT(*), EXTRACT('year' FROM date(data->>'order_time')::timestamp) as year FROM swiggy GROUP BY 2;
Enter fullscreen mode Exit fullscreen mode

Total orders: 915

image

Year vs. # of orders

How much I have spent?

SELECT
   SUM(CAST(data ->> 'order_total_with_tip' AS DOUBLE PRECISION)),
   EXTRACT('year' 
FROM
   DATE(data ->> 'order_time')::TIMESTAMP) 
FROM
   swiggy 
GROUP BY
   2;
Enter fullscreen mode Exit fullscreen mode

Total spent: ₹1,84,329

image

Restaurant vs. # of orders

That seems to a lot of money. Enough to buy a decent second-hand car, at least.

Q.2: Where do I order from?

Pareto Principle seems to apply very well here. Most of my orders are for a cafe called Chai Shai. I love ordering from there—it has good snacks and everything is well-packaged.

SELECT
   COUNT(*),
   data ->> 'restaurant_name' 
FROM
   swiggy 
GROUP BY
   2 
ORDER BY
   1 DESC;
Enter fullscreen mode Exit fullscreen mode

image

Restaurant vs. money spent on orders

Q.3: When do I order?

It seems I order more during the morning and afternoon than in the evening. That can be explained by the fact that I often don't feel that hungry in the evening and I prefer something light for dinner.

SELECT
   COUNT(*),
   EXTRACT('hour' 
FROM
   TO_TIMESTAMP(data ->> 'order_time', 'YYYY-MM-DD hh24:mi:ss')::TIMESTAMP) 
FROM
   swiggy 
GROUP BY
   2;
Enter fullscreen mode Exit fullscreen mode

image

Hour of the day vs. # of orders

Q.4 What do I order?

It seems I am fond of ordering for a quick bite and desserts than a full meal. Desserts are something I should definitely cut down on.

SELECT
   cuisine,
   COUNT(*) 
FROM
   (
      SELECT
         JSONB_ARRAY_ELEMENTS_TEXT(data -> 'restaurant_cuisine') AS cuisine 
      FROM
         swiggy
   )x 
GROUP BY
   1 
ORDER BY
   2 DESC;
Enter fullscreen mode Exit fullscreen mode

image

Cuisine vs. # of orders

Conclusion

It was an interesting exercise to understand my habits over the years. There are tons of ways I can dig more into the data but for now, I will stop here. Insight fatigue isn't great either.

I don't think ordering is necessarily bad but it's lazy. Knowing the scale of my habit, I know it would help me to make a little conscious effort to order less.

Discussion (5)

pic
Editor guide
Collapse
zzoukk profile image
zZouKk

A number is missing here ₹1,84,329.

Bash scripts are cool

Collapse
shubhamjain profile image
Shubham Jain Author

Missing where?

Collapse
zzoukk profile image
zZouKk

Your total spent ₹1,84,329

You misplaced a comma or forgot a number but 1,84,329 is definitely not number said my mom

Thread Thread
shubhamjain profile image
Shubham Jain Author

Haha. There's some cultural differences in India related to numbering. 100,000 is called a 'lakh' in India. But yes, hundred thousands should be more easily understood.

en.wikipedia.org/wiki/Indian_numbe...

Thread Thread
zzoukk profile image
zZouKk

Oh... my bad. Interesting and confusing