DEV Community

Gal Bashan

Posted on • Originally published at betterprogramming.pub on

How ChatGPT’s Coding Skills Got Me Drunk

ChatGPT helped me scrape cocktail websites to create a universal drink index.

This image was generated with DALL-E, obviously.

Lately, I've become a fan of making cocktails. After getting the best equipment and booze money can buy, I realized the leading blocker for me was cocktail recipes. I wanted a platform where I could input the ingredients I have at home, and the output would be a list of cocktails I could make. There are a few apps for that, but they are limited in their variety unless you pay for the app. And since the only thing I like more than cocktails is free cocktails, I was looking for an alternative.

It was about a week after ChatGPT was released, and it had already become my best friend. Could it also be my drinking buddy? I realized many recipes were available online, and I just had to index them correctly. Could ChatGPT build me a tool to index them myself?

TLDR — yes, here is the GitHub repo. There are even ChatGPT-generated READMEs.

The Plan

I decided to give it a go. I wanted to use ChatGPT to build a simple system to index cocktails from the web. The solution would have two parts:

  1. Crawler: a process that takes a domain to crawl and pushes URLs that appear to be cocktail recipes onto a queue. It also follows the links on each page recursively to find other pages with recipes. This seemed like an easy enough task to let ChatGPT code on its own.
  2. Indexer: this component takes a URL, determines whether the page contains a cocktail recipe, and stores it in a database. The problem is that cocktail blogs are highly unstructured and pile up a lot of unnecessary text before getting to the point. Could ChatGPT help me make sense of this mess?
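As a rough sketch of how the two parts fit together (not the project's code), here is the same flow in a single Python process. A standard-library `queue.Queue` stands in for the message queue, and `looks_like_recipe` and `index_url` are hypothetical placeholders for the crawler's recipe filter and the ChatGPT-backed indexer.

```python
from queue import Queue


def looks_like_recipe(url: str) -> bool:
    # Hypothetical heuristic: treat URLs mentioning "recipe" or "cocktail" as candidates.
    return any(word in url.lower() for word in ("recipe", "cocktail"))


def crawl(urls, out_queue: Queue) -> None:
    # Crawler stage: push candidate recipe URLs onto the queue.
    for url in urls:
        if looks_like_recipe(url):
            out_queue.put(url)


def index_url(url: str) -> str:
    # Placeholder for the indexer; the real one fetches the page and asks ChatGPT.
    return f"indexed {url}"


def drain(in_queue: Queue) -> list:
    # Indexer stage: consume URLs until the queue is empty (single-threaded sketch).
    results = []
    while not in_queue.empty():
        results.append(index_url(in_queue.get()))
    return results


if __name__ == "__main__":
    q = Queue()
    crawl(["https://example.com/negroni-recipe", "https://example.com/about"], q)
    print(drain(q))
```

Putting a queue between the stages lets the slow indexer fall behind the crawler without losing URLs; a broker like RabbitMQ additionally lets the two stages run as separate processes.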

With the plan in place, I set out to start building. I started by laying down the architecture and had to decide what queue to use. ChatGPT came to the rescue:

After reviewing its suggestions, I decided to go with RabbitMQ since I had some experience with it. I asked ChatGPT to lay out the foundations for me:

With the project set up, the next step was for ChatGPT to develop the crawler.

The Crawler

I asked ChatGPT to do the work for me, and boy, did it deliver:

When I ran the program, I ran into two issues. First, the crawler didn't keep track of the URLs it had already visited, and since almost every page linked back to the home page, it got stuck in an infinite loop. I asked ChatGPT to handle that, and it modified the code correctly:

Sadly, I can't recreate this response to capture it at a better resolution.
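The fix boils down to remembering which URLs have already been seen. ChatGPT's exact change isn't reproduced here, so this is only a minimal sketch of the idea: a breadth-first crawl over a hypothetical in-memory link graph (`SITE`), where a `visited` set stops the home-page back-links from causing an infinite loop.

```python
from collections import deque

# Hypothetical in-memory site: every page links back to the home page, creating cycles.
SITE = {
    "/": ["/negroni", "/margarita"],
    "/negroni": ["/", "/margarita"],
    "/margarita": ["/", "/negroni"],
}


def crawl_site(start: str, links: dict = SITE) -> list:
    """Return pages in breadth-first order, visiting each one exactly once."""
    visited = {start}          # URLs already queued; prevents revisiting
    order = []
    frontier = deque([start])  # breadth-first work list
    while frontier:
        url = frontier.popleft()
        order.append(url)
        for link in links.get(url, []):
            if link not in visited:
                visited.add(link)
                frontier.append(link)
    return order


print(crawl_site("/"))
```

Marking a URL as visited when it is queued, rather than when it is processed, also keeps the same page from being queued twice.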

The second issue was that it started crawling other domains too. For example, on a page containing a video, the crawler wandered off to YouTube. I asked ChatGPT to fix that as well, and it obliged:
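ChatGPT's exact change isn't reproduced here, but one common way to keep a crawler on a single site is to resolve each link against the current page and compare hosts. The sketch below uses only `urllib.parse`; the function name `same_domain_links` is hypothetical.

```python
from urllib.parse import urljoin, urlparse


def same_domain_links(page_url: str, hrefs: list) -> list:
    """Resolve hrefs against page_url and keep only http(s) links on the same host."""
    host = urlparse(page_url).netloc
    kept = []
    for href in hrefs:
        absolute = urljoin(page_url, href)  # turn relative links into absolute URLs
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            kept.append(absolute.split("#")[0])  # drop fragments so /a and /a#x match
    return kept


print(same_domain_links(
    "https://cocktails.example/recipes/",
    ["negroni", "/about", "https://www.youtube.com/watch?v=abc"],
))
```

Comparing `netloc` exactly is strict: `www.cocktails.example` and `cocktails.example` count as different hosts, which may or may not be what you want for a given site.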

And that was it! The crawler was ready, and the next step was to code the indexer.

Crawler output. There are some mistakes in identifying recipes — the next step will handle that.

The Indexer

As I mentioned before, parsing the content of a web page to identify if it describes a cocktail recipe is challenging. Here is an example of a cocktail recipe web page:

Handling every possible page layout to extract the ingredients is almost impossible. Once again, ChatGPT to the rescue!

I came across an unofficial Python implementation of the ChatGPT API and decided to use it in my indexer. The idea was simple: with a well-crafted prompt, I should be able to have ChatGPT extract the ingredients from the cocktail page. Since ChatGPT has no internet access, it couldn't code this part, but it did help me with the generic components. Here is the code I used:

```python
import sys
import time
import traceback

import pika
import requests
from bs4 import BeautifulSoup
from revChatGPT.revChatGPT import Chatbot

from db import CocktailRecipe, Ingredient, session

# Credentials for the unofficial revChatGPT client, passed on the command line:
# session token, Cloudflare clearance cookie, and the browser's user agent.
config = {
    "session_token": sys.argv[1],
    "cf_clearance": sys.argv[2],
    "user_agent": sys.argv[3]
}

chatbot = Chatbot(config, conversation_id=None)

# connect to RabbitMQ
connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

# create a queue to consume messages from, if it does not already exist
channel.queue_declare(queue='drink_urls', durable=True)

QUESTION = """
    From the following text below, please determine whether it is an article
    describing a cocktail recipe. If it is, output the following: The first
    line should be: "<cocktail name>: <ranking>". If the ranking doesn't exist,
    output "-". Use only lowercase letters and the generic name of the
    cocktail, without mentioning brands. The following lines contain the
    ingredients: each line should contain one ingredient and its amount in the
    format "<ingredient>: <amount>". The ingredient part (before the colon)
    should contain only the ingredient name, not the amount. If it is not a
    cocktail recipe, output only the word "no". Output the ingredients by
    their generic name, and don't include a brand. For example, instead of
    "bacardi white rum" output "white rum". All of the output should be lower
    cased, don't capitalize any word. The text is: %s"""

# define a callback function to process incoming messages
def process_message(ch, method, properties, body):
    # Pause between pages (presumably to stay under ChatGPT's rate limits).
    time.sleep(5)
    try:
        url = body.decode()  # RabbitMQ delivers the message body as bytes
        print(url)
        page = requests.get(url, timeout=30)
        soup = BeautifulSoup(page.content, "html.parser")

        # Ask ChatGPT to classify the page text and extract the recipe.
        response = chatbot.get_chat_response(QUESTION % soup.text, output="text")['message']
        if response.strip().lower() == "no":
            print("not a cocktail")
            return
        print(response)
    except Exception:
        # Log and keep consuming; one bad page shouldn't stop the indexer.
        print(traceback.format_exc())

# consume messages from the queue, using the callback function to process them
channel.basic_consume(queue='drink_urls', on_message_callback=process_message, auto_ack=True)

# start consuming messages
channel.start_consuming()
```

This worked like a charm. ChatGPT did an excellent job of recognizing whether a page contained a cocktail recipe, and when it did, the output was almost always in the correct format. I used to think Python was about as explicit as coding could get, but it never came this close to plain English.

Now I just had to store it in a database. I asked ChatGPT to create the database for me:
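The schema ChatGPT produced isn't reproduced here, so as a stand-in, here is a hypothetical two-table layout: one row per cocktail and one row per ingredient line, joined by a foreign key. The table and column names are assumptions (loosely mirroring the `CocktailRecipe` and `Ingredient` models the indexer imports), and the standard library's SQLite takes the place of the real database so the sketch runs anywhere.

```python
import sqlite3

# Hypothetical schema; names and columns are illustrative, not the project's.
SCHEMA = """
CREATE TABLE IF NOT EXISTS cocktail_recipes (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    ranking TEXT              -- "-" when the page has no rating
);
CREATE TABLE IF NOT EXISTS ingredients (
    id        INTEGER PRIMARY KEY,
    recipe_id INTEGER NOT NULL REFERENCES cocktail_recipes(id),
    name      TEXT NOT NULL,
    amount    TEXT
);
"""


def create_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn
```

Keeping ingredients in their own table is what later makes "which cocktails can I make?" a matter of comparing ingredient names rather than parsing free text.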

Then I gave it a sample of its own output and asked it to write code that inserts the information into the database. Its initial implementation used psycopg2, so I asked it to switch to SQLAlchemy, since I find an ORM easier to work with:

Note that ChatGPT also wrote the logic for parsing its own response.
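ChatGPT's parsing code isn't reproduced here either. To illustrate the kind of logic involved, here is a hypothetical `parse_response` for the format the prompt asks for: a `<cocktail name>: <ranking>` header line, then one `<ingredient>: <amount>` line per ingredient, or just `no`. It is a sketch, not the project's implementation; tolerating a trailing period on "no" and skipping lines without a colon are assumptions about how the model might drift from the format.

```python
def parse_response(text: str):
    """Parse the prompt's output format.

    Returns None when the model answered "no"; otherwise returns a tuple
    (name, ranking, ingredients), where ingredients is a list of
    (ingredient, amount) pairs.
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or lines[0].lower().rstrip(".") == "no":
        return None

    # Header line: "<cocktail name>: <ranking>". Split on the last colon.
    name, sep, ranking = lines[0].rpartition(":")
    if not sep:  # header without a ranking; treat it as missing
        name, ranking = lines[0], "-"

    ingredients = []
    for line in lines[1:]:
        # Ingredient lines: "<ingredient>: <amount>". Split on the first colon.
        ingredient, sep, amount = line.partition(":")
        if sep:  # ignore stray lines that don't follow the format
            ingredients.append((ingredient.strip(), amount.strip()))
    return name.strip(), ranking.strip(), ingredients


print(parse_response("negroni: -\ngin: 1 oz\ncampari: 1 oz\nsweet vermouth: 1 oz"))
```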

I integrated this code into the indexer and finally had the cocktail database of my dreams!

Example output from the indexer process

A view of my cocktail database

Next, I will use ChatGPT to help me create an API and UI for browsing the cocktail database and choosing the one I want to make today.
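That browsing step is where the original goal comes back in: given the ingredients at home, list the cocktails I can make. Assuming each recipe's ingredients have been loaded into a set (the `recipes` dict below is hypothetical, not the project's data model), the core check is a subset test.

```python
def makeable(recipes: dict, pantry: set) -> list:
    """Return the names of cocktails whose every ingredient is in the pantry."""
    have = {item.lower() for item in pantry}  # stored recipes are lowercase
    return sorted(
        name
        for name, ingredients in recipes.items()
        if {i.lower() for i in ingredients} <= have  # subset: nothing missing
    )


recipes = {
    "negroni": {"gin", "campari", "sweet vermouth"},
    "gin and tonic": {"gin", "tonic water"},
    "daiquiri": {"white rum", "lime juice", "simple syrup"},
}
print(makeable(recipes, {"gin", "tonic water", "campari"}))
```

The prompt's insistence on lowercase generic names ("white rum" rather than a brand) is what makes a plain set comparison like this workable.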

Conclusion

This whole project took me roughly 3 hours from end to end. ChatGPT's ability to accelerate my development process blew my mind. The biggest win was the text parsing: a task I couldn't have done myself, and one ChatGPT made simple.

However, the more complex a task got, the less likely ChatGPT was to get it right. When pairing with ChatGPT, it is still the developer's job to break the work down into tasks small enough for it to swallow, at least for now.

I'm excited to see what comes next!

I asked ChatGPT to rewrite the conclusion. Who did it better?


Top comments (1)

MartinJ

Well done Gal. I'm a bit surprised to find I'm the first one to comment on your excellent post. I think it's going to take a while for the software development world to wake up to the significance of what has just happened.

To be honest, I think it'll take me a while to get my own head round this! I've just recently started in a simple way by asking for comments on programming errors. But I've now moved on to questions like how would you write this ...bit of Firestore WebV9 code ... in Node.js. It works like a dream and will save me a heap of time and stress. Of course we've had "coding assistants" of various kinds for a while now - where would I have been without the linting in VSCode and things like the wonderful html to JSX conversion tool I've been using on a system upgrade recently? But ChatGPT takes things to a whole new level - and not just for software development either. ChatGPT has just been doing some solar energy calculations for me that tell me how big a field I would need to pave with solar panels in order to charge my electric car (about an acre, apparently!).

The problem now, as you say, is how to use this intelligently. As you say, breaking down big questions into smaller, verifiable subquestions seems to be the answer for now. But longer term? Will present free use continue? How will we learn to express questions usefully at increasingly abstract levels?

I'm really looking forward to finding out!