TL;DR
In this tutorial, you'll learn how to create an AI agent that alerts you about relevant Hacker News posts tailored to your interests. The agent sends Discord notifications whenever a post matches your criteria.
We'll write the code using Python, use Beautifulsoup for web scraping, use OpenAI with Agenta for building the AI, and Novu for Slack notifications.
Goal? No more endless scrolling on Hacker News. Let the AI bring worthy posts to you!
Agenta: The Open-source LLM app builder 🤖
A bit about us: Agenta is an end-to-end open-source LLM app builder. It enables you to quickly build, experiment, evaluate, and deploy LLM apps as APIs. You can use it by writing code in Langchain or any other framework, or directly from the UI.
Here's the plan 📝
We will write a script that does the following:
- Scrape the first five Hacker News pages for post titles using Python and Beautifulsoup.
- Use Agenta and GPT3.5 to categorize posts based on your interests.
- Send compelling posts to your Slack channel.
Setting things up:
To get started, let's create a project folder and run Poetry. If you aren't familiar with Poetry, you should check it out as it provides an alternative to virtual environments that is much easier to use.
mkdir hnbot; cd hnbot/
poetry init
This command will guide you through creating your pyproject.toml config.
Would you like to define your main dependencies interactively? (yes/no) [yes] yes
Package to add or search for (leave blank to skip): novu
Add a package (leave blank to skip): beautifulsoup4
Do you confirm generation? (yes/no) [yes] yes
Follow the prompts to set up your project. Don't forget to install novu and beautifulsoup4.
Now let's create the folder for our package and initialize the poetry environement
% mkdir hn_bot; cd hn_bot
% cd hn_bot
% poetry shell
(hn-bot-py3.9) (base) % poetry install
Now, we have a local environment where:
- All our requirements are installed.
- We have a Python package called hn_bot that is in our Python lib.
This means if we have multiple files in our library, we can import them using import hn_bot.module_name
.
Scraping Hacker News Posts
Scraping the Hacker News page is straightforward since it does not use any complicated JavaScript. The pages are located at https://news.ycombinator.com/?p=pagenumber
.
To find the titles and links on the page, we just need to open the web browser and access the dev console. Once there, we can check if there are any elements we can use to locate the titles and links. Luckily, it seems that every post is a span with the class "titleline."
We can use this to extract information from a single page. Let's write a function that extracts titles and links from Hacker News.
# hn_scraper.py
from typing import Dict, List
import requests
from bs4 import BeautifulSoup
def scrape_page(page_number: str) -> List[Dict[str, str]]:
response = requests.get(f"https://news.ycombinator.com/news?p={page_number}")
yc_web_page = response.text
soup = BeautifulSoup(yc_web_page, 'html.parser')
articles = []
for article_tag in soup.find_all(name="span", class_="titleline"):
title = article_tag.getText()
link = article_tag.find("a")["href"]
articles.append({"title": title, "link": link})
return articles
We can test it by adding a print(scrape_page(1))
at the end of the script and running it on shell:
% python hn_scraper.py
(['Linux Network Performance Parameters Explained (github.com/leandromoreira)', 'Double Commander – Changes in version 1.1.0 (github.com/doublecmd)', 'If You’ve Got a New Car, It’s a Data Privacy Nightmare (gizmodo.com)', 'Ask HN: I’m an FCC Commissioner proposing regulation of IoT security updates', 'Gcsfuse: A user-space file system for interacting with Google Cloud S
Congratulations!🎉 Now we have a script that scrapes post titles from HackerNews
Creating the AI Agent 🤖
Now that we have a list of posts, we need to use OpenAI gpt models to classify whether they are relevant based on the user's interests. For this, we are going to use Agenta.
Agenta allows you to create LLM apps from code or from the UI. Since our LLM app today is quite simple, we will create it from the UI.
Agenta can be self-hosted, however to get started quickly we'll use demo.agenta.ai.
Since our LLM app today is quite simple, we will just go ahead and create it from the UI.
You can self host agenta (Check out docs for that here (https://docs.agenta.ai/installation/local-installation/local-installation) or use the cloud-hosted demo. To get started quickly we'll do the later.
Let's go to demo.agenta.ai and login.
First, let's create a new app by clicking on "Create New App".
Then we select start from template
And use a single prompt template
Doing some Prompt Engineering In Agenta 🪄 ✨
Now we have a playground for creating the app.
First, let's add the inputs for our application. In this case, we will be using "title" for the Hacker News title and "interests" for the user's interests.
Next, we need to do a little prompt engineering. Since we are using gpt3.5 (the cheapest variant in OpenAI). It takes two messages: the system message and the user message. We can use the system message to guide the language model to answer in a certain way, while the prompt prompts the human to give the parameters of the task.
In this case, I tried a simple prompt for the system that ensures the answer is either "True" or "False." For the human prompt, I just asked the system to classify. Note that we used the fstring usual format to inject the inputs that we have added into the prompt.
Now we can then test the application with some examples of Hacker News titles:
Agenta provides tools to systematically evaluate applications and optimize prompts, parameters, and workflows (in case we are using something more complex with embeddings and retrieval augmented generations). However, in this case, such evaluation is unnecessary. The app itself is very simple, and gpt3.5 is able to solve the classification problem with minimal effort.
Let's save our changes
Then deploy the application as an API.
For this we jump to the endpoints menu and copy paste the code snippet to our code.
Wrapping it up 🌯
Now, we can create a function based on this code snippet.
# llm_classifier.py
import requests
import json
def classify_post(title: str, interests: str) -> bool:
url = "https://demo.agenta.ai/64f1d1aefeebd024bbdb1ea4/hn_bot/v1/generate"
params = {
"inputs": {
"title": title,
"interests": interests
},
"temperature": 0,
"model": "gpt-3.5-turbo",
"maximum_length": 100,
"prompt_system": "You are an expert in classification. You answer only with True or False.",
"prompt_human": "Classify whether this hackernews post is interesting for someone with the following interests:\nHacker news post title: {title}\nInterests: {interests}",
"stop_sequence": "\n",
"top_p": 1,
"frequence_penalty": 0,
"presence_penalty": 0
}
response = requests.post(url, json=params)
data = response.json()
return bool(data)
Sending a Discord message 🎮
First we need to create a new channel in Discord
Next we need to create a webhook and copy the url
Now we need to setup the integration in Novu. For this we have to go to the Integration Store, click on “Add a provider”, select Discord, and don't forget to activate it!
Last, we need to create a workflow that triggers the message to be sent to our Discord. We will add the {{content}}
variable to the message which we will later inject using the code.
Write the messaging function
Now it's time to write the message that will trigger the workflow
# novu_bot
from novu.config import NovuConfig
from novu.api import EventApi
from novu.api.subscriber import SubscriberApi
from novu.dto.subscriber import SubscriberDto
NovuConfig().configure("https://api.novu.co", "YOUR_API_KEY")
webhook_url = "..." # the webhook url we got from Discord
def send_message(msg):
your_subscriber_id = "123" # Replace this with a unique user ID.
# Define a subscriber instance
subscriber = SubscriberDto(
subscriber_id=your_subscriber_id,
email="abc@gmail.com",
first_name="John",
last_name="Doe"
)
SubscriberApi().create(subscriber)
SubscriberApi().credentials(subscriber_id=your_subscriber_id,
provider_id="discord",
EventApi().trigger(
name="slackbot", # The trigger ID of the workflow. It can be found on the workflow page.
recipients=your_subscriber_id,
payload={}, # Your Novu payload goes here
)
Putting everything together
Now we're ready to assemble all the elements to get our AI assistant running.
Let's create an app.py
file in which we first call the scraper, then the LLM classifier, and finally send a message with the interesting posts.
from hn_bot import hn_scraper, llm_classifier, novu_bot
import schedule
import time
interests = "LLMs, LLMOps, Python, Infrastructure, Tennis, MLOps, Data science, AI, startups, Computational Biology"
def main():
novu_bot.send_message("Interesting posts at HackerNews:\n")
posts = hn_scraper.scrape_page("1")
for title, url in posts:
if llm_classifier.classify_post(title, interests) == "True":
novu_bot.send_message(f"{title}\n{url}")
if __name__ == "__main__":
main()
Et voila, we have all the interesting posts coming up in our Discord.
Finally let's schedule this to run each hour ⏰
We would like to run the script to check new posts each hour. For this we need to add the python library schedule
from hn_bot import hn_scraper, llm_classifier, novu_bot
from time import sleep
interests = "LLM, LLMOps, MLOps, Data science, AI, startups"
done_post_titles = []
def main():
novu_bot.send_message("Interesting posts at HackerNews:\n")
posts = []
for i in range(1, 5):
posts += hn_scraper.scrape_page(i)
for post in posts:
title = post["title"]
url = post["link"]
if llm_classifier.classify_post(title, interests) and title not in done_post_titles:
done_post_titles.append(title)
novu_bot.send_message(f"{title}\n{url}")
if __name__ == "__main__":
while True:
main()
sleep(3600)
Congratulations on making it thus far!🎉 You've now got an automated AI assistant keeping an eye on Hacker News for you.
Summary 📜
In this tutorial, we've built an AI-powered assistant to keep you in the loop with relevant Hacker News posts. You should have learned:
- How to use Beautifulsoup for scraping hackernews
- How to create an LLM app based on one prompt using Agenta and OpenAI gpt3.5
- How to send notifications on Discord using Novu
You can check the code at this https://github.com/Agenta-AI/blog/tree/main/hackernews-bot
Thanks for reading!
Top comments (5)
This is so cool!
Thanks @nevodavid !
do we need an openai key? if yes where do we put this ? also in code?
For using the demo cloud version (demo.agenta.ai) you do not need an OpenAI API key (it's provided by us). If you are using the self-installed version, you'll need to provide yours when creating a new application in the UI.
Great article! I appreciate your detailed explanation of how to build an AI-powered Discord bot for recommending HackerNews posts using OpenAI, Novu, and Agenta. It's fascinating to see how AI and NLP tools can be leveraged to enhance user experiences within chat platforms like Discord. The step-by-step guide you've provided makes it accessible for developers to create their own AI-driven bots. Looking forward to exploring the potential of these tools in more projects