As developers, we don’t have the time or patience for routine tasks. We like to get things done, and any tools that can help us automate are high on our radar.
Enter Huginn, a workflow automation server similar to Zapier or IFTTT, but open source. With Huginn you can automate tasks such as watching for air travel deals, continually watching for certain topics on Twitter, or scanning for sensitive data in your code.
Recently a post about Huginn hit the top of Hacker News. This piqued my interest, so I wanted to see why it's so popular, what it's all about, and what it's being used for.
I reached out to Huginn's creator, Andrew Cantino, to ask him why he started it.
"I started the project in 2013 to scratch my own itch—I wanted to scrape some websites to know when they changed (web comics, movie trailers, local weather forecasts, Craigslist sales, eBay, etc.) and I wanted to be able to automate simple reactions to those changes. I'd been interested in personal automation for a while and Huginn was initially a quick project I built over the Christmas holidays that year."
However, that simple Christmas-holiday project quickly grew.
Today, Huginn is a community-driven project with hundreds of contributors and thousands of users. Andrew still uses Huginn for its original use case:
"I still primarily use Huginn for this purpose: it tells me about upcoming yard sales, if I should bring an umbrella today because of rain in the forecast, when rarely-updated blogs have changed, when certain words spike on Twitter, etc. I also have found it very useful for sourcing information for the weekly newsletter that I write about the space industry, called The Orbital Index."
However, the community has found a wider range of uses. So let's look at exactly what Huginn is, how to set it up, and how to use it to automate your everyday life.
Huginn is a web-based scheduling service that runs workers called Agents. Each Agent performs a specific function, such as sending an email or requesting a website. Agents generate and consume JSON payloads called events, which can be used to chain Agents together. Agents can be scheduled, or executed manually.
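To make the event model concrete, here is a minimal Python sketch of chaining. It is not Huginn code (Huginn itself is written in Ruby); the function names `source_agent`, `formatting_agent`, and `run_chain` are hypothetical and only illustrate how one Agent's JSON-like output can become the next Agent's input.

```python
from typing import Callable, Dict, List

# A Huginn event is a JSON payload; a plain dict stands in for it here.
Event = Dict[str, object]


def source_agent() -> List[Event]:
    """Hypothetical source: emits one event, as a scraping Agent might."""
    return [{"title": "Example comic", "url": "https://example.com/comic.png"}]


def formatting_agent(event: Event) -> List[Event]:
    """Hypothetical receiver: consumes one event and emits a new one."""
    return [{"message": f"New comic: {event['title']} ({event['url']})"}]


def run_chain(
    source: Callable[[], List[Event]],
    receivers: List[Callable[[Event], List[Event]]],
) -> List[Event]:
    """Feed the source's events through each receiver in order."""
    events = source()
    for receiver in receivers:
        next_events: List[Event] = []
        for event in events:
            next_events.extend(receiver(event))
        events = next_events
    return events
```

Calling `run_chain(source_agent, [formatting_agent])` returns a list with a single event whose `message` field combines the title and URL from the source event.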
It's easy to deploy Huginn with just one click using the Deploy to Heroku button. Huginn also supports Docker and Docker Compose, manual installation on Linux, and many other deployment methods. After installing, you can extend Huginn by using one of the many available Agent Gems, or by creating your own.
Once you've deployed Huginn and logged in (check your specific setup for the URL), creating a new Agent is simple, as seen in this screenshot. This Agent follows a Twitter stream in real time.
Here's an existing Agent that pulls the latest comic from xkcd.com. You can see the basic stats of the Agent (last checked, last created, and so on). The Options field shows how the Agent is configured, including the CSS selectors used to extract data from the page.
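The idea behind selector-based extraction can be sketched in a few lines of Python using only the standard library. This is an illustration, not how Huginn implements it: the sample HTML, the `ComicExtractor` class, and the output field names (`url`, `title`, `hovertext`) are assumptions chosen to mirror the fields described in this article. It roughly emulates selecting the first `img` inside an element with the id `comic`.

```python
from html.parser import HTMLParser
from typing import Dict, Optional

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}

# Hypothetical mapping from event field name to the <img> attribute to read.
EXTRACT = {"url": "src", "title": "alt", "hovertext": "title"}


class ComicExtractor(HTMLParser):
    """Collects attributes of the first <img> nested inside id="comic"."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # nesting depth inside the #comic element
        self.result: Optional[Dict[str, str]] = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "img" and self.depth > 0 and self.result is None:
            self.result = {
                field: attr_map.get(attr, "") for field, attr in EXTRACT.items()
            }
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif attr_map.get("id") == "comic":
            self.depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


def extract_comic(page: str) -> Optional[Dict[str, str]]:
    """Return the extracted fields, or None if no matching image is found."""
    parser = ComicExtractor()
    parser.feed(page)
    return parser.result
```

An image outside the `comic` element is ignored, which is the same filtering job a CSS selector does in the Agent's options.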
You can also organize Agents into Scenarios, which let you group similar Agents and import or export Agent configurations as JSON files. You can also fine-tune Agent scheduling and configuration using special Agents called Controllers. Here we see a Scenario built around the theme of "Entertainment."
Lastly, Huginn uses the Liquid templating engine, which allows you to load dynamic content into Agents. This is commonly used to store configuration data (such as credentials) separately from Agents.
Here, it's used to format the URL, title, and on-hover text from the XKCD Source Agent as HTML:
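As a rough analogy, the Python sketch below substitutes `{{ name }}` placeholders with values from an event. It is not Liquid and covers none of Liquid's filters or tags; the `render` function and the template string are hypothetical, and the HTML escaping is a choice made for this sketch rather than a description of Huginn's behavior.

```python
import html
import re

# Matches placeholders such as {{ title }} or {{title}}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

# Hypothetical template using the three fields described above.
TEMPLATE = '<img src="{{ url }}" alt="{{ title }}" title="{{ hovertext }}">'


def render(template: str, event: dict) -> str:
    """Replace each placeholder with the matching event value, HTML-escaped."""

    def substitute(match: re.Match) -> str:
        value = event.get(match.group(1), "")  # missing keys render as empty
        return html.escape(str(value), quote=True)

    return PLACEHOLDER.sub(substitute, template)
```

Passing an event with `url`, `title`, and `hovertext` keys to `render(TEMPLATE, event)` yields a single `<img>` tag ready to drop into an email body.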
In addition to web scraping, Huginn supports a wide variety of actions that can allow for some truly complex workflows. Disclaimer: Many sites disallow automated web scraping. Be sure to check the terms of service (TOS) of any website you intend to access using Huginn.
Some of the examples from the GitHub page include:
- Watch for air travel or shopping deals
- Follow your project names on Twitter and get updates when people mention them
- Connect to Adioso, HipChat, Basecamp, Growl, FTP, IMAP, Jabber, JIRA, MQTT, nextbus, Pushbullet, Pushover, RSS, Bash, Slack, StubHub, translation APIs, Twilio, Twitter, Wunderground, and Weibo, to name a few.
- Send digest emails with things that you care about at specific times during the day
- Track counts of high frequency events and send an SMS within moments when they spike
- Send and receive WebHooks
- Track your location over time
- Create Amazon Mechanical Turk workflows as the inputs, or outputs, of agents (the Amazon Turk Agent is called the "HumanTaskAgent"). For example: "Once a day, ask 5 people for a funny cat photo; send the results to 5 more people to be rated; send the top-rated photo to 5 people for a funny caption; send to 5 final people to rate for funniest caption; finally, post the best captioned photo on my blog."
Let's look at a few of these use cases in detail.
Using the Website Agent, you can fetch the latest contents of multiple web pages, filter and aggregate the results, then send the final contents to yourself as an email. The default Scenario demonstrates this by fetching the latest XKCD comic. This creates an event containing the comic title, URL, and on-hover text, which are rendered as HTML via an Event Formatting Agent. Another Website Agent simultaneously gets the latest movie trailers from iTunes, then both events are merged into an Email Digest Agent that fires each afternoon:
Huginn supports several social networks including Twitter and Tumblr. These Agents can watch for new posts, trending topics, and updates from other users.
Let’s say you live in a hurricane-prone area and want to follow the impact of a storm. Using a Twitter Stream Agent, you can watch for Tweets containing “hurricane,” “storm,” and so on, and pass the results to a Peak Detector Agent. This counts Tweets over a period of time, measures the standard deviation, and fires an event if it detects an outlier. You can have this event trigger an Email Agent that notifies you immediately. Andrew Cantino explains this use case in more detail on his blog.
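The outlier check can be sketched as follows. This is a simplified illustration of the standard-deviation idea, not the Peak Detector Agent's actual algorithm; the function name `is_peak` and the default threshold of three standard deviations are assumptions.

```python
from statistics import mean, stdev
from typing import Sequence


def is_peak(history: Sequence[float], latest: float, threshold: float = 3.0) -> bool:
    """Return True if `latest` exceeds the historical mean by more than
    `threshold` sample standard deviations."""
    if len(history) < 2:
        # stdev needs at least two data points; with less, never fire.
        return False
    return latest > mean(history) + threshold * stdev(history)
```

With hourly Tweet counts of `[10, 12, 11, 9, 10, 11]`, a new count of 12 stays within the normal range, while a count of 40 is flagged as a peak and could trigger the notification.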
Huginn makes an excellent online shopping tool. When shopping for the best deal, create Website Agents to run daily searches on discount and trading sites. Use Event Formatting Agents to extract prices, then use a Change Detector Agent to compare the last retrieved price to the current price. If it’s lower, you can extract the item URL and send it straight to your inbox.
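The comparison step might look like the following sketch. The `PriceWatcher` class and the event fields are hypothetical; in Huginn the remembered value would live in the Agent's own state rather than in a Python object.

```python
from typing import Dict, Optional


class PriceWatcher:
    """Remembers the last seen price per item and reports price drops."""

    def __init__(self) -> None:
        self._last_price: Dict[str, float] = {}

    def check(self, item_url: str, price: float) -> Optional[dict]:
        """Record the price; return an event only if it is lower than before."""
        previous = self._last_price.get(item_url)
        self._last_price[item_url] = price
        if previous is not None and price < previous:
            return {"url": item_url, "old_price": previous, "new_price": price}
        return None
```

The first check for an item only records a baseline. Later checks return an event when the price falls and `None` when it stays the same or rises.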
Staying on top of security updates is a continuous process. You can use Huginn to watch the National Vulnerability Database for CVEs affecting your systems and notify you immediately. If you want to filter the results (e.g. only show high-priority alerts), you can use a Trigger Agent to only allow results where the severity is above a certain value.
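In Python terms, the filtering rule amounts to something like the sketch below. The event shape (an `id` and a numeric `severity` field) and the default cutoff of 7.0 are assumptions for illustration, not the National Vulnerability Database's actual feed format or a Trigger Agent default.

```python
from typing import Dict, Iterable, List


def filter_by_severity(events: Iterable[Dict], minimum: float = 7.0) -> List[Dict]:
    """Keep only events whose severity score is at or above `minimum`.

    Events without a severity field are treated as score 0 and dropped.
    """
    return [
        event for event in events
        if float(event.get("severity", 0)) >= minimum
    ]
```

Given placeholder events with scores of 9.8, 4.3, and no score at all, only the first passes through to the notification step.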
Huginn comes with some powerful Agents that greatly extend its capabilities beyond web scraping.
Huginn can read files stored on the host, making it a useful data processing tool. Let's say you're testing changes to a codebase, and before you commit, you want to scan for any sensitive data that you might have left in during testing. You can create a Local File Agent to scan your project directory, pass the contents to an Event Formatting Agent, and use regular expressions to detect credentials, passwords, and similar strings. Alternatively, you could use a Shell Command Agent to call a utility like repo-supervisor and fire a desktop notification when it detects matches.
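The regular-expression step could be sketched like this. The three patterns are illustrative assumptions and far from exhaustive; a dedicated scanner will catch much more. The function works on text that has already been read, which is roughly what a downstream Agent would receive.

```python
import re
from typing import List, Tuple

# Hypothetical, deliberately small set of patterns for common secrets.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "password_assignment": re.compile(
        r"(?i)\bpassword\s*[:=]\s*['\"][^'\"]+['\"]"
    ),
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_text(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, pattern_name) for every line that matches."""
    findings: List[Tuple[int, str]] = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, name))
    return findings
```

An empty result means none of these patterns matched, not that the code is free of secrets.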
One of Huginn’s first great successes was its adoption by the New York Times to automate newsroom tasks. During the 2014 Winter Olympics, Huginn monitored their data pipeline availability and sent notifications when medals were awarded. Huginn also notified reporters when new stories were published and updated a Slack channel when content changed on nytimes.com. You can learn more about their use cases at Huginn for Newsrooms.
Huginn is a deceptively simple tool with a lot of flexibility. The best way to see what it can do is to try it yourself. To learn more, visit https://github.com/huginn/huginn.