Dev Diary Week 4 - Saving to SQLite

#showdev #node #javascript

This post is an entry in a weekly development diary on building a feed-aggregator-based blog on Glitch. Only a small part of building an app is code; the bulk of your time is spent planning, experimenting, making mistakes, getting frustrated, and making progress in baby steps.

The plan

Last time, I set up an OPML file with some default feeds that get passed to the feedparsing API. This is a good start, but we don't want the site to want to have to request feeds every time it's loaded — that takes time, and slow sites are a terrible user experience. It's time to add a database!

As I mentioned, storing the feed content in a database saves us from having to send packets back and forth across the internet to get content every time the site loads. There are a couple of practical considerations here:

How often should we check for new content?
How do we deal with updates to content that's already in the database?

The first question is fairly straightforward. We'll want to track the last time we checked for new content on each feed and, if it's been more than a certain amount of time, we can check again for new content. The actual time interval here doesn't matter; what does matter is knowing that we'll be tracking this.

The second question is the kind of thing that sounds like it could become very complex, but should be straightforward because of the project scope I defined in week 2. Specifically, we'll be generating a linklog-style blog that only shares a title, a link back to the source content, and maybe a summary if we can get one from the feed. As such, we don't have to worry too much about how to handle the content of the post changing, though this doesn't help in the case of "reruns" — something all feed-reading apps have to deal with.

So, this week, I'm planning on doing the following:

Determining what the database structure should look like
Creating a way to check when the feed's database entry was last updated
Creating a way to add feed items to the database

This feels a bit ambitious, as I haven't really worked with SQLite before, but we'll see how it goes!

I think we can start with two tables here: a feeds table, which will store metadata about each feed, and an entries table, which will store entries from the various feeds. The theoretical limit for the number of rows in any SQLite table is 2⁶⁴, though that will realistically be constrained by the 140TB maximum database size — still plenty of room for storing the data we're interested in.

The feed table will start off with the following rows:

id (INTEGER): the unique ID for the row, autoincrementing
title (TEXT): the feed title
feedUrl (TEXT): the url of the feed
lastFetched (INTEGER): the timestamp when we last checked the feed (as Unix Time)

The entry table will start off with the following rows:

id (INTEGER): the unique ID for the row, autoincrementing
guid (TEXT): the feed GUID for the entry
title (TEXT): the entry title
entryUrl (TEXT): the url of the entry
publishDate (INTEGER): the date and time when the entry was published (as Unix Time)
feedId: a foreign key matching this entry to the correct feed in the feeds table

How did it go?

All the planning and research work took a lot more time than I expected! Implementing persistent storage is more complex than the work I've been doing so far, and while I'm fairly comfortable working in MySQL or T-SQL, I've never worked with SQLite before, so I needed to do more digging to understand its limitations and built-in types, for example.

I added a storageController.js file that has some methods for initializing and updating the database. These still need some work, but basically here's how storageController works:

When the app loads, it calls initialize(), which checks to see if the database file already exists. If it does, great! It will just output the number of feeds and entries to the server's console. If it doesn't, it'll create it based on a given schema.
When the database needs to be updated, it calls insertNewestEntries() and passes in a feedObject from feedparser.js. It then follows a couple of paths:
- If the feed already exists in the database, it adds any new entries (according to what the latest publishDate is in the feed's stored articles).
- If it doesn't, it creates the record for the feed's metadata in the feed table, and then inserts as many articles as it has in the entry table.

There are a few helper functions in the module as well, for things like getting the ID of a feed, and getting the last published date for a given feed.

What went well?

We're using the sqlite3 npm module and it's really quite a pleasure to work with. I had some example code to look at from the original starter app, which was very helpful.

What did I have trouble with?

Learning the ins and outs of a new database engine can take a while! There were a lot of new concepts to wrap my head around and it took me a lot longer than I thought to get this work done — I didn't post an update last week because of it! This is why it's important to build some flexibility into your estimates.

What did I learn?

I learned a whole lot about using SQLite! While I'm pretty comfortable writing SQL, I'd never used this particular database engine. Luckily, once things are set up, writing queries is pretty similar to MySQL or T-SQL.

What will I work on next week?

I still have some work to do here! For example, when insert the newest entries in the database, the code doesn't update the feed's lastFetched property. We'll need that later.

We also need some way to retrieve everything as a single combined feed, where the entries are ordered by published date. This will be the basis for the content we show on the blog!

And finally, we need to remove some of the endpoints I created in the first couple of weeks — they should be replaced by one to refresh all feeds that are found in the OPML file. Right now I'm adding a DISALLOW_WRITE flag to be sure that no one can add stuff to the database via the /api/parse/:feed endpoint.

Do you have a rule of thumb for estimates?

You never know what kind of interesting problems you'll stumble up when you're working on an app, but it's almost a guarantee that it'll take longer than you expect. How much of a "safety margin" do you generally add to your estimates for unexpected surprises? Has that margin gotten smaller as you get more experienced? What kind of factors, if any, influence your safety margins?

Give your Glitch apps superpowers - keep them awake, lift rate limits, and get more memory and disk space.