Liz Laffitte

WP2AT Part 4: Comparing Datasets

I'm working on a Ruby CLI gem that will automate adding WordPress blog data to AirTable. It uses the WordPress API to collect each post's title, date published, ID and URL, formats the data, and then uses the AirTable API to create new rows in a specified table.

Last time I discussed my new workflow. Instead of adding all the data received from WordPress to AirTable every time the sync method is called, the gem will compare AirTable's ID column to the ids of the WordPress posts, and only add new posts.

collect_row_data()

I shared my collect_row_data() method in my last post. This method calls call_at(), and collects the HTTP response data in a usable format for the rest of the gem.

I've revised it so that it isn't requesting data it doesn't need, speeding up the process. This is accomplished by passing in the fields I want (ID and Last Modified) as query parameters. The responses are also collected into a hash, instead of an array. I made this change so that later, when comparing our two datasets, I can easily get the ID column data by calling .keys on row_data.

    def collect_row_data
        # Maps each ID column value to [AirTable record id, Last Modified]
        row_data = {}
        offset = ""
        loop do
            # Request only the ID and Last Modified fields for this page
            at_response = call_at("fields%5B%5D=ID&fields%5B%5D=Last+Modified", offset)
            at_response.parsed_response["records"].each do |post|
                wp_id = post["fields"][@current_settings.headers[:id]]
                row_data[wp_id] = [post["id"], post["fields"]["Last Modified"]]
            end
            # No "offset" in the response means this was the last page
            offset = at_response.parsed_response["offset"]
            break unless offset
        end
        row_data
    end
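
To make the paging and the hash shape easier to see in isolation, here is a minimal sketch that runs without touching the AirTable API. The names `field_query`, `fake_pages`, `fake_call_at` and `collect_rows` are hypothetical stand-ins, not part of the gem, and the records are made-up sample data. It also shows one way to build the query string with Ruby's standard `URI.encode_www_form` instead of encoding it by hand.

```ruby
require "uri"

# Builds the same field filter used above without hand-encoding it.
def field_query
  URI.encode_www_form("fields[]" => ["ID", "Last Modified"])
end

# Hypothetical sample data: two pages of parsed response bodies, keyed by offset.
def fake_pages
  {
    "" => {
      "records" => [
        { "id" => "recA", "fields" => { "ID" => 101, "Last Modified" => "2020-07-01T10:00:00.000Z" } },
        { "id" => "recB", "fields" => { "ID" => 102, "Last Modified" => "2020-07-02T10:00:00.000Z" } }
      ],
      "offset" => "page2"
    },
    "page2" => {
      "records" => [
        { "id" => "recC", "fields" => { "ID" => 103, "Last Modified" => "2020-07-03T10:00:00.000Z" } }
      ]
    }
  }
end

# Hypothetical replacement for call_at: returns the parsed body for an offset.
def fake_call_at(_query, offset)
  fake_pages.fetch(offset.to_s)
end

# Same loop as collect_row_data, reading from the fake pages.
def collect_rows(id_header = "ID")
  row_data = {}
  offset = ""
  loop do
    body = fake_call_at(field_query, offset)
    body["records"].each do |record|
      fields = record["fields"]
      row_data[fields[id_header]] = [record["id"], fields["Last Modified"]]
    end
    offset = body["offset"]
    break unless offset # no offset means this was the last page
  end
  row_data
end

rows = collect_rows
puts field_query       # prints fields%5B%5D=ID&fields%5B%5D=Last+Modified
puts rows.keys.inspect # prints [101, 102, 103]
```

Because the hash is keyed by the ID column, `rows.keys` is exactly the array that gets compared against the WordPress ids later.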

sync()

When a user wants to update the WordPress data in AirTable, they call the sync method.

In its latest form, sync first pings WordPress to get the total number of result pages and to confirm that the user has entered the blog's URL correctly.

It then gathers the WordPress data (collect_post_data) and AirTable data (collect_row_data) into two hashes. The datasets are compared by passing arrays of their ids into compare_datasets(), which returns a hash. The all_data hash will always have a key of :current, and will sometimes have a key of :new. If the hash has a key of :new, we filter the WordPress data, keeping only the posts whose ids appear in all_data[:new]. That data is added to AirTable.

    def sync
        if ping_wp
            post_data = collect_post_data()
            rows = collect_row_data
            # Compare AirTable's ID column against the WordPress post ids
            all_data = compare_datasets(rows.keys, post_data[:ids])
            if all_data[:new]
                # Keep only the posts that are not in AirTable yet
                data = prep_data(post_data[:posts].keep_if{|post| all_data[:new].include? post["id"]})
                add_to_at(data, @@at_api)
            else
                puts "All data up-to-date"
            end
        else
            puts "There was an issue. Try correcting your blog's URL"
        end
    end
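
The filtering step inside sync can be tried on its own. The sketch below uses made-up posts and a hypothetical helper name, `select_new_posts`, that is not part of the gem. One difference from the method above is deliberate: `keep_if` modifies the array it is called on, while `select` returns a new array and leaves the original untouched. The `Set` is optional and only there to keep the lookup cheap when there are many ids.

```ruby
require "set"

# Returns only the posts whose "id" appears in new_ids, without mutating posts.
def select_new_posts(posts, new_ids)
  wanted = new_ids.to_set
  posts.select { |post| wanted.include?(post["id"]) }
end

# Hypothetical sample data in the shape the WordPress API returns posts.
posts = [
  { "id" => 101, "title" => "First post" },
  { "id" => 102, "title" => "Second post" },
  { "id" => 103, "title" => "Third post" }
]

new_posts = select_new_posts(posts, [103])
puts new_posts.map { |post| post["title"] }.inspect # prints ["Third post"]
puts posts.length                                   # prints 3
```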

compare_datasets()

This method takes our two arrays of ids as arguments. We get all the new posts by subtracting the AirTable id array from the WordPress id array. We are left with the values that appear only in the WordPress array. These are the ids of our new posts to be added to AirTable!

    def compare_datasets(at_arr, wp_arr)
        # Ids that exist in WordPress but not yet in AirTable
        new_posts = wp_arr - at_arr
        post_data = {:current => at_arr}
        post_data[:new] = new_posts unless new_posts.empty?
        post_data
    end
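
Here is a quick usage sketch with made-up ids. The method body is the same logic as above. The last part covers a case that may or may not apply to your table: array subtraction compares elements by value and type, so if the AirTable ID column were stored as text, `"101"` and `101` would not match and every post would look new. Normalizing with `to_i` first is one possible fix under that assumption.

```ruby
def compare_datasets(at_arr, wp_arr)
  new_posts = wp_arr - at_arr
  post_data = { current: at_arr }
  post_data[:new] = new_posts unless new_posts.empty?
  post_data
end

# Hypothetical ids: two rows already in AirTable, four posts in WordPress.
result = compare_datasets([101, 102], [101, 102, 103, 104])
puts result[:new].inspect # prints [103, 104]

# Nothing new: the :new key is simply absent.
up_to_date = compare_datasets([101, 102], [101, 102])
puts up_to_date.key?(:new) # prints false

# Assumed edge case: ids stored as strings on one side never match integers.
text_ids = ["101", "102"]
mismatch = compare_datasets(text_ids, [101, 102, 103])
puts mismatch[:new].inspect # prints [101, 102, 103]

normalized = compare_datasets(text_ids.map(&:to_i), [101, 102, 103])
puts normalized[:new].inspect # prints [103]
```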

Next Steps

I now have all of the post data in data structures that are easy to compare and change. The next step will be altering the WordPress API call so that we also get the last modified date for each post. We'll compare it to the last modified date of the matching AirTable row. If a post was modified in WordPress more recently than its AirTable row, we'll update AirTable with the WordPress info.
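
As a rough preview of that comparison, here is a sketch under a few assumptions that the gem does not implement yet: that each WordPress post hash carries the REST API's `modified_gmt` field (a UTC timestamp with no zone suffix), that AirTable's Last Modified value is an ISO 8601 string, and that rows have the shape built by collect_row_data. The helper name `stale_rows` and the sample data are hypothetical.

```ruby
require "time"

# rows:  { wp_id => [airtable_record_id, last_modified_string] }
# posts: array of hashes with "id" and "modified_gmt" (assumed field name)
# Returns the AirTable record ids whose WordPress post changed more recently.
def stale_rows(rows, posts)
  posts.each_with_object([]) do |post, stale|
    record_id, at_modified = rows[post["id"]]
    next unless record_id && at_modified # post is not in AirTable yet

    wp_time = Time.iso8601("#{post['modified_gmt']}Z") # treat as UTC
    at_time = Time.iso8601(at_modified)
    stale << record_id if wp_time > at_time
  end
end

# Hypothetical sample data.
rows = {
  101 => ["recA", "2020-07-01T10:00:00.000Z"],
  102 => ["recB", "2020-07-02T10:00:00.000Z"]
}
posts = [
  { "id" => 101, "modified_gmt" => "2020-06-30T08:00:00" }, # older than its row
  { "id" => 102, "modified_gmt" => "2020-07-05T09:30:00" }, # newer than its row
  { "id" => 103, "modified_gmt" => "2020-07-06T12:00:00" }  # no row yet
]

puts stale_rows(rows, posts).inspect # prints ["recB"]
```

Parsing both values into Time objects before comparing avoids relying on string order, which would break as soon as the two APIs format their timestamps differently.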
