RobL

Mission to watch Stephen King in order.

A new mission. We need to watch all of the Stephen King film adaptations in order of release. They're all listed at https://stephenking.com/works/movie/index.html but I want to pull each title into a spreadsheet, so I'm going to knock up a quick script to extract all of the movies in order.

The markup on the page is quite nice.

<div class="works-inner">
  <a href="/works/movie/carrie.html" class="row work" data-date="1976-0-03, " data-sort="Carrie">
    <div class="col-12 col-sm-6 works-title">Carrie</div>
    <div class="col-6 col-sm-3 works-type">Movie</div>
    <div class="col-6 col-sm-3 works-date">November 03rd, 1976</div>
  </a>
  <a href="/works/movie/shining.html" class="row work" data-date="1980-0-23, " data-sort="Shining, The">
    <div class="col-12 col-sm-6 works-title">The Shining</div>
    <div class="col-6 col-sm-3 works-type">Movie</div>
    <div class="col-6 col-sm-3 works-date">May 23rd, 1980</div>
  </a>
</div>

I can grab the markup quite easily.

require 'open-uri'

# URI.open is the open-uri entry point; Kernel#open no longer accepts URLs in Ruby 3.
html = URI.open('https://stephenking.com/works/movie/index.html').read

Ok, but I want to parse it. I can use Nokogiri to extract the data I need.

require 'nokogiri'
require 'open-uri'

doc = Nokogiri::HTML(URI.open('https://stephenking.com/works/movie/index.html'))

We can use the .css method to extract every element that matches a CSS selector.

doc.css('.work')

Each link has the class work, and inside it each div has a convenient class of its own for the data we want.

doc.css('.work').map do |w| 
  [
    w.css('.works-title')[0].content,
    w.css('.works-date')[0].content
  ]
end

That's great, but I want to sort by the date of release. Ruby copes with dates and times on its own, sure, but Rails has some handy convenience extensions provided by active_support.

> Date.parse('November 03rd, 1976')
=> Wed, 03 Nov 1976

Not every value is actually a date.

> Date.parse('TBD')
Traceback (most recent call last):
(irb):10:in `parse': invalid date (Date::Error)

We can use a quick inline rescue to get nil instead. This is horrid, since it swallows any StandardError, but this is a quick script.

> Date.parse('TBD') rescue nil
=> nil
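
A slightly less horrid variant, as a minimal sketch: it rescues only ArgumentError (Date::Error is a subclass of it) so that unrelated failures still surface. The helper name parse_release_date is made up for illustration and is not part of the original script.

```ruby
require 'date'

# Hypothetical helper (not from the original script): parse a release date,
# returning nil for placeholders such as 'TBD'.
def parse_release_date(text)
  Date.parse(text)
rescue ArgumentError
  # Date::Error is a subclass of ArgumentError, so invalid dates land here,
  # while unrelated errors (a nil argument, say) still surface.
  nil
end

parse_release_date('November 03rd, 1976') # a Date for 3 November 1976
parse_release_date('TBD')                 # nil
```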

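We can then sort our records by the date. Here's the full script. It uses Time.parse, which handles these strings the same way, and treats undated entries as Time.now so they sort after everything already released. It works a treat.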

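require 'active_support'
require 'nokogiri'
require 'open-uri'
require 'time'

# URI.open is the open-uri entry point; Kernel#open no longer accepts URLs in Ruby 3.
doc = Nokogiri::HTML(URI.open('https://stephenking.com/works/movie/index.html'))

# Assign first: in `puts xs.map do ... end` the block would bind to puts, not map.
movies = doc.css('.work').map do |w|
  [
    w.css('.works-title')[0].content,
    (Time.parse(w.css('.works-date')[0].content) rescue nil)
  ]
end

# Entries without a parseable date (such as TBD) are treated as releasing now.
puts movies
  .sort_by { |a| a[1] || Time.now }
  .map { |a| a[0] }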
Carrie
The Shining
Creepshow
Cujo
The Dead Zone
Christine
Children of the Corn
Cat's Eye
Silver Bullet
Maximum Overdrive
Stand By Me
Creepshow 2
The Running Man
Pet Sematary (1989)
Tales from the Darkside: The Movie
Graveyard Shift
Misery
Sleepwalkers
The Dark Half
Needful Things
The Shawshank Redemption
The Mangler
Dolores Claiborne
Thinner
The Night Flier
Apt Pupil
The Green Mile
Hearts in Atlantis
Dreamcatcher
Secret Window
Riding the Bullet
1408
The Mist
Dolan's Cadillac
Mercy
A Good Marriage
Cell
My Pretty Pony
The Dark Tower
IT - Part 1: The Losers' Club
Gerald's Game
1922
Pet Sematary (2019)
IT: Chapter Two
In the Tall Grass
Doctor Sleep
Firestarter
Mr. Harrigan's Phone
The Girl Who Loved Tom Gordon
Hearts
Suffer the Little Children
Salem's Lot
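
Since the goal is a spreadsheet, here is a rough sketch of how the sorted rows could be turned into CSV with Ruby's csv library. It assumes rows in the same [title, date] shape the script builds. The sample data is hard-coded so it runs without hitting the site: the two dated rows come from the markup above, and the 'Some Future Film' row is a made-up stand-in for an undated entry. The column names are my own choice too.

```ruby
require 'csv'
require 'date'

# Sample rows in the [title, date] shape the script builds; nil marks an
# entry whose date could not be parsed (for example 'TBD').
rows = [
  ['The Shining', Date.new(1980, 5, 23)],
  ['Some Future Film', nil], # hypothetical undated entry
  ['Carrie', Date.new(1976, 11, 3)]
]

# Split dated from undated rows, sort the dated ones chronologically,
# then append the undated ones so a Date is never compared with nil.
dated, undated = rows.partition { |_title, date| date }
sorted = dated.sort_by { |_title, date| date } + undated

# Build the CSV text: a header row, then one line per movie.
# An undated movie gets an empty release_date field.
csv_text = CSV.generate do |csv|
  csv << %w[title release_date]
  sorted.each { |title, date| csv << [title, date&.iso8601] }
end

puts csv_text
```

Writing csv_text to a file, for instance with File.write and a filename of your choosing, gives something any spreadsheet application can import.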

I guess we're starting with Brian De Palma's Carrie.
