DEV Community

Sanjaya Kumar Saxena for WinkJS

Originally published at observablehq.com

How to visualize timeline of a Wiki article?

Automatically generating a timeline (a graphical representation of a time period on which important events are marked) from a Wikipedia article is a fascinating idea, and such a timeline is very useful for quickly grasping the historical perspective. This post outlines an approach to creating a well-formatted timeline from any Wikipedia article using winkNLP’s API and its Named Entity Recognition (NER) feature:

  1. Fetch the article's contents and convert them into a WinkNLP document.
  2. Iterate through the detected entities and keep only the DATE entities.
  3. Use the shapes of the date tokens to keep only dates that new Date() can parse, and convert them into standard Unix time.
  4. Using the parentSentence() API, extract the sentence containing each date; also use markup() to highlight the date in that sentence.
  5. Collect each Unix time and sentence pair in an array and sort the array by Unix time.
  6. Convert this array into a well-formatted timeline using Observable's capabilities along with some CSS.

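The above approach is realized in about 30 lines of code. The code is an Observable cell and relies on nlp, its and WikiArticleTitle being defined in other cells of the notebook (presumably a winkNLP instance with a loaded language model, winkNLP's its helpers, and an input for the article title):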

timeLine = {
  // Step 1: fetch the plain-text extract of the article from the MediaWiki API,
  // falling back to a default title when the input is empty. The title is
  // percent-encoded so that titles containing "&", "#" etc. work.
  const title = WikiArticleTitle || '2022 United Nations Climate Change Conference';
  const response = await fetch( `https://en.wikipedia.org/w/api.php?action=query&prop=extracts&titles=${encodeURIComponent( title )}&explaintext=1&formatversion=2&format=json&origin=*` );
  const body = await response.json();
  const text = body.query.pages[ 0 ].extract;

  // Convert the text into a winkNLP document.
  const doc = nlp.readDoc( text || '' );
  const timeline = [];
  doc
    .entities()
    // Step 2: keep only DATE entities whose token shapes look like
    // "2022", "November 2022" or "6 November 2022", i.e. dates that
    // can be converted to an actual time using new Date().
    .filter( ( e ) => {
      const shapes = e.tokens().out( its.shape );
      const isYear = ( s ) => s === 'dddd';
      const isMonth = ( s ) => s === 'Xxxxx' || s === 'Xxxx';
      const isDay = ( s ) => s === 'd' || s === 'dd';
      return (
        e.out( its.type ) === 'DATE' &&
        (
          isYear( shapes[ 0 ] ) ||
          ( isMonth( shapes[ 0 ] ) && isYear( shapes[ 1 ] ) ) ||
          ( isDay( shapes[ 0 ] ) && isMonth( shapes[ 1 ] ) && isYear( shapes[ 2 ] ) )
        )
      );
    })
    .each( ( e ) => {
      const date = e.out();
      // Step 3: a date without a day ("November 2022") gets "1 " prepended
      // so that new Date() can parse it; milliseconds become seconds.
      const eventDate = isNaN( date[ 0 ] ) ? '1 ' + date : date;
      const unixTime = new Date( eventDate ).getTime() / 1000;
      // Skip dates that still cannot be parsed, so NaN never reaches the sort.
      if ( Number.isNaN( unixTime ) ) return;
      // Step 4: highlight the date inside its sentence.
      e.markup();
      // Step 5: collect the date, its Unix time and the marked-up sentence.
      timeline.push({
        date,
        unixTime,
        sentence: e.parentSentence().out( its.markedUpText )
      });
    });

  // Order the events chronologically.
  return timeline.sort( ( a, b ) => a.unixTime - b.unixTime );
}
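The request URL in step 1 can also be assembled with the standard URLSearchParams class, which percent-encodes every value, including the article title. The following is a minimal sketch; the helper name wikiExtractUrl is hypothetical, and the query parameters are the ones used in the cell above.

```javascript
// Hypothetical helper: builds the MediaWiki API request used in step 1.
// URLSearchParams percent-encodes every value, including the article title.
function wikiExtractUrl(title) {
  const params = new URLSearchParams({
    action: 'query',
    prop: 'extracts',
    titles: title,
    explaintext: '1',   // return plain text rather than HTML
    formatversion: '2', // return "pages" as an array, hence pages[0]
    format: 'json',
    origin: '*'         // allow an anonymous cross-origin request from the browser
  });
  return `https://en.wikipedia.org/w/api.php?${params}`;
}

// Usage, inside an async context such as an Observable cell:
// const body = await (await fetch(wikiExtractUrl('2022 United Nations Climate Change Conference'))).json();
// const text = body.query.pages[0]?.extract ?? '';
```

Compared with interpolating the raw title into a template string, this also copes with titles that contain characters such as & or #, which would otherwise be read as URL syntax.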
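The shape test in the filter is what makes steps 2 and 3 work. In the filter's patterns, X stands for an uppercase letter, x for a lowercase letter and d for a digit, so "6 November 2022" is expected to have the shapes d, Xxxxx, dddd. The sketch below reproduces the filter and the conversion in plain JavaScript, outside winkNLP. The shapeOf function is a hypothetical, simplified stand-in for its.shape: it assumes that runs of the same character class are capped at four characters (so "November" becomes Xxxxx), which is what the patterns in the cell suggest, but it is not winkNLP's implementation. The sketch also touches a JavaScript detail worth knowing: a bare year such as "2022" is an ISO date-only form, which Date parses as UTC, whereas "1 November 2022" is outside the standard format, so its parsing is implementation-specific (V8, used by Chrome and Node.js, reads it as local time). Entries for bare years can therefore be shifted by the viewer's UTC offset relative to the other entries.

```javascript
// Hypothetical, simplified stand-in for winkNLP's its.shape (an assumption,
// not winkNLP's implementation): uppercase -> X, lowercase -> x, digit -> d,
// and any run of the same symbol is capped at four characters.
function shapeOf(token) {
  const symbols = [...token].map((ch) =>
    /[A-Z]/.test(ch) ? 'X' : /[a-z]/.test(ch) ? 'x' : /[0-9]/.test(ch) ? 'd' : ch
  );
  let shape = '';
  let run = 0;
  for (let i = 0; i < symbols.length; i += 1) {
    run = i > 0 && symbols[i] === symbols[i - 1] ? run + 1 : 1;
    if (run <= 4) shape += symbols[i];
  }
  return shape;
}

// The three patterns accepted by the filter in the cell above:
// "2022", "November 2022" and "6 November 2022". Note that Xxxxx matches
// any capitalized word, so the filter relies on the entity being typed DATE.
function isConvertibleDate(shapes) {
  const isYear = (s) => s === 'dddd';
  const isMonth = (s) => s === 'Xxxxx' || s === 'Xxxx';
  const isDay = (s) => s === 'd' || s === 'dd';
  return (
    isYear(shapes[0]) ||
    (isMonth(shapes[0]) && isYear(shapes[1])) ||
    (isDay(shapes[0]) && isMonth(shapes[1]) && isYear(shapes[2]))
  );
}

// Step 3: convert date text to Unix time in seconds. Text that does not
// start with a digit gets "1 " prepended, as in the cell above.
// Returns NaN when Date cannot parse the text.
function toUnixTime(dateText) {
  const text = /^\d/.test(dateText) ? dateText : `1 ${dateText}`;
  return new Date(text).getTime() / 1000;
}

// Usage: "May 2022" is skipped because "May" has the shape Xxx.
for (const d of ['2022', 'November 2022', '6 November 2022', 'May 2022']) {
  const shapes = d.split(' ').map(shapeOf);
  console.log(d, shapes.join(' '), isConvertibleDate(shapes) ? toUnixTime(d) : 'skipped');
}
```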

You can see it in action in the interactive Observable notebook "How to visualize timeline of a Wiki article?".
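Step 6 happens in the notebook's own cells, which this post does not reproduce. As a rough, hypothetical sketch of that last step, the function below turns the sorted array into an HTML list with some CSS; the class names and styles are assumptions, not the notebook's actual markup. Each sentence is assumed to carry the &lt;mark&gt; tags inserted by markup() (assumed here to be its default markers), so the sketch escapes the sentence and then restores only those tags, which keeps any other angle brackets in the article text from being interpreted as HTML.

```javascript
// Hypothetical renderer for step 6. Each entry is assumed to have the shape
// produced by the cell above: { date, unixTime, sentence }.
const escapeHtml = (s) =>
  String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');

// Escape everything, then restore only the <mark> and </mark> tags
// that markup() is assumed to have inserted.
const keepMarks = (s) => escapeHtml(s).replace(/&lt;(\/?)mark&gt;/g, '<$1mark>');

// Illustrative styling; the class names are made up for this sketch.
const TIMELINE_CSS = `
.timeline { list-style: none; border-left: 2px solid #999; padding-left: 1em; }
.timeline-date { font-weight: bold; }
.timeline mark { background: #fde68a; }
`;

function renderTimeline(timeline) {
  const items = timeline
    .map(
      (e) =>
        `<li><div class="timeline-date">${escapeHtml(e.date)}</div>` +
        `<div class="timeline-sentence">${keepMarks(e.sentence)}</div></li>`
    )
    .join('\n');
  return `<style>${TIMELINE_CSS}</style>\n<ol class="timeline">\n${items}\n</ol>`;
}
```

The returned string could, for instance, be assigned to the innerHTML of a container element; the array is rendered in the order it is given, so it should already be sorted by unixTime.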

About winkNLP

WinkNLP is a developer-friendly JavaScript library for Natural Language Processing (NLP). It can process large amounts of raw text at speeds of over 650,000 tokens/second on an M1 MacBook Pro, in both browser and Node.js environments. It even runs smoothly in a low-end smartphone's browser.

It is built from the ground up with a lean code base that has no external dependencies. Test coverage of ~100% and compliance with Open Source Security Foundation best practices make winkNLP an ideal tool for building production-grade systems with confidence.
