
Social Learning Journal - Mediums

Justin Beall (dev3l) ・ 3 min read

Learning comes in many different forms. Whether it is reading a book or attending a course, we can use structured messages to identify the medium of each learning event. In my learning journal, I assume that the starting text of the first line of each message contains the medium, if one exists. I discussed this briefly in Social Learning Journal - Parsing Audiobooks. In this post, we will expand upon the density calculation described in Social Learning Journal - Density.


The goal is to identify all the types of learning events currently available in the Twitter data set. Technically, this does not have to be limited to Twitter: any stream of events, such as a YouTube playlist, a git commit log, or any other event log containing data relevant to learning, could be used. The medium of code commits, extracted from GitHub (or Bitbucket, GitLab, etc.), is out of scope for today's activities.

Extracting Mediums

Given our message pattern, let's first loop through our data set and identify each medium that has been recorded to date. As mentioned above, we assume the starting text of the first line contains this information. Generally, journaled events have the following structure:

<medium>: <title>\n
- <author>\n

We will start off with a simple script that just loops through each event and grabs the text:

import json
import os

from dotenv import load_dotenv

# Load environment variables from a local .env file, if present
load_dotenv()

DATA_SEED_TWITTER_PATH = os.environ.get("DATA_SEED_TWITTER_PATH", "./data/tweet.json")

if __name__ == "__main__":
    with open(DATA_SEED_TWITTER_PATH) as data_seed:
        data = json.load(data_seed)

    # Each entry nests the tweet payload under a "tweet" key
    tweets_text = [tweet['tweet']['full_text'] for tweet in data]

We don't get a lot of information from this yet, but at least we have confirmed that we have the text of all 4556 journaled events in our data set. Next, we need a bit of string manipulation to extract the medium. The resulting list will be slightly sloppy: when I first started this project, I was not as rigorous in how I journaled events. By applying a filter, we can start working with a more refined data set:

tweets_with_mediums = list(filter(lambda text: ":" in text.split("\n")[0]
                                               and len(text.split("\n")) > 1
                                               and not text.startswith("RT @"), tweets_text))

In the full text, retweets start with the text "RT @", so we filter those out. We also remove events that do not span multiple lines. Finally, we require a ":" in the first line of the text. This whittles us down to 915 events. We can then use a set comprehension to further reduce the number of entries we have to process manually.
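To make the filter's behavior concrete, here is a small sketch run against a few hypothetical event texts (the sample tweets below are invented for illustration, not from the real data set):

```python
# Hypothetical sample events illustrating each filter condition
sample_texts = [
    "RT @someone: Finished reading: Clean Code\n- Robert C. Martin",  # retweet: dropped
    "Just a single-line tweet with no medium",                        # no ":" / one line: dropped
    "Finished reading: Clean Code\n- Robert C. Martin",               # kept
]

def has_medium(text):
    first_line = text.split("\n")[0]
    return (":" in first_line
            and len(text.split("\n")) > 1
            and not text.startswith("RT @"))

kept = list(filter(has_medium, sample_texts))
# Only the well-formed, non-retweet, multi-line event survives
```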

mediums_set = {tweet_text.split("\n")[0].split(":")[0].lower() for tweet_text in tweets_with_mediums}
mediums = sorted(list(mediums_set))

This produces a list of 171 distinct prefixes. It is not a perfect solution, but the list is small enough to go through manually and produce a set of valid medium category labels. Here is a sample of the extracted prefixes:
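The set comprehension collapses case variations of the same prefix into a single entry. A quick sketch with hypothetical first lines shows why the count drops so sharply:

```python
# Hypothetical event texts; differently-cased prefixes collapse to one entry
samples = [
    "Finished reading: Clean Code\n- Robert C. Martin",
    "finished reading: The Pragmatic Programmer\n- Hunt & Thomas",
    "Listened to: Agile Uprising\n- episode 42",
]

# Same extraction as above: first line, text before the ":", lowercased
prefixes = {s.split("\n")[0].split(":")[0].lower() for s in samples}
# Three events reduce to two unique prefixes
```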

@allspaw watched
attending - keynote
attending meetup
finished listening
finished reading
i finished listening to 'start with why
i'm at bdd microskill
i'm at fp vs. oop
i'm at organizational learning
i'm at the trust transaction
lightning talk
listen to
listend to
listened to
listened to agile uprising podcast
started listening to
started reading

Granted, this isn't perfect, but it gives us a defined set of strings we can use to identify our mediums. Without going through the list manually, it would have been hard to catch my misspelling of "listened" as "listend" so that those events could still be identified as podcasts. The final list of tags is as follows:

attended : conference/session
attending : conference/session
i'm at : conference/session
lightning talk : conference/session

watched : video

began: course
completed: course

started listening to : audiobook start
finished listening : audiobook end

started reading : book start
finished reading : book end

listend to : podcast
listened to : podcast

presented : speaking

I love the expressivity of Python. Once the data is loaded, four statements later we have a filtered list that is easily processable by human eyes. The full script can be found on GitHub.
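One way the final tag list might be put to work is as a simple lookup table. This is a sketch, not the post's actual implementation; the tag-to-medium pairs come straight from the curated list above, and the `identify_medium` helper is a hypothetical name:

```python
# Tag-to-medium lookup built from the manually curated list above
MEDIUM_TAGS = {
    "attended": "conference/session",
    "attending": "conference/session",
    "i'm at": "conference/session",
    "lightning talk": "conference/session",
    "watched": "video",
    "began": "course",
    "completed": "course",
    "started listening to": "audiobook start",
    "finished listening": "audiobook end",
    "started reading": "book start",
    "finished reading": "book end",
    "listend to": "podcast",  # intentionally keeps the historical misspelling
    "listened to": "podcast",
    "presented": "speaking",
}

def identify_medium(text):
    """Return the medium for a journaled event, or None if no tag matches."""
    prefix = text.split("\n")[0].split(":")[0].lower().strip()
    return MEDIUM_TAGS.get(prefix)
```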


In future posts, we will use this set of tags to identify the medium of each event as we process it. From our original 4556 events, filtering whittled us down to 915 tagged events and 171 distinct prefixes, from which we identified 14 relevant tags. In doing so, we identified seven different mediums: conference/session, video, course, audiobook, book, podcast, and speaking.
