Deepjyoti Barman

Posted on Feb 21, 2020

Automate watching a local series in Python

#python #tutorial #beginners #productivity

I have a few series downloaded locally (I know, piracy is not cool), and the only reason I haven't watched them yet is that I can't be bothered to play them one by one and keep track of which episode I watched last.

So I thought I could write a Python script that does all of that for me, so that all I'd have to do is run it. (Yes, I know Netflix does this, but these series were not available on Netflix.)

After a lot of brainstorming, I came up with the following process.

The process

If the series is not already cached, cache it (will explain it later)
Get the last played episode
Call an external player like mpv

1. Caching

So, here's the deal. Locally downloaded series are usually stored in a layout like the following

├── TAAHM
│   ├── Season 01
│   │   ├── Two And A Half Men Season 01 Episode 01 - Pilot - Most Chicks Wont Eat Veal.avi
│   │   ├── Two And A Half Men Season 01 Episode 02 - Big Flappy Bastards.avi
│   │   ├── Two And A Half Men Season 01 Episode 03 - Go East on Sunset Until You Reach the Gates of Hell.avi
│   │   ├── Two And A Half Men Season 01 Episode 04 - If I Can't Write My Chocolate Song, I'm Going to Take a Nap.avi
│   │   ├── Two And A Half Men Season 01 Episode 05 - The Last Thing You Want to Do Is Wind Up with a Hump.avi
│   │   ├── Two And A Half Men Season 01 Episode 06 - Did You Check with the Captain of the Flying Monkeys.avi
│   │   ├── Two And A Half Men Season 01 Episode 07 - If They Do Go Either Way, They're Usually Fake.avi
│   │   ├── Two And A Half Men Season 01 Episode 08 - Twenty-Five Little Pre-pubers Without a Snoot-ful.avi
│   │   ├── Two And A Half Men Season 01 Episode 09 - Phase One, Complete.avi
│   │   ├── .....
│   ├── Season 02
│   │   ├── ......

So, first things first: we need a function that can find all the seasons and the episodes inside them.

There's a catch, however. We can't just walk through all the files. We also need to extract the season and episode numbers from the names, sort the entries by those numbers, and store them accordingly.

Only then can we cache them, so that next time we can just load the cached data and skip this part.
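
The "cache it once, load it afterwards" idea can be sketched roughly as follows. This is a minimal sketch, not the script's actual code: the `load_or_build` name and the `build` callback are assumptions for illustration, and the demo writes to a temporary directory instead of the real cache location.

```python
import json
import tempfile
from pathlib import Path


def load_or_build(cache_file, build):
    """Return cached data if `cache_file` exists, else build and store it.

    `build` is any zero-argument callable returning JSON-serializable data
    (hypothetical stand-in for the directory scan).
    """
    cache_file = Path(cache_file)
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    data = build()
    cache_file.write_text(json.dumps(data))
    return data


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / ".series"
    # First call runs the (fake) scan and writes the cache.
    first = load_or_build(cache_file, lambda: {"0101": "pilot.avi"})
    # Second call ignores its builder and reads the cache instead.
    second = load_or_build(cache_file, lambda: {})
    print(first == second)  # True
```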

Let's write a function to extract the data

def _get_all_match(self, values, keyword):
    """Get all the possible matches of the keyword passed in the list."""
    # Requires `import re` at the top of the module.
    matched_list = {}

    # First letter is required, the rest of the keyword is optional,
    # then an optional space or dot, then one or two digits (captured).
    pattern = re.compile(
        r'{}(?:{})?[ .]?([0-9]?[0-9])'.format(keyword[0], keyword[1:])
    )

    for value in values:
        # Try to extract the number; if it's not present, skip the entry.
        # Lowercase the name so "Season" and "season" both match.
        result = pattern.search(value.name.lower())
        if result is None:
            continue
        # Capturing only the digits also handles names like "s.01",
        # where stripping the keyword would leave the dot behind.
        matched_list[int(result.group(1))] = value.as_posix()

    return matched_list

This function takes a list of paths (values) and a keyword.

I made the function generic so that the same logic can match both seasons and episodes.

For example, to find all the episodes in a season, we match each filename against a regex pattern.

An episode can show up in a filename as E01, Episode01, E12 or E1.

See the similarity?

Each of them starts with the letter e, and the pisode part is optional. After that come one or two digits: the first digit is optional (the 0 in 01 might not always be present), but the second is required. A space or a dot may also sit between the word and the number.

Thus, we come up with the following regex

r'e(pisode)?[ .]?[0-9]?[0-9]'

For a season, it becomes

r's(eason)?[ .]?[0-9]?[0-9]'

We convert the names to lowercase before matching, so the match does not depend on case.
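
To see the pattern in action, here is a minimal standalone sketch. The `extract_number` helper is hypothetical (it is not part of the script), and it captures the digits in a group rather than stripping the keyword afterwards.

```python
import re


def extract_number(name, keyword):
    """Return the number that follows `keyword` in `name`, or None.

    Hypothetical helper for illustration; `keyword` is "season" or "episode".
    """
    # First letter required, rest of the word optional, then an optional
    # space or dot, then one or two digits.
    pattern = r"{}(?:{})?[ .]?([0-9]{{1,2}})".format(
        re.escape(keyword[0]), re.escape(keyword[1:])
    )
    match = re.search(pattern, name.lower())
    return int(match.group(1)) if match else None


filename = "Two.and.a.Half.Men.S03E01.720p.WEB-DL-MaRS.mkv"
print(extract_number(filename, "season"))      # 3
print(extract_number(filename, "episode"))     # 1
print(extract_number("Season 01", "season"))   # 1
print(extract_number("notes.txt", "episode"))  # None
```

Keep in mind that the pattern is loose: any e that is directly followed by a digit counts as an episode marker, so unusual filenames may need a stricter pattern.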

With the matching logic ready, we can define two functions that call it, one for seasons and one for episodes. Here is the one for seasons

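def _process_parent(self):
    """Process the parent directory and extract all the Seasons."""
    # Requires `from collections import OrderedDict`.
    # Only directories can be seasons; skip loose files.
    season_list = [
        season for season in self.parent_dir.iterdir() if season.is_dir()
    ]

    # Sort by season number so the seasons come out in increasing order.
    return OrderedDict(sorted(
        self._get_all_match(season_list, "season").items()
    ))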

Here, we sort the matches by season number and keep them in an OrderedDict, because we need the seasons in increasing order.
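
A small illustration of why the keys are converted to int before sorting. The paths are made up. On Python 3.7 and later a plain dict keeps insertion order too, so the sorting is what does the work here and the OrderedDict mostly documents the intent.

```python
from collections import OrderedDict

# Made-up season directories keyed by season number, in arbitrary order.
seasons = {
    10: "/tmp/show/Season 10",
    2: "/tmp/show/Season 02",
    1: "/tmp/show/Season 01",
}

ordered = OrderedDict(sorted(seasons.items()))
print(list(ordered))  # [1, 2, 10]

# The same keys as strings would sort lexicographically instead.
print(sorted(str(number) for number in seasons))  # ['1', '10', '2']
```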

Serializing the data

There is one more issue. We can't just dump the data into a JSON file as it is and call it cached.

If we did, we would have to walk through the whole cached structure again every time we needed the next season or episode.

So here's what we can do.

We refer to each season by its season number and each episode by its episode number.

After scanning, the data is in the following nested format

"3": {
    "1": "/home/deepjyoti30/Downloads/TAAHM/Two and a Half Men Season 3 720p MaRS/Two.and.a.Half.Men.S03E01.720p.WEB-DL-MaRS.mkv",
    "2": "/home/deepjyoti30/Downloads/TAAHM/Two and a Half Men Season 3 720p MaRS/Two.and.a.Half.Men.S03E02.720p.WEB-DL-MaRS.mkv",
    "3": "/home/deepjyoti30/Downloads/TAAHM/Two and a Half Men Season 3 720p MaRS/Two.and.a.Half.Men.S03E03.720p.WEB-DL-MaRS.mkv",
    "4": "/home/deepjyoti30/Downloads/TAAHM/Two and a Half Men Season 3 720p MaRS/Two.and.a.Half.Men.S03E04.720p.WEB-DL-MaRS.mkv",
    "5": "/home/deepjyoti30/Downloads/TAAHM/Two and a Half Men Season 3 720p MaRS/Two.and.a.Half.Men.S03E05.720p.WEB-DL-MaRS.mkv",

We can flatten this into a single level, with keys such as 0301 for season 3, episode 1, using the following function

def _serialize_data(self, series_data):
    """Flatten {season: {episode: path}} into {"SSEE": path}."""
    serialized_data = {}

    for season_number, season_data in series_data.items():
        # Sort numerically so "10" comes after "9". Iterating over the
        # keys that exist also avoids a KeyError if an episode is missing.
        for episode_number in sorted(season_data, key=int):
            # Zero-pad both parts to two digits, e.g. "3", "1" -> "0301".
            data_key = "{:02d}{:02d}".format(
                int(season_number), int(episode_number)
            )
            serialized_data[data_key] = season_data[episode_number]

    return serialized_data
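
As a quick check of what the flattening produces, here is a standalone sketch of the same idea run on a tiny input. `flatten_series` is an illustrative name and the paths are shortened, made-up values.

```python
def flatten_series(series_data):
    """Flatten {season: {episode: path}} into {"SSEE": path} (illustrative)."""
    flat = {}
    for season, episodes in series_data.items():
        for episode, path in episodes.items():
            flat["{:02d}{:02d}".format(int(season), int(episode))] = path
    return flat


nested = {"3": {"1": "S03E01.mkv", "2": "S03E02.mkv"}, "10": {"1": "S10E01.mkv"}}
print(flatten_series(nested))
# {'0301': 'S03E01.mkv', '0302': 'S03E02.mkv', '1001': 'S10E01.mkv'}
```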

We will also store the currently playing episode in a separate file, which the script reads each time it starts.

Now that the data is cached in a flat, ordered form, here's how the script proceeds after this part

It loads the cached (serialized) data.
It reads the current episode.
It plays the current episode and waits until the external player returns.
It reads the next episode from the serialized data.

For example:

If the current episode is 0106, we play it, and once that's done, we just get the next key from the JSON file and call the player function again.
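
The "get the next key" step could look roughly like this. It is a sketch under the assumption that the keys are zero-padded four-character strings as described above; `next_episode` is a hypothetical helper, not a function from the script.

```python
def next_episode(cached_data, current):
    """Return the key that follows `current`, or None at the end."""
    # Zero-padded keys of equal length sort correctly as plain strings.
    later = [key for key in sorted(cached_data) if key > current]
    return later[0] if later else None


cached = {"0106": "a.avi", "0107": "b.avi", "0201": "c.avi"}
print(next_episode(cached, "0106"))  # 0107
print(next_episode(cached, "0107"))  # 0201
print(next_episode(cached, "0201"))  # None
```

Because season and episode are packed into one key, moving from the last episode of a season to the first of the next needs no special handling.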

2. Get the last played

This is the simple part. We just read the file in which the last played episode is stored.

I am storing the files in the ~/.cache directory.

The file containing the current episode is named .current, and the one containing the serialized data is named .series.

We can read the data as follows


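import json
from pathlib import Path

# Both files live in ~/.cache, as described above.
CACHE_DIR = Path.home() / ".cache"

# Read the JSON data
with open(CACHE_DIR / ".series", "r") as rstream:
    data = json.load(rstream)

# Read the current episode, dropping the trailing newline
with open(CACHE_DIR / ".current", "r") as rstream:
    current_epi = rstream.read().strip()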

Here's how we will play the episodes

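def _play(self, episode):
    """Play the passed episode and every episode after it."""
    # Requires `import sys` at the top of the module.
    starting_epi = episode

    for it in self._cached_data:
        try:
            # Keys are zero-padded ("0106"), so comparing them as
            # strings matches the episode order.
            if it < starting_epi:
                continue
            print("[*] Playing {}".format(it))
            self._save_current(it)
            self._mpv(self._cached_data[it])
        except KeyboardInterrupt:
            print("[*] You watched till {}".format(it))
            sys.exit(0)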

We are detecting keyboard interrupt to exit from the script.
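
The `_save_current` call in `_play` is not shown in this post. A minimal sketch of what such a helper could look like, assuming it simply overwrites the tracking file with the episode key. The function names here are illustrative, and the demo uses a temporary directory rather than the real ~/.cache.

```python
import tempfile
from pathlib import Path


def save_current(current_file, episode_key):
    """Overwrite the tracking file with the episode key (hypothetical helper)."""
    Path(current_file).write_text(episode_key + "\n")


def read_current(current_file):
    """Read the episode key back, dropping the trailing newline."""
    return Path(current_file).read_text().strip()


with tempfile.TemporaryDirectory() as tmp:
    current_file = Path(tmp) / ".current"
    save_current(current_file, "0106")
    print(read_current(current_file))  # 0106
```

Saving the key before the player starts means that an interrupted episode is the one that gets picked up again on the next run.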

3. Call external player

I am using the subprocess module to call the external player.

I am using mpv because it has a built-in option to resume playback from the position where it was last stopped.

Here's how we will call the player

def _mpv(self, path):
    """Call MPV and pass the path to play."""
    # Requires `from subprocess import call`.
    call([
        'mpv',
        '--really-quiet',
        '--save-position-on-quit',
        '--resume-playback',
        path
    ])

We call the player with three options. --really-quiet keeps the terminal output to a minimum.

The other two are pretty self-explanatory.

--save-position-on-quit makes the player save the playback position when it quits, and --resume-playback makes it resume from the last saved point.
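
As a possible refinement, the script could check that mpv is installed before trying to call it. The sketch below is an assumption on my part, not code from the script: `build_mpv_command` and `play` are illustrative names, and the flags are the same three used above.

```python
import shutil
from subprocess import call


def build_mpv_command(path):
    """Return the argument list used to play `path` with mpv."""
    return [
        "mpv",
        "--really-quiet",           # keep terminal output to a minimum
        "--save-position-on-quit",  # remember where playback stopped
        "--resume-playback",        # continue from the remembered position
        path,
    ]


def play(path):
    """Play `path` and return mpv's exit code, or None if mpv is missing."""
    if shutil.which("mpv") is None:
        print("[!] mpv was not found on PATH")
        return None
    # call() blocks until the player exits, which is what lets the
    # script move on to the next episode afterwards.
    return call(build_mpv_command(path))


print(build_mpv_command("episode.avi"))
```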

Conclusion

First things first: piracy is not cool.

But then, Netflix just doesn't have some of the series that we really want to watch. When that happens, get the series from somewhere and then use this script to automate everything.

To make things simpler, I also made the script take the path to the series as a command-line argument.

The sys module (sys.argv) is enough for that.

Because of this, though, we need to add extra checks to make sure the path that was passed is valid.
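
One way those checks could look, as a minimal sketch. `resolve_series_dir` and the script name `watch.py` are hypothetical, and the real script may validate things differently.

```python
from pathlib import Path


def resolve_series_dir(argv):
    """Return the series directory given on the command line.

    Raises ValueError with a readable message when the argument is
    missing or does not point to a directory. (Hypothetical helper.)
    """
    if len(argv) < 2:
        raise ValueError("usage: {} <path-to-series>".format(argv[0]))
    series_dir = Path(argv[1]).expanduser()
    if not series_dir.is_dir():
        raise ValueError("{} is not a directory".format(series_dir))
    return series_dir


# Example with an explicit argument list; the real script would pass sys.argv.
print(resolve_series_dir(["watch.py", "."]))
```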

The whole script can be found here

Thanks for reading!


I also write about tech stuff in my personal page. Do consider checking out the posts there.
