DEV Community

Marvin

Originally published at marvin.hashnode.dev

File management with Python: Part 2

This is the second part of the series File management with Python. We pick up where we left off in Part 1, where we organized files according to their extensions. So, let's get started.

Sometimes, organizing files takes a bit more than knowing their extensions. Take, for instance, a directory where all the files are of the same type, whether .pdf, .doc, .mp4 and so on. In this piece, we take our organization a little further. Say you have a folder of slides (.ppt). You've just received a whole lot of lecture files, but they are not exactly easy to go through: instead of one slide deck containing everything for the first lecture, each session was broken down into separate files, one per slide. Our folder, in this case, is assumed to look as below.

DataStructures/
|_Datastructuressession1Slide1.ppt
|_Datastructuressession1Slide2.ppt
|_Datastructuressession1Slide3.ppt
|_Datastructuressession2Slide3.ppt
|_Datastructuressession7Slide8.ppt
|_Datastructuressession9Slide2.ppt

... and so on


What's happening? We got the slides all right, but they are a mess. To pick up from where you stopped reading, you would have to search the whole folder for the next slide. Let's make this easier by organizing all the slides according to their session. Remember how we generated random files in the previous article? We'll do the same thing here, only this time all the files will be of the same type. Have a look at that here for a quick refresher. Our script looks much like create_random_files.py.

#!/usr/bin/env python3
# create_lectures.py
# generate dummy slides named like Datastructuressession01Slide1.ppt

import os
from pathlib import Path

sessions = [str(x) for x in range(1,21)]  # create 20 sessions: '1' .. '20'
sessions = [str(0)+item if int(item) < 10 else item for item in sessions]

# create the DataStructures directory if it is missing, then get into it
os.makedirs('./DataStructures', exist_ok=True)
os.chdir('./DataStructures')

for item in sessions:
    # create 20 slides (Slide1 .. Slide20) for each session
    for num in range(1, 21):
        file_to_create = f"Datastructuressession{item}Slide{num}.ppt"
        Path(file_to_create).touch()  # create an empty file


Okay, okay. I'll admit I went a little overboard with the number of files this time. That's quite the number.

Let's turn our attention to this line:

sessions = [str(0)+item if int(item) < 10 else item for item in sessions]

The line just before this makes a list of twenty numbers, but here's the catch: we convert each of these numbers to a string. Why?
Because we prepend the character 0 (zero) to every number below 10. That makes each single-digit number look like 01, 02, 03, ... and so on, so every session number has exactly two digits and all the file names share the same layout.
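For comparison, Python's format specification mini-language can produce the same two-digit labels in a single step. This is a sketch of an alternative, not what create_lectures.py does: the format spec `02d` pads an integer with leading zeros to a width of two.

```python
# Alternative to the two-step list comprehension in create_lectures.py:
# the format spec '02d' zero-pads an integer to a width of 2.
sessions = [f"{x:02d}" for x in range(1, 21)]

print(sessions[:3])  # ['01', '02', '03']
print(sessions[-1])  # 20

# str.zfill pads a numeric string in the same way
print("7".zfill(2))  # 07
```

Fixed-width labels also sort correctly as plain strings: "02" comes before "10", whereas an unpadded "2" would sort after "10".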

Above, we created a set of slide files for each of the 20 sessions.

What we do next is simple: group these files according to their session.

#!/usr/bin/env python3
# clean_reading.py
# move files to directories according to the file name pattern

import os
import shutil


# get into the DataStructures directory
os.chdir('./DataStructures')

# file names look like: Datastructuressession01Slide1.ppt
for f in os.listdir("."):
    # skip session folders left by an earlier run, and anything that is not a slide
    if not os.path.isfile(f) or not f.endswith(".ppt"):
        continue

    folder_name = f[14:23]  # e.g. 'session01'

    # create the session folder on first use, then move the slide into it
    os.makedirs(folder_name, exist_ok=True)
    shutil.move(f, folder_name)


The only line that might need some explaining is:

folder_name = f[14:23]


We counted characters to find the part of the file name we want each folder to be named after. Breaking down the file name Datastructuressession01Slide1.ppt, the first s of session sits at index 14 (Python counts from zero), and the two-digit session number occupies indices 21 and 22. Because a slice excludes its end index, f[14:23] stops just before the S of Slide and returns session01; for the last session it returns session20. This is also why we zero-padded the session numbers: for a name like Datastructuressession1Slide1.ppt, the same slice would return session1S.
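The index arithmetic is easy to verify in a Python shell. This sketch uses the sample file name from the article; `str.index` is shown as one way to locate the start of session without counting characters by hand.

```python
name = "Datastructuressession01Slide1.ppt"

start = name.index("session")  # position of the first 'session'
print(start)                   # 14
print(name[14:23])             # session01 (index 23, the 'S' of Slide, is excluded)
print(name[21:23])             # 01, just the two-digit session number

# without zero-padding, the same fixed slice runs into 'Slide'
print("Datastructuressession1Slide1.ppt"[14:23])  # session1S
```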
Running this quickly and cleanly moves all our slides into their respective session folders.
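The fixed slice f[14:23] only works for names with exactly this layout. One possible way around that is a regular expression that finds the session label wherever it sits in the name. This is a minimal sketch, not code from the original repository; the helper name `group_by_session` and the pattern `session\d+` are assumptions based on the file names used in this article.

```python
import re
import shutil
from pathlib import Path

# Assumed pattern: 'session' followed by one or more digits,
# e.g. 'session01' in 'Datastructuressession01Slide1.ppt'
SESSION_RE = re.compile(r"session\d+")


def group_by_session(directory):
    """Hypothetical helper: move each .ppt file in `directory` into a
    sub-folder named after its session label.

    Returns a dict mapping each session folder name to the number of
    files moved into it during this call.
    """
    moved = {}
    root = Path(directory)
    # list() takes a snapshot, since we create folders while looping
    for path in list(root.iterdir()):
        # only handle slide files; session folders and other files are left alone
        if not path.is_file() or path.suffix != ".ppt":
            continue
        match = SESSION_RE.search(path.name)
        if match is None:
            continue  # no session label in this name
        label = match.group()
        folder = root / label
        folder.mkdir(exist_ok=True)
        shutil.move(str(path), str(folder / path.name))
        moved[label] = moved.get(label, 0) + 1
    return moved
```

Calling group_by_session("./DataStructures") would do the same job as clean_reading.py. Because the pattern matches any number of digits, names with or without zero-padding both work, although session1 and session01 would end up in different folders.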
There is a lot more one might want to do, for example letting the program work out which sessions are present without hard-coding the character positions. More advanced tools exist for this, especially in a UNIX environment, so feel free to do some scouting and find what works best for you. As a heads-up, here's a cool GUI sorter made with Python; Paul did an awesome job on it. Questions? Do reach out in the comment section. As usual, all this code can be found at TheGreenCodes.
