DEV Community

Gabor Szabo

Posted on • Originally published at code-maven.com

Listing first elements of a huge directory using Python

At a client we have a huge directory of files, and I wanted to list just the first few. ls -l | head took ages because ls reads and sorts the entire directory before printing anything, and only then does head cut the output down.
After my first attempts in Python failed, I wrote a Perl one-liner to list the first elements of a huge directory. However, I wanted to see if I could do it in Python some other way.

using iterdir of pathlib

My first attempt in Python used the iterdir method of pathlib.Path.

import pathlib

path = pathlib.Path("/home/gabor/work/code-maven.com/sites/en/pages/")
count = 0

for thing in path.iterdir():
    count += 1
    print(thing)
    if count > 3:
        break


On the real data it took 47 minutes to run.
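
To reproduce this kind of measurement on a smaller scale, a minimal sketch like the following times how long an iterator takes to deliver its first few items. The helper names `time_first_n` and `demo_iterdir_timing`, and the synthetic temporary directory, are assumptions for illustration and not part of the original experiment.

```python
import itertools
import pathlib
import tempfile
import time


def time_first_n(make_iter, n):
    """Return (items, seconds) for pulling the first n items from make_iter()."""
    start = time.perf_counter()
    items = list(itertools.islice(make_iter(), n))
    return items, time.perf_counter() - start


def demo_iterdir_timing(file_count=1000, n=4):
    # Hypothetical test data: a throwaway directory of empty files.
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)
        for i in range(file_count):
            (root / f"file_{i}.txt").touch()
        # Pass the bound method so the timer also covers creating the iterator.
        return time_first_n(root.iterdir, n)


if __name__ == "__main__":
    items, seconds = demo_iterdir_timing()
    print(f"{len(items)} entries in {seconds:.6f} seconds")
```

Because `time_first_n` accepts any callable that returns an iterator, the same helper can be pointed at the other approaches in this post to compare them on the same directory.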

using walk of os

The second attempt used the walk function of the os module.

import os

path = "/home/gabor/work/code-maven.com/sites/en/pages/"
count = 0

for dirname, dirs, files in os.walk(path):
    for filename in files:
        print(os.path.join(dirname, filename))
        count += 1
        if count > 3:
            raise SystemExit  # exit() is meant for the interactive shell



I don't know how long this would have taken; I stopped it after a minute.
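
A likely reason for the delay: os.walk builds the complete lists of subdirectory and file names for a directory before it yields that directory's tuple, so the inner loop cannot start until the whole directory has been read. The sketch below illustrates this on a small synthetic directory; the helper names and the file count are assumptions for illustration.

```python
import os
import tempfile


def first_walk_tuple(path):
    """Return the first (dirname, dirs, files) tuple produced by os.walk."""
    return next(os.walk(path))


def demo_walk(file_count=1000):
    # Hypothetical test data: a flat directory of empty files.
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(file_count):
            open(os.path.join(tmp, f"file_{i}.txt"), "w").close()
        dirname, dirs, files = first_walk_tuple(tmp)
        # The very first tuple already holds every file name in the directory.
        return len(files)


if __name__ == "__main__":
    print(demo_walk())  # number of names collected before any loop body ran
```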

using scandir of os

Finally I found the scandir function of the os module. That did the trick:

import os

path = "/home/gabor/work/code-maven.com/sites/en/pages/"
count = 0

with os.scandir(path) as it:
    for entry in it:
        print(entry.name)
        count += 1
        if count > 3:
            break  # leaving the with block closes the scandir iterator

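
The manual counter can also be replaced with itertools.islice, which stops pulling entries from the scandir iterator after the requested number. This is a sketch of an alternative, not the original script; the function name `first_entries` is an assumption.

```python
import itertools
import os


def first_entries(path, n):
    """Return the names of the first n entries that os.scandir yields for path."""
    with os.scandir(path) as it:
        # islice stops after n items, so no further entries are requested.
        return [entry.name for entry in itertools.islice(it, n)]


if __name__ == "__main__":
    for name in first_entries(".", 4):
        print(name)
```

The entries come back in whatever order the operating system reports them, not sorted, so "first" here means "first returned", not alphabetically first.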
