A simple program to sort all the items in a directory, be they files or folders, by descending size.
Sorting items in a directory by descending size is not as straightforward as you might think, whether you are using a graphical file browser or the command line, because operating systems do not calculate the total size of a directory's contents when browsing a directory tree. This article offers complete working programs to overcome this on most operating systems.
Problem
Perhaps you will find the following familiar:
Whether for work or for personal projects, I like to organize my digital assets by creating a parent directory, let's say one called Projects, and storing all the content for the individual projects in there. If a project is small and doesn't involve a lot of content, I'll use a single file, usually a text file. If a project involves more content, say a text file as well as a couple of screenshots, I'll create a folder for that project and place all the related assets in there. So, from my perspective, the single text file and the folder are equivalent in the sense that each represents a project. The only difference is that the folder represents a bigger project, one with more stuff.
Sometimes I want to see which of my projects is currently the largest, which has the most stuff. This usually happens because I haven't worked on a particular area for some time, so when I come back to it, I want to see which project has the most content. My reasoning is that the project with the most content should be the most complete, and therefore probably the one I should work on first, as it will be the easiest to finish.
For example, consider a directory with the following contents:
Name | Type | Size |
---|---|---|
Huge Project.txt | File | 2.6KB |
Larger Project | Folder | 1.07KB |
0 - Tiny Project | Folder | 0KB |
Basic Project.txt | File | 0.36KB |
Big Project.txt | File | 2.11KB |
Sorting the above directory by descending size should output:
Huge Project.txt 2.6KB
Big Project.txt 2.11KB
Larger Project 1.07KB
Basic Project.txt 0.36KB
0 - Tiny Project 0KB
However, this is not what we get when we click the Size column header in graphical file browsers on Windows, Mac, and Linux.
[Screenshots of the Windows, Mac, and Linux file browsers sorted by Size]
Using the command line provides output that is somewhat closer to the desired one, but still not entirely correct:
Windows
dir /b /o:-d
Output:
Larger Project
0 - Tiny Project
Huge Project.txt
Big Project.txt
Basic Project.txt
Mac and Linux
There are various command combinations for sorting directory contents on Unix-like systems such as Mac and Linux. Most involve du, sort, and ls. Other examples I found online threw find and grep into the mix as well.
Here are the ones I tried:
du | sort
du -a -h --max-depth=1 | sort -hr
Output:
32K .
8.0K ./Larger Project
8.0K ./0 - Tiny Project
4.0K ./Huge Project.txt
4.0K ./Big Project.txt
4.0K ./Basic Project.txt
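The 4.0K figures are du reporting disk usage rather than byte counts: a small file typically occupies a whole file-system block. The minimal Python sketch below, for Mac and Linux only (st_blocks does not exist on Windows), compares a file's apparent size, st_size, with its allocated size, st_blocks * 512. The helper apparent_and_allocated and the temporary 368-byte file, a stand-in for Basic Project.txt, are hypothetical.

```python
import os
import tempfile


def apparent_and_allocated(path):
    """Hypothetical helper: return (apparent size, allocated size) in bytes.

    st_blocks counts 512-byte units on Linux and macOS; it does not exist
    on Windows, so this sketch is POSIX-only.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "Basic Project.txt")
    with open(path, "wb") as f:
        f.write(b"x" * 368)  # hypothetical stand-in for the 368-byte file
    apparent, allocated = apparent_and_allocated(path)
    # allocated is usually a whole file-system block, which du -h shows as 4.0K
    print("apparent:", apparent, "bytes, allocated:", allocated, "bytes")
```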
ls
Using the -S switch with the ls command is supposed to do exactly what I'm looking for: sort items by descending size.
ls -S
Output:
'0 - Tiny Project' 'Larger Project' 'Huge Project.txt' 'Big Project.txt' 'Basic Project.txt'
The output is still off, so I tried adding the -l (long) switch.
ls -lS
Output:
total 20
drwx---r-x 2 admin admin 4096 Sep 20 21:49 '0 - Tiny Project'
drwx---r-x 2 admin admin 4096 Sep 20 21:49 'Larger Project'
-rw-rw-r-- 1 admin admin 2667 Sep 20 21:49 'Huge Project.txt'
-rw-rw-r-- 1 admin admin 2164 Sep 20 21:49 'Big Project.txt'
-rw-rw-r-- 1 admin admin 368 Sep 20 21:49 'Basic Project.txt'
The output includes more detail, as expected, but the sort order is the same as before.
Root Cause
While the various commands do not produce the desired output, they do highlight the root cause of the problem. File browsers and ls do not recurse into folders to calculate the total size of their contents. Instead, they report the size of the folder entry itself (or, on Windows, no size at all), which is the same fixed value for most folders: usually one file system block, commonly 4096 bytes (4KB). du does recurse, but by default it reports the disk space allocated in blocks rather than the files' actual sizes, so every small file typically shows as 4.0K.
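To see the root cause for yourself, here is a minimal Python sketch that compares the size a listing reports for a folder, os.stat(folder).st_size, with the total of the files inside it. The helper recursive_size and the notes.txt file are hypothetical. The first number depends on the file system (often 4096 on ext4), while the second is what we actually want to sort by.

```python
import os
import tempfile


def recursive_size(path):
    """Hypothetical helper: total apparent size, in bytes, of every file below `path`."""
    total = 0
    # os.walk visits `path` and every subfolder; symlinked folders are not followed
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


with tempfile.TemporaryDirectory() as root:
    folder = os.path.join(root, "Larger Project")
    os.mkdir(folder)
    with open(os.path.join(folder, "notes.txt"), "wb") as f:
        f.write(b"x" * 1096)  # hypothetical content, about 1.07KB

    # What a listing sorts by: the folder entry's own size (file-system dependent)
    print("folder entry:", os.stat(folder).st_size, "bytes")
    # What we actually want: the sum of the files inside the folder
    print("contents:", recursive_size(folder), "bytes")
```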
Solution
There must be at least a dozen free tools out there that solve this problem, but to be honest, I didn't even look. Writing a script that does the same thing and sharing it here felt like it would be easier, involve less bloat, hopefully be useful to others, and definitely be more fun.
I've waffled on long enough. Here is the code:
Python
#! /usr/bin/env python3
import os
import sys


def get_dir_items(path):
    """Return a dict mapping each item name in `path` to its size in bytes.

    Folder sizes are the recursive total of the files they contain.
    """
    results = {}
    with os.scandir(path) as items:
        for item in items:
            # follow_symlinks=False avoids endless recursion through symlink loops
            if item.is_dir(follow_symlinks=False):
                results[item.name] = sum(get_dir_items(item.path).values())
            elif item.is_file():
                results[item.name] = item.stat().st_size
    return results


if __name__ == "__main__":
    if len(sys.argv) <= 1:
        print("Specify a path as the first argument.")
    else:
        root_path = sys.argv[1]
        if not os.path.isdir(root_path):
            print(root_path, "is not a valid path")
        else:
            results = get_dir_items(root_path)
            # Sort by size (the dict value), largest first
            results_sorted = sorted(results.items(), key=lambda item: item[1], reverse=True)
            for name, size in results_sorted:
                print(f"{name}\t{round(size / 1024, 2)}KB")
PowerShell
if (!$args[0]) {
    echo "Specify a path as the first argument."
} else {
    $root_path = $args[0]
    if (!(Test-Path $root_path -PathType Container)) {
        echo ($root_path + " is not a valid path")
    } else {
        $results = @{}
        $items = gci $root_path
        foreach ($item in $items) {
            if ($item -is [System.IO.DirectoryInfo]) {
                # Sum the lengths of all files below the folder;
                # [long] ensures a number even when the folder is empty
                $results[$item.Name] = [long](gci $item.FullName -Recurse -File | measure Length -Sum).Sum
            } else {
                $results[$item.Name] = $item.Length
            }
        }
        # Sort-Object rather than its "sort" alias, which pwsh on Linux and macOS does not define
        $results_sorted = $results.GetEnumerator() | Sort-Object Value -Descending
        foreach ($result in $results_sorted) {
            echo ($result.Name + "`t" + [math]::Round(($result.Value / 1024), 2) + "KB")
        }
    }
}
C Sharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

namespace DirDesc {
    class Program {
        // Returns each item name in `path` mapped to its size in bytes;
        // folder sizes are the recursive total of the files they contain.
        private static Dictionary<string, long> GetDirItems(string path) {
            var results = new Dictionary<string, long>();
            var directoryInfo = new DirectoryInfo(path);
            foreach (FileInfo file in directoryInfo.GetFiles()) {
                results.Add(file.Name, file.Length);
            }
            foreach (DirectoryInfo directory in directoryInfo.GetDirectories()) {
                results.Add(directory.Name, GetDirItems(directory.FullName).Values.Sum());
            }
            return results;
        }

        static void Main(string[] args) {
            if (args.Length < 1) {
                Console.WriteLine("Specify a path as the first argument.");
            } else {
                string rootPath = args[0];
                if (!Directory.Exists(rootPath)) {
                    Console.WriteLine("{0} is not a valid path", rootPath);
                } else {
                    // Iterate the ordered sequence directly: a Dictionary does not guarantee enumeration order
                    foreach (var result in GetDirItems(rootPath).OrderByDescending(x => x.Value)) {
                        Console.WriteLine("{0}\t{1}KB", result.Key, Math.Round((decimal)result.Value / 1024, 2));
                    }
                }
            }
        }
    }
}
Go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// getDirItems maps each item name in path to its size in bytes;
// folder sizes are the recursive total of the files they contain.
func getDirItems(path string) map[string]int64 {
	results := make(map[string]int64)
	items, err := os.ReadDir(path)
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(1)
	}
	for _, item := range items {
		if item.IsDir() {
			var dirSize int64
			for _, size := range getDirItems(filepath.Join(path, item.Name())) {
				dirSize += size
			}
			results[item.Name()] = dirSize
		} else {
			info, err := item.Info()
			if err != nil {
				fmt.Println("Error:", err)
				os.Exit(1)
			}
			results[item.Name()] = info.Size()
		}
	}
	return results
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("Specify a path as the first argument.")
		return
	}
	rootPath := os.Args[1]
	if info, err := os.Stat(rootPath); err != nil || !info.IsDir() {
		fmt.Println(rootPath, "is not a valid path")
		return
	}
	results := getDirItems(rootPath)
	// Maps are unordered, so sort a slice of the keys by their sizes instead
	keys := make([]string, 0, len(results))
	for key := range results {
		keys = append(keys, key)
	}
	sort.Slice(keys, func(i, j int) bool {
		return results[keys[i]] > results[keys[j]]
	})
	for _, key := range keys {
		fmt.Printf("%s\t%.2fKB\n", key, float64(results[key])/1024)
	}
}
There are some minor differences between the four implementations, but the general approach used for all four is the same:
- Create a recursive function that returns a collection of key-value pairs of item (file or folder) name and size.
- In the main function or block, do some basic input validation, and if the user has provided a valid path, run the recursive function on that path.
- Sort the output of the recursive function by value (size), in descending order.
- Print the sorted output to the console. Each line printed adheres to the format: the item name, followed by a tab character, followed by the item size divided by 1024 and rounded to two decimal places to get the size in kilobytes, followed by "KB" to denote the size unit.
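The sorting and printing steps above can be sketched in Python using the example directory's sizes. The three file sizes come from the ls -lS output; the folder totals (1096 and 0 bytes) are hypothetical values chosen to match the table, and format_sorted is a hypothetical helper name. One addition that is not in the original scripts: a secondary sort key on the name, so items of equal size always print in the same order rather than in the directory's read order (or, in Go, the randomized map iteration order).

```python
# Sizes in bytes. The files use the ls -lS byte counts; the folder totals are
# hypothetical values chosen to match the table (1.07KB and 0KB).
results = {
    "Huge Project.txt": 2667,
    "Larger Project": 1096,
    "0 - Tiny Project": 0,
    "Basic Project.txt": 368,
    "Big Project.txt": 2164,
}


def format_sorted(results):
    """Return output lines sorted by size (largest first), ties broken by name."""
    # Negating the size sorts descending while the name still sorts ascending
    ordered = sorted(results.items(), key=lambda item: (-item[1], item[0]))
    # name, tab, size in KB rounded to two decimal places, "KB"
    return [f"{name}\t{round(size / 1024, 2)}KB" for name, size in ordered]


for line in format_sorted(results):
    print(line)
```

Running it prints the five items in the same order as the desired output shown earlier.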
Usage
On the command line, pass the path to the directory you want to sort as the first argument. I won't list every possible example, but here are a couple, assuming you've saved the code in a file named dir_desc, short for "directory descending", plus the appropriate file extension:
Using Python on Mac or Linux:
python3 dir_desc.py <some path>
Using PowerShell on Windows:
powershell -f dir_desc.ps1 <some path>
Differences Between Languages and Implementations
- Python and Go resemble C and other C-like languages in that the first command-line argument is the second item in the arguments array (sys.argv in Python, os.Args in Go); the first item is the script or program name. In the .NET languages, PowerShell and C#, the first argument is the first item in the args array.
- In PowerShell, there is no need to create a separate recursive function, because the built-in Get-ChildItem (gci) and Measure-Object (measure) cmdlets achieve the same result more easily.
- In Go, sorting a collection of key-value pairs (a map) by value takes a few more lines of code than in the other languages, because the built-in sorting functions work on slices, not maps.
- In Go, rounding to two decimal places happens when printing the output, via the %.2f verb of fmt.Printf(), rather than by rounding the number itself, so the math.Round() function isn't needed. If you have a C background, this is probably intuitive. For the rest of us, it's a bit bizarre, but it works fine.
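Python happens to support both rounding styles, which makes the Go point easier to see. A small sketch using the 2667-byte Huge Project.txt from the example:

```python
size = 2667  # bytes, from the ls -lS output for Huge Project.txt
kb = size / 1024  # 2.6044921875

# Round the number itself, then print it (what the Python script does)
rounded = f"{round(kb, 2)}KB"
# Round only while formatting, like Go's %.2f verb
formatted = f"{kb:.2f}KB"

print(rounded)    # 2.6KB
print(formatted)  # 2.60KB
```

Note the trailing zero: formatting always prints exactly two decimal places, so the Go version shows 2.60KB where the Python version shows 2.6KB.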
I ported my original Python approach to a few other languages, so that at least one version should work on each of the three major operating systems:
- Mac and Linux: most Linux distributions ship with the python3 interpreter, and on macOS it comes with Apple's Command Line Tools (running python3 for the first time may prompt you to install them). If Python isn't an option, you can use the Go version. Some Linux distributions package gccgo, a GCC front end that can compile Go, but it is rarely installed by default, so you will most likely need to download the Go toolchain.
- Windows: the PowerShell version should work out of the box on Windows 10 or later. For older systems, the C# version is probably the better choice. You can compile it with the C# compiler (csc.exe) that ships with the .NET Framework.
And that's it. Another yak, shaved. I hope you found this useful.