Motivation
Have you ever wondered what the largest files on your local disk are? Well, so did I. But at the same time, I had two constraints in mind:
- I didn't want to use any third-party tool to process the disk scan.
- I was absolutely not going to scan it manually.
This article will show you step by step how I did it. But before we dive in, let me show you the final Tableau data visualizations, which are quite satisfying!
Dataviz results
First dataviz
Insights
- There are a lot of files without any extension (light blue on the left-hand side).
- The ucas files from Unreal Engine archives actually make sense, as I do play Fortnite.
- The vsix files are Visual Studio Code extensions. I still wonder how they ended up on my computer, since I only use Sublime Text as my main editor...
- I didn't realize how big my png photos were until this chart revealed it.
Second dataviz
Insights
- On average, OS files are bigger than non-OS ones.
- There are more than 150k files without any extension (I assume they belong to the OS, but who knows?).
- There are only 171 ucas files, which means that a single ucas file is larger than the average file.
- I honestly should remove the useless 2 GB used by vsix files.
Third dataviz
Insights
- There are 24 levels of folders, where the first one is the disk itself (C:/).
- The most used directories are generally between the 4th and 12th depth levels.
- The 6th level doesn't contain many files: this depth must hold mostly subdirectories.
Fourth dataviz
Folder depths grouped by usefulness
- 1 dot = 1 file
- 1 color = 1 folder
- Y axis = folder depth starting with 1, from top to bottom
Insights
- The further down we go (to greater directory depths), the fewer files there are.
- The empty spaces in the non-OS files correspond to folders that contain only OS files.
- Among OS files, the large lined-up areas correspond to Microsoft Services files.
- Among non-OS files, the large pink and green lines correspond to %AppData% subfolders, where all the caching happens and is stored.
How I did it
Gathering files details
Before getting to the final visualizations above, the first step is obviously to gather the data. I just ran the following two commands from my cmd terminal:
cd C:/
where "*.*" /r . /t > f:\list-of-c-files.txt
Note that the output file is stored on a different disk than the one being scanned, so that it doesn't interfere with the scan.
Initial output
The output will look like this:
Quite ugly, right? Let's do some cleaning.
Data cleaning
This step can be done in any software or programming language that you like. In my case, I directly used Tableau Software.
- I import the initial file as a text file, using an arbitrary character that never appears in the data as the delimiter. This way, every line lands in a single raw column, and I can build all the new calculated fields from the raw data manually. In my case, I used ^ as the delimiter, as seen in this screenshot from Tableau Desktop (French version).
- I create all the new calculated fields and hide the single raw column, src_all.
- I preview the final output data to make sure everything matches what I expected.
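If you would rather do this cleaning step in a programming language than in Tableau, here is a minimal Python sketch of the same idea: split each raw line into fields, then derive the extension and the folder depth. It assumes each line of the `where /t` output holds the size in bytes, then the last-modified date and time, then the full path. The date and time layout depends on the Windows locale, so it is kept as raw text. The names `FileRecord` and `parse_line`, as well as the field names, are hypothetical and are not the calculated fields from the Tableau workbook.

```python
import re
from pathlib import PureWindowsPath
from typing import NamedTuple, Optional

# Assumption: a line looks like "<size> <date> <time> <full path>".
# The path is detected by its drive prefix (for example "C:\").
LINE_RE = re.compile(r"^\s*(\d+)\s+(.*?)\s+([A-Za-z]:\\.*)$")


class FileRecord(NamedTuple):
    size_bytes: int
    modified_raw: str  # locale-dependent, left unparsed on purpose
    path: str
    extension: str     # lowercase, without the dot; "" when there is none
    depth: int         # 1 = the disk itself, 2 = first folder level, ...


def parse_line(line: str) -> Optional[FileRecord]:
    """Parse one line of the listing; return None if it does not match."""
    match = LINE_RE.match(line.rstrip("\r\n"))
    if match is None:
        return None
    size, modified, path = match.groups()
    win_path = PureWindowsPath(path)
    return FileRecord(
        size_bytes=int(size),
        modified_raw=modified,
        path=path,
        extension=win_path.suffix.lstrip(".").lower(),
        # Depth of the folder holding the file, counting the drive as level 1.
        depth=len(win_path.parent.parts),
    )


# Made-up sample line, for illustration only.
sample = r"     35840   8/22/2013   4:03:15 AM  C:\Windows\System32\notepad.exe"
print(parse_line(sample))
```

To process the whole listing, you could loop over the lines of the output file and skip those for which `parse_line` returns `None`. The right text encoding to open the file with depends on your console code page.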
And that's it, we are ready to build the dataviz!
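As a rough illustration of the numbers behind the first two charts (file count, total size, and average size per extension), here is a small Python sketch. It is not how the Tableau workbook computes them. The function name `summarize_by_extension` is hypothetical, and the sample records are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def summarize_by_extension(
    files: Iterable[Tuple[str, int]],
) -> List[Tuple[str, int, int, float]]:
    """Return (extension, file_count, total_bytes, average_bytes) rows,
    sorted by total size, largest first."""
    counts: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for extension, size_bytes in files:
        counts[extension] += 1
        totals[extension] += size_bytes
    rows = [
        (ext, counts[ext], totals[ext], totals[ext] / counts[ext])
        for ext in totals
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Made-up records for illustration only: (extension, size in bytes).
# An empty string stands for files without any extension.
sample = [("ucas", 900), ("png", 300), ("", 10), ("png", 100), ("", 30)]
for row in summarize_by_extension(sample):
    print(row)
```

Sorting by total size puts the heaviest extensions first, while the count and average columns help tell "a few huge files" apart from "many small files".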
If you want to preview your own files...
Just ping me on Twitter and I will be glad to give you the Tableau template to get started quickly!
Feel free to share your thoughts on this side project of mine in the comments section below.