DEV Community

Jady Nekena

I discovered the largest files in my Windows C disk

Motivation

Have you ever wondered what the largest files on your local disk are? Well, so did I. But I had two constraints in mind:

  • I didn't want to use any third-party tool to scan the disk.
  • I was absolutely not going to scan it manually.

This article shows you, step by step, how I did it. But before we dive in, let me show you the final Tableau data visualizations, which are quite satisfying!

Dataviz results

First dataviz

Extension total sizes, grouped by usefulness

Insights

  • There are a lot of files without any extension (light blue on the left-hand side).
  • The ucas files are Unreal Engine archives, which makes sense, as I do play Fortnite.
  • The vsix files are extension packages for Visual Studio Code (or Visual Studio). I still wonder how they got onto my computer, since I only use Sublime Text as my main editor...
  • I didn't realize how big my png photos were until this chart showed it.

Second dataviz

Extensions with their total sizes and number of files, grouped by usefulness

Insights

  • On average, OS files are bigger than non-OS ones.
  • There are more than 150k files without any extension (I assumed they belong to the OS, but who knows?).
  • There are only 171 ucas files, which means that a single ucas file is, on average, much larger than a typical file.
  • I honestly should remove the useless 2 GB taken up by vsix files.
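The ucas reasoning above comes down to comparing each extension's average file size (total size divided by number of files) with the average over all files. Here's a minimal Python sketch of that calculation. It assumes the cleaned data is available as a list of (extension, size in bytes) pairs. `extension_stats` and `overall_average` are hypothetical helpers, not part of the Tableau workbook.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def extension_stats(files: Sequence[Tuple[str, int]]) -> Dict[str, Tuple[int, int, float]]:
    """Map each extension to (total bytes, number of files, average bytes per file)."""
    totals: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for extension, size in files:
        totals[extension] += size
        counts[extension] += 1
    # Every extension in totals has at least one file, so the division is safe
    return {ext: (totals[ext], counts[ext], totals[ext] / counts[ext]) for ext in totals}


def overall_average(files: Sequence[Tuple[str, int]]) -> float:
    """Average size in bytes over all files, whatever their extension."""
    return sum(size for _, size in files) / len(files)
```

With made-up data (two ucas files weighing 4 MB together and three extension-less files totalling 60 bytes), the ucas average of 2 MB is well above the overall average of about 0.8 MB. A small file count combined with a large total always produces that pattern.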

Third dataviz

Number of files per folder depth

Insights

  • There are 24 folder levels, the first one being the disk root itself, C:\.
  • The most used directories are generally between the 4th and 12th levels.
  • The 6th level doesn't contain many files: folders at this depth probably hold mostly subdirectories.
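Counting files per folder depth only requires the folder part of each path. Here's a minimal Python sketch. It assumes backslash-separated Windows paths as printed by `where`, and it counts the drive root C:\ as depth 1, like the chart above. `files_per_depth` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable


def files_per_depth(paths: Iterable[str]) -> Dict[int, int]:
    """Count files per folder depth, the drive root (C:\\) being depth 1."""
    counts: Counter = Counter()
    for path in paths:
        # Drop the file name: "C:\Windows\notepad.exe" -> "C:\Windows"
        folder = path.rsplit("\\", 1)[0]
        # "C:" -> depth 1, "C:\Windows" -> depth 2, and so on
        counts[len(folder.split("\\"))] += 1
    return dict(sorted(counts.items()))
```

The largest key of the returned dictionary is the number of folder levels (24 in the chart above).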

Fourth dataviz

Folder depths grouped by usefulness
1 dot = 1 file
1 color = 1 folder
Y axis = folder depth starting with 1, from top to bottom

Insights

  • The further down we go (the deeper the folder), the fewer files there are.
  • The gaps in the non-OS part correspond to folders that contain only OS files.
  • Among OS files, the large lined-up areas correspond to Microsoft Services files.
  • Among non-OS files, the large pink and green lines are %AppData% subfolders, where all the caching happens and is stored.

How I did it

Gathering file details

Before building the visualizations above, the first step is obviously to gather the data. I simply ran the following two commands in a cmd terminal:

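rem Go to the root of C: (/d also switches the current drive if needed)
cd /d C:\
rem List every file recursively (/r) with its size, date and time (/t)
where /r . /t "*.*" > F:\list-of-c-files.txt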

Note that the output file is stored outside the scanned disk, so that it doesn't interfere with the scan.

Initial output

The output looks like this:
Screenshot: first raw data output by the command
Quite ugly, right? Let's do some cleaning.

Data cleaning

This step can be done in any software or programming language that you like. In my case, I directly used Tableau Software.
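For instance, if you'd rather clean the data with code than with Tableau, a few lines of Python can split the raw listing into columns. This sketch is an alternative to the Tableau steps, not the method used in this article. It assumes each line contains the size in bytes, the date, the time (possibly followed by AM/PM) and the full path, separated by spaces, which is how `where /t` prints them. The exact date and time format depends on your Windows locale. The cp850 code page used to read the file is also an assumption. `FileRecord`, `parse_where_line` and `read_where_output` are hypothetical names.

```python
import re
from typing import List, NamedTuple, Optional

# ASSUMED layout of one `where /t` line: size, date, time (optionally followed
# by AM/PM), then the full path. The date/time format depends on the locale.
LINE_PATTERN = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(\S+(?:\s+[AP]M)?)\s+(.+?)\s*$", re.IGNORECASE
)


class FileRecord(NamedTuple):
    size: int  # file size in bytes
    date: str  # last modified date, as printed by `where`
    time: str  # last modified time, as printed by `where`
    path: str  # full path, inner spaces included


def parse_where_line(line: str) -> Optional[FileRecord]:
    """Parse one output line, or return None if it doesn't match (e.g. blank lines)."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    size, date, time, path = match.groups()
    return FileRecord(int(size), date, time, path)


def read_where_output(filename: str, encoding: str = "cp850") -> List[FileRecord]:
    """Parse a whole listing; cp850 is an ASSUMED code page, adjust it to your system."""
    with open(filename, encoding=encoding, errors="replace") as handle:
        records = (parse_where_line(line) for line in handle)
        # The list is built before the file is closed
        return [record for record in records if record is not None]
```

For example, `read_where_output(r"F:\list-of-c-files.txt")` would return one record per listed file, ready to be loaded into a spreadsheet or a dataframe.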

  • I import the raw file as a text file, using an otherwise unused character as the delimiter. This way, each whole line lands in a single column, and I can build every new calculated field from the raw data myself. In my case, I used ^, as seen in this screenshot of Tableau Desktop (French version).
  • I create all the new calculated fields, then hide the single raw column, src_all.
  • I preview the final output to make sure everything matches what I expected.
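To give an idea of the kind of logic those calculated fields contain, here's a minimal Python sketch that derives a file's folder, name and extension from its full path. It is only an analogue of the Tableau fields, whose exact definitions aren't shown here. `DerivedFields` and `derive_fields` are hypothetical names. The sketch relies on the standard `ntpath` module, which applies Windows path rules on any operating system.

```python
import ntpath
from typing import NamedTuple


class DerivedFields(NamedTuple):
    folder: str     # parent folder of the file
    file_name: str  # file name, extension included
    extension: str  # lower-case extension without the dot, "" when there is none


def derive_fields(path: str) -> DerivedFields:
    """Derive hypothetical calculated fields from one full Windows path."""
    # ntpath handles backslashes and drive letters the Windows way, on any OS
    folder = ntpath.dirname(path)
    file_name = ntpath.basename(path)
    # splitext treats a leading dot as part of the name, so ".gitignore" has no extension
    extension = ntpath.splitext(file_name)[1].lstrip(".").lower()
    return DerivedFields(folder, file_name, extension)
```

Lower-casing the extension keeps `PNG` and `png` in the same group, which matters once sizes are totalled per extension.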

And that's it, we're ready for the dataviz!

If you want to preview your own files...

Just ping me on Twitter and I'll be glad to give you the Tableau template to get started quickly!

Feel free to share your thoughts on this side project of mine in the comments section below.
