Discussion on: In programming, is it better to have many small files or one large file?


Basically, if you need (almost) all data from a file, keep it in one file. If you need just a small subset at a time and you can explicitly request a specific subset, keep subsets in separate files.

Let's say you have to create an address book for a small company. The company has a few offices and you will want to show all of them at once, so it makes sense to keep the addresses in one file. But what if you were to create an address book of all companies in your country?

You would probably need to find a way to group addresses and keep them in separate files. Otherwise, it would be difficult to send so much data to a client or even open it on a server.

One solution would be to group addresses alphabetically. One file would contain companies starting with 'a', another starting with 'b' and so on. Then, your address book could allow a user to pick the first letter and reduce the amount of data loaded unnecessarily. You could further reduce file size by grouping addresses by the first two letters: 'aa', 'ab', 'ac'... Then, you could easily create a search on your site that would work when a user enters the first two letters.
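To make the grouping idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function names (`shard_key`, `write_shards`, `load_shard`), the JSON file format, and the record shape with a `name` field. It also assumes company names start with characters that are safe in file names.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def shard_key(name: str, prefix_len: int = 1) -> str:
    """Return the lowercase prefix that decides which file a company goes in."""
    return name.strip().lower()[:prefix_len]


def write_shards(companies, out_dir: Path, prefix_len: int = 1) -> None:
    """Group companies by name prefix and write one JSON file per group."""
    groups = defaultdict(list)
    for company in companies:
        groups[shard_key(company["name"], prefix_len)].append(company)
    out_dir.mkdir(parents=True, exist_ok=True)
    for prefix, members in groups.items():
        # Hypothetical layout: a.json, b.json, ... (or aa.json, ab.json, ...)
        (out_dir / f"{prefix}.json").write_text(json.dumps(members))


def load_shard(out_dir: Path, query: str, prefix_len: int = 1):
    """Load only the file matching the query's prefix; empty list if none exists."""
    path = out_dir / f"{shard_key(query, prefix_len)}.json"
    if not path.exists():
        return []
    return json.loads(path.read_text())


if __name__ == "__main__":
    # Made-up sample records, only to show the calls.
    sample = [
        {"name": "Acme", "address": "1 Main St"},
        {"name": "Apex", "address": "2 Oak Ave"},
        {"name": "Bolt", "address": "3 Elm Rd"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        write_shards(sample, Path(tmp))
        print(load_shard(Path(tmp), "ap"))  # reads a.json only
```

Passing `prefix_len=2` gives the two-letter grouping. The trade-off is more, smaller files, and a user has to type at least two letters before anything can be loaded.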

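This is pretty much how databases work with indexes and partitioning: an index lets you look up records by a key without reading everything, and partitioning splits one big data set into smaller pieces that can be read separately.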

Sometimes, even if you need to send all the data to a user, you might want to keep it in separate files. Back in the days of floppy disks, a game would consist of 19 archive files (rar/zip), each under 1.44 MB so that it fit on a single disk. This is still a relevant approach in the days of the internet, as you can deliver "just enough" data to a user more quickly and load subsequent packages while the user is already using the first one.
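The floppy-style splitting can be sketched in a few lines of Python. This is only an illustration: `split_into_chunks` and `FLOPPY_BYTES` are names made up for this example, and real archivers (rar/zip) add headers and compression on top of plain slicing.

```python
from typing import List

# Formatted capacity of a "1.44 MB" floppy: 1440 KiB = 1,474,560 bytes.
FLOPPY_BYTES = 1440 * 1024


def split_into_chunks(data: bytes, chunk_size: int = FLOPPY_BYTES) -> List[bytes]:
    """Slice data into consecutive pieces of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def join_chunks(chunks: List[bytes]) -> bytes:
    """Reassemble the original data from chunks kept in order."""
    return b"".join(chunks)
```

On the web, the same idea means the client can start working with the first chunk as soon as it arrives and fetch the rest in the background, instead of waiting for one big download.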