loading...

Thoughts on Data Structures

louy2 profile image Yufan Lou Updated on ・3 min read

From the 1913 Webster's Dictionary, "data" means "a fact [...]; that upon which an inference or an argument is based"; "structure" means "manner of building; form", and "arrangement of parts". A data structure is an arrangement, as well as manner of building such arrangement, of facts on which we can rely to make arguments, inferences, and ultimately decisions.

The essence of data structure is how best to organize facts to support our decision making.

This essence is present everywhere: arrangement of logical gates in a CPU; structure of magnetism in a hard disk; organization of files in a database system; assortment of modules in an API; management of relationships in a social network; etc.

The essence is not very specific to "computer" science. This article is also an arrangement of facts on which I rely to make arguments. So is every scientific paper. The difference is that other scientific disciplines rely mostly on mature structures, such as our natural languages, mathematical notations, papers and folders, lab notebooks, etc., so that they can focus on their object of study. For computer science, the object of study is the structures themselves. However, computer science is not the only discipline for such studies. Study of literature, logistics, linguistics, and logic, are all highly related.

The data structure that we should be most familiar with is our source code. For example, to print "Hello world!" to the standard output, we write the following Python source code:

print("Hello world!")

At the basic level, the code is a list of characters. We may write it in Python as following:

['p', 'r', 'i', 'n', 't', '(', '"', 'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '!', '"', ')']

It is obvious that this structure doesn't have much meaning to it. The basic unit of meaning in English is a word. Let's put the words together:

['print', '(', '"', 'Hello', ' ', 'world', '!', '"', ')']

Great! It's still a list, but a bit more meaningful. Next, the parenthesis and double quotes indicate structures, just like they do in English. We can use nested list to show their structures:

['print', ['(', ['"', 'Hello', ' ', 'world', '!', '"'], ')']]

Remember our goal: to print "Hello world!" to the standard output. "Hello world!" is associated with "print". To express the association, we can use dict in Python. dict associates a value with a key, just like a dictionary associates a definition with a word.

{'print': ['(', ['"', 'Hello', ' ', 'world', '!', '"'], ')']}

Next, since arguably the parenthesis and double quotes are associated with the words contained within, we can express parenthesis and double quotes in a more consistent way.

{'print': [
    {'()': [
        {'""': [
            'Hello', 
            ' ', 
            'world', 
            '!'
        ]}
    ]}
]}

Cool! Now we can go on to actually give this data structure a meaning. That is, write an interpreter to perform the action it is describing.

But why couldn't we perform the action at the beginning? In fact, you can! The problem is not in whether you can perform the action, but what else you can perform. For example, if we change the string we want to print:

print("Howdy python!")

Or if we change the function:

input("Howdy python!")

What data structure enables the computer to realize most quickly what is different and what is the same? Questions of such nature quickly piles on in the development of a complex software system.

For more thoughts along the same trajectory, I suggest you study The Structure and Interpretation of Computer Programs.


If you feel that you've got something from this post, I am glad! Please don't hesitate to comment or reach out.

If you feel that you've got enough that you'd like to donate, please donate to Wikimedia Foundation, to which I owe infinitely.

Discussion

pic
Editor guide