Why static typing matters

#types #staticanalysis #scalable

Introduction

Python's popularity has skyrocketed in recent years, thanks to growing interest in data science as a discipline for analyzing and processing large volumes of data.
Thanks to its open-source status, vibrant community and batteries-included standard library, a rich ecosystem grew around it (scientific IDEs like Spyder, libraries like numpy and pandas, among others) and pushed Python to the top of the popularity charts.
And while numeric types combined with matrices and plotting utilities are enough for data processing, Python, as a dynamically typed language, has some shortcomings when it comes to ordinary product-based web development.

Dynamic and static type systems

There are different ways languages go about using type information.

Statically typed languages check types at compile time. Some, like Haskell, check every expression and can infer most types without annotations, although type signatures are conventionally written for top-level functions. Others, like Java and C/C++, traditionally require types to be declared explicitly. Even there, the compiler can often infer them from context: var x = 5; is valid in Java 10 and later and declares x as an int, because the type is deduced at compile time. Either way, the types are checked when the code is compiled.

Other languages, like Python, are dynamically typed: types don't need to be declared, and no type checks are performed before the code runs, which means that this is valid Python:

a = 10  # int
a = ["a", "b"]  # same name, now a list of strings

What this means is that there are no guarantees before runtime regarding the types of the values you're operating on. This becomes very hard to manage in large codebases filled with legacy code and lots of modules, functions and classes. With no static type checking, you're basically on your own.
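A minimal sketch of what "no guarantees before runtime" looks like in practice. The function name total_price and the sample data are made up for illustration; the point is only that the mistake goes unnoticed until the offending line actually executes.

```python
def total_price(prices):
    # Nothing here says what `prices` must contain; we assume numbers.
    total = 0
    for price in prices:
        total += price
    return total


print(total_price([10, 20, 30]))  # 60

try:
    # One string slipped into the list. Python only notices when the
    # addition on that element is actually executed.
    total_price([10, "20", 30])
except TypeError as error:
    print("failed at runtime:", error)
```

In a large codebase the bad value may be produced far away from where the error finally surfaces.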
It also means, in languages like Python, that a variable itself declares nothing about the type of value it holds. It's like a cardboard box with a tag attached to it: the identifier is what's written on the tag, but it provides no information about the contents of the box and, as an important consequence, no information about what we can do with what's inside. The value does carry its type at runtime, but nothing in the source code states it, so it's up to the programmer to keep track of it and write code that only uses the operations that type supports. For example, the plus sign concatenates strings and adds integers; which one happens is decided by the types of the values, not by anything written next to the variable.
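The box analogy can be sketched in a few lines. The helper name double is hypothetical; what it shows is that the name carries no type, while the value it currently points to decides what the plus sign does, and that this is only discovered at runtime.

```python
def double(value):
    # `+` means whatever the runtime type of `value` says it means.
    return value + value


print(double(21))      # 42
print(double("ab"))    # abab
print(double([1, 2]))  # [1, 2, 1, 2]

try:
    # Mixing incompatible types is rejected, but only when this line runs.
    1 + "two"
except TypeError as error:
    print("rejected at runtime:", error)
```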

Deliberately dropping this information makes the language more versatile and compact, so you can write code faster and express more behaviour with less of it, but at a very high cognitive cost, since no types are available to help you understand the code. And this brings us to the cornerstone of this article: code should be understandable, explicit and predictable to other developers. Types are free documentation that increases understanding and improves developer productivity.
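As a small illustration of "types as documentation", compare two versions of the same function. The names find_email, find_email_typed and the sample data are invented for this sketch; the annotations use Python's standard typing module.

```python
from typing import Dict, Optional


def find_email(users, name):
    # Untyped: is `users` a list or a dict? What comes back when
    # nothing matches? The reader has to dig through the callers.
    return users.get(name)


def find_email_typed(users: Dict[str, str], name: str) -> Optional[str]:
    # Typed: a mapping from user name to email, and the signature says
    # the result may be None when the name is unknown.
    return users.get(name)


emails = {"ana": "ana@example.com"}
print(find_email_typed(emails, "ana"))
print(find_email_typed(emails, "bob"))
```

Both functions behave identically at runtime; the second one simply answers the reader's questions in its signature.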

Use types

Since types provide such a useful framework of intrinsic code documentation, it's good practice to use languages that give you the benefit of static analysis and type checking. This is crucial for many features that developers take for granted in their IDEs and tool sets. IntelliJ, for example, can suggest refactoring candidates, simplify code and expressions, point out duplicated code, inline variables and extract constants from literals, because it has type information available while you are writing the code.

Types allow IDEs and other tools to perform what's known as static code analysis: thanks to knowledge of types, a tool such as a daemon running in your development environment can analyze your code without executing it and immediately point out how to improve it.
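Here is a rough sketch of the kind of mistake static analysis is meant to catch. It assumes an external checker such as mypy is run over the file; the function names are hypothetical. Running the script normally shows no problem, because the faulty code path is never executed.

```python
def greet(name: str) -> str:
    return "Hello, " + name


def rarely_used_path() -> str:
    # A static checker such as mypy is expected to flag this call without
    # running anything: an int is passed where a str is declared.
    return greet(42)


# Only the correct call runs here, so the script finishes without errors.
# The bug in rarely_used_path() stays hidden until someone calls it.
print(greet("world"))
```

The annotations give the tool enough information to report the bad call before the code ever reaches production.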

Case-study: how Dropbox improved their Python codebase

Dropbox has written about how they developed a tool for performing type checking on a dynamically typed language (specifically Python) and how, by staying in touch with developers, they ended up with a tool that is now widely used and brings all the benefits of types discussed above to Python.
Their main argument is that code at scale is about ensuring that developers can understand the current codebase quickly, so they can get up to speed and become productive. They managed to combine the flexibility and development speed of a dynamically typed language with the power and reassurance of types and type checking. New developers find a versatile, well-documented codebase that is easier to maintain and work with, and both developers and the company can extract much more value out of it. Win-win.
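The combination of flexibility and types can be sketched as gradual adoption: annotations are optional, so old code keeps running while new or critical code gets typed. This is only an illustration of the idea with made-up function names, not Dropbox's actual code or process.

```python
from typing import List


def legacy_average(values):
    # Unannotated legacy code: it still runs exactly as before.
    return sum(values) / len(values)


def average(values: List[float]) -> float:
    # Newly annotated code: a static checker can now verify its callers.
    return sum(values) / len(values)


print(legacy_average([1, 2, 3]))
print(average([1.0, 2.0, 3.0]))

# The annotations are stored on the function and are not enforced by the
# interpreter; checking them is the job of an external tool.
print(legacy_average.__annotations__)
print(average.__annotations__)
```

Because both styles coexist in one file, a team can annotate a large codebase piece by piece instead of all at once.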

Conclusions

We saw that type checking is an important tool, with benefits on many different fronts. Even companies that built on dynamically typed languages realized that, at scale, types become an intrinsic part of the business, because the code is the business, and harnessing the gains of types becomes vital to staying efficient.
Multiple companies and individuals have since adopted this tool. I hope more people will come to like, understand and love types and their power!

Top comments (9)

Eljay-Adobe • Sep 14 '19

As of Python 3.5, optional type annotations were introduced. For small projects, of small value. For large projects, invaluable.

Much as TypeScript brings statically checked types to JavaScript. Huge win for big projects.

Bruno Oliveira • Sep 14 '19

Yes!! Exactly! I wasn't aware of this, since the last version I used was 3.0

Delyan Iliev • Sep 14 '19

Doesn't that mean you should have done better research on the topic before writing this article?

Bruno Oliveira • Sep 14 '19

Not really, since the point of the article was to talk about types and why it's important to have them.
The Dropbox article mentioned there is from September 5. Seems quite recent to me :)
And it was more of a discussion about why it's important to use them and how a company like Dropbox did it too... It wasn't really about Python versions, and if anything, it just makes the article and comparisons more accurate

Rasmus Schultz • Sep 15 '19

Interesting 🙂

Note that this isn't a new language feature per se, though - technically, this is new syntax for annotations: it isn't checked by the compiler when the code is loaded, and the types aren't checked by the interpreter at run-time either. An external tool is required to perform the static analysis offline.

I suppose these things would be possible in the future though. Possibly even in userland? (I don't know Python all that well myself.)

Either way, just adding the syntax is a major step up - it's real incentive to proof and harden your code, and start using an offline inspection tool. 👍

akildemir • Sep 15 '19

Don't get me wrong, but I really don't understand the benefit of not having static type checking in a programming language (PHP, JS, Python, etc.). It always produces problems and stupid bugs. And then tools like TypeScript start to pop up. Why not have static type checking in the first place? I don't think having to write the type of a variable can be the reason. You've got to have explicit type checking in a language so that the developer knows what he/she is doing.

Tova Kovalt • Sep 15 '19

Sorry to say, but this article is a misleading piece of shallow knowledge. Python cares about types, carries type information at runtime and performs type checking when executing statements. Python is a strongly typed language.
And this is not about interpreter versions. It has been built that way from the very beginning.

Bruno Oliveira • Sep 15 '19

Sure, I agree. I mean, you can't just do 1+"two" and hope that no type inspection happens or that no error is raised. But this happens only when the code is executed, while with mypy it would be checked statically, before the code is even run. Which was the point I was trying to make.
Strongly typed doesn't necessarily mean statically typed.