DEV Community

Cover image for 11% of Python files are not closed! Affecting Google, Python.org, Ansible, Tensorflow ...
Code Review Doctor
Code Review Doctor

Posted on

11% of Python files are not closed! Affecting Google, Python.org, Ansible, Tensorflow ...

It’s a problem that 11% of files are not closed properly as leaving files open degrades application performance and increases the risk of unexpected side effects.

It's not just a mistake that rookies make. Many of the offenders are big names:

Those are just 4 of the >500 examples we found in the 5409 open(…) calls we analyzed over 666 repositories.

How did we find this statistic? At CodeReview.doctor we run static analysis checks on codebases, and we collect stats on the public repos. We tracked every line we could find where developers opened files via open(...) but never closed them and compared that with every line we could find where developers "did it right” (either explicitly calling close() or implicitly via context manager usage).

At the end of this article is a link and embed of the stats.

The problem

To recap: leaving files open results in degradation of application performance and increases the risk of unexpected side effects.

Computers have finite resources, including limits on how many files can be simultaneously open. Therefore leaving files open consumes more finite file handles than necessary. Under extreme conditions this could exhaust the number of allowed open files causing an “IOError: Too many open files" exception to be raised. Additionally, leaving files open consumes finite memory for longer than is necessary. Therefore for performance of the application it is preferable to closing file after finished with them.

Python’s garbage collection will eventually close dangling unclosed files, but "eventually" should raise a red flag if the goal is to write predictable software. Indeed, Python3 and Cpython might implicitly close the file the instant they’re no longer being used, while others might not for a long time. Under some conditions it might not happen at all. When pairing this problem with the zen of Python's Explicit is better than implicit, it’s clear that it’s better to not leave files open.

Python can also be fuzzy around exactly when file changes are committed to disk: the data may not be written to disk until the file is closed:

# bad: maybe not written to disk yet as the file was not closed
open('foo.txt', 'w').write('foo')
# also bad: maybe not written to disk as the file was not closed
f = open('bar.txt', 'w')
f.write('bar')
# better: written to disk when the file closed
f = open('baz.txt', 'w')
f.write('baz')
f.close()
# good: written to disk when the file closed and code is neater
with open('qux.txt', 'w') as f:
    f.write('qux')
Enter fullscreen mode Exit fullscreen mode

Windows (the Operating System) locks files opened by Python. This could mean leaving files open is not thread safe as one thread will lock the file preventing other threads from opening it — even if the other thread just wants to read from it.
That’s why we give this advice:

Image description

The stats

Here’s a link to the 4812 examples of "doing it right": all the files were closed either by explicitly calling close() or by opening the file using the context manager approach.

Here's a link to the 597 examples of not “doing it right”: the files were opened and read from but never closed.

Check your repo

Check your repo for tech debt like this at codereview.doctor, or even review your GitHub PRs.

Top comments (0)