Seth Michael Larson

Posted on Jun 19, 2017

Linting as Lightweight Defect Detection for Python

#python #lint #flake8 #bug

When many people think of linting they think about how it improves the readability and maintainability of code by forcing developers to stick with an agreed upon code style. This is indeed one of the major merits of having a 'linter' as a part of your build process, but it's not the only one!

To illustrate, here's an example scenario that is a simplified version of a real-life scenario I was faced with.

The Scenario

Imagine one of your colleagues submits you a review request. From the looks of things continuous integration passed all tests and the code has 100% coverage which are great first signs. The code they submitted includes a function that takes a list of names and processes every name except the first one. The function looks like this:

def process_all_names_except_the_first_one(names):
    if len(names) == 0:
        raise ValueError('There should be at least one name!')

    # Gets the first name as a list.
    name = names[:1]

    # Processes all the other names.
    others = [process_name(name) for name in names[1:]]

    return name + others

Not the most interesting function but it serves its purpose. It even uses a list comprehension, very Pythonic! But as you know from the title of this post, there's something wrong here. Some readers may even spot something that looks fishy! The list comprehension uses name as it's internal iterating variable which is defined outside of the list comprehension!

So you do the proper thing, you leave a comment on the line and explain that there is an issue here and mark the pull request as requiring some changes.

Shortly after your coworker calls you and asks you to come to their machine to show you that the variable doesn't leak out! They switch to their already open Python console and type the following example:

$ x = 1
$ l = [x for x in range(10)]
$ print(x)
1

To your amazement, it prints out 1 instead of 9! That settles it, the code works as planned and your colleague was correct. You go back to your desk and revise your review to be approved and your colleague submits their code.

A defect has now made its way into your code-base. If you also have continuous deployment it's being shipped to production as we speak. The customer complaints will begin rolling in shortly.

What Just Happened?

You must forgive me, for I have held back some information in order to illustrate this point. The project you're maintaining supports Python 2.7 and onwards and your colleague was using a Python 3.6 console. In Python 2.x, the function does not work correctly and the variable binds to names outside of the list comprehension. If you didn't know this about Python you are far from alone. I certainly didn't know until about a year ago.

This example is simple and contrived, but you could imagine receiving a large pull request along with lots of accompanying code making something small like this extremely easy to miss. This example also happens to be one where you can write a test that covers this function with 100% coverage and doesn't trigger the defect.

So this bug could have been missed while having a test suite, 100% code coverage, continuous integration, and code peer review. What was the missing piece?

Linting to the Rescue!

What could have saved us from shipping code with defects in this case? Of course it could be argued that your colleague should have written more tests, or that you should have reviewed the tests to make sure that they were rigorous enough. The problem with that thinking is that in this instance that would certainly be enough, but this example is basically the simplest instance of this defect occurring.

In some cases you will be modifying or reviewing legacy code where functions are hundreds of lines long such that you can't even see where the original variable was bound while reviewing. As we know from many projects there may not be an exhaustive test suite and possibly no tests at all! These are all realities of software projects. I think especially in these tougher cases linting has a massive impact on catching potential defects while investing minimal time.

The Python Lint Tool of Choice: Flake8

If you've never heard of Flake8, it is the tool that I would recommend for linting Python projects. As an alternative I've also used Pylint for past projects with success. You should check out them both, but for this example I'll be using Flake8.

For this example I'll assume you're somewhat aware of how Flake8 works, it reads and parses your Python source files and then compiles a list of violations and outputs them along with line and character numbers in order to pinpoint exactly where the fault is.

If you're not as familiar with Flake8 or want to do additional reading I've found their documentation to be quite good. The source is hosted on GitLab for anyone curious or wanting to raise an issue with the project.

To install Flake8, I would recommend pip. Install Flake8 with this command:

$ python -m pip install flake8

Configuring Flake8 to Catch Defects

One of the main complaints and push-backs I've heard about linting is that it would take too much effort to integrate into a large existing code-base. Especially one that was developed by a different team with either different code quality standards or none at all. Refactoring such a project would be a lot of effort for improvements that are harder to justify to management such as code quality.

In order to avoid this problem use Flake8 with a much more constrained configuration. I've created an example setup.cfg file which ignores all violations except those likely to cause defects.

[flake8]
ignore = E123,E126,E127,E128,E2,E3,E401,E50,W2,W3

Flake8 should be invoked like this:

$ flake8 [[SOURCE DIRECTORY NAME]]

And if we run Flake8 using Python 2.7 over the old function that we saw earlier we'll receive the following violation:

example.py:6:33: F812 list comprehension redefines `name` from line 5

If Flake8 had been a part of our build process this defect would have been caught quickly and fixed much quicker than having to rely on reviewers knowledge of subtle differences between Python 2.x and 3.x.

Programmers and reviewers alike can feel more confident in the quality and correctness of code. That's a win for everyone.

Configuring Flake8 to Work for You

The list of ignored violations was collected from the Flake8 documentation as well as the pycodestyle documentation. If you find this list of ignored restrictions not exactly to your liking you can browse these documents and either add or remove violations from your Flake8 configuration.

In the case that a violation is required in your code-base, you can also use the following comment in your code to suppress that violation in one place:

imagine_this_is_code_with_a_violation # noqa: F401,E3

In this case, that single line would not be checked for violations F401 or any violation starting with E3

You Didn't Ignore All Whitespace Violations? Aren't those style?

It is true that almost all whitespace violations can be considered stylistic and unlikely to cause defects. This is not true of all whitespace violations.

For example, mixing tabs and spaces. This is technically allowed in Python 2 but can cause a spell of problems for editors that have configurable display spacings for tabulation characters. Here's an example:

# This is how the function would display on your
# editor with 8 character tabs...
def function(a, b, c, d):
    if a:
        if b:
            d += 1
            c += 1
    return c * d

# ...But this is how Python will see it.
def function(a, b, c, d):
    if a:
        if b:
            d += 1
        c += 1 # <--- There's a rogue tab character on this line.
    return c * d

Another example is putting multiple statements on one line which can cause readers to misinterpret the flow of a function when not reading closely.

def function(a, b, c):

    for a in b:

    # ... Tons of code ...

    # It's tough to see that this is even an `if`
    # statement without syntax highlighting!
    if c: a += 1

    # ... Tons of code ...

If the Project Supports 2.x and 3.x, Run Flake8 using Both Versions

Because it's such a fast tool to run I would suggest running Flake8 from within all Python versions that the project supports. This will guarantee that defects like the one above are caught regardless of Python version.

If you're looking for a tool to assist in doing this I suggest taking a look at tox.

Summary

Linting isn't just for style checking, it's also a powerful tool for finding defects. You don't have to use linting as a style checker to receive these benefits by configuring the tool to only detect violations which are likely to be the source of defects now or in the future. Using this guide may help cut down time on integrating with existing code-bases.

Many thanks to the PyCQA for developing Flake8.

Follow me on Twitter @pythoasis or check out my GitHub @SethMichaelLarson.

Top comments (2)

Joel Bennett • Jun 19 '17

As someone who has inherited an... interesting... code base, I totally agree that flake8 is great. Another thing that we've been slowly trying to do as a team is clean up things as we go. E.g.: with every pull request, run and correct any flake8 warnings on functions that you touch. It's been slowly getting better.

Seth Michael Larson • Jun 19 '17

Completely agree! I've done something exactly like how you describe. That method works great, especially for smaller teams! :)

The trouble is when it's a part of the build process it checks every file instead of just the areas around where you make changes. This is one of the nice things about having things like allow_failure in travis.yml along with similar options in other Continuous Integration services.