Python Static Analysis tools

#python #codequality #codereview

In Stack Overflow's 2020 developer survey, developers worldwide ranked Python as the third most loved programming language and the most wanted one. With that kind of popularity, it's no wonder there are so many static analysis tools for it. But how do you choose the best static code analysis tool among them? In this blog, I'll share the what, why, and how of static code analysis for your Python code.

What is Static code analysis?

Static code analysis is the process of analyzing a computer program to find problems in it without actually executing it. Generally, static analysis is performed on the source code of the program with tools that convert the program into an abstract syntax tree (AST) to understand the code's structure and then find problems in it.
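As a rough illustration of the idea, here is a minimal sketch that uses Python's standard ast module to parse a snippet without running it and flag bare except clauses. The find_bare_excepts helper and the sample source are made up for this example; real tools do far more than this.

```python
import ast

# Sample source to analyze. It is only parsed, never executed.
SOURCE = '''
def load(path):
    try:
        return open(path).read()
    except:
        return None
'''


def find_bare_excepts(source: str) -> list:
    """Return the line numbers of bare `except:` clauses in the source."""
    tree = ast.parse(source)  # build the AST from the text
    return [
        node.lineno
        for node in ast.walk(tree)
        # A bare `except:` is an ExceptHandler with no exception type.
        if isinstance(node, ast.ExceptHandler) and node.type is None
    ]


print(find_bare_excepts(SOURCE))  # prints [5]
```

The file that load() would open is never touched, because the function is only inspected as a tree of nodes.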

Why should I use a static analysis tool?

Static code analysis can help identify anti-patterns in the code and detect possible code quality and security issues. It lets you find and fix issues early in development, reducing the chance that they surface later in production. The types of static analysis done by these tools include:

Code styling analysis
Security linting
Error detection
UML diagram creation
Duplicate code detection
Complexity analysis
Comment styling analysis
Unused code detection

Benefits of Static Code Analysis

Static code analysis is not 100% accurate and sometimes returns false positives or false negatives. However, it has numerous benefits, including:

Relative accuracy - catches many more errors than manual analysis
Speed of discovering errors
Comprehensiveness of testing
Lower risk of high-impact errors after software release
Ability to uncover errors that aren't usually detected during dynamic testing

Top static analysis tools for Python

Let's take a look at the tools that exist in the Python ecosystem for static code analysis:

Pylint

pylint is a static code analysis tool that reports errors likely to surface when the Python code runs, helps enforce a coding standard, looks for code smells, and offers simple refactoring suggestions along with feedback on code complexity.

pylint has been around for 13 years, and it is still actively maintained. Though it is pedantic out of the box, it is fully customizable through a .pylintrc file, where you can enable or disable the errors and conventions relevant to you.

Example:

Here is a program with some styling issues:

sample.py

a = 23
b = 45
c = a + b

print(c)

After running pylint, you'll get the following output, which lists multiple styling issues in the program.


% pylint sample.py  
************* Module sample
sample.py:5:0: C0304: Final newline missing (missing-final-newline)
sample.py:1:0: C0114: Missing module docstring (missing-module-docstring)
sample.py:1:0: C0103: Constant name "a" doesn't conform to UPPER_CASE naming style (invalid-name)
sample.py:2:0: C0103: Constant name "b" doesn't conform to UPPER_CASE naming style (invalid-name)
sample.py:3:0: C0103: Constant name "c" doesn't conform to UPPER_CASE naming style (invalid-name)
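One possible way to address these messages, assuming the three values really are meant to be module-level constants, is to add a module docstring, rename the variables to UPPER_CASE, and end the file with a newline. The names FIRST, SECOND, and TOTAL are my own choice for this sketch.

```python
"""Add two numbers and print the result."""

# Module-level names are treated as constants by pylint's default
# naming rules, so they are written in UPPER_CASE here.
FIRST = 23
SECOND = 45
TOTAL = FIRST + SECOND

print(TOTAL)  # prints 68
```

If the lowercase names are intentional, the alternative is to relax the naming check in your pylint configuration rather than rename the variables.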

Pyflakes

pyflakes is a verification tool for Python source code. It doesn't check style at all; it only looks for logical errors. It aims to emit very few false positives, and it will not complain about missing docstrings or argument names that don't match a naming style.

pyflakes is faster than pylint because it examines the AST of each file individually and checks for only a limited set of errors.
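To give a feel for this kind of single-file AST check, here is a simplified sketch that reports imports that are never referenced. It is not how pyflakes is implemented: the unused_imports helper is hypothetical, and it ignores scopes, __all__, and other cases a real tool handles.

```python
import ast


def unused_imports(source: str) -> list:
    """Return names that are imported but never referenced in the source."""
    tree = ast.parse(source)
    imported = set()
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # `import os.path` binds the name `os`;
                # `import numpy as np` binds the name `np`.
                imported.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return sorted(imported - used)


print(unused_imports("import os\nimport sys\nprint(sys.argv)\n"))  # prints ['os']
```

Only one file is parsed and nothing is imported or executed, which is part of why checks like this stay fast.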

You can install pyflakes with

$ pip install --upgrade pyflakes

As I mentioned before, pyflakes doesn't do any stylistic checks. If you want them, you can use another tool called Flake8, which combines pyflakes with PEP 8 style checks. Flake8 also lets you add configuration options for each project.

Mypy

mypy is slightly different from pylint and pyflakes: it is a static type checker for Python. It relies on your code being annotated with Python 3 type annotation syntax (PEP 484) in order to type-check it and detect common bugs. The purpose of mypy is to combine the advantages of dynamic and static typing (using the typing module).

From Python...

def fib(n):
    a, b = 0, 1
    while a < n:
        yield a
        a, b = b, a+b

...to statically typed Python

from typing import Iterator

def fib(n: int) -> Iterator[int]:
    a, b = 0, 1
    while a < n:
        yield a
        a, b = b, a+b

Type declarations act as machine-checked documentation, and static typing makes your code clearer and easier to modify without introducing errors.
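To make that concrete, here is a small self-contained sketch built around the annotated fib function. The fib_list helper is my own addition for illustration. The script runs as normal Python, and the commented-out call at the end is the kind of mistake a type checker such as mypy is designed to report before the code ever runs.

```python
from typing import Iterator, List


def fib(n: int) -> Iterator[int]:
    """Yield the Fibonacci numbers smaller than n."""
    a, b = 0, 1
    while a < n:
        yield a
        a, b = b, a + b


def fib_list(n: int) -> List[int]:
    """Collect the generated numbers into a list."""
    return list(fib(n))


print(fib_list(10))  # prints [0, 1, 1, 2, 3, 5, 8]

# Passing a str where the annotation asks for an int is a type error
# that a checker should flag without executing anything:
# fib_list("10")
```

Without annotations, the same mistake would only show up at runtime, when the comparison between an int and a str fails.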

Prospector

Prospector is a powerful static analysis tool for Python code. It displays information about errors, potential problems, convention violations, and complexity. It brings together the functionality of other analysis tools such as:

Pylint - Code quality/Error detection/Duplicate code detection
pep8.py - PEP 8 code quality
pep257.py - PEP 257 docstring quality
pyflakes - Error detection
mccabe - Cyclomatic Complexity Analyser
dodgy - secrets leak detection
pyroma - setup.py validator
vulture - unused code detection

Prospector has a number of settings to suppress picky warnings from pylint, pep8 or pyflakes and provide only what is important.

Bandit

Bandit is a tool developed to find common security issues in Python code. To do this, it processes every file, builds an AST from it, and runs the appropriate plugins against the AST nodes. Once it has finished scanning all of the files, it generates a report. It can look for hardcoded passwords, unsafe pickle serialization/deserialization, shell injections, and SQL injections.
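The "build an AST, then run plugins against matching nodes" idea can be sketched in a few lines with the standard ast module. This is not Bandit's actual plugin API: the PLUGINS registry, the two check functions, and the scan helper are all hypothetical names invented for this example.

```python
import ast


def check_hardcoded_password(node):
    """Flag assignments such as `password = "secret"`."""
    if isinstance(node.value, ast.Constant) and isinstance(node.value.value, str):
        for target in node.targets:
            if isinstance(target, ast.Name) and "password" in target.id.lower():
                return "possible hardcoded password"
    return None


def check_shell_true(node):
    """Flag calls that pass shell=True, e.g. subprocess.run(cmd, shell=True)."""
    for keyword in node.keywords:
        if (
            keyword.arg == "shell"
            and isinstance(keyword.value, ast.Constant)
            and keyword.value.value is True
        ):
            return "call with shell=True"
    return None


# Hypothetical registry: AST node type -> checks to run on nodes of that type.
PLUGINS = {
    ast.Assign: [check_hardcoded_password],
    ast.Call: [check_shell_true],
}


def scan(source: str) -> list:
    """Return sorted (line number, message) pairs for every finding."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        for check in PLUGINS.get(type(node), []):
            message = check(node)
            if message:
                findings.append((node.lineno, message))
    return sorted(findings)


SAMPLE = 'import subprocess\ndb_password = "hunter2"\nsubprocess.run("ls", shell=True)\n'
for lineno, message in scan(SAMPLE):
    print(f"line {lineno}: {message}")
```

Each check only sees the node types it registered for, so adding a new rule means adding one function and one registry entry.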

Automated Static Code Analysis

You can also automate static code analysis, which helps build a culture of writing quality code. Automation saves a lot of time and helps catch issues that might otherwise go undetected.
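A very small version of such automation is a script that runs several checkers in a row and reports whether all of them passed. The sketch below assumes each checker signals problems through a non-zero exit code. The run_checks and all_passed helpers are hypothetical, and the demo uses the Python interpreter itself as a stand-in command so that it runs even when no linters are installed.

```python
import subprocess
import sys


def run_checks(commands: dict) -> dict:
    """Run each command and return a mapping of label -> exit code.

    A command whose executable is not installed is recorded as None.
    """
    results = {}
    for label, command in commands.items():
        try:
            completed = subprocess.run(command, capture_output=True, text=True)
            results[label] = completed.returncode
        except FileNotFoundError:
            results[label] = None
    return results


def all_passed(results: dict) -> bool:
    """True only if every command ran and exited with code 0."""
    return all(code == 0 for code in results.values())


# Stand-in commands for the demo: one that succeeds and one that fails.
demo = run_checks({
    "passes": [sys.executable, "-c", "pass"],
    "fails": [sys.executable, "-c", "raise SystemExit(3)"],
})
print(demo, all_passed(demo))
```

In a real project, you would replace the stand-in commands with entries such as ["pylint", "sample.py"] and exit with a non-zero status when all_passed returns False, so that a commit hook or CI job fails.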

Features & Capabilities

Most of the automated static code analysis tools offer the following features: