Python is a powerful language. However, its import system can be hard to understand, and not just for beginners.
Hopefully, this article will give you some hints.
Do you want to import something in Python? Use the import statement:
import os

try:
    os.mkdir("new")
except OSError:
    print("cannot create new dir")
Any module can access other modules like that. Seems fair enough, what is so complicated?
Before we answer that, what exactly is a module in Python? The idea is to reuse blocks of code, making the program more robust and maintainable (modularity).
However, not every file containing Python code with a .py extension is written to be a module. Sometimes, developers like to group several instructions into a .py file instead of typing them one by one in the interpreter. In this case, it's called a script.
When the code gets longer, it's better for maintenance to split it up, e.g., by moving function definitions into their own file. Otherwise, the code becomes harder to read. Other files can then import those definitions from that single, central file, which we call a module.
We also write modules because definitions are lost when you leave the Python interpreter.
A package is a group of modules (~ folder).
There are many different syntaxes for imports. You can import the entire module:
import os
You can import only one or several functions:
from os import mkdir
The apparent difference is that the second syntax allows for using mkdir directly, whereas the first syntax makes you write os.mkdir every time. Another big difference is that the second syntax does not bind the name os in your code but only mkdir. Python still loads the whole os module behind the scenes in both cases.
There's another standard syntax that involves aliases:
import super_mega_long_module_name as module
It is useful for shortening extra-long names and for telling apart two packages that define different functions with the same name.
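To make the difference between these forms concrete, here is a minimal sketch that lists which names each one binds. The helper function names and the alias col are made up for this demo; each import sits inside a function only so that locals() shows exactly what that one statement created.

```python
import sys


def plain_import():
    import os
    return sorted(locals())


def from_import():
    from os import mkdir
    return sorted(locals())


def aliased_import():
    import collections as col  # "col" is an arbitrary alias chosen for this demo
    return sorted(locals())


print(plain_import())    # ['os']
print(from_import())     # ['mkdir']
print(aliased_import())  # ['col']

# Even with "from os import mkdir", the whole os module is loaded and cached.
print("os" in sys.modules)  # True
```

In every case the module itself is loaded in full. The syntax only decides which name ends up in your namespace.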
In any case, it's important to note that:
- imports are case-sensitive
- Python runs the top-level code of imported modules and packages the first time they are imported
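The second point is easy to check. The sketch below writes a throwaway module (noisy_demo is a made-up name) into a temporary directory, imports it twice, and captures what it prints. It assumes nothing else on your path is called noisy_demo.

```python
import contextlib
import importlib
import io
import sys
import tempfile
from pathlib import Path

# Source of a hypothetical module with a visible side effect at import time.
MODULE_SOURCE = 'print("top-level code is running")\nVALUE = 42\n'

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "noisy_demo.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            first = importlib.import_module("noisy_demo")
            second = importlib.import_module("noisy_demo")  # served from the cache
    finally:
        sys.path.remove(tmp)

output_lines = buffer.getvalue().splitlines()
print(output_lines)     # ['top-level code is running']
print(first is second)  # True
```

The print runs once, on the first import. The second import returns the module object that is already loaded.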
I encourage following PEP 8 for your imports, especially if you are a beginner. It's Python's style guide, and it gives best practices.
You should never write something like the following:
from MODULE import *
You are pulling every public name defined in said MODULE into your own namespace, which can be a massive amount, and it's the "best" way to get naming collisions. You may get errors or weird behaviors that are difficult to debug.
Linters such as Pylint flag wildcard imports with a warning by default.
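Here is a small sketch of such a collision, using two standard library modules that both define sqrt. The dictionary named namespace is a stand-in for a module's global namespace (an assumption made to keep the demo from polluting your own globals); at the top of a real file, the two from ... import * lines would behave the same way.

```python
import cmath
import math

namespace = {}

# First wildcard import: sqrt comes from math.
exec("from math import *", namespace)
sqrt_after_math = namespace["sqrt"]

# Second wildcard import silently replaces sqrt (and pi, e, ...) with cmath's versions.
exec("from cmath import *", namespace)
sqrt_after_cmath = namespace["sqrt"]

print(sqrt_after_math is math.sqrt)    # True
print(sqrt_after_cmath is cmath.sqrt)  # True
print(sqrt_after_cmath(-1))            # 1j
```

Code that expected math.sqrt(-1) to raise ValueError now quietly gets a complex number instead, and nothing in the file tells you where sqrt came from.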
When you write an import statement, the Python interpreter looks first in the module cache, sys.modules:
a dictionary that maps module names to modules which have already been loaded
If it does not find anything there, it looks in the built-in modules (written in C and compiled into the interpreter), e.g., sys or, on some builds, math. If it still does not find anything, it searches sys.path, a list of directories that includes the directory of the script you are running, the entries of the PYTHONPATH environment variable, and the default locations of your installation.
When it finds something, it binds the name you use in your statement in the local scope, which allows you to use it and make aliases.
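You can inspect each of these steps yourself. The following sketch only uses the standard sys and importlib.util modules; json is picked as an arbitrary example of a module found through sys.path.

```python
import importlib.util
import json
import sys

# 1. The cache: an imported module is stored in sys.modules under its name.
print(sys.modules["json"] is json)  # True

# 2. Built-in modules are compiled into the interpreter itself.
print("sys" in sys.builtin_module_names)         # True
print(importlib.util.find_spec("sys").origin)    # built-in

# 3. Everything else is searched for in the directories listed in sys.path.
json_origin = importlib.util.find_spec("json").origin
print(json_origin)  # path of the file json was loaded from
for directory in sys.path:
    print(directory)
```

Printing sys.path is often the fastest way to understand why Python cannot find a module, or why it finds the wrong one.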
An absolute import looks like this:
from mypackage import mymodule
Here is a relative import:
from . import mymodule
The dot (".") refers to the current package, i.e., the directory containing the current file. Two dots ("..") refer to the parent package.
You have to use dots because you must be explicit when making a relative import. Python 3 does not allow implicit relative imports.
According to the PEP 8 guidelines, absolute imports are the best practice because they are better for readability, and you should use explicit relative imports only in particular cases:
when dealing with complex package layouts where using absolute imports would be unnecessarily verbose
It means that a long absolute import like the following is of little value:
from mypackage.mysubpackage.mysubsubpackage.mymodule import myfunction
Here, an explicit relative import seems legitimate.
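The sketch below builds a small throwaway package on disk to show both styles side by side. The package name relpkg_demo and everything inside it are hypothetical; relative imports only work inside a package, which is why the demo has to create one.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package layout, written to a temporary directory.
FILES = {
    "relpkg_demo/__init__.py": "",
    "relpkg_demo/helpers.py": "def greet():\n    return 'hello'\n",
    # One dot: the package that contains mymodule.py.
    "relpkg_demo/mymodule.py": (
        "from . import helpers\n"
        "from relpkg_demo import helpers as same_helpers\n"
    ),
    "relpkg_demo/sub/__init__.py": "",
    # Two dots: the parent package of relpkg_demo.sub.
    "relpkg_demo/sub/deep.py": "from ..helpers import greet\n",
}

with tempfile.TemporaryDirectory() as root:
    for relative_path, source in FILES.items():
        path = Path(root, relative_path)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(source)
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        mymodule = importlib.import_module("relpkg_demo.mymodule")
        deep = importlib.import_module("relpkg_demo.sub.deep")
    finally:
        sys.path.remove(root)

# The relative and the absolute import resolve to the very same module object.
print(mymodule.helpers is mymodule.same_helpers)  # True
print(deep.greet())                               # hello
```

Both spellings give the same result. The relative one is shorter in deep packages, while the absolute one tells the reader exactly where the code lives.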
Imports can turn nasty.
A circular import happens when A imports B and B imports A. You often get an error. You are better off refactoring your code than trying a hacky workaround.
There are other traps on the list, but I prefer to show how to debug them rather than list every case here. Let's do it!
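To see what the error looks like, here is a minimal reproduction. The two modules circ_a and circ_b are made up for the demo and written to a temporary directory; each one needs a name from the other at import time.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Two hypothetical modules that import each other.
FILES = {
    "circ_a.py": "from circ_b import b_value\na_value = 1\n",
    "circ_b.py": "from circ_a import a_value\nb_value = 2\n",
}

error_message = None
with tempfile.TemporaryDirectory() as root:
    for name, source in FILES.items():
        Path(root, name).write_text(source)
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        importlib.import_module("circ_a")
    except ImportError as exc:
        # circ_b asks for a_value before circ_a has finished running.
        error_message = str(exc)
    finally:
        sys.path.remove(root)

print(error_message)
```

One common refactoring is to move the shared names into a third module that both files import, so neither has to import the other.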
One of the best tools is the interpreter's "-v" option. It stands for "verbose," and it prints a trace of every module as it gets imported, which can save you a lot of time:
python3 -v mymodule.py
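The verbose output is long, so it helps to filter it. As a rough sketch, the snippet below runs a child interpreter with -v and keeps only the lines that mention one module (json is just an example here). It relies on the fact that the trace is written to standard error.

```python
import subprocess
import sys

# Run "python -v -c 'import json'" in a child process and capture its trace.
result = subprocess.run(
    [sys.executable, "-v", "-c", "import json"],
    capture_output=True,
    text=True,
    check=True,
)

# The import trace goes to stderr, not stdout.
json_lines = [line for line in result.stderr.splitlines() if "json" in line]
for line in json_lines[:5]:
    print(line)
```

Each line shows which file Python picked for the module, which is exactly what you need when the wrong copy of a package is being imported.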
However, there are more vicious bugs, which are tough to debug. In those cases, you don't have a lot of choices. A step-by-step debugging session is probably your only chance. To do that, use the built-in debugger, pdb:
import pdb; pdb.set_trace()
It's very useful for adding breakpoints and starting your investigation. Execution pauses at the line where you put the breakpoint, and Python evaluates any expression you type in that specific context. Since Python 3.7, the built-in breakpoint() function does the same thing by default.
We just saw it's better to structure your code with modules and packages when you make Python apps.
There's this file with a strange name, __init__.py, which you might have seen many times in blog posts and the documentation. This file can be either empty or full of code to initialize stuff.
What the heck is this?
Before Python 3.3, it was a mandatory file! If you removed it, Python would not load any submodules from the package anymore.
It's essential to note that Python loads this file first in a package. That's why developers use it to initialize stuff.
Python 3.3 introduced implicit namespace packages, so you can remove this file. Python 2 still requires it, though.
But wait. It only applies to empty __init__.py files. If you need some particular initialization, you may still need that file! One should be extra careful when migrating old code.
Thus, it's pretty wrong to say it's no longer needed. I would advise against using implicit namespace packages unless you are perfectly aware of what you are doing.
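The difference shows up as soon as you import both kinds of package. In this sketch, regular_pkg_demo and namespace_pkg_demo are hypothetical names created in a temporary directory: the first has an __init__.py with some initialization, the second has none.

```python
import importlib
import sys
import tempfile
from pathlib import Path

FILES = {
    # Regular package: __init__.py runs before any submodule is loaded.
    "regular_pkg_demo/__init__.py": "INITIALIZED = True\n",
    "regular_pkg_demo/mod.py": "from . import INITIALIZED\n",
    # No __init__.py: Python 3.3+ treats this folder as a namespace package.
    "namespace_pkg_demo/mod.py": "VALUE = 1\n",
}

with tempfile.TemporaryDirectory() as root:
    for relative_path, source in FILES.items():
        path = Path(root, relative_path)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(source)
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        regular_mod = importlib.import_module("regular_pkg_demo.mod")
        namespace_mod = importlib.import_module("namespace_pkg_demo.mod")
    finally:
        sys.path.remove(root)

regular_pkg = sys.modules["regular_pkg_demo"]
namespace_pkg = sys.modules["namespace_pkg_demo"]

print(regular_mod.INITIALIZED)                  # True
print(Path(regular_pkg.__file__).name)          # __init__.py
print(getattr(namespace_pkg, "__file__", None)) # None
print(hasattr(namespace_pkg, "INITIALIZED"))    # False
```

Both imports succeed, but only the regular package ran any initialization code. Deleting a non-empty __init__.py would silently drop that step.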
__name__ is a special variable that holds the name of the current Python module. With the following code:
if __name__ == "__main__":
    # some code here
    pass
you tell Python to execute the code only when the file is run as the main program, either directly (python mymodule.py) or with python -m. The code does not run if you import the file as a module.
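A quick way to convince yourself is to run the same file three ways. This sketch writes a made-up guard_demo.py into a temporary directory and launches child interpreters with subprocess, so that each run starts from a clean state.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical script protected by the __name__ guard.
SCRIPT = (
    "def main():\n"
    '    print("running as a script")\n'
    "\n"
    'if __name__ == "__main__":\n'
    "    main()\n"
)


def run(args, cwd):
    """Run a child Python interpreter and return what it printed."""
    done = subprocess.run(
        [sys.executable, *args], cwd=cwd, capture_output=True, text=True, check=True
    )
    return done.stdout.strip()


with tempfile.TemporaryDirectory() as root:
    Path(root, "guard_demo.py").write_text(SCRIPT)
    direct = run(["guard_demo.py"], root)          # python guard_demo.py
    as_module = run(["-m", "guard_demo"], root)    # python -m guard_demo
    imported = run(["-c", "import guard_demo; print(guard_demo.__name__)"], root)

print(direct)     # running as a script
print(as_module)  # running as a script
print(imported)   # guard_demo
```

When the file is imported, __name__ is the module's own name, so the guarded block is skipped and main() never runs.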
So Python imports are tricky, but at the same time, it's good practice to split code into reusable modules?
What do we do next?
First, remember that not all code needs files, modules, packages, and a complex folder hierarchy. A significant part of the work consists of small scripts and command lines.
When doing some data science stuff with Python, you do many operations with fancy software such as Jupyter. Python is just the language. The only thing you care about is the data results.
However, for an entire application, you probably need some layout. Let's try some typology.
In the simplest case, you are writing a script. Let's keep things at the same level (in the same folder):
myscript/
│
├── .gitignore
├── myscript.py
├── LICENSE
├── README.md
├── requirements.txt
├── setup.py
└── tests.py
The structure above is simple on purpose:
- setup.py is for dependencies and installation
- tests.py is for tests
- requirements.txt is for other developers who want to use our script. The Python package manager (pip) reads it to install the correct versions of the required Python libraries
- myscript.py is your code
A package is a collection of modules. This time, we will have subdirectories:
moduloo/
│
├── .gitignore
├── moduloo/
│   ├── __init__.py
│   ├── moduloo.py
│   └── utils.py
│
├── tests/
│   ├── moduloo_tests.py
│   └── utils_tests.py
│
├── LICENSE
├── README.md
├── requirements.txt
└── setup.py
N.B.: It's probably a good idea to add a docs directory to that structure, but we won't cover that here.
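To give an idea of how the pieces of the package layout fit together, the sketch below recreates a stripped-down version of it in a temporary directory and runs the tests from the project root. The contents of utils.py and utils_tests.py are invented for the example (the double function is hypothetical), and it assumes the standard unittest runner rather than any particular test framework.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical file contents for a cut-down version of the moduloo layout.
FILES = {
    "moduloo/__init__.py": "",
    "moduloo/utils.py": "def double(x):\n    return 2 * x\n",
    "tests/utils_tests.py": (
        "import unittest\n"
        "\n"
        "from moduloo.utils import double  # absolute import from the package\n"
        "\n"
        "\n"
        "class DoubleTest(unittest.TestCase):\n"
        "    def test_double(self):\n"
        "        self.assertEqual(double(21), 42)\n"
    ),
}

with tempfile.TemporaryDirectory() as root:
    for relative_path, source in FILES.items():
        path = Path(root, relative_path)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(source)
    # Run from the project root so that the moduloo package is importable.
    result = subprocess.run(
        [sys.executable, "-m", "unittest", "discover", "-s", "tests", "-p", "*_tests.py"],
        cwd=root,
        capture_output=True,
        text=True,
    )

print(result.returncode)  # 0 when all tests pass
print(result.stderr)      # unittest writes its report to stderr
```

Because the command runs from the top-level folder, the test file can use a plain absolute import of moduloo.utils, with no path tricks.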
Admittedly, those layouts are quite basic. If you need a more complex structure, e.g., for the web, I recommend looking at web frameworks such as Django.
I hope you have a better overview of Python imports. I strongly recommend following PEP 8, especially if you are a beginner.
Most of the time, you are better off using absolute imports than relative imports. Relative imports make sense only in a few cases, and even in those cases, they have to be explicit. Do not hesitate to use aliases along with your imports.
Keep your layout as simple as possible. It's better for both readability and maintenance.