Python Package Management (4 Part Series)
There are a lot of decisions to be made with any new Python project. Developers put a lot of effort into picking the right framework with the right plugins for whatever job needs doing. One less glamorous decision that I think often gets overlooked is what I call the "dependency stack": the set of tools that cover the tasks centered around how you manage your dependencies in development and deployment. This post will define what the dependency stack is, what it's used for, and the things you should consider when selecting the tools you'll use. In a series of follow-up posts, I'll talk about the pros and cons of some of the options out there.
| Vanilla | The standard choice | Hardest to use |
The most basic task nearly every Python project needs to do is declare dependencies (also called requirements). In simple terms, this is listing all of the libraries that need to be installed to work on or use your package. Each dependency will often be constrained, limiting the versions that are known to work with your code. If you're creating an application to be deployed in some production environment, your dependency list will also serve as a manifest for what that application includes, making it easier to check for known security vulnerabilities in whatever third-party libraries you're making use of. As projects get more complex, you might end up with multiple lists of requirements, defining some that are only required for developers, or some which provide extra functionality to your project but aren't strictly required. The "tool" you'll use for declaring dependencies is actually a file, and there are several formats to choose from depending on the stack you're using.
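To make the idea of constrained dependencies concrete, here is a minimal sketch that splits a requirement line into a package name and its version constraints. The regular expression is a deliberate simplification of the real requirement syntax (no extras, markers, or URLs), and the package lists are hypothetical examples, not a recommendation.

```python
from __future__ import annotations

import re

# Simplified pattern: a package name followed by optional version constraints.
# Real requirement strings (PEP 508) also allow extras, markers and URLs,
# which this sketch deliberately ignores.
_REQUIREMENT = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*)$")


def parse_requirement(line: str) -> tuple[str, list[str]]:
    """Split a line like 'requests>=2.0,<3' into a name and its constraints."""
    match = _REQUIREMENT.match(line)
    if match is None:
        raise ValueError(f"not a simple requirement: {line!r}")
    name, rest = match.groups()
    constraints = [part.strip() for part in rest.split(",") if part.strip()]
    return name, constraints


# Hypothetical dependency lists: one for everyone, one only for developers.
runtime = ["requests>=2.0,<3", "click"]
dev_only = ["pytest>=7"]

for line in runtime + dev_only:
    print(parse_requirement(line))
```

Keeping the developer-only list separate from the runtime list mirrors the "multiple lists of requirements" idea: the same parsing applies to both, but only one of them needs to ship with your package.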
A very common development pattern with Python is to create a separate "virtual environment" for each project. This makes it very easy to have different versions of dependencies (and different versions of Python!) for different projects. Having isolated dependencies is extremely important for reproducible results. A lot of language ecosystems install dependencies locally by default (Node.js with npm's node_modules, for example). With Python, you need to select a tool to manage your environments, and this is part of the dependency stack.
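As a rough illustration of what "creating a virtual environment" means, the sketch below uses the standard library's `venv` module to build one inside a throwaway folder. The `.venv` folder name and the `create_environment` helper are my own assumptions for the example; the tools covered later in this series each have their own way of doing this.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(project_dir: Path) -> Path:
    """Create a virtual environment in '<project_dir>/.venv' and return its path."""
    env_dir = project_dir / ".venv"
    # with_pip=False keeps the sketch fast and offline; a real setup would
    # normally want pip available inside the environment.
    venv.EnvBuilder(with_pip=False).create(env_dir)
    return env_dir


# Use a throwaway directory as a stand-in for a project folder.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = create_environment(Path(tmp))
    # Every venv gets a pyvenv.cfg file recording the interpreter it was built from.
    config_text = (env_dir / "pyvenv.cfg").read_text()

print(config_text)
```

Each project folder gets its own environment this way, so installing a library for one project doesn't touch any other.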
Once you've defined your dependencies and have a place to put them, you'll need to use a tool to actually install them. Ideally this tool will produce dependable results so that when the developer next to you installs from the same requirements file, you'll both have matching setups (reducing "it works on my machine" syndrome). You'll also want this tool to be able to install from a variety of sources. Some examples might be the public PyPI repository, a private PyPI repo, a Git repo, or a local folder (say, for git submodules). Private PyPI repos will often require authentication, so this installation tool must be able to handle authentication and ideally be able to store your credentials somewhere so you don't have to copy them in from your password manager every time you update.
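One way to see whether two setups actually match is to compare the versions installed in an environment against a list of exact pins. The sketch below does that with the standard library's `importlib.metadata`. The `find_mismatches` helper and the pinned versions are hypothetical, and the comparison is a plain string match, so "1.0" and "1.0.0" would count as different.

```python
from __future__ import annotations

from importlib import metadata


def find_mismatches(pins: dict[str, str]) -> dict[str, str]:
    """Compare pinned versions against what is installed in this environment.

    Returns a mapping of package name to a short description of the problem;
    an empty mapping means the environment matches the pins.
    """
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = f"not installed (wanted {wanted})"
            continue
        if installed != wanted:
            problems[name] = f"installed {installed}, wanted {wanted}"
    return problems


# Hypothetical pins; in practice these would come from a lock or requirements file.
pins = {"requests": "2.31.0", "definitely-not-a-real-package": "1.0"}
for name, problem in find_mismatches(pins).items():
    print(f"{name}: {problem}")
```

A good installation tool makes a check like this unnecessary, because installing from the same file gives everyone the same versions in the first place.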
While there are plenty of use cases for simple source distribution of a Python project, you'll often want the ability to bundle up your code into another format. In particular, if you plan on publishing a library, you'll want to be able to build a wheel. Your tool of choice will have to be able to include a bunch of metadata about your project (like a version number and all of its dependencies). The simpler this process is for you, the better. Developers should get to focus on development.
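To give a feel for the metadata a build tool has to record, here is a small sketch that renders a few fields in the "Key: value" style found in a wheel's METADATA file. The `ProjectMetadata` class and the example project are hypothetical, and real build tools write many more fields than the four shown here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ProjectMetadata:
    """A tiny, hypothetical subset of the metadata a build tool records."""

    name: str
    version: str
    requires: list[str] = field(default_factory=list)


def render_core_metadata(project: ProjectMetadata) -> str:
    """Render the fields in the 'Key: value' style used by a wheel's METADATA file."""
    lines = [
        "Metadata-Version: 2.1",
        f"Name: {project.name}",
        f"Version: {project.version}",
    ]
    # Each dependency becomes its own Requires-Dist line.
    lines += [f"Requires-Dist: {req}" for req in project.requires]
    return "\n".join(lines) + "\n"


project = ProjectMetadata("example-lib", "0.1.0", ["requests>=2.0,<3"])
print(render_core_metadata(project))
```

You should never have to write this by hand. The point is that your declared dependencies and version number end up inside the built package, so the simpler your tool makes that, the better.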
If you're developing a library (a package designed to be imported and used by other packages), you'll most likely want to publish. Uploading to a PyPI repo (public or private) is the easiest way to distribute Python code with proper version management. Your stack will need a tool that makes this process (and the previous build step) simple, and can authenticate with whatever PyPI repo you're uploading to.
This is a bit repetitive, because publishing is a form of distribution, but publishing is usually only used for libraries. If you're creating an application, you'll need a way to get that application to the devices that will actually run it. You could publish to a PyPI repo, but if your package depends on private code, then any machine which needs to run the app will also need credentials for your private repo (inconvenient to say the least). Sometimes it's useful to be able to download all the libraries your app depends on and bundle them with your application for distribution. This is the one step that none of the stacks I'll cover in future posts make easier, but some of them make it more difficult. There is another form of distribution called "freezing", but I won't cover it because in my experience it's more trouble than it's worth.
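As one possible way to do the "download everything and bundle it" step, the sketch below builds the two pip commands involved: one that downloads every required distribution into a folder, and one that installs from that folder without contacting any index. The helper names, `requirements.txt`, and the `wheelhouse` folder are assumptions for illustration. The commands are only printed here, not run, and downloaded files may be specific to the platform that fetched them.

```python
from __future__ import annotations

import sys


def download_command(requirements: str, dest: str) -> list[str]:
    """Command that downloads every required distribution into 'dest'."""
    return [sys.executable, "-m", "pip", "download", "-r", requirements, "--dest", dest]


def offline_install_command(requirements: str, source: str) -> list[str]:
    """Command that installs using only the files in 'source', never an index."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index", "--find-links", source,
        "-r", requirements,
    ]


# Hypothetical file and folder names.
print(" ".join(download_command("requirements.txt", "wheelhouse")))
print(" ".join(offline_install_command("requirements.txt", "wheelhouse")))
```

The first command would run on a machine that has credentials for your private repo. The second runs on the target machine, which then only needs the bundled folder.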
That's it! Those are all the common tasks in Python which are covered under the "dependency stack". As I said, I'm going to follow up with posts about the different stacks I use every day. What tools do you use to accomplish these tasks? Is there anything that you think is related that I left out? Let me know!