Bernd Wechner

Posted on Oct 16, 2021

Publishing a Python Package: What I Wish the Maze of Tutorials Covered

#python #pip #pypi

I've written a number of Python packages over time, really just for my own use, and I'm usually so snowed under just getting things done that every time I stopped to consider sharing one, so it could be pip installed, I searched online and immediately got lost among the tens, maybe hundreds, of guides, tutorials and options: enough material for a doctoral dissertation to wade through. And all of it looked scary, talking about starting from scratch (not from an existing package), wanting me to add and write a dozen files and understand this and that, and worse, there are old ways, new ways, alternate ways ... Aaargh.

But hey, we're in a snap 3 day lockdown here (thanks COVID!) and I was tidying up some projects and tried again ... this time, experience in hand, (and less hair) I figured I would put my blinkers on and use just the official tutorial:

https://packaging.python.org/tutorials/packaging-projects/

And I was mildly pleased. I guess my expectations had dropped but hey, it proved manageable and I did succeed in publishing a few packages. But even this tutorial left me wasting far too much time trying to work out how stuff works and I wanted to write it down ASAP, for my own sake, and well, here we are ... share it.

So here's a better (IMHO) guide to packaging your Python project and publishing it (better because it says what I wanted to know; no doubt I'm making different mistakes and it has its own shortcomings that I don't notice ;-).

1. Start with a package

Yep, I already have packages ... a good few of them. I want to publish them. Anyhow a package of course is just a folder with an __init__.py file in it, and if it's a small package (as many of mine are) that's all it is. Nothing more, nothing less, than a folder with an __init__.py in it that provides some classes and/or functions.

Sometimes there might be a few more .py files in the folder beside the __init__.py. Minor detail, just what happens when it's a little too big to fit conveniently into the one file.

Key thing here is we're not starting from scratch, but have a package.

2. Get the tools

You only need two tools as it happens and they are:

build - which by default creates a .tar.gz file and a .whl file from your package, which is what pip wants/needs for its installing career.
twine - which publishes to pypi.org

and so for prep:



pip install build twine

Not too bad. Step 2 was easy.

3. Prep the few extra files a package wants

Not many, don't fret.

README.md - a simple markdown file with a welcome message and whatever you want to add. What is this package, how do you use it? I use Typora but you can write it in any text editor and it can be as brief or in depth as you like. It is what's shown on the pypi.org page for your package so it's also your ad if you like for your package.
LICENSE.md - Not sure you need it but worth doing and easy as Py. I am beastly careless in this space and just love the Hippocratic License. Download the markdown version and save it as LICENSE.md

That's it! And to think it seemed so scary in the past.

4. But no! There's more - just a little more, not much.

The tutorial recommends that you lay out your folder like this (yes, I've simplified it a bit):



the-folder-I-keep-it-in/
├── LICENSE.md
├── README.md
├── pyproject.toml
├── setup.cfg
└── src/
    └── my_package/
        └── __init__.py

The things to note are:

You don't need to put it in a src folder, but why not? If it ain't broke don't fix it. The src strategy means you can just drag and drop your package from where it was into src ... done. The stuff above it is the publishing kit ...
pyproject.toml and setup.cfg just tell build what to do. We'll come back to these shortly. The first is just a standard file that tells build which backend to use (setuptools, in this case) and the second one describes your package so it can be built and then published with twine (purists may argue with this neat division, but let them).
The folder the-folder-I-keep-it-in can have any name you like; it won't change a thing with the build or publish. I actually call it my-package (in this example). As to why, keep reading. It's just convenient, that's all.
The folder my_package should use underscores between words, yes, do it. There's a bizarre confusion in the Python world between my_package and my-package.

`my_package` and `my-package`? What, why, when?

This is described nowhere, and I had to work this out with a lot of trial and error and hair pulling, alas. But here's what I got for you.

my_package: just stick to this, don't waver, never waver, use only this ;-). I kid ye not. Using my-package in either the folder under src or in setup.cfg will cause you grief during the build, publish, install and test.
Once it's published it will appear on pypi.org as my-package and people will install it with pip install my-package, but use it with import my_package. That's just the way it is, that's the convention, don't rock the boat, all you need to know is you don't have to lift a finger to make that happen, just stick with my_package in the src folder and in setup.cfg.
But of course, the-folder-I-keep-it-in is irrelevant here and I call it my-package just because that's what the package is called. The only other exception is the GitHub repo if you're using one (and I do). That too can be my-package, and is in my case. In fact, later you'll see I can exploit that for a nice two line install script.
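As far as I can tell, the reason both spellings end up pointing at the same project on pypi.org is the name normalisation rule in PEP 503: runs of -, _ and . are treated as equivalent and names are compared in lowercase. A minimal sketch of that rule follows. The function name normalize is just for illustration; the regular expression is the one given in the PEP.

```python
import re


def normalize(name: str) -> str:
    """Normalise a project name the way PEP 503 describes."""
    # Any run of '-', '_' or '.' collapses to a single '-', then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


if __name__ == "__main__":
    for candidate in ("my_package", "my-package", "My.Package"):
        print(candidate, "->", normalize(candidate))
```

All three spellings normalise to the same string, which is why pip install finds the project under either spelling, while import still needs the underscore because it has to be a valid Python identifier.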

5. `pyproject.toml` and `setup.cfg`

pyproject.toml is easy. Just copy the standard. Put this in it:



[build-system]
requires = [
    "setuptools>=42",
    "wheel"
]
build-backend = "setuptools.build_meta"

and be done with it. Ask no more. It's build internals and unless you're super keen on digging deeper, let it rest. This just means that when you run python3 -m build in your package folder, it knows what to do (if you don't have this file it will ask for one). What it does is create a dist folder and drop two files in it. These are what twine needs.

setup.cfg is not hard either and here's my minimalist take and the clarifications that I felt were missing elsewhere:



[metadata]
name = my_package
version = 0.1
author = my name
author_email = my email address
description = My little package
long_description = file: README.md
long_description_content_type = text/markdown
url = https://github.com/me/my-package
project_urls =
    Bug Tracker = https://github.com/me/my-package/issues
classifiers =
    Programming Language :: Python :: 3
    License :: Freely Distributable
    Operating System :: OS Independent
    Development Status :: 4 - Beta
    Framework :: Django :: 3.2
    Intended Audience :: System Administrators
    Topic :: Software Development :: Libraries :: Python Modules

[options]
install_requires =
    other_package1 >= 0.1.1
    other_package2 >= 2.0.1
package_dir =
    = src
packages = find:
python_requires = >=3.6

[options.packages.find]
where = src

And here's what I felt I should have known:

name should use my_package not my-package. Just believe me. Things go weird if it says my-package. Experiment if you like; I wish I hadn't needed to and that the tutorial were clear here.
install_requires wants one indented line per requirement with relatively familiar syntax (similar to pip freeze - another one of those mysteriously named python commands that actually means pip show-me-whats-installed). This is completely missed in the tutorial.
package_dir is weird, yes, but forget it. Like install_requires it has a list of one liners beneath it, in this case just one. The one liners map package names to folders somehow in the internal complexities of setuptools - details most of us don't care about or want to know about when publishing our simple one file package. The tutorial tells us that this line maps the "noname" package to the src folder, and that the "noname" package (that nothingness before the = sign) is a code name for the overarching root package, so the src folder becomes the mystical "root package". Do most of us actually care about this? What is a "root package" anyhow? Nah, let's leave it for the boffins, and just accept this is the odd way of telling build (or rather setuptools underneath it) that our package is in the src folder.
There's nothing missing after find:. No. That's just the syntax, live with it. Refer back to the intro, re: my sentiments on the unnecessary befuddling cryptic nature of Python package publication ... Ditto the where = src, just accept it.
The classifiers are a bit fiddly: they have to come from the list of allowed classifiers. And they bothersomely lack a clear way of saying you're using the Hippocratic License (which I just happen to love).
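To see how those indented one-liners are read, here is a small sketch using Python's standard configparser module, which understands the same INI-style syntax. The sample text is a cut-down copy of the setup.cfg above, and the helper list_values is hypothetical (it is not something setuptools provides). It only illustrates the file syntax, not what setuptools does with the values afterwards.

```python
import configparser

# A cut-down copy of the setup.cfg shown above.
SAMPLE = """\
[metadata]
name = my_package
version = 0.1

[options]
install_requires =
    other_package1 >= 0.1.1
    other_package2 >= 2.0.1
package_dir =
    = src
packages = find:
"""


def list_values(raw: str) -> list[str]:
    """Split a multi-line setup.cfg value into its non-empty lines."""
    return [line.strip() for line in raw.splitlines() if line.strip()]


parser = configparser.ConfigParser(interpolation=None)
parser.read_string(SAMPLE)

# Indented lines under a key are continuation lines of that key's value.
requirements = list_values(parser["options"]["install_requires"])
package_dir = list_values(parser["options"]["package_dir"])

print(requirements)
print(package_dir)
print(parser["options"]["packages"])
```

The point is that the indentation is what ties each requirement line to install_requires, and that the trailing colon in find: is simply part of the value.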

6. The Importance and Catches with Testing

Publishing is as simple as:



python3 -m twine upload dist/*

BUT, it's committal. Once you've published there appears to be no way of undoing it and it consumes the filenames you used (which also means the version you have in setup.cfg, as this gets built into the filenames in dist).
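Since every upload uses up its version number for good, any retry means a new version in setup.cfg. A minimal sketch of a helper that bumps the last numeric component, so that 0.1 becomes 0.2. The function name bump_last is made up for illustration, and it assumes a plain dotted numeric version rather than everything PEP 440 allows.

```python
def bump_last(version: str) -> str:
    """Increment the final dotted component of a purely numeric version."""
    parts = version.strip().split(".")
    # Assumption: only plain versions like 0.1 or 1.2.3 are handled here.
    if not all(part.isdigit() for part in parts):
        raise ValueError(f"not a plain dotted numeric version: {version!r}")
    parts[-1] = str(int(parts[-1]) + 1)
    return ".".join(parts)


if __name__ == "__main__":
    print(bump_last("0.1"))
    print(bump_last("0.1.9"))
```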

And so, testing first is critical. And pypi.org provide testpypi at https://test.pypi.org/ that you can publish to freely, as often as you need to get it right.

The main things that demand a retry are in my experience:

You look at it on pypi and README.md has issues. Either typos, or code lines that are too long and render badly etc. Either way, you get to see how it's going to be presented on pypi and can adjust your README to look nice.
Your test install with pip doesn't work. Which actually doesn't happen now that I have a workflow, but happened a lot while I was trying to work out all that setup.cfg syntax that the tutorial glosses over.

To publish to the test site it's just a small variant:



python3 -m twine upload --repository testpypi dist/*

The catches

So testing is great. A lifesaver. But it caused me some modest grief too (the flip side of the same coin).

Firstly you need to create an account on the site. I did that, but I always use Bitwarden and have it generate large random passwords for me - a habit (that we should all have).

twine, when used as above, prompts for username and password. These long random passwords of mine are not easy to type, so I usually do a copy/paste, but alas pasting the password does not work - I tried and tried.

Fortunately they can be provided on the command line as in:



python3 -m twine upload --repository testpypi -u "$username" -p "$password" dist/*

and I saved this in a file called test-publish that reads:



#!/bin/bash
# pypi.auth sets $username and $password
source ~/.auth/pypi.auth
python3 -m twine upload --repository testpypi --verbose -u "$username" -p "$password" dist/*

Secondly, you can't republish. At all. You need to increment the version in setup.cfg and rebuild before you can republish. Slows things down some. Not least because of the time and energy spent searching online for ways and means to republish. Some on-line sources suggest --skip-existing does the trick, but it doesn't - not for me and it's not clear what it does or what it's for and maybe I just misread that. C'est la vie.

Thirdly, the dependencies listed under install_requires in setup.cfg don't work, presumably because, when testing, the required packages aren't on https://test.pypi.org/. But it took a bit of head scratching and try and try again to convince myself of that, as I was trying, believe it or not, to validate the syntax for just that setting: it's not described in the tutorial, which sent me straight back into that warren of other sources. I do wish that testpypi would look at pypi for requirements as a fallback so this test cycle could be complete.
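One workaround that may help with this third catch: pip has an --extra-index-url option, which names a second index to consult in addition to the one given with --index-url. Pointing it at the real PyPI should let the dependencies resolve during a test install. The sketch below only builds the command. The function name build_test_install_command and the package name are placeholders, and it is worth knowing that pip treats both indexes as equals, so a same-named package on either one may be picked.

```python
import subprocess
import sys

TEST_INDEX = "https://test.pypi.org/simple/"
REAL_INDEX = "https://pypi.org/simple/"


def build_test_install_command(package: str) -> list[str]:
    """Build a pip command that installs from TestPyPI, with PyPI as a second index."""
    return [
        sys.executable, "-m", "pip", "install",
        "--index-url", TEST_INDEX,
        "--extra-index-url", REAL_INDEX,
        package,
    ]


if __name__ == "__main__":
    command = build_test_install_command("my-package")  # placeholder name
    print(" ".join(command))
    # Uncomment to actually run the install:
    # subprocess.run(command, check=True)
```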

7. A Standard Workflow

OK, so having gone through all that now, like most folk eventually do, I have a standard template (the last package I published). I now routinely use five tiny little two line shell scripts to make life easy for myself.

Basically a build script, two publish scripts and two install scripts.

In order:

A script to build:
build:



#!/bin/bash
# Clear out old builds (-f keeps rm quiet if dist is empty or missing)
rm -f dist/*
python3 -m build

A script to test publishing:
test-publish:



#!/bin/bash
# pypi.auth sets $username and $password
source ~/.auth/pypi.auth
python3 -m twine upload --repository testpypi --verbose -u "$username" -p "$password" dist/*

A script to install the test publish (test installing) - noting that errors here about requirements that cannot be met are expected:
test-install:



#!/bin/bash
# The package name is the name of the folder this script lives in
package=$(basename "$(dirname "$(readlink -f "$0")")")
python -m pip install --index-url https://test.pypi.org/simple/ "$package"

A script to publish properly:
publish:



#!/bin/bash
# pypi.auth sets $username and $password
source ~/.auth/pypi.auth
python3 -m twine upload --verbose -u "$username" -p "$password" dist/*

A script to install the package properly:
install:



#!/bin/bash
# The package name is the name of the folder this script lives in
package=$(basename "$(dirname "$(readlink -f "$0")")")
python -m pip install "$package"
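The two publish scripts above pass the password as a command-line argument, which can show up in the process list while twine runs. twine also reads the TWINE_USERNAME and TWINE_PASSWORD environment variables, so a possible alternative is sketched below. It assumes the auth file holds plain username=... and password=... lines (a guess based on how the scripts use it), and the function names are made up for illustration.

```python
import glob
import os
import subprocess
import sys


def read_auth(text: str) -> dict[str, str]:
    """Parse simple key=value lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def twine_env(auth: dict[str, str]) -> dict[str, str]:
    """Copy the current environment and add the variables twine looks for."""
    env = dict(os.environ)
    env["TWINE_USERNAME"] = auth["username"]
    env["TWINE_PASSWORD"] = auth["password"]
    return env


def upload(repository: str = "testpypi") -> None:
    """Run twine with the credentials supplied through the environment."""
    auth_path = os.path.expanduser("~/.auth/pypi.auth")  # same file the scripts source
    with open(auth_path, encoding="utf-8") as handle:
        env = twine_env(read_auth(handle.read()))
    files = sorted(glob.glob("dist/*"))
    subprocess.run(
        [sys.executable, "-m", "twine", "upload", "--repository", repository, *files],
        env=env,
        check=True,
    )
```

Calling upload() would publish to the test site and upload("pypi") to the real one. Nothing runs until one of those is called.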

A basic example of all that put together can be seen at:

https://github.com/bernd-wechner/django-model-admin-fields

and see here:

https://pypi.org/project/django-model-admin-fields/

I hope that helps someone save all the learning hassle, and publish something easily, by just adding 4 files to a folder (a README.md to write, a LICENSE.md to download, a pyproject.toml to copy, and setup.cfg to tune) and maybe 5 tiny little helper bash scripts and in no time a test and then publish cycle is underway.
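As a last check after installing, a short smoke test can confirm both halves of the naming story: the version is looked up by the name used with pip install, and the import uses the underscore name. This is only a sketch. The function name smoke_test and the two package names are placeholders, and it relies on importlib.metadata from the standard library (Python 3.8 or later).

```python
import importlib
from importlib import metadata
from typing import Optional


def smoke_test(dist_name: str, import_name: str) -> Optional[str]:
    """Return the installed version if the package is installed and importable, else None."""
    try:
        version = metadata.version(dist_name)  # looked up by the pip/PyPI name
        importlib.import_module(import_name)   # imported by the Python identifier
    except (metadata.PackageNotFoundError, ImportError):
        return None
    return version


if __name__ == "__main__":
    # Placeholder names: substitute your own package.
    print(smoke_test("my-package", "my_package"))
```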

Top comments (6)

Vincent A. Cicirello • Oct 16 '21

The reason you can use an underscore but not a hyphen in the name is that it must be a valid identifier for the import statement, and hyphens are not allowed in Python identifiers. The directory name inside src must match so must not use hyphens or any other characters that are disallowed in identifiers.

As for why hyphens are otherwise preferred over underscores for the name used to install from pypi, I'm not as certain but I have a guess.... It is likely due to the general preference of hyphens instead of underscores in URLs. See for example these guidelines from Google: developers.google.com/search/docs/.... They don't explain why that is the preference but it could be related to the fact that underscores are not allowed in domains, so even though they are allowed in the rest of a URL you have greater visual consistency if you also avoid them in the rest of a URL.

This isn't unique to Python. I use Java more than Python. Java package names and module names can use underscores but not hyphens. Generally, when you publish Java artifacts to Maven Central, the artifact name is often the same as either the Java module (if modules are in use) or a Java package contained in the artifact except using hyphens rather than underscores if you have a reason to use either. I'm not actually sure if underscores are actually disallowed in artifact names or if it is a strong convention to use hyphens. The file name of a jar on Maven Central includes the artifact name, the version, and an identifier all separated by hyphens, so by using hyphens in an artifact name when separation is needed looks nicer since it is consistent with rest of filename.

It also has the benefit that if you have a site dedicated to it that you can use the artifact name in the domain if you use hyphens, which you can't do if there were underscores. Here is an example.... I have a Java library named rho-mu (with a hyphen) so the artifact name and corresponding jar file uses a hyphen. But the jar contains a Java module named rho_mu with an underscore. The website for the project uses the artifact name in the domain: https://rho-mu.cicirello.org. An underscore would not have been allowed there even though it could otherwise be used elsewhere in the URL.

Bernd Wechner • Oct 16 '21

Thanks for the considered appraisal. It is indeed likely that the - norms arise out of a need for URLs so the package name for example is needed in a URL like: github.com/bernd-wechner/my-package

That said, you misread me a little in that it is not the specifics of the wherefores and whys that are my central observation or complaint, so much as the enormous unnecessary complexity that would-be contributors are exposed to to this day, not least in a language that is currently at the arguable peak of popularity.

But as you've given a moment to specifics I will add some of my specific test results that led me to pull my hair out and write down these notes (and those results which I did not include as the article is long enough as is). Consider the two configurables, the name of the folder under src and the same declared in setup.cfg. There are 4 variations to explore of - vs _ use and there are two outputs from build, a .tar.gz, and a .whl file. Put this in the context of the official tutorial in which:

name = example-pkg-YOUR-USERNAME-HERE

That is, they use - not _. Bear that in mind as you examine these four build outputs:

Using: src/my_package and name = my_package
Produces: my_package-0.1.tar.gz and my_package-0.1-py3-none-any.whl

Using: src/my-package and name = my_package
Produces: my_package-0.1.tar.gz and my_package-0.1-py3-none-any.whl

Using: src/my_package and name = my-package
Produces: my-package-0.1.tar.gz and my_package-0.1-py3-none-any.whl

Using: src/my-package and name = my-package
Produces: my-package-0.1.tar.gz and my_package-0.1-py3-none-any.whl

Key observations:

Two scenarios produce files using the _
The use of name with - never works and yes, it is the recommended name in the official tutorial
Of the two that build to the same apparent result (files using _) the second publishes fine but cannot be installed and used. Go figure.

In conclusion the official tutorial, one of the last havens we have in a world that is replete (as noted in my intro) with an already befuddling number of tutes and a cacophony of research material (to which I've only added, I admit), is both a) wrong (it suggests using a name that does not work) and b) completely ignores the issue (let alone others, like how to define requirements).

On top of which, befuddling to me is how that tutorial provides no ready clues on how to contribute to improving it. In so many other contexts today, such material is anything from an open wiki to sporting feedback buttons or notes on how to help improve the documentation. This one is wrong and the best it offers is a tiny "Found a bug?" link in the footer that jumps to an Issues list at:

github.com/pypa/packaging.python.o...

Given we've come this far (and are still in lockdown here ;-). I may just look at filing an issue or PRing a fix over there for the doc.

But the bemusement goes further. Setuptools for example have (finally) evolved to the point where you can use just a setup.cfg file with a basic setup.py assumed if it's missing. Next step will be for build to simply assume that basic pyproject.toml if it's missing. For one of the most popular languages and community based ones at that it would be nice if sharing packages became much much easier.

Vincent A. Cicirello • Oct 17 '21

Wow. That's weird that options that won't work actually produce something and in some cases even publish to pypi. If you try to do the equivalent in Java, either directory name or package name or both, you'll get syntax errors when you compile.

Bernd Wechner • Oct 17 '21

Totally agree. It's rather frustrating how complex it is and moreover that the official tutorial suggests something that plain doesn't work.

Vincent A. Cicirello • Oct 16 '21

Not being allowed to replace or remove a version is also not just a pypi thing. Maven Central also doesn't allow this. Once it is public, other packages might depend on it. Removing or even replacing it can then break other people's projects.

Bernd Wechner • Oct 17 '21 • Edited

That's all good and well, and easy enough to understand but still falls short of awesome ;-). There's public and there's public. In the extreme, there's public and got lots of people using it, and there's public just published now and ooops, made a mistake, let's fix it.

To help with the latter case testpypi was born and that rocks! And yet it falls short of awesome too as we cannot test the install_requires there (that could be fixed by having pip more smartly try pypi if testpypi doesn't have a package - easily generalised to: if the repository is testX and an install_requires package cannot be found, try the repository X).

But pypi could also be smarter. Allowing for two steps like many publishing media do. Push to pypi (visible publicly perhaps, maybe installable only with your account credentials) and then Releasing, making fully public. OR alternately keeping track of all installs (downloads and from where the request came) and if there are no downloads from source IPs different to the one that uploaded, then allow an overwrite (an oops style fix).

All just thoughts in the stunning and still very surprising complexity of publishing Python packages.