It wasn't until I changed jobs in November of 2021 that I began using something other than pip to manage Python packages for my projects. I started using pipenv at work, felt that it had a solid base, and decided I should use it in my own projects to manage package hashes and provide a more secure install. I also quickly realized that there were areas of the pipenv codebase that needed improvement.
I found myself combing through the GitHub Issues, triaging reports and responding to other users. At the start of 2022 I joined forces with Oz Tiram, with the support of Frost Ming, who was seeking pipenv maintainers in order to work more exclusively on pdm, a modern alternative Python package manager.
We have been hard at work in our spare time improving pipenv, and there have been more than a few exciting releases this year. The latest and greatest, pipenv==2022.9.4, has worked out the remaining edge cases in a performance optimization that was released at the end of August 2022.
Benchmarks have been a tough question to answer in the Python package manager space because there are so many variables: host computer, internet connection, private package indexes, many dependencies, resolver, installer, oh my! Fortunately, at the end of July 2022 Lincoln Loop built an independent benchmark for the primary Python package managers. It runs on GitHub Actions every 6 hours and averages the results over the prior 4 runs.
Upon learning of this benchmark, we began exporting the results at major release points in order to have data to track how pipenv performance changes over time. The 2022.8.5 release had an average install benchmark time of just under 150 seconds, and the 2022.8.24 release was perhaps marginally faster than that; after all, we had just fully dropped pip-shims. The reality was that pipenv install was still slow, so I spent a weekend looking over the install logic in more detail and refactoring how the batch install worked.
The fundamental problem was that pip was being invoked once per individual dependency, and on my 8-core SMT laptop it sounded like a plane taking off. I began to wonder about batching up the resolved dependencies and invoking pip a minimal number of times to accomplish the same install. It turned out that the prior level of parallelism used a lot of processing power without improving the wall clock time.
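To make the difference concrete, here is a minimal sketch of the two invocation patterns. It is not pipenv's actual implementation: the helper names are hypothetical, the pinned versions are placeholders, and it assumes the dependencies are already fully resolved so that pip can be told to skip dependency resolution with --no-deps.

```python
import sys
import tempfile
from typing import Iterable, List


def per_package_commands(requirements: Iterable[str]) -> List[List[str]]:
    """Build one pip command per requirement (the process-heavy pattern)."""
    return [
        [sys.executable, "-m", "pip", "install", "--no-deps", requirement]
        for requirement in requirements
    ]


def write_requirements_file(requirements: Iterable[str]) -> str:
    """Write already-resolved requirements to a temporary file, one per line."""
    with tempfile.NamedTemporaryFile(
        mode="w", suffix="-requirements.txt", delete=False
    ) as handle:
        for requirement in requirements:
            handle.write(requirement + "\n")
        return handle.name


def batched_command(requirements_file: str) -> List[str]:
    """Build a single pip command that installs the whole file at once."""
    return [
        sys.executable, "-m", "pip", "install", "--no-deps",
        "-r", requirements_file,
    ]


if __name__ == "__main__":
    # Placeholder pins for illustration only; nothing is installed here.
    resolved = ["idna==3.3", "certifi==2022.6.15"]
    print(len(per_package_commands(resolved)), "pip processes, one per package")
    print("or a single process:", batched_command(write_requirements_file(resolved)))
```

The sketch only builds the commands rather than running them. The point is that the batched form starts one pip process no matter how many packages were resolved, instead of one process per package.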
The other potential problem was that doing a true batch install meant it was no longer possible to show an incremental progress bar, where each increment marks the completion of an individual package install. The compromise here was to read the stdout of the pip process in real time for users who pass the --verbose flag, which allows observing the installation progress, though it's not visually a progress bar.
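Reading a child process's output line by line as it is produced can be sketched with the standard library's subprocess module. This is an illustration of the general technique, not pipenv's code: stream_output is a hypothetical helper, and the example runs a stand-in child process rather than pip itself.

```python
import subprocess
import sys
from typing import Iterator, List


def stream_output(command: List[str]) -> Iterator[str]:
    """Yield each line the child writes, as soon as it becomes available."""
    with subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the same stream
        text=True,
        bufsize=1,  # line-buffered reads on our side of the pipe
    ) as process:
        assert process.stdout is not None
        for line in process.stdout:
            yield line.rstrip("\n")
    # Leaving the with-block waits for the child, so returncode is set here.
    if process.returncode != 0:
        raise subprocess.CalledProcessError(process.returncode, command)


if __name__ == "__main__":
    # Stand-in for a long-running install; a real caller would pass a pip command.
    child = [sys.executable, "-c", "print('Collecting example'); print('Done')"]
    for output_line in stream_output(child):
        print("[verbose]", output_line)
```

Because the lines are yielded as they arrive, the caller can echo them immediately instead of waiting for the whole install to finish, which is what makes the output usable as a rough progress indicator.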
My PR went through the peer review process, and fortunately Oz is very supportive of making such changes for performance, so the excitement began to grow as we got closer to releasing the enhancement. What would our benchmark tell us about the install performance after this was released? My gut feeling was that it could be a 33% speed increase, but it was anybody's guess. Then on August 31st we got our first indication that the improvement was closer to 50% for the case of installing the sentry requirements. Whoa!
So how does pipenv==2022.9.4 compare to the other package managers for install speed? Here are the benchmarks at the time of writing:
For the latest benchmarks and complete stats for each operation (install, lock, update, add package, and tooling), visit the Python Package Manager Shootout.