
One Does Not Simply 'pip install'

So, you’ve got your brand new pristine machine and you’re keen to kick off a new Python project. You know you want to use packages that others have written and focus only on the code you need to write. So you head off to GitHub, find your favourite package and turn to the README to get it installed. It seems dead simple, just a ‘pip install’ away. Nothing could possibly go wrong. Right?

Did you spot what the problem is? It’s easy to miss, since many packages don’t point out best practices or alternatives to pip in their READMEs. In fact, looking at some of the most popular Python projects, we can see the same problem in requests, Pillow, and Scrapy. Most READMEs completely skip mentioning virtual environments and often lead new devs (and those from other languages) down a poor path. This isn’t a documentation problem, but one deep within Python itself.

The Problems with pip

If you’ve never experienced this, let me break down what’s wrong. When you add a package in this way, you’ll be installing and possibly upgrading the global version of Python packages present on your system. When you come to install another package for your next project, you may well end up with conflicting dependencies. Let’s say you use the same package again, but there’s been a new release with some additional features. When you upgrade the global copy to use it, you now need to ensure every project you’ve done still works with it. That’s annoying and unlikely to happen, so what you’ll be left with is a broken build.

Global Installation

Let’s use requests, the HTTP request library, as an example. First, let’s go ahead and install it.

pip install requests

This installs the latest version into our system Python, but doesn’t capture which version anywhere. The only notice we’ve got of what was installed is the output in the terminal. Your global Python packages are also installed somewhere other than your project folder, so you can’t easily locate them.
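If you’re not sure where those packages end up, the interpreter can tell you. Here’s a minimal sketch using only the standard library. The function names are my own for illustration, and the venv check assumes an environment created by venv or a recent virtualenv. Note that pip may pick a per-user directory instead if the system one isn’t writable.

```python
import sys
import sysconfig


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


def install_location() -> str:
    """Return the default directory for pure-Python packages of this interpreter."""
    return sysconfig.get_paths()["purelib"]


print("virtual environment:", in_virtual_environment())
print("packages install to:", install_location())
```

Run with the system Python, the first line should report False and the second should point somewhere well outside your project folder.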

Manual Capture

If we want to capture what was installed, we need to remember to do it ourselves. We can see the version of packages and anything else it’s installed with pip freeze:

pip freeze
...
certifi==2022.12.7
charset-normalizer==3.0.1
idna==3.4
requests==2.28.2
urllib3==1.26.14

This gives us everything that’s been installed along with requests, crucially all its dependencies too. Typically, to capture dependencies that have been installed in this way, you’d direct freeze’s output to a requirements.txt file. Capturing what we’ve installed is paramount if we want to produce a deterministic build and share it to locations other than a local machine.

pip freeze > requirements.txt
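To show there’s nothing magic in that output, here’s a rough approximation of it built on importlib.metadata from the standard library. It’s a sketch rather than a replacement for pip freeze: the freeze_lines name is made up for this example, and it ignores details pip handles, such as editable installs.

```python
from importlib import metadata


def freeze_lines() -> list[str]:
    """Return installed distributions as sorted 'name==version' strings."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        # Skip entries whose metadata is broken or incomplete.
        if not name or not version:
            continue
        lines.add(f"{name}=={version}")
    return sorted(lines, key=str.lower)


for line in freeze_lines():
    print(line)
```

Either way, what you end up with is a flat list of pinned versions in a text file.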

The problem with this approach is that it’s simply a text file, so it doesn’t allow us to have types of dependencies. I couldn’t install black to format my code as a development dependency, for instance. It gets lumped in with everything else.

You might expect that if I were to pip uninstall requests, I’d get back to a clean system, right? Wrong. When I do so, I uninstall requests, but none of its dependencies. We can see what’s happened with a pip freeze again.

pip freeze
...
certifi==2022.12.7
charset-normalizer==3.0.1
idna==3.4
urllib3==1.26.14

Instead, I need to remember to uninstall manually, using the requirements.txt that I’d captured to.

pip uninstall -r requirements.txt
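If you never captured a requirements.txt, you can at least ask an installed package what it declared as dependencies before you remove it. The sketch below uses importlib.metadata.requires for that. Both helper names are hypothetical, the name parsing is a deliberately simple regular expression, and it only covers direct dependencies, not the whole tree.

```python
import re
from importlib import metadata


def requirement_name(requirement: str) -> str:
    """Extract the bare project name from a requirement string."""
    # e.g. "urllib3<1.27,>=1.21.1" gives "urllib3"
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return match.group(1) if match else ""


def direct_dependencies(package: str) -> list[str]:
    """List the direct dependencies an installed package declares, ignoring optional extras."""
    try:
        declared = metadata.requires(package) or []
    except metadata.PackageNotFoundError:
        return []  # not installed, so nothing to report
    names = []
    for requirement in declared:
        # Requirements carrying an "extra" marker only apply to optional extras.
        if "extra ==" in requirement or "extra==" in requirement:
            continue
        names.append(requirement_name(requirement))
    return names
```

Calling direct_dependencies("requests") before the uninstall would give you the list of packages pip is about to leave behind.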

So it’s possible to manage dependencies with pip, but remembering all these steps is cumbersome and it could be simpler. If you’re a new Python developer, it’s highly unlikely you’ll be familiar with doing some of these things (or, as a developer from another language, you may assume they’re done for you), and you can quickly get your system into a horrible mess.

Differences to npm, the Node.js Installer

Let’s contrast this with how you might install packages in another ecosystem, like Node.js. In this scenario I’m going to install node-fetch, used for making HTTP requests. I’m immediately directed by the README to install using npm, which ships as part of Node:

npm install node-fetch

Once we do this, we have a whole load of things in the folder we just installed into.

drwxr-xr-x  5 ian staff 160B 9 Feb 15:17 .
drwxr-xr-x  3 ian staff 96B  9 Feb 13:48 ..
drwxr-xr-x  9 ian staff 288B 9 Feb 15:17 node_modules
-rw-r--r--  1 ian staff 5.0K 9 Feb 15:17 package-lock.json
-rw-r--r--  1 ian staff 55B  9 Feb 15:17 package.json

We have all our dependencies in a single place, with a package.json and package-lock.json recording all the dependencies and their dependencies together. Crucially, we have everything needed to reproduce what we’ve just installed, simply by running an npm install alongside the json files.

Also, if we npm uninstall node-fetch, we actually uninstall everything (including the package’s dependencies), and the entries are removed from our json files and node_modules directory since nothing needs them anymore. A much better experience in my opinion, and what you’d expect to happen.

The Alternatives

There’s no shortage of package management alternatives available for Python that are built for working with virtual environments, which is unfortunately part of the problem.

  • venv + pip - The one built straight into Python’s standard library, but it requires you to remember each of the above steps to capture to a file.
  • pipenv - Will automatically create and manage a virtualenv, and adds and removes dependencies in a Pipfile.
  • Poetry - Good for creating your own packages, uses a pyproject.toml. Typically faster at resolving dependencies.
  • PDM (Python Development Master) - Allows you to use packages without a virtual environment.
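For the first option, the environment itself can be created from Python using the standard library’s venv module. This is a minimal sketch: create_project_env is a name I’ve made up, and the .venv directory is only a common convention, not something pip or Python requires.

```python
import venv
from pathlib import Path


def create_project_env(project_dir: str, with_pip: bool = True) -> Path:
    """Create a virtual environment in <project_dir>/.venv and return its path."""
    env_dir = Path(project_dir) / ".venv"
    # with_pip=True bootstraps pip into the new environment via ensurepip.
    venv.create(env_dir, with_pip=with_pip)
    return env_dir
```

Once the environment is activated, or its own Python is used directly, pip install puts packages inside that folder rather than into the global Python. The other tools in the list wrap this step for you.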

How someone is meant to pick between these as a new developer is a mystery. I’ve mostly used Pipenv, but like Poetry for publishing my own stuff. I have a playlist of tutorials on each of the above as videos on YouTube.

Now there are some packages which do things really well. Flask, for instance, has a really good set of instructions where it says to use a virtual environment for installing stuff and gives examples of doing so with venv. But most of the time the first suggestion is using pip alone, and I’m not sure that is the correct default for guiding new developers.

NB. I’m in the process of writing an ebook on virtual environments, subscribe to my newsletter if you’re interested in hearing about it when it launches
