All over the interwebs I see people/companies suggesting:
pip freeze > requirements.txt
as a way to update the requirements file with the latest and greatest after pip install-ing some fancy new packages.
The problem
There are a few dangers to this approach, chief among them that libraries have their own dependencies, which get surfaced via pip freeze.
These dependencies may later become unnecessary as our libraries are updated, yet they remain in our requirements.txt file for no good reason, polluting our dependency list.
Example
When switching to Python 3 recently, we had the futures package installed, which is a backport of some Python 3 functionality to Python 2. futures was brought in as a 2nd-level dependency of some package (which one is very hard to know, because tracing it through the git log may only turn up a single commit in which multiple packages were updated and many added, as was the case for our futures dependency).
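One way to trace it back, sketched here under the assumption that futures entered requirements.txt through a normal commit, is git's "pickaxe" search, which finds commits that added or removed a given string in a file:

```shell
# Find every commit that added or removed the string "futures"
# in requirements.txt:
git log --oneline -S futures -- requirements.txt

# Add -p to see the actual patches, which shows whether "futures"
# arrived on its own or buried in a large multi-package update:
git log -p -S futures -- requirements.txt
```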
When I started to run our service on our hosts, I kept seeing very odd and confusing error messages related to some concurrency module. I did a bunch of googling and testing (deploying the service again and again to servers), and after 3 hours of digging in, I finally realized this was due to the futures package being in our requirements.txt file. There were other complications which made things worse: how our service deployment managed its virtual environment, how the PYTHONPATH was managed, and how/when it installed the packages. For some reason, this didn't affect running our service locally, which didn't make figuring this out any easier.
Bottom line: We never directly depended on this futures package, and yet it managed to waste a lot of time and energy when we changed the environment in which our app ran.
Here's how our requirements file progressed over time for this scenario:
a_library_we_need==1.0
futures==1.0  # brought in by `a_library_we_need`
Then, a_library_we_need was updated to a version that didn't require futures anymore:
a_library_we_need==2.0
futures==1.0  # not really required at all anymore
A Solution
If we had hand-crafted our requirements.txt file, we would never have had this issue.
Manually updating the file has the benefit of having the history and reasoning behind all of our dependencies clearly stated in the repository history.
It also makes it much easier for developers to reason about when considering updating these dependencies, or re-evaluating their need.
It allows developers to add some sensible structure to the requirements.txt file: lines at the top could be major libraries (say, Django), and closer to the bottom we could have utility libraries (say, requests).
Yet another benefit: developers sometimes test things out locally and install libraries as part of experiments or local setup that have nothing to do with the production version of the service. pip freeze would sweep these into requirements.txt, while hand-crafting keeps them out.
Using pip freeze is very simple; one simply does something like:
$ pip install new_package==1.2.3
$ pip freeze > requirements.txt
$ git add requirements.txt
$ git commit -m "Update libs"
$ git push
But what I'd suggest is to rather go for an approach like this:
$ pip install new_package==1.2.3
$ vim requirements.txt  # add "new_package==1.2.3" to the file
$ git add requirements.txt
$ git commit -m "Add library: 'new_package==1.2.3'"
$ git push
This isn't a complete solution, however, as pointed out by my friend Baldur: because our dependencies often state their own dependencies using >=, two people installing the same hand-crafted requirements.txt file at different times may get different versions of those sub-dependencies, resulting in a different final set of packages. This could be amended by using pip freeze to produce another file, say requirements-freeze.txt, and excluding our direct dependencies from that resulting list.
One can then focus on updating the direct dependencies, but still keep an eye on how the sub-dependencies change over time, and keep it separate from the main requirements.
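That split can be sketched with a few standard tools. This assumes plain `name==version` lines in both files (no extras, URLs, or environment markers), and matches on the package name only:

```shell
# Snapshot the full environment.
pip freeze > frozen.txt

# Keep requirements.txt hand-crafted, and write everything pip freeze
# reports *except* our direct dependencies to requirements-freeze.txt.
awk -F'==' '
  NR == FNR { direct[tolower($1)] = 1; next }   # first file: remember direct deps
  !(tolower($1) in direct)                      # second file: print the rest
' requirements.txt frozen.txt > requirements-freeze.txt
```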
(Small side note on git usage: try to get used to using git add <filepath> or git add -p/--patch rather than git add -A. And use git show and git diff <remote-branch> every time before pushing your changes out. You never know if you accidentally left in some changes you didn't mean to push.)
Dependency hell
Another problem with pip which I'm not really touching on is dependency conflict.
Say you require libraries p1 and p2, but they both depend on another library, d. This is usually not a problem, unless they each depend on a different specific version of that package. If p1 depends on d==1.0 and p2 depends on d==2.0, there will be a conflict when installing.
pip freeze will change a requirements file from:
p1
p2
into something like
p1
p2
d==2.0
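Given the pinned requirements of p1 and p2 side by side, the clash can be spotted mechanically. A sketch, using hypothetical files listing what each package pins:

```shell
# Hypothetical inputs: what p1 and p2 each pin.
printf 'd==1.0\n' > p1-requires.txt
printf 'd==2.0\n' > p2-requires.txt

# List any package name that appears with more than one pinned version:
# sort -u keeps the distinct pins, cut strips the versions, and
# uniq -d prints names occurring more than once.
sort -u p1-requires.txt p2-requires.txt | cut -d'=' -f1 | uniq -d
# → d
```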
This rarely happens, as dependencies are usually specified as >=X.Y.Z and packages tend to "march" forward together over time; it seems rare that people put < or == in their dependency lists.
I'm honestly not entirely sure how pip handles these: it may vary between Python 2 and Python 3, and may even vary between pip versions. I think I've seen the conflict surfaced during pip install in the form of a warning at the end.
Of course, wherever p1 is used in a way that exercises parts of d==2.0 which are not understood by p1, we may end up with some exceptions flying around.
There have been attempts at good dependency resolution sitting on top of pip/PyPI, but most people just use vanilla pip, so I won't get into those here. There's also a newer packaging tool called pipenv, which may help solve some of the problems I raise.
The main point is that if you hand-craft the requirements.txt file, it'll be easier to resolve the conflict: a developer looking at the requirements.txt file will know for sure that d is a 2nd-level dependency, and that the project doesn't itself require a specific version of it.
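That check is a one-liner against a hand-crafted file. A sketch, again assuming plain `name==version` lines and comparing package names only:

```shell
# Is d one of our hand-crafted direct dependencies?
# cut strips the "==version" part; grep -ix matches the whole name,
# case-insensitively.
cut -d'=' -f1 requirements.txt | grep -qix d \
  && echo "d is a direct dependency" \
  || echo "d is only a 2nd-level dependency"
```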
Well, hope you gained something from this post, and that you will consider moving from pip freeze to hand-crafting.