Using a Virtual Environment to Avoid Seeming like a Sadist
TL;DR: $ pip freeze > requirements.txt
Why not just write pretty code and push it to GitHub like a happy little clam, and not worry about making a requirements.txt? If my code runs on my computer, why should I give a care about my Python environment? What even is a Python environment? Perhaps a reticulated python's terrarium?
There's a happy little clam, in her environment
Nope. In short, we generate and share requirements.txt files to make it easier for other developers to install the correct versions of the required Python libraries (or "packages") to run the Python code we've written.
Python Packages, and Environment
Open-source Python packages (like beautifulsoup, or jupyter, or any of the other 158,872+ projects on the PyPI index) offer tremendous functionality, way beyond that of the Python standard library. It's like you can push a button and download any one of a bazillion effects pedals for your neat but sort of vanilla Fender Stratocaster, for free:
Python + Open source packages = Fuego
When I say Python environment, I mean: The ecosystem consisting of your particular installed version of Python, plus all the third-party packages ("libraries") it can access (and their precise versions). Every time you $ pip install something, you are expanding your Python environment, giving it access to packages that are not part of the Python standard library.
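To make that definition a bit more concrete, here is a minimal sketch (not part of the original app) that prints the two halves of an environment: the interpreter version and every installed package with its version. It assumes Python 3.8 or newer, since it leans on importlib.metadata from the standard library, and describe_environment is a name made up for this illustration.

```python
import sys
from importlib import metadata


def describe_environment():
    """Return the interpreter version and a name -> version map of installed packages."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip distributions with incomplete metadata
            packages[name] = version
    return sys.version.split()[0], packages


if __name__ == "__main__":
    python_version, packages = describe_environment()
    print(f"Python {python_version}, {len(packages)} installed packages")
    for name in sorted(packages, key=str.lower):
        print(f"{name}=={packages[name]}")
```

Run it in your base environment and again inside a fresh virtual environment, and the second list should be much shorter.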
If you $ pip install a bunch of stuff outside of a virtual environment (more on this later), then you are adding to your "base" or "root" or "system" Python environment. That is fine and good and totally valid for many sandbox-y purposes...
However, working exclusively in your base environment will inevitably cause headaches later, when you try to show the code you've built to other human beings. You can also start to run into compatibility issues (with your own code) as time goes on.
The problem we run into when we share our Python code is this: Not everyone has the same packages (and versions of those packages) installed when they grab your code from GitHub and try to run it.
Augustine Chang and I made a simple dash/flask app for class at Flatiron School. Allow me to highlight the potential pain involved in trying to download and run it on a totally random, different computer with no packages pre-installed. Here we'll try to run the Python file called run.py from the terminal:
$ python run.py
Traceback (most recent call last):
  File "run.py", line 1, in <module>
    from pkg import app
  File "/Users/rob/Documents/_flatiron/mod-1-proj/pkg/__init__.py", line 2, in <module>
    from flask import Flask
ImportError: No module named flask
We get an error message saying "No module named [n]." Okay, well, simple solution: We should be able to download and install the [n] Python module using pip, which we'll touch on very briefly:
$ pip install flask
Collecting flask
  Downloading https://files.pythonhosted.org/packages/7f/e7/08578774ed4536d3242b14dacb4696386634607af824ea997202cd0edb4b/Flask-1.0.2-py2.py3-none-any.whl (91kB)
    100% |████████████████████████████████| 92kB 3.2MB/s
…
Successfully installed Werkzeug-0.14.1 click-7.0 flask-1.0.2 itsdangerous-1.1.0
Great, we installed [n] by using pip. Now that that's out of the way, let's try running run.py again and jam out:
$ python run.py
Traceback (most recent call last):
  File "run.py", line 1, in <module>
    from pkg import app
  File "/Users/rob/Documents/_flatiron/mod-1-proj/pkg/__init__.py", line 4, in <module>
    from flask_sqlalchemy import SQLAlchemy
ModuleNotFoundError: No module named 'flask_sqlalchemy'
No cigar. Now it wants some other package called flask_sqlalchemy. Gross.
How many times will we have to undergo this tedious back-and-forth? Why do we have to try to run the app just to be told one-at-a-time which packages we're missing? Why can't we just install all the packages this run.py script depends on, all at once? Are we even installing the right versions of the packages?
This is where a requirements.txt file comes into play. As long as the developers of this app provide a file listing the necessary packages, we can simply $ pip install -r requirements.txt and voila! All of the program's "dependencies" will be downloaded, installed, and ready to go in one fell swoop.
But alas: The developers failed to include a requirements.txt with their code, so we are back to error-message ping-pong, installing packages one-by-one as our terminal bosses us around. What clown forgets to include a requirements.txt? Certainly not our future selves.
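If you do get stuck playing ping-pong, you can at least find out about all the missing modules in one go. The sketch below is one rough way to do that, assuming you already know the import names to check. Both find_missing and the wanted list are hypothetical, built from the two tracebacks above plus one standard-library module for contrast.

```python
import importlib.util


def find_missing(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical list: the two modules from the tracebacks above, plus one
    # standard-library module that should always be present.
    wanted = ["flask", "flask_sqlalchemy", "json"]
    missing = find_missing(wanted)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All modules found")
```

Keep in mind that an import name is not always the name you pip install, so treat the output as a to-do list rather than a ready-made requirements.txt.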
Pip literally just stands for Pip Installs Packages. Enjoy that, momentarily. MIT computer scientists have a long and storied history full of recursive acronyms.
Pip, in addition to downloading and installing packages from the PyPI repository, can generate a requirements.txt file with this command:
$ pip freeze > requirements.txt
Here's what comes out when I try this:
This is too many packages. I know for a fact that we did not implement every single one of these packages in the app we just wrote. This is just a list of all the Python packages I've ever downloaded and installed on my base system.
We shouldn't give this list to our users. There's no need to make everybody who wants to use this app download and install every single package I have on my computer. We want to create a list of only the packages relevant to our app, and we want that list to detail the correct versions of each package.
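For a rough idea of which packages an app actually touches, one option is to scan its source for import statements. The following is a small sketch of that idea using the standard-library ast module. The helper imported_modules is made up for this example, and the sample string is a hypothetical stand-in for pkg/__init__.py, pieced together from the tracebacks above.

```python
import ast


def imported_modules(source):
    """Return the sorted top-level module names imported by a piece of Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level > 0 would be a relative import of the project's own code
            names.add(node.module.split(".")[0])
    return sorted(names)


if __name__ == "__main__":
    # Hypothetical stand-in for pkg/__init__.py, based on the tracebacks above.
    example = (
        "from flask import Flask\n"
        "from flask_sqlalchemy import SQLAlchemy\n"
        "import os.path\n"
    )
    print(imported_modules(example))
```

This only yields import names, with no versions, and it cannot see the dependencies of your dependencies, so it complements the clean-environment approach rather than replacing it.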
Here's what we should have done from the start:
We need to make a pristine little bubble world. A world where no python packages exist, except for the ones we need for this particular program to run. Then we can generate a requirements.txt, and archive it for future users, including ourselves.
The nice thing about making a virtual environment ("venv") is that it enables us to take a snapshot of this fully functioning little bubble world, pin our dependencies, fold it up, and put it away. In the future, even after many new versions of each package (including Python itself) have come and gone, we will be able to re-hydrate this little retro bubble world, and re-populate it with the correct versions of all the packages needed to make our old code happy.
The Python 3 standard library has built-in venv creation capabilities, but I don't feel like talking about that. We're gonna touch on the very basics of creating and navigating virtual environments with Conda.
First, install Anaconda. That gets you the Conda package and environment manager, which just makes life more pleasant, in my experience, and allows us to do this:
$ conda create -n shiny_new_env python=3.4
We've just created a new virtual environment and specified which version of Python will be installed to it! Now we can run $ conda env list to see a list of venvs available for us to play with:
$ conda env list
# conda environments:
#
base                  *  /Users/rob/anaconda3
Data_Sci                 /Users/rob/anaconda3/envs/Data_Sci
shiny_new_env            /Users/rob/anaconda3/envs/shiny_new_env
You'll notice the star indicating which environment we're in right now, the base environment. We can switch to our newly created virtual environment like so:
$ conda activate shiny_new_env
(shiny_new_env)$
Suddenly our venv's name appears in parentheses to the left of our terminal prompt ($). This means we're in our venv.
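If you ever lose track of which bubble you're in, Python can tell you. Here is a minimal sketch, assuming the CONDA_DEFAULT_ENV environment variable is set (which conda activate normally does). The helper name active_environment is hypothetical.

```python
import os
import sys


def active_environment():
    """Describe which Python environment the current interpreter belongs to."""
    return {
        # Set by `conda activate`; None when no Conda environment is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # True inside a standard-library venv, where the prefix differs
        # from the base installation's prefix.
        "in_venv": sys.prefix != sys.base_prefix,
        # Root directory of the environment this interpreter runs from.
        "prefix": sys.prefix,
    }


if __name__ == "__main__":
    for key, value in active_environment().items():
        print(f"{key}: {value}")
```

Inside shiny_new_env, the conda_env entry should read shiny_new_env and the prefix should point somewhere under anaconda3/envs.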
Now I can play the game of error-message ping-pong in this bubble world, just once, so that nobody else ever has to do it again to run our code. I'll pip-install all the packages our app requires in this as-of-yet empty place, then try another $ pip freeze > requirements.txt:
# write the file
(shiny_new_env)$ pip freeze > requirements.txt
# show the contents
(shiny_new_env)$ cat requirements.txt
certifi==2018.10.15
chardet==3.0.4
Click==7.0
dash==0.30.0
dash-core-components==0.38.0
dash-html-components==0.13.2
dash-renderer==0.15.0
decorator==4.3.0
Flask==1.0.2
Flask-Compress==1.4.0
Flask-SQLAlchemy==2.3.2
idna==2.7
ipython-genutils==0.2.0
itsdangerous==1.1.0
Jinja2==2.10
jsonschema==2.6.0
jupyter-core==4.4.0
MarkupSafe==1.1.0
nbformat==4.4.0
numpy==1.15.4
pandas==0.22.0
plotly==3.4.1
python-dateutil==2.7.5
pytz==2018.7
requests==2.20.1
retrying==1.3.3
six==1.11.0
SQLAlchemy==1.2.14
traitlets==4.3.2
urllib3==1.24.1
Werkzeug==0.14.1
Still a lot of dependencies, but better. This is just the stuff our run.py needs and nothing more. Now we can roll this requirements.txt file into our GitHub repo, and nobody else will have to go back-and-forth with their terminal installing all these dependencies manually. The correct versions of each package are safely tucked away for future reference. Hooray!
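As a sanity check, you could compare the pins in a requirements.txt against whatever is installed in the current environment. The sketch below assumes Python 3.8 or newer (for importlib.metadata) and only understands plain name==version lines like the ones pip freeze wrote above. The names parse_pins and check_pins are invented for this example.

```python
from importlib import metadata


def parse_pins(text):
    """Parse `name==version` lines into a dict, skipping blanks and comments."""
    pins = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def check_pins(pins):
    """Return {name: (wanted, installed)} for each pin that is missing or mismatched."""
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            problems[name] = (wanted, installed)
    return problems


if __name__ == "__main__":
    # Two lines copied from the requirements.txt shown above.
    sample = "Flask==1.0.2\nFlask-SQLAlchemy==2.3.2\n"
    for name, (wanted, installed) in check_pins(parse_pins(sample)).items():
        print(f"{name}: want {wanted}, have {installed or 'nothing installed'}")
```

An empty result means every pinned package is installed at exactly the pinned version.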
Let's pretend to be a new user and try to get set up in another completely new, empty virtual environment:
# Exit the current venv
(shiny_new_env)$ conda deactivate
# Spin up a new one
$ conda create -n env_2 python=3.4
# Activate it
$ conda activate env_2
# Install from our fancy new file
(env_2)$ pip install -r requirements.txt
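You might also try $ conda install --file requirements.txt if pip installation conflicts are giving you trouble, though Conda may not be able to find every package (or exact version) that was pinned from PyPI.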
Don't be a sadist. Always make a requirements.txt, and do it from inside a nice clean virtual environment. If you use a new virtual environment for every Python project you undertake, you will thank yourself later, when new versions of Python and its accompanying libraries are released.
Lastly, remember to breathe!
Florian Brand notes that Pipenv is a great tool for simultaneous virtual environment and dependency tree management. I have grown super fond of Pipenv for most things, but fall back on Conda when I get in over my head. Certain packages (ahem fbprophet) give me no end of trouble.
Sterling Paramore cites pip-tools as a great aid in compacting requirements lists and simplifying updates.
Rebecca Sichel points to pipdeptree and pipreqs. Check them out!