Running scripts in temporary conda environments with conda execute

2015

October

3

Tags: conda (2), Python (10)

Conda is awesome - it is a simple package manager which allows me to create isolated software environments much like virtualenv. Unlike virtualenv though it can handle any package type, not just python ones.

The more I use it, the more I want to make use of conda's dependency tracking for my own simple scripts to ensure they were being executed in a suitable environment with the expected dependencies already installed.

conda build is an excellent tool for building your own distributions and sharing them on anaconda.org, but creating a distribution is tiresome if all you have is a single script, rather than a fully-fledged software package. That is where conda execute comes in.

conda execute allows you to run a script of any kind in a temporary environment defined by metadata in the script itself.

For example, take the following Python script:

1
2
3
4
5
#!/usr/bin/env python

import numpy as np

print(np.random.poisson(size=5))

By adding appropriate conda execute metadata to our script, we can describe the kind of environment we would need to be able to run this code:

$ cat my_script.py

#!/usr/bin/env python

# conda execute
# env:
#  - python >=3
#  - numpy

import numpy as np

print(np.random.poisson(size=5))

conda execute can now be used to run this script:

$ conda execute -v my_script.py

Using specification: 
env: [python >=3, numpy]
run_with: [/usr/bin/env, python]

Prefix: /Users/pelson/miniconda/tmp_envs/ea977067a8fbeb21a594

[1 2 0 1 0]

As you can see, the special comment in the script has been read, and an appropriate temporary environment has been created.

Temporary environments

In order to provision suitable environments for the executed scripts, conda-execute implements a temporary environment concept. Rather than adding a new environment in your conda environments directory (and thus filling up the available environments listed in conda env list), a new "tmp_envs" environments directory has been created, within which conda-execute's temporary environments are created (this location is configurable with the conda-execute/env-dir conda config item). As you may have noticed in the previous example, where the environment created was named ea977067a8fbeb21a594, these temporary environments are named by a hashing algorithm (SHA 256, truncated to 20 characters). The hash is taken from the conda-execute metadata of your script, which means that you can re-run a script many times and only need one environment to be created. Additionally, it has the advantage that multiple scripts can share the same environment if their conda-execute metadata is the same.

Each time a temporary environment is run with conda-execute a log entry is added, allowing it to keep track of which environments are still in use. Once an environment has been unused for 25 hours any subsequent conda-execute call will trigger it to be garbage collected, thus preventing your disk filling up with unneeded temporary environments.

Configurability

conda-execute builds on top of conda's configuration to allow some customisation in behaviour. The following condarc shows the configuration options that are available:

conda-execute:
    # The directory to use to hold the temporary environments.
    env-dir: "{config.envs_dirs[0]}/../tmp_envs"

    # The number of hours that an environment should be unused for to be
    # considered for garbage collection.
    remove-if-unused-for: 25

Reproducibility of scripts with conda-execute

There are a few really interesting use-cases which I'm keen to explore with conda-execute.

I'm already making use of conda-execute as a form of Makefile for this blog. My make.py is simply a command line wrapper to the appropriate pelican sub-command:

$> ./make.py --help
usage: Help [-h] {html,publish,reload} ...

positional arguments:
  {html,publish,reload}
    html                Make the html
    publish             Make publishable html, and put it in the
                        output_branch.
    reload              Make the html, and watch the folder for any changes.

optional arguments:
  -h, --help            show this help message and exit

The concept of creating reproducible scripts goes far wider than trivial Makefiles though - with conda-execute, because the metadata in the script is the definition of the execution environment, important information about its dependencies and how it is run are all embedded into the script itself.

I'm particularly keen to explore the reproducibility angle that conda-execute brings, particularly for scientific applications.

conda-execute can be found at github.com/pelson/conda-execute, and installed with conda install conda-execute --channel conda-forge.