Building a matrix of conda distributions with conda-build-all

2015

December

9

Tags: conda (2)

Introducing conda-build-all, a tool which extends conda-build to provide powerful build matrix capabilities.

Repositories such as conda-forge/stages-recipes, SciTools/conda-recipes-scitools and ioos/conda-recipes exist to provide a set of conda recipes, and ultimately, channels from which users can access the product of conda-build-ing those recipes. The build phase of all of these repositories looks very similar: a tool (ObviousCI) computes the build matrix, builds those distributions which haven't already been built, and then uploads them to their respective channels. The functionality is tried and tested, and has been powering these repositories for over a year with huge success, however, I recently had need to use this functionality without wanting to upload the built distributions to conda.anaconda.org and found the tool didn't quite fit the bill. Additionally, having originally cobbled together ObviousCI with string and sticky-tape to prove the concept of a continuously integrated repo of recipes, I didn't have huge confidence in its ability to function between python/conda/conda-build upgrades.

As a result, I have re-factored the build part of ObviousCI into a general purpose library which can now be used for the original "conda recipe repository" use-case as much as it can for the ad-hoc "just build this" use-case. Critically, the most significant part of this re-factoring was adding a huge array of unit and integration tests which can be used to ensure expected behaviour is unchanged through future dependency version upgrades.

The new CLI is conda-build-all (BSD license) and is developed at SciTools/conda-build-all.

The build matrix

So what does a build matrix actually look like? Let's jump in at the deep-end and look at a package which has a NumPy C-API dependency. Whilst NumPy's ABI is (intended to be) forward-compatible, in practice it is safer to compile against a specific version and "pin" the distribution to that version. Essentially, that means we need to build our recipe N times, where N is the number of NumPy versions we wish to support. Of course, the same is true for Python itself, leading to a permutation problem of up to NxM builds (N: number of supported NumPy versions; M: number of supported Python versions).

The current conda recipe form for such a package looks like:

package:
    name: my_library
    version: 1.0
requirements:
    build:
        - python
        - numpy x.x
    run:
        - python
        - numpy x.x

Whilst I believe there is room for improvement in the recipe definition, it is still pretty easy to define a complex set of build- and run-time dependencies.

With the existing conda-build tool, should we want to build this for Python 2.7, 3.4 and 3.5, and against NumPy 1.9 and 1.10 (the latest versions of these libraries at the time of writing), things can get a little tedious:

CONDA_PY=27 CONDA_NPY=19 conda build my_library
CONDA_PY=34 CONDA_NPY=19 conda build my_library
CONDA_PY=35 CONDA_NPY=19 conda build my_library
CONDA_PY=27 CONDA_NPY=110 conda build my_library
CONDA_PY=34 CONDA_NPY=110 conda build my_library
CONDA_PY=35 CONDA_NPY=110 conda build my_library

With conda-build-all the special environment variables are taken care of for you (and importantly there is future scope to generalise beyond Python & NumPy) and a build matrix is computed:

$ conda-build-all my_library
Resolving distributions from 1 recipes... 
Computed that there are 7 distributions from the 1 recipes:
Resolved dependencies, will be built in the following order: 
    my_library-1.0-np19py26_0 (will be built: True)
    my_library-1.0-np110py27_0 (will be built: True)
    my_library-1.0-np19py27_0 (will be built: True)
    my_library-1.0-np110py34_0 (will be built: True)
    my_library-1.0-np19py34_0 (will be built: True)
    my_library-1.0-np110py35_0 (will be built: True)
    my_library-1.0-np19py35_0 (will be built: True)

Notice how this command is not conceptually equivalent to the original conda-build calls as I have not asked for particular versions to build against. conda-build-all has chosen the top two major versions and within those, the top two minor versions of the packages which require "pinning". Unfortunately, that included Python 2.6, which I didn't really want - to resolve that, we can add extra conditions to our build:

$ conda-build-all my_library --matrix-conditions "python >=2.7"
Fetching package metadata: ........
Resolving distributions from 1 recipes... 
Computed that there are 6 distributions from the 1 recipes:
Resolved dependencies, will be built in the following order: 
    my_library-1.0-np110py27_0 (will be built: True)
    my_library-1.0-np19py27_0 (will be built: True)
    my_library-1.0-np110py34_0 (will be built: True)
    my_library-1.0-np19py34_0 (will be built: True)
    my_library-1.0-np110py35_0 (will be built: True)
    my_library-1.0-np19py35_0 (will be built: True)

We now have functionally equivalent behaviour that will move forwards as new Python and NumPy versions become available.

Building multiple recipes in a single call

conda-build-all knows what a conda recipe looks like, and will traverse the directories you give it to look for things to build.

Supposing we have a directory of recipes which we wish to build, such as the following:

$ find * -name meta.yaml -exec sh -c "echo RECIPE: {}; cat {}; echo" \;
RECIPE: my_recipes_directory/my_library/meta.yaml
package:
    name: my_library
    version: 1.0
requirements:
    build:
        - python
        - numpy x.x
    run:
        - python
        - numpy x.x

RECIPE: my_recipes_directory/my_other_library/meta.yaml
package:
    name: my_other_library
    version: 1.0
requirements:
    build:
        - python
        - numpy x.x
    run:
        - python
        - numpy x.x

We can simply call conda-build-all on the directory of recipes to have them built appropriately:

$ conda-build-all my_recipes_directory --matrix-conditions "python 2.7.*|3.5.*"
Fetching package metadata: ........
Resolving distributions from 2 recipes... 
Computed that there are 8 distributions from the 2 recipes:
Resolved dependencies, will be built in the following order: 
    my_library-1.0-np110py27_0 (will be built: True)
    my_library-1.0-np19py27_0 (will be built: True)
    my_library-1.0-np110py35_0 (will be built: True)
    my_library-1.0-np19py35_0 (will be built: True)
    my_other_library-1.0-np110py27_0 (will be built: True)
    my_other_library-1.0-np19py27_0 (will be built: True)
    my_other_library-1.0-np110py35_0 (will be built: True)
    my_other_library-1.0-np19py35_0 (will be built: True)

This functionality becomes invaluable when we wish to build many packages, such is the case for the conda-recipes repositories mentioned earlier.

Only building the missing distributions

The build matrix is supremely useful, but it does come at the cost of the extra time needed to build the many distributions. With repositories full of recipes, it is easy to come to hundreds of build matrix items. If we want to be able to run conda-build-all on a regular basis, we can't reasonably expect to build each of those items each time. Therefore, conda-build-all has the ability to inspect various locations to determine if a distribution has already been built. In fact, the default behaviour is to inspect the local conda-build directory to determine if a distribution has already been built locally. Other options include the ability to inspect conda channels as well as arbitrary local directories. Supposing we wanted the pelson/channel/testing channel to have all of the built distributions from my_recipes_directory, we can use conda-build-all to good effect:

conda-build-all my_recipes_directory/ --matrix-conditions "python 2.7.*|3.5.*" \
    --inspect-channels "pelson/channel/testing" \
    --upload-channels "pelson/channel/testing" \
    --no-inspect-conda-bld-directory

Summary

conda-build-all is a tool which builds on top of conda-build to give powerful build-matrix options when building conda distributions. It has come from ObviousCI, whose primary objective was to simplify the build and upload of many recipes in a Continuous Integration environment. In migrating the code base from ObviousCI several new test strategies have been developed - making conda-build-all easier to maintain, and giving rise to the possibility of improving the conda and conda-build test suites themselves.