Namespaces - A Python Packages Predicament

2018-08-04 By Edgar Hassler

In python, a module is just a unit of scope - an envelope in which identifiers for variables and functions and hopes and dreams live their lives. Often this scope is constructed from a single file with python code in it, but it doesn't need to be (for example, a function maintains scope within it and could be made into a module with a little leg work, a module could be written in C so it's not like it has to be python per se, et cetera).

A package is a module that has a __path__ variable which tells python where to search for child modules/packages. Note that every package is a module but not every module is a package. The manual notes that every module with a __path__ attribute is a package.
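To make the distinction concrete, here is a small sketch that checks a few standard library modules for a __path__ attribute. The helper name is_package is my own invention for illustration, not something from the manual.

```python
import json   # a directory with an __init__.py, so a package
import math   # a compiled extension module
import os     # a single os.py file

def is_package(module):
    """A module counts as a package exactly when it carries a __path__."""
    return hasattr(module, "__path__")

for module in (json, os, math):
    print(module.__name__, is_package(module))
```

Only json should report True here, since it is the only one of the three that can have child modules.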

Modules and packages are great. If I'm writing code and I want to move some methods out that all interact with some REST API I can just make a new file rest_api.py and put them in there and then in my current code just

from rest_api import get_data, put_data

and it works. My code is much cleaner and easier to understand. I can take it a step further: I can make a directory rest_api with a file __init__.py in it. When that file exists in a directory, python treats the directory like a package where the contents of the module rest_api is whatever happens to be in its __init__.py. But I can also add rest_api/put.py and rest_api/get.py and access them via

from rest_api.put import put_data

and

from rest_api.get import get_data

respectively. We can even write a setup.py file to make our package installable and ship it to our friends and family.

And everything is good, except...

Sometimes it makes sense to use a package as a namespace. For example, we might have a REST package for a specific API, a cloud package for accessing storage services, and a configuration management package. It is desirable to have all of these packages start with the same name, e.g. coolguy.rest, coolguy.azure, and coolguy.config. Or to put it another way, we don't want to make one giant monolithic package, we want to spread out the love across many small packages so that we can load only what we need to get the job done.

Python supports namespace packages. In fact, it supports them three different ways! And these ways are often incompatible! The pkgutil one is oldest, superseded by pkg_resources, which was superseded by PEP 420. PEP 420 is python 3.3+ only, whereas the other two work in python 2 or 3.

Another "feature" of python is that there's (at least) 4 different ways to install a package. The setuptools package allows us to python setup.py install which installs a copy of the package into a site-packages folder, or python setup.py develop which installs a pointer in the site-packages folder back to where your code is (so that you can work on the package like a developer or whatever). There's also pip install ... and pip install -e ... which do respectively similar things. Depending on how you install the packages, the namespacing techniques may fail.

If you're keeping track at home, 4 installation methods times 3 methods of namespacing times 2 versions of python... so 24 possible ways to do it. Consider incompatibilities between methods and that's 2 times 12 choose 2 or 132 ways to fail! Some beautiful souls have put together a table of tests for the plausibly compatible cases at https://github.com/pypa/sample-namespace-packages/blob/master/table.md. I've reproduced their table in the Appendix here in case they pack up shop and head home.
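For the skeptical, the arithmetic can be checked by brute force. This sketch assumes my reading of the count: within each python version there are 12 install-by-namespace combinations, and a failure is an unordered pair of two different combinations.

```python
from itertools import combinations, product

install_methods = ["pip install", "pip install -e",
                   "setup.py install", "setup.py develop"]
namespace_styles = ["pkgutil", "pkg_resources", "pep420"]
pythons = ["python2", "python3"]

# Every (install method, namespace style, interpreter) triple.
ways = list(product(install_methods, namespace_styles, pythons))
print(len(ways))  # 24

# Within one interpreter: 12 combinations, taken two at a time.
per_python = list(product(install_methods, namespace_styles))
pairs = list(combinations(per_python, 2))
print(len(pythons) * len(pairs))  # 132
```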

If you're still not convinced it's a problem then imagine this scenario that happened to a good friend of mine: you have a legacy package that had dependencies which are part of the same namespace and you python setup.py develop the package to work on it. Python will install the dependencies in such a way that breaks your namespace. And to fix it you either have to manually install everything one-by-one or change all of your packages. You note that pip might be able to make it work but pip won't handle the switch from http to https on your legacy pypi server. You brood on it, it fills you with anger, you want to lash out but to what end?

What fun!

Investigating Import

There are three privileged modules that load automatically: __main__, sys, and builtins (see the manual for more information). I bring this up because I'm going to use the sys module but it'd be weird to just ignore how that module comes into existence before all the import machinery kicks off. It's because it's always there. If you walk across a beach and look down to see only one set of footprints in the sand, it was because __main__, sys, and builtins are carrying you.

A top level __path__ exists as sys.path that describes where to look for packages. If I import sys; print(sys.path); in python 2 I get:

[   '', 
    '/usr/local/lib/python27.zip', 
    '/usr/local/lib/python2.7', 
    '/usr/local/lib/python2.7/plat-linux2', 
    '/usr/local/lib/python2.7/lib-tk', 
    '/usr/local/lib/python2.7/lib-old',     
    '/usr/local/lib/python2.7/lib-dynload', 
    '/usr/local/lib/python2.7/site-packages']

and in python 3 I get:

['',
 '/usr/local/lib/python37.zip',
 '/usr/local/lib/python3.7',
 '/usr/local/lib/python3.7/lib-dynload',
 '/usr/local/lib/python3.7/site-packages']

This is instructive: the first place python looks for a module is the current directory, then inside a zip file, then in all kinds of directories. When you pip install ... or python setup.py install those go to site-packages.
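If you want to know where a particular module would come from without actually importing it, importlib.util.find_spec in python 3 will tell you. A minimal sketch follows; the module name no_such_module_hopefully is made up and assumed not to exist on your machine.

```python
import importlib.util

for name in ("sys", "json", "no_such_module_hopefully"):
    # find_spec returns None when nothing on the search path matches.
    spec = importlib.util.find_spec(name)
    origin = spec.origin if spec is not None else "<not found>"
    print("{:<26} {}".format(name, origin))
```

The origin for json should be a file somewhere under one of the sys.path entries listed above, while sys reports itself as built-in.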

Sidebar: Uninstalling a python package was not obvious to me. If you pip install … or python setup.py install you can try pip uninstall … to remove the package. If you do python setup.py develop you can follow with python setup.py develop --uninstall to remove the package from your distribution while leaving your package's files unmolested.

The __path__ variable is almost always a list of places to look. When python 3 goes looking for what you want to import it does so using meta path finder objects. From the manual,

When the named module is not found in sys.modules, Python next searches sys.meta_path, which contains a list of meta path finder objects. These finders are queried in order to see if they know how to handle the named module. Meta path finders must implement a method called find_spec() which takes three arguments: a name, an import path, and (optionally) a target module.

By default, the array of meta path finder objects is

[<class '_frozen_importlib.BuiltinImporter'>, <class '_frozen_importlib.FrozenImporter'>, <class '_frozen_importlib_external.PathFinder'>]

but we can do a fun thing and define a hook for this to watch things happen:

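import sys

class MetaHooker:
    # A meta path finder that reports each lookup and then declines it
    def find_spec(self, name, path, prev=None):
        print('Loading: {} on path {}'.format(repr(name), repr(path)))
        return None  # None tells python to ask the next finder

sys.meta_path.insert(0, MetaHooker())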

and now if we do import requests we get:

Loading: 'requests' on path None
Loading: 'urllib3' on path None
Loading: '__future__' on path None
Loading: 'warnings' on path None
Loading: 'urllib3.connectionpool' on path ['env/lib/python3.6/site-packages/urllib3']

... it goes on for a while here ...

Loading: 'requests.hooks' on path ['env/lib/python3.6/site-packages/requests']
Loading: 'requests.auth' on path ['env/lib/python3.6/site-packages/requests']
Loading: 'requests.status_codes' on path ['env/lib/python3.6/site-packages/requests']
Loading: 'requests.api' on path ['env/lib/python3.6/site-packages/requests']
Loading: 'requests.sessions' on path ['env/lib/python3.6/site-packages/requests']
Loading: 'requests.adapters' on path ['env/lib/python3.6/site-packages/requests']
Loading: 'urllib3.contrib.socks' on path ['env/lib/python3.6/site-packages/urllib3/contrib']
Loading: 'socks' on path None

In python 2 it's a little different but I'm going to ignore it for now because the difference is unimportant for this post. Returning to the requests package, once you import it, it will also be accessible via sys.modules['requests'], where it is mutable. The module will have a lot of properties, including a __path__ property. I guess I should call it a package since it has a __path__. Well, you get the idea.
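If you'd rather not leave a noisy hook installed forever, here is a variant sketch that records lookups in a list and removes itself afterwards. The class name RecordingFinder and the throwaway module hook_demo_module are hypothetical names of my own; the module is written to a temporary directory so the import is guaranteed not to be cached already.

```python
import importlib
import os
import sys
import tempfile

class RecordingFinder:
    """Meta path finder that notes every lookup and never claims a module."""
    def __init__(self):
        self.seen = []

    def find_spec(self, name, path, target=None):
        self.seen.append((name, path))
        return None  # decline, so the normal finders still do the work

# A throwaway module that cannot already be in sys.modules.
root = tempfile.mkdtemp()
with open(os.path.join(root, "hook_demo_module.py"), "w") as handle:
    handle.write("VALUE = 42\n")

finder = RecordingFinder()
sys.meta_path.insert(0, finder)
sys.path.insert(0, root)
importlib.invalidate_caches()
try:
    import hook_demo_module
finally:
    # Clean up so later imports are not affected.
    sys.meta_path.remove(finder)
    sys.path.remove(root)

print(finder.seen)
print(sys.modules["hook_demo_module"] is hook_demo_module)
```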

Most of the time you see import statements like this

import requests

and this puts a name requests that points to the same thing as sys.modules['requests'] does into your current scope. There is a less common variant

__import__('requests')

that loads and returns the module without adding anything to your namespace.
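One wrinkle worth knowing, shown in the sketch below: with a dotted name, __import__ hands back the top level package rather than the thing you named. The importlib.import_module function (not used elsewhere in this post) returns the leaf instead, which is usually what you wanted.

```python
import importlib
import sys

top = __import__("os.path")                 # returns the top level module, os
leaf = importlib.import_module("os.path")   # returns os.path itself

print(top is sys.modules["os"])
print(leaf is sys.modules["os.path"])
print(top.path is leaf)
```

All three lines should print True, since everything here is just another name for an entry in sys.modules.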

Abusing Imports

We can use this knowledge of __path__ to do some weird stuff with imports. Consider the following directory structure:

site1/
  __init__.py
  module1.py
site2/
  module2.py

and let __init__.py be:

import os
__path__ += ['{}/../site2'.format(os.path.dirname(__file__))]

Now, what would you guess would happen if I run python and do from site1 import module1, module2? It actually works. If we inspect the sys.modules we find:

>>> sys.modules['site1']
<module 'site1' from 'site1/__init__.py'>
>>> sys.modules['site1'].__path__
['site1', 'site1/../site2']

which is as I intended. Pretty weird, right? So, let's look at how it actually happened. When I use my MetaHooker in python 3 and perform from site1 import module1, module2 I get:

Loading: 'site1' on path None
Loading: 'site1.module1' on path ['site1', 'site1/../site2']
Loading: 'site1.module2' on path ['site1', 'site1/../site2']

Here we see it not finding site1 in sys.modules and so it goes searching in sys.path. It finds the site1 directory with its __init__.py in there and loads it. Then it goes to load the children, but by now the __init__.py has provided our weird search path so it easily finds module1 and module2.
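If you'd like to try the trick above without littering your working directory, this sketch rebuilds the same layout in a temporary directory. The NAME constants inside the modules are my own addition so there is something to print, and the write helper is hypothetical.

```python
import importlib
import os
import sys
import tempfile
import textwrap

root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "site1"))
os.makedirs(os.path.join(root, "site2"))

def write(relpath, source):
    """Write dedented source text to a file under the temporary root."""
    with open(os.path.join(root, relpath), "w") as handle:
        handle.write(textwrap.dedent(source))

# site1/__init__.py extends its own search path to include ../site2.
write("site1/__init__.py", """\
    import os
    __path__ += [os.path.join(os.path.dirname(__file__), os.pardir, "site2")]
""")
write("site1/module1.py", "NAME = 'module1'\n")
write("site2/module2.py", "NAME = 'module2'\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

from site1 import module1, module2
print(module1.NAME, module2.NAME)
print(sys.modules["site1"].__path__)
```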

Let's get crazy-bananas! I'll now add a folder site3 with a module3.py in it, and from python do

import sys
sys.modules['site1'].__path__ += ['site3']
from site1 import module3

and it works. Such is the importance of __path__. But there are better ways to spread package files across several places, which we shall get to shortly.

Breaking Imports

So what happens when we install two packages that share a namespace without explicitly doing something to fix it? Consider the following directory structure where # is 1 or 2

pkg#/
    setup.py
    pkg/
        __init__.py
        module#.py

and where setup.py is

from setuptools import setup
setup(
    name='pkg',
    version='1',
    packages=['pkg']
)

When I go to pkg1 and do python setup.py install a pkg-1.0-py3.6.egg directory is made in env/lib/python3.6/site-packages which is a member of sys.path and thus is searched when we try to do an import. I can then do from pkg import module1 but from pkg import module2 will fail because it's not installed. If I go to the pkg2 directory next and python setup.py install then python makes noises about the package already being installed (recall that both have the same name and version) and then I can do from pkg import module2 but from pkg import module1 will fail!

What's happening is the package is defined by the one __path__ coming from the last installed version, which ignores the other one. It's here that the need for namespace packages arises. Let's look at the three different methods (pkgutil, pkg_resources, and PEP 420) to gain an appreciation for PEP 420. Note that PEP 420 is accepted as the way to do namespace packages as of python 3.3, so it's sometimes referred to as the native method.
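The same shadowing can be seen without installing anything. The sketch below does not reproduce the egg install; it just puts two directories on sys.path that each contain a regular package of the same name (shadowed_pkg, a made-up name) and shows that only the first one is ever searched.

```python
import importlib
import os
import sys
import tempfile

def make_dist(number):
    """Create <tmp>/shadowed_pkg/ with an empty __init__.py and module<number>.py."""
    root = tempfile.mkdtemp()
    package_dir = os.path.join(root, "shadowed_pkg")
    os.makedirs(package_dir)
    open(os.path.join(package_dir, "__init__.py"), "w").close()
    with open(os.path.join(package_dir, "module{}.py".format(number)), "w") as handle:
        handle.write("NUMBER = {}\n".format(number))
    return root

first, second = make_dist(1), make_dist(2)
sys.path[:0] = [first, second]
importlib.invalidate_caches()

from shadowed_pkg import module1  # lives in the first directory, so this works

try:
    from shadowed_pkg import module2  # lives in the second directory
    found_module2 = True
except ImportError:
    found_module2 = False

import shadowed_pkg
print(shadowed_pkg.__path__)
print(found_module2)
```

The __path__ has a single entry, the copy under the first directory, so module2 is never looked for in the second.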

pkgutil Method

This method essentially just adds to the __path__ in the __init__.py so that the children are all visible. Each __init__.py that is for a namespace needs to have

__path__ = __import__('pkgutil').extend_path(__path__, __name__)

and then the setup.py just needs to not be zip_safe:

from setuptools import setup

setup(
    name='pkg.moduleN',
    version='1',
    packages=['pkg.moduleN'],
    zip_safe=False
)

From python's packaging namespace packages guide:

pkgutil-style namespace packages. This is recommended for new packages that need to support Python 2 and 3 and installation via both pip and python setup.py install.
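Here is a sketch of the pkgutil method working on two plain directories rather than two installed distributions, which I assume is close enough to show the mechanism. The package name ns_pkgutil and the helper make_portion are hypothetical.

```python
import importlib
import os
import sys
import tempfile

INIT_LINE = "__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n"

def make_portion(number):
    """Create <tmp>/ns_pkgutil/ with the pkgutil __init__.py and module<number>.py."""
    root = tempfile.mkdtemp()
    package_dir = os.path.join(root, "ns_pkgutil")
    os.makedirs(package_dir)
    with open(os.path.join(package_dir, "__init__.py"), "w") as handle:
        handle.write(INIT_LINE)
    with open(os.path.join(package_dir, "module{}.py".format(number)), "w") as handle:
        handle.write("NUMBER = {}\n".format(number))
    return root

first, second = make_portion(1), make_portion(2)
sys.path[:0] = [first, second]
importlib.invalidate_caches()

# extend_path scans sys.path and appends every other ns_pkgutil directory.
from ns_pkgutil import module1, module2
import ns_pkgutil

print(module1.NUMBER, module2.NUMBER)
print(ns_pkgutil.__path__)
```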

pkg_resources Method

Here, each __init__.py for a namespace should have

__import__('pkg_resources').declare_namespace(__name__)

and then the setup.py just needs to indicate the namespace elements:

from setuptools import setup

setup(
    name='pkg.moduleN',
    version='1',
    packages=['pkg.moduleN'],
    namespace_packages=['pkg'],
    zip_safe=False
)

As noted later, this method is least compatible with the other methods and I've seen recommendations against using it. Note that I've personally seen legacy code that follows a historical recommendation to do the following:

try:
    __import__('pkg_resources').declare_namespace(__name__)
except ImportError:
    __path__ = __import__('pkgutil').extend_path(__path__, __name__)

As stated in the python docs, this is a terrible fucking idea.

From python's packaging namespace packages guide:

pkg_resources-style namespace packages. This method is recommended if you need compatibility with packages already using this method or if your package needs to be zip-safe.

Avoid this method. Scold those who use it.

PEP 420 - The Native Method

For PEP 420 style namespaces you need to not include the __init__.py file in the namespace directory, and instead just be sure to indicate the package in the setup.py file.

from setuptools import setup

setup(
    name='pkg.moduleN',
    version='1',
    packages=['pkg.moduleN'],
    zip_safe=False
)

The downside of this method is that you need to explicitly list the packages in the packages list. For the other two methods people like to import the find_packages function from setuptools that will list every package based on scanning for __init__.py files, but that obviously doesn't work here, and honestly using that function with namespaces is probably a bad idea because it will see your __init__.py files from the two methods above and add them to your packages list (which could make your namespace less robust to a stray empty __init__.py file).

From python's packaging namespace packages guide:

Native namespace packages. This type of namespace package is defined in PEP 420 and is available in Python 3.3 and later. This is recommended if packages in your namespace only ever need to support Python 3 and installation via pip.

That's not to say that PEP 420 is incompatible with python setup.py install methods in modern python. Actually I don't really know why "and installation via pip" is in there. ¯\_(ツ)_/¯

Earlier in this post I mentioned that __path__ variables were almost always lists, and here's the exception. Instead of a list you get an object like this:

_NamespacePath(['env/lib/python3.6/site-packages/pkg'])

and instead of a normal module like either:

<module 'something' from '/somepath/__init__.py'>
<module 'something' (built-in)>

you get one that is specifically labeled as a namespace like this:

<module 'example_pkg' (namespace)>
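The native method can be sketched the same way, with two plain directories on sys.path standing in for two installed distributions (an assumption on my part; the package name ns_native and the helper make_portion are hypothetical). Note the complete absence of __init__.py files. This needs python 3.3 or later.

```python
import importlib
import os
import sys
import tempfile

def make_portion(number):
    """Create <tmp>/ns_native/module<number>.py with no __init__.py anywhere."""
    root = tempfile.mkdtemp()
    package_dir = os.path.join(root, "ns_native")
    os.makedirs(package_dir)
    with open(os.path.join(package_dir, "module{}.py".format(number)), "w") as handle:
        handle.write("NUMBER = {}\n".format(number))
    return root

first, second = make_portion(1), make_portion(2)
sys.path[:0] = [first, second]
importlib.invalidate_caches()

from ns_native import module1, module2
import ns_native

print(module1.NUMBER, module2.NUMBER)
print(type(ns_native.__path__).__name__)  # not a plain list
print(list(ns_native.__path__))           # one entry per portion
print(ns_native)                          # repr mentions that it is a namespace
```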

Comparison

For all three methods, the structure of the modules in the site-packages is almost identical, with all of the namespace parts combined into a single folder. Each separate package still retains an egg-info, but only pkgutil has an __init__.py for the namespaced package. Personally, I prefer the style of PEP 420. It's got a joie de vivre the others do not possess. It seems like the best thing to do is use pkgutil when python 2 compatibility is required and then use PEP 420 when using python 3. The table shows that pkgutil interacts favorably with PEP 420 in python 3. Just, please, for the love of God, don't use pkg_resources.

For all three namespace package methods in python 3 I did a test where I made one of the packages have an empty __init__.py in one of the namespace packages (and for pkg_resources I removed the namespace_packages argument from the respective setup.py as well). Surprisingly, all three were robust to this error and managed to properly load the namespace packages. More work could be done to determine under what conditions these errors cause the namespace package to fail, but that's sort of daunting given how many ways there are to do things.

Appendix: Incompatibility Table

Note: This is from https://github.com/pypa/sample-namespace-packages/blob/master/table.md.

Tool Version(s)
python 2.7.15, 3.6.5
setuptools 39.1.0
pip 10.0.1
wheel 0.31.0

Type Interpreter Package A command Package B command Status
cross_pep420_pkgutil python2 pip install . pip install .
cross_pep420_pkgutil python2 pip install . pip install -e .
cross_pep420_pkgutil python2 pip install . python setup.py install
cross_pep420_pkgutil python2 pip install . python setup.py develop
cross_pep420_pkgutil python2 pip install -e . pip install .
cross_pep420_pkgutil python2 pip install -e . pip install -e .
cross_pep420_pkgutil python2 pip install -e . python setup.py install
cross_pep420_pkgutil python2 pip install -e . python setup.py develop
cross_pep420_pkgutil python2 python setup.py install pip install .
cross_pep420_pkgutil python2 python setup.py install pip install -e .
cross_pep420_pkgutil python2 python setup.py install python setup.py install
cross_pep420_pkgutil python2 python setup.py install python setup.py develop
cross_pep420_pkgutil python2 python setup.py develop pip install .
cross_pep420_pkgutil python2 python setup.py develop pip install -e .
cross_pep420_pkgutil python2 python setup.py develop python setup.py install
cross_pep420_pkgutil python2 python setup.py develop python setup.py develop
cross_pep420_pkgutil python3 pip install . pip install .
cross_pep420_pkgutil python3 pip install . pip install -e .
cross_pep420_pkgutil python3 pip install . python setup.py install
cross_pep420_pkgutil python3 pip install . python setup.py develop
cross_pep420_pkgutil python3 pip install -e . pip install .
cross_pep420_pkgutil python3 pip install -e . pip install -e .
cross_pep420_pkgutil python3 pip install -e . python setup.py install
cross_pep420_pkgutil python3 pip install -e . python setup.py develop
cross_pep420_pkgutil python3 python setup.py install pip install .
cross_pep420_pkgutil python3 python setup.py install pip install -e .
cross_pep420_pkgutil python3 python setup.py install python setup.py install
cross_pep420_pkgutil python3 python setup.py install python setup.py develop
cross_pep420_pkgutil python3 python setup.py develop pip install .
cross_pep420_pkgutil python3 python setup.py develop pip install -e .
cross_pep420_pkgutil python3 python setup.py develop python setup.py install
cross_pep420_pkgutil python3 python setup.py develop python setup.py develop
cross_pkg_resources_pkgutil python2 pip install . pip install .
cross_pkg_resources_pkgutil python2 pip install . pip install -e .
cross_pkg_resources_pkgutil python2 pip install . python setup.py install
cross_pkg_resources_pkgutil python2 pip install . python setup.py develop
cross_pkg_resources_pkgutil python2 pip install -e . pip install .
cross_pkg_resources_pkgutil python2 pip install -e . pip install -e .
cross_pkg_resources_pkgutil python2 pip install -e . python setup.py install
cross_pkg_resources_pkgutil python2 pip install -e . python setup.py develop
cross_pkg_resources_pkgutil python2 python setup.py install pip install .
cross_pkg_resources_pkgutil python2 python setup.py install pip install -e .
cross_pkg_resources_pkgutil python2 python setup.py install python setup.py install
cross_pkg_resources_pkgutil python2 python setup.py install python setup.py develop
cross_pkg_resources_pkgutil python2 python setup.py develop pip install .
cross_pkg_resources_pkgutil python2 python setup.py develop pip install -e .
cross_pkg_resources_pkgutil python2 python setup.py develop python setup.py install
cross_pkg_resources_pkgutil python2 python setup.py develop python setup.py develop
cross_pkg_resources_pkgutil python3 pip install . pip install .
cross_pkg_resources_pkgutil python3 pip install . pip install -e .
cross_pkg_resources_pkgutil python3 pip install . python setup.py install
cross_pkg_resources_pkgutil python3 pip install . python setup.py develop
cross_pkg_resources_pkgutil python3 pip install -e . pip install .
cross_pkg_resources_pkgutil python3 pip install -e . pip install -e .
cross_pkg_resources_pkgutil python3 pip install -e . python setup.py install
cross_pkg_resources_pkgutil python3 pip install -e . python setup.py develop
cross_pkg_resources_pkgutil python3 python setup.py install pip install .
cross_pkg_resources_pkgutil python3 python setup.py install pip install -e .
cross_pkg_resources_pkgutil python3 python setup.py install python setup.py install
cross_pkg_resources_pkgutil python3 python setup.py install python setup.py develop
cross_pkg_resources_pkgutil python3 python setup.py develop pip install .
cross_pkg_resources_pkgutil python3 python setup.py develop pip install -e .
cross_pkg_resources_pkgutil python3 python setup.py develop python setup.py install
cross_pkg_resources_pkgutil python3 python setup.py develop python setup.py develop
pep420 python2 pip install . pip install .
pep420 python2 pip install . pip install -e .
pep420 python2 pip install . python setup.py install
pep420 python2 pip install . python setup.py develop
pep420 python2 pip install -e . pip install .
pep420 python2 pip install -e . pip install -e .
pep420 python2 pip install -e . python setup.py install
pep420 python2 pip install -e . python setup.py develop
pep420 python2 python setup.py install pip install .
pep420 python2 python setup.py install pip install -e .
pep420 python2 python setup.py install python setup.py install
pep420 python2 python setup.py install python setup.py develop
pep420 python2 python setup.py develop pip install .
pep420 python2 python setup.py develop pip install -e .
pep420 python2 python setup.py develop python setup.py install
pep420 python2 python setup.py develop python setup.py develop
pep420 python3 pip install . pip install .
pep420 python3 pip install . pip install -e .
pep420 python3 pip install . python setup.py install
pep420 python3 pip install . python setup.py develop
pep420 python3 pip install -e . pip install .
pep420 python3 pip install -e . pip install -e .
pep420 python3 pip install -e . python setup.py install
pep420 python3 pip install -e . python setup.py develop
pep420 python3 python setup.py install pip install .
pep420 python3 python setup.py install pip install -e .
pep420 python3 python setup.py install python setup.py install
pep420 python3 python setup.py install python setup.py develop
pep420 python3 python setup.py develop pip install .
pep420 python3 python setup.py develop pip install -e .
pep420 python3 python setup.py develop python setup.py install
pep420 python3 python setup.py develop python setup.py develop
pkg_resources python2 pip install . pip install .
pkg_resources python2 pip install . pip install -e .
pkg_resources python2 pip install . python setup.py install
pkg_resources python2 pip install . python setup.py develop
pkg_resources python2 pip install -e . pip install .
pkg_resources python2 pip install -e . pip install -e .
pkg_resources python2 pip install -e . python setup.py install
pkg_resources python2 pip install -e . python setup.py develop
pkg_resources python2 python setup.py install pip install .
pkg_resources python2 python setup.py install pip install -e .
pkg_resources python2 python setup.py install python setup.py install
pkg_resources python2 python setup.py install python setup.py develop
pkg_resources python2 python setup.py develop pip install .
pkg_resources python2 python setup.py develop pip install -e .
pkg_resources python2 python setup.py develop python setup.py install
pkg_resources python2 python setup.py develop python setup.py develop
pkg_resources python3 pip install . pip install .
pkg_resources python3 pip install . pip install -e .
pkg_resources python3 pip install . python setup.py install
pkg_resources python3 pip install . python setup.py develop
pkg_resources python3 pip install -e . pip install .
pkg_resources python3 pip install -e . pip install -e .
pkg_resources python3 pip install -e . python setup.py install
pkg_resources python3 pip install -e . python setup.py develop
pkg_resources python3 python setup.py install pip install .
pkg_resources python3 python setup.py install pip install -e .
pkg_resources python3 python setup.py install python setup.py install
pkg_resources python3 python setup.py install python setup.py develop
pkg_resources python3 python setup.py develop pip install .
pkg_resources python3 python setup.py develop pip install -e .
pkg_resources python3 python setup.py develop python setup.py install
pkg_resources python3 python setup.py develop python setup.py develop
pkgutil python2 pip install . pip install .
pkgutil python2 pip install . pip install -e .
pkgutil python2 pip install . python setup.py install
pkgutil python2 pip install . python setup.py develop
pkgutil python2 pip install -e . pip install .
pkgutil python2 pip install -e . pip install -e .
pkgutil python2 pip install -e . python setup.py install
pkgutil python2 pip install -e . python setup.py develop
pkgutil python2 python setup.py install pip install .
pkgutil python2 python setup.py install pip install -e .
pkgutil python2 python setup.py install python setup.py install
pkgutil python2 python setup.py install python setup.py develop
pkgutil python2 python setup.py develop pip install .
pkgutil python2 python setup.py develop pip install -e .
pkgutil python2 python setup.py develop python setup.py install
pkgutil python2 python setup.py develop python setup.py develop
pkgutil python3 pip install . pip install .
pkgutil python3 pip install . pip install -e .
pkgutil python3 pip install . python setup.py install
pkgutil python3 pip install . python setup.py develop
pkgutil python3 pip install -e . pip install .
pkgutil python3 pip install -e . pip install -e .
pkgutil python3 pip install -e . python setup.py install
pkgutil python3 pip install -e . python setup.py develop
pkgutil python3 python setup.py install pip install .
pkgutil python3 python setup.py install pip install -e .
pkgutil python3 python setup.py install python setup.py install
pkgutil python3 python setup.py install python setup.py develop
pkgutil python3 python setup.py develop pip install .
pkgutil python3 python setup.py develop pip install -e .
pkgutil python3 python setup.py develop python setup.py install
pkgutil python3 python setup.py develop python setup.py develop