# Python Package Importing
It turns out that importing packages installed in editable mode can break depending on how the packages were defined. It’s the kind of bug that, after driving one insane, makes one write about it to hopefully save anyone else the same trouble.
In python, a module is just a unit of scope - an envelope in which identifiers for variables and functions and hopes and dreams live their lives. Often this scope is constructed from a single file with python code in it, but it doesn’t need to be (for example, a function maintains scope within it and could be made into a module with a little leg work, a module could be written in C code so it’s not like it has to be python per se, et cetera).
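To make that concrete, here’s a minimal sketch (the `hopes_and_dreams` name is mine, nothing standard) showing that a module is really just a namespace object the import system knows about:

```python
import sys
import types

# Conjure a module out of thin air: no file on disk required.
mod = types.ModuleType('hopes_and_dreams')
mod.answer = 42  # identifiers live their lives in the new scope

# Register it with the import system...
sys.modules['hopes_and_dreams'] = mod

# ...and now it imports like any other module.
import hopes_and_dreams
print(hopes_and_dreams.answer)  # 42
```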
A package is a module that has a `__path__` attribute, which tells python where to search for child modules/packages. Note that every package is a module but not every module is a package. The manual notes that every module with a `__path__` attribute is a package.
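A quick way to see the distinction, using two members of the standard library:

```python
import json  # a package: a directory in the stdlib with an __init__.py
import math  # a plain (built-in) module

print(hasattr(math, '__path__'))  # False: math is not a package
print(json.__path__)              # e.g. ['/usr/local/lib/python3.7/json']
```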
Modules and packages are great. If I’m writing code and I want to move out some methods that all interact with some REST API, I can just make a new file `rest_api.py`, put them in there, and then in my current code just

```python
from rest_api import get_data, put_data
```

and it works. My code is much cleaner and easier to understand. I can take it a step further: I can make a directory `rest_api` with a file `__init__.py` in it. When that file exists in a directory, python treats the directory like a package, where the contents of the module `rest_api` are whatever happens to be in its `__init__.py`. But I can also add `rest_api/put.py` and `rest_api/get.py` and access them via

```python
from rest_api.put import put_data
from rest_api.get import get_data
```

respectively. We can even write a `setup.py` file to make our package installable and ship it to our friends and family.
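For reference, a minimal `setup.py` for that layout might look something like this (a sketch; the metadata is illustrative, and we’ll poke at the details later):

```python
from setuptools import setup

setup(
    name='rest_api',   # illustrative name and version
    version='1',
    packages=['rest_api']
)
```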
And everything is good, except…
Sometimes it makes sense to use a package as a namespace. For example, we might have a REST package for a specific API, a cloud package for accessing storage services, and a configuration management package. It is desirable to have all of these packages start with the same name, e.g. `coolguy.rest`, `coolguy.azure`, and `coolguy.config`. Or to put it another way, we don’t want to make one giant monolithic package, we want to spread out the love across many small packages so that we can load only what we need to get the job done.
Python supports namespace packages. In fact, it supports them three different ways! And these ways are often incompatible! The `pkgutil` one is the oldest, superseded by `pkg_resources`, which was superseded by PEP 420. PEP 420 is python 3.3+ only, whereas the other two work in python 2 or 3.
Another “feature” of python is that there are (at least) 4 different ways to install a package. The `setuptools` package allows us to `python setup.py install`, which installs a copy of the package into a `site-packages` folder, or `python setup.py develop`, which installs a pointer in the `site-packages` folder back to where your code is (so that you can work on the package like a developer or whatever). There’s also `pip install ...` and `pip install -e ...`, which do respectively similar things. Depending on how you install the packages, the namespacing techniques may fail.
If you’re keeping track at home, 4 installation methods times 3 methods of namespacing times 2 versions of python… so 24 possible ways to do it. Consider incompatibilities between pairs of methods and that’s 2 × (12 choose 2) = 132 ways to fail! Some beautiful souls have put together a table of tests for the plausibly compatible cases at https://github.com/pypa/sample-namespace-packages/blob/master/table.md. I’ve reproduced their table in the Appendix here in case they pack up shop and head home.
If you’re still not convinced it’s a problem then imagine this scenario that happened to a good friend of mine: you have a legacy package that had dependencies which are part of the same namespace, and you `python setup.py develop` the package to work on it. Python will install the dependencies in such a way that breaks your namespace. And to fix it you either have to manually install everything one-by-one or change all of your packages. You note that pip might be able to make it work, but pip won’t handle the switch from http to https on your legacy pypi server. You brood on it, it fills you with anger, you want to lash out but to what end?
What fun!
## Investigating Import
There are three privileged modules that load automatically: `__main__`, `sys`, and `builtins` (see the manual for more information). I bring this up because I’m going to use the `sys` module, but it’d be weird to just ignore how that module comes into existence before all the import machinery kicks off. It’s because it’s always there. If you walk across a beach and look down to see only one set of footprints in the sand, it’s because `__main__`, `sys`, and `builtins` are carrying you.
A top-level `__path__` exists as `sys.path`, which describes where to look for packages. If I `import sys; print(sys.path)` in python 2 I get:
```python
['',
 '/usr/local/lib/python27.zip',
 '/usr/local/lib/python2.7',
 '/usr/local/lib/python2.7/plat-linux2',
 '/usr/local/lib/python2.7/lib-tk',
 '/usr/local/lib/python2.7/lib-old',
 '/usr/local/lib/python2.7/lib-dynload',
 '/usr/local/lib/python2.7/site-packages']
```
and in python 3 I get:
```python
['',
 '/usr/local/lib/python37.zip',
 '/usr/local/lib/python3.7',
 '/usr/local/lib/python3.7/lib-dynload',
 '/usr/local/lib/python3.7/site-packages']
```
This is instructive: the first place python looks for a module is the current directory (the empty string), then inside a zip file, then in all kinds of directories. When you `pip install ...` or `python setup.py install`, the packages go to `site-packages`.
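Since `sys.path` is just a list, you can bend it the same way we’ll bend `__path__` later. A throwaway sketch, where the directory and module names are hypothetical:

```python
import sys

# Search this directory before everything else, including site-packages.
sys.path.insert(0, '/opt/my_private_modules')

import my_module  # hypothetical module living in /opt/my_private_modules
```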
Sidebar: Uninstalling a python package was not obvious to me. If you `pip install ...` or `python setup.py install`, you can try `pip uninstall ...` to remove the package. If you do `python setup.py develop`, you can follow with `python setup.py develop --uninstall` to remove the package from your distribution while leaving your package’s files unmolested.
The `__path__` variable is almost always a list of places to look. When python 3 goes looking for what you want to import, it does so using meta path finder objects. From the manual:

> When the named module is not found in sys.modules, Python next searches sys.meta_path, which contains a list of meta path finder objects. These finders are queried in order to see if they know how to handle the named module. Meta path finders must implement a method called find_spec() which takes three arguments: a name, an import path, and (optionally) a target module.
By default, the list of meta path finder objects is

```python
[<class '_frozen_importlib.BuiltinImporter'>,
 <class '_frozen_importlib.FrozenImporter'>,
 <class '_frozen_importlib_external.PathFinder'>]
```
but we can do a fun thing and define a hook for this to watch things happen:

```python
class MetaHooker:
    def find_spec(self, name, path, target=None):
        print('Loading: {} on path {}'.format(repr(name), repr(path)))
        return None  # never claim the module; the real finders still run

sys.meta_path.insert(0, MetaHooker())
```
and now if we do `import requests` we get:

```
Loading: 'requests' on path None
Loading: 'urllib3' on path None
Loading: '__future__' on path None
Loading: 'warnings' on path None
Loading: 'urllib3.connectionpool' on path ['env/lib/python3.6/site-packages/urllib3']
... it goes on for a while here ...
Loading: 'requests.hooks' on path ['env/lib/python3.6/site-packages/requests']
Loading: 'requests.auth' on path ['env/lib/python3.6/site-packages/requests']
Loading: 'requests.status_codes' on path ['env/lib/python3.6/site-packages/requests']
Loading: 'requests.api' on path ['env/lib/python3.6/site-packages/requests']
Loading: 'requests.sessions' on path ['env/lib/python3.6/site-packages/requests']
Loading: 'requests.adapters' on path ['env/lib/python3.6/site-packages/requests']
Loading: 'urllib3.contrib.socks' on path ['env/lib/python3.6/site-packages/urllib3/contrib']
Loading: 'socks' on path None
```
In python 2 it’s a little different, but I’m going to ignore that for now because the difference is unimportant for this post. Returning to the `requests` package: once you import it, it will also be accessible via `sys.modules['requests']`, where it is mutable. The module will have a lot of properties, including a `__path__` property. I guess I should call it a package since it has a `__path__`. Well, you get the idea.
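To see just how mutable, here’s a quick experiment with a stdlib module (ill-advised outside of a demo):

```python
import sys
import json

print(json is sys.modules['json'])    # True: the name and the cache entry agree

sys.modules['json'] = 'not a module'  # the cache is just a dict; python won't stop you
import json                           # a re-import only consults the cache...
print(json)                           # ...so json is now the string 'not a module'
```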
Most of the time you see import statements like this

```python
import requests
```

and this puts a name `requests`, pointing to the same thing as `sys.modules['requests']`, into your current scope. There is a less common variant

```python
__import__('requests')
```

that loads and returns the module without adding anything to your namespace.
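A quick demonstration of the difference (the modern spelling of this variant is `importlib.import_module`, for what it’s worth):

```python
import sys

mod = __import__('json')           # loads json and returns it...
print(mod is sys.modules['json'])  # True: it gets cached like any import
print('json' in dir())             # False: but no 'json' name was bound for us
```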
## Abusing Imports
We can use this knowledge of `__path__` to do some weird stuff with imports. Consider the following directory structure:

```
site1/
    __init__.py
    module1.py
site2/
    module2.py
```

and let `__init__.py` be:

```python
import os

__path__ += ['{}/../site2'.format(os.path.dirname(__file__))]
```
Now, what would you guess would happen if I run python and do `from site1 import module1, module2`? It actually works. If we inspect `sys.modules` we find:

```python
>>> sys.modules['site1']
<module 'site1' from 'site1/__init__.py'>
>>> sys.modules['site1'].__path__
['site1', 'site1/../site2']
```
which is as I intended. Pretty weird, right? So, let’s look at how it actually happened. When I use my MetaHooker in python 3 and perform `from site1 import module1, module2` I get:

```
Loading: 'site1' on path None
Loading: 'site1.module1' on path ['site1', 'site1/../site2']
Loading: 'site1.module2' on path ['site1', 'site1/../site2']
```
Here we see it fail to find `site1` in `sys.modules`, so it goes searching in `sys.path`. It finds the `site1` directory with its `__init__.py` in there and loads it. Then it goes to load the children, but by now the `__init__.py` has provided our weird search path, so it easily finds `module1` and `module2`.
Let’s get crazy-bananas! I’ll now add a folder `site3` with a `module3.py` in it, and from python do

```python
sys.modules['site1'].__path__ += ['site3']
from site1 import module3
```

and it works. Such is the importance of `__path__`. But there are better ways to spread package files across several places, which we shall get to shortly.
## Breaking Imports
So what happens when we install two packages that share a namespace without explicitly doing something to fix it? Consider the following directory structure, where `#` is 1 or 2:

```
pkg#/
    setup.py
    pkg/
        __init__.py
        module#.py
```

and where `setup.py` is

```python
from setuptools import setup

setup(
    name='pkg',
    version='1',
    packages=['pkg']
)
```
When I go to `pkg1` and do `python setup.py install`, a `pkg-1.0-py3.6.egg` directory is made in `env/lib/python3.6/site-packages`, which is a member of `sys.path` and thus is searched when we try to do an import. I can then do `from pkg import module1`, but `from pkg import module2` will fail because it’s not installed. If I go to the `pkg2` directory next and `python setup.py install`, then python makes noises about the package already being installed (recall that both have the same name and version), and afterwards I can do `from pkg import module2` but `from pkg import module1` will fail!
What’s happening is that the package is defined by the one `__path__` coming from the last installed version, which ignores the other one. It’s here that the need for namespace packages arises. Let’s look at the three different methods (`pkgutil`, `pkg_resources`, and PEP 420) to gain an appreciation for PEP 420. Note that PEP 420 is the accepted way to do namespace packages as of python 3.3, so it’s sometimes referred to as the native method.
### pkgutil Method
This method essentially just adds to the `__path__` in the `__init__.py` so that the children are all visible. Each `__init__.py` that is for a namespace needs to have

```python
__path__ = __import__('pkgutil').extend_path(__path__, __name__)
```
and then the `setup.py` just needs to not be zip_safe:

```python
from setuptools import setup

setup(
    name='pkg.moduleN',
    version='1',
    packages=['pkg.moduleN'],
    zip_safe=False
)
```
From python’s packaging namespace packages guide:

> pkgutil-style namespace packages. This is recommended for new packages that need to support Python 2 and 3 and installation via both pip and python setup.py install.
### pkg_resources Method
Here, each `__init__.py` for a namespace should have

```python
__import__('pkg_resources').declare_namespace(__name__)
```
and then the `setup.py` just needs to indicate the namespace elements:

```python
from setuptools import setup

setup(
    name='pkg.moduleN',
    version='1',
    packages=['pkg.moduleN'],
    namespace_packages=['pkg'],
    zip_safe=False
)
```
As noted later, this method is the least compatible with the other methods, and I’ve seen recommendations against using it. Note that I’ve personally seen legacy code that follows a historical recommendation to do the following:

```python
try:
    __import__('pkg_resources').declare_namespace(__name__)
except ImportError:
    __path__ = __import__('pkgutil').extend_path(__path__, __name__)
```

As stated in the python docs, this is a terrible fucking idea.
From python’s packaging namespace packages guide:

> pkg_resources-style namespace packages. This method is recommended if you need compatibility with packages already using this method or if your package needs to be zip-safe.
Avoid this method. Scold those who use it.
### PEP 420 - The Native Method
For PEP 420 style namespaces you need to not include the `__init__.py` file in the namespace directory, and instead just be sure to indicate the package in the `setup.py` file:

```python
from setuptools import setup

setup(
    name='pkg.moduleN',
    version='1',
    packages=['pkg.moduleN'],
    zip_safe=False
)
```
The downside of this method is that you need to explicitly list the packages in the `packages` list. For the other two methods people like to import the `find_packages` function from `setuptools`, which lists every package by scanning for `__init__.py` files, but that obviously doesn’t work here. Honestly, using that function with namespaces is probably a bad idea anyway: it will see your `__init__.py` files from the two methods above and add them to your packages list (which could make your namespace less robust to a stray empty `__init__.py` file).
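If hand-listing packages grates on you, newer versions of setuptools ship a `find_namespace_packages` function that scans for PEP 420 packages without requiring `__init__.py` files. A sketch of how it could replace the explicit list:

```python
from setuptools import find_namespace_packages, setup

setup(
    name='pkg.moduleN',
    version='1',
    # match the namespace children without requiring __init__.py files
    packages=find_namespace_packages(include=['pkg.*']),
    zip_safe=False
)
```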
From python’s packaging namespace packages guide:

> Native namespace packages. This type of namespace package is defined in PEP 420 and is available in Python 3.3 and later. This is recommended if packages in your namespace only ever need to support Python 3 and installation via pip.
That’s not to say that PEP 420 is incompatible with `python setup.py install` in modern python. Actually, I don’t really know why “and installation via pip” is in there. ¯\_(ツ)_/¯
Earlier in this post I mentioned that `__path__` variables are almost always lists, and here’s the exception. Instead of a list you get an object like this:

```python
_NamespacePath(['env/lib/python3.6/site-packages/pkg'])
```
and instead of a normal module repr like either

```
<module 'something' from '/somepath/__init__.py'>
<module 'something' (built-in)>
```

you get one that is specifically labeled as a namespace, like this:

```
<module 'example_pkg' (namespace)>
```
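You can poke at this directly. A quick check, using the hypothetical `example_pkg` from above (behavior as of the python versions used in this post):

```python
import example_pkg  # hypothetical PEP 420 namespace package

print(example_pkg.__path__)              # _NamespacePath([...]) rather than a plain list
print(hasattr(example_pkg, '__file__'))  # False: there's no __init__.py to point at
```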
## Comparison
For all three methods, the structure of the modules in `site-packages` is almost identical, with all of the namespace parts combined into a single folder. Each separate package still retains an `egg-info`, but only `pkgutil` has an `__init__.py` for the namespaced package. Personally, I prefer the style of PEP 420. It’s got a joie de vivre the others do not possess. It seems like the best thing to do is use pkgutil when python 2 compatibility is required and PEP 420 when using python 3. The table shows that pkgutil interacts favorably with PEP 420 in python 3. Just, please, for the love of God, don’t use `pkg_resources`.
For all three namespace package methods in python 3 I ran a test where I put a stray empty `__init__.py` in one of the namespace packages (and for `pkg_resources` I also removed the `namespace_packages` argument from the respective `setup.py`). Surprisingly, all three were robust to this error and managed to properly load the namespace packages. More work could be done to determine under what conditions these errors cause the namespace package to fail, but that’s sort of daunting given how many ways there are to do things.
## Appendix: Incompatibility Table
Note: This is from https://github.com/pypa/sample-namespace-packages/blob/master/table.md.