SPEC 1 — Lazy Loading for Submodules

🛈  This is a draft document.

Description

Early on, most scientific Python packages explicitly imported their submodules. For example, you would be able to do:

import scipy as sp

sp.linalg.eig(...)

This was convenient: it had the simplicity of a flat namespace, but with the organization of a nested one. However, there was one drawback: importing submodules, especially large ones, introduced unacceptable slowdowns. To address the problem, most libraries stopped importing submodules and relied on documentation to tell users which submodules to import.

Commonly, code now reads:

from scipy import linalg
linalg.eig(...)

Since the name linalg often conflicts with identically named submodules in other libraries, users also write:

# Invent an arbitrary name for each submodule
import scipy.linalg as sla
sla.eig(...)

or

# Import individual functions, making it harder to know where they are from
# later on in code.
from scipy.linalg import eig
eig(...)

This SPEC proposes a lazy loading mechanism, targeted at libraries, that brings back explicit submodule exports without slowing down imports.

For example, it allows the following behavior:

import skimage as ski  # cheap operation; does not load submodules

ski.filters  # cheap operation; loads the filters submodule, but not
             # any of its submodules or functions

ski.filters.gaussian(...)  # loads the file in which gaussian is implemented
                           # and calls the function

This has several advantages:

  1. It exposes a nested namespace that behaves as a flat namespace. This avoids having to carefully import exactly the right combination of submodules, and allows the namespace to be explored in an interactive terminal.

  2. It avoids having to optimize for import cost. Currently, developers often move imports inside of functions to avoid slowing down importing their module. Lazy importing makes imports at any depth in the hierarchy cheap.

  3. It provides direct access to submodules, avoiding local namespace conflicts. Instead of doing import scipy.linalg as sla to avoid clobbering a local linalg, one can now assign a short name to each library and access its members directly: import scipy as sp; sp.linalg.

Usage

Python 3.7, with PEP 562, introduced the ability to define module-level __getattr__ and __dir__ functions. In combination, these features make it possible to provide access to submodules again, without incurring performance penalties.
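
To illustrate the PEP 562 hooks on their own, here is a minimal sketch that attaches `__getattr__` and `__dir__` to an in-memory module. The helper name `make_lazy_package` and the `loaded` list are hypothetical and exist only for this demonstration; standard-library modules stand in for real submodules.

```python
import importlib
import types


def make_lazy_package(name, submodules):
    """Build an in-memory module that imports the listed modules on first access.

    `submodules` maps an attribute name to the dotted path of the module
    that should be imported when that attribute is first requested.
    """
    pkg = types.ModuleType(name)
    loaded = []  # records which attributes triggered a real import

    def __getattr__(attr):
        # Called only when normal attribute lookup on the module fails.
        if attr in submodules:
            module = importlib.import_module(submodules[attr])
            setattr(pkg, attr, module)  # cache, so the hook is not hit again
            loaded.append(attr)
            return module
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    def __dir__():
        # Advertise the lazy attributes without importing them.
        return sorted(submodules)

    pkg.__getattr__ = __getattr__
    pkg.__dir__ = __dir__
    return pkg, loaded


demo, loaded = make_lazy_package("demo", {"json": "json", "decimal": "decimal"})
print(dir(demo))            # listing names does not import anything
print(loaded)               # still empty at this point
print(demo.json.dumps([1]))  # first access imports `json`
print(loaded)
```

Because the imported module is cached on the package object, later accesses go through ordinary attribute lookup and never reach `__getattr__` again.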

We propose a utility library for easily setting up so-called “lazy imports” so that submodules are only loaded upon accessing them.

As an example, we will show how to set up lazy importing for skimage.filters. In the library’s main __init__.py, specify which submodules are lazily loaded:

__all__ = [
    ...
    'filters',
    ...
]

from .util.lazy import install_lazy
__getattr__, __dir__, _ = install_lazy(__name__, __all__)

Then, in each submodule’s __init__.py (in this case, filters/__init__.py), specify which functions are to be loaded from where:

from ..util import lazy

__getattr__, __dir__, __all__ = lazy.install_lazy(
    __name__,
    submodules=['rank'],
    submod_funcs={
        '_gaussian': ['gaussian', 'difference_of_gaussians'],
        'edges': ['sobel', 'sobel_h', 'sobel_v',
                  'scharr', 'scharr_h', 'scharr_v',
                  'prewitt', 'prewitt_h', 'prewitt_v',
                  'roberts', 'roberts_pos_diag', 'roberts_neg_diag',
                  'laplace',
                  'farid', 'farid_h', 'farid_v']
    }
)

The submodule is loaded only once it is accessed:

import skimage
dir(skimage.filters)

Furthermore, the functions inside of the submodule are loaded only once they are needed:

import skimage

skimage.filters.gaussian(...)  # Lazy load `gaussian` from
                               # `skimage.filters._gaussian`

skimage.filters.rank.mean_bilateral(...)  # Loaded once `rank` is accessed
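
One way to check that loading really is deferred is to look at `sys.modules` before and after the first attribute access. The sketch below does this with a throwaway package written to a temporary directory; the package name `lazydemo` and its contents are hypothetical, and its `__init__.py` uses a hand-written PEP 562 hook rather than the proposed utility.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Source of a tiny package whose only submodule is loaded lazily.
INIT = textwrap.dedent('''
    import importlib

    __all__ = ["filters"]

    def __getattr__(name):
        if name in __all__:
            return importlib.import_module(f"{__name__}.{name}")
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

    def __dir__():
        return __all__
''')

root = Path(tempfile.mkdtemp())
pkg_dir = root / "lazydemo"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text(INIT)
(pkg_dir / "filters.py").write_text("def gaussian(x):\n    return x\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()
lazydemo = importlib.import_module("lazydemo")

# Importing the package alone should not import the submodule.
before = "lazydemo.filters" in sys.modules
lazydemo.filters  # first access triggers the import
after = "lazydemo.filters" in sys.modules
print(before, after)
```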

Implementation

Currently, a test implementation lives in this pull request to scikit-image—specifically, inside of lazy.py.

At this point, there exists a prototype of lazy loading, and we’re showing it to the community to uncover design flaws, discover improvements, and solicit suggestions on APIs.
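
For readers who want a feel for what such a helper involves, the following is a minimal sketch of an `install_lazy` function. It is not the code from the pull request: the signature is inferred from the calls shown in the Usage section, and the body is an assumption. The example exercises it against the standard-library `json` package so that it runs without scikit-image.

```python
import importlib


def install_lazy(package_name, submodules=None, submod_funcs=None):
    """Return (__getattr__, __dir__, __all__) for a lazily loaded package.

    Illustrative sketch only; the real prototype may differ.
    """
    submodules = set(submodules or [])
    # Invert {'module': ['func', ...]} into {'func': 'module'} for lookup.
    func_to_module = {
        func: module
        for module, funcs in (submod_funcs or {}).items()
        for func in funcs
    }
    __all__ = sorted(submodules | set(func_to_module))

    def __getattr__(name):
        if name in submodules:
            # Import the submodule itself on first access.
            return importlib.import_module(f"{package_name}.{name}")
        if name in func_to_module:
            # Import the defining module, then pull out the function.
            module = importlib.import_module(
                f"{package_name}.{func_to_module[name]}"
            )
            return getattr(module, name)
        raise AttributeError(
            f"module {package_name!r} has no attribute {name!r}"
        )

    def __dir__():
        return list(__all__)

    return __getattr__, __dir__, __all__


# Exercise the helper against the standard-library `json` package.
getattr_, dir_, all_ = install_lazy(
    "json",
    submodules=["decoder"],
    submod_funcs={"encoder": ["JSONEncoder"]},
)
print(all_)
print(getattr_("decoder").__name__)
print(getattr_("JSONEncoder").__name__)
```

This sketch does no caching of its own; it relies on the import system, which stores imported modules in `sys.modules` and sets each submodule as an attribute of its parent package.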

Once a lazy import interface is implemented, other interesting options become available. For example, instead of specifying sub-submodules and functions the way we do above, one could do this in YAML files:

$ cat skimage/filters/init.yaml

submodules:
- rank

functions:
- _gaussian:
  - gaussian
  - difference_of_gaussians
- edges:
  - sobel
  - sobel_h
  - sobel_v
  - scharr

...
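
As a rough sketch of how such a file could feed the same interface, the function below converts the parsed contents of an init.yaml into the keyword arguments used in the Usage section. It assumes a YAML parser has already produced plain lists and dicts; the function name `spec_to_kwargs` is hypothetical, and the `parsed` value only mirrors the entries visible in the listing above.

```python
def spec_to_kwargs(spec):
    """Convert a parsed init.yaml mapping into install_lazy keyword arguments."""
    submod_funcs = {}
    for entry in spec.get("functions", []):
        # Each list item is a one-key mapping: {module_name: [function, ...]}.
        for module, funcs in entry.items():
            submod_funcs.setdefault(module, []).extend(funcs)
    return {
        "submodules": list(spec.get("submodules", [])),
        "submod_funcs": submod_funcs,
    }


# What a YAML parser would return for the visible part of the file above.
parsed = {
    "submodules": ["rank"],
    "functions": [
        {"_gaussian": ["gaussian", "difference_of_gaussians"]},
        {"edges": ["sobel", "sobel_h", "sobel_v", "scharr"]},
    ],
}

kwargs = spec_to_kwargs(parsed)
print(kwargs)
```

The resulting dictionary could then be passed along as `install_lazy(__name__, **kwargs)`.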

Ultimately, we hope that lazy importing will become part of Python itself. In the meantime, we now have the necessary mechanisms to implement it ourselves.

Core Project Endorsement

Ecosystem Adoption

Notes