The main purpose of pygram11 is to be a faster, near drop-in
replacement for numpy.histogram() and numpy.histogram2d() with
support for uncertainties. The NumPy functions always return the bin
counts and the bin edges, while pygram11 functions return the bin
counts and the standard error on the bin counts (if weights are not
used, the second return value from pygram11 functions will be None).
Therefore, if one only cares about the bin counts, the libraries are
completely interchangeable. These two function calls will provide the
same result:
    import numpy as np
    import pygram11 as pg

    rng = np.random.default_rng(123)
    x = rng.standard_normal(10000)
    counts1, __ = np.histogram(x, bins=20, range=(-3, 3))
    counts2, __ = pg.histogram(x, bins=20, range=(-3, 3))
    np.testing.assert_allclose(counts1, counts2)
If one cares about the statistical uncertainty on the bin counts, or the ability to retain under- and over-flow counts, then pygram11 is a great replacement. Check out a blog post which describes how to recreate this behavior in pure NumPy; with pygram11 it is as simple as:
    data = rng.standard_normal(10000)
    # weights must match the shape of the data being histogrammed
    weights = rng.uniform(0.1, 0.9, data.shape)
    counts, err = pg.histogram(data, bins=10, range=(-3, 3), weights=weights, flow=True)
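For comparison, a pure-NumPy version of a weighted histogram with uncertainties and flow can be sketched as below. This is a minimal sketch under two assumptions that are not confirmed against pygram11's implementation: that the uncertainty in each bin is the square root of the sum of squared weights, and that under- and over-flow entries are folded into the first and last bins. The helper name `hist_with_flow` is made up for illustration.

```python
import numpy as np


def hist_with_flow(data, weights, bins, range):
    """Weighted fixed-width histogram with flow folded into the edge bins."""
    lo, hi = range
    edges = np.linspace(lo, hi, bins + 1)
    # Clip out-of-range values onto the outer edges so they land in the
    # first and last bins instead of being dropped (np.histogram's last
    # bin is closed on the right, so values equal to `hi` are counted).
    clipped = np.clip(data, lo, hi)
    counts, _ = np.histogram(clipped, bins=edges, weights=weights)
    # Assumed uncertainty: sqrt of the per-bin sum of squared weights.
    sumw2, _ = np.histogram(clipped, bins=edges, weights=weights**2)
    return counts, np.sqrt(sumw2)


rng = np.random.default_rng(123)
data = rng.standard_normal(10000)
weights = rng.uniform(0.1, 0.9, data.shape)
counts, err = hist_with_flow(data, weights, bins=10, range=(-3, 3))
```

Because every entry is retained, the bin counts sum to the total weight; with unit weights the uncertainty reduces to the square root of the counts.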
The NumPy-like functions in the pygram11 API (such as the
pg.histogram() call used above) are meant to provide an easy
transition from NumPy to pygram11. The next couple of sections
summarize the structure of the pygram11 API.
Core pygram11 Functions
pygram11 provides a simple set of functions for calculating histograms:
Histogram data with fixed (uniform) bin widths.
Histogram data with multiple weight variations and fixed width bins.
Histogram data with variable bin widths.
Histogram data with multiple weight variations and variable width bins.
Histogram two dimensional data with fixed (uniform) binning.
Histogram two dimensional data with variable width binning.
You’ll see that the API specific to pygram11 is a bit more specialized than the NumPy histogramming API (shown below).
Histogramming a normal distribution:
    >>> import numpy as np
    >>> import pygram11
    >>> rng = np.random.default_rng(123)
    >>> h, __ = pygram11.fix1d(rng.standard_normal(10000), bins=25, range=(-3, 3))
See the API reference for more examples.
For convenience a NumPy-like API is also provided (not one-to-one, see the API reference).
Conversions between NumPy array types can take some time when calculating histograms.
    In [1]: import numpy as np

    In [2]: import pygram11 as pg

    In [3]: rng = np.random.default_rng(123)

    In [4]: x = rng.standard_normal(2_000_000)

    In [5]: %timeit pg.histogram(x, bins=30, range=(-4, 4))
    1.95 ms ± 138 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    In [6]: %timeit pg.histogram(x.astype(np.float32), bins=30, range=(-4, 4))
    2.33 ms ± 170 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
You can see that the type conversion increases the calculation time by
about 20%. The back-end C++ functions prohibit type conversions of the
input data. If an array with an unsupported numpy.dtype is passed to
pygram11, a TypeError will be raised. Supported numpy.dtype values for
data are:
and for weights:
For small datasets, OpenMP acceleration introduces unnecessary overhead. Likewise, if you’re using the pygram11 API in cluster workflows (like with Dask), your threads may already be committed to higher-level abstractions.
By default, the C++ back-end utilizes OpenMP parallel loops if the
data size is above a threshold for a respective histogramming
situation. These thresholds are 10,000 for fixed width histograms and
5,000 for variable width histograms. The thresholds can be configured
in a granular way with the pygram11.config module.
The parameters are:
Low level reading/writing is handled through two functions:
Retrieve a configuration value given a key.
Set a configuration key's value.
If you have specific thresholds in mind,
pygram11.config.set() is the recommended interface.
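To make the threshold behaviour concrete, here is a toy model of size-based dispatch. It is not pygram11's implementation: the dictionary, the helper names `config_get`, `config_set` and `would_use_openmp`, and the strict "greater than" comparison are all assumptions made for illustration. Only the two keys and their default values (10,000 and 5,000) come from the text.

```python
# Toy model only; pygram11 keeps its real configuration elsewhere.
_config = {
    "thresholds.fix1d": 10_000,  # default for fixed-width 1D histograms
    "thresholds.var1d": 5_000,   # default for variable-width 1D histograms
}


def config_get(key):
    """Retrieve a configuration value given a key."""
    return _config[key]


def config_set(key, value):
    """Set a configuration key's value, rejecting unknown keys."""
    if key not in _config:
        raise KeyError(f"unknown configuration key: {key}")
    _config[key] = value


def would_use_openmp(n_entries, key):
    """Assume the parallel path is taken when the data size exceeds the threshold."""
    return n_entries > config_get(key)


before = would_use_openmp(6_000, "thresholds.var1d")  # 6000 > 5000
config_set("thresholds.var1d", 7_500)
after = would_use_openmp(6_000, "thresholds.var1d")   # 6000 > 7500 is false
```

In this model, raising a single threshold changes the dispatch decision for that one histogramming situation and leaves the others untouched, which is the point of a granular interface.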
The recommended way to switch OpenMP acceleration on or off is through the provided context managers and decorators. To force OpenMP acceleration, the thresholds are set to zero; to disable OpenMP acceleration, the thresholds are set to sys.maxsize.
Context manager to disable OpenMP.
Context manager to force enable OpenMP.
Wrap a function to disable OpenMP while it's called.
Wrap a function to always enable OpenMP while it's called.
The context manager and decorator APIs temporarily adjust the
thresholds for the duration of a specific code block or an entire
function call. For example, we can disable a specific threshold during
a pygram11.histogram() call with the pygram11.omp_disabled() context
manager:
    import pygram11
    import numpy as np

    rng = np.random.default_rng(123)
    x = rng.standard_normal(50_000)
    # omp_disabled lives in the pygram11 namespace, so qualify the name
    with pygram11.omp_disabled(key="thresholds.fix1d"):
        result = pygram11.histogram(x, bins=50, range=(-3, 3))
or we can decorate a function to disable OpenMP during its use:
    import pygram11
    import numpy as np

    @pygram11.without_omp
    def hist():
        rng = np.random.default_rng(123)
        x = rng.standard_normal(50_000)
        return pygram11.histogram(x, bins=50, range=(-3, 3))
If the key argument is not provided, all thresholds will be temporarily modified.
An example of threshold modification via the granular interface:
    >>> import pygram11
    >>> import pygram11.config
    >>> import numpy as np
    >>> rng = np.random.default_rng(123)
    >>> x = rng.random(6000)  # uniform on [0, 1); Generator has no standard_uniform
    >>> bins = np.array([-3.1, -2.5, -2.0, 0.1, 0.2, 2.1, 3.0])
    >>> result = pygram11.histogram(x, bins=bins)  # will use OpenMP
    >>> pygram11.config.set("thresholds.var1d", 7500)
    >>> result = pygram11.histogram(x, bins=bins)  # now will _not_ use OpenMP
Some shortcuts exist to completely disable or enable OpenMP, along with returning to the defaults: