Quick Start

Jumping In

The main purpose of pygram11 is to be a faster, near drop-in replacement for numpy.histogram() and numpy.histogram2d() with support for uncertainties. The NumPy functions always return the bin counts and the bin edges, while pygram11 functions return the bin counts and the standard error on the bin counts (if weights are not used, the second return value from pygram11 functions will be None). Therefore, if one only cares about the bin counts, the libraries are completely interchangeable.

These two function calls will provide the same result:

import numpy as np
import pygram11 as pg
rng = np.random.default_rng(123)
x = rng.standard_normal(10000)
counts1, __ = np.histogram(x, bins=20, range=(-3, 3))
counts2, __ = pg.histogram(x, bins=20, range=(-3, 3))
np.testing.assert_allclose(counts1, counts2)

If one cares about the statistical uncertainty on the bin counts, or needs to retain underflow and overflow counts, then pygram11 is a great replacement. Check out a blog post which describes how to recreate this behavior in pure NumPy; with pygram11 it is as simple as:

data = rng.standard_normal(10000)
weights = rng.uniform(0.1, 0.9, data.shape[0])
counts, err = pg.histogram(data, bins=10, range=(-3, 3), weights=weights, flow=True)
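For comparison, here is a rough pure-NumPy sketch of what the call above is described as computing. It assumes that the uncertainty is the square root of the sum of squared weights in each bin, and that flow=True folds underflow and overflow into the first and last bins. The helper name weighted_hist_with_flow is hypothetical and not part of pygram11 or NumPy.

```python
import numpy as np


def weighted_hist_with_flow(x, weights, bins, value_range):
    """Weighted histogram with under/overflow folded into the edge bins.

    Returns (counts, err), where err is sqrt(sum of squared weights) per bin.
    This is an illustrative stand-in, not the pygram11 implementation.
    """
    lo, hi = value_range
    edges = np.linspace(lo, hi, bins + 1)
    # np.digitize gives 0 for x < lo and bins + 1 for x >= hi.
    idx = np.digitize(x, edges)
    sumw = np.bincount(idx, weights=weights, minlength=bins + 2)
    sumw2 = np.bincount(idx, weights=weights**2, minlength=bins + 2)
    # Fold the two flow slots into the first and last in-range bins.
    sumw[1] += sumw[0]
    sumw[-2] += sumw[-1]
    sumw2[1] += sumw2[0]
    sumw2[-2] += sumw2[-1]
    return sumw[1:-1], np.sqrt(sumw2[1:-1])


rng = np.random.default_rng(123)
data = rng.standard_normal(10000)
weights = rng.uniform(0.1, 0.9, data.shape[0])
counts, err = weighted_hist_with_flow(data, weights, bins=10, value_range=(-3, 3))
```

Because every entry lands in some bin once the flow is folded in, counts.sum() matches weights.sum() up to floating point rounding.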

The pygram11.histogram() and pygram11.histogram2d() functions in the pygram11 API are meant to provide an easy transition from NumPy to pygram11. The next couple of sections summarize the structure of the pygram11 API.

Core pygram11 Functions

pygram11 provides a simple set of functions for calculating histograms:

pygram11.fix1d(x[, bins, range, weights, …])

Histogram data with fixed (uniform) bin widths.

pygram11.fix1dmw(x, weights[, bins, range, flow])

Histogram data with multiple weight variations and fixed width bins.

pygram11.var1d(x, bins[, weights, density, flow])

Histogram data with variable bin widths.

pygram11.var1dmw(x, weights, bins[, flow])

Histogram data with multiple weight variations and variable width bins.

pygram11.fix2d(x, y[, bins, range, weights, …])

Histogram the x, y data with fixed (uniform) binning.

pygram11.var2d(x, y, xbins, ybins[, …])

Histogram the x, y data with variable width binning.

You’ll see that the API specific to pygram11 is a bit more specialized than the NumPy histogramming API (shown below).

Histogramming a normal distribution:

>>> import numpy as np
>>> import pygram11
>>> rng = np.random.default_rng(123)
>>> h, __ = pygram11.fix1d(rng.standard_normal(10000), bins=25, range=(-3, 3))

See the API reference for more examples.
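To illustrate what "multiple weight variations" means for the fix1dmw and var1dmw entries above, the following pure-NumPy sketch histograms one data array once per weight column. The layout is an assumption: weights of shape (n_events, n_variations) and results of shape (bins, n_variations). The helper multiweight_hist is hypothetical; check the API reference for the actual pygram11 signatures and return shapes.

```python
import numpy as np


def multiweight_hist(x, weights, bins, value_range):
    """Histogram x once per weight column (NumPy stand-in for illustration)."""
    n_var = weights.shape[1]
    counts = np.empty((bins, n_var))
    err = np.empty((bins, n_var))
    for j in range(n_var):
        w = weights[:, j]
        counts[:, j], _ = np.histogram(x, bins=bins, range=value_range, weights=w)
        # Uncertainty per bin taken as sqrt(sum of squared weights).
        sumw2, _ = np.histogram(x, bins=bins, range=value_range, weights=w**2)
        err[:, j] = np.sqrt(sumw2)
    return counts, err


rng = np.random.default_rng(123)
x = rng.standard_normal(10000)
w = rng.uniform(0.8, 1.2, (x.shape[0], 3))  # three weight variations
counts, err = multiweight_hist(x, w, bins=25, value_range=(-3, 3))
```

The loop bins the same data once per variation, which is the repeated work a dedicated multi-weight function can avoid.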

NumPy-like Functions

For convenience a NumPy-like API is also provided (not one-to-one, see the API reference).

pygram11.histogram(x[, bins, range, …])

Histogram data in one dimension.

pygram11.histogram2d(x, y[, bins, range, …])

Histogram data in two dimensions.

Supported Types

Conversions between NumPy array types can take some time when calculating histograms.

In [1]: import numpy as np

In [2]: import pygram11 as pg

In [3]: rng = np.random.default_rng(123)

In [4]: x = rng.standard_normal(2_000_000)

In [5]: %timeit pg.histogram(x, bins=30, range=(-4, 4))
1.95 ms ± 138 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [6]: %timeit pg.histogram(x.astype(np.float32), bins=30, range=(-4, 4))
2.33 ms ± 170 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

You can see the type conversion increases this calculation time by about 20%. The backend C++ functions prohibit type conversions of the input data. If an array with an unsupported numpy.dtype is passed to pygram11, a TypeError will be raised. Supported numpy.dtypes for data are:

  • numpy.float64 (a C/C++ double)

  • numpy.int64 (a C/C++ int64_t)

  • numpy.uint64 (a C/C++ uint64_t)

  • numpy.float32 (a C/C++ float)

  • numpy.int32 (a C/C++ int32_t)

  • numpy.uint32 (a C/C++ uint32_t)

and for weights:

  • numpy.float64

  • numpy.float32
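One way to work with these restrictions is to check the dtype yourself and convert once, up front, rather than on every call. The sketch below uses only NumPy; the helper as_supported and the two tuples are hypothetical names that mirror the lists above, not part of pygram11.

```python
import numpy as np

# dtypes listed above as supported for input data and for weights.
DATA_DTYPES = (np.float64, np.int64, np.uint64, np.float32, np.int32, np.uint32)
WEIGHT_DTYPES = (np.float64, np.float32)


def as_supported(arr, allowed, fallback=np.float64):
    """Return arr unchanged if its dtype is allowed, else a converted copy."""
    arr = np.asarray(arr)
    if any(arr.dtype == np.dtype(t) for t in allowed):
        return arr  # already supported: no copy, no conversion cost
    return arr.astype(fallback)


x16 = np.arange(10, dtype=np.int16)  # int16 is not in the supported list
x = as_supported(x16, DATA_DTYPES)   # converted to float64 once
w = as_supported(np.ones(10, dtype=np.float32), WEIGHT_DTYPES)  # left as float32
```

Doing the conversion a single time keeps its cost out of repeated histogramming calls on the same array.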

OpenMP Configuration

For small datasets, OpenMP acceleration introduces unnecessary overhead. The C++ backend uses OpenMP parallel loops only if the data size is above a threshold for the respective histogramming situation. By default these thresholds are 10,000 for fixed width histograms and 5,000 for variable width histograms. The thresholds can be configured with dynamic variables in the pygram11 module:

  • FIXED_WIDTH_PARALLEL_THRESHOLD_1D

  • FIXED_WIDTH_PARALLEL_THRESHOLD_2D

  • FIXED_WIDTH_MW_PARALLEL_THRESHOLD_1D

  • VARIABLE_WIDTH_PARALLEL_THRESHOLD_1D

  • VARIABLE_WIDTH_PARALLEL_THRESHOLD_2D

  • VARIABLE_WIDTH_MW_PARALLEL_THRESHOLD_1D

An example changing the threshold:

>>> import pygram11
>>> import numpy as np
>>> rng = np.random.default_rng(123)
>>> x = rng.standard_normal(6000)
>>> bins = np.array([-3.1, -2.5, -2.0, 0.1, 0.2, 2.1, 3.0])
>>> result = pygram11.histogram(x, bins=bins)  # will use OpenMP
>>> pygram11.VARIABLE_WIDTH_PARALLEL_THRESHOLD_1D = 7500
>>> result = pygram11.histogram(x, bins=bins)  # now will _not_ use OpenMP
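Since the thresholds are plain module-level variables, a change like the one above stays in effect until it is reset. A possible pattern for scoping such a change is a small context manager that restores the previous value. The helper temporary_threshold below is hypothetical and not part of pygram11; the sketch uses a stand-in namespace so that it runs without pygram11 installed, and in real use the pygram11 module itself would be passed in.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_threshold(module, name, value):
    """Set module.<name> to value inside the with block, then restore it."""
    old = getattr(module, name)
    setattr(module, name, value)
    try:
        yield
    finally:
        setattr(module, name, old)


# Stand-in for the pygram11 module (assumption: 5000 is the default value).
fake_pg = SimpleNamespace(VARIABLE_WIDTH_PARALLEL_THRESHOLD_1D=5000)

with temporary_threshold(fake_pg, "VARIABLE_WIDTH_PARALLEL_THRESHOLD_1D", 7500):
    inside = fake_pg.VARIABLE_WIDTH_PARALLEL_THRESHOLD_1D  # 7500 in the block
after = fake_pg.VARIABLE_WIDTH_PARALLEL_THRESHOLD_1D  # restored afterwards
```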

Some shortcuts exist to completely disable or enable OpenMP: