pygram11
pygram11 is a small Python library for creating simple histograms quickly. The backend is written in C++14 with some help from pybind11 and accelerated with OpenMP.
Installation
Requirements
The only requirement to use pygram11 is NumPy. If you install binaries from conda-forge or PyPI, NumPy will be installed as a required dependency.
Extras for Source Builds
When building from source, all you need is a C++ compiler with C++14 support. The setup.py script checks whether OpenMP is available; if it’s not, the installation will abort. Most Linux distributions with modern GCC versions provide OpenMP automatically (if yours does not, install it with your distribution’s package manager). On macOS you’ll want to install libomp from Homebrew to use OpenMP with the Clang compiler shipped by Apple.
Install Options
PyPI
$ pip install pygram11
conda-forge
Installations from conda-forge provide a build that uses OpenMP.
$ conda install pygram11 -c conda-forge
Note
On macOS the OpenMP libraries from LLVM (libomp) and Intel (libiomp) can clash if your conda environment includes the Intel Math Kernel Library (MKL) package distributed by Anaconda. You may need to install the nomkl package to prevent the clash (Intel MKL accelerates many linear algebra operations, but does not impact pygram11):
$ conda install nomkl
Source
$ pip install git+https://github.com/douglasdavis/pygram11.git@main
Quick Start
Jumping In
The main purpose of pygram11 is to be a faster, near drop-in replacement for numpy.histogram() and numpy.histogram2d() with support for uncertainties. The NumPy functions always return the bin counts and the bin edges, while pygram11 functions return the bin counts and the standard error on the bin counts (if weights are not used, the second value returned by pygram11 functions will be None). Therefore, if one only cares about the bin counts, the libraries are completely interchangeable. These two function calls will provide the same result:
import numpy as np
import pygram11 as pg
rng = np.random.default_rng(123)
x = rng.standard_normal(10000)
counts1, __ = np.histogram(x, bins=20, range=(-3, 3))
counts2, __ = pg.histogram(x, bins=20, range=(-3, 3))
np.testing.assert_allclose(counts1, counts2)
If one cares about the statistical uncertainty on the bin counts, or the ability to retain underflow and overflow counts, then pygram11 is a great replacement. A blog post describes how to recreate this behavior in pure NumPy, while with pygram11 it is as simple as:
data = rng.standard_normal(10000)
weights = rng.uniform(0.1, 0.9, data.shape[0])
counts, err = pg.histogram(data, bins=10, range=(-3, 3), weights=weights, flow=True)
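For comparison, the following sketch computes the same quantities with NumPy alone. It is not pygram11's implementation: `numpy_weighted_hist` and its `limits` argument are hypothetical names, and it assumes that `flow=True` folds values below the axis into the first bin and values above it into the last bin (as the `flow` parameter is described in the API reference).

```python
import numpy as np


def numpy_weighted_hist(data, weights, bins, limits, flow=False):
    """Hypothetical NumPy-only stand-in for a weighted pg.histogram() call.

    Returns the per-bin sum of weights and the per-bin standard error,
    sqrt(sum of squared weights).
    """
    xmin, xmax = limits
    if flow:
        # Assumed flow semantics: clip out-of-range values onto the axis
        # limits so they land in the first/last bins.
        data = np.clip(data, xmin, xmax)
    counts, _ = np.histogram(data, bins=bins, range=limits, weights=weights)
    sumw2, _ = np.histogram(data, bins=bins, range=limits, weights=weights**2)
    return counts, np.sqrt(sumw2)


rng = np.random.default_rng(123)
data = rng.standard_normal(10000)
weights = rng.uniform(0.1, 0.9, data.shape[0])
counts, err = numpy_weighted_hist(data, weights, bins=10, limits=(-3, 3), flow=True)
```

Calling it with `flow=False` instead drops the out-of-range entries, so the bin totals then sum to less than `weights.sum()`.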
The pygram11.histogram() and pygram11.histogram2d() functions in the pygram11 API are meant to provide an easy transition from NumPy to pygram11. The next couple of sections summarize the structure of the pygram11 API.
Core pygram11 Functions
pygram11 provides a simple set of functions for calculating histograms:
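pygram11.fix1d() – Histogram data with fixed (uniform) bin widths.
pygram11.fix1dmw() – Histogram data with multiple weight variations and fixed width bins.
pygram11.var1d() – Histogram data with variable bin widths.
pygram11.var1dmw() – Histogram data with multiple weight variations and variable width bins.
pygram11.fix2d() – Histogram the x, y data with fixed (uniform) binning.
pygram11.var2d() – Histogram the x, y data with variable width binning.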
You’ll see that the API specific to pygram11 is a bit more specialized than the NumPy histogramming API (shown below).
Histogramming a normal distribution:
>>> import numpy as np
>>> import pygram11
>>> rng = np.random.default_rng(123)
>>> h, __ = pygram11.fix1d(rng.standard_normal(10000), bins=25, range=(-3, 3))
See the API reference for more examples.
NumPy-like Functions
For convenience a NumPy-like API is also provided (not one-to-one, see the API reference).
pygram11.histogram() – Histogram data in one dimension.
pygram11.histogram2d() – Histogram data in two dimensions.
Supported Types
Converting between NumPy array types adds time to histogram calculations, as this IPython session shows:
In [1]: import numpy as np
In [2]: import pygram11 as pg
In [3]: rng = np.random.default_rng(123)
In [4]: x = rng.standard_normal(2_000_000)
In [5]: %timeit pg.histogram(x, bins=30, range=(-4, 4))
1.95 ms ± 138 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [6]: %timeit pg.histogram(x.astype(np.float32), bins=30, range=(-4, 4))
2.33 ms ± 170 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
You can see the type conversion increases the calculation time by about 20% (2.33 ms versus 1.95 ms). The backend C++ functions do not perform implicit type conversions of the input data: if an array with an unsupported numpy.dtype is passed to pygram11, a TypeError will be raised. Supported numpy.dtype values for data are:
numpy.float64 (a C/C++ double)
numpy.int64 (a C/C++ int64_t)
numpy.uint64 (a C/C++ uint64_t)
numpy.float32 (a C/C++ float)
numpy.int32 (a C/C++ int32_t)
numpy.uint32 (a C/C++ uint32_t)
and for weights:
numpy.float64
numpy.float32
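Since unsupported dtypes raise a TypeError instead of being converted, one option is to convert explicitly before calling pygram11, paying the conversion cost only when it is needed. This is a minimal sketch; `as_supported` and the two dtype sets are hypothetical names built from the lists above, and falling back to float64 is an assumption, not pygram11 behavior.

```python
import numpy as np

# dtypes listed above; anything else would raise TypeError in pygram11.
SUPPORTED_DATA_DTYPES = {
    np.dtype(np.float64), np.dtype(np.int64), np.dtype(np.uint64),
    np.dtype(np.float32), np.dtype(np.int32), np.dtype(np.uint32),
}
SUPPORTED_WEIGHT_DTYPES = {np.dtype(np.float64), np.dtype(np.float32)}


def as_supported(arr, supported, fallback=np.float64):
    """Return arr as-is if its dtype is supported, otherwise a copy converted to `fallback`."""
    arr = np.asarray(arr)
    if arr.dtype in supported:
        return arr  # no copy, no conversion cost
    return arr.astype(fallback)


x16 = np.arange(5, dtype=np.int16)             # int16 is not in the supported list
x = as_supported(x16, SUPPORTED_DATA_DTYPES)   # converted to float64
w32 = np.ones(5, dtype=np.float32)
w = as_supported(w32, SUPPORTED_WEIGHT_DTYPES) # already supported, returned unchanged
```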
OpenMP Configuration
For small datasets, OpenMP acceleration introduces unnecessary overhead. The C++ backend therefore uses OpenMP parallel loops only if the data size is above a threshold for the respective histogramming situation. By default these thresholds are 10,000 for fixed width histograms and 5,000 for variable width histograms. The thresholds can be configured with the following variables in the pygram11 module:
FIXED_WIDTH_PARALLEL_THRESHOLD_1D
FIXED_WIDTH_PARALLEL_THRESHOLD_2D
FIXED_WIDTH_MW_PARALLEL_THRESHOLD_1D
VARIABLE_WIDTH_PARALLEL_THRESHOLD_1D
VARIABLE_WIDTH_PARALLEL_THRESHOLD_2D
VARIABLE_WIDTH_MW_PARALLEL_THRESHOLD_1D
An example changing the threshold:
>>> import pygram11
>>> import numpy as np
>>> rng = np.random.default_rng(123)
>>> x = rng.standard_normal(6000)
>>> bins = np.array([-3.1, -2.5, -2.0, 0.1, 0.2, 2.1, 3.0])
>>> result = pygram11.histogram(x, bins=bins) # will use OpenMP
>>> pygram11.VARIABLE_WIDTH_PARALLEL_THRESHOLD_1D = 7500
>>> result = pygram11.histogram(x, bins=bins) # now will _not_ use OpenMP
Some shortcuts exist to completely disable or enable OpenMP:
pygram11.disable_omp(): maximizes all thresholds so OpenMP will never be used.
pygram11.force_omp(): minimizes all thresholds so OpenMP will always be used.
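If a threshold should only change for one block of code, a small context manager can restore the previous value afterwards. This sketch assumes, as the example above implies, that pygram11 reads these module attributes at call time; `temporary_threshold` is a hypothetical helper, and a `types.SimpleNamespace` stands in for the `pygram11` module so the snippet runs on its own.

```python
import contextlib
import types


@contextlib.contextmanager
def temporary_threshold(module, name, value):
    """Hypothetical helper: set module.<name> to value, then restore the old value on exit."""
    old = getattr(module, name)  # AttributeError for a misspelled threshold name
    setattr(module, name, value)
    try:
        yield
    finally:
        setattr(module, name, old)  # restored even if the block raises


# Stand-in for the pygram11 module so the sketch runs without pygram11 installed.
fake_pygram11 = types.SimpleNamespace(VARIABLE_WIDTH_PARALLEL_THRESHOLD_1D=5000)

with temporary_threshold(fake_pygram11, "VARIABLE_WIDTH_PARALLEL_THRESHOLD_1D", 7500):
    inside = fake_pygram11.VARIABLE_WIDTH_PARALLEL_THRESHOLD_1D
after = fake_pygram11.VARIABLE_WIDTH_PARALLEL_THRESHOLD_1D
```

With pygram11 installed, the real module would be passed in place of the stand-in.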
Benchmarks
Setup
There are a number of Python modules providing APIs for histogram calculations. Here we see how pygram11 performs in comparison to NumPy, fast-histogram, and boost-histogram. Tests were performed on an Intel i7-8850H 2.60 GHz processor (6 physical cores, 12 threads).
Fast-histogram does not provide calculations for variable width bins, so, when benchmarking variable width bins, we only compare to NumPy and boost-histogram.
Results
The results clearly show that pygram11 is most useful for input arrays exceeding about 5,000 elements. This makes sense because the pygram11 backend has a clear and simple overhead: to take advantage of N available threads we make N result arrays, fill them individually (splitting the loop over the input data into N parts), and finally combine the N per-thread results into the single result that is returned.
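The per-thread strategy described above can be sketched serially with NumPy: split the input into N chunks, histogram each chunk into its own result array, then sum the partial results. `chunked_histogram` is a hypothetical name; the real backend does this in C++ with OpenMP threads, so this sketch shows the bookkeeping (N partial arrays plus a final reduction), not the speedup.

```python
import numpy as np


def chunked_histogram(x, bins, limits, n_chunks):
    """Serial sketch of the per-thread strategy: one partial histogram per chunk, then a sum."""
    partials = []
    for chunk in np.array_split(x, n_chunks):
        # Each chunk stands in for one thread filling its own private result array.
        counts, _ = np.histogram(chunk, bins=bins, range=limits)
        partials.append(counts)
    # Combine the per-chunk results into the single returned histogram.
    return np.sum(partials, axis=0)


rng = np.random.default_rng(123)
x = rng.standard_normal(100_000)
combined = chunked_histogram(x, bins=20, limits=(-3, 3), n_chunks=6)
single, _ = np.histogram(x, bins=20, range=(-3, 3))
```

Because every chunk uses the same bin edges, summing the partial counts reproduces the single-pass histogram exactly; the fixed cost of allocating and combining N arrays is the overhead that makes small inputs a poor fit.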
For one dimensional histograms with fixed width bins pygram11 becomes the most performant calculation for arrays with about 5,000 or more elements (up to about 3x faster than the next best option and over 10x faster than NumPy). Fast-histogram is a bit more performant for smaller arrays, while pygram11 is always faster than NumPy and boost-histogram.

For two dimensional histograms with fixed width bins pygram11 becomes the most performant calculation for arrays with about 10,000 or more elements (up to about 3x faster than the next best option and almost 100x faster than NumPy). Fast-histogram is again faster for smaller inputs, while pygram11 is always faster than NumPy and almost always faster than boost-histogram.

For one dimensional histograms with variable width bins pygram11 becomes the most performant option for arrays with about 10,000 or more elements (up to about 8x faster than the next best option and about 13x faster than NumPy).

For two dimensional histograms with variable width bins pygram11 becomes the most performant option for arrays with about 5,000 or more elements (up to 10x faster than the next best option).

API Reference
pygram11.fix1d
pygram11.fix1d(x, bins=10, range=None, weights=None, density=False, flow=False)
Histogram data with fixed (uniform) bin widths.
- Parameters
x (numpy.ndarray) – Data to histogram.
bins (int) – The number of bins.
range ((float, float), optional) – The minimum and maximum of the histogram axis. If None, min and max of x will be used.
weights (numpy.ndarray, optional) – The weights for each element of x. If weights are absent, the second return will be None.
density (bool) – Normalize histogram counts as value of PDF such that the integral over the range is unity.
flow (bool) – Include under/overflow in the first/last bins.
- Raises
ValueError – If x and weights have incompatible shapes.
TypeError – If x or weights are unsupported types.
- Returns
numpy.ndarray – The resulting histogram bin counts.
numpy.ndarray, optional – The standard error of each bin count, \(\sqrt{\sum_i w_i^2}\). The return is None if weights are not used.
Examples
A histogram of x with 20 bins between 0 and 100:
>>> h, __ = fix1d(x, bins=20, range=(0, 100))
When weights are absent the second return is None. The same data, now histogrammed with weights and over/underflow included:
>>> rng = np.random.default_rng(123)
>>> w = rng.uniform(0.1, 0.9, x.shape[0])
>>> h, stderr = fix1d(x, bins=20, range=(0, 100), weights=w, flow=True)
pygram11.fix1dmw
pygram11.fix1dmw(x, weights, bins=10, range=None, flow=False)
Histogram data with multiple weight variations and fixed width bins.
The weights array must have a total number of rows equal to the length of the input data. The number of columns in the weights array is equal to the number of weight variations. (The weights array must be an M x N matrix where M is the length of x and N is the number of weight variations).
- Parameters
x (numpy.ndarray) – Data to histogram.
weights (numpy.ndarray) – The weight variations for the elements of x; the first dimension is the length of x, the second dimension is the number of weight variations.
bins (int) – The number of bins.
range ((float, float), optional) – The minimum and maximum of the histogram axis. If None, min and max of x will be used.
flow (bool) – Include under/overflow in the first/last bins.
- Raises
ValueError – If x and weights have incompatible shapes (if x.shape[0] != weights.shape[0]).
ValueError – If weights is not a two dimensional array.
TypeError – If x or weights are unsupported types.
- Returns
numpy.ndarray – The bin counts.
numpy.ndarray – The standard error of each bin count, \(\sqrt{\sum_i w_i^2}\).
Examples
Multiple histograms of x using 20 different weight variations:
>>> rng = np.random.default_rng(123)
>>> x = rng.standard_normal(10000)
>>> twenty_weights = np.abs(rng.standard_normal((x.shape[0], 20)))
>>> h, err = fix1dmw(x, twenty_weights, bins=50, range=(-3, 3))
h and err are now shape (50, 20). Each column represents the histogram of the data using its respective weight.
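For checking the expected output layout, the following NumPy sketch builds the same (n_bins, n_variations) arrays by histogramming once per weight column. `multiweight_hist` and `limits` are hypothetical names; it reuses the standard error definition \(\sqrt{\sum_i w_i^2}\) from the Returns section and is not pygram11's implementation.

```python
import numpy as np


def multiweight_hist(x, weights, bins, limits):
    """Histogram x once per weight column; returns (counts, err), each shaped (bins, n_variations)."""
    if weights.ndim != 2 or weights.shape[0] != x.shape[0]:
        raise ValueError("weights must have shape (len(x), n_variations)")
    counts = np.empty((bins, weights.shape[1]))
    err = np.empty_like(counts)
    for j in range(weights.shape[1]):
        w = weights[:, j]  # one weight variation
        counts[:, j], _ = np.histogram(x, bins=bins, range=limits, weights=w)
        sumw2, _ = np.histogram(x, bins=bins, range=limits, weights=w**2)
        err[:, j] = np.sqrt(sumw2)
    return counts, err


rng = np.random.default_rng(123)
x = rng.standard_normal(10000)
twenty_weights = np.abs(rng.standard_normal((x.shape[0], 20)))
h, err = multiweight_hist(x, twenty_weights, bins=50, limits=(-3, 3))
```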
pygram11.fix2d
pygram11.fix2d(x, y, bins=10, range=None, weights=None, flow=False)
Histogram the x, y data with fixed (uniform) binning.
The two input arrays (x and y) must be the same length (shape).
- Parameters
x (numpy.ndarray) – First entries in data pairs to histogram.
y (numpy.ndarray) – Second entries in data pairs to histogram.
bins (int or (int, int)) – If int, both dimensions will have that many bins; if tuple, the number of bins for each dimension.
range (Sequence[Tuple[float, float]], optional) – Axis limits in the form [(xmin, xmax), (ymin, ymax)]. If None, the input data min and max will be used.
weights (array_like, optional) – The weights for each data element. If weights are absent, the second return will be None.
flow (bool) – Include over/underflow.
- Raises
ValueError – If x and y have incompatible shapes.
ValueError – If the shape of weights is incompatible with x and y.
TypeError – If x, y, or weights are unsupported types.
- Returns
numpy.ndarray – The bin counts.
numpy.ndarray, optional – The standard error of each bin count, \(\sqrt{\sum_i w_i^2}\).
Examples
A histogram of (x, y) with 20 bins between 0 and 100 in the x dimension and 10 bins between 0 and 50 in the y dimension:
>>> h, __ = fix2d(x, y, bins=(20, 10), range=((0, 100), (0, 50)))
The same data, now histogrammed with weights (via w):
>>> h, err = fix2d(x, y, bins=(20, 10), range=((0, 100), (0, 50)), weights=w)
pygram11.var1d
pygram11.var1d(x, bins, weights=None, density=False, flow=False)
Histogram data with variable bin widths.
- Parameters
x (numpy.ndarray) – Data to histogram.
bins (numpy.ndarray) – Bin edges.
weights (numpy.ndarray, optional) – The weights for each element of x. If weights are absent, the second return will be None.
density (bool) – Normalize histogram counts as value of PDF such that the integral over the range is unity.
flow (bool) – Include under/overflow in the first/last bins.
- Raises
ValueError – If the array of bin edges is not monotonically increasing.
ValueError – If x and weights have incompatible shapes.
TypeError – If x or weights are unsupported types.
- Returns
numpy.ndarray – The bin counts.
numpy.ndarray, optional – The standard error of each bin count, \(\sqrt{\sum_i w_i^2}\). The return is None if weights are not used.
Examples
A simple histogram with variable width bins:
>>> rng = np.random.default_rng(123)
>>> x = rng.standard_normal(1000)
>>> edges = np.array([-3.0, -2.5, -1.5, -0.25, 0.25, 2.0, 3.0])
>>> h, __ = var1d(x, edges)
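Because variable width bins have different widths, density normalization has to divide each count by its bin's width as well as by the total. A minimal NumPy sketch, assuming pygram11 follows the same convention as `numpy.histogram(..., density=True)` (normalizing by the in-range total):

```python
import numpy as np

rng = np.random.default_rng(123)
x = rng.standard_normal(1000)
edges = np.array([-3.0, -2.5, -1.5, -0.25, 0.25, 2.0, 3.0])

counts, _ = np.histogram(x, bins=edges)
widths = np.diff(edges)
# PDF values: divide by the total in-range count and by each bin's width,
# so that sum(density * widths) is 1 over the histogram range.
density = counts / counts.sum() / widths
```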
pygram11.var1dmw
pygram11.var1dmw(x, weights, bins, flow=False)
Histogram data with multiple weight variations and variable width bins.
The weights array must have a total number of rows equal to the length of the input data. The number of columns in the weights array is equal to the number of weight variations. (The weights array must be an M x N matrix where M is the length of x and N is the number of weight variations).
- Parameters
x (numpy.ndarray) – Data to histogram.
weights (numpy.ndarray) – Weight variations for the elements of x; the first dimension is the length of x, the second dimension is the number of weight variations.
bins (numpy.ndarray) – Bin edges.
flow (bool) – Include under/overflow in the first/last bins.
- Raises
ValueError – If the array of bin edges is not monotonically increasing.
ValueError – If x and weights have incompatible shapes.
ValueError – If weights is not a two dimensional array.
TypeError – If x or weights are unsupported types.
- Returns
numpy.ndarray – The bin counts.
numpy.ndarray – The standard error of each bin count, \(\sqrt{\sum_i w_i^2}\).
Examples
Using three different weight variations:
>>> rng = np.random.default_rng(123)
>>> x = rng.standard_normal(10000)
>>> weights = np.abs(rng.standard_normal((x.shape[0], 3)))
>>> edges = np.array([-3.0, -2.5, -1.5, -0.25, 0.25, 2.0, 3.0])
>>> h, err = var1dmw(x, weights, edges)
>>> h.shape
(6, 3)
>>> err.shape
(6, 3)
pygram11.var2d
pygram11.var2d(x, y, xbins, ybins, weights=None, flow=False)
Histogram the x, y data with variable width binning.
The two input arrays (x and y) must be the same length (shape).
- Parameters
x (numpy.ndarray) – First entries in data pairs to histogram.
y (numpy.ndarray) – Second entries in data pairs to histogram.
xbins (numpy.ndarray) – Bin edges for the x dimension.
ybins (numpy.ndarray) – Bin edges for the y dimension.
weights (array_like, optional) – The weights for each data element. If weights are absent, the second return will be None.
flow (bool) – Include under/overflow.
- Raises
ValueError – If x and y have different shapes.
ValueError – If either bin edge definition is not monotonically increasing.
TypeError – If x, y, or weights are unsupported types.
- Returns
numpy.ndarray – The bin counts.
numpy.ndarray, optional – The standard error of each bin count, \(\sqrt{\sum_i w_i^2}\).
Examples
A histogram of (x, y) where the edges are defined by numpy.logspace() in both dimensions:
>>> bins = np.logspace(0.1, 1.0, 10, endpoint=True)
>>> h, __ = var2d(x, y, bins, bins)
pygram11.histogram
pygram11.histogram(x, bins=10, range=None, weights=None, density=False, flow=False)
Histogram data in one dimension.
- Parameters
x (array_like) – Data to histogram.
bins (int or array_like) – If int: the number of bins; if array_like: the bin edges.
range ((float, float), optional) – The minimum and maximum of the histogram axis. If None with integer bins, min and max of x will be used. If bins is an array this is expected to be None.
weights (array_like, optional) – Weights for the elements of x. For single weight histograms the shape must be the same shape as x. For multiweight histograms the first dimension is the length of x and the second dimension is the number of weight variations.
density (bool) – Normalize histogram counts as value of PDF such that the integral over the range is unity.
flow (bool) – Include under/overflow in the first/last bins.
- Raises
ValueError – If bins defines edges while range is also not None.
ValueError – If the array of bin edges is not monotonically increasing.
ValueError – If x and weights have incompatible shapes.
ValueError – If multiweight histogramming is detected and weights is not a two dimensional array.
TypeError – If x or weights are unsupported types.
- Returns
numpy.ndarray – The bin counts.
numpy.ndarray, optional – The standard error of each bin count, \(\sqrt{\sum_i w_i^2}\). The return is None if weights are not used.
Examples
A simple fixed width histogram:
>>> h, __ = histogram(x, bins=20, range=(0, 100))
And with variable width histograms and weights:
>>> h, err = histogram(x, bins=[-3, -2, -1.5, 1.5, 3.5], weights=w)
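`histogram()` accepts either a bin count or bin edges. The hypothetical helper below (`choose_1d_backend` is not part of pygram11) restates the documented parameter and Raises rules as a dispatch between the core functions; pygram11's actual dispatch code may differ, and treating "monotonically increasing" as strictly increasing is an assumption.

```python
import numpy as np


def choose_1d_backend(bins, range=None):  # `range` mirrors the documented keyword name
    """Hypothetical dispatcher restating the documented rules of pygram11.histogram()."""
    if isinstance(bins, (int, np.integer)):
        # An integer bin count means fixed (uniform) width bins.
        return "fix1d"
    edges = np.asarray(bins, dtype=np.float64)
    if range is not None:
        raise ValueError("range must be None when bins defines edges")
    if edges.ndim != 1 or edges.size < 2 or np.any(np.diff(edges) <= 0):
        # Assumption: edges must be strictly increasing.
        raise ValueError("bin edges must be monotonically increasing")
    return "var1d"


print(choose_1d_backend(20, range=(0, 100)))        # fix1d
print(choose_1d_backend([-3, -2, -1.5, 1.5, 3.5]))  # var1d
```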
pygram11.histogram2d
pygram11.histogram2d(x, y, bins=10, range=None, weights=None, flow=False)
Histogram data in two dimensions.
This function provides an API very similar to numpy.histogram2d(). Keep in mind that the returns are different (bin counts and standard errors rather than bin counts and edges).
- Parameters
x (array_like) – Array representing the x coordinate of the data to histogram.
y (array_like) – Array representing the y coordinate of the data to histogram.
bins (int or array_like or [int, int] or [array, array], optional) – The bin specification:
If int, the number of bins for the two dimensions (nx = ny = bins).
If array_like, the bin edges for the two dimensions (x_edges = y_edges = bins).
If [int, int], the number of bins in each dimension (nx, ny = bins).
If [array_like, array_like], the bin edges in each dimension (x_edges, y_edges = bins).
range (array_like, shape(2,2), optional) – The limits of the histogram along each dimension. If bins is not integral, then this parameter is ignored. If None, the default is [[x.min(), x.max()], [y.min(), y.max()]].
weights (array_like) – An array of weights associated with each \((x_i, y_i)\) pair. Each pair of the data will contribute its associated weight to the bin count.
flow (bool) – Include over/underflow.
- Raises
ValueError – If x and y have different shapes or either bin edge definition is not monotonically increasing.
ValueError – If the shape of weights is not compatible with x and y.
TypeError – If x, y, or weights are unsupported types.
- Returns
numpy.ndarray – The bin counts.
numpy.ndarray – The standard error of each bin count, \(\sqrt{\sum_i w_i^2}\).
Examples
>>> h, err = histogram2d(x, y, weights=w)
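The four accepted forms of bins can be resolved into one specification per dimension as sketched below. `normalize_bins_2d` is a hypothetical helper written from the parameter description above; it follows the numpy.histogram2d() convention that a length-2 sequence means one specification per dimension, which is an assumption about how pygram11 resolves that case.

```python
import numpy as np


def normalize_bins_2d(bins):
    """Hypothetical helper: resolve a 2D `bins` spec into an (x, y) pair."""
    if isinstance(bins, (int, np.integer)):
        # int: same number of bins in both dimensions (nx = ny = bins)
        return bins, bins
    if len(bins) == 2:
        # [int, int] or [array_like, array_like]: one spec per dimension.
        # As in numpy.histogram2d(), a length-2 sequence is read this way,
        # never as a single set of two edges.
        xbins, ybins = bins
        return xbins, ybins
    # Any other array_like: one set of edges shared by both dimensions.
    edges = np.asarray(bins, dtype=np.float64)
    return edges, edges


print(normalize_bins_2d(10))        # (10, 10)
print(normalize_bins_2d([20, 10]))  # (20, 10)
```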
pygram11.force_omp
pygram11.force_omp()
Force OpenMP acceleration by minimizing the parallel thresholds.
The default behavior is to avoid OpenMP acceleration for input data with length below about 10,000 for fixed width histograms and 5,000 for variable width histograms. This function sets all thresholds to 1 (always use OpenMP acceleration).
pygram11.disable_omp
pygram11.disable_omp()
Disable OpenMP acceleration by maximizing the parallel thresholds.
The default behavior is to avoid OpenMP acceleration for input data with length below about 10,000 for fixed width histograms and 5,000 for variable width histograms. This function sets all thresholds to sys.maxsize (never use OpenMP acceleration).
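pygram11.bin_centers
pygram11.bin_centers(bins, range=None)
Construct array of center values for each bin.
- Parameters
bins (int or array_like) – The number of bins or the bin edges.
range ((float, float), optional) – The minimum and maximum of the histogram axis (required if bins is an integer).
- Returns
Array of bin centers.
- Return type
numpy.ndarray
- Raises
ValueError – If bins is an integer and range is undefined (None).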
Examples
The centers given the number of bins and max/min:
>>> bin_centers(10, range=(-3, 3))
array([-2.7, -2.1, -1.5, -0.9, -0.3, 0.3, 0.9, 1.5, 2.1, 2.7])
Or given bin edges:
>>> bin_centers([0, 1, 2, 3, 4])
array([0.5, 1.5, 2.5, 3.5])
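The arithmetic behind bin_centers() is the midpoint of adjacent edges. The following is a NumPy sketch of that calculation; `bin_centers_sketch` is a hypothetical name, not the library's source, and building uniform edges with `numpy.linspace()` is an assumption.

```python
import numpy as np


def bin_centers_sketch(bins, range=None):  # hypothetical; mirrors bin_centers(bins, range=None)
    """Midpoints of each bin, from a bin count plus range, or from explicit edges."""
    if isinstance(bins, (int, np.integer)):
        if range is None:
            raise ValueError("range is required when bins is an integer")
        edges = np.linspace(range[0], range[1], bins + 1)  # uniform edges
    else:
        edges = np.asarray(bins, dtype=np.float64)
    # Each center is halfway between adjacent edges.
    return 0.5 * (edges[:-1] + edges[1:])


fixed = bin_centers_sketch(10, range=(-3, 3))
variable = bin_centers_sketch([0, 1, 2, 3, 4])
```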