pygram11¶
pygram11 is a small Python library for creating simple histograms quickly. The backend is written in C++11 (with some help from pybind11) and accelerated with OpenMP.
Installation¶
Requirements¶
The only requirement to use pygram11 is NumPy. If you install binaries from conda-forge or PyPI, NumPy will be installed as a required dependency.
Extras for Source Builds¶
When building from source, all you need is a C++ compiler with C++11 support. The setup.py script will test to see if OpenMP is available. If it’s not, then the installation will abort. Most Linux distributions with modern GCC versions should provide OpenMP automatically (search the web to see how to install OpenMP from your distribution’s package manager). On macOS you’ll want to install libomp from Homebrew to use OpenMP with the Clang compiler shipped by Apple.
Install Options¶
PyPI¶
$ pip install pygram11
conda-forge¶
Installations from conda-forge provide a build that uses OpenMP.
$ conda install pygram11 -c conda-forge
Note
On macOS the OpenMP libraries from LLVM (libomp) and Intel (libiomp) can clash if your conda environment includes the Intel Math Kernel Library (MKL) package distributed by Anaconda. You may need to install the nomkl package to prevent the clash (Intel MKL accelerates many linear algebra operations, but does not impact pygram11).
Source¶
$ pip install git+https://github.com/douglasdavis/pygram11.git@master
Quick Start¶
Jumping In¶
The main purpose of pygram11 is to be a faster near drop-in replacement of numpy.histogram() and numpy.histogram2d(). The NumPy functions always return the bin counts and the bin edges, while pygram11 functions return the bin counts and the standard error on the bin counts. Therefore, if one only cares about the bin counts, the libraries are completely interchangeable.
These two function calls will provide the same bin counts:
import numpy as np
import pygram11 as pg
x = np.random.randn(1000)  # histogram the same data with both libraries
counts, __ = np.histogram(x, bins=20, range=(-3, 3))
counts, __ = pg.histogram(x, bins=20, range=(-3, 3))
If one cares about the standard error on the bin counts, or the ability to retain under- and overflow counts, then pygram11 is a great replacement. A blog post describes how to recreate this behavior in pure NumPy; with pygram11 it is as simple as:
data = np.random.randn(1000)
weights = np.random.uniform(0.5, 0.8, data.shape[0])
counts, err = pg.histogram(data, bins=10, range=(-3, 3), weights=weights, flow=True)
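To make the meaning of the two return values and of flow=True concrete, here is a minimal pure-Python sketch of the same calculation. It is not pygram11's implementation: the function name weighted_hist_with_flow is hypothetical, and the half-open bin convention (values at or above the upper edge count as overflow) is an assumption made for illustration.

```python
import math

def weighted_hist_with_flow(data, weights, nbins, xmin, xmax, flow=False):
    """Reference loop: weighted bin counts and sqrt(sum of w^2) per bin."""
    sumw = [0.0] * nbins   # sum of weights in each bin
    sumw2 = [0.0] * nbins  # sum of squared weights in each bin
    width = (xmax - xmin) / nbins
    for x, w in zip(data, weights):
        if x < xmin:
            if not flow:
                continue
            i = 0              # underflow is added to the first bin
        elif x >= xmax:        # assumption: upper edge is exclusive
            if not flow:
                continue
            i = nbins - 1      # overflow is added to the last bin
        else:
            i = min(int((x - xmin) / width), nbins - 1)
        sumw[i] += w
        sumw2[i] += w * w
    err = [math.sqrt(s) for s in sumw2]
    return sumw, err

data = [-5.0, -1.0, 0.5, 0.6, 4.0]
weights = [0.5, 1.0, 2.0, 1.0, 0.5]
counts, err = weighted_hist_with_flow(data, weights, nbins=2, xmin=-3, xmax=3, flow=True)
no_flow_counts, _ = weighted_hist_with_flow(data, weights, nbins=2, xmin=-3, xmax=3)
```

With flow=True the two out-of-range entries contribute their weights to the edge bins; with flow=False they are dropped.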
The pygram11.histogram() and pygram11.histogram2d() functions in the pygram11 API are meant to provide an easy transition from NumPy to pygram11. The next couple of sections summarize the structure of the pygram11 API.
Core pygram11 Functions¶
pygram11 provides a simple set of functions for calculating histograms:
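| pygram11.fix1d | Histogram data with fixed (uniform) bin widths. |
| pygram11.fix1dmw | Histogram data with multiple weight variations and fixed width bins. |
| pygram11.var1d | Histogram data with variable bin widths. |
| pygram11.var1dmw | Histogram data with multiple weight variations and variable width bins. |
| pygram11.fix2d | Histogram the x, y data with fixed (uniform) binning. |
| pygram11.var2d | Histogram the x, y data with variable width binning. |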
You’ll see that the API specific to pygram11 is a bit more specialized than the NumPy histogramming API (shown below).
Histogramming a normal distribution:
>>> h, err = pygram11.fix1d(np.random.randn(10000), bins=25, range=(-3, 3))
See the API reference for more examples.
NumPy-like Functions¶
For convenience a NumPy-like API is also provided (not one-to-one, see the API reference).
| pygram11.histogram | Histogram data in one dimension. |
| pygram11.histogram2d | Histogram data in two dimensions. |
Benchmarks¶
Setup¶
There are a number of Python modules providing APIs for histogram calculations. Here we see how pygram11 performs in comparison to NumPy, fast-histogram, and boost-histogram. Fast-histogram does not provide calculations for variable width bins, so for those cases we only compare to NumPy and boost-histogram. The tests were performed on an Intel i7-8850H 2.60 GHz processor (6 physical cores, 12 threads).
Results¶
The results clearly show that pygram11 is most useful for input arrays exceeding about 5,000 elements. This makes sense because the pygram11 backend has a clear and simple overhead: to take advantage of N available threads we make N result arrays, fill them individually (splitting the loop over the input data N ways), and finally combine the N per-thread results into the single result that is returned.
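The split, fill, and combine strategy described above can be sketched in pure Python. This only illustrates the structure of the algorithm: the real backend is C++ with OpenMP, the names fill_chunk and split_fill_combine are hypothetical, and Python threads will not reproduce the speedup because of the interpreter lock.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def fill_chunk(chunk, nbins, xmin, xmax):
    """Fill a private result array from one slice of the input."""
    counts = [0] * nbins
    norm = nbins / (xmax - xmin)
    for x in chunk:
        if xmin <= x < xmax:
            counts[min(int((x - xmin) * norm), nbins - 1)] += 1
    return counts

def split_fill_combine(data, nbins, xmin, xmax, nthreads=4):
    """Split the input N ways, fill N arrays, then sum them element-wise."""
    step = max(1, math.ceil(len(data) / nthreads))
    chunks = [data[i:i + step] for i in range(0, len(data), step)]
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        partials = list(pool.map(lambda c: fill_chunk(c, nbins, xmin, xmax), chunks))
    total = [0] * nbins
    for partial in partials:
        for i, c in enumerate(partial):
            total[i] += c
    return total

# 1000 distinct values spread over [0, 10)
data = [(i * 37 % 1000) / 100.0 for i in range(1000)]
combined = split_fill_combine(data, nbins=10, xmin=0.0, xmax=10.0, nthreads=3)
serial = fill_chunk(data, nbins=10, xmin=0.0, xmax=10.0)
```

The cost of allocating and summing N extra arrays is fixed, which is consistent with the observation that the approach pays off only once the input is large enough.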
For one dimensional histograms with fixed width bins pygram11 becomes the most performant calculation for arrays with about 5,000 or more elements (up to about 3x faster than the next best option and over 10x faster than NumPy). Fast-histogram is a bit more performant for smaller arrays, while pygram11 is always faster than NumPy and boost-histogram.
For two dimensional histograms with fixed width bins pygram11 becomes the most performant calculation for arrays with about 10,000 or more elements (up to about 3x faster than the next best option and almost 100x faster than NumPy). Fast-histogram is again faster for smaller inputs, while pygram11 is always faster than NumPy and almost always faster than boost-histogram.
For one dimensional histograms with variable width bins pygram11 becomes the most performant option for arrays with about 10,000 or more elements (up to about 8x faster than the next best option and about 13x faster than NumPy).
For two dimensional histograms with variable width bins pygram11 becomes the most performant option for arrays with about 5,000 or more elements (up to 10x faster than the next best option).
API Reference¶
pygram11.fix1d¶
pygram11.fix1d(x, bins=10, range=None, weights=None, density=False, flow=False)
Histogram data with fixed (uniform) bin widths.
- Parameters
x (array_like) – Data to histogram.
bins (int) – The number of bins.
range ((float, float), optional) – The minimum and maximum of the histogram axis.
weights (array_like, optional) – The weights for each element of x.
density (bool) – If True, normalize histogram bins as value of PDF such that the integral over the range is one.
flow (bool) – If True, the under and overflow bin contents are added to the first and last bins, respectively.
- Returns
numpy.ndarray – The bin counts.
numpy.ndarray – The standard error of each bin count, \(\sqrt{\sum_i w_i^2}\).
Examples
A histogram of x with 20 bins between 0 and 100:
>>> h, __ = fix1d(x, bins=20, range=(0, 100))
The same data, now histogrammed with weights:
>>> w = np.abs(np.random.randn(x.shape[0]))
>>> h, h_err = fix1d(x, bins=20, range=(0, 100), weights=w)
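The density option is described as normalizing the bins so that the integral over the range is one. A minimal sketch of that normalization for fixed width bins follows; the helper to_density and the sample counts are hypothetical, and how pygram11 scales the accompanying errors is not covered here.

```python
def to_density(counts, xmin, xmax):
    """Scale bin counts so that sum(density * bin_width) equals 1."""
    width = (xmax - xmin) / len(counts)
    total = sum(counts)
    return [c / (total * width) for c in counts]

counts = [2.0, 6.0, 2.0]                 # hypothetical counts on [0, 3)
density = to_density(counts, 0.0, 3.0)
integral = sum(d * 1.0 for d in density)  # every bin is 1.0 wide
```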
pygram11.fix1dmw¶
pygram11.fix1dmw(x, weights, bins=10, range=None, flow=False)
Histogram data with multiple weight variations and fixed width bins.
- Parameters
x (array_like) – Data to histogram.
weights (array_like) – The weight variations for the elements of x; the first dimension is the length of x, the second dimension is the number of weight variations.
bins (int) – The number of bins.
range ((float, float), optional) – The minimum and maximum of the histogram axis.
flow (bool) – If True, the under and overflow bin contents are added to the first and last bins, respectively.
- Returns
numpy.ndarray – The bin counts.
numpy.ndarray – The standard error of each bin count, \(\sqrt{\sum_i w_i^2}\).
Examples
Multiple histograms of x with 50 bins between -3 and 3, using 20 different weight variations:
>>> x = np.random.randn(10000)
>>> twenty_weights = np.random.rand(x.shape[0], 20)
>>> h, err = fix1dmw(x, twenty_weights, bins=50, range=(-3, 3))
h and err are now shape (50, 20). Each column represents the histogram of the data using its respective weight.
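The column layout of the multi-weight result can be illustrated with a small pure-Python loop. This is a sketch under assumptions, not the library's code: fixed_hist_multi_weight is a hypothetical name, and out-of-range entries are simply skipped (the flow=False case). The point it shows is that the bin index is found once per data element and then reused for every weight variation.

```python
import math

def fixed_hist_multi_weight(x, weights, nbins, xmin, xmax):
    """weights[i][j] is the j-th weight variation for x[i]."""
    nvar = len(weights[0])
    sumw = [[0.0] * nvar for _ in range(nbins)]
    sumw2 = [[0.0] * nvar for _ in range(nbins)]
    norm = nbins / (xmax - xmin)
    for xi, wrow in zip(x, weights):
        if not (xmin <= xi < xmax):
            continue
        b = min(int((xi - xmin) * norm), nbins - 1)  # computed once per entry
        for j, w in enumerate(wrow):
            sumw[b][j] += w
            sumw2[b][j] += w * w
    err = [[math.sqrt(v) for v in row] for row in sumw2]
    return sumw, err

x = [0.5, 1.5, 1.6]
weights = [[1.0, 2.0], [1.0, 0.5], [3.0, 0.5]]  # 3 entries, 2 variations
h, err = fixed_hist_multi_weight(x, weights, nbins=2, xmin=0.0, xmax=2.0)
```

Here h has 2 rows (bins) and 2 columns (weight variations), mirroring the (50, 20) shape in the example above.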
pygram11.fix2d¶
pygram11.fix2d(x, y, bins=10, range=None, weights=None)
Histogram the x, y data with fixed (uniform) binning.
- Parameters
x (array_like) – first entries in data pairs to histogram
y (array_like) – second entries in data pairs to histogram
bins (int or iterable) – if int, both dimensions will have that many bins; if iterable, the number of bins for each dimension
range (iterable, optional) – axis limits to histogram over in the form [(xmin, xmax), (ymin, ymax)]
weights (array_like, optional) – weight for each \((x_i, y_i)\) pair.
- Returns
numpy.ndarray – The bin counts.
numpy.ndarray – The standard error of each bin count, \(\sqrt{\sum_i w_i^2}\).
Examples
A histogram of (x, y) with 20 bins between 0 and 100 in the x dimension and 10 bins between 0 and 50 in the y dimension:
>>> h, __ = fix2d(x, y, bins=(20, 10), range=((0, 100), (0, 50)))
The same data, now histogrammed weighted (via w):
>>> h, err = fix2d(x, y, bins=(20, 10), range=((0, 100), (0, 50)), weights=w)
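As a rough picture of what the two-dimensional fixed binning computes, the sketch below fills an (nx, ny) grid from data pairs. The function fixed_hist_2d and its rng argument are hypothetical names, pairs outside the axis limits are dropped, and the first index of the result is assumed to follow the x dimension.

```python
def fixed_hist_2d(x, y, bins, rng):
    """Unweighted counts on an nx-by-ny grid of uniform bins."""
    (nx, ny), ((xmin, xmax), (ymin, ymax)) = bins, rng
    counts = [[0] * ny for _ in range(nx)]
    for xi, yi in zip(x, y):
        if xmin <= xi < xmax and ymin <= yi < ymax:
            i = min(int((xi - xmin) * nx / (xmax - xmin)), nx - 1)
            j = min(int((yi - ymin) * ny / (ymax - ymin)), ny - 1)
            counts[i][j] += 1
    return counts

x = [5.0, 55.0, 99.0, 150.0]   # the last pair is outside the x range
y = [1.0, 49.0, 25.0, 10.0]
h = fixed_hist_2d(x, y, bins=(20, 10), rng=((0, 100), (0, 50)))
```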
pygram11.var1d¶
pygram11.var1d(x, bins, weights=None, density=False, flow=False)
Histogram data with variable bin widths.
- Parameters
x (array_like) – data to histogram
bins (array_like) – bin edges
weights (array_like, optional) – weight for each element of x
density (bool) – normalize histogram bins as value of PDF such that the integral over the range is 1.
flow (bool) – if True, the under and overflow bin contents are added to the first and last bins, respectively
- Returns
numpy.ndarray – The bin counts.
numpy.ndarray – The standard error of each bin count, \(\sqrt{\sum_i w_i^2}\).
Examples
A simple histogram with variable width bins:
>>> x = np.random.randn(10000)
>>> bin_edges = [-3.0, -2.5, -1.5, -0.25, 0.25, 2.0, 3.0]
>>> h, __ = var1d(x, bin_edges)
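With variable width bins the bin index can no longer be computed arithmetically, so each value has to be located among the edges. One common way to do that is a binary search, sketched here with the standard library. The function var_hist is hypothetical, the search strategy is a swapped-in technique rather than a statement about pygram11's backend, and treating the last edge as exclusive is an assumption.

```python
from bisect import bisect_right

def var_hist(x, edges, flow=False):
    """Unweighted counts for sorted, possibly non-uniform bin edges."""
    nbins = len(edges) - 1
    counts = [0] * nbins
    for xi in x:
        i = bisect_right(edges, xi) - 1  # bin whose left edge is <= xi
        if i < 0 or i >= nbins:
            if not flow:
                continue
            i = 0 if i < 0 else nbins - 1  # fold under/overflow into edge bins
        counts[i] += 1
    return counts

bin_edges = [-3.0, -2.5, -1.5, -0.25, 0.25, 2.0, 3.0]
x = [-4.0, -2.7, 0.0, 0.1, 1.0, 3.0, 5.0]
h = var_hist(x, bin_edges)
h_flow = var_hist(x, bin_edges, flow=True)
```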
pygram11.var1dmw¶
pygram11.var1dmw(x, weights, bins, flow=False)
Histogram data with multiple weight variations and variable width bins.
- Parameters
x (array_like) – data to histogram
weights (array_like) – weight variations for the elements of x; the first dimension is the length of x, the second dimension is the number of weights.
bins (array_like) – bin edges
flow (bool) – if True, the under and overflow bin contents are added to the first and last bins, respectively
- Returns
numpy.ndarray – The bin counts.
numpy.ndarray – The standard error of each bin count, \(\sqrt{\sum_i w_i^2}\).
Examples
Using three different weight variations:
>>> x = np.random.randn(10000)
>>> weights = np.abs(np.random.randn(x.shape[0], 3))
>>> bin_edges = [-3.0, -2.5, -1.5, -0.25, 0.25, 2.0, 3.0]
>>> h, err = var1dmw(x, weights, bin_edges)
>>> h.shape
(6, 3)
>>> err.shape
(6, 3)
pygram11.var2d¶
pygram11.var2d(x, y, xbins, ybins, weights=None)
Histogram the x, y data with variable width binning.
- Parameters
x (array_like) – first entries in the data pairs to histogram
y (array_like) – second entries in the data pairs to histogram
xbins (array_like) – bin edges for the x dimension
ybins (array_like) – bin edges for the y dimension
weights (array_like, optional) – weights for each \((x_i, y_i)\) pair.
- Returns
numpy.ndarray – The bin counts.
numpy.ndarray – The standard error of each bin count, \(\sqrt{\sum_i w_i^2}\).
Examples
A histogram of (x, y) where the edges are defined by a numpy.logspace() in both dimensions:
>>> bins = numpy.logspace(0.1, 1.0, 10, endpoint=True)
>>> h, __ = var2d(x, y, bins, bins)
pygram11.histogram¶
pygram11.histogram(x, bins=10, range=None, weights=None, density=False, flow=False)
Histogram data in one dimension.
- Parameters
x (array_like) – data to histogram.
bins (int or array_like) – if int: the number of bins; if array_like: the bin edges.
range (tuple(float, float), optional) – the definition of the edges of the bin range (start, stop).
weights (array_like, optional) – a set of weights associated with the elements of x. This can also be a two dimensional set of multiple weight variations with shape (len(x), n_weight_variations).
density (bool) – normalize counts such that the integral over the range is equal to 1. If weights is two dimensional this argument is ignored.
flow (bool) – if True, include under/overflow in the first/last bins.
- Returns
numpy.ndarray – The bin counts.
numpy.ndarray – The standard error of each bin count, \(\sqrt{\sum_i w_i^2}\).
Examples
A simple fixed width histogram:
>>> h, __ = histogram(x, bins=20, range=(0, 100))
And with variable width bins and weights:
>>> h, err = histogram(x, bins=[-3, -2, -1.5, 1.5, 3.5], weights=w)
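The bins and weights options above line up naturally with the four one-dimensional core functions. The sketch below shows one plausible mapping; it is an illustration of how the arguments relate to fix1d, fix1dmw, var1d, and var1dmw, not a description of how pygram11.histogram is actually written, and pick_core_function is a hypothetical helper.

```python
def pick_core_function(bins, weights=None):
    """Name the core function that matches a histogram() style call."""
    fixed = isinstance(bins, int)          # int: number of uniform bins
    multi = (
        weights is not None
        and len(weights) > 0
        and hasattr(weights[0], "__len__")  # rows of weights: 2D variations
    )
    if fixed:
        return "fix1dmw" if multi else "fix1d"
    return "var1dmw" if multi else "var1d"

single = [1.0, 2.0, 0.5]
variations = [[1.0, 2.0], [0.5, 0.5], [1.5, 1.0]]
names = [
    pick_core_function(20),
    pick_core_function(20, variations),
    pick_core_function([-3, -2, -1.5, 1.5, 3.5], single),
    pick_core_function([-3, -2, -1.5, 1.5, 3.5], variations),
]
```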
pygram11.histogram2d¶
pygram11.histogram2d(x, y, bins=10, range=None, weights=None)
Histogram data in two dimensions.
This function provides an API very similar to numpy.histogram2d(). Keep in mind that the returns are different.
- Parameters
x (array_like) – Array representing the x coordinate of the data to histogram.
y (array_like) – Array representing the y coordinate of the data to histogram.
bins (int or array_like or [int, int] or [array, array], optional) – The bin specification:
If int, the number of bins for the two dimensions (nx = ny = bins).
If array_like, the bin edges for the two dimensions (x_edges = y_edges = bins).
If [int, int], the number of bins in each dimension (nx, ny = bins).
If [array_like, array_like], the bin edges in each dimension (x_edges, y_edges = bins).
range (array_like, shape(2,2), optional) – The edges of this histogram along each dimension. If bins is not integral, then this parameter is ignored. If None, the default is [[x.min(), x.max()], [y.min(), y.max()]].
weights (array_like, optional) – An array of weights associated to each element \((x_i, y_i)\) pair. Each pair of the data will contribute its associated weight to the bin count.
- Returns
numpy.ndarray – The bin counts.
numpy.ndarray – The standard error of each bin count, \(\sqrt{\sum_i w_i^2}\).
Examples
>>> h, err = histogram2d(x, y, weights=w)
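The four accepted forms of the bins argument can be told apart with a few type checks. The following sketch shows one way to resolve them into a per-axis specification; resolve_bins_2d is a hypothetical helper written for illustration, and pygram11 may perform this check differently.

```python
def resolve_bins_2d(bins):
    """Return (xspec, yspec); each is an int (bin count) or a list of edges."""
    if isinstance(bins, int):
        return bins, bins                     # nx = ny = bins
    bins = list(bins)
    if len(bins) == 2 and all(isinstance(b, int) for b in bins):
        return bins[0], bins[1]               # nx, ny = bins
    if len(bins) == 2 and all(hasattr(b, "__len__") for b in bins):
        return list(bins[0]), list(bins[1])   # x_edges, y_edges = bins
    return bins, list(bins)                   # x_edges = y_edges = bins

same_count = resolve_bins_2d(10)
per_axis_count = resolve_bins_2d([20, 10])
per_axis_edges = resolve_bins_2d([[0, 1, 2], [0, 5]])
shared_edges = resolve_bins_2d([0.0, 0.5, 1.0])
```

Note the ambiguity this ordering resolves: a two-element list of integers is read as bin counts, not as a single bin with two edges.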