pygram11

pygram11 is a small Python library for creating simple histograms quickly. The backend is written in C++11 (with some help from pybind11) and accelerated with OpenMP.


Installation

Requirements

The only requirement to use pygram11 is NumPy. If you install binaries from conda-forge or PyPI, NumPy will be installed as a required dependency.

Extras for Source Builds

When building from source, all you need is a C++ compiler with C++11 support. The setup.py script will test to see if OpenMP is available. If it’s not, then the installation will abort. Most Linux distributions with modern GCC versions should provide OpenMP automatically (search the web to see how to install OpenMP from your distribution’s package manager). On macOS you’ll want to install libomp from Homebrew to use OpenMP with the Clang compiler shipped by Apple.

Install Options

PyPI

$ pip install pygram11

conda-forge

Installations from conda-forge provide a build that uses OpenMP.

$ conda install pygram11 -c conda-forge

Note

On macOS the OpenMP libraries from LLVM (libomp) and Intel (libiomp) can clash if your conda environment includes the Intel Math Kernel Library (MKL) package distributed by Anaconda. You may need to install the nomkl package to prevent the clash. Intel MKL accelerates many linear algebra operations, but it does not affect pygram11.

Source

$ pip install git+https://github.com/douglasdavis/pygram11.git@master

Quick Start

Jumping In

The main purpose of pygram11 is to be a faster, near drop-in replacement for numpy.histogram() and numpy.histogram2d(). The NumPy functions always return the bin counts and the bin edges, while the pygram11 functions return the bin counts and the standard error on the bin counts. Therefore, if one only cares about the bin counts, the libraries are completely interchangeable.

These two function calls provide the same bin counts when given the same input:

import numpy as np
import pygram11 as pg
x = np.random.randn(1000)
counts, __ = np.histogram(x, bins=20, range=(-3, 3))
counts, __ = pg.histogram(x, bins=20, range=(-3, 3))

If one cares about the standard error on the bin counts, or the ability to retain under- and overflow counts, then pygram11 is a great replacement. Check out a blog post which describes how to recreate this behavior in pure NumPy, while with pygram11 it is as simple as:

data = np.random.randn(1000)
weights = np.random.uniform(0.5, 0.8, data.shape[0])
counts, err = pg.histogram(data, bins=10, range=(-3, 3), weights=weights, flow=True)
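For comparison, here is a rough pure-NumPy sketch of the same kind of calculation. It assumes the standard error in each bin is the square root of the sum of squared weights, and that under- and overflow entries are folded into the first and last bins. The helper name weighted_hist_with_flow and its axis_range parameter are made up for this illustration and are not part of pygram11 or NumPy.

```python
import numpy as np


def weighted_hist_with_flow(data, weights, bins, axis_range):
    """Weighted counts and sqrt(sum of w^2) errors, with flow folded in."""
    lo, hi = axis_range
    edges = np.linspace(lo, hi, bins + 1)
    # Pull out-of-range entries onto the axis limits so they land in the
    # first (underflow) or last (overflow) bin.
    clipped = np.clip(data, lo, hi)
    counts, _ = np.histogram(clipped, bins=edges, weights=weights)
    # Assumed error definition: square root of the sum of squared weights.
    sumw2, _ = np.histogram(clipped, bins=edges, weights=weights**2)
    return counts, np.sqrt(sumw2)


rng = np.random.default_rng(0)
data = rng.standard_normal(1000)
weights = rng.uniform(0.5, 0.8, data.shape[0])
counts, err = weighted_hist_with_flow(data, weights, bins=10, axis_range=(-3, 3))
```

Because every entry ends up in some bin, the sum of counts equals the sum of the weights.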

The pygram11.histogram() and pygram11.histogram2d() functions in the pygram11 API are meant to provide an easy transition from NumPy to pygram11. The next couple of sections summarize the structure of the pygram11 API.

Core pygram11 Functions

pygram11 provides a simple set of functions for calculating histograms:

pygram11.fix1d(x[, bins, range, weights, …])

Histogram data with fixed (uniform) bin widths.

pygram11.fix1dmw(x, weights[, bins, range, flow])

Histogram data with multiple weight variations and fixed width bins.

pygram11.var1d(x, bins[, weights, density, flow])

Histogram data with variable bin widths.

pygram11.var1dmw(x, weights, bins[, flow])

Histogram data with multiple weight variations and variable width bins.

pygram11.fix2d(x, y[, bins, range, weights])

Histogram the x, y data with fixed (uniform) binning.

pygram11.var2d(x, y, xbins, ybins[, weights])

Histogram the x, y data with variable width binning.

You’ll see that the API specific to pygram11 is a bit more specialized than the NumPy histogramming API (shown below).

Histogramming a normal distribution:

>>> h, err = pygram11.fix1d(np.random.randn(10000), bins=25, range=(-3, 3))

See the API reference for more examples.

NumPy-like Functions

For convenience a NumPy-like API is also provided (not one-to-one, see the API reference).

pygram11.histogram(x[, bins, range, …])

Histogram data in one dimension.

pygram11.histogram2d(x, y[, bins, range, …])

Histogram data in two dimensions.

Benchmarks

Setup

There are a number of Python modules providing APIs for histogram calculations. Here we see how pygram11 performs in comparison to NumPy, fast-histogram, and boost-histogram. Fast-histogram does not provide calculations for variable width bins, so for those cases we only compare to NumPy and boost-histogram. The tests were performed on an Intel i7-8850H 2.60 GHz processor (6 physical cores, 12 threads).

Results

The results clearly show that pygram11 is most useful for input arrays exceeding about 5,000 elements. This makes sense because the pygram11 backend has a clear and simple overhead: to take advantage of N available threads we make N result arrays, fill them individually (splitting the loop over the input data N times), and finally combine the results (one per thread) into a final single result that is returned.
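The split, fill, and combine pattern described above can be mimicked serially in NumPy. This is only an illustration of the bookkeeping, not the C++/OpenMP backend: each chunk stands in for the slice of the input one thread would process, and the function name chunked_histogram is hypothetical.

```python
import numpy as np


def chunked_histogram(x, edges, n_chunks):
    """Fill one partial histogram per chunk, then sum them into one result."""
    nbins = len(edges) - 1
    partial = np.zeros((n_chunks, nbins), dtype=np.int64)
    for i, chunk in enumerate(np.array_split(x, n_chunks)):
        chunk_counts, _ = np.histogram(chunk, bins=edges)
        partial[i] = chunk_counts
    # Combining the per-chunk arrays is a fixed cost that does not shrink
    # with the input size, which is why small inputs benefit less.
    return partial.sum(axis=0)


rng = np.random.default_rng(0)
x = rng.standard_normal(20000)
edges = np.linspace(-3, 3, 26)
combined = chunked_histogram(x, edges, n_chunks=12)
single_pass, _ = np.histogram(x, bins=edges)
```

The combined result matches the single-pass histogram. For small inputs, allocating and reducing the extra arrays can cost more than the filling it saves.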

For one dimensional histograms with fixed width bins pygram11 becomes the most performant calculation for arrays with about 5,000 or more elements (up to about 3x faster than the next best option and over 10x faster than NumPy). Fast-histogram is a bit more performant for smaller arrays, while pygram11 is always faster than NumPy and boost-histogram.

_images/fixed1d.png

For two dimensional histograms with fixed width bins pygram11 becomes the most performant calculation for arrays with about 10,000 or more elements (up to about 3x faster than the next best option and almost 100x faster than NumPy). Fast-histogram is again faster for smaller inputs, while pygram11 is always faster than NumPy and almost always faster than boost-histogram.

_images/fixed2d.png

For one dimensional histograms with variable width bins pygram11 becomes the most performant option for arrays with about 10,000 or more elements (up to about 8x faster than the next best option and about 13x faster than NumPy).

_images/var1d.png

For two dimensional histograms with variable width bins pygram11 becomes the most performant option for arrays with about 5,000 or more elements (up to 10x faster than the next best option).

_images/var2d.png

API Reference

pygram11.fix1d

pygram11.fix1d(x, bins=10, range=None, weights=None, density=False, flow=False)[source]

Histogram data with fixed (uniform) bin widths.

Parameters
  • x (array_like) – Data to histogram.

  • bins (int) – The number of bins.

  • range ((float, float), optional) – The minimum and maximum of the histogram axis.

  • weights (array_like, optional) – The weights for each element of x.

  • density (bool) – If True, normalize histogram bins as value of PDF such that the integral over the range is one.

  • flow (bool) – If True, the under and overflow bin contents are added to the first and last bins, respectively.

Returns

Examples

A histogram of x with 20 bins between 0 and 100:

>>> h, __ = fix1d(x, bins=20, range=(0, 100))

The same data, now histogrammed with weights:

>>> w = np.abs(np.random.randn(x.shape[0]))
>>> h, h_err = fix1d(x, bins=20, range=(0, 100), weights=w)
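As a reference point for the density option, this NumPy-only sketch shows the normalization that makes the integral over the range equal to one: each count is divided by the total count times the bin width. It follows NumPy's density=True convention. How pygram11 combines density with flow or weights is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 100, 5000)

counts, edges = np.histogram(x, bins=20, range=(0, 100))
widths = np.diff(edges)

# Divide by (total count * bin width) so the bin values behave like a PDF.
density = counts / (counts.sum() * widths)

# Integrating the PDF values over the axis recovers one.
integral = np.sum(density * widths)
numpy_density, _ = np.histogram(x, bins=20, range=(0, 100), density=True)
```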

pygram11.fix1dmw

pygram11.fix1dmw(x, weights, bins=10, range=None, flow=False)[source]

Histogram data with multiple weight variations and fixed width bins.

Parameters
  • x (array_like) – data to histogram.

  • weights (array_like) – The weight variations for the elements of x, first dimension is the length of x, second dimension is the number of weights variations.

  • bins (int) – The number of bins.

  • range ((float, float), optional) – The minimum and maximum of the histogram axis.

  • flow (bool) – If True, the under and overflow bin contents are added to the first and last bins, respectively.

Returns

Examples

Multiple histograms of x with 50 bins between -3 and 3, using 20 different weight variations:

>>> x = np.random.randn(10000)
>>> twenty_weights = np.random.rand(x.shape[0], 20)
>>> h, err = fix1dmw(x, twenty_weights, bins=50, range=(-3, 3))

h and err are now shape (50, 20). Each column represents the histogram of the data using its respective weight.
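To make the output layout concrete, the following NumPy sketch builds arrays of the same shape by looping over the weight columns. It assumes the error is the square root of the sum of squared weights in each bin. A single multi-weight call avoids repeating the bin lookup for every column, which this loop does not.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(10000)
weights = rng.random((x.shape[0], 20))  # one column per weight variation
edges = np.linspace(-3, 3, 51)          # 50 uniform bins

h = np.empty((50, 20))
err = np.empty((50, 20))
for j in range(weights.shape[1]):
    w = weights[:, j]
    col_counts, _ = np.histogram(x, bins=edges, weights=w)
    col_sumw2, _ = np.histogram(x, bins=edges, weights=w**2)
    h[:, j] = col_counts
    # Assumed error definition: sqrt of the sum of squared weights.
    err[:, j] = np.sqrt(col_sumw2)
```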

pygram11.fix2d

pygram11.fix2d(x, y, bins=10, range=None, weights=None)[source]

Histogram the x, y data with fixed (uniform) binning.

Parameters
  • x (array_like) – first entries in data pairs to histogram

  • y (array_like) – second entries in data pairs to histogram

  • bins (int or iterable) – if int, both dimensions will have that many bins, if iterable, the number of bins for each dimension

  • range (iterable, optional) – axis limits to histogram over in the form [(xmin, xmax), (ymin, ymax)]

  • weights (array_like, optional) – weight for each \((x_i, y_i)\) pair.

Returns

Examples

A histogram of (x, y) with 20 bins between 0 and 100 in the x dimension and 10 bins between 0 and 50 in the y dimension:

>>> h, __ = fix2d(x, y, bins=(20, 10), range=((0, 100), (0, 50)))

The same data, now histogrammed with weights (via w):

>>> h, err = fix2d(x, y, bins=(20, 10), range=((0, 100), (0, 50)), weights=w)
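With uniform binning, the bin index in each dimension can be computed arithmetically instead of searched for. The sketch below shows one common way to do that in NumPy. It is an illustration under that assumption and not necessarily how the pygram11 backend is written.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 100, 5000)
y = rng.uniform(0, 50, 5000)

nx, ny = 20, 10
xmin, xmax = 0.0, 100.0
ymin, ymax = 0.0, 50.0

# Keep only pairs inside the half-open axis ranges.
inside = (x >= xmin) & (x < xmax) & (y >= ymin) & (y < ymax)

# Scale each coordinate to [0, nbins) and truncate to get the bin index.
ix = ((x[inside] - xmin) / (xmax - xmin) * nx).astype(np.int64)
iy = ((y[inside] - ymin) / (ymax - ymin) * ny).astype(np.int64)

# Guard against floating point rounding pushing an index to nbins.
ix = np.minimum(ix, nx - 1)
iy = np.minimum(iy, ny - 1)

h = np.zeros((nx, ny), dtype=np.int64)
np.add.at(h, (ix, iy), 1)  # unbuffered add, so repeated indices accumulate
```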

pygram11.var1d

pygram11.var1d(x, bins, weights=None, density=False, flow=False)[source]

Histogram data with variable bin widths.

Parameters
  • x (array_like) – data to histogram

  • bins (array_like) – bin edges

  • weights (array_like, optional) – weight for each element of x

  • density (bool) – normalize histogram bins as value of PDF such that the integral over the range is 1.

  • flow (bool) – if True the under and overflow bin contents are added to the first and last bins, respectively

Returns

Examples

A simple histogram with variable width bins:

>>> x = np.random.randn(10000)
>>> bin_edges = [-3.0, -2.5, -1.5, -0.25, 0.25, 2.0, 3.0]
>>> h, __ = var1d(x, bin_edges)
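With variable width bins the bin index has to be looked up from the edges. One way to sketch that lookup in NumPy is a binary search with numpy.searchsorted, shown here for the same edges. This illustrates the idea only and makes no claim about the pygram11 internals.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(10000)
bin_edges = np.array([-3.0, -2.5, -1.5, -0.25, 0.25, 2.0, 3.0])
nbins = len(bin_edges) - 1

# For each value, find the bin i with bin_edges[i] <= value < bin_edges[i + 1].
idx = np.searchsorted(bin_edges, x, side="right") - 1

# Values below the first edge get -1; values at or above the last edge get nbins.
in_range = (idx >= 0) & (idx < nbins)
counts = np.bincount(idx[in_range], minlength=nbins)

reference, _ = np.histogram(x, bins=bin_edges)
```

NumPy additionally counts values exactly equal to the last edge in the last bin, so the two results can differ only in that corner case.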

pygram11.var1dmw

pygram11.var1dmw(x, weights, bins, flow=False)[source]

Histogram data with multiple weight variations and variable width bins.

Parameters
  • x (array_like) – data to histogram

  • bins (array_like) – bin edges

  • weights (array_like) – weight variations for the elements of x, first dimension is the length of x, second dimension is the number of weight variations.

  • density (bool) – normalize histogram bins as value of PDF such that the integral over the range is 1.

  • flow (bool) – if True the under and overflow bin contents are added to the first and last bins, respectively

Returns

Examples

Using three different weight variations:

>>> x = np.random.randn(10000)
>>> weights = np.abs(np.random.randn(x.shape[0], 3))
>>> bin_edges = [-3.0, -2.5, -1.5, -0.25, 0.25, 2.0, 3.0]
>>> h, err = var1dmw(x, weights, bin_edges)
>>> h.shape
(6, 3)
>>> err.shape
(6, 3)

pygram11.var2d

pygram11.var2d(x, y, xbins, ybins, weights=None)[source]

Histogram the x, y data with variable width binning.

Parameters
  • x (array_like) – first entries in the data pairs to histogram

  • y (array_like) – second entries in the data pairs to histogram

  • xbins (array_like) – bin edges for the x dimension

  • ybins (array_like) – bin edges for the y dimension

  • weights (array_like, optional) – weights for each \((x_i, y_i)\) pair.

Returns

Examples

A histogram of (x, y) where the edges are defined by a numpy.logspace() in both dimensions:

>>> bins = numpy.logspace(0.1, 1.0, 10, endpoint=True)
>>> h, __ = var2d(x, y, bins, bins)

pygram11.histogram

pygram11.histogram(x, bins=10, range=None, weights=None, density=False, flow=False)[source]

Histogram data in one dimension.

Parameters
  • x (array_like) – data to histogram.

  • bins (int or array_like) – if int: the number of bins; if array_like: the bin edges.

  • range (tuple(float, float), optional) – the definition of the edges of the bin range (start, stop).

  • weights (array_like, optional) – a set of weights associated with the elements of x. This can also be a two dimensional set of multiple weight variations with shape (len(x), n_weight_variations).

  • density (bool) – normalize counts such that the integral over the range is equal to 1. If weights is two dimensional this argument is ignored.

  • flow (bool) – if True, include under/overflow in the first/last bins.

Returns

Examples

A simple fixed width histogram:

>>> h, __ = histogram(x, bins=20, range=(0, 100))

And with variable width bins and weights:

>>> h, err = histogram(x, bins=[-3, -2, -1.5, 1.5, 3.5], weights=w)

pygram11.histogram2d

pygram11.histogram2d(x, y, bins=10, range=None, weights=None)[source]

Histogram data in two dimensions.

This function provides an API very similar to numpy.histogram2d(). Keep in mind that the returns are different.

Parameters
  • x (array_like) – Array representing the x coordinate of the data to histogram.

  • y (array_like) – Array representing the y coordinate of the data to histogram.

  • bins (int or array_like or [int, int] or [array, array], optional) –

    The bin specification:
    • If int, the number of bins for the two dimensions (nx = ny = bins).

    • If array_like, the bin edges for the two dimensions (x_edges = y_edges = bins).

    • If [int, int], the number of bins in each dimension (nx, ny = bins).

    • If [array_like, array_like], the bin edges in each dimension (x_edges, y_edges = bins).

  • range (array_like, shape(2,2), optional) – The edges of this histogram along each dimension. If bins is not integral, then this parameter is ignored. If None, the default is [[x.min(), x.max()], [y.min(), y.max()]].

  • weights (array_like) – An array of weights, one for each \((x_i, y_i)\) pair. Each pair of the data will contribute its associated weight to the bin count.

Returns

Examples

>>> h, err = histogram2d(x, y, weights=w)
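The four accepted forms of bins listed above can be told apart with a length check. The sketch below uses the same rule as numpy.histogram2d, where a sequence of length two is read as one entry per dimension. The helper split_bins_2d is hypothetical, and pygram11's own dispatch may differ.

```python
import numpy as np


def split_bins_2d(bins):
    """Return (x_spec, y_spec) from a combined two-dimensional bins argument."""
    if np.isscalar(bins):
        # A single int: the same number of bins in both dimensions.
        return int(bins), int(bins)
    if len(bins) == 2:
        # [int, int] or [array_like, array_like]: one entry per dimension.
        return bins[0], bins[1]
    # Any other sequence: shared bin edges for both dimensions.
    edges = np.asarray(bins, dtype=float)
    return edges, edges


same_count = split_bins_2d(10)
per_axis_count = split_bins_2d([20, 10])
x_edges, y_edges = split_bins_2d([0.0, 1.0, 2.5, 5.0])
```

Under this rule, a shared edge array with exactly two entries (a single bin) is ambiguous and is read as per-dimension input.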