Commit 94e0eca

Merge pull request #39 from histogrammar/1.0.x
v1.0.20
2 parents 7561183 + e75747f commit 94e0eca

82 files changed (+13524 / -2725 lines)

.travis.yml

Lines changed: 2 additions & 3 deletions
@@ -4,11 +4,10 @@ os:
 - linux
 
 python:
-- 2.7
-- 3.4
-- 3.5
 - 3.6
 - 3.7
+- 3.8
+- 3.9
 
 addons:
 apt:

MANIFEST.in

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
include requirements.txt
include LICENSE
include NOTICE

NOTICE

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
################################################################################################
#
# NOTICE: pass-through licensing of bundled components
#
# Histogrammar gathers together a toolkit of pre-existing third-party
# open-source software components. These software components are governed by their own licenses,
# which Histogrammar does not modify or supersede; please consult the originating
# authors. These components altogether have a mixture of the following licenses: Apache 2.0, MIT.
#
# Although we have examined the licenses to verify acceptance of commercial and non-commercial
# use, please see and consult the original licenses or authors.
#
# Here is the full list of license dependencies:
#
# numpy: https://github.com/numpy/numpy/blob/master/LICENSE.txt
# tqdm: https://github.com/tqdm/tqdm/blob/master/LICENCE
# matplotlib: https://github.com/matplotlib/matplotlib/blob/master/LICENSE/LICENSE
# joblib: https://github.com/joblib/joblib/blob/master/LICENSE.txt
# root: https://root.cern.ch/license
# popmon: https://github.com/ing-bank/popmon/blob/master/LICENSE
#
# There are several functions/classes where code or techniques have been reproduced and/or modified
# from existing open-source packages. We list these here:
#
# Package: popmon
# popmon file: histogrammar/dfinterface/spark_histogrammar.py
# Class: SparkHistogrammar
# Reference: https://github.com/ing-bank/popmon/blob/master/popmon/hist/filling/spark_histogrammar.py
# popmon file: histogrammar/dfinterface/pandas_histogrammar.py
# Class: PandasHistogrammar
# Reference: https://github.com/ing-bank/popmon/blob/master/popmon/hist/filling/pandas_histogrammar.py
# popmon file: histogrammar/dfinterface/histogram_filler_base.py
# Class: HistogramFillerBase
# Reference: https://github.com/ing-bank/popmon/blob/master/popmon/hist/filling/histogram_filler_base.py
# License: MIT
# For details see: https://github.com/ing-bank/popmon/blob/master/LICENSE
#
################################################################################################

README.md

Lines changed: 0 additions & 65 deletions
This file was deleted.

README.rst

Lines changed: 146 additions & 0 deletions
@@ -0,0 +1,146 @@
==================================
histogrammar Python implementation
==================================

histogrammar is a Python package for creating histograms. histogrammar has multiple histogram types,
supports numeric and categorical features, and works with Numpy arrays and Pandas and Spark dataframes.
Once a histogram is filled, it's easy to plot it, store it in JSON format (and retrieve it), or convert
it to Numpy arrays for further analysis.

At its core histogrammar is a suite of data aggregation primitives designed for use in parallel processing.
In the simplest case, you can use this to compute histograms, but the generality of the primitives
allows much more.
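
Aggregators designed for parallel processing need to be mergeable. Filling partial results on separate chunks of data and then combining them should give the same answer as a single pass over all the data. The following is a minimal pure-Python sketch of that property for fixed-width bin counts. It assumes equal-width bins over [low, high) and simply ignores values outside that range. `fill_counts` and `merge_counts` are hypothetical names for illustration, not histogrammar's API:

```python
from typing import List, Sequence


def fill_counts(values: Sequence[float], num: int, low: float, high: float) -> List[int]:
    """Count values into `num` equal-width bins over [low, high); other values are ignored."""
    counts = [0] * num
    for x in values:
        if low <= x < high:
            idx = int(num * (x - low) / (high - low))
            counts[min(idx, num - 1)] += 1  # guard against float rounding at the top edge
    return counts


def merge_counts(a: List[int], b: List[int]) -> List[int]:
    """Combine two partial histograms with identical binning by adding their bin counts."""
    if len(a) != len(b):
        raise ValueError("cannot merge histograms with different binning")
    return [x + y for x, y in zip(a, b)]


data = [1.5, 2.5, 2.7, 7.1, 9.9, 0.0, 5.0, 5.5]
# e.g. two workers each fill a histogram from their own partition of the data
partial_1 = fill_counts(data[:4], 10, 0.0, 10.0)
partial_2 = fill_counts(data[4:], 10, 0.0, 10.0)
merged = merge_counts(partial_1, partial_2)

assert merged == fill_counts(data, 10, 0.0, 10.0)
print(merged)  # [1, 1, 2, 0, 0, 2, 0, 1, 0, 1]
```

Because merging is just element-wise addition, it is associative and order-independent. That is what allows partial results from many workers to be combined in any order.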

Several common histogram types can be plotted in Matplotlib, Bokeh and PyROOT with a single method call.
If Numpy or Pandas is available, histograms and other aggregators can be filled from arrays ten to a hundred times
more quickly via Numpy commands, rather than Python for loops. If PyROOT is available, histograms and other
aggregators can be filled from ROOT TTrees hundreds of times more quickly by JIT-compiling a specialized C++ filler.
Histograms and other aggregators may also be converted into CUDA code for inclusion in a GPU workflow. And if
PyCUDA is available, they can also be filled from Numpy arrays by JIT-compiling the CUDA code.
This Python implementation of histogrammar has been tested for compatibility with its Scala implementation.
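
The Numpy speed-up described above comes from replacing a per-value Python loop with whole-array operations. The following is a minimal sketch of that idea for fixed-width binning. It assumes equal-width bins over [low, high) and drops out-of-range values and NaNs. `bin_loop` and `bin_vectorized` are hypothetical helpers, not histogrammar's internal filler:

```python
import numpy as np


def bin_loop(values, num, low, high):
    """Reference version: one Python-level iteration per value."""
    counts = [0] * num
    for x in values:
        if low <= x < high:  # comparisons with NaN are False, so NaNs are dropped
            idx = int(num * (x - low) / (high - low))
            counts[min(idx, num - 1)] += 1  # guard against float rounding at the top edge
    return counts


def bin_vectorized(values, num, low, high):
    """Same result computed with whole-array Numpy operations."""
    arr = np.asarray(values, dtype=float)
    inside = arr[(arr >= low) & (arr < high)]  # NaNs fail both comparisons
    idx = np.floor(num * (inside - low) / (high - low)).astype(np.int64)
    idx = np.minimum(idx, num - 1)
    return np.bincount(idx, minlength=num)  # minlength keeps empty bins


rng = np.random.default_rng(0)
ages = rng.uniform(-10, 110, size=10_000)
ages[::100] = np.nan  # sprinkle in some missing values

assert bin_vectorized(ages, 100, 0, 100).tolist() == bin_loop(ages, 100, 0, 100)
```

The vectorized version does the comparison, index computation and counting in compiled Numpy code, so its cost per value is far lower than a Python-level loop.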

Latest Python release: v1.0.20 (Feb 2021).

Announcements
=============

Spark 3.0
---------

For Spark 3.0, which is built on Scala 2.12, make sure to pick up the correct histogrammar jar file:

.. code-block:: python

    spark = SparkSession.builder.config("spark.jars.packages", "io.github.histogrammar:histogrammar-sparksql_2.12:1.0.11").getOrCreate()

For Spark 2.X compiled against Scala 2.11, simply replace "2.12" with "2.11" in the string above.

February 2021
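
The two jar coordinates in this announcement differ only in their Scala binary-version suffix. The following is a small sketch for choosing that suffix programmatically. `histogrammar_spark_package` is a hypothetical helper. Its default version `1.0.11` is simply the one pinned in the Spark snippet above, and newer releases may exist:

```python
SUPPORTED_SCALA_VERSIONS = ("2.11", "2.12")  # the two suffixes mentioned in this README


def histogrammar_spark_package(scala_version: str, version: str = "1.0.11") -> str:
    """Return the Maven coordinate of the histogrammar Spark SQL jar (hypothetical helper)."""
    if scala_version not in SUPPORTED_SCALA_VERSIONS:
        raise ValueError(
            f"unsupported Scala version {scala_version!r}; expected one of {SUPPORTED_SCALA_VERSIONS}"
        )
    return f"io.github.histogrammar:histogrammar-sparksql_{scala_version}:{version}"


# Spark 3.0 (Scala 2.12):
print(histogrammar_spark_package("2.12"))
# io.github.histogrammar:histogrammar-sparksql_2.12:1.0.11

# With PySpark installed, the coordinate would be passed to the session builder:
# from pyspark.sql import SparkSession
# spark = (SparkSession.builder
#          .config("spark.jars.packages", histogrammar_spark_package("2.12"))
#          .getOrCreate())
```

Rejecting unknown Scala versions up front gives a clear error before Spark starts. Otherwise a wrong suffix would only surface as a dependency-resolution failure when the session starts.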

Example notebooks
=================

.. list-table::
   :widths: 80 20
   :header-rows: 1

   * - Tutorial
     - Colab link
   * - `Basic tutorial <https://nbviewer.jupyter.org/github/histogrammar/histogrammar-python/blob/master/histogrammar/notebooks/histogrammar_tutorial_basic.ipynb>`_
     - |notebook_basic_colab|
   * - `Detailed example (featuring configuration, Apache Spark and more) <https://nbviewer.jupyter.org/github/histogrammar/histogrammar-python/blob/master/histogrammar/notebooks/histogrammar_tutorial_advanced.ipynb>`_
     - |notebook_advanced_colab|

Documentation
=============

See `histogrammar-docs <https://histogrammar.github.io/histogrammar-docs/>`_ for a complete introduction to `histogrammar`.
(It is a bit dated, but still useful.) There you can also find documentation about the Scala implementation of `histogrammar`.

Check it out
============

The `histogrammar` library requires Python 3.6+ and is pip friendly. To get started, simply do:

.. code-block:: bash

    $ pip install histogrammar

or check out the code from our GitHub repository:

.. code-block:: bash

    $ git clone https://github.com/histogrammar/histogrammar-python
    $ pip install -e histogrammar-python

where in this example the code is installed in editable mode (the -e option).

You can now use the package in Python with:

.. code-block:: python

    import histogrammar

**Congratulations, you are now ready to use the histogrammar library!**

Quick run
=========

As a quick example, you can do:

.. code-block:: python

    import pandas as pd
    import histogrammar as hg
    from histogrammar import resources

    # open synthetic data
    df = pd.read_csv(resources.data('test.csv.gz'), parse_dates=['date'])
    df.head()

    # create a histogram, tell it to look for column 'age'
    # fill the histogram with column 'age' and plot it
    hist = hg.Histogram(num=100, low=0, high=100, quantity='age')
    hist.fill.numpy(df)
    hist.plot.matplotlib()

    # generate histograms of all features in the dataframe using automatic binning
    # (importing histogrammar automatically adds this functionality to a pandas or spark dataframe)
    hists = df.hg_make_histograms()
    print(hists.keys())

    # multi-dimensional histograms are also supported, e.g. features longitude vs latitude
    hists = df.hg_make_histograms(features=['longitude:latitude'])
    ll = hists['longitude:latitude']
    ll.plot.matplotlib()

    # store histogram and retrieve it again
    ll.toJsonFile('longitude_latitude.json')
    ll2 = hg.Factory().fromJsonFile('longitude_latitude.json')


These examples also work with Spark dataframes. For more examples please see the notebooks and tutorials.
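
In the quick-run example, a feature name such as 'longitude:latitude' requests a two-dimensional histogram. The following is a minimal pure-Python sketch of what such a spec implies. It assumes colon-separated column names and explicit fixed-width binning per column. histogrammar's `hg_make_histograms` chooses bins automatically, and this sketch does not attempt that. `parse_feature`, `bin_index` and `fill_nd` are hypothetical names. Counts are kept sparsely in a `Counter`, so empty cells cost nothing:

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Optional, Tuple

Binning = Tuple[int, float, float]  # (num, low, high) for one column


def parse_feature(spec: str) -> Tuple[str, ...]:
    """Split a 'col1:col2' feature spec into its column names."""
    cols = tuple(part.strip() for part in spec.split(":"))
    if not all(cols):
        raise ValueError(f"empty column name in feature spec {spec!r}")
    return cols


def bin_index(x: float, num: int, low: float, high: float) -> Optional[int]:
    """Fixed-width bin index of x, or None if x falls outside [low, high)."""
    if not (low <= x < high):
        return None
    return min(int(num * (x - low) / (high - low)), num - 1)


def fill_nd(rows: Iterable[Mapping[str, float]], spec: str, binning: Dict[str, Binning]) -> Counter:
    """Count rows per multi-dimensional cell; rows outside any axis range are skipped."""
    cols = parse_feature(spec)
    counts: Counter = Counter()
    for row in rows:
        cell = tuple(bin_index(row[c], *binning[c]) for c in cols)
        if None not in cell:
            counts[cell] += 1
    return counts


rows = [
    {"longitude": 4.9, "latitude": 52.4},
    {"longitude": 5.1, "latitude": 52.1},
    {"longitude": 4.8, "latitude": 52.3},
    {"longitude": 12.0, "latitude": 52.0},  # outside the longitude range below
]
binning = {"longitude": (10, 0.0, 10.0), "latitude": (10, 50.0, 60.0)}
print(fill_nd(rows, "longitude:latitude", binning))  # Counter({(4, 2): 2, (5, 2): 1})
```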


Project contributors
====================

This package was originally authored by DIANA-HEP and is now maintained by volunteers.

Contact and support
===================

* Issues & Ideas & Support: https://github.com/histogrammar/histogrammar-python/issues

Please note that `histogrammar` is supported only on a best-effort basis.

License
=======
`histogrammar` is completely free, open-source and licensed under the `Apache-2.0 license <https://en.wikipedia.org/wiki/Apache_License>`_.

.. |notebook_basic_colab| image:: https://colab.research.google.com/assets/colab-badge.svg
   :alt: Open in Colab
   :target: https://colab.research.google.com/github/histogrammar/histogrammar-python/blob/master/histogrammar/notebooks/histogrammar_tutorial_basic.ipynb
.. |notebook_advanced_colab| image:: https://colab.research.google.com/assets/colab-badge.svg
   :alt: Open in Colab
   :target: https://colab.research.google.com/github/histogrammar/histogrammar-python/blob/master/histogrammar/notebooks/histogrammar_tutorial_advanced.ipynb

histogrammar/__init__.py

Lines changed: 32 additions & 27 deletions
@@ -1,42 +1,47 @@
+# flake8: noqa
+
 #!/usr/bin/env python
 
 # Copyright 2016 DIANA-HEP
-#
+#
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
-#
+#
 # http://www.apache.org/licenses/LICENSE-2.0
-#
+#
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-from histogrammar.defs import *
+from histogrammar.defs import Factory, Container
+
+from histogrammar.primitives.average import Average
+from histogrammar.primitives.bag import Bag
+from histogrammar.primitives.bin import Bin
+from histogrammar.primitives.categorize import Categorize
+from histogrammar.primitives.centrallybin import CentrallyBin
+from histogrammar.primitives.collection import Collection, Branch, Index, Label, UntypedLabel
+from histogrammar.primitives.count import Count
+from histogrammar.primitives.deviate import Deviate
+from histogrammar.primitives.fraction import Fraction
+from histogrammar.primitives.irregularlybin import IrregularlyBin
+from histogrammar.primitives.minmax import Minimize, Maximize
+from histogrammar.primitives.select import Select
+from histogrammar.primitives.sparselybin import SparselyBin
+from histogrammar.primitives.stack import Stack
+from histogrammar.primitives.sum import Sum
 
-from histogrammar.primitives.average import *
-from histogrammar.primitives.bag import *
-from histogrammar.primitives.bin import *
-from histogrammar.primitives.categorize import *
-from histogrammar.primitives.centrallybin import *
-from histogrammar.primitives.collection import *
-from histogrammar.primitives.count import *
-from histogrammar.primitives.deviate import *
-from histogrammar.primitives.fraction import *
-from histogrammar.primitives.irregularlybin import *
-from histogrammar.primitives.minmax import *
-from histogrammar.primitives.select import *
-from histogrammar.primitives.sparselybin import *
-from histogrammar.primitives.stack import *
-from histogrammar.primitives.sum import *
+from histogrammar.convenience import Histogram
+from histogrammar.convenience import SparselyHistogram
+from histogrammar.convenience import Profile
+from histogrammar.convenience import SparselyProfile
+from histogrammar.convenience import ProfileErr
+from histogrammar.convenience import SparselyProfileErr
+from histogrammar.convenience import TwoDimensionallyHistogram
+from histogrammar.convenience import TwoDimensionallySparselyHistogram
 
-from histogrammar.specialized import Histogram
-from histogrammar.specialized import SparselyHistogram
-from histogrammar.specialized import Profile
-from histogrammar.specialized import SparselyProfile
-from histogrammar.specialized import ProfileErr
-from histogrammar.specialized import SparselyProfileErr
-from histogrammar.specialized import TwoDimensionallyHistogram
-from histogrammar.specialized import TwoDimensionallySparselyHistogram
+# handy monkey patch functions for pandas and spark dataframes
+import histogrammar.dfinterface
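
The final line of this diff imports `histogrammar.dfinterface` for its side effect. Per its comment, that module patches helper methods onto pandas and Spark dataframes, which is how `df.hg_make_histograms()` becomes available after `import histogrammar`. The diff does not show how the patching is done. The following is a minimal sketch of the general technique, assuming plain attribute assignment on the class. `TinyFrame`, `hg_count_values` and `install` are hypothetical stand-ins, not histogrammar or pandas code:

```python
from typing import Dict, List


class TinyFrame:
    """Hypothetical stand-in for a dataframe: column name -> list of values."""

    def __init__(self, columns: Dict[str, List[object]]) -> None:
        self.columns = columns


def hg_count_values(self: TinyFrame) -> Dict[str, int]:
    """Hypothetical patched method: number of non-missing values per column."""
    return {name: sum(v is not None for v in values) for name, values in self.columns.items()}


def install(cls: type) -> None:
    """Attach the helper to the class; a real package would do this at import time."""
    if not hasattr(cls, "hg_count_values"):  # don't clobber an existing attribute
        cls.hg_count_values = hg_count_values


frame = TinyFrame({"age": [31, None, 45], "eyeColor": ["blue", "brown", None]})
install(TinyFrame)  # patching the class also affects instances created earlier
print(frame.hg_count_values())  # {'age': 2, 'eyeColor': 2}
```

Patching the class rather than individual objects makes the method appear on every dataframe, including ones created before the import. pandas also offers a registration-based alternative, `pandas.api.extensions.register_dataframe_accessor`, which groups such helpers under a named accessor attribute instead of adding them directly to `DataFrame`.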