
Commit 6543aae

Merge pull request #25 from KaveIO/inplace_build
Update setup to make it compatible with --use-feature=in-tree-build
2 parents 53cd8f7 + 303272f commit 6543aae

32 files changed: +2104 -805 lines changed

.github/workflows/inplace_build.yml

+40
@@ -0,0 +1,40 @@
+name: In tree build
+
+on:
+  workflow_dispatch:
+  pull_request:
+  push:
+    branches:
+      - master
+
+jobs:
+  build:
+    name: ${{ matrix.platform }}
+    strategy:
+      fail-fast: false
+      matrix:
+        platform: [windows-latest, macos-latest, ubuntu-latest]
+
+    runs-on: ${{ matrix.platform }}
+
+    steps:
+    - uses: actions/checkout@v2
+      with:
+        submodules: true
+
+    - uses: actions/setup-python@v2
+      with:
+        python-version: "3.8"
+
+    - name: Add requirements
+      run: |
+        python -m pip install --upgrade pip wheel setuptools jupyter
+
+    - name: Build and install
+      run: pip install --use-feature=in-tree-build --verbose ".[test]"
+
+    - name: Unit test
+      run: pytest tests/phik_python/test_phik.py -v
+
+    - name: Integration test
+      run: pytest tests/phik_python/integration/test_notebooks.py -v

.github/workflows/test_matrix.yml

+6 -2
@@ -9,6 +9,7 @@ on:
 
 jobs:
   build:
+    name: ${{ matrix.platform }} Python ${{ matrix.python-version }}
     strategy:
       fail-fast: false
       matrix:
@@ -33,5 +34,8 @@ jobs:
     - name: Build and install
       run: pip install --verbose ".[test]"
 
-    - name: Test
-      run: pytest
+    - name: Unit test
+      run: pytest tests/phik_python/test_phik.py -v
+
+    - name: Integration test
+      run: pytest tests/phik_python/integration/test_notebooks.py -v

.github/workflows/wheels.yml

+1
@@ -56,6 +56,7 @@ jobs:
     - name: Build wheel
       run: python -m cibuildwheel --output-dir wheelhouse
       env:
+        CIBW_ENVIRONMENT: MACOSX_DEPLOYMENT_TARGET=10.13
         CIBW_BUILD: 'cp36-* cp37-* cp38-* cp39-*'
         CIBW_TEST_EXTRAS: test
         CIBW_TEST_COMMAND: pytest {project}/tests/phik_python/test_phik.py

.gitignore

+2
@@ -0,0 +1,2 @@
+*.so
+*egg-info*

CHANGES.rst

+80
@@ -0,0 +1,80 @@
+=============
+Release notes
+=============
+
+Version 0.12.0, July 2021
+-------------------------
+
+C++ Extension
+~~~~~~~~~~~~~
+
+Phi_K contains an optional C++ extension to compute the significance matrix using the `hypergeometric` method
+(also called the`Patefield` method).
+
+Note that the PyPi distributed wheels contain a pre-build extension for Linux, MacOS and Windows.
+
+A manual (pip) setup will attempt to build and install the extension, if it fails it will install without the extension.
+If so, using the `hypergeometric` method without the extension will trigger a
+NotImplementedError.
+
+Compiler requirements through Pybind11:
+
+- Clang/LLVM 3.3 or newer (for Apple Xcode's clang, this is 5.0.0 or newer)
+- GCC 4.8 or newer
+- Microsoft Visual Studio 2015 Update 3 or newer
+- Intel classic C++ compiler 18 or newer (ICC 20.2 tested in CI)
+- Cygwin/GCC (previously tested on 2.5.1)
+- NVCC (CUDA 11.0 tested in CI)
+- NVIDIA PGI (20.9 tested in CI)
+
+
+Other
+~~~~~
+
+* You can now manually set the number of parallel jobs in the evaluation of Phi_K or its statistical significance
+  (when using MC simulations). For example, to use 4 parallel jobs do:
+
+  .. code-block:: python
+
+    df.phik_matrix(njobs = 4)
+    df.significance_matrix(njobs = 4)
+
+  The default value is -1, in which case all available cores are used. When using ``njobs=1`` no parallel processing
+  is applied.
+
+* Phi_K can now be calculated with an independent expectation histogram:
+
+  .. code-block:: python
+
+    from phik.phik import phik_from_hist2d
+
+    cols = ["mileage", "car_size"]
+    interval_cols = ["mileage"]
+
+    observed = df1[["feature1", "feature2"]].hist2d()
+    expected = df2[["feature1", "feature2"]].hist2d()
+
+    phik_value = phik_from_hist2d(observed=observed, expected=expected)
+
+  The expected histogram is taken to be (relatively) large in number of counts
+  compared with the observed histogram.
+
+  Or can compare two (pre-binned) datasets against each other directly. Again the expected dataset
+  is assumed to be relatively large:
+
+  .. code-block:: python
+
+    from phik.phik import phik_observed_vs_expected_from_rebinned_df
+
+    phik_matrix = phik_observed_vs_expected_from_rebinned_df(df1_binned, df2_binned)
+
+* Added links in the readme to the basic and advanced Phi_K tutorials on google colab.
+* Migrated the spark example Phi_K notebook from popmon to directly using histogrammar for histogram creation.
+
+
+
+
+Older versions
+--------------
+
+* Please see documentation for full details: https://phik.readthedocs.io
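
The release notes above describe the ``njobs`` convention: ``-1`` means all available cores and ``1`` means no parallel processing. The sketch below illustrates one plausible way such a value could be mapped to a worker count. The helper ``resolve_njobs`` is hypothetical and is not part of the Phi_K API; how Phi_K resolves the value internally is not shown in this commit.

```python
import os


def resolve_njobs(njobs: int = -1) -> int:
    """Map an njobs-style argument to a concrete worker count.

    Hypothetical helper for illustration only, not Phi_K's own code.
    """
    if njobs == -1:
        # os.cpu_count() may return None on some platforms; fall back to 1.
        return os.cpu_count() or 1
    if njobs < 1:
        raise ValueError("njobs must be -1 or a positive integer")
    return njobs
```

Under this assumption, ``resolve_njobs(1)`` yields a single worker (serial execution) and ``resolve_njobs(4)`` yields four, matching the ``njobs = 4`` example in the release notes.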

MANIFEST.in

+5
@@ -1,7 +1,12 @@
 include NOTICE
 include LICENSE
+include CMakeLists.txt
+include phik/simcore/CMakeLists.txt
+recursive-include phik *.hpp
+recursive-include phik *.cpp
 
 global-include README.rst
+global-include CMakeLists.txt
 global-exclude *.py[cod] __pycache__ *.so
 exclude docs tests .readthedocs.yml
 recursive-exclude tests *.py

README.rst

+23 -38
@@ -2,25 +2,40 @@
 Phi_K Correlation Analyzer Library
 ==================================
 
-* Version: 0.11.2. Released: Mar 2021
-* Documentation: https://phik.readthedocs.io
+* Version: 0.12.0. Released: Jul 2021
+* Release notes: https://github.com/KaveIO/PhiK/blob/master/CHANGES.rst
 * Repository: https://github.com/kaveio/phik
+* Documentation: https://phik.readthedocs.io
 * Publication: `[offical] <https://www.sciencedirect.com/science/article/abs/pii/S0167947320301341>`_ `[arxiv pre-print] <https://arxiv.org/abs/1811.11440>`_
 
 Phi_K is a practical correlation constant that works consistently between categorical, ordinal and interval variables.
-It is based on several refinements to Pearson's hypothesis test of independence of two variables.
+It is based on several refinements to Pearson's hypothesis test of independence of two variables. Essentially, the
+contingency test statistic of two variables is interpreted as coming from a rotated bi-variate normal distribution,
+where the tilt is interpreted as Phi_K.
 
 The combined features of Phi_K form an advantage over existing coefficients. First, it works consistently between categorical, ordinal and interval variables.
 Second, it captures non-linear dependency. Third, it reverts to the Pearson correlation coefficient in case of a bi-variate normal input distribution.
 These are useful features when studying the correlation matrix of variables with mixed types.
 
-The presented algorithms are easy to use and available through this public Python library: the correlation analyzer package.
-Emphasis is paid to the proper evaluation of statistical significance of correlations and to the interpretation of variable relationships
+For details on the methodology behind the calculations, please see our publication. Emphasis is paid to the proper evaluation of statistical significance of correlations and to the interpretation of variable relationships
 in a contingency table, in particular in case of low statistics samples.
+The presented algorithms are easy to use and available through this public Python library.
 
-For example, the Phi_K correlation analyzer package has been used to study surveys, insurance claims, correlograms, etc.
-For details on the methodology behind the calculations, please see our publication.
+Example notebooks
+=================
 
+.. list-table::
+   :widths: 60 40
+   :header-rows: 1
+
+   * - Static link
+     - Google Colab link
+   * - `basic tutorial <https://nbviewer.jupyter.org/github/KaveIO/PhiK/blob/master/phik/notebooks/phik_tutorial_basic.ipynb>`_
+     - `basic on colab <https://colab.research.google.com/github/KaveIO/PhiK/blob/master/phik/notebooks/phik_tutorial_basic.ipynb>`_
+   * - `advanced tutorial (detailed configuration) <https://nbviewer.jupyter.org/github/KaveIO/PhiK/blob/master/phik/notebooks/phik_tutorial_advanced.ipynb>`_
+     - `advanced on colab <https://colab.research.google.com/github/KaveIO/PhiK/blob/master/phik/notebooks/phik_tutorial_advanced.ipynb>`_
+   * - `spark tutorial <https://nbviewer.jupyter.org/github/KaveIO/PhiK/blob/master/phik/notebooks/phik_tutorial_spark.ipynb>`_
+     - no spark available
 
 Documentation
 =============
@@ -29,7 +44,6 @@ The entire Phi_K documentation including tutorials can be found at `read-the-doc
 See the tutorials for detailed examples on how to run the code with pandas. We also have one example on how
 calculate the Phi_K correlation matrix for a spark dataframe.
 
-
 Check it out
 ============
 
@@ -56,35 +70,6 @@ You can now use the package in Python with:
 
 **Congratulations, you are now ready to use the PhiK correlation analyzer library!**
 
-Speedups
---------
-
-Phi_K can use the Numba JIT library for faster computation of certain operations.
-You can either install Numba separately or use the `numba` extra specifier while installing:
-
-.. code-block:: bash
-
-  $ pip install phik[numba]
-
-C++ Extension
--------------
-
-Phi_K contains an optional C++ extension to compute the significance matrix using the `hypergeometric` method.
-
-Note that the PyPi distributed wheels contain a pre-build extension for Linux, MacOS and Windows.
-
-The setup will attempt to build and install the extension, if it fails it will install without the extension.
-Using the `hypergeometric` method without the extension will trigger a NotImplementedError.
-
-Compiler requirements through Pybind11:
-
-- Clang/LLVM 3.3 or newer (for Apple Xcode's clang, this is 5.0.0 or newer)
-- GCC 4.8 or newer
-- Microsoft Visual Studio 2015 Update 3 or newer
-- Intel classic C++ compiler 18 or newer (ICC 20.2 tested in CI)
-- Cygwin/GCC (previously tested on 2.5.1)
-- NVCC (CUDA 11.0 tested in CI)
-- NVIDIA PGI (20.9 tested in CI)
 
 Quick run
 =========
@@ -136,4 +121,4 @@ Contact and support
 
 * Issues and Ideas: https://github.com/kaveio/phik/issues
 
-Please note that KPMG provides support only on a best-effort basis.
+Please note that support is (only) provided on a best-effort basis.
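
The README text added in this commit says that Phi_K starts from the contingency test statistic of two variables. As a rough illustration of that first ingredient only, the sketch below computes Pearson's chi-square statistic for a contingency table in plain Python. The function name ``pearson_chi2`` is an assumption made for this example and is not Phi_K's API; the later step of interpreting the statistic through a rotated bi-variate normal distribution is described in the publication and is not reproduced here.

```python
from typing import Sequence


def pearson_chi2(observed: Sequence[Sequence[float]]) -> float:
    """Pearson chi-square statistic of a contingency table.

    Illustrative sketch only; this is not Phi_K's implementation.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    if total <= 0:
        raise ValueError("contingency table must contain counts")

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under independence of the two variables.
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                chi2 += (obs - expected) ** 2 / expected
    return chi2
```

A table whose rows are proportional to each other gives a statistic of zero, while a strongly diagonal table such as ``[[10, 0], [0, 10]]`` gives a large value, which is the kind of signal a dependency measure builds on.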

docs/source/tutorials.rst

+1 -1
@@ -6,7 +6,7 @@ This section contains materials on how to use the Phi_K correlation analysis cod
 There are additional side notes on how certain aspects work and where to find parts of the code.
 For more in depth explanations on the functionality of the code-base, try the `API docs <phik_index.html>`_.
 
-The tutorials are available in the ``python/phik/notebooks`` directory. We have:
+The tutorials are available in the ``phik/notebooks`` directory. We have:
 
 * A basic tutorial: this covers the basics of calculating Phi_K, the statistical significance, and interpreting the correlation.
 * An advanced tutorial: this shows how to use the advanced features of the ``PhiK`` library.
