-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Compression complexity causality #162
base: main
Are you sure you want to change the base?
Conversation
Codecov Report
@@ Coverage Diff @@
## master #162 +/- ##
===========================================
+ Coverage 53.75% 75.60% +21.85%
===========================================
Files 154 48 -106
Lines 4757 1025 -3732
===========================================
- Hits 2557 775 -1782
+ Misses 2200 250 -1950
... and 190 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
perhaps @AditiKathpalia is interested to review this PR? |
(I'm super swamped at the moment, but will return here in a couple of months) |
That would be awesome! If you're interested @AditiKathpalia, let me know, and I'll tag you here once it's ready for review. |
* Fix argument name Doc wording Simplify univariate ETC Print statements commented out Remove unneeded tests Fix tests Remove tests from scripts * Simplify, and add joint ETC Clean code and tests Function signature * Update docs * Sliding window joint ETC
…pression_complexity
Yes, sure, I will review it. Thanks for implementing this. |
Hey @AditiKathpalia! I'm trying to implement the full CCC algorithm, but when trying to compare my results with the implementation in the MATLAB toolbox from the supplement of the paper, I'm not able to run the examples in the code. The documentation for the I also tried to follow the instructions in the function documentation and run an example myself: N = 1000
INFO_CCC = struct('L',150,'w',15,'delta',80,'B',2,'N',N)
[y1, y2] = coupled_AR(N);
[CCC_value, k] = conditional_CCC([y1; y2], 1, 2, INFO_CCC) but get the error Index in position 2 exceeds array bounds. Index must not exceed 1.
Error in conditional_CCC (line 61)
ETC_ini(k)=ETC(data_reduced(:,i:i+LEN_past-1),Num_bins); I could spend more time deciphering the MATLAB code and figuring out what's wrong (or what I'm misunderstanding in the documentation string). However, given that there are files missing from the toolbox, I think it would be much faster if you provided an up-to-date, working example that I can use as a basis for comparison with my implementation. Do you have an example of a simple You can just post code snippet in a comment here, and I'll figure out the rest. EDIT: By the way, I'm running MATLAB version |
What is this PR?
This PR implements the effort-to-compress algorithm that underlies the Compression-Complexity Causality (CCC) in #161.
It also implements the CCC algorithm in #161.
TODO
Interface for compression complexity
I've decided on a dispatch-on-algorithm approach to allow a cleaner interface in the case that future compression complexity measures (e.g. Lempel-Siv complexity) are added.
The syntax is
compression_complexity(x, algorithm::CompressionComplexityAlgorithm)
.Input time series
x
(andy
, for joint estimation) must always be pre-symbolized.Effort-to-compress (ETC)
I've done an implementation of the ETC algorithm from scratch, based on the Nagaraj (2013; for single time series) and Kathpalia & Nagaraj (2019; for joint estimation). ETC is the only compression complexity measure included in this PR.
The syntax is:
compression_complexity(x, EffortToCompress())
. Computed on a single univariate (AbstractVector{Integer}
) or multivariate (AbstractDataset{D, Integer}
) pre-symbolized time seriesx
.compression_complexity(x, y, EffortToCompress())
. Also computed on pre-binned time seriesx
andy
.See the tests for some examples.
Performance
My initial implementation of ETC is pretty naive - it uses dictionaries for histogram estimation. The algorithm can probably be optimised substantially by using pre-allocated arrays and clever indexing during the recursion step. However, at this stage I'm keeping it simple, just to have a working algorithm. I haven't looked at the MATLAB/Python implementations yet, so not sure what the speed-ups could be. This will have to be tested at some point.
Testing
Tests are based on examples from the papers linked in #161, and by comparison to results from the MATLAB scripts provided in the references given in the original papers. Additional tests for obvious edge cases and examples have also been added to the test suite.
All tests pass, so I'm fairly sure the implementation is correct.
Below is a reproduction of figure 3 in Nagaraj (2013), which compares the (approximate) Lyapunov exponent and compression complexity (ETC) of 200-pt long logistic map time series with varying bifurcation parameters. For ETC, the time series was symbolized to a binary time series by splitting above and below
x = 0.5
. Lyapunov exponents are estimated directly on the time series.