# scipy-maxentropy

================================================
Maximum entropy models (:mod:`scipy_maxentropy`)
================================================

This is the former `scipy.maxentropy` package that was available in SciPy up to
version 0.10.1. It was then removed in SciPy 0.11. It is now available as a
separate package on PyPI for backward compatibility.

For new projects, consider the `maxentropy` package instead, which offers a more
modern, scikit-learn-compatible API.

## Purpose

This package fits "exponential family" models, including models of maximum
entropy and minimum KL divergence to other models, subject to linear constraints
on the expectations of arbitrary feature statistics. Applications include
language models for natural language processing and understanding, machine
translation, and environmental species modelling.

## Quickstart

Here is a quick usage example based on the trivial machine translation example
from the paper 'A maximum entropy approach to natural language processing' by
Berger et al., Computational Linguistics, 1996.

Consider the translation of the English word 'in' into French. Assume we notice
in a corpus of parallel texts the following facts:

    (1) p(dans) + p(en) + p(à) + p(au cours de) + p(pendant) = 1
    (2) p(dans) + p(en) = 3/10
    (3) p(dans) + p(à) = 1/2

This code finds the probability distribution with maximal entropy subject to
these constraints.

```python
from scipy_maxentropy import Model   # previously scipy.maxentropy

samplespace = ['dans', 'en', 'à', 'au cours de', 'pendant']

# Feature functions corresponding to constraints (1)-(3) above
def f0(x):
    return x in samplespace

def f1(x):
    return x == 'dans' or x == 'en'

def f2(x):
    return x == 'dans' or x == 'à'

f = [f0, f1, f2]

model = Model(f, samplespace)

# Now set the desired feature expectations
K = [1.0, 0.3, 0.5]

model.verbose = False    # set to True to show optimization progress

# Fit the model
model.fit(K)

# Output the fitted parameters and distribution
print()
print("Fitted model parameters are:\n" + str(model.params))
print()
print("Fitted distribution is:")
p = model.probdist()
for j in range(len(model.samplespace)):
    x = model.samplespace[j]
    print(f"\tx = {x + ':':15s} p(x) = {p[j]}")

# Now show how well the constraints are satisfied:
print()
print("Desired constraints:")
print("\tp['dans'] + p['en'] = 0.3")
print("\tp['dans'] + p['à'] = 0.5")
print()
print("Actual expectations under the fitted model:")
print(f"\tp['dans'] + p['en'] = {p[0] + p[1]}")
print(f"\tp['dans'] + p['à'] = {p[0] + p[2]}")
```
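
If you want to check the fit independently of the library, you can recompute the
feature expectations directly from the fitted distribution with NumPy. The
following is a minimal sketch that reuses `f`, `samplespace`, `K`, and `p` from
the example above; it is not part of the `scipy_maxentropy` API, and the
expectations will only match `K` up to the optimizer's tolerance.

```python
import numpy as np

# Feature matrix F[i, j] = f_i(x_j) over the enumerated sample space
F = np.array([[fi(x) for x in samplespace] for fi in f], dtype=float)

# Expectations under the fitted distribution: E_p[f_i] = sum_j F[i, j] * p[j]
expectations = F @ np.asarray(p)
print("Target expectations:", K)
print("Model expectations: ", list(expectations))
```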

## Models available

These model classes are available:
- `scipy_maxentropy.Model`: for models on discrete, enumerable sample spaces
- `scipy_maxentropy.ConditionalModel`: for conditional models on discrete,
  enumerable sample spaces
- `scipy_maxentropy.BigModel`: for models on sample spaces that are either
  continuous (and perhaps high-dimensional) or discrete but too large to
  enumerate, like all possible sentences in a natural language. This model uses
  conditional Monte Carlo methods (primarily importance sampling); the sketch
  after this list illustrates the idea.
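
For intuition about what `BigModel` relies on, here is a small self-contained
sketch of importance sampling: estimate feature expectations under a
distribution you cannot enumerate by drawing from an auxiliary distribution you
*can* sample from and reweighting. The densities and feature below are made up
purely for illustration; this is not the `BigModel` API.

```python
import numpy as np

rng = np.random.default_rng(0)

def p_unnorm(x):
    """Unnormalized target density: a Gaussian bump centred at 1 (illustrative)."""
    return np.exp(-0.5 * (x - 1.0) ** 2)

def q_pdf(x):
    """Auxiliary density we can actually sample from: the standard normal."""
    return np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)

def feature(x):
    """An example feature statistic f(x)."""
    return x ** 2

# Draw from q, form self-normalized importance weights w ∝ p/q, and estimate
# E_p[f(X)] as a weighted average. BigModel uses Monte Carlo estimates of this
# kind when the sample space is too large to enumerate.
xs = rng.standard_normal(100_000)
w = p_unnorm(xs) / q_pdf(xs)
w /= w.sum()
print("Estimated E_p[f(X)]:", float(np.sum(w * feature(xs))))   # should be near 2.0
```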

## Background

This package fits probabilistic models of the following exponential form:

$$
    p(x) = p_0(x) \exp(\theta^T f(x)) / Z(\theta; p_0)
$$

with a real parameter vector $\theta$ of the same length $n$ as the feature
statistics $f(x) = [f_1(x), ..., f_n(x)]$.

This is the "closest" model (in the sense of Kullback's discrimination
information or relative entropy) to the prior model $p_0$ subject to the
following additional constraints on the expectations of the features:

```
    E f_1(X) = b_1
    ...
    E f_n(X) = b_n
```

for some constants $b_i$, such as statistics estimated from a dataset.
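
To see how these constraints lead to the exponential form above, here is a brief
sketch of the standard Lagrange-multiplier argument (see Cover and Thomas for
details). The problem being solved is

$$
\min_{p} \; \sum_x p(x) \log \frac{p(x)}{p_0(x)}
\quad \text{subject to} \quad
\sum_x p(x) f_i(x) = b_i \;\; (i = 1, \dots, n), \qquad \sum_x p(x) = 1 .
$$

Setting the derivative of the Lagrangian with respect to each $p(x)$ to zero
gives $\log \frac{p(x)}{p_0(x)} = \theta^T f(x) - \log Z$ for some multipliers
$\theta_1, \dots, \theta_n$, which is exactly the exponential form shown above.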

In the special case where $p_0$ is the uniform distribution, this is the
"flattest" model subject to the constraints, in the sense of having **maximum
entropy**.
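
As a concrete illustration of this special case, the sketch below evaluates
$p(x) = \exp(\theta^T f(x)) / Z(\theta)$ directly for the sample space and
feature functions of the Quickstart, with a uniform $p_0$. The parameter vector
`theta` here is arbitrary and chosen only to show the computation; it is not the
fitted value.

```python
import numpy as np

samplespace = ['dans', 'en', 'à', 'au cours de', 'pendant']
f = [
    lambda x: x in samplespace,        # f_0: indicator of the whole sample space
    lambda x: x in ('dans', 'en'),     # f_1
    lambda x: x in ('dans', 'à'),      # f_2
]

theta = np.array([0.0, -0.5, 0.25])    # arbitrary parameters, for illustration only

# theta^T f(x) for each x, then normalize: p(x) = exp(theta . f(x)) / Z(theta)
scores = np.array([sum(t * fi(x) for t, fi in zip(theta, f)) for x in samplespace])
p = np.exp(scores)
p /= p.sum()                           # Z(theta) is just the sum over the sample space
for x, px in zip(samplespace, p):
    print(f"p({x}) = {px:.4f}")
```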

For more background, see, for example, Cover and Thomas (1991), *Elements of
Information Theory*.