
Fix failing doctest #428

Closed
drbenvincent opened this issue Nov 19, 2024 · 5 comments · Fixed by #429
Labels
bug Something isn't working

Comments

@drbenvincent
Collaborator

With pymc 5.18.2 and pytensor 2.26.3, the doctest for PyMCModel is failing:

class PyMCModel(pm.Model):
"""A wrapper class for PyMC models. This provides a scikit-learn like interface with
methods like `fit`, `predict`, and `score`. It also provides other methods which are
useful for causal inference.
Example
-------
>>> import causalpy as cp
>>> import numpy as np
>>> import pymc as pm
>>> from causalpy.pymc_models import PyMCModel
>>> class MyToyModel(PyMCModel):
...     def build_model(self, X, y, coords):
...         with self:
...             X_ = pm.Data(name="X", value=X)
...             y_ = pm.Data(name="y", value=y)
...             beta = pm.Normal("beta", mu=0, sigma=1, shape=X_.shape[1])
...             sigma = pm.HalfNormal("sigma", sigma=1)
...             mu = pm.Deterministic("mu", pm.math.dot(X_, beta))
...             pm.Normal("y_hat", mu=mu, sigma=sigma, observed=y_)
>>> rng = np.random.default_rng(seed=42)
>>> X = rng.normal(loc=0, scale=1, size=(20, 2))
>>> y = rng.normal(loc=0, scale=1, size=(20,))
>>> model = MyToyModel(
...     sample_kwargs={
...         "chains": 2,
...         "draws": 2000,
...         "progressbar": False,
...         "random_seed": rng,
...     }
... )
>>> model.fit(X, y)
Inference data...
>>> X_new = rng.normal(loc=0, scale=1, size=(20,2))
>>> model.predict(X_new)
Inference data...
>>> model.score(X, y)
r2        0.390344
r2_std    0.081135
dtype: float64

@drbenvincent added the bug (Something isn't working) label on Nov 19, 2024
@drbenvincent
Collaborator Author

Calling model.score(X, y) multiple times returns a different result each time.

@drbenvincent
Collaborator Author

drbenvincent commented Nov 19, 2024

model.predict(X) does not generate identical results on each call, despite random_seed being passed as a kwarg to pm.sample_posterior_predictive.

@drbenvincent
Collaborator Author

Every time we call model.score(X, y), it calls model.predict(X). This function generates posterior predictive samples of both mu and y_hat. It is y_hat that is used for scoring, and this incorporates observation noise from the likelihood term.

So yes, every time we call model.score(X, y) we will get slightly different results, reflecting the observation noise of the likelihood.

Does it make more sense for the goodness-of-fit calculation to be based on the expectation, mu, rather than on the noise-corrupted y_hat values?
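The effect can be reproduced without PyMC at all. Below is a minimal NumPy sketch (the shapes, noise scale, and mse helper are illustrative, not CausalPy's actual code): with fixed posterior draws of mu and sigma, a score computed from mu is stable across calls, while every fresh draw of y_hat adds new observation noise, so a score computed from y_hat fluctuates.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_draws = 20, 500

# Stand-ins for fixed posterior quantities: mu is the expectation,
# sigma is the scale of the likelihood's observation noise
y = rng.normal(size=n_obs)
mu = np.tile(y + 0.1 * rng.normal(size=n_obs), (n_draws, 1))  # (n_draws, n_obs)
sigma = 0.5

def mse(y_true, preds):
    # Mean squared error, averaged over posterior draws
    return ((preds - y_true) ** 2).mean()

# Scoring against the expectation: identical on every call
score_mu_1 = mse(y, mu)
score_mu_2 = mse(y, mu)

# Scoring against y_hat: each "posterior predictive" call draws
# fresh observation noise, so the score differs between calls
y_hat_1 = mu + sigma * rng.normal(size=mu.shape)
y_hat_2 = mu + sigma * rng.normal(size=mu.shape)
score_yhat_1 = mse(y, y_hat_1)
score_yhat_2 = mse(y, y_hat_2)

print(score_mu_1 == score_mu_2)      # True
print(score_yhat_1 == score_yhat_2)  # False (almost surely)
```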

@drbenvincent
Collaborator Author

I've tried changing the score method to operate on the posterior mean:

def score(self, X, y) -> pd.Series:
    """R² based on the posterior mean mu, rather than on noisy y_hat draws."""
    mu = self.predict(X)
    # Extract the posterior predictive draws of the deterministic mean,
    # transposed to shape (sample, obs) as expected by r2_score
    mu = az.extract(mu, group="posterior_predictive", var_names="mu").T.values
    return r2_score(y.flatten(), mu)

If I run this

import causalpy as cp
import numpy as np
import pymc as pm
from causalpy.pymc_models import PyMCModel

class MyToyModel(PyMCModel):
    def build_model(self, X, y, coords):
        with self:
            X_ = pm.Data(name="X", value=X)
            y_ = pm.Data(name="y", value=y)
            beta = pm.Normal("beta", mu=0, sigma=1, shape=X_.shape[1])
            sigma = pm.HalfNormal("sigma", sigma=1)
            mu = pm.Deterministic("mu", pm.math.dot(X_, beta))
            pm.Normal("y_hat", mu=mu, sigma=sigma, observed=y_)
            
rng = np.random.default_rng(seed=42)
X = rng.normal(loc=0, scale=1, size=(20, 2))
y = rng.normal(loc=0, scale=1, size=(20,))

model = MyToyModel(
    sample_kwargs={
        "chains": 2,
        "draws": 2000,
        "progressbar": False,
        "random_seed": rng,
    }
)

model.fit(X, y)
model.score(X, y)

and repeatedly call model.score(X, y), then I get the same outputs.

However, if I run the whole block multiple times, I get a different output for model.score(X, y) each time.
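For context on what the score contains: the r2/r2_std pair in the doctest output is a Bayesian R², which summarises a distribution of per-draw R² values by its mean and standard deviation. A NumPy-only sketch of that idea (an illustration in the style of Gelman et al.'s Bayesian R-squared, not ArviZ's exact implementation):

```python
import numpy as np

def bayesian_r2(y_true, y_pred_draws):
    """Sketch of a Bayesian R² over posterior draws.

    y_pred_draws has shape (n_draws, n_obs), one row per posterior draw.
    Returns (mean, std) of the per-draw R² values.
    """
    var_pred = y_pred_draws.var(axis=1)            # predictive variance, per draw
    var_res = (y_true - y_pred_draws).var(axis=1)  # residual variance, per draw
    r2_draws = var_pred / (var_pred + var_res)
    return r2_draws.mean(), r2_draws.std()

rng = np.random.default_rng(42)
y = rng.normal(size=20)
draws = y + 0.3 * rng.normal(size=(1000, 20))  # fake posterior-mean draws

r2_mean, r2_std = bayesian_r2(y, draws)
```

Because the function is a pure computation over fixed draws, scoring the same draws twice gives identical results, which is the behaviour the fix is after.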

@drbenvincent
Collaborator Author

It works when you provide an integer seed in the sample_kwargs, but not when you provide the rng.
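That distinction comes down to how NumPy Generators behave (an illustrative sketch, not CausalPy code): a Generator object is stateful and advances every time it is consumed, so reusing the same rng instance yields different draws on each use, whereas re-seeding from an integer reproduces an identical stream every time.

```python
import numpy as np

# A Generator is stateful: consuming it advances its internal state,
# so repeated use of the same object gives different draws
rng = np.random.default_rng(42)
first = rng.normal(size=3)
second = rng.normal(size=3)
print(np.allclose(first, second))  # False: the state moved on

# An integer seed constructs a fresh, identical stream every time
a = np.random.default_rng(42).normal(size=3)
b = np.random.default_rng(42).normal(size=3)
print(np.allclose(a, b))  # True: fully reproducible
```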
