
Fix failing doctest #428

Closed
drbenvincent opened this issue Nov 19, 2024 · 5 comments · Fixed by #429
Labels
bug Something isn't working

Comments

@drbenvincent
Collaborator

With pymc 5.18.2 and pytensor 2.26.3, the doctest for PyMCModel is failing:

class PyMCModel(pm.Model):
"""A wrapper class for PyMC models. This provides a scikit-learn like interface with
methods like `fit`, `predict`, and `score`. It also provides other methods which are
useful for causal inference.
Example
-------
>>> import causalpy as cp
>>> import numpy as np
>>> import pymc as pm
>>> from causalpy.pymc_models import PyMCModel
>>> class MyToyModel(PyMCModel):
...     def build_model(self, X, y, coords):
...         with self:
...             X_ = pm.Data(name="X", value=X)
...             y_ = pm.Data(name="y", value=y)
...             beta = pm.Normal("beta", mu=0, sigma=1, shape=X_.shape[1])
...             sigma = pm.HalfNormal("sigma", sigma=1)
...             mu = pm.Deterministic("mu", pm.math.dot(X_, beta))
...             pm.Normal("y_hat", mu=mu, sigma=sigma, observed=y_)
>>> rng = np.random.default_rng(seed=42)
>>> X = rng.normal(loc=0, scale=1, size=(20, 2))
>>> y = rng.normal(loc=0, scale=1, size=(20,))
>>> model = MyToyModel(
...     sample_kwargs={
...         "chains": 2,
...         "draws": 2000,
...         "progressbar": False,
...         "random_seed": rng,
...     }
... )
>>> model.fit(X, y)
Inference data...
>>> X_new = rng.normal(loc=0, scale=1, size=(20,2))
>>> model.predict(X_new)
Inference data...
>>> model.score(X, y)
r2        0.390344
r2_std    0.081135
dtype: float64

@drbenvincent added the bug (Something isn't working) label on Nov 19, 2024
@drbenvincent
Collaborator Author

Calling model.score(X, y) multiple times returns a different result each time.

@drbenvincent
Collaborator Author

drbenvincent commented Nov 19, 2024

model.predict(X) does not generate identical results on each call, despite random_seed being passed as a kwarg to pm.sample_posterior_predictive.

@drbenvincent
Collaborator Author

Every time we call model.score(X, y), it calls model.predict(X). This function generates posterior predictive samples of both mu and y_hat. It is y_hat that is used for scoring, and this incorporates observation noise from the likelihood term.

So yes, every time we call model.score(X, y) we will get slightly different results, reflecting the observation noise of the likelihood.

Does it make more sense for the goodness-of-fit calculation to be based on the expectation, mu, rather than on the noise-corrupted y_hat values?
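The effect can be reproduced without PyMC at all. Below is a minimal NumPy sketch (the shapes, noise scale, and mse helper are illustrative, not CausalPy's actual code): with fixed posterior draws of mu and sigma, a score computed from mu is stable across calls, while every fresh draw of y_hat adds new observation noise, so a score computed from y_hat fluctuates.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_draws = 20, 500

# Stand-ins for fixed posterior quantities: mu is the expectation,
# sigma is the scale of the likelihood's observation noise
y = rng.normal(size=n_obs)
mu = np.tile(y + 0.1 * rng.normal(size=n_obs), (n_draws, 1))  # (n_draws, n_obs)
sigma = 0.5

def mse(y_true, preds):
    # Mean squared error, averaged over posterior draws
    return ((preds - y_true) ** 2).mean()

# Scoring against the expectation: identical on every call
score_mu_1 = mse(y, mu)
score_mu_2 = mse(y, mu)

# Scoring against y_hat: each "posterior predictive" call draws
# fresh observation noise, so the score differs between calls
y_hat_1 = mu + sigma * rng.normal(size=mu.shape)
y_hat_2 = mu + sigma * rng.normal(size=mu.shape)
score_yhat_1 = mse(y, y_hat_1)
score_yhat_2 = mse(y, y_hat_2)

print(score_mu_1 == score_mu_2)      # True
print(score_yhat_1 == score_yhat_2)  # False (almost surely)
```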

@drbenvincent
Collaborator Author

I've tried changing the score method to operate on the posterior mean:

def score(self, X, y) -> pd.Series:
    """R² based on the posterior mean mu, rather than on noisy y_hat draws."""
    mu = self.predict(X)
    # Extract the posterior predictive draws of the deterministic mean,
    # transposed to shape (sample, obs) as expected by r2_score
    mu = az.extract(mu, group="posterior_predictive", var_names="mu").T.values
    return r2_score(y.flatten(), mu)

If I run this

import causalpy as cp
import numpy as np
import pymc as pm
from causalpy.pymc_models import PyMCModel

class MyToyModel(PyMCModel):
    def build_model(self, X, y, coords):
        with self:
            X_ = pm.Data(name="X", value=X)
            y_ = pm.Data(name="y", value=y)
            beta = pm.Normal("beta", mu=0, sigma=1, shape=X_.shape[1])
            sigma = pm.HalfNormal("sigma", sigma=1)
            mu = pm.Deterministic("mu", pm.math.dot(X_, beta))
            pm.Normal("y_hat", mu=mu, sigma=sigma, observed=y_)
            
rng = np.random.default_rng(seed=42)
X = rng.normal(loc=0, scale=1, size=(20, 2))
y = rng.normal(loc=0, scale=1, size=(20,))

model = MyToyModel(
    sample_kwargs={
        "chains": 2,
        "draws": 2000,
        "progressbar": False,
        "random_seed": rng,
    }
)

model.fit(X, y)
model.score(X, y)

and repeatedly call model.score(X, y), then I get the same outputs.

However, if I run the whole block multiple times, I get a different output for model.score(X, y) each time.
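For context on what the score contains: the r2/r2_std pair in the doctest output is a Bayesian R², which summarises a distribution of per-draw R² values by its mean and standard deviation. A NumPy-only sketch of that idea (an illustration in the style of Gelman et al.'s Bayesian R-squared, not ArviZ's exact implementation):

```python
import numpy as np

def bayesian_r2(y_true, y_pred_draws):
    """Sketch of a Bayesian R² over posterior draws.

    y_pred_draws has shape (n_draws, n_obs), one row per posterior draw.
    Returns (mean, std) of the per-draw R² values.
    """
    var_pred = y_pred_draws.var(axis=1)            # predictive variance, per draw
    var_res = (y_true - y_pred_draws).var(axis=1)  # residual variance, per draw
    r2_draws = var_pred / (var_pred + var_res)
    return r2_draws.mean(), r2_draws.std()

rng = np.random.default_rng(42)
y = rng.normal(size=20)
draws = y + 0.3 * rng.normal(size=(1000, 20))  # fake posterior-mean draws

r2_mean, r2_std = bayesian_r2(y, draws)
```

Because the function is a pure computation over fixed draws, scoring the same draws twice gives identical results, which is the behaviour the fix is after.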

@drbenvincent
Collaborator Author

It works when you provide an integer seed in the sample_kwargs, but not when you provide the rng.
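That distinction comes down to how NumPy Generators behave (an illustrative sketch, not CausalPy code): a Generator object is stateful and advances every time it is consumed, so reusing the same rng instance yields different draws on each use, whereas re-seeding from an integer reproduces an identical stream every time.

```python
import numpy as np

# A Generator is stateful: consuming it advances its internal state,
# so repeated use of the same object gives different draws
rng = np.random.default_rng(42)
first = rng.normal(size=3)
second = rng.normal(size=3)
print(np.allclose(first, second))  # False: the state moved on

# An integer seed constructs a fresh, identical stream every time
a = np.random.default_rng(42).normal(size=3)
b = np.random.default_rng(42).normal(size=3)
print(np.allclose(a, b))  # True: fully reproducible
```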
