# Benchmarks Branch [DO NOT MERGE] #25
Conversation
I'm happy to take a look at this, but do you think it would best belong in a different repository?
I'm keeping it on the `benchmarks` branch. I just figured the easiest way to look at it would be through a PR. I'll put a do-not-merge warning on it.
Also, I'm not sure why this would cause a failure, as I didn't plug in any new builds or new tests.
It looks like you added the stationarity example to cmake, but it's missing a `#include` it needs.
Thanks -- I hadn't looked into the cause, but I just added the include, and hopefully that will fix it.
examples/eval_stan.cpp
Outdated
Hi @bob-carpenter -- you might want to check out #26, which has a target very similar to this with a few more features already
```cpp
  std::string sample_csv_file_numbered
      = prefix + "-walnuts-draws-" + std::to_string(trial) + ".csv";
  test_adaptive_walnuts(stan_model, sample_csv_file_numbered, trial_seed,
                        iter_warmup, iter_sampling);
}
std::quick_exit(0);  // crashes without this---not stan_model dtor, prob dlclose_deleter
```
Interesting -- does the preexisting example-stan program crash at exit as well? This probably isn't relevant for the purposes of evaluation, but it could be important for later Python bindings etc.
I wonder if this could be related to roualdes/bridgestan#111. Also, when Stan is compiled using zig cc, it also segfaults on exit, and since zig adds some flags that trap to avoid undefined behavior, maybe that is related as well?
Yeah, seems likely. We should probably just leak the handle for now and try to figure out roualdes/bridgestan#111. My guess is we need a `bs_finalize()` function that calls `tbb::finalize()` and maybe also some math-library-specific stuff.
A few things that stood out to me, mostly focusing on the files mentioned in "III. Gradients until within error bound"
```python
print("Sampling")
print(f"{0:5d}. min(ESS) = {0:.1e}")
for b in range(1, max_blocks + 1):
```
Note that this, as currently coded, isn't much different from one big run in terms of disk usage -- cmdstanpy cleans up its temporary directory at exit, not when the fit object is gc'd.
To fix, you could use the `TemporaryDirectory` context manager and the `output_dir` argument to `sample`.
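A minimal sketch of that suggestion, assuming cmdstanpy's `CmdStanModel.sample(output_dir=...)` and `CmdStanMCMC.draws(concat_chains=...)` interfaces (the `sample_block` wrapper and its arguments are hypothetical names, not code from this PR):

```python
import tempfile
import numpy as np

# Each block's CSV files live in a temporary directory that is deleted as
# soon as the draws are copied out, so disk usage stays bounded per block.
def sample_block(model, **sample_kwargs):
    with tempfile.TemporaryDirectory() as tmp:
        # `model.sample` stands in for cmdstanpy's CmdStanModel.sample here.
        fit = model.sample(output_dir=tmp, **sample_kwargs)
        # Copy the draws into memory before the directory vanishes.
        draws = np.asarray(fit.draws(concat_chains=True))
    return draws
```

The key point is that the copy out of the fit object happens inside the `with` block; once the context manager exits, the CSVs are gone regardless of when the fit object is garbage collected.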
```python
da = xr.DataArray(a[np.newaxis, :, :], dims=("chain", "draw", "var"))
ds = az.ess(da, method="bulk")
data_var = next(iter(ds.data_vars))
vec = ds[data_var].values
return np.asarray(np.squeeze(vec), dtype=np.float64)
```
This is pretty hard to read and verify without manually running it a few times. Assuming you're confident it's doing the right thing, that's probably fine, but I just wanted to flag it.
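One way to gain confidence in a snippet like that is a quick shape check with plain NumPy: the `(draw, var)` array gains a singleton chain axis, `az.ess` reduces over `(chain, draw)` leaving one value per variable, and the squeeze yields a 1-D vector. The mean below is only a stand-in for the ESS reduction; just the shapes are being checked, not the ESS values themselves.

```python
import numpy as np

a = np.random.default_rng(0).normal(size=(400, 3))   # (draw, var)
da = a[np.newaxis, :, :]                             # (chain=1, draw, var)
reduced = da.mean(axis=(0, 1))                       # one value per variable
vec = np.asarray(np.squeeze(reduced), dtype=np.float64)
assert da.shape == (1, 400, 3)
assert vec.shape == (3,)
```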
```cpp
DynamicStanModel stan_model(model_so_file_c_str, data_json_file_c_str, seed);
for (int trial = 0; trial < trials; ++trial) {
  std::cout << "trial = " << trial << std::endl;
  unsigned int trial_seed = seed + static_cast<unsigned int>(17 * (trial + 1));
```
Unsure it matters, but curious why you're doing `seed + 17 * trial` here but `seed + trial` in eval-nuts.py?
```cpp
for (Integer n = 0; n < iter_sampling; ++n) {
  auto draw = sampler();
  model.constrain_draw(draw, draws.col(n));
  lp_grads[static_cast<std::size_t>(n)] = logp_grad_calls;
```
Curious why these static casts are necessary, and why you wouldn't just declare the loop as `for (size_t n ...` instead.
* `examples/stan-warmup.py`
## III. Gradients until within error bound |
I would recommend collecting these steps into one bash or python script at some point to make running an experiment a bit less manual
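A sketch of what such a driver could look like, with every path, binary name, and flag below being a placeholder rather than this repo's actual layout:

```python
import subprocess

# Hypothetical list of the manual steps; replace with the real commands.
STEPS = [
    ["python", "examples/stan-warmup.py"],
    ["./build/eval_stan", "--trials", "10"],  # placeholder binary and flags
]

def run_all(steps, dry_run=False):
    # Print each command; execute only when this isn't a dry run.
    for cmd in steps:
        print("running:", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

A `dry_run` flag like this makes it cheap to check the experiment plan before kicking off long runs.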
I coded up something to run a Stan program until the ESS for E[theta], E[theta^2], and E[theta^4] are all above a given threshold, where `theta` includes all the parameters and the log density `lp__`. It just iterates blocks of draws, keeping a running sum of ESS values and a running average. It then dumps the results in scientific notation in JSON to 10 decimal places, which is more than we should need for our test cases. Ideally, we'd dump to within a number of standard deviations of the mean, but I think this should be OK unless we have examples with very small standard deviations relative to their means.

I largely used ChatGPT to code this, so I'm more than happy to get feedback on the Python.
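The loop described above can be sketched roughly as follows. The `ess` stub here stands in for the ArviZ-based computation (for independent draws, ESS is roughly the number of draws), and `run_until_ess` and its arguments are illustrative names, not the actual code in the PR:

```python
import numpy as np

def ess(draws):
    # Stub for the ArviZ bulk-ESS computation: for independent draws,
    # ESS is roughly the number of draws, per parameter.
    return np.full(draws.shape[1], float(draws.shape[0]))

def run_until_ess(sample_block, min_ess, max_blocks):
    """Iterate blocks of draws, keeping a running sum of per-block ESS
    and a running average of the draws, until the smallest ESS over
    theta, theta^2, and theta^4 clears min_ess."""
    ess_sum, total, count = 0.0, None, 0
    for b in range(1, max_blocks + 1):
        block = sample_block()                       # (draws, params) array
        total = block.sum(axis=0) if total is None else total + block.sum(axis=0)
        count += block.shape[0]
        ess_sum += min(ess(block).min(), ess(block ** 2).min(),
                       ess(block ** 4).min())
        print(f"{b:5d}. min(ESS) = {ess_sum:.1e}")
        if ess_sum >= min_ess:
            break
    return total / count  # running average of the estimands
```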
Here's the output for a 10-dimensional normal target with scales 1 to 10.
It produces this JSON output: