```stan
pos_tests ~ binomial(tests, p_sample);  // data model
// scale normal priors on sum_to_zero_vectors
beta_age ~ normal(0, s_age * sigma_age);
beta_eth ~ normal(0, s_eth * sigma_eth);
```

In the `generated quantities` block we use Stan's PRNG functions to populate
the true weights for the categorical coefficient vectors, and the relative percentages
of per-category observations.
Then we use a set of nested loops to generate the data for each demographic,
using the PRNG equivalent of the data model.
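
A rough sketch of that generative process is shown below, written in Python/NumPy rather than Stan and simplified to just the two predictors shown above; the full program in `gen_binomial_4_preds.stan` uses four predictors and also accounts for test sensitivity and specificity, which this sketch omits, and all numeric values here are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1234)

# illustrative category counts and per-stratum test volume (not the case study's values)
N_age, N_eth = 5, 4
mean_tests = 200

# hypothetical "true" per-category effects, centered so each vector sums to zero
beta_0 = -2.0                            # assumed baseline log-odds of a positive test
beta_age = rng.normal(0, 0.5, N_age)
beta_age -= beta_age.mean()
beta_eth = rng.normal(0, 0.5, N_eth)
beta_eth -= beta_eth.mean()

tests = np.zeros((N_age, N_eth), dtype=int)
pos_tests = np.zeros_like(tests)

# nested loops over the demographic strata, mirroring the generated quantities block
for i in range(N_age):
    for j in range(N_eth):
        tests[i, j] = rng.poisson(mean_tests)
        p = 1.0 / (1.0 + np.exp(-(beta_0 + beta_age[i] + beta_eth[j])))
        # PRNG equivalent of the data model: pos_tests ~ binomial(tests, p_sample)
        pos_tests[i, j] = rng.binomial(tests[i, j], p)
```
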
The full data-generating program is in file [gen_binomial_4_preds.stan](https://github.com/stan-dev/example-models/tree/master/jupyter/sum-to-zero/stan/binomial_4_preds_ozs.stan).
The helper function `simulate_data` in file `utils.py` sets up the data-generating program
with the specified number of observations per category, and baseline disease prevalence, test specificity and sensitivity.
This allows us to create datasets for large and small populations
and for finer- or coarser-grained sets of categories.
The larger the number of strata overall, the more observations are needed to get good coverage.
Because the modeled data `pos_tests` is generated according to the Stan model,
the model is a priori well-specified with respect to the data.
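
Purely as an illustration of how such a helper might be invoked, a call could look something like the sketch below; the argument names are hypothetical, and the actual signature is defined in `utils.py`:

```python
from utils import simulate_data  # helper from this case study; argument names below are guesses

# hypothetical "large" dataset: many observations per stratum
data_large = simulate_data(
    obs_per_category=500,   # assumed argument: observations per category
    prevalence=0.05,        # assumed argument: baseline disease prevalence
    specificity=0.95,       # assumed argument: test specificity
    sensitivity=0.80,       # assumed argument: test sensitivity
)

# hypothetical "small" dataset: sparse coverage of the strata
data_small = simulate_data(
    obs_per_category=20,
    prevalence=0.05,
    specificity=0.95,
    sensitivity=0.80,
)
```
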
* In almost all cases, estimates for each parameter are the same across implementations to 2 significant figures.
In a few cases they are off by 0.01; where they are off, the percentage of observations for that parameter is correspondingly low.
* The `sum_to_zero_vector` implementation has the highest effective sample size per second,
excepting a few individual parameters for which the hard sum-to-zero performs equally well.
#### Sampling efficiency

Sampling efficiency is measured in iterations per second; however, as the draws from the MCMC sampler
may be correlated, we need to compute the effective sample size across all chains
divided by the total sampling time. This is *ESS_bulk/s*, the effective sample size per second.
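
As a sketch of how this quantity can be computed, assuming CmdStanPy for fitting and ArviZ for the bulk-ESS calculation; the file paths and the simple wall-clock timing below are illustrative, not the exact bookkeeping used in `eval_efficiencies.py`:

```python
import time

import arviz as az
from cmdstanpy import CmdStanModel

# compile and fit one of the model variants (path illustrative)
model = CmdStanModel(stan_file="stan/binomial_4_preds_ozs.stan")

start = time.perf_counter()
fit = model.sample(data="data.json", chains=4)   # "data.json" is a placeholder
elapsed = time.perf_counter() - start            # total sampling time in seconds

# bulk effective sample size per parameter, computed across all chains
ess_bulk = az.ess(az.from_cmdstanpy(posterior=fit), method="bulk")

# effective sample size per second for each parameter
print(ess_bulk / elapsed)
```
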
The following table shows the average runtime for 100 runs
of each of the three models on large and small datasets.
This data was generated by the script `eval_efficiencies.py`.