Commit 5b4698c

Merge pull request #232 from stan-dev/avehtari-patch-1
Some statistical terminology fixes
2 parents: 52ce543 + d26a769


jupyter/sum-to-zero/sum_to_zero_evaluation.qmd

Lines changed: 10 additions & 10 deletions
@@ -167,7 +167,7 @@ The spatial models are taken from a set of notebooks available from GitHub r
 In this section we consider a model which estimates per-demographic disease prevalence rates for a population.
 The model is taken from Gelman and Carpenter, 2020,
 [Bayesian Analysis of Tests with Unknown Specificity and Sensitivity](https://doi.org/10.1111/rssc.12435).
-It combines a model for multilevel regression and post-stratification with a likelihood that
+It combines a model for multilevel regression and post-stratification with a model that
 accounts for test sensitivity and specificity.
 
 The data consists of:
@@ -221,7 +221,7 @@ transformed parameters {
   vector[N] p_sample = p * sens + (1 - p) * (1 - spec);
 }
 model {
-  pos_tests ~ binomial(tests, p_sample);  // likelihood
+  pos_tests ~ binomial(tests, p_sample);  // data model
   // scale normal priors on sum_to_zero_vectors
   beta_age ~ normal(0, s_age * sigma_age);
   beta_eth ~ normal(0, s_eth * sigma_eth);
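
To make the sensitivity/specificity adjustment concrete, here is a minimal NumPy sketch of the `p_sample` calculation from the `transformed parameters` block above; the prevalence, sensitivity, and specificity values are illustrative, not taken from the notebook.

```python
import numpy as np

# Illustrative values: true per-demographic prevalences, plus an assumed
# test sensitivity P(test+ | diseased) and specificity P(test- | healthy).
p = np.array([0.01, 0.05, 0.10])
sens = 0.9
spec = 0.95

# Marginal probability of a positive test, mixing true and false positives,
# mirroring: vector[N] p_sample = p * sens + (1 - p) * (1 - spec);
p_sample = p * sens + (1 - p) * (1 - spec)
print(p_sample)  # [0.0585 0.0925 0.135 ]
```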
@@ -240,7 +240,7 @@ In the `generated quantities` block we use Stan's PRNG functions to populate
 the true weights for the categorical coefficient vectors, and the relative percentages
 of per-category observations.
 Then we use a set of nested loops to generate the data for each demographic,
-using the PRNG equivalent of the model likelihood.
+using the PRNG equivalent of the data model.
 
 The full data-generating program is in file [gen_binomial_4_preds.stan](https://github.com/stan-dev/example-models/tree/master/jupyter/sum-to-zero/stan/binomial_4_preds_ozs.stan).
 The helper function `simulate_data` in file `utils.py` sets up the data-generating program
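
A sketch of that PRNG-equivalent generation step in NumPy (in Stan this would presumably be a `binomial_rng` call; the counts and probabilities here are placeholders):

```python
import numpy as np

rng = np.random.default_rng(seed=1234)

tests = np.array([1000, 500, 250])            # tests administered per cell (placeholder)
p_sample = np.array([0.0585, 0.0925, 0.135])  # marginal positive-test probabilities

# Draw the modeled data from the same distribution the model assumes:
# pos_tests ~ binomial(tests, p_sample)
pos_tests = rng.binomial(tests, p_sample)
```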
@@ -249,7 +249,7 @@ observations per category, and baseline disease prevalence, test specificity and
 This allows us to create datasets for large and small populations
 and for finer or more coarse-grained sets of categories.
 The larger the number of strata overall, the more observations are needed to get good coverage.
-Because the modeled data `pos_tests` is generated according to the Stan model's likelihood,
+Because the modeled data `pos_tests` is generated according to the Stan model,
 the model is a priori well-specified with respect to the data.
 
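The following toy stand-in for `simulate_data` shows the kind of knobs involved; the real function in `utils.py` has its own signature, so the argument names here are guesses based on the prose above.

```python
import numpy as np

def simulate_data_sketch(obs_per_category, baseline_prev, sens, spec,
                         n_categories=4, seed=0):
    """Toy stand-in for utils.py's simulate_data: one binomial draw per category."""
    rng = np.random.default_rng(seed)
    tests = np.full(n_categories, obs_per_category)
    p_sample = baseline_prev * sens + (1 - baseline_prev) * (1 - spec)
    pos_tests = rng.binomial(tests, p_sample)
    return {"N": n_categories, "tests": tests, "pos_tests": pos_tests}

small = simulate_data_sketch(obs_per_category=50, baseline_prev=0.05, sens=0.9, spec=0.95)
large = simulate_data_sketch(obs_per_category=5000, baseline_prev=0.05, sens=0.9, spec=0.95)
```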

@@ -473,15 +473,15 @@ In the case of lots of observations and only a few categories they do a better j
 * In almost all cases, estimates for each parameter are the same across implementations to 2 significant figures.
   In a few cases they are off by 0.01; where they are off, the percentage of observations for that parameter is correspondingly low.
 
-* The `sum_to_zero_vector` implementation has the highest number of effective samples per second,
+* The `sum_to_zero_vector` implementation has the highest effective sample size per second,
   excepting a few individual parameters for which the hard sum-to-zero performs equally well.
 
 
-#### Model efficiency
+#### Sampling efficiency
 
-Model efficiency is measured by iterations per second, however, as the draws from the MCMC sampler
-may be correlated, we need to compute the number of effective samples across all chains
-divided by the total sampling time - this is *ESS_bulk/s*, the effective samples per second.
+Sampling efficiency is measured by iterations per second; however, as the draws from the MCMC sampler
+may be correlated, we need to compute the effective sample size across all chains
+divided by the total sampling time - this is *ESS_bulk/s*, the effective sample size per second.
 The following table shows the average runtime for 100 runs
 of each of the three models on large and small datasets.
 This data was generated by script `eval_efficiencies.py`.
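
As a sketch of how *ESS_bulk/s* can be computed (assuming posterior draws in an ArviZ `InferenceData` object and a measured wall-clock sampling time; this is not the exact code in `eval_efficiencies.py`):

```python
import arviz as az

def ess_bulk_per_second(idata, total_time_s):
    # Bulk effective sample size per parameter, pooled across all chains.
    ess = az.ess(idata, method="bulk")
    # Normalize by total wall-clock sampling time to get ESS_bulk/s.
    return ess / total_time_s
```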
@@ -784,7 +784,7 @@ brklyn_qns_data = {"N":brklyn_qns_gdf.shape[0],
 
 #### Model fitting
 
-The BYM2 model requires many warmup iterations in order to reach convergence for all parameters,
+The BYM2 model requires many warmup iterations in order for MCMC to converge for all parameters,
 including hyperparameters `rho` and `sigma`.
 We run all three models using the same seed, in order to make the initial parameters as similar
 as possible.
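
With CmdStanPy, fixing the `seed` argument per run accomplishes this; a sketch, where the Stan file name, seed, and iteration counts are placeholders:

```python
from cmdstanpy import CmdStanModel

# Same seed across the three model variants makes their randomly drawn
# initial parameter values as similar as possible.
model = CmdStanModel(stan_file="stan/bym2_ozs.stan")  # placeholder file name
fit = model.sample(data=brklyn_qns_data, seed=60902, chains=4,
                   iter_warmup=2000, iter_sampling=1000)
```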
