-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathNon-Parametric-Effect-Sizes.qmd
378 lines (256 loc) · 14.8 KB
/
Non-Parametric-Effect-Sizes.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
# Non-Parametric Tests
Sometimes the assumptions of parametric models (e.g., normality of model residuals) are suspect. This is often the case in psychology when using ordinal scales. In these cases a "non-parametric" approach may be helpful. A statistical test being non-parametric means that the parameters (i.e., mean and variance for "normal" Gaussian model) are not estimated; despite popular belief the data themselves are *never* non-parametric. Additionally, these tests are *not* tests of the median [@divine2018wilcoxon]. Rather one can consider than as rank based or proportional odds tests. If the scores you are analyzing are not metric (i.e., ordinal) due to the use of a Likert-Scale and you still use parametric tests such as t-tests, you run the risk of a high false-positive probability (e.g., @liddell2018).
If the scores you are analyzing are not metric (i.e., ordinal) due to the use of a Likert scale and you still use parametric tests such as t-tests, you run the risk of a high false-positive probability (e.g., Liddel & Kruschke, 2018). Note that in German, scale anchors have been developed that are very similar to Likert scale but can be interpreted as metric (e.g., Rohrmann, 1978).
We will briefly discuss here two groups of tests that can be applied to the independent and paired samples then present 3 effect sizes that can accompany these tests as well as their calculations and examples in R.
#### **Here is a table for every effect size discussed in this chapter:** {.unnumbered}
| Type | Description | Section|
|:------|:---------|:----:|
| **Rank-Biserial Correlation** | | @sec-rrb|
| $r_{rb}$ (dependent groups) - Rank-biserial correlation on dependent groups | A measure of dominance between dependent groups (i.e., repeated measure designs). | @sec-rrb-dep|
| $r_{rb}$ (independent groups) - Rank Biserial Correlation on independent groups | A measure of dominance between two independent groups. | @sec-rrb-dep|
| **Concordance Probability** | | @sec-conc-prob |
| $p_c$ - Concordance probability | A simple transformation of the rank-biserial correlation and it represents the probability of superiority in one group relative to the other group. This section shows R code for both independent and dependent samples. | @sec-conc-prob |
| **Wilcoxon-Mann-Whitney Odds** | | @sec-odds |
| $O_{WMW}$ - Wilcoxon-Mann-Whitney Odds | Also known as the Generalized Odds Ratio, it transforms the concordance probability to an Odds Ratio. This section shows R code for both independent and dependent samples. | @sec-odds |
## Wilcoxon-Mann-Whitney tests
A non-parametric alternative to the t-test is the Wilcoxon-Mann-Whitney (WMW) group of tests. When comparing two independent samples this is called a Wilcoxon rank-sum test, but sometimes referred to as a Mann-Whitney U Test. When using it on paired samples, or one sample, it is a signed rank test. These are generally referred to as tests of "symmetry" [@divine2018wilcoxon].
```{r,echo=TRUE, warning=FALSE,message=FALSE}
# Paired samples ----
data(sleep)
# wilcoxon test
wilcox.test(extra ~ group,
data = sleep,
paired = TRUE)
# Two Sample ------
# data import from likert
data(mass, package = "likert")
df_mass = mass |>
as.data.frame() |>
janitor::clean_names()
# function needs input as a numeric
# ordered factors can be converted to ranks
# Again, the warning can be ignored
wilcox.test(rank(math_relates_to_my_life) ~ gender,
data = df_mass)
```
## Brunner-Munzel Tests
Brunner-Munzel's tests can be used instead of the WMW tests. The primary reason is the interpretation of the test [@munzel2002; @brunner2000; @neubert2007]. Recently, @karch2021psychologists argued that the Mann-Whitney test is not a decent test of equality of medians, distributions or stochastic equality. The Brunner-Munzel test, on the other hand, provides a sensible approach to test for stochastic equality.
The Brunner-Munzel tests measure a rank based "relative effect" or "stochastic superiority probability". The test statistic ($\hat p$) is essentially the probability of a value in one condition being greater than other while splitting the ties[^8]. However, Brunner-Munzel tests can not be applied to the single group or one-sample designs.
[^8]: Note, for paired samples, this does not refer to the probability of an increase/decrease in paired sample but rather the probability that a randomly sampled value of X will be greater/less than Y. This is also referred to as the "relative" effect in the literature. Therefore, the results will differ from the concordance probability.
$$
\hat{p} = P(X<Y)+ \frac{1}{2} \cdot P(X=Y)
$$
These tests are relatively new so there are very few packages offer Brunner-Munzel. Moreover, @karch2021psychologists argues that the stochastic superiority effect size ($\hat{p}$) offers a nuanced way to interpret group differences by visualizing observations as competitors in a contest. Propounded by scholars like @cliff1993dominance and @divine2018wilcoxon, it views each observation from one group in a duel with every observation from another. If an observation from the first group surpasses its counterpart, it "wins," and the group garners a point; tied observations yield half a point to each group. This concept can be further elucidated through a bubble plot, where placement above, below, or on the diagonal indicates the dominance of one group's observation over the other. Other interpretations, like transforming p to the Wilcoxon-Mann-Whitney (WMW) odds or Cliff's δ offer deeper insights. There are implementations of the Brunner-Munzel test in a few packages in R (i.e. `lawstat`, `rankFD`, and `brunnermunzel`). @karch2021psychologists recommends the `brunnermunzel.permutation.test` function from the `brunnermunzel` package. The `TOSTER` R package can also provide coverage [@TOSTER; @caldwell].
```{r,echo=TRUE, warning=FALSE,message=FALSE}
# Install package for data cleaning
# install.packages('janitor')
library(janitor)
# Paired samples
library(TOSTER)
data(sleep)
# When sample sizes are small
# a permutation version should be used.
# When this is done a seed should be set.
set.seed(2124)
brunner_munzel(extra ~ group,
data = sleep,
paired = TRUE,
perm = TRUE)
# Two Sample
# data import from likert
data(mass, package = "likert")
df_mass = mass |>
as.data.frame() |>
clean_names()
# function needs input as a numeric
# ordered factors can be converted to ranks
# Again, the warning can be ignored
set.seed(24111)
TOSTER::brunner_munzel(
rank(math_relates_to_my_life) ~ gender,
data = df_mass,
paired = FALSE,
perm = TRUE
)
```
## Rank-Based Effect Sizes
Since the mean and standard deviation are not estimated for a WMW or Brunner-Munzel test, it would be inappropriate to present a standardized mean difference (e.g., Cohen's d) to accompany these tests. Instead, a rank based effect size (i.e., based on the ranks of the observed values) can be reported to accompany the non-parametric statistical tests.
### Rank-Biserial Correlation {#sec-rrb}
The rank-biserial correlation ($r_{rb}$) is considered a measure of dominance. The correlation represents the difference between the proportion of favorable and unfavorable pairs or signed ranks. Larger values indicate that more of $X$ is larger than more of $Y$, with a value of (−1) indicates that all observations in the second, $Y$, group are larger than the first, $X$, group, and a value of (+1) indicates that all observations in the first group are larger than the second.
#### Dependent Groups {#sec-rrb-dep}
1. Calculate difference scores between pairs:
$$
D = X_2 - X_1
$$
2. Calculate the positive and negative rank sums:
$$
\text{When } D_i>0,\;\; R_{\oplus} = \sum_{i=1} -1\cdot \text{sign}(D_i) \cdot \text{rank}(|D_i|)
$$
$$
\text{When } D_i<0,\;\; R_{\ominus} = \sum_{i=1} -1\cdot \text{sign}(D_i) \cdot \text{rank}(|D_i|)
$$
3. We can set a constant, $H$, to be -1 when the rank positive rank sum is greater than or equal to the negative rank sum ($R_{\oplus} \ge R_{\ominus}$) or we can set $H$ to 1 when the rank positive rank sum is less than the negative rank sum ($R_{\oplus} < R_{\ominus}$).
$$
H = \begin{cases} -1 & R_{\oplus} \ge R_{\ominus} \\ 1 & R_{\oplus} < R_{\ominus} \end{cases}
$$
4. Calculate rank-biserial correlation:
$$
r_{rb} = 4H\times \left| \frac{\min( R_{\oplus}, R_{\ominus}) - .5\times ( R_{\oplus} + R_{\ominus})}{n(n + 1)} \right|
$$
5. For paired samples, or one sample, the standard error is calculated as the following:
$$
SE_{r_{rb}} = \sqrt{ \frac {2(2 n^3 + 3 n^2 + n)} {6(n^2 + n)} }
$$
6. The confidence intervals can then be calculated by Z-transforming the correlation.
$$
Z_{rb} = \text{arctanh}(r_{rb})
$$
7. Calculate the standard error of the Z-transformed correlation
$$
SE_{Z_{rb}} = \frac{SE_{r_{rb}}}{1-r_{rb}^2}
$$
8. Then the confidence interval can be calculated and then back-transformed.
$$
CI_{r_{rb}} = \text{tanh}(Z_{rb} \pm 1.96 \cdot SE_{Z_{rb}})
$$
In R, we can use the `ses_calc()` function in `TOSTER` package [@TOSTER]. For the following example, we will calculate the rank-biserial correlation in the sleep dataset:
```{r,echo=TRUE, warning=FALSE,message=FALSE}
# Dependent groups
data(sleep)
library(TOSTER)
# When sample sizes are small
# a permutation version should be used.
# When this is done a seed should be set.
set.seed(2124)
ses_calc(extra ~ group,
data = sleep,
paired = TRUE)
```
The example shows a rank-biserial correlation is $r_{rb}$ = .982 [.938, .995]. This suggests that nearly every individual in the sample showed an increase in condition 2 relative to condition 1. As you can see from the figure below, only one individual showed a decline (individual shown in red).
```{r,echo=FALSE,message=FALSE,warning=FALSE}
library(ggplot2)
library(ggdist)
library(dplyr)
data(sleep)
df <- tidyr::pivot_wider(sleep,id_cols = "ID",names_from = "group",values_from = "extra")
Diff <- df$`2` - df$`1`
sleep <- cbind(sleep, Difference = c(Diff,Diff))
ggplot(data=sleep, aes(x = group, y = extra)) +
geom_line(aes(group = ID, color = Difference>0),alpha = .5,size=2) +
geom_point(aes( color = Difference>0), alpha = .5,size=3) +
theme_ggdist() + theme(aspect.ratio = 1,text=element_text(size=16)) +
ylab('Increase in Hours of Sleep')+
xlab('') +
scale_x_discrete(labels = c("Control", "Intervention"))
```
#### Independent Groups {#sec-rrb-ind}
1. Calculate the ranks for each observation across all observations of in group 1 and 2
$$
R = \text{rank}(X)
$$
2. Calculate the rank sums from each group
$$
U_1 = \left(\sum_{i=1}^{n_1} R_{1i}\right) - n_1 \cdot \frac{n_1 + 1}{2}
$$
$$
U_2 = \left(\sum_{i=1}^{n_2} R_{2i}\right) - n_2 \cdot \frac{n_2 + 1}{2}
$$
3. Calculate rank biserial correlation
$$
r_{rb} = \frac{U_1}{n_1 n_2} - \frac{U_2}{n_1 n_2}
$$
4. For independent samples, the standard error is calculated as the following:
$$
SE_{rb} = \sqrt{\frac {n_1 + n_2 + 1} { 3 n_1 n_2}}
$$
5. The confidence intervals can then be calculated by transforming the estimate.
$$
Z_{rb} = \text{arctanh}(r_{rb})
$$
6. Calculate the standard error of the Z-transformed correlation
$$
SE_{Z_{rb}} = \frac{SE_{r_{rb}}}{1-r_{rb}^2}
$$
7. Then the confidence interval can be calculated and then back-transformed.
$$
CI_{r_{rb}} = \text{tanh}(Z_{rb} \pm 1.96 \cdot SE_{Z_{rb}})
$$
In R, we can use `ses_calc` in the `TOSTER` package can be utilized to calculate $r_{rb}$.
```{r,echo=TRUE, warning=FALSE}
# Two Sample
# install the janitor package for data cleaning
# clean and import data from likert
data(mass, package = "likert")
df_mass = mass |>
as.data.frame() |>
janitor::clean_names()
# function needs input as a numeric
# ordered factors can be converted to ranks
# Again, the warning can be ignored
set.seed(24111)
ses_calc(
rank(math_relates_to_my_life) ~ gender,
data = df_mass,
paired = FALSE
)
```
The example shows a rank-biserial correlation is $r_{rb}$ = -.45 [-.78, .08].
### Concordance Probability {#sec-conc-prob}
In the two sample case, concordance probability is the probability that a randomly chosen subject from one group has a response that is larger than that of a randomly chosen subject from the other group. In the two sample case, this is roughly equivalent to the statistic of the Brunner-Munzel test. In the paired sample case, it is the probability that a randomly chosen difference score ($D$) will have a positive (+) sign plus 0.5 times the probability of a tie (no/zero difference). The concordance probability can go by many names. It is also referred to as the c-index, the non-parametric probability of superiority, or the non-parametric common language effect size (CLES).
The calculation of concordance can be derived from the rank-biserial correlation. The concordance probability ($p_c$) can be converted from the correlation.
$$
p_c = \frac{r_{rb} + 1 }{2}
$$
In R, we can use the `ses_calc()` function again along with the sleep data set. For repeated measures experiments, the concordance probability in dependent groups can be calculated utilizing the `paired=TRUE` argument in the `ses_calc()` function:
```{r,echo=TRUE}
# Dependent Groups
library(TOSTER)
data(sleep)
ses_calc(extra ~ group,
data = sleep,
paired = TRUE,
ses = "c")
```
For two independent groups, the concordance probability can be calculated similarly without specifying the `paired` argument:
```{r}
# Independent Groups
# data import from likert
data(mass, package = "likert")
df_mass = mass |>
as.data.frame() |>
janitor::clean_names()
ses_calc(rank(math_relates_to_my_life) ~ gender,
data = df_mass,
ses = "c")
```
### Wilcoxon-Mann-Whitney Odds {#sec-odds}
The Wilcoxon-Mann-Whitney odds [@o2006exploiting], also known as the "Generalized Odds Ratio"[@agresti1980generalized], essentially transforms the concordance probability into an odds ratio.
The odds can be converted from the concordance by taking the logit of the concordance. This will provide the log odds.
$$
O_{WMW} = \exp \left[\text{logit}(p_c)\right]
$$
The exponential value of the log-odds will provide the odds on a more interpretable scale. Taking just the logit of the concordance probability would give us the log odds such that,
$$
\log(O_{WMW}) = \text{logit}(p_c)
$$
In R, we can calculate $O_{WMW}$ by using the `ses_calc()` function from the `TOSTER` package:
```{r,echo=TRUE}
# Dependent Groups
data(sleep)
TOSTER::ses_calc(extra ~ group,
data = sleep,
paired = TRUE,
ses = "odds")
```
We can also calculate $O_{WMW}$ in independent groups using the same function:
```{r,echo=TRUE}
# Independent Groups
# data import from likert
data(mass, package = "likert")
df_mass = mass |>
as.data.frame() |>
janitor::clean_names()
TOSTER::ses_calc( rank(math_relates_to_my_life) ~ gender,
data = df_mass,
ses = "odds")
```