\documentclass[10pt,a4paper,onecolumn]{article}
\usepackage{marginnote}
\usepackage{graphicx}
%\usepackage{xcolor}
\usepackage[dvipsnames]{xcolor}
\usepackage{authblk,etoolbox}
\usepackage{titlesec}
\usepackage{calc}
\usepackage{tikz}
% use microtype if available
\IfFileExists{microtype.sty}{%
\usepackage{microtype}
\UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
}{}
\usepackage{setspace}
\usepackage{hyperref}
\hypersetup{colorlinks,
urlcolor=NavyBlue,
linkcolor=Mulberry}
\usepackage{caption}
\usepackage{tcolorbox}
\usepackage{amssymb,amsmath}
\usepackage{ifxetex,ifluatex}
\usepackage{seqsplit}
\usepackage{enumitem}
\usepackage{xparse}
\usepackage{balance}
\ExplSyntaxOn
\clist_new:N \g_mapo_allauthors_clist
\NewDocumentCommand\addauthor {m}
{
\clist_gput_right:Nn \g_mapo_allauthors_clist { #1 }
}
\NewDocumentCommand \printall { } { } % initialization
\DeclareExpandableDocumentCommand \printall { }
{
\clist_use:Nnnn \g_mapo_allauthors_clist { ~and~ } { ,~ } { ~and~ }
}
\ExplSyntaxOff
% \usepackage{fixltx2e} % provides \textsubscript
\usepackage[backend=biber,style=apa]{biblatex}
\addbibresource{master.bib}
\addbibresource{packages.bib}
% --- Page layout -------------------------------------------------------------
\usepackage[top=3.5cm, bottom=3cm, right=1.5cm, left=1.5cm,
headheight=2.2cm, reversemp, marginparwidth=0cm, marginparsep=0cm]{geometry}
% --- Default font ------------------------------------------------------------
% \renewcommand\familydefault{\sfdefault}
% --- Style -------------------------------------------------------------------
\renewcommand{\bibfont}{\small \sffamily}
\renewcommand{\captionfont}{\small\sffamily}
\renewcommand{\captionlabelfont}{\bfseries}
% --- Section/SubSection/SubSubSection ----------------------------------------
\titleformat{\section}
{\normalfont\sffamily\Large\bfseries}
{\thesection}{1em}{}
\titleformat{\subsection}
{\normalfont\sffamily\large\bfseries}
{\thesubsection}{1em}{}
\titleformat{\subsubsection}
{\normalfont\sffamily\bfseries}
{\thesubsubsection}{1em}{}
\titleformat*{\paragraph}
{\sffamily\normalsize}
% --- Header / Footer ---------------------------------------------------------
\usepackage{fancyhdr}
\pagestyle{fancy}
\fancyhf{}
%\renewcommand{\headrulewidth}{0.50pt}
\renewcommand{\headrulewidth}{0pt}
\addauthor{{Backström, L. (202004875, LB)}}
\addauthor{{Ring, L. (202009983, LR)}}
\fancyhead[L]{\footnotesize{\sffamily \printall}.}
\fancyhead[C]{}
\fancyhead[R]{\footnotesize{\sffamily Bachelor's Project (147201E020).}}
\renewcommand{\footrulewidth}{0.25pt}
\fancyfoot[L]{\footnotesize{\sffamily Harmony in Motion: Real-time Sonification Strategies for Joint Action Research (2023).}}
\fancyfoot[R]{\sffamily \thepage}
\makeatletter
\let\ps@plain\ps@fancy
\fancyheadoffset[L]{0cm}
\fancyfootoffset[L]{0cm}
\fancypagestyle{plain}{%
\renewcommand{\headrulewidth}{0pt}%
\fancyhf{}%
\fancyfoot[L]{\footnotesize{\sffamily Harmony in Motion: Real-time Sonification Strategies for Joint Action Research (2023).}}%
\fancyfoot[R]{\sffamily \thepage}%
}
% --- Macros ---------
\definecolor{linky}{rgb}{0.0, 0.5, 1.0}
\newtcolorbox{repobox}
{colback=red, colframe=red!75!black,
boxrule=0.5pt, arc=2pt, left=6pt, right=6pt, top=3pt, bottom=3pt}
\newcommand{\ExternalLink}{%
\tikz[x=1.2ex, y=1.2ex, baseline=-0.05ex]{%
\begin{scope}[x=1ex, y=1ex]
\clip (-0.1,-0.1)
--++ (-0, 1.2)
--++ (0.6, 0)
--++ (0, -0.6)
--++ (0.6, 0)
--++ (0, -1);
\path[draw,
line width = 0.5,
rounded corners=0.5]
(0,0) rectangle (1,1);
\end{scope}
\path[draw, line width = 0.5] (0.5, 0.5)
-- (1, 1);
\path[draw, line width = 0.5] (0.6, 1)
-- (1, 1) -- (1, 0.6);
}
}
% --- Title / Authors ---------------------------------------------------------
% patch \maketitle so that it doesn't center
\patchcmd{\@maketitle}{center}{flushleft}{}{}
\patchcmd{\@maketitle}{center}{flushleft}{}{}
% patch \maketitle so that the font size for the title is normal
\patchcmd{\@maketitle}{\LARGE}{\LARGE\sffamily}{}{}
% patch the patch by authblk so that the author block is flush left
\def\maketitle{{%
\renewenvironment{tabular}[2][]
{\begin{flushleft}}
{\end{flushleft}}
\AB@maketitle}}
\makeatletter
\renewcommand\AB@affilsepx{ \protect\Affilfont}
%\renewcommand\AB@affilnote[1]{{\bfseries #1}\hspace{2pt}}
\renewcommand\AB@affilnote[1]{{\bfseries #1}\hspace{3pt}}
\makeatother
\renewcommand\Authfont{\sffamily\bfseries}
\renewcommand\Affilfont{\sffamily\small\mdseries}
\setlength{\affilsep}{1em}
\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\else % if luatex or xelatex
\ifxetex
\usepackage{mathspec}
\else
\usepackage{fontspec}
\fi
\defaultfontfeatures{Ligatures=TeX,Scale=MatchLowercase}
\fi
% use upquote if available, for straight quotes in verbatim environments
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
\PassOptionsToPackage{usenames,dvipsnames}{color} % color is loaded by hyperref
\hypersetup{unicode=true,
pdftitle={Harmony in Motion: Real-time Sonification Strategies for Joint Action Research},
pdfkeywords={Joint Action; Sonification; Real-time; Synchronization; Interpersonal Coordination},
colorlinks=true,
linkcolor=Mulberry,
citecolor=BrickRed,
urlcolor=NavyBlue,
}
\urlstyle{same} % don't use monospace font for urls
\IfFileExists{parskip.sty}{%
\usepackage{parskip}
}{% else
\setlength{\parindent}{0pt}
\setlength{\parskip}{6pt plus 2pt minus 1pt}
}
\setlength{\emergencystretch}{3em} % prevent overfull lines
\setcounter{secnumdepth}{5}
% Redefines (sub)paragraphs to behave more like sections
\ifx\paragraph\undefined\else
\let\oldparagraph\paragraph
\renewcommand{\paragraph}[1]{\oldparagraph{#1}\mbox{}}
\fi
\ifx\subparagraph\undefined\else
\let\oldsubparagraph\subparagraph
\renewcommand{\subparagraph}[1]{\oldsubparagraph{#1}\mbox{}}
\fi
% tightlist command for lists without linebreak
\providecommand{\tightlist}{%
\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
% From pandoc table feature
\usepackage{longtable,booktabs,array}
\usepackage{calc} % for calculating minipage widths
% Correct order of tables after \paragraph or \subparagraph
\usepackage{etoolbox}
\makeatletter
\patchcmd\longtable{\par}{\if@noskipsec\mbox{}\fi\par}{}{}
\makeatother
% Allow footnotes in longtable head/foot
\IfFileExists{footnotehyper.sty}{\usepackage{footnotehyper}}{\usepackage{footnote}}
\makesavenoteenv{longtable}
\usepackage{flafter}
\usepackage{multirow}
\usepackage{wrapfig}
\usepackage{float}
\usepackage{colortbl}
\usepackage{pdflscape}
\usepackage{tabu}
\usepackage{threeparttable}
\usepackage{threeparttablex}
\usepackage[normalem]{ulem}
\usepackage{makecell}
\title{Harmony in Motion: Real-time Sonification Strategies for Joint Action Research}
\author[1]{Linus Backström}
\author[1]{Luke Ring}
\affil[1]{Aarhus University}
\date{\vspace{-5ex}}
\begin{document}
\newgeometry{includemp, reversemp, left=1.0cm, marginparwidth=4.5cm, marginparsep=0.5cm}
\maketitle
% \thispagestyle{empty}% suppress header and footer on title page
\marginpar{
\sffamily\small
{\bfseries Programme}\\BSc Cognitive Science\\[1mm]
{\bfseries Course}\\Bachelor's Project (147201E020)\\[1mm]
{\bfseries Supervisor}\\Anna Zamm, Assistant Professor\\[1mm]
{\bfseries Faculty}\\Faculty of Arts\\
Aarhus University\\[2mm]
{\bfseries Submitted:} 15 February 2023\\[2mm]
{\bfseries Student Details}
\begin{itemize}[align=parleft,left=1em..2em]
\setlength\itemsep{0em}
\item Linus Backström\\ ID: 202004875\\ Initials: LB
\item Luke Ring\\ ID: 202009983\\ Initials: LR
\end{itemize}
\vspace{2mm}
{\bfseries Software}
\begin{itemize}[align=parleft,left=1em..2em]
\setlength\itemsep{0em}
\item \href{https://github.com/zeyus/QTM\_Bela\_Sonification}{\color{NavyBlue}{Repository}} \ExternalLink
\end{itemize}
\vspace{2mm}
{\bfseries License}\\
Authors of papers retain copyright and release the work under an MIT Licence (\href{https://github.com/zeyus/QTM\_Bela\_Sonification/blob/main/LICENSE.md}{\color{NavyBlue}{MIT}}).
}
\begin{abstract}
Joint actions involving high levels of coordination often require individuals to represent and monitor their own actions as well as their partner's actions in parallel, but current research is unclear on how this occurs under various circumstances. Using different movement sonification mapping strategies, we enhance attention towards either individual or joint outcomes of actions and assign these strategies to separate experimental conditions. Five subject pairs participated in a pilot experiment investigating whether synchrony is optimized when focusing on self-other or joint outcome representations. In the experiment, blindfolded subjects moved sleds along a track while attempting to remain as synchronous as possible. The sled movements were captured with a motion capture system that streamed 3D positional data to a low-latency sonification pipeline implementing the mapping strategies. The results showed significant differences between the two sonification strategies. Notably, the No Sonification control condition consistently outperformed both sonification conditions, possibly because environmental cues used for auditory localization were masked during the sonification conditions. This pilot experiment successfully implemented a novel paradigm for joint action research that can be used in further studies in the field.
\end{abstract}
\bigbreak
\textbf{Keywords:} {Joint Action; Sonification; Real-time; Synchronization; Interpersonal Coordination}
\restoregeometry
\clearpage
\section*{Summary}
Joint actions, where two or more people synchronize their actions in pursuit of a shared goal, are a common aspect of human behavior. Understanding the mechanisms that enable individuals to work together is therefore highly valuable to cognitive scientists. Joint actions frequently require simultaneous actions by the participants, creating the need for agents to monitor both their own and their partner's actions in parallel. By utilizing movement sonification, we attempt to facilitate monitoring of external information, thereby optimizing synchronization during a novel joint action task.
In order to investigate this aspect of joint action synchrony, we used data from motion tracking cameras to sonify the positions of sleds on a track; blindfolded subjects moved the sleds from end to end while attempting to synchronize their movements. The experiment consisted of three conditions: a No Sonification condition, where only the sound of the sleds moving along the track could be used, and two sonification conditions that employed different strategies. The task-oriented strategy used the position of each sled on the track as a means of self-other representation, and the synchronization-oriented strategy used the distance between the sleds for joint outcome representation.
We gathered experimental data from the motion tracking recordings, assessed synchrony using three different methods, and applied linear mixed-effects models to the resulting measures. The results showed that the No Sonification condition performed best, with the synchronization-oriented strategy performing only slightly worse and the task-oriented strategy performing significantly worse.
Various explanations for these results are discussed; in particular, auditory localization may have been more accurate when only the sounds of the sleds moving along the track were audible, and this spatial information may have been masked by the sonification in both sonification conditions. Options for expanding and improving the experiment are suggested. This pilot experiment successfully demonstrated that low-latency real-time sonification with low-cost hardware is a viable and effective novel method for use in joint action research, and further research may provide more evidence for the effect of sonification strategy choice on synchrony and learning.
\clearpage
\twocolumn
{
\hypersetup{linkcolor=Black}
\setcounter{tocdepth}{3}
\tableofcontents
}
\clearpage
\hypertarget{harmony-in-motion-real-time-sonification-strategies-for-joint-action-research-lb-lr}{%
\section{Harmony in Motion: Real-time Sonification Strategies for Joint Action Research (LB, LR)}\label{harmony-in-motion-real-time-sonification-strategies-for-joint-action-research-lb-lr}}
Joint action tasks form an integral part of everyday life for humans \autocite{vanderwelUnderstandingJointAction2021} and other species \autocite{ferrari-tonioloTwoBrainsAction2019}. Examples include games and sports, such as football, where it is vital to work with other team members to outplay opponents; construction work, where people may be holding a wall panel up while others fix it to a frame; and music and dancing, where pairs of people may interact in elaborate ways to the rhythm of a song, creating a joint performance from their individual movements. The mechanisms underlying this cooperative ability to work together towards a common goal are of particular interest for research in cognition, creativity, and learning. A large part of humanity's progress can be attributed to joint action, which has allowed us to build our modern society with all of its infrastructure and technological advancements. An essential part of successful human cooperation is our unique capacity for speech, and many types of cooperation involve perception and production of sound as a key aspect. Auditory perception, or hearing, in humans refers to our ability to perceive changes in air pressure as sound by detecting vibrations with our ears and interpreting them using our brains. The topic is particularly interesting for cognitive science because of the large amount of perceptual-cognitive processing that occurs from the moment our ears pick up vibrations to when a perception arises.
Sound in cooperation usually plays one of two roles: either it is the focus of the task, as is the case for musicians in a band, or it is a component that can be leveraged to increase situational awareness or synchronization, for example a steady beat that members of a military corps march in step to. This study investigates the relationship between joint actions and sounds by utilizing sonification as a way to facilitate the monitoring of individual and joint outcomes during joint action. These two primary concepts of the current paper -- joint action and sonification -- are introduced and briefly defined here, while a more in-depth discussion of each concept, the terminology around them, and previous research into them, follows in the Background section. We refer to joint action as any situation where two or more people synchronize their actions in pursuit of a shared goal \autocite{knoblichPsychologicalResearchJoint2011}. Sonification is defined as ``the use of nonspeech audio to convey information'' \autocite[p.~4]{kramerSonificationReportStatus1999}.
Previous research indicates two basic features of auditory perception that provide good arguments for representing data as sound \autocite{kramerSonificationReportStatus1999}. First, auditory perception is especially useful for detecting temporal characteristics, i.e.~variations in sound over time \autocite{hildebrandtShortPaperEnhancing2014}. Sonification can thus be useful for monitoring or understanding complex temporal data. Second, our sense of hearing does not require us to be oriented toward the sound source. Unlike visual perception, which allows us to perceive approximately 180 degrees of our environment in front of us while we remain blind to the other 180 degrees behind us, auditory perception allows perception of 360 degrees. This makes auditory signals particularly useful for situations where our visual system is occupied with another task and we cannot afford to look around constantly, such as surveillance and alarm applications. Other benefits of auditory perception that speak for sonification are parallel listening (the ability to monitor and process multiple audio sources), affective response (increased learning and engagement), and finally, rapid detection -- humans can react faster to sound than to any other type of stimulus, achieving reaction times of around 160 ms in simple reaction time experiments \autocite{kosinskiLiteratureReviewReaction2008,kramerSonificationReportStatus1999}.
This study explores how synchronization during a novel joint action task is affected by different methods of sonification. When performing joint actions, an individual can either focus on themselves and their partner as separate entities, which we refer to as self-other representations, or instead focus primarily on the effect that their combined actions have, which we call joint outcome representations. The current study aims to investigate whether learning and synchronization during joint action can be optimized by enhancing attention towards one of these representations using sonification. Movement sonification can be used to facilitate synchronization by providing auditory feedback for actions, allowing individuals to adjust their movements in real time to achieve a more synchronized state. More specifically, sonification can help individuals better perceive and coordinate their movements, leading to improved joint performance and increased levels of synchrony \autocite{dotovEntrainingChaoticDynamics2018}. Our research question is thus the following:
\begin{quote}
Is synchrony optimized when focusing on self-other representations or joint outcome representations?
\end{quote}
When using auditory feedback in a joint action context, latency is particularly important since there is a relatively small window where an event and a related sound are perceived as synchronous. Although studies report varying results \autocite{keetels2012perception}, asynchrony is detectable at as little as 6 ms, and more likely around 30 ms for continuous movement \autocite{mcphersonActionSoundLatencyAre2016}, meaning any pipeline with a higher latency is likely to introduce confounding variables in measurements. Only a relatively limited number of studies have investigated the effects of sonification on joint action\footnote{A Google Scholar search (14 February 2023) for `+''joint action'' +sonification' only yielded 193 results, compared to over 326,000 results for `+''joint action''' alone}, and this thesis aims to expand the current body of research by presenting a flexible low-latency sonification framework that uses real-time positional data for joint action research. To this end, the present study implements a novel method for sonifying joint actions in a pilot study investigating how different representations affect synchronization. By comparing subject synchronization during an experiment employing self-other-represented (task-oriented) or joint outcome-represented (synchronization-oriented) strategies, we attempt to show differences that highlight the importance of selecting appropriate mapping patterns for sonification and provide a pathway for further investigation.
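To make this latency constraint concrete, consider an illustrative budget: assuming a 300 Hz motion capture rate, roughly 1 ms of network transport delay, and a 16-frame audio block at 44.1 kHz (figures chosen for illustration rather than measurements of our pipeline), the added latency is approximately
\[
\frac{1000}{300} + 1 + \frac{16 \times 1000}{44100}
\approx 3.3 + 1.0 + 0.4 \approx 4.7\,\text{ms},
\]
which leaves considerable headroom below the roughly 30 ms asynchrony threshold reported for continuous movement.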
\hypertarget{background}{%
\section{Background}\label{background}}
\hypertarget{sonification-lb-lr}{%
\subsection{Sonification (LB, LR)}\label{sonification-lb-lr}}
The current study investigates whether sonification can be used to optimize synchronization during joint action by enhancing attention towards either self-other or joint outcome representations. Sonification is defined as ``the use of nonspeech audio to convey information'' \autocite[p.~4]{kramerSonificationReportStatus1999}. More specifically, sonification is ``the transformation of data relations into perceived relations in an acoustic signal for the purposes of facilitating communication or interpretation'' \autocite[p.~4]{kramerSonificationReportStatus1999}. According to \textcite{dubusInteractiveSonificationMotion2013}, sonification is the use of sound to communicate, interpret, and perceive data. Sonification is especially suitable for tasks with time constraints, such as monitoring and synchronizing \autocite{dubusInteractiveSonificationMotion2013}. Sonification can also be characterized as a segment of augmented reality that reveals otherwise hidden information with the help of sound \autocite{dubusInteractiveSonificationMotion2013}. According to \textcite{dubusInteractiveSonificationMotion2013}, this is achieved through clear connections between data dimensions and auditory dimensions of the sonification display. The layperson may confuse sonification with music, but according to \textcite{dubusInteractiveSonificationMotion2013}, there is a clear difference between the two: sonification is meant to communicate objective data, whereas music is often used to communicate more subjective content, such as emotions. Nevertheless, the differences between music and sonification are not fully agreed upon among researchers, and thus there is no clear consensus on the distinction in the academic discourse \autocite{dubusInteractiveSonificationMotion2013}. Sonification is also a relatively young field of research, and sonification studies are plagued by a lack of consistency in terminology and by the arbitrary nature of sonification mappings \autocite{dubusInteractiveSonificationMotion2013,dubusSystematicReviewMapping2013}.
Although concepts around sonification and audification were not formalized until around the year 1992, when the first International Conference on Auditory Display (ICAD) was held \autocite{dubusSonificationPhysicalQuantities2011}, practical examples of sonification can be found throughout history \autocite{dubusInteractiveSonificationMotion2013}. Water clocks in ancient Greece and medieval China were sometimes constructed to produce sounds and thereby provide auditory information about the passage of time \autocite{dubusSonificationPhysicalQuantities2011}. The stethoscope, which is used for listening to sounds made by the heart and lungs as well as other internal sounds of the body, was invented in 1816 by the French physician and amateur musician René Laënnec \autocite{roguinReneTheophileHyacinthe2006}. The Geiger counter, developed in 1928, provides perhaps the most characteristic example of sonification through its function of sonifying levels of radiation. The device detects ionizing radiation and translates it into audible clicks, where a faster tempo signifies a higher level of radiation (Figure \ref{fig:geiger-counter}). \textcite{dubusSonificationPhysicalQuantities2011} describe the value of the Geiger counter as ``transposing a physical quantity which is essentially non-visual and pictured in everyone's imagination as very important because life-threatening, to the auditory modality through clicks with a varying pulse'' \autocite[p.~1]{dubusSonificationPhysicalQuantities2011}.
\begin{figure}[h]
{\centering \includegraphics[width=1\linewidth]{figures/Geiger_counter_usage}
}
\caption{Photograph depicting a Geiger counter being used to detect levels of radiation. Geiger counters use sonification to represent radioactivity by producing audible clicks that increase in frequency as the level of measured ionizing radiation increases \autocite{dobsonDetailsPublicHealth1963}.}\label{fig:geiger-counter}
\end{figure}
\hypertarget{monitoring-via-auditory-feedback-lb}{%
\subsubsection{Monitoring via auditory feedback (LB)}\label{monitoring-via-auditory-feedback-lb}}
In a review of mapping strategies for sonification, \textcite{dubusSystematicReviewMapping2013} identifies several applications for sonification, including monitoring, motion perception, data exploration, accessibility, art and aesthetics, the study of psychoacoustics, and as a complement to visualization. \textcite{debashiSonificationNetworkTraffic2018} specify that sonification is a particularly useful tool for conveying the type of information that changes over time. \textcite{kramerSonificationReportStatus1999} points out that sonification can allow the user to make sense of large amounts of data by utilizing modern powerful media technologies. Out of the various applications for sonification, monitoring of external information is the most relevant for the current study.
Using sonification for external monitoring can for example mean that there is a sound that the user listens to while simultaneously working on something else, such as when medical staff in operating rooms rely on auditory cues from their equipment to monitor the patient's vital signs \autocite{dubusInteractiveSonificationMotion2013}. In such instances, a change to the monitored state causes a corresponding change in the sound, allowing the user to quickly become aware of the change and react as needed. One of the clear advantages of using sonification for external monitoring is then that the user is free to work on a different task than the monitoring while still maintaining the ability to detect and react to changes \autocite{vickersSonificationProcessMonitoring2011}.
Compared to visualization, sonification can have certain advantages that make it suitable as a complement or replacement to visualization. This can be observed in practice in the health sector, where real-time sonification using parameter mapping methods are used; one study identified a high potential and found positive results for the use of real-time auditory feedback-oriented training devices in physical rehabilitation and fitness training to increase awareness of physiological responses \autocite{yangRealtimeSonificationBiceps2015}. The fact that humans are very sensitive to changes in rhythm or sequences of sounds lends further support to the idea of complementing visualizations with sonification \autocite{hildebrandtShortPaperEnhancing2014}. A recent study by \textcite{debashiSonificationNetworkTraffic2018} comparing sonification and visual methods of monitoring found that the visual method alone performed significantly worse than a combination of both, and further that using sonification resulted in reduced visual fatigue rates. In summary, the scientific literature clearly indicates that sonification has an important part to play in the context of monitoring external information and that its use cases extend to several different sectors, and would benefit from further research.
\hypertarget{movement-sonification-lb-lr}{%
\subsubsection{Movement sonification (LB, LR)}\label{movement-sonification-lb-lr}}
As previously mentioned, sonification involves the transformation of all types of data into sound \autocite{kramerSonificationReportStatus1999}. The term movement sonification specifically refers to the transformation of movement -- typically that of a human -- into sound \autocite{vinkenAuditoryCodingHuman2013}. \textcite{effenbergMovementSonificationEffects2005} states that perception and reproduction accuracy of gross motor patterns can be improved with the help of movement sonification, indicating a wide range of potential applications for artificial auditory movement information in sports and rehabilitation. Based on the idea that perceiving gross motor patterns is facilitated when more senses are active, sports scientists in particular have tried to take advantage of this effect by creating and conveying an increased amount of auditory movement information \autocite{brockIfMotionSounds2012,kosBiofeedbackSportChallenges2015,schmitzPerceptualEffectsAuditory2012,vinkenAuditoryCodingHuman2013}. In order to achieve multisensory integration benefits, the additional auditory movement information must correspond to the structure of the perceptual features of another modality (visual, kinesthetic, or tactile) \autocite{schmitzSoundJoinedActions2017}. When visual motion perception is the reference with which bi- or multimodal convergence is to be achieved, movement sonification needs to be based on kinematic parameters \autocite{schmitzSoundJoinedActions2017}. These kinematic parameters refer to the spatiotemporal features of a movement pattern or pose. This acoustic enhancement of motor perception became known as ``movement sonification'' when \textcite{effenbergMovementSonificationEffects2005} took the sonification approach of the early 1990s and adapted it to the kinematics and dynamics of human motor actions.
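To give a concrete sense of what such a kinematic parameter can look like when derived from sampled positional data, the sketch below estimates instantaneous speed from two consecutive 3D samples (a minimal illustration in C++; the data structure and field names are assumptions, not those used in our pipeline):

\begin{verbatim}
#include <cmath>

// Minimal sketch: deriving one kinematic
// parameter (speed) from two successive
// 3D motion capture samples.
struct Sample {
  double x, y, z; // position (m)
  double t;       // timestamp (s)
};

// Finite-difference speed estimate (m/s),
// a spatiotemporal feature that a movement
// sonification could map onto sound.
double speed(const Sample& p,
             const Sample& c) {
  double dx = c.x - p.x;
  double dy = c.y - p.y;
  double dz = c.z - p.z;
  double dt = c.t - p.t;
  return std::sqrt(dx*dx + dy*dy + dz*dz)
         / dt;
}
\end{verbatim}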
In the empirical section of the current study, we describe how we used movement sonification to emphasize different joint action strategies and manipulate synchronization during a joint action task. The use of movement sonification in joint action research is supported by the finding that movement sonification enhances the perception of movement and improves motor performance \autocite{schmitzObservationSonifiedMovements2013}. Other studies in sports science have found that when movements are mapped onto sound, i.e.~sonified, predictions can be facilitated \autocite{effenbergMovementSonificationEffects2005,schmitzPerceptualEffectsAuditory2012}. Movement sonification may also support synchronization in joint action by addressing central motor representations, specifically by making the movements of athletes more predictable to their teammates \autocite{schmitzPerceptualEffectsAuditory2012}. Furthermore, sonification is well suited to support applications for physical training, as seen in a study by \textcite{dubusEvaluationFourModels2012} where professional rowers were able to use kinetic and kinematic cues to optimize their rowing speed. The author concluded that rowing performance could be improved with the help of interactive augmented feedback \autocite{dubusEvaluationFourModels2012}. Finally, \textcite{schmitzSoundJoinedActions2017} found that complementing visualizations of a swimmer with kinematic sonification allowed for more accurate perceptions of differences in swimming stroke frequency.
With sonification being such a recent field of research, its subfield movement sonification has had even less time to be researched \autocite{vinkenAuditoryCodingHuman2013}. As such, the question of how to map movement parameters onto sound in an optimal way remains uncertain due to a lack of an adequate theoretical background \autocite{effenbergAccelerationDecelerationConstant2018}. With this uncertainty in mind, \textcite{effenbergAccelerationDecelerationConstant2018} suggests that movement sonification can function as an accessible form of information similar to visual information when coded properly. Along the same lines, \textcite{vinkenAuditoryCodingHuman2013} states that movement sonification can improve motor processes, as well as add information to parts of movements that are typically silent. By contrast, \textcite{vinkenAuditoryCodingHuman2013} also explains that despite these potential use cases there is hardly any empirical data from scientific research that clarifies how to sonify gross motor human movement to achieve information-rich sound sequences. For these reasons, it is important to gather more data about both sonification in general, as well as movement sonification specifically.
\textcite{vinkenAuditoryCodingHuman2013} identify three main areas for movement sonification that lack empirical evidence: the selection of appropriate movement features, the optimal mapping patterns between kinetic and acoustic features, and the appropriate number of dimensions for sonification. The current study adds to the existing body of research in movement sonification by implementing a flexible low-latency sonification pipeline and describing our strategy selection based on movement and acoustic features, how they were mapped, and which dimensions were used.
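To make the notion of a mapping pattern concrete, the sketch below outlines the two strategies used in this study in a simplified form (a C++ illustration only; the choice of pitch as the acoustic feature and the frequency ranges shown are assumptions, not the study's actual sound parameters):

\begin{verbatim}
#include <cmath>

// Linear interpolation between two
// frequencies (Hz).
double lerp(double a, double b, double t) {
  return a + (b - a) * t;
}

// Task-oriented (self-other): each sled's
// normalised track position (0..1) drives
// the pitch of that player's own tone.
double taskPitch(double pos) {
  return lerp(220.0, 880.0, pos);
}

// Synchronization-oriented (joint outcome):
// the gap between the two sleds drives a
// single shared tone; zero gap maps to the
// reference pitch.
double syncPitch(double posA, double posB) {
  double gap = std::fabs(posA - posB);
  return lerp(440.0, 880.0,
              std::fmin(gap, 1.0));
}
\end{verbatim}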
In addition to the aforementioned contribution to movement sonification research, this study also adds to the literature by investigating sonification in the context of learning a novel joint action task and improving performance as measured by synchronization. Music belongs to the relatively small set of joint action behaviors that support a high degree of coordination, because of how suitable our auditory system is for temporal coordination \autocite{hildebrandtShortPaperEnhancing2014}. This raises the question of whether we can optimize temporal coordination by using movement sonification, or said differently: can sonification help with joint action? The existing body of research falls short on this question, and the present study aims to open a discussion with the help of a pilot experiment conducted at Aarhus University. In the next section, we will present the theory behind joint action with a focus on representations, action monitoring, and action prediction. Then, before presenting our pilot experiment, we will further discuss how and why the two concepts of sonification and joint action are integrated.
\hypertarget{joint-action-lb}{%
\subsection{Joint action (LB)}\label{joint-action-lb}}
Joint actions, where two or more people synchronize their actions in pursuit of a shared goal \autocite{knoblichPsychologicalResearchJoint2011}, are a regular part of human behavior. A longer definition refers to joint action as ``any form of social interaction whereby two or more individuals coordinate their actions in space and time to bring about a change in the environment'' \autocite[p.~1]{sebanzJointActionBodies2006}. With these definitions in mind, examples of joint action can include an extremely wide range of activities, such as handshakes, conversations, musical performances, and partner dances, as well as bank robberies and the building of the pyramids. To avoid overwhelming the reader, a typical example that is found in the literature \autocite{sebanzJointActionBodies2006} and that fits both definitions of joint action is when two people work together to carry a table from point A to point B.
As mentioned in the introduction, the progress of human civilization is largely based on working together. Joint actions constitute a significant part of the human experience, and they form an important and intriguing research topic for the field of cognitive science. Studies of joint actions such as putting together furniture or playing a piano duet, for instance, have shed light on how speech is used to establish who will do what and to agree on the details of the joint performance \autocite{clarkCoordinatingEachOther2005}. Additionally, research on how people solve problems of spatial coordination has shown that humans are capable of creating new symbol systems to coordinate their actions when conventional communication is not available \autocite{galantucciExperimentalSemioticsNew2009}.
Joint actions can be divided into two categories: emergent and planned coordination \autocite{knoblichPsychologicalResearchJoint2011}. Emergent coordination describes coordinated behavior arising from perception-action connections that lead to similar actions among individuals, independently of prior planning; an example of this is when pedestrians end up walking in step with each other without explicitly planning to do so \autocite{knoblichPsychologicalResearchJoint2011}. With regard to planned coordination, the behavior of agents is driven by representations that describe the desired outcomes of joint action and the respective role of the agent in achieving these outcomes \autocite{knoblichPsychologicalResearchJoint2011}. Next, we will describe the cognitive processes involved in the more relevant of these two categories, planned coordination. In addition to representations, recent theory identifies action monitoring and action prediction as the other main cognitive processes involved in planned joint action \autocite{loehrMonitoringIndividualJoint2013,sebanzJointActionBodies2006,vesperMinimalArchitectureJoint2010}.
\hypertarget{representations-lb}{%
\subsubsection{Representations (LB)}\label{representations-lb}}
According to the minimal architecture for joint action proposed by \textcite{vesperMinimalArchitectureJoint2010}, an agent involved in joint action must, at a minimum, have a representation of their own task and of the shared goal. An assumption is also that the shared goal cannot be achieved without the contribution of both parties \autocite{vesperMinimalArchitectureJoint2010}. In the model developed by \textcite{vesperMinimalArchitectureJoint2010}, the shared goal is expressed as ``ME + X'', where ``ME'' stands for the agent's own contribution, and ``X'' stands for the contribution that is not produced by the agent themselves. A minimal version of including the other in one's representations is theorized to be the understanding that the source of ``X'' -- that which is not produced by an agent themselves -- is the joint action partner \autocite{loehrSoundYouMe2016,vesperMinimalArchitectureJoint2010}.
Although not required, it is often helpful to also represent the other's task, as it allows for more precise predictions of what the other will do next \autocite{boltSensoryAttenuationAuditory2021,wenkeWhatSharedJoint2011}. As an example, consider two singers performing a duet together. Each singer must fully know their own part, while also representing the shared goal of synchronized singing. Although these two main representations can be sufficient for performing a duet, professional singers typically familiarize themselves with their singing partner's part in addition to their own, as it allows for a more polished and cohesive musical performance. The benefits of representing the other's task were demonstrated by a study \autocite{kellerPianistsDuetBetter2007} in which pianists were asked to record one part from a selection of piano duets, and then play the complementary part in synchrony with either their own or other participants' recordings. The results showed that the pianists synchronized better with recordings of themselves than those of others, indicating that synchronization is facilitated by having a more precise representation of the auditory stimuli with which one is coordinating actions. A study by \textcite{loehrSoundYouMe2016}, which had piano novices practice a duet with a more experienced pianist, found that the novices' representations consisted of the duet participants' shared goal, and to a small extent of the novices' own personal goal. Further insight into the role of representations in joint action comes from an EEG study by \textcite{kourtisPredictiveRepresentationOther2012}, which found that partners represented each other's actions in advance when passing an object, thereby facilitating coordination. Having these shared representations of actions and their underlying goals allows individuals to establish a procedural common ground for joint action without needing to rely on symbolic communication \autocite{sebanzJointActionBodies2006}.
Two music-related studies have explored both shared and individual goals. First, \textcite{kellerMusicalMeterAttention2005} demonstrated that musicians performing duets can attend to and recall both their own part and a combination of their own and a complementary part. More recently, \textcite{loehrMonitoringIndividualJoint2013} reported that duetting pianists prioritize shared goals (the musical harmony arising from both pianists' combined pitches) over individual action goals (the individual pitches played by each pianist), as demonstrated by stronger neural responses to pitch errors that impact the former compared to the latter. Based on their research, \textcite{loehrSoundYouMe2016} argues that shared goals are more salient compared to individual goals in novel joint actions performed by non-experts.
Empirical research on representations has largely focused on how people represent each individual's contributions to the joint action \autocite{knoblichPsychologicalResearchJoint2011,loehrSoundYouMe2016}. The details of \emph{how} people represent shared goals remain mostly unclear, however \autocite{loehrSoundYouMe2016}. Findings by \textcite{loehrSoundYouMe2016} indicate that novices in joint action contexts that promote minimal representations represent their actions in relation to the shared goal, which supports the argument that joint action participants represent the shared goal of the task \autocite{vesperMinimalArchitectureJoint2010}. Still, researchers highlight the need for further research, for instance by pointing out the lack of joint action studies teasing apart representations of shared and individual goals \autocite{loehrSoundYouMe2016}. The present study thus aims to fill this gap in joint action research by addressing both representations, in the form of self-other and joint outcome strategies, as separate experimental conditions.
\hypertarget{action-monitoring-lb}{%
\subsubsection{Action Monitoring (LB)}\label{action-monitoring-lb}}
Another cognitive process involved in joint action is known as action monitoring, or simply, monitoring. Representations, specifically shared task representations, are intrinsically linked with both monitoring and predicting processes, with all of them working together to enable interpersonal coordination in real time \autocite{knoblichPsychologicalResearchJoint2011}. \textcite{knoblichPsychologicalResearchJoint2011} describe this interplay of cognitive processes by stating that shared task representations determine how agents monitor and plan their actions. A simple way to consider this is that in order to effectively monitor an action, a basic idea of what it should resemble -- i.e., a task representation -- is required.
Monitoring processes are used to assess the extent to which a task or goal is being accomplished and whether actions are proceeding as intended \autocite{botvinickConflictMonitoringCognitive2001}. In terms of assessing task and goal progress, three things can be monitored: the agent's own task, the other's task, and the shared goal. The agent must at least monitor the progress of their own task and the shared goal. It is not strictly necessary to monitor the other's task, and it depends on the type of joint action that is performed. For example, consider a very simple task such as lifting an object straight up in the air together with a partner. It is entirely possible to do so successfully even if both agents only monitor their own task (``lift this side of the object'') and the shared goal (``lift this object together''). Nevertheless, it is likely true that monitoring what one's partner is doing will improve joint action performance -- especially for tasks that require precise synchronization \autocite{vesperMinimalArchitectureJoint2010}.
Concerning monitoring the sensory consequences or outcomes of joint actions, a distinction can be made between monitoring the individual outcomes vs joint outcomes. A study by \textcite{loehrMonitoringIndividualJoint2013} distinguished between individual and joint outcomes of actions with the help of a clever experiment, where experienced pianists played a pre-rehearsed duet on a digital piano while the outcomes of certain keypresses were manipulated by the researchers. In the individual outcome condition, the produced tones of keypresses were manipulated so that the harmony of the resulting chord remained the same. In the joint outcome condition, the produced tones were manipulated so that the harmony of the chord changed. The researchers found that the musicians in their study were able to monitor both individual and joint outcomes while maintaining a distinction between the two. Furthermore, the musicians were able to monitor the outcomes of both their own and their partner's actions in parallel, while also differentiating between the two \autocite{loehrMonitoringIndividualJoint2013}. To summarize, it appears that agents involved in joint action can represent and monitor their own and their partner's actions, as well as the joint outcome of their actions. Nevertheless, research into how individual vs joint outcomes in joint action are monitored is extremely scarce, and the present study sheds light on this particular issue by conducting an experiment where individual and joint outcomes are sonified separately.
\hypertarget{action-prediction-lb}{%
\subsubsection{Action prediction (LB)}\label{action-prediction-lb}}
The crucial final feature of joint action relates to the manner in which individuals adapt their own actions to those of others in time and space, and doing so requires making predictions of the other's actions. In order to avoid constantly being one step behind during joint action, interacting partners cannot simply respond to observed actions, but must rather plan their own actions in relation to what they predict their partner will do \autocite{sebanzJointActionBodies2006}. This prediction process is achieved through motor simulation, which uses internal models to determine the sensory consequences of actions as well as their effect on the environment \autocite{schmitzSoundJoinedActions2017,vesperMinimalArchitectureJoint2010}. Simulating the actions of others as they occur may be especially beneficial when engaging in joint action, and it has been suggested that such motor simulation influences perception and assists in predicting the consequences and timing of others' actions \autocite{vesperMinimalArchitectureJoint2010}. The idea that internal predictive models contribute to the ability to anticipate others' actions is supported by findings that short-term predictions of others' actions are based on one's own motor experience \autocite{agliotiActionAnticipationMotor2008,calvo-merinoActionObservationAcquired2005}. The data from \textcite{loehrSoundYouMe2016} complement previous joint action research by strengthening the notion that agents can predict the consequences of others' actions in parallel with their own \autocite{loehrMonitoringIndividualJoint2013,vandersteenADaptationAnticipationModel2013,vesperOurActionsMy2014,wolpertUnifyingComputationalFramework2003} and also incorporate these predictions of other's actions when planning and executing their own actions \autocite{knoblichActionCoordinationGroups2003,kourtisPredictiveRepresentationOther2012,loehrTemporalCoordinationPerforming2011,vesperAreYouReady2013}.
It is not fully clear yet whether similar mechanisms as those mentioned above exist specifically for predicting the joint outcome of an agent's and their partner's actions. Some support for predicting joint outcomes comes from a study by \textcite{knoblichActionCoordinationGroups2003}, which demonstrated the ability to predict combined outcomes through improved joint task performance with practice. The results showed that participants initially struggled with the joint task of controlling a cursor together to track a moving target on a computer screen, but with practice, performance reached the level of individual performance. Furthermore, participants who were provided with an external cue tone about the state of their partner's action were more successful at the task, indicating that auditory feedback can facilitate coordination \autocite{knoblichActionCoordinationGroups2003}. This is particularly interesting in the context of the present study, as it suggests a potential benefit of using sonification in the context of joint action.
\hypertarget{integrating-sonification-and-joint-action-lb}{%
\subsection{Integrating sonification and joint action (LB)}\label{integrating-sonification-and-joint-action-lb}}
In this section, we will focus on how the concepts of sonification and joint action relate to each other. To briefly restate what has been previously discussed, sonification is defined as the transformation of data into sound \autocite{kramerSonificationReportStatus1999}, and joint action refers to situations where two or more people synchronize their actions to achieve a shared goal \autocite{knoblichPsychologicalResearchJoint2011}. The cognitive processes related to joint action include representation, monitoring, and prediction \autocite{loehrMonitoringIndividualJoint2013,sebanzJointActionBodies2006,vesperMinimalArchitectureJoint2010}. We will now discuss how sonification can make use of these three different processes in light of previous research.
The first cognitive process in joint action to be considered is representation. There is no clear consensus in the academic literature on the details of representations for an agent involved in joint action, but previous research indicates that the agent must, at the very least, represent their own task and the shared goal \autocite{vesperMinimalArchitectureJoint2010}. There are also several studies supporting the idea that representing the other's task can be beneficial for joint action by making prediction and synchronization easier \autocite{boltSensoryAttenuationAuditory2021,kellerPianistsDuetBetter2007,kourtisPredictiveRepresentationOther2012,sebanzJointActionBodies2006,wenkeWhatSharedJoint2011}. \textcite{loehrSoundYouMe2016} points out the need for future research to investigate which factors influence whether or not an agent represents their partner as the other source contributing to the shared goal, and how those representations of their partner may change while learning a joint action. In their study, \textcite{loehrSoundYouMe2016} found that novices have the ability to integrate the auditory effects of their partner's actions into their sensorimotor action representations while learning to play musical pieces together. The notion that such an ability is not limited only to experts is pertinent to the current study because it allows for the exploration of learning novel joint action tasks using sonification. Specifically, sonification can be used to sonify an agent's own actions, their partner's actions, or the joint outcome of both participants. This allows us to direct participants' attention towards specific types of representations, namely self-other and joint outcome representations.
The next cognitive process to discuss in the context of joint action and sonification is monitoring. As stated earlier, a popular and useful application for sonification is the monitoring of external information \autocite{dubusSystematicReviewMapping2013}. Designing a sonification system for monitoring purposes requires careful consideration of various conditions and requirements. The sound must be capable of supporting extended periods of listening, changes in status have to be salient, and unexpected events have to be immediately apparent \autocite{kimotoDesignImplementationStetho2002}. In joint action, monitoring one's own actions and the progress towards the shared goal is crucial for success, and the other's actions must also be monitored when precise synchronization is required \autocite{vesperMinimalArchitectureJoint2010}. When discussing monitoring in joint actions, it is important to take into account the divided nature of joint action and identify the challenges that come with it. One of the main challenges arises from the fact that joint actions often require simultaneous actions by the participants, which may create the need for agents to monitor both their own and their partner's actions in parallel \autocite{loehrMonitoringIndividualJoint2013}. A closely related challenge is that monitoring an action, whether one's own or the other's, is dependent on having a representation of that action \autocite{knoblichPsychologicalResearchJoint2011}. The other main challenge relates to the fact that joint action outcomes are often more than the sum of individual action outcomes \autocite{loehrMonitoringIndividualJoint2013}. An example of this is how the same tones played by one musician can take on different qualities and become part of different harmonies, depending on what tones another musician is simultaneously playing \autocite{loehrMonitoringIndividualJoint2013}. This leads to the consideration of whether agents monitor their own or their partner's actions in relation to individual action goals (those required to achieve each individual's own task) or in relation to shared action goals (the joint outcome of their actions) \autocite{loehrMonitoringIndividualJoint2013}. In the current study, we attempt to address this challenge of individual vs joint outcomes by creating two different sonification schemes, where one scheme sonifies the individual action outcomes, and the other scheme sonifies the joint outcome of both participants' combined actions. We can therefore use sonification to encourage and facilitate monitoring of either individual (self-other) or joint outcomes. We theorize that the previously identified benefits of sonification in monitoring, such as the ability to work on another task while monitoring with one's ears \autocite{vickersSonificationProcessMonitoring2011} and the human auditory system's sensitivity to changes in sequences of sound \autocite{hildebrandtShortPaperEnhancing2014}, should improve performance of related joint action tasks by reducing the cognitive load required for monitoring. This reduction in cognitive load would then allow joint action participants to also focus on other points of interest that support progress towards the shared goal, such as fine motor control and planning their next actions. For these reasons, we postulate that sonification holds substantial promise for both practical applications and academic research of joint action monitoring.
The third and final cognitive process involved with joint action is prediction. Previous research in the field of sports science has revealed that sonification can improve the perception accuracy of movements \autocite{effenbergMovementSonificationEffects2005,schmitzObservationSonifiedMovements2013}, revealing one potential mechanism by which predictions in joint action may be facilitated using sonification. As predictions play an important role in joint action \autocite{sebanzJointActionBodies2006}, especially for tasks that require a high degree of synchronization \autocite{vesperMinimalArchitectureJoint2010}, we suggest that joint action performance can be improved by facilitating action prediction with the use of sonification. This is substantiated by the findings of \textcite{knoblichActionCoordinationGroups2003}, which revealed that joint action performance in a task requiring participants to predict joint outcomes improved when participants were provided with an external cue tone relating to their partner's actions. Further research needs to be conducted in order to determine whether auditory cues using sonification can facilitate prediction in other joint action contexts, and the present study aims to contribute to this body of research by using a joint action task that emulates the need for a high degree of synchronization.
Some of the questions that have been investigated in recent joint action research concern the aforementioned cognitive processes (representations, action monitoring, and action prediction) and how they relate to agency (self vs other) and outcome (individual vs joint). Researchers have studied whether agents involved in joint action represent both their own task and their partner's task \autocite{loehrSoundYouMe2016}, whether they monitor individual outcomes or joint outcomes \autocite{loehrMonitoringIndividualJoint2013}, and how predictions of others' actions are incorporated when planning and executing actions \autocite{knoblichActionCoordinationGroups2003,kourtisPredictiveRepresentationOther2012,loehrTemporalCoordinationPerforming2011,vesperAreYouReady2013}. Based on the literature, a common denominator between sonification and joint action appears to be synchronization. For this reason, we identify a potential application for sonification, particularly in joint action tasks that require a high degree of synchronization. Previous research has found that sonification can improve synchronization in joint action by addressing central motor representations \autocite{schmitzPerceptualEffectsAuditory2012} and monitoring \autocite{vesperMinimalArchitectureJoint2010}. More generally, several researchers have argued that having more precise representations may be key to improving synchronization in joint action \autocite{boltSensoryAttenuationAuditory2021,kellerPianistsDuetBetter2007,kourtisPredictiveRepresentationOther2012,sebanzJointActionBodies2006,wenkeWhatSharedJoint2011}. Sonification also appears to be useful for sensorimotor learning by providing auditory feedback of movements \autocite{bevilacquaSensoriMotorLearningMovement2016}, and in the present study we address this aspect by giving subjects a novel joint action task with varying methods of auditory feedback.
In summary, this study adds to the discussion about strategies relating to joint action representations, namely self-other and joint outcome representations, by investigating their potential effect on synchronization. Participants in our study performed a novel joint action task under three different conditions -- individual outcome, joint outcome, and a control condition -- where the sensory consequences were manipulated through the real-time sonification of movement. The sonification was used to prime the participants' attention towards either individual or joint outcomes, while joint task synchronization was recorded.
\hypertarget{research-question-lb-lr}{%
\subsubsection{Research question (LB, LR)}\label{research-question-lb-lr}}
Is synchrony optimized when focusing on self-other representations or joint outcome representations?
\hypertarget{low-latency-motion-capture-sonification-validation-experiment-lr}{%
\section{Low Latency Motion Capture Sonification Validation Experiment (LR)}\label{low-latency-motion-capture-sonification-validation-experiment-lr}}
A pilot experiment was conducted to assess the viability of the sonification framework in a laboratory setting. This experiment required blindfolded subjects to move their assigned sleds along parallel tracks, using the sounds they heard to keep their positions along the tracks as spatially synchronized as possible.
\hypertarget{participants-lr}{%
\subsection{Participants (LR)}\label{participants-lr}}
An availability sample of ten subjects (age range 20-29 years; 5 female, 4 male, 1 gender-fluid; 7 right-handed, 2 left-handed, 1 ambidextrous) was recruited to participate in pairs. Subjects optionally reported basic demographic information regarding age range (intervals of 10, i.e.~10-19, 20-29, \ldots, 90-99), gender, handedness, and years of formal music training, and reported whether they were known to be tone-deaf (6 not tone-deaf, 4 unknown). Subjects reported a mean of 2.4 years of formal music training (SD = 4.2; min = 0.0; max = 12.0 years). Because this experiment was a pilot, five subject pairs were regarded as sufficient to validate the experimental setup and to gather preliminary data on movement synchronization for the three conditions. Additionally, the number of possible participants was constrained by limited access to the motion capture laboratory, which is shared with other researchers, meaning that subjects needed to be available at the scheduled lab times within the study timeframe.
\hypertarget{track-and-sleds-lr}{%
\subsection{Track and Sleds (LR)}\label{track-and-sleds-lr}}
Two parallel tracks were designed with a sigmoid curve shape and surfaced with a smooth veneer that allowed for free movement along the length of the track. Two identical sleds were constructed from LEGO parts, and three felt adhesive pads were attached to the underside of each sled to reduce resistance during movement. The sleds were coated with a matte black paint to limit near-infrared reflectivity \autocite{benedictSurveyMaterialsCoatings2016}, as prior tests with unpainted LEGO bricks introduced artifacts into the motion capture system that were incorrectly identified as markers.
\hypertarget{frequency-range-selection-lr-lb}{%
\subsection{Frequency Range Selection (LR, LB)}\label{frequency-range-selection-lr-lb}}
Two distinct, continuous frequency ranges were selected for use in the sonification conditions. The ranges are offset by a perfect fifth and each spans eight semitones. The \emph{overtone} (the tone with the higher frequency) range was chosen around a center frequency of 220 Hz (A3), and the range was limited to avoid a large overlap with the \emph{undertone} (the tone with the lower frequency) range during normal operation (Table \ref{tab:frequency-ranges}). Care was also taken to keep the ranges low enough not to cause discomfort at a constant amplitude. \textcite{setharesSoundSound2005} shows that the perceived dissonance between two tones varies with the lower tone's frequency, indicating that the selected undertone range of approximately 116 -- 227 Hz would result in increased perceived dissonance as the interval distance decreased (see Figure \ref{fig:sensory-dissonance}). The selected ranges were tested in various simulation trials: they sounded harmonious at the perfect fifth interval and became increasingly dissonant as the tones deviated from it.
\begin{table}[!h]
\begin{threeparttable}
\caption{\label{tab:frequency-ranges}Frequency ranges for the two tones used in the sonification conditions.}
\centering
\fontsize{7}{9}\selectfont
\begin{tabular}[t]{>{}l>{}l>{}r>{}c>{}r>{}c>{}r}
\toprule
\multicolumn{1}{c}{} & \multicolumn{2}{c}{Lower Bound} & \multicolumn{2}{c}{Center} & \multicolumn{2}{c}{Upper Bound} \\
\cmidrule(l{3pt}r{3pt}){2-3} \cmidrule(l{3pt}r{3pt}){4-5} \cmidrule(l{3pt}r{3pt}){6-7}
& Freq & Note\textsuperscript{a} & Freq & Note\textsuperscript{a} & Freq & Note\textsuperscript{a}\\
\midrule
Overtone & 174.614 & F3 & 220.000 & A3 & 277.183 & C\#4\\
Undertone & 116.409 & A\#2 & 146.666 & D3 & 184.788 & F\#3\\
\bottomrule
\end{tabular}
\begin{tablenotes}
\small
\item []
\rightskip2em
{\footnotesize \sffamily \textsuperscript{a}Note names are in International Pitch Notation, and are the closest approximation to the frequencies used}
\item []
\rightskip2em
{\footnotesize \sffamily \textit{Note.} Overtone frequencies were calculated to have a center frequency of 220Hz, and undertone frequencies are two-thirds of their overtone counterparts.}
\end{tablenotes}
\end{threeparttable}
\end{table}
\begin{figure}[h]
{\centering \includegraphics[width=1\linewidth]{figures/setharesTuningTimbreSpectrum2010_p47}
}
\caption{Sensory dissonance of sine waves by interval for five frequencies. Reproduced from \textcite[p.~47]{setharesSoundSound2005}.}\label{fig:sensory-dissonance}
\end{figure}
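The bounds in Table \ref{tab:frequency-ranges} follow from the chosen center frequency and the two-thirds ratio: with the overtone range spanning four semitones on either side of 220 Hz,
\[
f_{\mathrm{over}}(k) = 220 \cdot 2^{k/12}, \qquad k \in [-4, 4], \qquad f_{\mathrm{under}}(k) = \frac{2}{3}\, f_{\mathrm{over}}(k),
\]
so that, for example, the upper bound of the overtone range is $220 \cdot 2^{4/12} \approx 277.18$ Hz and the corresponding undertone bound is $\frac{2}{3} \cdot 277.18 \approx 184.79$ Hz.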
\hypertarget{task-and-procedure-lr}{%
\section{Task and Procedure (LR)}\label{task-and-procedure-lr}}
Participants were asked to sit on opposite sides of the track structure and familiarize themselves with the movement of the sleds along the tracks. They were instructed to move the sleds continuously along the track from end to end, as rapidly as possible while remaining spatially synchronized with their partner's position on their respective track, using any sounds they heard during the various conditions to assist them. Once the participants had given their informed consent and been briefed, they indicated when they were ready and were blindfolded for the duration of all trials within each condition. The experiment flow control was automated, starting with a practice trial of 30 seconds, then a 15-second break, followed by three experimental trials of 90 seconds each with 15-second breaks between them. Three tones were played immediately before each trial to indicate its start, and a single sustained tone was played at the end of each trial to indicate its completion, after which participants were asked to return their sleds to the start of the track. Sonification and recording were paused between conditions to allow sufficient time for subjects to rest. When the subjects were ready, a hardware button on the Bela was pressed to begin the next condition.
\hypertarget{sonification-strategy-conditions-lr-lb}{%
\subsection{Sonification Strategy Conditions (LR, LB)}\label{sonification-strategy-conditions-lr-lb}}
The experiment consisted of three conditions that employed different sonification mapping strategies, namely: a control condition with no sonification, a task-oriented sonification strategy for self-other representation, and a synchronization-oriented sonification strategy for joint-outcome representation. Each condition consisted of one practice trial of 30 seconds duration and three main trials of 90 seconds each. Before each practice trial, subjects were reminded that it was a shorter trial and that they could use it to experiment with the sonification.
\hypertarget{no-sonification-lr}{%
\subsubsection{No sonification (LR)}\label{no-sonification-lr}}
In the No Sonification condition, only the motion capture data from participants' sleds were recorded, and subjects could use the audible sounds of the sleds moving along the track to align themselves with their partners.
\hypertarget{task-oriented-sonification-strategy-lr-lb}{%
\subsubsection{Task-oriented sonification strategy (LR, LB)}\label{task-oriented-sonification-strategy-lr-lb}}
The task-oriented sonification represented the position of each sled along the length of the track as a synthesized tone that varied in frequency from highest to lowest at the start and end of the track respectively. One sled produced a higher frequency overtone, while the other produced a lower frequency undertone. If the sleds were at the same x-coordinate, the two tones would be a perfect fifth apart, creating a harmonious interval; if the sleds drifted further apart, the frequency difference would deviate from the perfect fifth and create a more dissonant sound. Figure \ref{fig:task-illustration} illustrates the implementation of the task-oriented sonification strategy. This strategy was selected for sonifying the movement along the track, i.e.~the task required of subjects. By using different tones for the two sleds, both self and other were continuously represented, and the rate of change in tone frequency (a proxy for velocity) and the consonance may be used by participants in order to maintain synchrony.
\begin{figure}[h]
{\centering \includegraphics[width=1\linewidth]{figures/task_sonif_illustration}
}
\caption{Illustration of parameters used for the self-other representation (task-oriented) strategy. Each sled is mapped to a tone whose frequency is modulated based on the sled's position along the track. When the subjects' sled positions are synchronized, a perfect fifth is produced.}\label{fig:task-illustration}
\end{figure}
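For illustration, a minimal sketch of such a mapping is shown below. It is not the code used in the experiment; in particular, the linear frequency mapping and the normalization of sled positions to the range 0--1 are simplifying assumptions.

\begin{verbatim}
#include <algorithm>

// Frequency bounds from Table 1 (Hz); the track start maps to the highest frequency.
constexpr float kOvertoneHigh = 277.183f;
constexpr float kOvertoneLow  = 174.614f;
constexpr float kFifthRatio   = 2.0f / 3.0f;  // undertone is a perfect fifth below

// Map a sled's position along the track (0 = start, 1 = end) to a tone frequency.
// Both sleds use the same mapping, one scaled by 2/3, so equal positions
// always produce a perfect fifth.
float taskFrequency(float normalizedPos, bool isUndertoneSled)
{
    const float p = std::clamp(normalizedPos, 0.0f, 1.0f);
    const float overtoneHz = kOvertoneHigh + p * (kOvertoneLow - kOvertoneHigh);
    return isUndertoneSled ? overtoneHz * kFifthRatio : overtoneHz;
}
\end{verbatim}

Under this sketch, whenever the two normalized positions are equal the frequency ratio is exactly 2:3, regardless of where the sleds are along the track.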
\hypertarget{synchronization-oriented-sonification-strategy-lr-lb}{%
\subsubsection{Synchronization-oriented sonification strategy (LR, LB)}\label{synchronization-oriented-sonification-strategy-lr-lb}}
The sonification strategy oriented around synchronization represented the position of the sleds relative to each other so that the two sleds at the same x-coordinates would create a harmonious perfect fifth interval. If sleds drifted apart, the overtone amplitude decreased, and the undertone frequency changed based on the distance between the two sleds. Figure \ref{fig:sync-illustration} illustrates the implementation of the synchronization-oriented sonification strategy. This strategy was selected to represent the joint outcome (level of synchrony), and by drawing attention to the distance between the two sleds, the overtone amplitude and the consonance may be used by participants for relative localization and synchrony.
\begin{figure}[h]
{\centering \includegraphics[width=1\linewidth]{figures/sync_sonif_illustration}
}
\caption{Illustration of parameters used for the joint-outcome representation (sync-oriented) strategy. The amplitude of a constant-frequency overtone was modulated based on the relative sled distance, while the undertone frequency was modulated by the same distance. When the sleds aligned, a perfect fifth was produced.}\label{fig:sync-illustration}
\end{figure}
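A corresponding sketch for this strategy is given below. Again, this is not the code used in the experiment: the fixed 220 Hz overtone, the linear amplitude roll-off, and the maximum detuning of four semitones are illustrative assumptions.

\begin{verbatim}
#include <algorithm>
#include <cmath>

struct SyncTone {
    float overtoneHz;   // held constant; only its amplitude is modulated
    float overtoneAmp;  // 1.0 when the sleds are aligned, lower as they drift apart
    float undertoneHz;  // a perfect fifth below the overtone when aligned
};

// Map the normalized positions of the two sleds to the parameters of the
// synchronization-oriented strategy.
SyncTone syncMapping(float posA, float posB)
{
    const float distance    = std::fabs(posA - posB);            // 0 = aligned, 1 = far apart
    const float overtoneHz  = 220.0f;                             // assumed fixed frequency
    const float overtoneAmp = std::max(0.0f, 1.0f - distance);    // fades out with distance
    // Detune the undertone away from the perfect fifth as the sleds drift apart
    // (up to four semitones here; the actual deviation range is an assumption).
    const float semitones   = 4.0f * distance;
    const float undertoneHz = (2.0f / 3.0f) * overtoneHz * std::pow(2.0f, semitones / 12.0f);
    return {overtoneHz, overtoneAmp, undertoneHz};
}
\end{verbatim}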
\hypertarget{hardware-and-software-implementation}{%
\section{Hardware and Software Implementation}\label{hardware-and-software-implementation}}
\hypertarget{motion-capture-lr}{%
\subsection{Motion Capture (LR)}\label{motion-capture-lr}}
Motion capture data were collected using a nine-camera system (eight Qualisys Miqus M3 and one Qualisys Miqus Video) connected to a Qualisys Camera Sync Unit. Marker data were acquired at a sampling rate of 300 Hz and video data at a sampling rate of 25 Hz. Qualisys Track Manager (QTM) software version 2022.2 (build 7700) was used to collect and process the data with real-time 3D tracking data output. The QTM options for `processing of every frame' and `2D data preprocessing' were disabled for real-time output to ensure minimal latency.
\hypertarget{markers-lr}{%
\subsubsection{Markers (LR)}\label{markers-lr}}
For the experimental setup, one passive marker was placed on each sled, and two additional passive reference markers were placed on the front corners of the track (see Figure \ref{fig:track-setup} for a visual representation of the track and marker placement). These additional markers provided reference points for the 3D orientation of the track and the sleds across trials in case of accidental track movement. Four preliminary two-minute sessions of variable-speed sled movements were recorded in QTM, and unique labels were given to the four passive markers. These recordings were used to train a QTM Automatic Identification of Markers (AIM) model. AIM models were applied to recordings and to the real-time output to attach the known labels to the marker data, allowing the sonification to read the current position of both sleds.
\begin{figure}[h]
{\centering \includegraphics[width=1\linewidth]{figures/track_dimensions}
}
\caption{Illustration depicting the experimental track setup and dimensions. Photos depict the constructed track as well as the sleds with the passive motion tracking markers.}\label{fig:track-setup}
\end{figure}
\hypertarget{sonification-lr}{%
\subsection{Sonification (LR)}\label{sonification-lr}}
Motion capture data were sent via UDP packets over USB networking to a Bela Mini device running Bela software version 0.3.8g and a custom C++ program\footnote{Source, data, and analyses are available at \url{https://github.com/zeyus/QTM_Bela_Sonification}}. The main program loop was configured to execute every 32 samples, with an output sample rate of 44.1 kHz across two audio channels. The two audio output channels were connected to a pair of Genelec G Two active speakers. The main program used the latest available Bela platform framework\footnote{Commit ID \texttt{42bbf18c3710ed82cdf24b09ab72ac2239bf148e} from 10 August 2022: \url{https://github.com/BelaPlatform/Bela/commit/42bbf18c3710ed82cdf24b09ab72ac2239bf148e}}. Figure \ref{fig:exp-graph} outlines the flow of data from motion capture to sonification.
\begin{figure}[h]
{\centering \includegraphics[width=1\linewidth]{figures/exp-graph}
}
\caption{Diagram describing the low-latency sonification pipeline. Motion tracking cameras collected recordings of passive markers, which were processed in QTM; the 3D trajectory data were then sonified on the Bela device, which played audio through the connected loudspeakers.}\label{fig:exp-graph}
\end{figure}
Two 16-bit 44.1 kHz wave files were prepared from the output of a MIDI synthesizer at the lowest frequency of each of the two tones and were cut off at 113,145 samples, where the zero-crossings of both files aligned. A 5 ms fade-in and fade-out were applied to the start and end of the files to minimize DC pops during sample looping. To allow for dynamic frequency changes, sound file playback used a floating-point read pointer that was incremented as each 32-sample buffer was populated. When the playback frequency was increased, the step size of the read pointer increased proportionally to the change in frequency, and the output was interpolated between adjacent samples to provide a smooth-sounding frequency transition. This method ensured that the audio resolution was never below that of the original file.
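The following sketch illustrates the interpolating read pointer in a Bela-style \texttt{render()} callback. It is a simplification of, not an excerpt from, the program used in the study; the global variables and the sine stand-in for the decoded wave file are illustrative.

\begin{verbatim}
#include <Bela.h>
#include <cmath>
#include <vector>

std::vector<float> gSample;           // looped wave file (stand-in generated in setup())
float gBaseFreq   = 174.614f;         // frequency of the file at a step size of 1.0
float gTargetFreq = 174.614f;         // frequency requested by the sonification mapping
double gReadPos   = 0.0;              // floating-point read pointer into gSample

bool setup(BelaContext *context, void *userData)
{
    // Stand-in for decoding the wave file: one cycle of a sine at the base frequency.
    const unsigned int len = (unsigned int)(context->audioSampleRate / gBaseFreq);
    const float twoPi = 6.28318530718f;
    gSample.resize(len);
    for (unsigned int i = 0; i < len; ++i)
        gSample[i] = 0.2f * sinf(twoPi * (float)i / (float)len);
    return true;
}

void render(BelaContext *context, void *userData)
{
    const double step = gTargetFreq / gBaseFreq;   // >1 raises the pitch, <1 lowers it
    for (unsigned int n = 0; n < context->audioFrames; ++n) {
        // Linear interpolation between the two samples surrounding the read pointer.
        const unsigned int i0 = (unsigned int)gReadPos;
        const unsigned int i1 = (i0 + 1) % gSample.size();
        const double frac = gReadPos - (double)i0;
        const float out = (float)((1.0 - frac) * gSample[i0] + frac * gSample[i1]);

        audioWrite(context, n, 0, out);   // the experiment wrote one tone per channel;
        audioWrite(context, n, 1, out);   // here the same tone is sent to both

        gReadPos += step;                 // advance and wrap around the loop
        if (gReadPos >= (double)gSample.size())
            gReadPos -= (double)gSample.size();
    }
}

void cleanup(BelaContext *context, void *userData) {}
\end{verbatim}

In this sketch, \texttt{gTargetFreq} would be updated from mapping functions such as those outlined in the sonification strategy sections above.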
\hypertarget{real-time-3d-data-lr}{%
\subsubsection{Real-time 3D Data (LR)}\label{real-time-3d-data-lr}}
A version of the Qualisys C++ SDK using protocol version 1.23 was modified to be compatible with the Bela platform and was used for communicating with QTM. To reduce latency, connection to the QTM server was made over UDP, and round-trip communication latency was verified by performing 1000 requests to the QTM server and logging the elapsed round-trip time, resulting in a mean latency of 0.25ms (SD 0.03ms, min 0.23ms, max 0.43ms).
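The timing procedure can be sketched as follows; \texttt{pingQtmServer()} is a hypothetical stand-in for the SDK request actually used.

\begin{verbatim}
#include <algorithm>
#include <chrono>
#include <cmath>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for one round-trip request to the QTM server over UDP.
static bool pingQtmServer() { return true; }

static void measureRoundTripLatency(int nRequests)
{
    std::vector<double> ms;
    ms.reserve(nRequests);
    for (int i = 0; i < nRequests; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        if (!pingQtmServer())
            continue;                                   // ignore failed requests
        const auto t1 = std::chrono::steady_clock::now();
        ms.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    if (ms.size() < 2)
        return;
    double mean = 0.0, lo = ms.front(), hi = ms.front();
    for (double v : ms) { mean += v; lo = std::min(lo, v); hi = std::max(hi, v); }
    mean /= (double)ms.size();
    double var = 0.0;
    for (double v : ms) var += (v - mean) * (v - mean);
    const double sd = std::sqrt(var / (double)(ms.size() - 1));
    std::printf("mean %.2f ms, SD %.2f ms, min %.2f ms, max %.2f ms\n", mean, sd, lo, hi);
}

int main() { measureRoundTripLatency(1000); }
\end{verbatim}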
Using the SDK, 3D streaming was initiated at the start of each sonification condition, and labeled markers were used to obtain the current position of each sled. The coordinates of the sleds were stored in a buffer containing the current and last recorded coordinates.
\hypertarget{experiment}{%
\subsection{Experiment}\label{experiment}}
\hypertarget{workflow-lr}{%
\subsubsection{Workflow (LR)}\label{workflow-lr}}
The experiment flow control was automated via the main C++ application running on the Bela Mini. Before each experiment started, the condition order was configured in the application, and after compilation, the suite of conditions and trials would run. Before each condition began, execution of the application halted to allow sufficient time for subjects to rest, after which a hardware button on the Bela could be pressed to continue. Once a condition had begun, all trials for that condition ran consecutively with 15-second breaks between them.
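The overall flow can be sketched as follows; the helper routines are simplified stand-ins for the actual Bela application logic, and the printed messages are placeholders.

\begin{verbatim}
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

// Minimal stand-ins for the Bela application's actual routines (hypothetical).
static void pauseFor(int seconds) { std::this_thread::sleep_for(std::chrono::seconds(seconds)); }
static void playStartCue() { std::puts("three start tones"); }
static void playEndCue()   { std::puts("sustained end tone"); }
static void waitForHardwareButton() { std::puts("waiting for Bela button press..."); }

enum Condition { NoSonification, TaskOriented, SyncOriented };

static void runTrial(Condition c, int seconds)
{
    playStartCue();
    std::printf("condition %d: sonifying and recording for %d s\n", (int)c, seconds);
    pauseFor(seconds);
    playEndCue();
}

int main()
{
    // The condition order was configured per experiment before compilation.
    const std::vector<Condition> order = {NoSonification, TaskOriented, SyncOriented};
    for (Condition c : order) {
        waitForHardwareButton();            // rest period; experimenter continues when ready
        runTrial(c, 30);                    // practice trial
        pauseFor(15);
        for (int t = 0; t < 3; ++t) {       // three experimental trials
            runTrial(c, 90);
            if (t < 2) pauseFor(15);
        }
    }
}
\end{verbatim}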
\hypertarget{event-labels-lr}{%
\subsubsection{Event Labels (LR)}\label{event-labels-lr}}
From the main Bela application, event labels indicating the start of an experiment suite, the start and end of each condition, and the start and end of individual trials were sent to the QTM server. These labels appear in the recorded 3D data and were exported alongside the marker positions, enabling the data to be segmented into their respective conditions and trials for analysis.
\hypertarget{analyses}{%
\section{Analyses}\label{analyses}}
\hypertarget{data-preprocessing}{%
\subsection{Data Preprocessing}\label{data-preprocessing}}
\hypertarget{qtm-lr}{%
\subsubsection{QTM (LR)}\label{qtm-lr}}
The AIM model was applied to the full duration of each recorded session, and labeled markers were manually verified and adjusted as required to ensure 100\% coverage of the marker data for each completed trial.
\hypertarget{d-data-lr}{%
\subsubsection{3D Data (LR)}\label{d-data-lr}}
3D data were exported from QTM, and several preprocessing scripts were developed using the R programming language. Data were imported and collated by unique subject pair, condition, and trial using the indices of the associated event labels. Subsequently, practice trials, data outside of trials, and invalid trials were removed; invalid trials were defined as trials that did not have both a start and an end event label. Trajectories were then created from the marker x-coordinate time series using the R package \texttt{mousetrap} \autocite{mousetrap2021}, which was designed to aid analyses of mouse movement trajectories but can be applied to arbitrary spatial data. The starting positions of trajectories were aligned to account for track movement between trials as well as axis misalignment, and x-axis trajectories were standardized within trials to have a mean of zero and a range from -1 to 1, allowing comparison between subject pairs, conditions, and trials. Visual inspection of the trajectory data was performed, and six trials in which participants had lost control of the sleds were truncated to the time of the incident. This left a total of 44 experimental trials (38 complete and 6 truncated, partial trials), meaning observation data were available for all subject pairs in all conditions, with a single trial unavailable for the No Sonification condition from one subject pair.
\hypertarget{subject-synchronization-lr}{%
\subsection{Subject Synchronization (LR)}\label{subject-synchronization-lr}}
Three methods were used to assess the level of synchrony between the participants in the various sonification conditions. The first, absolute spatial distance (delta), provides a simple measure of the overall synchrony between the sleds but does not account for the temporal dynamics of the movement. The second, relative instantaneous phase angle, provides time-resolved information about the synchrony of the movement, but may be sensitive to noise in the data and requires a good understanding of the underlying mathematics. The third, Dynamic Time Warping, offers a detailed analysis of the temporal dynamics of the movement and can handle differences in movement speed, but may be computationally intensive and is sensitive to the choice of windowing parameters. Each of the three methods has its strengths and limitations, and each provides insight into different aspects of subject synchronization during the experimental conditions.
\hypertarget{absolute-spatial-distance-lr}{%
\subsubsection{Absolute Spatial Distance (LR)}\label{absolute-spatial-distance-lr}}
The distance between subject sleds is a useful proxy for the success of synchronization: a trial in which subjects move perfectly together would result in a delta of zero at every time point, whereas large distances would indicate that they were unable to synchronize their sled movements. Absolute distance deltas between the standardized x-coordinates of subject pairs were calculated for each time point by trial and condition. These delta values were used to compute mean and standard deviation values for the experimental conditions. Furthermore, a linear mixed model was fit to the data.
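Formally, for the standardized x-coordinates $x_1(t)$ and $x_2(t)$ of a subject pair, the per-time-point delta and its mean over a trial of $T$ time points are
\[
\Delta(t) = | x_1(t) - x_2(t) |, \qquad \bar{\Delta} = \frac{1}{T}\sum_{t=1}^{T} \Delta(t).
\]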
\hypertarget{trajectory-mean-relative-instantaneous-phase-angle-lr}{%
\subsubsection{Trajectory Mean Relative Instantaneous Phase Angle (LR)}\label{trajectory-mean-relative-instantaneous-phase-angle-lr}}
Instantaneous phase is a commonly used method for assessing synchrony between two or more signals. It involves analyzing the time-varying phase of the signals and computing the phase difference between them at each time point. It is often used to analyze neural data such as EEG signals and other data in which rhythmic fluctuations are present, including movement \autocite{varletComputationContinuousRelative2011a}. The phase relationship between signals can provide insight into their degree of synchronization. For the present study, trajectory data were processed using a Hilbert transform, which yielded the instantaneous phase angle for each subject at each time point. The angle differences between subject pairs were calculated for each time point, and the absolute differences in instantaneous phase angles were used to determine the mean angle difference and standard deviation for each condition. Although using the absolute value in the calculations removes information about the leader-follower dynamics, it allows a more useful mean value to be calculated from these data, because the continuous change in direction at the ends of the track creates jumps from 0 to 180 degrees and vice versa, making the mean values converge towards 90 degrees. Additionally, a linear mixed model was fit to the resulting angles.
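In terms of the Hilbert transform $\mathcal{H}$, the instantaneous phase of each standardized trajectory $x_i(t)$ and the absolute phase angle difference used here are
\[
\varphi_i(t) = \arg\bigl(x_i(t) + \mathrm{i}\,\mathcal{H}[x_i](t)\bigr), \qquad \Delta\varphi(t) = | \varphi_1(t) - \varphi_2(t) | .
\]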
\hypertarget{dynamic-time-warping-lr}{%
\subsubsection{Dynamic Time Warping (LR)}\label{dynamic-time-warping-lr}}
Dynamic Time Warping (DTW) is a method used to compare the similarity of two or more sequences of data. DTW finds the optimal alignment between two sequences by stretching or compressing one sequence so that the difference between the sequences is minimized \autocite{mullerDynamicTimeWarping2007}. This alignment is represented by a mapping function that indicates the relationship between the two sequences at each point in time. DTW has found applications in a wide range of fields, including speech recognition, music analysis, and joint action research \autocite{hochDancingTogetherInfant2021}. For the present study, DTW was used to compare the trajectories of the two sleds and assess the level of synchrony between them over time, allowing for a more detailed analysis of the movement patterns. For the analysis, the data were decimated to a sample rate of 30 Hz, which reduced computation time significantly; comparisons between several full-rate and down-sampled DTW analyses yielded comparable normalized path distances. DTW path distances were calculated with the R package \texttt{dtw} \autocite{R-dtw} using the Sakoe-Chiba windowing method \autocite{gelerDynamicTimeWarping2019} with a window size of 90 samples (3 seconds) and the \texttt{symmetric2} step pattern. Furthermore, a linear mixed model was fit to the normalized path distances.
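The core of the computation can be sketched as follows: a simplified re-implementation of the \texttt{symmetric2} recursion with a Sakoe-Chiba band, shown in C++ for consistency with the other sketches (the study itself used the R \texttt{dtw} package).

\begin{verbatim}
#include <algorithm>
#include <cmath>
#include <cstdlib>
#include <limits>
#include <vector>

// Normalized DTW distance between two equally sampled series using the
// symmetric2 step pattern and a Sakoe-Chiba band of +/- `window` samples.
double dtwNormalized(const std::vector<double>& a, const std::vector<double>& b, int window)
{
    const int n = (int)a.size(), m = (int)b.size();
    const double inf = std::numeric_limits<double>::infinity();
    std::vector<std::vector<double>> D(n + 1, std::vector<double>(m + 1, inf));
    D[0][0] = 0.0;
    for (int i = 1; i <= n; ++i) {
        for (int j = 1; j <= m; ++j) {
            if (std::abs(i - j) > window) continue;           // outside the band
            const double d = std::fabs(a[i - 1] - b[j - 1]);  // local cost
            // symmetric2: the diagonal step counts the local cost twice.
            D[i][j] = std::min({D[i - 1][j - 1] + 2.0 * d,
                                D[i - 1][j] + d,
                                D[i][j - 1] + d});
        }
    }
    // symmetric2 distances are normalized by the sum of the series lengths.
    return D[n][m] / (double)(n + m);
}
\end{verbatim}

With the data decimated to 30 Hz, the 3-second band used in the study corresponds to \texttt{window = 90}.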
\hypertarget{results}{%
\section{Results}\label{results}}
\hypertarget{absolute-spatial-distance-lr-1}{%
\subsection{Absolute Spatial Distance (LR)}\label{absolute-spatial-distance-lr-1}}
\begin{figure}[h]
{\centering \includegraphics[width=1\linewidth]{figures/parwise_position_delta}
}
\caption{Bar plot of mean normalized x-coordinate deltas between subject pairs for each condition. Error bars show standard deviation. No Sonification condition normalized delta mean = 0.169 (SD = 0.069), task sonification condition normalized delta mean = 0.227 (SD = 0.164), sync sonification condition normalized delta mean = 0.161 (SD = 0.067).}\label{fig:pairwise-position-delta}
\end{figure}
Analysis of normalized x-coordinate deltas showed 0.169 ± 0.069 for the No Sonification condition, 0.227 ± 0.164 for the Task condition, and 0.161 ± 0.067 for the Sync condition (Figure \ref{fig:pairwise-position-delta}).
We fitted a linear mixed model (estimated using REML and the nloptwrap optimizer) to predict Position Delta with Condition (formula: \texttt{Position\ Delta} \textasciitilde{} Condition). The model included random intercepts for Subject Pair and Trial (formula: list(\textasciitilde1 \textbar{} Subject Pair, \textasciitilde1 \textbar{} Trial)). The model's total explanatory power is moderate (conditional R2 = 0.19), and the part related to the fixed effects alone (marginal R2) is 0.02. The model's intercept, corresponding to Condition = No Sonification, is at 0.17 (95\% CI {[}0.08, 0.25{]}, t(2370020) = 3.78, p \textless{} .001). Within this model:
\begin{itemize}
\tightlist
\item
The effect of Condition {[}Task{]} is statistically significant and positive (beta = 0.06, 95\% CI {[}0.06, 0.06{]}, t(2370020) = 190.11, p \textless{} .001; Std. beta = 0.27, 95\% CI {[}0.27, 0.28{]})
\item
The effect of Condition {[}Sync{]} is statistically significant and positive (beta = 1.19e-03, 95\% CI {[}5.29e-04, 1.85e-03{]}, t(2370020) = 3.54, p \textless{} .001; Std. beta = 5.19e-03, 95\% CI {[}2.31e-03, 8.07e-03{]})
\end{itemize}
Standardized parameters were obtained by fitting the model on a standardized version of the dataset. 95\% Confidence Intervals (CIs) and p-values were computed using a Wald t-distribution approximation.
\hypertarget{trajectory-mean-relative-instantaneous-phase-angle-lr-1}{%
\subsection{Trajectory Mean Relative Instantaneous Phase Angle (LR)}\label{trajectory-mean-relative-instantaneous-phase-angle-lr-1}}
\begin{figure}[h]
{\centering \includegraphics[width=1\linewidth]{figures/mean_condition_phase_angles_mean_sd}
}
\caption{Plot of mean absolute instantaneous phase angles of experimental conditions with the length of the needles representing the standard deviation as a percentage of 90 degrees. No Sonification condition mean phase angle = 8.861 (SD = 0.067) degrees, task sonification condition mean phase angle = 14.211 (SD = 0.216) degrees, sync sonification condition mean phase angle = 8.749 (SD = 0.075) degrees.}\label{fig:mean-instantaneous-phase-angle-circular-plot}
\end{figure}
Analysis of the mean absolute instantaneous phase angle difference between subject pair Hilbert-transformed trajectories gave 8.861° ± 0.067° (SD) for the No Sonification condition, 14.211° ± 0.216° (SD) for the Task condition, and 8.749° ± 0.075° (SD) for the Sync condition (Figure \ref{fig:mean-instantaneous-phase-angle-circular-plot}).
We fitted a linear mixed model (estimated using REML and the nloptwrap optimizer) to predict Relative IPA with Condition (formula: \texttt{Relative\ IPA} \textasciitilde{} Condition). The model included random intercepts for Subject Pair and Trial (formula: list(\textasciitilde1 \textbar{} Subject Pair, \textasciitilde1 \textbar{} Trial)). The model's total explanatory power is weak (conditional R2 = 0.10), and the part related to the fixed effects alone (marginal R2) is 0.01. The model's intercept, corresponding to Condition = No Sonification, is at 11.20 (95\% CI {[}4.48, 17.91{]}, t(2370020) = 3.27, p = 0.001). Within this model:
\begin{itemize}
\tightlist
\item
The effect of Condition {[}Task{]} is statistically significant and positive (beta = 6.65, 95\% CI {[}6.58, 6.73{]}, t(2370020) = 168.83, p \textless{} .001; Std. beta = 0.25, 95\% CI {[}0.25, 0.26{]})
\item
The effect of Condition {[}Sync{]} is statistically significant and positive (beta = 0.54, 95\% CI {[}0.46, 0.62{]}, t(2370020) = 13.44, p \textless{} .001; Std. beta = 0.02, 95\% CI {[}0.02, 0.02{]})
\end{itemize}
Standardized parameters were obtained by fitting the model on a standardized version of the dataset. 95\% Confidence Intervals (CIs) and p-values were computed using a Wald t-distribution approximation.
\hypertarget{dynamic-time-warping-lr-1}{%
\subsection{Dynamic Time Warping (LR)}\label{dynamic-time-warping-lr-1}}
\begin{figure}[h]
{\centering \includegraphics[width=1\linewidth]{figures/dtw_summary}
}
\caption{Distribution density plot of normalized DTW path distance between paired trajectories by condition. The mean normalized path distance is shown as a point (+), the 20th and 80th percentiles are shown as dotted lines, and the 50th percentile is shown as a solid line. No Sonification condition normalized path distance mean = 0.014 (SD = 0.01), task sonification condition normalized path distance mean = 0.028 (SD = 0.025), sync sonification condition normalized path distance mean = 0.016 (SD = 0.007).}\label{fig:dtw-plot}
\end{figure}
Analysis of DTW results gave a mean normalized path distance of 0.014 ± 0.01 for the No Sonification condition, 0.028 ± 0.025 for the Task condition, and 0.016 ± 0.007 for the Sync condition (Figure \ref{fig:dtw-plot}).
We fitted a linear mixed model (estimated using REML and the nloptwrap optimizer) to predict Normalized Distance with Condition (formula: \texttt{Normalized\ Distance} \textasciitilde{} Condition). The model included a random intercept for Subject Pair (formula: \textasciitilde1 \textbar{} \texttt{Subject\ Pair}). The model's total explanatory power is substantial (conditional R2 = 0.34), and the part related to the fixed effects alone (marginal R2) is 0.12. The model's intercept, corresponding to Condition = No Sonification, is at 0.01 (95\% CI {[}3.01e-03, 0.02{]}, t(39) = 2.58, p = 0.014). Within this model:
\begin{itemize}
\tightlist
\item
The effect of Condition {[}Task{]} is statistically significant and positive (beta = 0.01, 95\% CI {[}2.81e-03, 0.02{]}, t(39) = 2.55, p = 0.015; Std. beta = 0.80, 95\% CI {[}0.16, 1.44{]})
\item
The effect of Condition {[}Sync{]} is statistically non-significant and positive (beta = 2.06e-03, 95\% CI {[}-8.80e-03, 0.01{]}, t(39) = 0.38, p = 0.703; Std. beta = 0.12, 95\% CI {[}-0.52, 0.76{]})
\end{itemize}
Standardized parameters were obtained by fitting the model on a standardized version of the dataset. 95\% Confidence Intervals (CIs) and p-values were computed using a Wald t-distribution approximation.
\hypertarget{discussion-lr-lb}{%
\section{Discussion (LR, LB)}\label{discussion-lr-lb}}
Although we could not completely isolate the joint action representations, partly due to the sound created by the movement of the sleds, our sonification strategies primed attention towards either self-other monitoring, by sonifying both sled positions independently, or the joint outcome, by sonifying the position of each sled relative to the other. The models fit to the calculated synchrony measures all showed that the No Sonification condition performed best, with the lowest distance delta, relative instantaneous phase angle, and DTW path distance, and that the Task condition performed worst. The estimated effect of the Task condition was statistically significant for all three measures. The Sync condition was marginally worse than the No Sonification condition, but its estimates were only statistically significant in the models applied to the distance delta and the relative instantaneous phase angle.
Although these results may initially seem surprising, we identify potential contributing factors that may have affected the outcome. All participants reported during debriefing that they subjectively felt the No Sonification condition was the easiest for them, even though they were blindfolded in all conditions. One of the more obvious possible reasons is therefore that the sound of the sleds sliding on the track, although not very loud, provided spatial information to the subjects in a familiar way, i.e., binaural input that allows for auditory localization. Individuals with unimpaired hearing can use auditory cues to discriminate the spatial location of movement, and the level of accuracy depends on the velocity of movement (degrees/second), with lower velocity generally resulting in higher accuracy \autocite{carlilePerceptionAuditoryMotion2016}. In our experiment, the velocity of the sleds in degrees/second is estimated to be relatively low\footnote{Given the track length of 1130mm and a head positioned 700mm perpendicular from the track, there are approximately 78 degrees between the start and end of the track, giving approximate velocities of 26 degrees/second and 4 degrees/second for the subjects with the highest and lowest movement frequencies, respectively.}, meaning that the naturally occurring sounds may have been better suited to this particular task than the sonifications. In the experimental conditions, this environmental auditory localization was also somewhat masked by the sonification, which came from speakers either in front of or behind the subject, depending on which side of the track they were seated at. This issue could be addressed in several ways: first, by adding a condition in which, in addition to vision, the subjects' hearing is also restricted, giving a baseline level of synchrony; second, by providing spatial cues in the sonification conditions, in the form of stereo separation that mirrors track position; and finally, by requiring headphones for all conditions and implementing real-time binaural synthesis \autocite{tommasiniComputationalModelImplement2019}.
Another potential confounding factor is the implementation of the sonification strategies. We initially planned for them to have at least a stereo component, but due to hardware constraints and time limitations, the sonification was restricted to speakers situated in the room. Both subjects heard the same audio during the experiment, which made it impossible to sonify the position of a single subject relative to the other on a per-subject basis. In the Task condition this is less important, as each subject's absolute position on the track was sonified, but in the Sync condition it would have provided subject-specific information about their relative location. Although the pitch mapping was consistent across all trials, because participants did not change seats during the experiment, this limitation meant that during the Sync condition subjects had to learn the mapping of the undertone to their sled in order to extract information about their relative location. The strategies also differed in the sound produced when the two sleds moved in synchrony: although both sonification strategies played a perfect fifth interval, the Sync condition's two tones held a constant pitch, whereas the Task condition's two tones changed pitch in parallel. This was intentional, as the self-other representation needed to convey information about the location of both sleds, whereas the joint-outcome representation focused on the level of synchronization, but it also meant that there was no direct correspondence between the amplitude modulation of the overtone in the Sync condition and the frequency modulation of the overtone in the Task condition. These issues may also be mitigated by the use of headphones, allowing additional parameters to be mapped to panning instead. It may also be beneficial to investigate other parameters, such as sound brightness, which may perform better than pitch \autocite{mcdermottMusicalIntervalsRelative2010}.
One further limitation is the frequency range used in the experiment. Although testing showed that the frequency range sounded harmonious and not irritating, a mistake in the software synthesizer tone generation process resulted in the output frequency being one octave lower than the initially selected note\footnote{The selected software synthesizer was a bass virtual instrument that outputs a MIDI note one octave below the input (i.e.~A4, 440Hz would be rendered as A3, 220Hz)} (see Figure \ref{fig:stimuli-spectra}). As such, the sonification may have been perceived as generally more dissonant than it would have been at the originally intended center frequency of 440Hz (Figure \ref{fig:sensory-dissonance}). It may also be useful to try strategies in which synchrony creates an octave interval or a unison, rather than a perfect fifth. Both behavioral and neural research indicate that there are dips in perceived consonance (inversely, increases in perceived dissonance) on either side of perfect fifth and octave intervals, but the difference is larger at the octave, even in non-musicians \autocite{setharesLocalConsonanceRelationship1993,bidelmanNeuralCorrelatesConsonance2009}. This may help subjects recognize when they have achieved perfect synchrony, although it may also make slight deviations overly pronounced.
\begin{figure}[h]
{\centering \includegraphics[width=1\linewidth]{figures/spec_tones}
}
\caption{Spectral analysis of the two base frequency audio files from 10 to 10,000 Hz}\label{fig:stimuli-spectra}
\end{figure}
Due to the small sample size, no analyses were done on learning effects, but we propose that sonification strategy selection may also have an impact on the rate of learning over the course of multiple trials, and that it may be of interest for practical applications of sonification, especially in the field of sports science. Additionally, collecting information on subjective assessments of synchrony (such as the Inclusion of Other in Self survey) may provide further valuable insight into where measured and perceived synchrony align.
Some final issues should be noted, including the insufficient number of reference markers in our setup, which made 3D transformations impossible. Adding enough markers for 3D plane calculation would make the experimental setup more robust and portable, as it would enable trajectories to be recorded from a track rotated to any angle and then aligned via the required 3D transformation. Additional markers at the start and end of each track would make it possible to align trajectories when the track was displaced, and they would also facilitate the segmentation of trials into end-to-end runs, which may provide further avenues for analysis, as participants in our experiment frequently aligned at either end of the track.
\hypertarget{conclusion}{%
\section{Conclusion}\label{conclusion}}
Our study has successfully demonstrated that real-time sonification of motion tracking data is achievable in a joint action lab experiment. This method uses low-cost hardware for the sonification pipeline, along with open source code; it can therefore be used for research in the field of joint action with minimal budgetary requirements, and be adapted to sonify data from any available source, including video feeds, remote devices (e.g.~Wiimotes) and touch devices. The mapping of sonification parameters may also be extended to arbitrary audio features including virtualized 3D sound. We also showed that sonification strategy selection does have an impact on synchrony, and that, while further investigation is needed, joint-outcome sonification may be a promising strategy, especially if some of the auditory localization aspects of naturally occurring sounds can be implemented.
\balance
\clearpage
\printbibliography[title=References,heading=bibintoc]
\end{document}