1 Introduction
This paper introduces the Magdeburg Tool for Video Experiments (MTVE). MTVE is free citeware intended to assist researchers who want to capture audio and video data in laboratory or online experiments. By capturing the videos locally, MTVE avoids frequent issues such as fluctuating resolution and frame rates, and thereby improves the quality of transcribed communication. The tool is browser-based and compatible with typical experimental software, e.g., z-Tree (Fischbacher, 2007) or oTree (Chen et al., 2016). All files (video, audio, transcriptions) are stored on the servers of the respective laboratory. Thus, MTVE enables better protection of participants’ data.
Although communication has been investigated in experimental economics from early on (Isaac & Walker, 1988, 1991), even recent high-quality reviews on the implementation of communication in the laboratory do not yield unambiguous findings (Brandts et al., 2019). Instead, two insights emerge. First, if researchers are interested in one specific aspect of communication, it is valuable to restrict all parameters but the one of interest. This may refer to analyzing the effects of the communication medium (Bochet et al., 2006; Brosig et al., 2003; Cason & Khan, 1999; Greiner et al., 2012, 2014; Isaac & Walker, 1988), direction (Cooper & Kühn, 2014), order (Cooper et al., 1989, 1992; Ottaviani & Sørensen, 2001), or duration (Karagözoğlu & Kocher, 2016).
Second, unless there are precise reasons to restrict communication, free-form communication should be preferred as it is more externally valid. Yet, implementing free-form face-to-face communication in the lab raises the question of how to analyze it (Brandts et al., 2019). A simple alternative is video communication, which enables researchers to record and later analyze the entire exchange. This goes beyond typical content analysis (Penczynski, 2019; Xiao & Houser, 2005) and may include analyzing facial expressions or voices, as these are known to be relevant for economic decisions (Antonakis et al., 2021; Bershadskyy, 2023; Centorrino et al., 2015; Hopfensitz & Mantilla, 2019).
To analyze videos from experiments automatically, researchers can either use proprietary software such as FaceReader (Serra-Garcia & Gneezy, 2021) or specially trained algorithms (Othman et al., 2019). Both approaches have different pros and cons. However, both require high-quality data. This is where MTVE helps. In contrast to highly specialized software designed to record videos for research purposes, e.g., Noldus Viso or Mangold VideoSyncPro, MTVE is open-source and can be used without specialized hardware. While some companies offer to design similar products, such solutions are costly. An alternative would be to use video conference software, e.g., Zoom, Skype, or Big Blue Button (open source), as is done in some publications (Dulleck et al., 2017; Kachelmeier & Rimkus, 2022; Li et al., 2021). Still, such software is designed for conferencing and is not specialized in the requirements for analyzing communication in experiments, which we discuss in the following.
The remainder of this article is structured as follows. In Sect. 2, we describe the requirements that data from video conferences must meet so that they can be analyzed with automatic tools. In Sect. 3, we explain how MTVE tackles these issues. Sections 4 and 5 focus on technical requirements and limitations, respectively. Section 6 concludes.
2 Quality of video data analysis
In this section, we briefly discuss what is required to obtain an integrated analysis of video communication in experiments. We distinguish three major research factors, (i) video, (ii) audio, and (iii) content, and add a fourth factor: (iv) the user-friendliness of the tool, as it contributes to the replicability of experiments.
The most evident parameter of video data is resolution. High resolution is required to obtain an appropriate level of detail, e.g., of participants’ facial expressions. Yet, what is appropriate is mostly unclear. A good rule of thumb is: the higher the resolution, the better, given that it is always possible to decrease the resolution afterward, as in Dudzik et al. (2021). Equally important is the framerate, measured in frames per second (FPS). Recording at 60 FPS provides twice as much data as recording at 30 FPS and increases the chances of algorithms detecting very short-lived movements. As with resolution, it is possible to decrease the FPS later. Yet, for the analysis, it is important to keep FPS and resolution as constant as possible, which normal videoconference tools (e.g., Zoom) cannot achieve, as both parameters depend on the bandwidth.
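Since resolution and frame rate can always be reduced after recording, the downsampling step amounts to keeping a regular subset of frames. A minimal sketch of that frame-selection logic, in pure Python (in practice one would use a tool such as FFmpeg; the helper name is ours):

```python
def select_frames(n_frames: int, src_fps: int, dst_fps: int) -> list[int]:
    """Indices of the frames to keep when downsampling from src_fps to dst_fps.

    Assumes evenly spaced frames. This only illustrates the selection logic;
    it does not decode video.
    """
    if dst_fps >= src_fps:
        return list(range(n_frames))  # nothing to drop
    step = src_fps / dst_fps          # keep one frame every `step` frames
    kept, i = [], 0.0
    while round(i) < n_frames:
        kept.append(round(i))
        i += step
    return kept

# One second of 60 FPS video reduced to 30 FPS keeps every second frame.
print(select_frames(60, 60, 30))  # [0, 2, 4, ..., 58]
```

The same idea underlies resolution reduction: the recorded data can always be coarsened later, but detail that was never captured cannot be recovered.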
Concerning audio, different parameters can be measured (e.g., volume, prosody). To measure these well, it is important to establish a silent surrounding to avoid acoustic disturbances, which is usually easy to achieve. The problem arises when more than one person takes part in the communication. This concerns the so-called “Cocktail Party Problem” (Cherry, 1953), which refers to the remarkable ability of humans to identify individual sources of acoustic input (e.g., voices) in a noisy environment. At the same time, it poses problems for neural networks (Haykin & Chen, 2005), making it difficult to separate speakers solely based on audio input (Ephrat et al., 2018). Therefore, it would be useful to solve this problem beforehand.
Concerning content, experimental economics has established a few different approaches, summarized in Brandts et al. (2019). Independent of the chosen approach, the first step is to obtain the conversation as text. This requires the experimenters to transcribe the conversation manually, outsource it to transcription companies, or use speech-to-text software. Yet, such software also suffers from the Cocktail Party Problem.
Finally, we consider user-friendliness from the perspective of the replicability of experiments. The easier it is to replicate an experiment, the more likely a replication becomes. Further, as video conferences handle personal data (e.g., faces), it is preferable not to share participants’ data with third-party companies (e.g., Skype, Zoom). Therefore, the goal is an open-source tool with an intuitive interface that is operable for normal laboratory and online experiments and where the data do not leave the digital space of the laboratory.
3 The structure of MTVE
MTVE consists of three apps (Meetings App, Video App, and Transcription App). The Meetings App is used to organize the rooms. When participants join the created room, the Video App starts. After communication ends, the experimenters can start transcribing the data using the Transcription App. Together, these apps tackle the issues discussed in Sect. 2. In this section, we briefly discuss the structure of MTVE and refer to a more detailed description on GitHub (https://github.com/MaXLab-OVGU/MTVE).
Using the simple user interface (UI) of MTVE (see Fig. 1a), experimenters can create a room with a few clicks and configure different features (see Fig. 1b). After the communication room is created, the experimenter receives a simple link that can be embedded into the experiment. In oTree, this link opens a new browser tab; in z-Tree, it opens a new browser window. Since the rooms can be reused, nothing has to be changed between sessions. Further, we integrated the option for experimenters to close the room remotely.
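As an illustration, embedding the link in an experiment can be as simple as constructing the URL each participant should open. The base URL, path, and optional `label` parameter below are purely hypothetical (use the exact link shown in the Meetings App):

```python
from urllib.parse import urlencode, urljoin

# Hypothetical MTVE installation on a laboratory server.
MTVE_BASE = "https://lab.example.org/mtve/"

def room_link(room_id: str, participant_label: str = "") -> str:
    """Build the link a participant opens to join an MTVE room.

    The URL scheme here is an assumption for illustration only.
    """
    url = urljoin(MTVE_BASE, f"room/{room_id}")
    if participant_label:
        url += "?" + urlencode({"label": participant_label})
    return url

print(room_link("pilot-session-1", "P01"))
```

In z-Tree or oTree, such a link would then be shown or opened for each participant at the communication stage.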
All that is required is for laboratory managers to install MTVE on their servers. MTVE comes with online documentation that explains how to do so. Laboratory managers become administrators of MTVE on their local server and guide experimenters on how to register an account. MTVE provides a simple registration procedure in which experimenters create their accounts using e-mail and password. After confirmation, the experimenters can log in and create their experiments. The laboratory manager, as the administrator, has access to all accounts, enabling them to support novice experimenters.
From the perspective of experimental participants, all they see is a waiting screen (see Fig. 1c) as long as the predetermined group size has not been reached. The communication starts (see Fig. 1d) once every participant has joined the room. The tab can be closed automatically using z-Tree or oTree without requiring participants’ engagement.
The most important feature of MTVE is how it avoids the Cocktail Party Problem. Before the audio–video signal is uploaded to the server that combines all audio–video streams into a joint communication, MTVE captures an additional copy of the signal, which is saved as a separate file and automatically transferred to the laboratory server. This approach (see Fig. 2) not only solves the Cocktail Party Problem but also stabilizes the quality (e.g., resolution) of the individual video files.
These individual files can then be transcribed using a variety of speech-to-text AI models (e.g., Whisper, VOSK). The quality of the transcription depends on the audio quality (e.g., headphones, background noise) and the model itself. We provide one possible solution within our tool, yet stress that researchers could try different models and, above all, should recheck the transcriptions. Still, even if transcribing individual voices works well, it is necessary to merge the individual transcriptions. Here, MTVE offers a solution that merges the transcriptions of the individual video files into one CSV file, reconstructing the original chat structure of the communication.
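Conceptually, the merge step interleaves the per-speaker segments by their start times and writes them to CSV in chat order. The following sketch illustrates the idea; the data structures and column names are ours, and MTVE's actual output format may differ:

```python
import csv
import io

def merge_transcripts(per_speaker: dict[str, list[tuple[float, str]]]) -> list[dict]:
    """Merge per-speaker (start_time, text) segments into one
    chronologically ordered, chat-style transcript."""
    rows = [
        {"start": start, "speaker": speaker, "text": text}
        for speaker, segments in per_speaker.items()
        for start, text in segments
    ]
    return sorted(rows, key=lambda r: r["start"])

def to_csv(rows: list[dict]) -> str:
    """Serialize the merged transcript as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["start", "speaker", "text"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

segments = {
    "participant_1": [(0.4, "Hi, shall we cooperate?"), (5.1, "Great, me too.")],
    "participant_2": [(2.3, "Yes, I will contribute.")],
}
print(to_csv(merge_transcripts(segments)))
```

This is why the chosen model must emit timestamps: without them, the chronological interleaving of speakers cannot be reconstructed.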
Altogether, MTVE is easy to use for all groups of users (laboratory managers, experimenters, and experimental participants) and provides experimenters with a standardized set of communication data that can be analyzed. This leaves the question of whether the technical bar for laboratories is sufficiently low.
4 Technical requirements and license
In our discussion of the technical requirements of MTVE, we focus on three parts: the Meetings App (UI), the Video App, and the Transcription App. We stress that the requirements for the first two are moderate, yet can be high for the last. The following paragraphs discuss these issues in more detail; we provide additional information on GitHub.
The UI consists of an app created in Python with the Django framework. Both can be considered very common components and do not impose major challenges on a typical experimental laboratory. The same is true for the MySQL server used to store and retrieve the data. Second, the Video App consists of an Express.js server run on Node.js and an OpenVidu instance running on a separate server to handle the video streams. The Meetings App and the Video App connect to the same MySQL server to store and retrieve data. Concerning technical equipment, we note that running too many rooms simultaneously may affect the quality of the recordings (e.g., saved videos may be shorter than set). This issue depends on the server capacity. We provide performance tests for reasonable combinations of room sizes and numbers of rooms on GitHub to give researchers an estimate of the required server setup. Still, we advise researchers to run a few tests of their own.
The largest technical requirement comes from the transcription tool. In the first step, it uses the Python module Pydub (FFmpeg) to convert the video files to WAV files. In the second step, it applies a predefined transcription model. These models, however, impose varying hardware requirements. For instance, Whisper requires up to 10 GB of VRAM, while the German-language VOSK model requires up to 4.4 GB. We stress that researchers can implement the model of their choice. Finally, the script that puts the separate texts together depends on the generated filenames, which therefore must not be changed.
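To illustrate why the generated filenames must stay intact, consider the kind of parsing a merge script relies on. The filename pattern below is hypothetical, not MTVE's actual naming scheme; renaming files would break exactly this step:

```python
import re

# Hypothetical pattern: "<room>_<participant>.wav". MTVE's real scheme may
# differ, but any merge script depends on such structured names.
FILENAME_RE = re.compile(r"^(?P<room>[A-Za-z0-9-]+)_(?P<participant>[A-Za-z0-9-]+)\.wav$")

def parse_recording_name(filename: str) -> tuple[str, str]:
    """Extract the room and participant identifiers from a recording filename."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected filename: {filename!r}")
    return match.group("room"), match.group("participant")

print(parse_recording_name("room42_p3.wav"))  # ('room42', 'p3')
```

If a file were renamed, the script could no longer attribute its transcription to the correct room and participant.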
MTVE is open-source, licensed under an adaptation of the MIT license. We ask researchers to cite this paper when using MTVE for academic or other publications. The source code for MTVE can be downloaded for free from GitHub. Contributions and improvements to the source code are welcome and should be submitted via GitHub, too.
5 Limitations
As of the current version (MTVE 1.1), the tool has certain limitations, which we present in this section. It is our goal to improve on these. We refer to the GitHub project for possible updates and pull requests.
While MTVE supports resolutions up to 3840 × 2160 and frame rates up to 60 FPS, it is currently not possible to combine both. Recording at 60 FPS is only possible for HD resolution and lower.
Further, we stress again that the quality of the transcription does not depend on MTVE but on the model chosen. We highlight that whatever model is chosen has to produce timestamps so that MTVE can merge the individual text files into one.
MTVE was tested on Chrome and Firefox. To guarantee simultaneous entry into the conversation for all participants, it is essential to configure the browser such that webcam access does not require further confirmation from the user. Otherwise, some participants may join the conversation but cannot be seen by others until they grant access manually.
6 Conclusion
All in all, MTVE enables experimenters to gather communication data from experiments with video conferences in an integrated approach. The tool is easily implementable in z-Tree, oTree, and other software. It saves high-quality videos. Further, applying a simple trick, it avoids the Cocktail Party Problem and enables more sophisticated analyses of the voices. MTVE comes with a simple user interface for the experimenter and comparatively low technical requirements for the laboratory. The tool is open-source (citeware) and can easily be adapted to the local servers of any laboratory. Doing so enables researchers to keep subjects’ data on private servers and supports compliance with local data protection regulations.
Acknowledgements
We acknowledge financial support from the German Research Foundation (DFG) through project number 468478819. The authors would like to thank the editor Lionel Page and two anonymous reviewers for their comments and suggestions. Moreover, we acknowledge valuable advice on the functionalities of MTVE from Ayoub Al-Hamadi, René Degenkolbe, Laslo Dinges, Marc-André Fiedler, Nina Ostermaier, Myra Spiliopoulou, and Joachim Weimann. We further acknowledge the support of Adrija Ghosh, Raviteja Sutrave, and Sourima Dey during a scientific student project.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Appendix A
Use Case 1
MTVE is used in z-Tree to enable communication between the participant and the experimenter and to record participants’ answers in the seminal dice-rolling experiment (Fischbacher & Föllmi-Heusi, 2013). Here, after rolling the die, the participants start the communication and inform the experimenter of what they claim to have rolled.
Use Case 2
MTVE is used in oTree to analyze the experiment from Belot and van de Ven (2017), in which there are two types of players. Player 1 has the incentive to lie about the true color of their card (red or black), and Player 2 has the incentive to detect the lie. Per session, there are five Players 1 and five Players 2. Using a round-robin scheme, each Player 1 is matched with each Player 2. Player 1 indicates the color to Player 2 using MTVE within a predefined time of 10 s. After that, the communication stops.
Use Case 3
MTVE is used in z-Tree to allow repeated communication between two subjects. Further, it is used to allow the experimenter to close the communication room remotely when needed.