Empathic accuracy in design: Exploring design outcomes through empathic performance and physiology

Álvaro M. Chang-Arana; Matias Piispanen; Tommi Himberg; Antti Surma-aho; Jussi Alho; Mikko Sams; Katja Hölttä-Otto

doi:10.1017/dsj.2020.14

Part of: The Future of Design Cognition Analysis

Published online by Cambridge University Press: 03 July 2020

Álvaro M. Chang-Arana*: Affiliation:
Department of Mechanical Engineering, Aalto University, 02150, Betonimiehenkuja 5 C, Espoo P.O. Box 17700, FI-00076 AALTO, Finland
Matias Piispanen: Affiliation:
Department of Neuroscience and Biomedical Engineering, Aalto University, 02150, Betonimiehenkuja 5 C, Espoo P.O. Box 17700, FI-00076 AALTO, Finland
Tommi Himberg: Affiliation:
Department of Neuroscience and Biomedical Engineering, Aalto University, 02150, Betonimiehenkuja 5 C, Espoo P.O. Box 17700, FI-00076 AALTO, Finland
Antti Surma-aho: Affiliation:
Department of Mechanical Engineering, Aalto University, 02150, Betonimiehenkuja 5 C, Espoo P.O. Box 17700, FI-00076 AALTO, Finland
Jussi Alho: Affiliation:
Department of Neuroscience and Biomedical Engineering, Aalto University, 02150, Betonimiehenkuja 5 C, Espoo P.O. Box 17700, FI-00076 AALTO, Finland
Mikko Sams: Affiliation:
Department of Neuroscience and Biomedical Engineering, Aalto University, 02150, Betonimiehenkuja 5 C, Espoo P.O. Box 17700, FI-00076 AALTO, Finland; Department of Computer Science, Aalto University, 02150, Betonimiehenkuja 5 C, Espoo P.O. Box 17700, FI-00076 AALTO, Finland
Katja Hölttä-Otto: Affiliation:
Department of Mechanical Engineering, Aalto University, 02150, Betonimiehenkuja 5 C, Espoo P.O. Box 17700, FI-00076 AALTO, Finland
*: Corresponding author.

Abstract

Empathic design highlights the relevance of understanding users and their circumstances in order to obtain good design outcomes. However, theory-based quantitative methods, which can be used to test user understanding, are hard to find in the design science literature. Here, we introduce a validated method used in social psychological research – the empathic accuracy method – into design to explore how well two designers perform in a design task and whether the designers’ empathic accuracy performance and the physiological synchrony between the two designers and a group of users can predict the designers’ success in two design tasks. The designers could correctly identify approximately 50% of the users’ reported mental content. We did not find a significant correlation between the designers’ empathic accuracy and their (1) performance in design tasks and (2) physiological synchrony with users. Nevertheless, the empathic accuracy method is promising in its attempts to quantify the effect of empathy in design.

Keywords

Empathic design; empathy; empathic accuracy; user understanding; EMG

Type: Research Article
Information: Design Science, Volume 6, 2020, e16

DOI: https://doi.org/10.1017/dsj.2020.14
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (http://creativecommons.org/licenses/by-nc-sa/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is included and the original work is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use.
Copyright: Copyright © The Author(s) 2020

1 Background

1.1 Empathy in design and engineering

The use of the term empathy has steadily increased over the past two decades in academic journals dealing with the business world (Köppen & Meinel 2015). The term is widely used in design approaches such as human-centred design and design thinking, both of which have been associated with successful projects and businesses (Brown 2009; Kramer, Agogino & Roschuni 2016). However, there is no widely accepted and consistently used definition of empathy in design: it is variously defined as a mindset, a way of understanding others, a method or a behaviour.

Extensive literature reviews (Kouprie & Sleeswijk Visser 2009; Strobel et al. 2013; Walther, Miller & Sochacka 2017), studies borrowing definitions from psychology (Wong et al. 2016; Surma-aho, Björklund & Hölttä-Otto 2018), interviews with designers (Strobel et al. 2013; Hess, Strobel & Pan 2016) and observations of designers (Hess & Fila 2016) suggest that empathy is commonly equated with some type of comprehensive user understanding. For instance, empathy in design has been associated with user-understanding methods such as immersing oneself in the dreams of a future user (Battarbee et al. 2002), imposing extreme user-like features on designers (Vaughan, Seepersad & Crawford 2014; Pang & Seepersad 2016) or on non-extreme users (Lin & Seepersad 2007), understanding users through a combination of survey and sensor data (Ghosh et al. 2017) and projecting oneself into a user’s life through imagination (Koskinen & Battarbee 2003). Some studies define designer empathy as an outcome of user interaction, namely an increased ability to understand users and solve their problems (Raviselvam et al. 2017; Raviselvam et al. 2018). However, it is not clear when and how user understanding can be considered empathic.
Some studies have attempted to clarify this situation by adopting more rigorous definitions of designer empathy, typically based on psychology research. One notable conceptualisation of designer empathy was developed by Kouprie and Sleeswijk Visser (2009). They described a stepwise framework in which designers develop and use empathy with end users by putting themselves in situations typical for the end user and doing tasks as if they were the user, by eliciting information directly from users through various types of interaction, and by combining these two sources of information to achieve a comprehensive and empathic understanding.

Several other aspects of empathy inherent to design have been identified. Experienced designers value empathy more than their younger colleagues do (Hess et al. 2017). Designers should empathise with both their peers and the end users (Strobel et al. 2013; Köppen & Meinel 2015) and should alternate between empathic and analytical thinking (Walther et al. 2017). It has also been suggested that empathy for users matters not only when designers are gathering user information but also during other activities such as requirement definition and concept generation (Hess & Fila 2016). However, both guidance for practising designers (IDEO 2015) and preliminary case studies (Smeenk, Tomico & Van Turnhout 2016) indicate that, to develop successful products, designers must combine their own insight with comprehensive user understanding. Ultimately, even though what empathy comprises and how it is created remain ill-defined, all depictions of empathy in design agree on the aim: achieving an accurate, comprehensive understanding of the user and using this understanding to inform future design decisions.

In psychology, empathy is not fully understood, but it is usually conceptualised as a bidimensional construct comprising cognitive empathy and affective empathy (Shamay-Tsoory 2011). Cognitive empathy involves top-down processes that allow an individual to imagine and cognitively share what someone else could be thinking or feeling. In design, cognitive empathy is usually understood as perspective taking (Koskinen & Battarbee 2003; Postma et al. 2012; Köppen & Meinel 2015). Affective empathy involves bottom-up processes that allow an individual to recognise someone else’s emotions and even share similar or identical emotional states. It includes several mechanisms, such as emotional contagion (Preston & de Waal 2002), sharing the experience of pain or distress with others (Singer et al. 2004; Jackson, Meltzoff & Decety 2005), reacting to someone else’s facial expressions (Carr et al. 2003) and empathic concern (Light et al. 2015). Furthermore, autism and psychopathy research suggests that, in some clinical cases, individuals may have the capacity for only one form of empathy (Baron-Cohen & Wheelwright 2004; Bird & Viding 2014; Ellis et al. 2017; Moreira, Azeredo & Barbosa 2019). However, this division is not clear-cut, since the two components of empathy interact more than previously thought (Cuff et al. 2016).
For instance, empathic concern has been measured with questionnaires such as the Interpersonal Reactivity Index (IRI; Davis 1980), which rely on top-down reasoning by asking how a person generally feels when perceiving someone else in distress or pain. However, empathic concern (and other mechanisms) can also be measured through bottom-up procedures such as physiological synchrony. This approach has been used in psychotherapy (Kleinbub 2017) and in dyadic interactions between parents and children and between couples (Palumbo et al. 2017) to measure outcome variables such as patients’ ratings of a therapist’s empathy (Marci et al. 2007), the occurrence of child behavioural problems (Lunkenheimer et al. 2015) and marital conflict (Gates et al. 2015).

In addition to the problems arising from the ambiguous definition of empathy, another key limitation of current empathy research in design is the lack of quantitative studies connecting empathy to design outcomes. Quantitative studies could be used to build predictive models of how empathy, whether defined as a mindset, understanding, method or behaviour, influences design outcomes. Existing quantitative research on empathy in design has used validated self-report measures from psychology to show that design students learn empathy in project classes (Surma-aho et al. 2018) and that engineering students typically have lower dispositional empathy than students of psychology and social work (Rasoal, Danielsson & Jungert 2012). Another notable example is the Empathy and Care Questionnaire, which assesses practitioners’ self-reported perceptions of empathy (Hess et al. 2017). However, only a few quantitative studies of empathy in design have been carried out, and no research has truly tested whether empathy translates into improved design outcomes such as correct understanding of needs, better ideas, user satisfaction, product usability or perceived effectiveness.

1.2 From empathy in design to empathic accuracy in design

Empathy in design is targeted towards a specific user group in a specific context. For instance, designers working with a group of musicians will try to understand their pains and joys, and their likes and dislikes about their instruments or about playing music. Building this understanding entails careful observation, interviews aimed at uncovering different nuances of the users’ context and other methods that can inform decision-making. Because of this context-specific nature, it is difficult to derive general rules for good empathic practice from studies of empathy in design.

Most research providing important information about the role of empathy in design is qualitative (e.g. Kouprie & Sleeswijk Visser 2009; Kankainen et al. 2012; Smeenk et al. 2017). While the qualitative approach allows us to delve into the specific context and understand the experiences of those involved, it does not allow us to make quantitative predictions. Qualitative approaches therefore need to be complemented by quantitative ones that allow us to predict, explain and control the role of empathy in the design process.

The empathic accuracy method is a performance-based method for measuring the degree of understanding between two or more people interacting in real time in a specific context. It quantifies how well one person understands another without relying on self-ratings of empathic skill. There are three versions of the paradigm, all of which require video recording a conversation within a dyad (e.g. a user and a designer): the dyadic interaction paradigm, the standard stimulus paradigm and the shared physiology paradigm. The first two estimate the similarity between lists of mental contents provided by one or both members of a dyad, or by external perceivers of the interacting dyad. The more similar the reported mental contents, the higher the understanding between the members of the dyad, or between the perceivers and the dyad members. The third paradigm uses physiological synchrony instead of reported mental contents to estimate the understanding between the members of a dyad or between perceivers and dyad members; it equates higher physiological synchrony with higher accuracy in inferring someone else’s feelings. Depending on its task, each paradigm may lie closer to the cognitive or the affective component of empathy. Broadly, the dyadic interaction and standard stimulus paradigms fall under the cognitive empathy component, and the shared physiology paradigm falls under the affective empathy component.

1.2.1 The dyadic interaction paradigm

In the dyadic interaction paradigm (Ickes et al. 1990), the members of a dyad separately rewatch their videoed interaction. One participant is asked to pause the recording every time they remember having had a specific thought or feeling during the interaction and to write down this thought or feeling. The second member of the dyad then watches the same video, which now pauses at the time points where the first participant paused it. At each pause, the second participant must write down what they think the first person was thinking or feeling. A group of independent raters then compares the two lists and rates how similar the paired entries are. The higher the similarity, the higher the empathic accuracy of the second participant.
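One way to turn the raters’ similarity judgements into a single score is sketched below. This is a minimal illustration, assuming an Ickes-style rule: average the raters’ scores for each entry, sum across entries and divide by the maximum obtainable points, using the 0 to 2 similarity scale adopted later in this study. The function name `empathic_accuracy` and the data layout are hypothetical, not the authors’ code.

```python
from statistics import mean


def empathic_accuracy(ratings_by_rater, max_points=2):
    """Empathic accuracy as a percentage, under an assumed Ickes-style scoring rule.

    ratings_by_rater[r][e] is rater r's similarity score (0, 1 or 2) for entry e,
    i.e. how closely the perceiver's inference matches the target's reported
    thought or feeling at that pause point.
    """
    if not ratings_by_rater or not ratings_by_rater[0]:
        raise ValueError("need at least one rater and one entry")
    n_entries = len(ratings_by_rater[0])
    if any(len(r) != n_entries for r in ratings_by_rater):
        raise ValueError("every rater must score every entry")
    # Average the raters' scores for each entry, then sum over entries.
    points = sum(mean(r[e] for r in ratings_by_rater) for e in range(n_entries))
    # Express the total as a share of the maximum obtainable points.
    return 100 * points / (max_points * n_entries)
```

For example, with two raters who score two entries as [2, 0] and [2, 2], the entry means are 2 and 1, giving 3 of 4 possible points, i.e. 75%.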

1.2.2 The standard stimulus paradigm

In the standard stimulus paradigm (Marangoni et al. 1995), the videoed dyadic interaction is used as a standard stimulus from which a group of perceivers infer the thoughts and feelings of one or both dyad members. The perceivers have no direct contact with either member of the videoed dyad.

The advantage of the dyadic interaction and standard stimulus paradigms is that they allow a direct comparison between what a user thinks or feels and what a designer thinks the user is thinking or feeling. Importantly, they also make it possible to study whether the measured accuracy matches a designer’s self-rated accuracy in identifying a user’s mental contents. Previous studies have shown that people tend to have a low degree of empathic accuracy (Stueber 2018). For instance, when inferring another person’s thoughts or feelings, strangers achieved an accuracy of approximately 20%, and people who had known each other for at least one year about 30% (Ickes & Hodges 2013). People thus appear to be rather poor at inferring what someone is thinking or feeling when the topic of discussion is open.

Because participants are instructed to infer what someone else might be thinking or feeling, these tasks measure cognitive empathy, that is, imagining someone else’s thoughts and feelings in a given circumstance or seeing the world from someone else’s psychological perspective (Shamay-Tsoory 2011; Zaki & Ochsner 2012). Here, seeing the world from someone else’s perspective is operationalised as the degree of similarity between the actual and the inferred mental contents, a concept similar to the perspective-taking factor of Davis’ IRI (Davis 1980).

1.2.3 The shared physiology paradigm

The third version of the empathic accuracy method, the shared physiology paradigm, also records an interacting dyad but additionally monitors physiological responses (such as heart rate, skin conductance and facial muscle activity) to capture affective empathy (Levenson & Gottman 1983; Levenson & Ruef 1992). Modern versions of this paradigm have incorporated brain imaging as an additional measure of affective empathy (Zaki et al. 2009). In essence, the paradigm measures how accurately a participant identifies the ongoing feelings of someone else, using the synchrony of their physiological responses to estimate the similarity of felt emotions (Levenson & Ruef 1992).
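A common way to quantify physiological synchrony is to correlate two time-aligned signals within sliding windows and average the window correlations. The sketch below assumes equally sampled, already preprocessed signals; the function names, the window and step sizes, and the choice of plain averaging of r values are illustrative assumptions rather than the analysis used in this study.

```python
import math


def pearson(x, y):
    """Pearson correlation; NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def windowed_synchrony(a, b, window, step):
    """Mean windowed Pearson correlation between two time-aligned signals.

    Windows in which either signal is flat are skipped; NaN is returned if
    no window yields a defined correlation.
    """
    if len(a) != len(b):
        raise ValueError("signals must be time-aligned and equally long")
    if window < 2 or step < 1:
        raise ValueError("window must be >= 2 samples and step >= 1")
    rs = []
    for start in range(0, len(a) - window + 1, step):
        r = pearson(a[start:start + window], b[start:start + window])
        if not math.isnan(r):
            rs.append(r)
    return sum(rs) / len(rs) if rs else float("nan")
```

Averaging Pearson r values directly is a simplification; many analyses average Fisher z-transformed values instead, and some also test lagged windows to allow for delayed responses.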

Interest in physiological synchronisation has increased in recent years in both psychology and neuroscience studies (Kreibig 2010; Quintana & Heathers 2014; Massaro & Pecchia 2019). Studies on social interactions show that physiological, behavioural and emotional reactions tend to be shared or synchronised during interaction. Synchronisation has been observed in situations such as recognising the emotions of a person from another culture (Soto & Levenson 2009), the interaction of married couples (Levenson & Gottman 1983) or simply sharing the same space while watching emotional movies (Golland, Arzouan & Levit-Binnun 2015).

2 The current study

This study aims to address the shortage of quantitative studies connecting empathy to user understanding in a specific design context and to test whether empathy is relevant for design. We combined elements from all of the above-mentioned paradigms to study whether empathic accuracy plays a role in an early-phase design and ideation task.

Within this context, our aim was to measure empathic accuracy as a quantitative indicator of a designer’s empathic capability. We analysed the interaction between two professional designers and five musicians and formulated the following research questions:

(1) How accurately can the designers understand the group of musicians?
(2) Does the designers’ accuracy in regard to the musicians’ mental contents and emotions positively correlate with design outcomes?
(3) Does the similarity of the facial emotional expressions of the designers and musicians correlate with the designers’ empathic accuracy?

3 Method

3.1 Participants

Two designers were recruited. The interviewing designer (Designer 1) had 13 years of experience, including six years of design education (a bachelor’s degree and a Master of Science degree), four years of human-centred design work and three years of design research. He had also taken weekly piano lessons for 12 years in his youth. Although he had no professional training on the instrument, his musical experience was assumed to be important for understanding the musicians. In addition, the first author of this study, who has played piano for 15 years and holds a Master of Arts degree in music psychology, assisted the designer during the planning of the interviews, which aided in formulating relevant interview questions.

The second designer (Designer 2), a co-author of this study, had 5.5 years of experience, including three years of design education (MS in product development) and 2.5 years of design research. He had no musical education beyond that provided in regular primary school. He was asked to watch the interviews and perform the same empathic accuracy task as Designer 1 for two reasons: first, to control for the effects that Designer 1’s design experience and musical background could have on his performance; and second, to test whether indirect contact with the users would yield considerably different empathic accuracy scores from those obtained by Designer 1.

Five professional musicians participated (three female; two clarinettists, two saxophonists and one oboist; mean age = 23.60 years, SD = 1.52), with a mean playing experience of 15 years (SD = 1.41). The musicians were recruited through their music institution’s mailing list. They were of four different nationalities, and only one had English as her mother tongue; the others had at least B1-level English according to the Common European Framework of Reference for Languages, as required by their music institution.

3.2 Design brief

The designers’ task was to understand the musicians and ideate accessories to improve their experiences with their instruments. The musicians involved in the study were professional woodwind players, many of whom experience similar challenges related to their instruments, most importantly those associated with the use of reeds. Reeds are small strips of wood or plastic that vibrate with air pressure and influence the airflow into the instrument. They affect the production of tone and the expressive and technical range of the musician (Thompson 1979; Ledet 1981; Almeida et al. 2013). Besides these music-related features, reeds present additional challenges, such as their limited lifespan, each musician’s personal preferences, the high cost of purchasing them or manufacturing them from scratch and the considerable time that reed making takes (Ledet 1981). The problems surrounding reeds and their potential impact on the performance and well-being of musicians (Nagel 2010; Kenny 2011) pose an important design challenge. In addition to reeds, the designers were free to focus on other accessories that the musicians might need, such as solutions for transporting and storing their instruments or cleaning equipment. This design brief and the associated tasks, while not spanning the entire design process, provide a realistic starting point and mirror the initial actions taken by design practitioners in many open-ended projects.

3.3 Tasks and procedures

Before describing our methods in detail, we outline a simplified version for illustration. From a videoed interview between a designer and a user, both wearing physiological sensors, two lists of mental contents are obtained: the user’s remembered mental contents and the designer’s inferred mental contents. External raters rate how similar the contents of these lists are, which yields an ‘empathic accuracy’ score for the designer. The designer then completes two design tasks, whose outputs are rated by the interviewed user. Finally, the behavioural and physiological measures are correlated with the design outcomes to test whether higher empathic accuracy and physiological synchronisation are associated with better design outcomes. An overview of the approach is shown in Figure 1.
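The final correlational step can be sketched with Pearson and Spearman coefficients. This is a hypothetical illustration: the function names are ours, and Spearman’s rank correlation (with average ranks for ties) is included because, with only five users, a rank-based coefficient is less sensitive to a single extreme value. It is not necessarily the statistic used in this study.

```python
import math


def pearson(x, y):
    """Pearson correlation; NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y):
        raise ValueError("need one outcome value per accuracy value")
    return pearson(average_ranks(x), average_ranks(y))
```

For example, `spearman([1, 2, 3], [3, 2, 1])` returns -1.0 because the two orderings are exactly reversed. With five data points per designer, any such coefficient has very wide uncertainty.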

Figure 1. An overview of the study procedure.

3.3.1 Interview

Designer 1, together with the first author, developed guidelines for a 20–30-minute semi-structured interview (see Appendix 1). Given his extensive design and needs-finding experience, Designer 1 conducted the interviews; the first author did not take part in them. During the interview, the musicians handled their instruments for demonstration purposes, for example setting up the instrument for playing and demonstrating how they clean it, to mimic a more contextual interview. Designer 1 and the musician, and later Designer 2, wore the same set of physiological sensors to record an electrocardiogram (ECG), facial electromyography (EMG) and galvanic skin response (GSR). Designer 2 was presented with the design brief and encouraged to place himself in the position of the interviewer and watch the interaction from a design perspective. Here we focus on the EMG signals from the designers’ and musicians’ eyebrow muscles (corrugator supercilii) and cheek muscles (zygomaticus major). The activity of these muscles provides indices of frowning and smiling, which serve as proxies for negative and positive emotional valence, respectively.
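Facial EMG amplitude is often summarised as an envelope before being compared across people. The sketch below assumes a raw trace that has already been baseline-corrected (real pipelines usually also band-pass filter the signal, which is omitted here). It full-wave rectifies the signal, smooths it with a trailing moving average, and forms a hypothetical valence proxy by subtracting corrugator activity from zygomaticus activity. The function names and the difference-based index are illustrative assumptions, not the study’s processing pipeline.

```python
def emg_envelope(raw, window):
    """Full-wave rectify an EMG trace, then smooth it with a trailing moving average.

    Early samples are averaged over however many samples are available so far.
    """
    if window < 1:
        raise ValueError("window must be at least one sample")
    rectified = [abs(v) for v in raw]
    envelope = []
    running = 0.0
    for i, v in enumerate(rectified):
        running += v
        if i >= window:
            running -= rectified[i - window]  # drop the sample leaving the window
        envelope.append(running / min(i + 1, window))
    return envelope


def valence_proxy(zygomaticus_env, corrugator_env):
    """Hypothetical index: smiling-muscle minus frowning-muscle activity, per sample."""
    return [z - c for z, c in zip(zygomaticus_env, corrugator_env)]
```

In practice, each muscle’s envelope would usually be normalised (for example, z-scored within participant) before subtraction, because raw amplitudes differ across muscles and people.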

Before starting each major phase of the study, participants filled in the Positive and Negative Affect Schedule (PANAS; Watson, Clark & Tellegen 1988). It was used to gauge the participants’ emotional states before the interview and before completing two empathic accuracy tasks (here we report only Ickes’ dyadic interaction paradigm). Because a single session with one musician lasted approximately four hours, we needed to monitor the musicians’ mood to ensure that it remained as constant as possible and would not confound their task performance. Designer 1 spent only 30 minutes per musician at this stage of the study. We therefore assumed that fatigue would not have a noticeable detrimental effect on his performance and did not control for changes in his mood. Before the interview started, the participants were reminded of its topic and approximate duration. They were then instructed to sit silently with their eyes closed for three minutes to stabilise their physiological signals. During the interview, the physiological signals of both members of the dyad were recorded continuously.
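The PANAS consists of 20 mood adjectives rated from 1 to 5, ten forming the positive affect (PA) scale and ten the negative affect (NA) scale, so each scale score ranges from 10 to 50. The sketch below scores one administration and reports each later administration’s change relative to the first, which is one simple way to check whether mood stayed stable across a long session. The dictionary input format and the function names are hypothetical.

```python
POSITIVE_ITEMS = ("interested", "excited", "strong", "enthusiastic", "proud",
                  "alert", "inspired", "determined", "attentive", "active")
NEGATIVE_ITEMS = ("distressed", "upset", "guilty", "scared", "hostile",
                  "irritable", "ashamed", "nervous", "jittery", "afraid")


def score_panas(responses):
    """Return (positive affect, negative affect) for one administration.

    responses maps each adjective to its 1-5 rating; each scale ranges 10-50.
    """
    for item in POSITIVE_ITEMS + NEGATIVE_ITEMS:
        value = responses[item]  # KeyError signals a skipped item
        if not 1 <= value <= 5:
            raise ValueError(f"{item}: expected a rating from 1 to 5, got {value}")
    pa = sum(responses[i] for i in POSITIVE_ITEMS)
    na = sum(responses[i] for i in NEGATIVE_ITEMS)
    return pa, na


def mood_drift(sessions):
    """Change in (PA, NA) from the first administration to each later one."""
    first_pa, first_na = score_panas(sessions[0])
    return [(pa - first_pa, na - first_na)
            for pa, na in map(score_panas, sessions[1:])]
```

How large a change should count as meaningful is a separate judgement that the sketch does not make.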

The interview with User 3 (U3) had to be restarted after the first five minutes because of an unexpected problem with the recording equipment. Once the problem was solved, the interview resumed with a summary of the preceding discussion. The interview lasted a total of 15 minutes. Otherwise, data from U3 were collected and analysed in the same way as those from the other users.

3.3.2 Logging remembered mental contents: The musicians’ phase

Before starting, the participants filled in the PANAS once again. Following Ickes’ validated protocol (2001), the musicians were asked to pause the video every time they remembered having had a specific thought or feeling. They had to write down the thoughts and feelings they remembered having during the interview rather than new thoughts or feelings that arose while rewatching the video. The participants completed a practice trial and were instructed in how to use the standard thought-or-feeling sheet. For each entry, they wrote down the time point at which they paused the video and the content, and indicated whether it was a positive, neutral or negative thought or feeling. Although the valence choice does not identify the exact emotion (e.g. a negative entry could reflect either anger or frustration), each valence choice is paired with specific content detailed by the participant, so the actual emotional experience can be inferred from the entry’s content. The experimenter ensured that participants fully understood the task before instructing them to begin as soon as the video started. Responses were registered using a digital standard response sheet based on Ickes’ design (2001). Instructions were presented in printed form and answers were registered on the ‘inferred thoughts or feelings’ response sheet (see Appendixes 2A and 2C for examples of the instructions and response sheets).
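A thought-or-feeling sheet can be represented as a list of time-stamped entries. The sketch below is one possible representation, assuming pause points are written as ‘mm:ss’ stamps; the `Entry` class, its field names, `parse_time` and `pair_entries` are all hypothetical. Pairing reported and inferred entries by pause point mirrors the later step in which the designers infer content at the musicians’ own pause points.

```python
from dataclasses import dataclass
from typing import Literal

Valence = Literal["positive", "neutral", "negative"]


@dataclass(frozen=True)
class Entry:
    """One row of a thought-or-feeling sheet (field names are illustrative)."""
    time_s: int       # where the video was paused, in seconds
    content: str      # the remembered (or inferred) thought or feeling
    valence: Valence  # positive, neutral or negative


def parse_time(stamp):
    """Convert an 'mm:ss' stamp written on the sheet into seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def pair_entries(reported, inferred):
    """Match each reported entry with the inference made at the same pause point."""
    by_time = {e.time_s: e for e in inferred}
    missing = [e.time_s for e in reported if e.time_s not in by_time]
    if missing:
        raise ValueError(f"no inference for pause points: {missing}")
    return [(e, by_time[e.time_s]) for e in reported]
```

Keying on the pause time assumes each pause point is unique within an interview, which holds when the perceiver’s video stops only where the target stopped it.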

3.3.3 The dyadic interaction paradigm: Designer 1’s phase

Before starting this phase, Designer 1 knew only the objective of the interview, that it would be video recorded and that he would rewatch it while performing an unspecified task. After the lists of thoughts and feelings had been obtained from the musicians, Designer 1 was invited to rewatch the five interviews approximately one month after the first interview and three days after the last. Before starting the task, Designer 1 filled in the PANAS. He was then instructed to infer, as accurately as possible, what the musician was thinking or feeling at each point where the musician had reported a thought or feeling, and to infer the emotional valence of that entry. At this point, Designer 1 was told that each pause in the video corresponded to an entry annotated by the musician, since knowing this was a crucial requirement for this phase.
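Because the designers also inferred the valence of each entry, the valence judgements can be scored separately from the content ratings. The sketch below is a hypothetical helper, not an analysis reported here: it computes the proportion of exact valence matches and a confusion count of reported against inferred valence, assuming the two lists are already aligned by pause point.

```python
from collections import Counter

VALENCES = ("positive", "neutral", "negative")


def valence_agreement(reported, inferred):
    """Return (proportion of exact valence matches, confusion counts).

    reported[i] and inferred[i] are the valences recorded at the same pause point.
    """
    if len(reported) != len(inferred) or not reported:
        raise ValueError("need one inference per reported entry")
    for v in reported + inferred:
        if v not in VALENCES:
            raise ValueError(f"unknown valence: {v!r}")
    # Counts (reported, inferred) pairs; missing pairs read as 0.
    confusion = Counter(zip(reported, inferred))
    hits = sum(confusion[(v, v)] for v in VALENCES)
    return hits / len(reported), confusion
```

A raw match rate should be read against a chance baseline, which depends on how often each valence category occurs in the musicians’ reports.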

3.3.4 The standard stimulus paradigm: Designer 2’s phase

Designer 2 completed the same task approximately five months after the last interview, without knowing the content reported by the musicians or by Designer 1. Unlike Designer 1, Designer 2 had no direct contact with any of the musicians. Instructions were presented in printed form and answers were registered on the ‘inferred thoughts or feelings’ response sheet (see Appendixes 2B and 2D for examples of the instructions and response sheets).

3.3.5 Assessing the similarity of contents

Fourteen native speakers of English with at least a completed undergraduate education were recruited to rate the similarity between the remembered thoughts and feelings and the inferred thoughts and feelings of each designer (eight raters for Designer 1 and six for Designer 2). Following Ickes’ protocol (2001), similarity was assessed on a three-point Likert scale ranging from 0 to 2. Raters assigned 0 if the paired entries had ‘essentially different content’, 1 if they had ‘somehow similar, but not the same content’ and 2 if they had ‘essentially the same content’. Six examples (two for each possible rating) were presented along with the instructions to clarify the meaning of each value. The raters were presented with the five pairs of lists of mental content in a randomised order. Reliability analysis followed Ickes’ procedure: each rater was treated as a questionnaire item and each entry score as a questionnaire response, and Cronbach’s alpha was calculated for each interview. Nunnally’s (1967) reliability criterion of .70 was used to assess the reliability of the obtained scores. Instructions were presented in digital form, and answers were registered likewise (see Appendix 2E for an example of the instructions and response sheets).
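The reliability step described above (raters as items, entries as responses) corresponds to the standard Cronbach’s alpha formula, alpha = k/(k - 1) * (1 - sum of rater variances / variance of the summed scores), where k is the number of raters. A minimal sketch follows; the function name and input layout are assumptions. Population variances are used throughout, which gives the same alpha as sample variances because the n/(n - 1) factors cancel.

```python
from statistics import pvariance


def cronbach_alpha(scores_by_rater):
    """Cronbach's alpha treating raters as items and entries as respondents.

    scores_by_rater[r][e] is rater r's 0-2 similarity score for entry e.
    """
    k = len(scores_by_rater)
    if k < 2:
        raise ValueError("need at least two raters")
    n = len(scores_by_rater[0])
    if any(len(r) != n for r in scores_by_rater):
        raise ValueError("every rater must score every entry")
    # Sum of each rater's variance across entries.
    item_vars = sum(pvariance(r) for r in scores_by_rater)
    # Variance of the per-entry totals summed over raters.
    totals = [sum(r[e] for r in scores_by_rater) for e in range(n)]
    total_var = pvariance(totals)
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return k / (k - 1) * (1 - item_vars / total_var)
```

An alpha computed this way for each interview would then be compared with the .70 criterion.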

3.3.6 The designer’s self-rated performance in regard to the dyadic interaction paradigm

After the dyadic interaction task, Designer 1 was asked to rate how well he thought he had completed the task on a single-item 10-point Likert scale. Since Designer 2 was aware of the results of Designer 1’s self-rated performance, he did not self-rate his performance.

3.3.7 An empathy map and ideas for improvements: The designers’ phase

The designers were asked to create an empathy map to summarise and synthesise the key insights they could identify after participating in and rewatching the interviews. The empathy map was a modified version of Both and Baggereor’s map (no date). Although this design tool contains four quadrants (i.e., ‘say’, ‘do’, ‘think’ and ‘feel’), only ‘think’ and ‘feel’ were used in this study so that this design outcome could be compared with the empathic accuracy scores. However, the thoughts and feelings in an empathy map differ from those in an empathic accuracy task: the empathy maps contained general judgements about what the users might be thinking or feeling, were completed after the designers had watched the interviews and were not tied to specific moments in time. The designers also generated ideas for new and/or improved accessories for the musicians, which they listed in a text after completing the empathy map task. They were encouraged to complete both tasks as if they were part of a professional design project. Together, the two tasks were intended to roughly mimic the next steps in a real design case: synthesising user understanding and generating initial ideas for further development. The designers took roughly 30 minutes per interview to complete this phase. Instructions were presented in printed form and answers were registered on a standard response sheet (see Appendix 3).

3.3.8 The empathy map and ideas for improvements: Rating the empathy map and ideas for improvements

Insights from the empathy maps and the lists of ideas for improvements suggested by both designers were sent back to the musicians for rating. This was to simulate the design-process step of coming back to the user to obtain direct feedback on the initial ideas. The musicians used a five-point Likert scale to rate how close every insight in the empathy map’s thoughts/feelings was to their experience as users. Similarly, the musicians rated the relevance of the proposed ideas using a five-point Likert scale, based on what they discussed during the interview. After completing these tasks, the musicians were fully debriefed about the aims of the study.

3.4 Materials

3.4.1 Data Logger

EMG data were collected using the Biomonitor ME6000, a portable 16-channel telemetry and data-logging system. The system can collect several types of data, including EMG, GSR and ECG. EMG electrodes were placed over the left corrugator supercilii and the left zygomaticus major muscles of each dyad member.

3.4.2 FSenSync (Förger Analytics)

FSenSync, a freely available software package, was used to synchronise the recording and streaming of the measured data. The software supports real-time streaming, recording, note-taking, synchronisation of sensor units and compensation for slight clock drifts that may occur during recording.

3.4.3 Video recording

Interviews were video recorded using Android phones running a video recording application synchronised with the FSenSync software. The phone cameras were placed at approximately the eye level of the interviewer and the musicians. The aim was to capture, as closely as possible, a frontal view of each member of the dyad. The participants were framed from their seats upwards to ensure that their hands and faces were visible at all times.

3.5 Data processing

3.5.1 An aggregated index of empathic accuracy

An empathic accuracy score was calculated for the performance of both designers in each interview following Ickes’ procedure (Ickes 2001). First, an average accuracy score was calculated for each entry. Second, a total index score was calculated by summing the average scores of all entries. Third, the total index score was divided by the number of entries in each dyad in order to ‘yield an index of the proportion of accuracy points relative to the total number of accuracy points possible’ (p. 232). Fourth, these indices were converted to percentages by dividing them by two (the maximum possible score per entry) and multiplying them by 100.
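The four steps above can be sketched as a short function. This is a minimal illustration assuming one dyad’s ratings are held in a matrix with one row per entry and one column per rater (each score 0, 1 or 2); the function name and toy data are hypothetical.

```python
import numpy as np

def empathic_accuracy_index(scores):
    """Percentage-scaled empathic accuracy index for one dyad.

    scores: (entries x raters) matrix of 0-2 similarity ratings.
    """
    scores = np.asarray(scores, dtype=float)
    entry_means = scores.mean(axis=1)        # step 1: average score per entry
    total = entry_means.sum()                # step 2: total index score
    proportion = total / len(entry_means)    # step 3: divide by the number of entries
    return proportion / 2 * 100              # step 4: divide by 2 (max score), times 100

toy = [[2, 2], [1, 1], [0, 1]]   # 3 entries rated by 2 raters (illustrative only)
print(round(empathic_accuracy_index(toy), 1))  # 58.3
```

Because the maximum score per entry is 2, a designer whose every entry was rated ‘essentially the same content’ by all raters would obtain 100%.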

3.5.2 Electromyography preprocessing

The EMG signal was bandpass filtered at 20–400 Hz. A fast Fourier transform with a 1 s Hanning window and 0.5 s overlap was applied to the filtered data in order to calculate power spectral density estimates (van Reekum et al. 2010; Lapate et al. 2014; Golland et al. 2018). The estimates were averaged and z-transformed to account for between-subject variations in amplitude.
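A preprocessing chain of this kind can be sketched with SciPy. The sketch rests on assumptions the text does not specify and that are ours, not the authors’: a sampling rate of 1000 Hz, a 4th-order zero-phase Butterworth band-pass filter, and averaging of the power spectral density over the 20–400 Hz bins of each window. The function name is hypothetical.

```python
import numpy as np
from scipy import signal

def emg_power_zscores(raw, fs=1000.0):
    """Band-pass filter raw EMG, estimate windowed PSD and z-score the band power.

    Assumptions (not stated in the study): fs = 1000 Hz, 4th-order zero-phase
    Butterworth filter, PSD averaged over the 20-400 Hz bins of each window.
    """
    sos = signal.butter(4, [20, 400], btype="bandpass", fs=fs, output="sos")
    filtered = signal.sosfiltfilt(sos, raw)

    nperseg = int(1.0 * fs)                      # 1 s Hann window
    freqs, times, psd = signal.spectrogram(
        filtered, fs=fs, window="hann",
        nperseg=nperseg, noverlap=nperseg // 2,  # 0.5 s overlap
        scaling="density", mode="psd")

    band = (freqs >= 20) & (freqs <= 400)
    power = psd[band].mean(axis=0)               # one band-power value per window
    z = (power - power.mean()) / power.std()     # within-subject z-transform
    return times, z

rng = np.random.default_rng(0)
t, z = emg_power_zscores(rng.standard_normal(10_000))  # 10 s of synthetic noise
print(len(t))  # 19 windows at a 0.5 s step
```

The z-transform is applied per recording, so each participant’s series has mean 0 and unit standard deviation, which is what makes amplitudes comparable between subjects.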

3.5.3 Rating emotional valence

The users’ reported emotional valences were compared with the designers’ estimates of them. Each entry on which they coincided was scored as 1, and each entry on which they differed was scored as 0.
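The valence comparison reduces to a match rate. The sketch below assumes valences are coded as ‘-’, ‘0’ and ‘+’ (as in the notes to Table 3), that a mismatch scores 0, and that the result is expressed as a percentage of entries; the function name is hypothetical.

```python
def valence_agreement(reported, inferred):
    """Percentage of entries where the designer's inferred valence matches the user's.

    Each match scores 1 and each mismatch 0; valences are coded '-', '0', '+'.
    """
    if len(reported) != len(inferred):
        raise ValueError("reported and inferred must have the same length")
    matches = [1 if r == i else 0 for r, i in zip(reported, inferred)]
    return 100 * sum(matches) / len(matches)

print(valence_agreement(["+", "0", "-", "+"], ["+", "-", "-", "0"]))  # 50.0
```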

3.5.4 Cross-correlation analysis of muscle activity

The maximum cross-correlation within a $\pm$5 s lag was calculated for every 10 s event window to determine the similarity between the designer’s and the user’s facial expressions during an event.
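A lagged cross-correlation of this kind can be sketched as follows. We assume (this is not stated in the text) that the z-scored EMG power series from the preprocessing step are sampled every 0.5 s, so a 10 s event window holds 20 samples and a $\pm$5 s lag corresponds to $\pm$10 samples. The correlation at each lag is the Pearson coefficient over the overlapping samples; the function name is hypothetical.

```python
import numpy as np

def max_lagged_correlation(x, y, max_lag):
    """Maximum Pearson correlation between x and y over lags in [-max_lag, max_lag].

    A positive lag pairs x[t + lag] with y[t], i.e. x trails y by `lag` samples.
    Returns (best_r, best_lag).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if len(x) != len(y):
        raise ValueError("x and y must be the same length")
    best_r, best_lag = -np.inf, 0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[lag:], y[:len(y) - lag]
        else:
            a, b = x[:lag], y[-lag:]
        if len(a) < 3 or a.std() == 0 or b.std() == 0:
            continue  # too little overlap or a flat segment
        r = np.corrcoef(a, b)[0, 1]
        if r > best_r:
            best_r, best_lag = r, lag
    return best_r, best_lag

# Hypothetical 10 s event window sampled every 0.5 s -> 20 samples; +/-5 s -> 10 samples.
rng = np.random.default_rng(1)
base = rng.standard_normal(26)
user = base[3:23]       # the user's expression leads ...
designer = base[0:20]   # ... the designer's by 3 samples (1.5 s)
r, lag = max_lagged_correlation(designer, user, max_lag=10)
print(f"r = {r:.2f} at lag {lag}")  # r = 1.00 at lag 3
```

Taking the maximum over lags lets a designer’s smile that follows the user’s by a second or two still count as synchronous, which a zero-lag correlation would miss.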

3.5.5 Correlation of EMG and empathic accuracy

To test whether physiological synchrony between the designers and the users was related to the designers’ empathic accuracy, a Pearson correlation coefficient was calculated between the zygomaticus major (the ‘smile muscle’) synchrony values and the empathic accuracy scores for each of the events at which a thought or feeling was reported (117 entries).

3.5.6 Interpretation of effect sizes

Effect sizes (r or rho) were interpreted according to Cohen’s criteria (see Ellis 2010): small effect $=.10$, medium effect $=.30$ and large effect $=.50$.
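The event-level correlation and its interpretation can be sketched together. The sketch computes Pearson’s r with NumPy (e.g. between the 117 synchrony values and the matching empathic accuracy scores) and labels its magnitude with Cohen’s benchmarks. The label ‘negligible’ for values below .10 is our own convention, and both function names are hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])

def cohen_label(r):
    """Label |r| using Cohen's benchmarks (.10 small, .30 medium, .50 large)."""
    size = abs(r)
    if size >= .50:
        return "large"
    if size >= .30:
        return "medium"
    if size >= .10:
        return "small"
    return "negligible"   # below Cohen's 'small' benchmark (our label)

for value in (-.06, .43, -.80):
    print(value, cohen_label(value))
```

The sign is ignored when labelling, so a strong negative correlation is still a large effect, only in the unexpected direction.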

4 Results

Our data consist of five video interviews of about 30 minutes each. We used eight channels to collect physiological data from our participants, including GSR, EMG and ECG data (although here we report only the EMG results). In addition, the musicians reported 117 remembered thoughts and feelings, and both designers provided the corresponding inferences. The two designers reported a total of 169 thoughts and feelings on the empathy maps and 43 ideas for improvements, whose relevance was assessed by the musicians.

4.1 Controlling for change in the emotional state of users

A Wilcoxon signed-rank test indicated that the mean ranked emotional states of the musicians did not differ significantly between the beginning of the interview and the beginning of Ickes’ empathic accuracy task on either the positive mood subscale ($z=-1.83$, $p=.07$, $r=-.58$) or the negative mood subscale ($z=-0.37$, $p=.72$, $r=-.03$). Thus, the musicians felt similarly at both time points, which makes it less likely that their performance on the empathic accuracy task was affected by mood changes. Controlling for the musicians’ mood state was important given that the experiment lasted approximately four hours. For the designers, the experiment was much shorter (approximately 1 h 30 min). We assumed that fatigue would not have a noticeable detrimental effect on the designers’ performance and therefore did not control for their mood changes. However, factors other than fatigue could have affected the designers’ performance, so controlling for their mood would have been advisable.

4.2 How accurately can designers understand a group of musicians?

4.2.1 The inter-rater reliability of the similarity-of-content scores

Table 1 summarises the inter-rater reliability of the external raters’ assessments of the similarity of content between the users’ remembered thoughts and feelings and both designers’ inferred thoughts and feelings. All reliability values were above Nunnally’s criterion of .70.

Table 1. The inter-rater reliability of the assessment of the similarity of content

Note: User 1, entries $=$ 45; User 2, entries $=$ 18; User 3, entries $=$ 15; User 4, entries $=$ 17; User 5, entries $=$ 22. Designer 1 was rated by eight external raters; Designer 2 by six external raters. SEM $=$ Standard Error of Measurement.

4.2.2 The designers’ empathic accuracy score

The designers’ aggregated index of empathic accuracy, self-rated accuracy when performing the empathic accuracy task (Designer 1 only) and the percentage of correctly identified user emotional valence are summarised in Table 2. We tested whether the designers’ empathic accuracy differed significantly using three Mann–Whitney tests: (1) the designers received similar scores from the external raters: $U=6403.00$, $N=234$, $z=-0.856$, $p=.39$, $r=-.06$; (2) the designers had similar aggregated indices of empathic accuracy: $U=6.00$, $N=10$, $z=-1.36$, $p=.18$, $r=-.43$; and (3) the designers performed similarly when identifying the users’ emotional valence: $U=5.00$, $N=10$, $z=-1.58$, $p=.12$, $r=-.50$.
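The r values reported alongside these rank-based tests appear consistent with the common conversion $r = z/\sqrt{N}$. The sketch below applies that conversion to the z and N values quoted for the three Mann–Whitney comparisons; the function name is hypothetical and we do not claim this was the authors’ exact computation.

```python
import math

def effect_size_r(z, n):
    """Effect size r = z / sqrt(N) from the normal approximation of a rank-based test."""
    return z / math.sqrt(n)

# z and N values reported for the three Mann-Whitney comparisons above.
for z, n in [(-0.856, 234), (-1.36, 10), (-1.58, 10)]:
    print(f"z = {z}, N = {n}: r = {effect_size_r(z, n):.2f}")
```

Rounded to two decimals, this yields $r=-.06$, $-.43$ and $-.50$, matching the values above.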

Table 2. The overall designers’ empathic accuracy scores

Note: Self-efficacy ranged from 1 to 10, here rescaled to percentage for ease of comparison.

4.2.3 Examples of remembered and inferred thoughts and feelings

The 117 entries obtained from the five interviews were assessed by naive raters with scores from 0 to 2, ranging from essentially different content to essentially the same content. Here we present examples of high-, mid- and low-accuracy performance for both designers, as well as the emotional valence remembered by the musicians and inferred by the designers.

4.2.4 The development of empathic accuracy over time

To test whether the designers’ empathic accuracy developed over time, we performed a Wilcoxon signed-rank test comparing the empathic accuracy scores obtained during the first and last 10 minutes of all interviews. Interview time did not affect either designer’s empathic accuracy. Designer 1’s empathic accuracy for the first 10 minutes ($n=49$, $Mdn=0.88$, $SD=.58$) and last 10 minutes ($n=37$, $Mdn=1.13$, $SD=.56$) did not increase as the interviews progressed ($N=74$, $z=-.07$, $p=.95$, $r=-.01$), even when the third user was excluded from the analysis (because of the shorter duration of that interview): $N=71$, $z=-.21$, $p=.84$, $r=-.02$. Similarly, a second Wilcoxon signed-rank test compared the empathic accuracy scores obtained during the first 10 minutes ($n=49$, $Mdn=1.00$, $SD=.59$) and last 10 minutes ($n=37$, $Mdn=1.00$, $SD=.61$) of all the interviews watched by Designer 2. Likewise, it showed no significant change ($N=74$, $z=-.17$, $p=.87$, $r=-.02$), even when the third user was excluded from the analysis: $N=71$, $z=-.46$, $p=.65$, $r=-.01$.

Table 3. Examples of high-, mid- and low-empathic accuracy

Note: D1 $=$ Designer 1; D2 $=$ Designer 2. Negative valence ($-$), neutral valence (0) and positive valence ($+$).

4.2.5 Design task scores

Table 4 summarises the scores obtained by the designers in the three design tasks. The scores given by each musician were transformed into a percentage for ease of interpretation.

Table 4. The designers’ performance in three design tasks

4.2.6 Examples of ‘empathy map: thoughts’ outcomes

The designers completed the ‘think’ quadrant of Both and Baggereor’s (no date) modified empathy map, synthesising thoughts from each interview and listing them under the ‘think’ quadrant. The thoughts gathered from all the interviews were grouped into five categories. Table 5 presents two examples per category together with the score a musician gave each thought. The musicians were asked to rate the thoughts in terms of how representative they were of their own experiences as users: 1 $=$ very far from the user’s experiences and 5 $=$ very close to the user’s experiences.

Table 5. ‘Empathy map: thoughts’: categories, examples and the assigned scores

Note: D1 $=$ Designer 1; D2 $=$ Designer 2.

4.2.7 Examples of ‘empathy map: feelings’ outcomes

The designers completed the ‘feel’ quadrant of Both and Baggereor’s (no date) modified empathy map, synthesising feelings from each interview and listing them under the ‘feel’ quadrant. The feelings gathered from all the interviews were grouped into three categories. Table 6 presents two examples per category together with the score a musician gave each feeling. The musicians were asked to rate the feelings in terms of how representative they were of their own experiences as users: 1 $=$ very far from the user’s experiences and 5 $=$ very close to the user’s experiences.

Table 6. ‘Empathy map: feelings’: categories, examples and the assigned scores

Note: D1 $=$ Designer 1; D2 $=$ Designer 2.

Table 7. Examples of ideas for improvements

Note: D1 $=$ Designer 1; D2 $=$ Designer 2.

4.2.8 An example of ideas for improvements

We present four of the 43 ideas for improvements suggested by the designers (Table 7). The users were also asked to provide a justification for their scores.

4.3 Does the designers’ empathic accuracy in regard to the musicians positively correlate with design outcomes?

Spearman’s correlation analyses between the designers’ empathic accuracy scores and their performance on the three design outcomes (i.e., empathy map: thoughts, empathy map: feelings and ideas for improvement) showed medium to large effect sizes (Table 8). However, the direction of the correlations varied, being positive in some cases and negative in others, and none of the correlations was significant.

We also explored whether the designers’ valence-recognition accuracy (i.e., how correctly they identified whether the emotional tone of a user’s entry was positive, neutral or negative) was related to their performance on the design outcomes (Table 9). These Spearman’s correlation analyses showed smaller effect sizes than those displayed in Table 8, with the exception of Designer 1’s large correlation between valence recognition and ideas for improvement ($\rho=-.80$, $p=.10$). As in the previous analysis, the direction of the correlations was sometimes positive and sometimes negative, and all correlations were non-significant.
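With only a handful of paired observations per designer, a large rho can still be non-significant. The sketch below illustrates this under an assumption of ours: one pair of scores per interview ($n=5$). It computes Spearman’s rho as the Pearson correlation of average ranks and a two-sided p-value from the t approximation $t=\rho\sqrt{(n-2)/(1-\rho^2)}$ with $n-2$ degrees of freedom. The data and function names are hypothetical.

```python
import math
import numpy as np
from scipy import stats

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of average ranks (handles ties)."""
    rx, ry = stats.rankdata(x), stats.rankdata(y)
    return float(np.corrcoef(rx, ry)[0, 1])

def spearman_p_two_sided(rho, n):
    """Two-sided p-value from t = rho * sqrt((n - 2) / (1 - rho**2)), df = n - 2."""
    if abs(rho) >= 1:
        return 0.0
    t = rho * math.sqrt((n - 2) / (1 - rho ** 2))
    return float(2 * stats.t.sf(abs(t), df=n - 2))

# Five hypothetical interview-level pairs (not the study's data).
ea_scores = [1, 2, 3, 4, 5]
design_scores = [4, 5, 2, 3, 1]
rho = spearman_rho(ea_scores, design_scores)
print(round(rho, 2), round(spearman_p_two_sided(rho, n=5), 3))
```

Here rho is $-.80$ and p is about .10, so even a correlation this large does not reach $p<.05$ with five pairs.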

Table 8. The correlation matrix for empathic accuracy scores and design outcomes

Note: D1 $=$ Designer 1; D2 $=$ Designer 2.

4.4 Does the similarity of the emotional facial expressions of designers and musicians correlate with the designer’s empathic accuracy?

We found relatively few instances of frowning (activation of the corrugator supercilii muscle) in the dataset, which probably reflects the predominantly positively valenced emotions experienced by the participants during the interviews. Therefore, we focused our analysis of emotional facial expressions on smiling (activity of the zygomaticus major muscle). A correlation analysis between the synchrony of Designer 1’s and the users’ zygomaticus major signals (117 events) and the corresponding empathic accuracy scores (i.e., the scores for all thoughts and feelings across all interviews) did not reveal any relationship (see Figure 2): $p=.51$, $r=-.06$. Similarly, when the same analysis was repeated for Designer 2 and the musicians, no correlation was observed: $p=.76$, $r=-.03$. This suggests that similarity in emotional expression during an event was unrelated to how accurately the designers inferred what the musicians were thinking during that event. We also checked whether the overall activation levels of either muscle (rather than their synchrony), in either the designer or the musician, were associated with the empathic accuracy scores, but all these correlations were also close to zero.

Table 9. The correlation matrix for valence-recognition accuracy scores and design outcomes

Note: D1 $=$ Designer 1; D2 $=$ Designer 2.

Figure 2. Scatter plots of the zygomaticus major muscle’s EMG synchrony and event-based empathic accuracy scores. The blue dots represent the 117 events collected from the five musicians completing the empathic accuracy task. Left: Designer 1; right: Designer 2.

5 Discussion

This study is an initial attempt to rigorously test whether empathy translates into improved design outcomes. We addressed three questions concerning empathic accuracy during dyadic interaction: first, how accurate a designer’s empathic performance is; second, whether the designer’s empathic accuracy in regard to the musicians’ thoughts and feelings correlates positively with the design outcomes; and third, whether the similarity of the emotional facial expressions of the designer and the users correlates with the designer’s empathic accuracy. We found that both designers correctly identified about 50% of the users’ reported mental content. We obtained small to large correlations between the designers’ empathic accuracy and their performance in the design outcome tasks, although the inconsistent direction of the correlations, the lack of statistical significance and the small sample size all limit the interpretation of these results. The analysis of physiological synchrony and empathic accuracy revealed near-zero correlations. Although our results are based on only two designers and five users, we collected a considerable amount of data from them; our results therefore provide important initial information for future research.

5.1 How accurately can the designers understand the group of musicians?

On average, the designers correctly inferred 50% of the mental content reported by five professional musicians. Remarkably, the second designer received scores similar to those of the interviewer even though he was exposed to the users only through video recordings. This accuracy of about 50% is considerably higher than that found in earlier studies, which have reported accuracies of 20% to 30% (Ickes & Hodges 2013; Stueber 2013). For instance, Stinson and Ickes (1992) found that after a casual six-minute interaction, pairs of male strangers had a mean accuracy score of 24%, whereas pairs of male friends scored 36%. Marangoni et al. (1995) found quite similar accuracy scores (23–34%) during psychotherapy sessions. When detecting emotional valence (i.e., whether the inferred thought or feeling had a positive, neutral or negative valence), the designers obtained scores below 50%. Previous studies do not report participants’ correct identification of emotional valence, so this result is hard to interpret in the light of earlier work.

The empathic accuracy scores assigned to both designers by two different groups of naive raters were highly reliable and clearly above the minimum standard (.70; Nunnally 1967). Similarly high reliability values have previously been reported by Ickes (1993), which suggests that this rating system (Ickes 1993, 2001) is suitable for future studies.

Why, then, did the designers in our study obtain higher scores than those in previous studies? The musical background of Designer 1 did not seem to give him an advantage over Designer 2, and accuracy did not improve as the interviews progressed. The high empathic accuracy of both designers may instead reflect the semi-structured interview context. The interview had the specific aim of exploring the musicians’ experiences with reeds and accessories, and verbal communication was complemented by demonstrations with real objects. In contrast, the unstructured and unexpected conversations (Stinson & Ickes 1992) and psychotherapy sessions (Marangoni et al. 1995) studied earlier dealt with more abstract topics and presumably did not include objects that helped one to understand the interviewees’ point of view. Thus, in our study the range of the users’ possible mental content was considerably narrower and more concrete, making the designers’ identification task easier.

Our results also suggest that these outcomes can be attributed to the concrete design situation in which one is trying to understand a user, rather than to the designers’ trait empathy (being more or less empathic). Extensive social psychology research shows that new circumstances (like a designer interviewing a group of musicians about reeds for the first time) have a greater influence on people’s behaviour than their trait characteristics (Ross & Nisbett 2011). Similarly, empathic accuracy research suggests that we are poor judges of our own capacity to infer someone else’s mental content (Ickes 2003; Stueber 2018), which aligns with previous social psychology research on the influence of specific situations on behaviour. The empathic accuracy method is a performance-based method for measuring the understanding between two or more individuals in a very specific situation. Therefore, it is not a trait measure of a designer.

Designer 1’s self-rated performance on the dyadic interaction paradigm differed considerably from his actual empathic accuracy. This could be because he was asked how well he thought he had completed the task, whereas asking how accurately he had inferred each musician’s thoughts and feelings would have been more relevant. However, even then a designer would be likely to overestimate her or his actual empathic skills; previous studies have shown that people have such a tendency (Levenson & Ruef 1992; Ickes & Hodges 2013; Stueber 2013). How would self-rated empathic accuracy differ between professional designers and non-designers? Would the shared educational background of designers result in higher or lower confidence in their empathic skills compared with non-designers?

Another relevant, although expected, finding was that the designers’ empathic accuracy did not improve over time. Marangoni et al. (1995) showed that when respondents to the standard dyadic interaction paradigm were given immediate feedback on the target person’s actual thoughts and feelings, their empathic accuracy increased, an improvement not found in a control group that received no feedback. In the present study, our designers’ performance did not differ significantly between the beginning and the end of the interviews. We wonder whether a designer could increase her or his empathic accuracy towards a user if provided with immediate feedback, which would help the designer to understand the user’s context and experience (Kouprie & Sleeswijk Visser 2009; Smeenk, Sturm & Eggen 2017). Future studies could compare the empathic accuracy of designers and non-designers watching the same contextual interviews and test whether design training translates into different outcomes.

Overall, the dyadic interaction paradigm can give designers additional insights into users’ mental content. By asking users to report in great detail what they were thinking or feeling and to assign an emotional valence to this content, designers gain a more precise method for tracing user experiences. In addition, the dyadic interaction paradigm allows one to compare how similar the users’ remembered mental content is to the content inferred by a designer.

5.2 Does the designers’ empathic accuracy in regard to the musicians positively correlate with design outcomes?

It remains inconclusive whether the designers’ empathic accuracy with regard to the musicians positively correlated with the design outcomes. The correlations between the designers’ empathic accuracy scores and their performance on the empathy map and ideas for improvement tasks were medium to strong but entirely non-significant, and several factors limit their interpretation. First, the obtained values followed unpredictable directions: some correlations were positive, as expected, whereas others were unexpectedly negative. For instance, for Designer 1, the ‘think’ task of the empathy map had a strong correlation with the empathic accuracy scores and was thus closer to the predicted results. However, this pattern was not found for the ‘feel’ task of the empathy map. A similar interpretation applies to the ideas for improvement task. Although the correlation between empathic accuracy and the accurate identification of ideas for improvements was very strong, it was negative, implying that the higher the empathic accuracy, the lower the accurate identification of ideas for improvements. The correlations observed for Designer 2 are similarly difficult to interpret. However, with the exception of the ‘think’ task, his correlations were positive, approaching our prediction. Interestingly, even though his only contact with the users was through the videoed interviews, he obtained medium to large correlations similar to those of Designer 1. Perhaps this suggests that a videoed interview can communicate enough information to perform some design tasks.

Another factor that makes these effect sizes difficult to interpret is possible rating bias, along with the limitations of the design task surveys completed by the users. Perhaps the musicians were biased when rating the designer whom they most likely knew to be the person who had interviewed them (Dell et al. 2012). However, similarly high scores were given to Designer 2 (note the overall high scores obtained by both designers in Table 4), who had no physical contact with the users. Thus, it could simply be that the ideas proposed by the experienced designers were genuinely well received by the users. It is also possible that the design tasks used in this study were problematic. Perhaps choosing only two quadrants from the empathy map deprives it of its full utility. Similarly, the high ratings of the list of ideas for improvement could also be explained by biased users. How to properly quantify design outcomes remains an open question. We chose a Likert scale response format to rate the design tasks, which are not usually evaluated in this way. For example, empathy maps are used as a synthesising and visualisation tool, but we do not know whether users are ever asked to rate the quality of an empathy map’s contents. Although these results suggest that the dyadic interaction paradigm or the standard stimulus paradigm might not be the best approach for capturing how empathy translates into improved design outcomes, it is too early to draw such a conclusion. Therefore, our assumption that a designer’s empathic accuracy performance translates into improved design outcomes must be retested along the lines described in the previous paragraphs.

5.3 Does the similarity of the emotional facial expressions of the designers and musicians correlate with the designers’ empathic accuracy?

The similarity of the emotional facial expressions of the designers and users was not at all related to how accurately the designers inferred the thoughts or feelings of the users across the 117 entries. This null result could be due to at least two reasons.

First, the synchronisation of a specific facial muscle (zygomaticus major) did not explain empathic accuracy in this study, which tentatively suggests that the task of inferring and reporting the thoughts and feelings of others is not helped by prosocial and probably unconscious mirroring of the other’s facial expressions. However, this result does not rule out that synchronous facial muscle activity or other physiological signals could be crucial for empathic accuracy. Previous studies on social interaction indicate that physiology can be used to test synchronisation between individuals and its effect on different behaviours (Kreibig 2010; Quintana & Heathers 2014; Massaro & Pecchia 2019). For instance, one study concluded that the higher the physiological synchrony (calculated from heart rate and electrodermal activity signals) between subjects, the more similar their subjective emotional ratings of a movie they were watching (Golland et al. 2015). The subjects watching the movie shared the same space but did not interact with one another. Therefore, we have to leave open the possibility that synchrony in other physiological signals could prove relevant to understanding the content of others’ minds (see e.g. Levenson & Gottman 1983; Levenson & Ruef 1992; Zaki et al. 2009).

The second reason why facial synchrony was unrelated to empathic accuracy scores could be that a strong rapport or the sharing of emotional facial expressions might not be enough to understand the highly specific problems that reed users deal with. Understanding the difficulties related to reeds demands very specialised technical knowledge of acoustics, interpretation, phrasing, reed making, instrument mechanics, etc. Facial synchrony might be more relevant for understanding more emotionally charged topics such as perfectionism or music performance anxiety (Kenny 2011), topics that can elicit a wider range of valence and arousal in subjective experience.

5.4 Limitations and future directions

An evident limitation of the present study is the small number of participants. The low number of musicians interviewed had two causes. First, the measurement session was very long: each session with a musician took approximately four hours and, despite breaks, was very demanding for the participants. Second, despite efforts to recruit more musicians, only five contacted us. The main reason for this is probably that we targeted a very specific group of musicians and thus excluded many who might have been interested in participating. However, the five participants were musicians of a very high performance level and thus ideal users for our design problem.

In addition to controlling for the musicians’ mood state between the different stages of the study, we should have done the same for the designers. Our main reason for not controlling the designers’ mood changes was their comparatively short participation time (i.e., 1 h 30 min per meeting), distributed across different days, so we reasoned that fatigue would not influence their performance. However, other factors could have affected their mood and therefore their performance. Future studies should therefore control more carefully for the mood of all participants at different stages of the experiment. Other factors that might influence performance include, for example, sleep duration, time of day and smoking.

We should clarify our implementation of the dyadic interaction paradigm. In a dyadic interaction paradigm, a member of the dyad is asked to infer the mental content of the other member immediately after their interaction. In the present study, we departed from this convention by asking Designer 1 to infer the thoughts and feelings of the users one month after the first interview and three days after the last. We followed this approach because we thought it better to reserve the inference task for the very end. We worried that asking Designer 1 to complete the inference task after each interview would have exposed him to a crucial part of the study and prompted him to approach the following interviews differently. It remains open whether Designer 1 would have obtained higher accuracy scores than Designer 2 if we had closely followed the dyadic interaction paradigm by asking him to infer the users’ mental content right after each interview.

We also believe that communicating our null results is important to avoid feeding the ‘file drawer problem’ (Rosenthal 1979), that is, the greater likelihood that statistically significant results, rather than null results, are reported (Franco, Malhotra & Simonovits 2014). Given the high demands of our method, researchers who intend to adopt it should be informed of both its potential and its limitations.

It is important to discuss some additional limitations and lines of future work. In this study, we selected an interview as the method of user understanding. However, user understanding is typically built with a wide array of methods, such as multiple interviews, surveys, immersion, iterative prototyping and testing, and probes (Sanders & Stappers 2014; Oygür 2018), rather than from a single interview. As repeated testing of assumptions with users has been linked to design success (Häggman, Honda & Yang 2013), and as designers have been reported to react differently to user-centred information (Sugar 2001; Zoltowski et al. 2012), it would be relevant to investigate whether an empathic accuracy paradigm could capture designers’ ability or tendency to become more accurate over time. In this study, we only tested the empathic accuracy task on the very first interaction between the designers and users. For this interaction, and within the limits of our controlled environment, we tried to recreate a real design case by using a contextual interview and capturing the initial design outcomes through two real-world design tools: the empathy map and idea generation. The resulting outputs are therefore not a final product or prototype but the first elements for further development. Future work could test how these initial outcomes affect later design steps, or could examine empathic accuracy over a more comprehensive design process; this was beyond the scope of this study.

Even though we controlled for the English proficiency of the designers and musicians, some of the musicians expressed doubts about their language competency. Although all of them were able to share their experiences during the interview, the language barrier could have hindered the flow of the interview and limited the full expression of the users’ experiences and emotions.

6 Conclusion

This study was an initial exploration into quantifying the effect of empathy on design outcomes. The initial results presented here are promising and demonstrate the feasibility of the method. We took two separate approaches to quantifying the designers’ understanding of a user. The first was based on the work of Ickes (2001) and Marangoni et al. (1995). A relevant finding was that the two designers correctly inferred about half of the mental contents stated by the five users. Besides this result, we provided a considerable number of examples to illustrate how this method can be used in a design scenario and the type of information it can provide to researchers. The second approach was based on the work of Levenson and Gottman (1983) and Levenson and Ruef (1992). At the moments when the designers made inferences, the synchrony between their facial muscle activity and that of the musicians was not related to the accuracy of those inferences. This does not, however, rule out other physiological signals and their potential role as predictors of design outcomes. Because the empathic accuracy task is performance-based, it can be adapted to the very specific circumstances and problems that designers encounter. Therefore, our results encourage further exploration of a method that could expand our understanding of empathy in design based on the measurement of accuracy.
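The second approach relates designer-musician facial EMG synchrony to event-level accuracy (Figure 2). The study's own preprocessing and synchrony measure are described in its Method section and are not reproduced here, so the following is only a minimal sketch under stated assumptions. It assumes that each event yields two equally long, already preprocessed zygomaticus major EMG envelopes (one per person), that event synchrony is the peak Pearson cross-correlation within a lag window of ±`max_lag` samples, and that synchrony is then related to event-level accuracy scores with Pearson's r. The function names and the lag window are hypothetical.

```python
import math
from statistics import fmean


def pearson(a, b):
    """Pearson correlation between two equally long sequences."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    # Undefined for a constant window; a real pipeline would need to handle this.
    return cov / math.sqrt(var_a * var_b)


def lagged_correlation(x, y, lag):
    """Correlate x[t] with y[t + lag]; a positive lag means y follows x."""
    n = len(x)
    if lag >= 0:
        return pearson(x[:n - lag], y[lag:])
    return pearson(x[-lag:], y[:n + lag])


def event_synchrony(designer_emg, musician_emg, max_lag):
    """Hypothetical synchrony score: peak lagged correlation within +/- max_lag samples."""
    if len(designer_emg) != len(musician_emg):
        raise ValueError("EMG windows must have equal length")
    return max(
        lagged_correlation(designer_emg, musician_emg, k)
        for k in range(-max_lag, max_lag + 1)
    )
```

Across all events, `pearson(synchrony_per_event, accuracy_per_event)` would then give a single event-level association of the kind shown as scatter plots in Figure 2. Taking the peak over a lag window, rather than the zero-lag value, tolerates a listener's expression trailing the speaker's by a short delay; the appropriate window is an assumption here.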

Financial support

This work was supported by the ‘Future Makers’ grant of the Technology Industries of Finland Centennial Foundation and Jane and Aatos Erkko Foundation.

Appendix A. Interview model

Preparations

Main theme

Reeds used in woodwind instruments.

Example questions in no particular order

Execution

Introduction (10–15m)

While attaching electrodes

Interview (20–30m)

While electrodes attached and measuring

Sub-questions are examples for expanding on the participants’ stories.

(1) How long have you been playing [instrument]? Why? (from 0m)
(a) What drew you to playing this instrument?
(b) Have you played any other instruments?
(c) Were there moments when you played less/more?
(2) Do you usually play solo, in a group, or in several groups? Why? (from 3m)
(a) How about when you practice?
(b) How about when taking lessons?
(c) How about when performing?
(3) If you think about preparing to play the [instrument], what do you typically need to do in order to be able to start playing the [instrument]? Why? (from 10m)
(a) Do you need to clean, tune or assemble parts?
(b) What do you need to do after you have finished playing the instrument?
(c) What is most demanding in relation to being able to play?
(4) I am actually quite interested in these reeds. If you think about the reeds you use, what makes one stand out for you, what makes it good? Why? (from 15m)
(a) Do you make your own, or do you have a special supplier?
(b) What other kinds of reeds have you used?
(c) Has your preference in reeds changed over time?
(d) What have been some good and bad experiences for you with reeds?
(5) If you think about your performances, which are your most memorable performances? (from 21m)
(a) What has been an enjoyable performance for you?
(b) What made that an enjoyable performance?
(c) What has been a less enjoyable performance for you?
(d) What made that performance less enjoyable?
(e) If you think about these performances, did the reed influence how enjoyable they were?
(6) I know this has been short, but we’re almost running out of time, so let’s go to the last question. Do you have any other experiences with your instrument that you would like to share? (from 28m)

Closing (10–15m)

While detaching electrodes

Appendix B. Filling in Thoughts or Feelings you Remembered

You will now rewatch the interview. Please stop the recording at the points where you remember having had a specific thought or feeling. Remember, you are asked to write down the thoughts and feelings you remember having had, not new thoughts or feelings that you might have while rewatching the interview.

Under the ‘time’ column, indicate the time on the recording at which you remembered those thoughts or feelings. Report all of the thoughts and feelings you remember having, as accurately, honestly and completely as possible, under the ‘thought or feeling’ column. Please use a different box for each thought or feeling you report. Finally, choose the tone of the emotion you experienced when remembering a specific thought or feeling:

Example of how to record your answers

After you have completed the task, or at any point thereafter, you may delete any thought or feeling entry and any portion of the video recording that you would prefer to remain private.

Thank you very much!

Appendix C. Filling in your Inferred Thoughts or Feelings

You will be shown the interview between the musician and you. Please read through the following instructions. The video will be paused automatically by the researcher. Every time the video is paused, write down what you think the musician was thinking or feeling at that moment by filling in one of these slots. Please use a different box for each thought or feeling you infer. Remember, your task is to make a straightforward inference about what the musician was actually thinking or feeling at each of the stop points in the video. Once you have written your answer, press the space bar to continue, and repeat the process every time the video pauses. Finally, choose the tone of the emotion you think she experienced when having a specific thought or feeling:

Example of how to record your inferences

Thank you very much!
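Appendices B and C both ask for the emotional tone of each thought or feeling, which allows valence recognition to be scored alongside content accuracy. The study's own definition of valence-recognition accuracy (Table 9) is given in its Method section; the sketch below simply assumes one plausible scoring rule, the percentage of stop points at which the designer's inferred tone matches the tone the musician reported. The tone labels and the function name are hypothetical.

```python
from collections.abc import Sequence


def valence_match_rate(reported: Sequence[str], inferred: Sequence[str]) -> float:
    """Hypothetical score: % of stop points where the inferred tone equals the reported tone.

    reported[i] is the musician's tone at stop point i; inferred[i] is the
    designer's guess at the same stop point. Labels are illustrative only.
    """
    if not reported or len(reported) != len(inferred):
        raise ValueError("need one inferred tone per reported tone")
    hits = sum(r == i for r, i in zip(reported, inferred))
    return 100 * hits / len(reported)


# Illustrative call with made-up labels: 2 of 4 tones match.
print(valence_match_rate(
    ["positive", "negative", "neutral", "positive"],
    ["positive", "neutral", "neutral", "negative"],
))  # prints 50.0
```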

Appendix D. Example of User Response Sheet

Appendix E. Example of Designer Response Sheet

Appendix F. Instructions to Rate Similarity Between Thoughts and Feelings

Your task is to compare the written content of the ‘actual thoughts or feelings’ column with that of the ‘inferred thoughts or feelings’ column. Please rate how similar you think they are in terms of content by using the following scale:

2 = essentially the same content.

1 = somewhat similar, but not the same content.

0 = essentially different content.

Next you will see six examples, two for each scoring point. Should you have any questions, please contact the researcher. Thank you!

Example of an Actual Rating Case
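In Ickes' dyadic interaction paradigm, empathic accuracy is commonly summarised as the percentage of the maximum possible similarity points (Ickes 2001). The following is a minimal sketch of that aggregation for the 0/1/2 scale above, assuming that every rater scores every inference and that the raters' scores are averaged per inference before summing. The study's own aggregated index is described in its Method section and may differ in detail; the function name is hypothetical.

```python
from statistics import mean

MAX_POINTS = 2  # highest score on the 0/1/2 similarity scale above


def empathic_accuracy(ratings_by_rater):
    """Percentage of possible similarity points (0-100) for one perceiver.

    ratings_by_rater: one list per rater, each holding a 0, 1 or 2 score
    for every inference, in the same order.
    """
    n_inferences = len(ratings_by_rater[0])
    if n_inferences == 0 or any(len(r) != n_inferences for r in ratings_by_rater):
        raise ValueError("every rater must score the same, non-empty set of inferences")
    if any(score not in (0, 1, 2) for r in ratings_by_rater for score in r):
        raise ValueError("scores must be 0, 1 or 2")
    # Average the raters' scores for each inference, then sum over inferences.
    per_inference = [mean(scores) for scores in zip(*ratings_by_rater)]
    return 100 * sum(per_inference) / (MAX_POINTS * n_inferences)
```

As a hypothetical example, three raters scoring four inferences as `[[2, 1, 0, 2], [2, 1, 1, 2], [1, 1, 0, 2]]` give per-inference means that sum to 5 of 8 possible points, so `empathic_accuracy` returns approximately 62.5. Normalising by the maximum possible points keeps scores comparable between perceivers who made different numbers of inferences.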

Appendix G. Empathy Map and Ideas for Improvements Tasks

Create an empathy map to summarise and synthesise the key insights you came up with after watching the interview. This empathy map should have two columns, called ‘think’ and ‘feel’. Imagine that you are completing this task as part of your professional work.

Feel free to work with the materials provided for you, but please write down your answers on this computer after completing the task.

Additionally, write down the most important features for the instrument that you came up with after watching the interview.

References

Almeida, A., George, D., Smith, J. & Wolfe, J. 2013 The clarinet: How blowing pressure, lip force, lip position and reed ‘hardness’ affect pitch, sound level, and spectrum. J. Acoust. Soc. Am. 134 (3).

Baron-Cohen, S. & Wheelwright, S. 2004 The Empathy Quotient: an investigation of adults with Asperger syndrome or high functioning autism, and normal sex differences. Journal of Autism and Developmental Disorders 34, 163–175.

Bird, G. & Viding, E. 2014 The self to other model of empathy: providing a new framework for understanding empathy impairments in psychopathy, autism, and alexithymia. Neuroscience and Biobehavioral Reviews 47, 520–532.

Battarbee, K., Baerten, N., Hinfelaar, M., Irvine, P., Loeber, S., Munro, A. & Pederson, T. 2002 Pools and satellites: intimacy in the city. In Proceedings of the 4th Conference on Designing Interactive Systems: Processes, Practices, Methods, and Techniques, pp. 237–245. ACM Press.

Both, T. & Baggeroer, D. Bootcamp Bootleg [Online]. Available at: https://dschool.stanford.edu/resources/the-bootcamp-bootleg (Accessed: 15 July 2019). d.school, Hasso Plattner Institute of Design at Stanford.

Brown, T. 2009 Change by Design: How Design Thinking Transforms Organizations and Inspires Innovation. HarperBusiness.

Carr, L., Iacoboni, M., Dubeau, M. C., Mazziotta, J. C. & Lenzi, G. L. 2003 Neural mechanisms of empathy in humans: a relay from neural systems for imitation to limbic areas. Proceedings of the National Academy of Sciences 100 (9), 5497–5502.

Cuff, B. M. P., Brown, S. J., Taylor, L. & Howat, D. J. 2016 Empathy: a review of the concept. Emotion Review 8 (2), 144–153.

Davis, M. H. 1980 A multidimensional approach to individual differences in empathy. JSAS Catalog of Selected Documents in Psychology 10, 85. Available at: http://www.ucp.pt/site/resources/documents/ICS/GNC/ArtigosGNC/AlexandreCastroCaldas/24_Da80.pdf.

Dell, N., Vaidyanathan, V., Medhi, I., Cutrell, E. & Thies, W. 2012 Yours is better!: Participant response bias in HCI. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1321–1330. ACM Press.

Ellis, P. D. 2010 The Essential Guide to Effect Sizes: Statistical Power, Meta-Analysis, and the Interpretation of Research Results. Cambridge University Press.

Ellis, J., Schroder, H., Patrick, C. & Moser, J. 2017 Emotional reactivity and regulation in individuals with psychopathic traits: evidence for a disconnect between neurophysiology and self-report. Psychophysiology 54 (10), 1–12.

Franco, A., Malhotra, N. & Simonovits, G. 2014 Publication bias in the social sciences: unlocking the file drawer. Science 345 (6203), 1502–1505.

FSenSync (Version 101) [Computer software]. Finland: Förger Analytics.

Gates, K. M., Gatzke-Kopp, L. M., Sandsten, M. & Blandon, A. Y. 2015 Estimating time-varying RSA to examine psycho-physiological linkage of marital dyads. Psychophysiology 52, 1059–1065.

Ghosh, D., Olewnik, A., Lewis, K., Kim, J. & Lakshmanan, A. 2017 Cyber-empathic design: a data-driven framework for product design. J. Mech. Des. 139 (9), 1–12.

Golland, Y., Arzouan, Y. & Levit-Binnun, N. 2015 The mere co-presence: synchronization of autonomic signals and emotional responses across co-present individuals not engaged in direct interaction. PLoS One 10 (5).

Golland, Y., Hakim, A., Aloni, T., Schaefer, S. & Levit-Binnun, N. 2018 Affect dynamics of facial EMG during continuous emotional experiences. Biological Psychology 139, 47–58.

Hess, J. L. & Fila, N. D. 2016 The manifestation of empathy within design: findings from a service-learning course. CoDesign 1–2, 93–111.

Hess, J. L., Strobel, J. & Pan, R. 2016 Voices from the workplace: practitioners’ perspectives on the role of empathy and care within engineering. Eng. Stud. 8 (3), 212–242.

Hess, J. L., Strobel, J., Pan, R. & Wachter Morris, C. A. 2017 Insights from industry: a quantitative analysis of engineers’ perceptions of empathy and care within their practice. Eur. J. Eng. Educ. 42 (2), 1128–1153.

Häggman, A., Honda, T. & Yang, M. C. 2013 The influence of timing in exploratory prototyping and other activities in design projects. In Proceedings of the ASME 2013 IDETC/CIE, Charlotte, North Carolina. Available at: doi:10.1115/DETC2016-59417.

Ickes, W. 1993 Empathic accuracy. Journal of Personality 61 (4), 587–610.

Ickes, W. 2001 Measuring empathic accuracy. In Interpersonal Sensitivity: Theory and Measurement (ed. Hall, J. A. & Bernieri, F. J.), pp. 219–241. Erlbaum, Mahwah, NJ.

Ickes, W. 2003 Everyday Mind Reading: Understanding What Other People Think and Feel. Prometheus Books.

Ickes, W., Bissonnette, V., Garcia, S. & Stinson, L. 1990 Implementing and using the dyadic interaction paradigm. In Review of Personality and Social Psychology: Volume 11, Research Methods in Personality and Social Psychology, pp. 16–44. Sage.

Ickes, W. & Hodges, S. D. 2013 Empathic accuracy in close relationships. In Oxford Library of Psychology. The Oxford Handbook of Close Relationships (ed. Simpson, J. A. & Campbell, L.), pp. 348–373. Oxford University Press.

Ickes, W., Stinson, L., Bissonnette, V. & Garcia, S. 1990 Naturalistic social cognition: empathic accuracy in mixed-sex dyads. Journal of Personality and Social Psychology 59 (4), 730–742.

IDEO.org 2015 The Field Guide to Human-Centered Design 2015 [Online]. Available at: http://www.designkit.org/resources/1 (Accessed: 23 May 2018). IDEO.org.

Jackson, P. L., Meltzoff, A. N. & Decety, J. 2005 How do we perceive the pain of others? A window into the neural processes involved in empathy. NeuroImage 24, 771–779.

Kankainen, A., Vaajakallio, K., Kantola, V. & Mattelmäki, T. 2012 Storytelling Group: a co-design method for service design. Behaviour & Information Technology 31 (3), 221–230.

Kenny, D. T. 2011 The Psychology of Music Performance Anxiety. Oxford University Press.

Kleinbub, J. R. 2017 State of the art of interpersonal physiology in psychotherapy: a systematic review. Frontiers in Psychology 8.

Köppen, E. & Meinel, C. 2015 Empathy via design thinking: creation of sense and knowledge. In Design Thinking Research (ed. Plattner, H., Meinel, C. & Leifer, L.), pp. 15–28. Springer International Publishing.

Koskinen, I. & Battarbee, K. 2003 Introduction to user experience and empathic design. In Empathic Design, User Experience in Product Design (ed. Koskinen, I., Battarbee, K. & Mattelmäki, T.), pp. 37–50. IT Press.

Kouprie, M. & Sleeswijk Visser, F. 2009 A framework for empathy in design: stepping into and out of the user’s life. Journal of Engineering Design 20 (5), 437–448.

Kramer, J., Agogino, A. M. & Roschuni, C. 2016 Characterizing competencies for human-centered design. In Proceedings of the ASME 2016 IDETC/CIE, Charlotte, North Carolina. Available at: doi:10.1115/DETC2016-60085.

Kreibig, S. D. 2010 Autonomic nervous system activity in emotion: a review. Biological Psychology 84, 394–421.

Lapate, R. C., Van Reekum, C. M., Schaefer, S. M., Greischar, L. L., Norris, C. J., Bachhuber, D. R. & Davidson, R. J. 2014 Prolonged marital stress is associated with short-lived responses to positive stimuli. Psychophysiology 51 (6), 499–509.

Ledet, D. A. 1981 Oboe Reed Styles, Theory and Practice. Indiana University Press.

Levenson, R. W. & Gottman, J. M. 1983 Marital interaction: physiological linkage and affective exchange. Journal of Personality and Social Psychology 45 (3), 587–597.

Levenson, R. W. & Ruef, A. M. 1992 Empathy: a physiological substrate. Journal of Personality and Social Psychology 63 (2), 234–246.

Light, S. N., Moran, Z. D., Swander, L., Le, V., Cage, B., Burghy, C., Westbrook, C., Greischar, L. & Davidson, R. J. 2015 Electromyographically assessed empathic concern and empathic happiness predict increased prosocial behavior in adults. Biological Psychology 104, 116–129.

Lin, J. & Seepersad, C. C. 2007 Empathic lead users: the effects of extraordinary user experiences on customer needs analysis and product redesign. In Proceedings of the ASME 2007 IDETC/CIE, Las Vegas, Nevada. Available at: doi:10.1115/DETC2007-35302.

Lunkenheimer, E., Tiberio, S. S., Buss, K. A., Lucas-Thompson, R. G., Boker, S. M. & Timpe, Z. C. 2015 Coregulation of respiratory sinus arrhythmia between parents and preschoolers: differences by children’s externalizing problems. Developmental Psychobiology 57, 994–1003.

Marangoni, C., Garcia, S., Ickes, W. & Teng, G. 1995 Empathic accuracy in a clinically relevant setting. Journal of Personality and Social Psychology 68 (5), 854–869.

Marci, C. D., Ham, J., Moran, E. & Orr, S. P. 2007 Physiologic correlates of perceived therapist empathy and social-emotional process during psychotherapy. Journal of Nervous and Mental Disease 195, 103–111.

Massaro, S. & Pecchia, L. 2019 Heart rate variability (HRV) analysis: a methodology for organizational neuroscience. Organizational Research Methods 22 (1), 354–393.

Moreira, D., Azeredo, A. & Barbosa, F. 2019 Neurobiological findings of the psychopathic personality in adults: one century of history. Aggression and Violent Behavior 47, 137–159.

Nagel, J. 2010 Treatment of music performance anxiety via psychological approaches: a review of selected CBT and psychodynamic literature. Medical Problems of Performing Artists 25, 141–148.

Nunnally, J. C. 1967 Psychometric Theory. McGraw Hill.

Oygür, I. 2018 The machineries of user knowledge production. Design Studies 54, 23–49.

Palumbo, R. V., Marraccini, M. E., Weyandt, L. L., Wilder-Smith, O., McGee, H. A., Liu, S. & Goodwin, M. S. 2017 Interpersonal autonomic physiology: a systematic review of the literature. Personality and Social Psychology Review 21 (2), 99–141.

Pang, M. A. & Seepersad, C. C. 2016 Crowdsourcing the evaluation of design concepts with empathic priming. In Proceedings of the ASME 2016 IDETC/CIE, Charlotte, North Carolina. Available at: doi:10.1115/DETC2016-59417.

Preston, S. D. & de Waal, F. B. M. 2002 Empathy: its ultimate and proximate bases. Behavioral and Brain Sciences 25, 1–72.

Postma, C. E., Zwartkruis-Pelgrim, E., Daemen, E. & Du, J. 2012 Challenges of doing empathic design: experiences from industry. International Journal of Design 6 (1), 59–70.

Quintana, D. S. & Heathers, J. A. 2014 Considerations in the assessment of heart rate variability in biobehavioral research. Frontiers in Psychology 5, 1–10.

Rasoal, C., Danielsson, H. & Jungert, T. 2012 Empathy among students in engineering programmes. Eur. J. Eng. Educ. 37 (5), 427–435.

Raviselvam, S., Anderson, D., Hölttä-Otto, K. & Wood, K. L. 2018 Systematic framework to apply extraordinary user perspective to capture latent needs among ordinary users. In Proceedings of the ASME 2018 IDETC/CIE, Quebec City, Quebec, Canada. Available at: doi:10.1115/DETC2018-86263.

Raviselvam, S., Sanaei, R., Blessing, L., Hölttä-Otto, K. & Wood, K. L. 2017 Demographic factors and their influence on designer creativity and empathy evoked through user extreme conditions. In Proceedings of the ASME 2017 IDETC/CIE, Cleveland, Ohio. Available at: doi:10.1115/DETC2017-68380.

Rosenthal, R. 1979 The ‘file drawer problem’ and tolerance for null results. Psychological Bulletin 86 (3), 638–641.

Ross, L. & Nisbett, R. E. 2011 The Person and the Situation: Perspectives of Social Psychology. McGraw-Hill.

Sanders, E. B. N. & Stappers, P. J. 2014 Probes, toolkits and prototypes: three approaches to making in codesigning. CoDesign 10 (1), 5–14.

Singer, T., Seymour, B., O’Doherty, J., Kaube, H., Dolan, R. J. & Frith, C. D. 2004 Empathy for pain involves the affective but not sensory components of pain. Science 303 (5661), 1157–1162.

Soto, J. A. & Levenson, R. W. 2009 Emotion recognition across cultures: the influence of ethnicity on empathic accuracy and physiological linkage. Emotion 9 (6), 874–884.

Shamay-Tsoory, S. G. 2011 The neural bases for empathy. The Neuroscientist 17 (1), 18–24.

Smeenk, W., Sturm, J. & Eggen, B. 2017 Empathic handover: how would you feel? Handing over dementia experiences and feelings in empathic co-design. CoDesign. Available at: doi:10.1080/15710882.2017.1301960.

Smeenk, W., Tomico, O. & Van Turnhout, K. 2016 A systematic analysis of mixed perspectives in empathic design: not one perspective encompasses all. Int. J. Des. 10 (2), 31–48.

Strobel, J., Hess, J., Pan, R. & Wachter Morris, C. A. 2013 Empathy and care within engineering: qualitative perspectives from engineering faculty and practicing engineers. Eng. Stud. 5 (2), 137–159.

Stueber, K. 2018 Empathy. In The Stanford Encyclopedia of Philosophy (ed. Zalta, E. N.) [Online]. Available at: https://plato.stanford.edu/archives/spr2018/entries/empathy/.

Sugar, W. A. 2001 What is so good about user-centered design? Documenting the effect of usability sessions on novice software designers. Journal of Research on Computing in Education 33 (3), 235–250.

Surma-aho, A., Björklund, T. & Hölttä-Otto, K. 2018 Assessing the development of empathy and innovation attitudes in a project-based design thinking course. Research presented at the ASEE Annual Conference and Exposition, Salt Lake City, Utah. Available at: https://www.asee.org/public/conferences/106/papers/21511/view.

Thompson, S. C. 1979 The effect of the reed resonance on woodwind tone production. The Journal of the Acoustical Society of America 66, 1299–1307.

van Reekum, C. M., Schaefer, S. M., Lapate, R. C., Norris, C. J., Greischar, L. L. & Davidson, R. J. 2010 Aging is associated with positive responding to neutral information but reduced recovery from negative information. Social Cognitive and Affective Neuroscience 6 (2), 177–185.

Vaughan, M. R., Seepersad, C. C. & Crawford, R. H. 2014 Creation of empathic lead users from non-users via simulated lead user experiences. In Proceedings of the ASME 2014 IDETC/CIE, Buffalo, New York. Available at: doi:10.1115/DETC2014-35052.

Walther, J., Miller, S. E. & Sochacka, N. W. 2017 A model of empathy in engineering as a core skill, practice orientation, and professional way of being. J. Eng. Educ. 106 (5), 123–148.

Watson, D., Clark, L. A. & Tellegen, A. 1988 Development and validation of brief measures of positive and negative affect: the PANAS scales. Journal of Personality and Social Psychology 54 (6), 1063–1070.

Wong, K., Norris, R. L., Siddique, Z., Altan, C., Baldwin, J. & Merchan-Merchan, W. 2016 Cognitive empathy in design course for a more inclusive mechanical engineering. In Proceedings of the ASME 2016 IDETC/CIE, Charlotte, North Carolina. Available at: doi:10.1115/DETC2016-60382.

Zaki, J. & Ochsner, K. N. 2012 The neuroscience of empathy: progress, pitfalls and promise. Nature Neuroscience 15 (5), 675–680.

Zaki, J., Weber, J., Bolger, N. & Ochsner, K. N. 2009 The neural basis of empathic accuracy. Proceedings of the National Academy of Sciences of the United States of America 106 (27), 11382–11387.

Zoltowski, C. B., Oakes, W. C. & Cardella, M. E. 2012 Students’ ways of experiencing human-centered design. Journal of Engineering Education 101 (1), 28–59.

Figure 1. An overview of the study procedure.

Table 1. The inter-rater reliability of the assessment of the similarity of content

Table 2. The overall designers’ empathic accuracy scores

Table 3. Examples of high-, mid- and low-empathic accuracy

Table 4. The designers’ performance in three design tasks

Table 5. ‘Empathy map: thoughts’: categories, examples and the assigned scores

Table 6. ‘Empathy map: feelings’: categories, examples and the assigned scores

Table 7. Examples of ideas for improvements

Table 8. The correlation matrix for empathic accuracy scores and design outcomes

Table 9. The correlation matrix for valence-recognition accuracy scores and design outcomes

Figure 2. Scatter plots of the zygomaticus major muscle’s EMG synchrony and event-based empathic accuracy scores. The blue dots represent the 117 events collected from the five musicians completing the empathic accuracy task. Left: Designer 1; right: Designer 2.
