1. Introduction
Our ability to perceive emotion is vital to well-being (Milders et al., 2003), impacting our capacity to understand and respond to others’ feelings (Fugate et al., 2018). Emotion perception involves recognising, integrating, and interpreting different cues (Czimskey & Marquardt, 2019; Zupan & Eskritt, 2022), a process at least partially driven by one’s internal awareness and representation of emotion (Ben-David et al., 2011; Czimskey & Marquardt, 2019; Herpertz et al., 2016). In other words, our internal representation of emotion, or conceptual knowledge, helps us both describe our own emotional states and identify emotions in others (Adolphs, 2002; Adolphs et al., 2000; Phillips et al., 2003).
The words used to label emotional states are a key indicator of emotion knowledge (Barrett et al., 2011). Since words are influenced by experience, context and culture (Niedenthal et al., 2004; Strauss & Allen, 2008), how people think about and describe emotion varies widely. Ultimately, the label attached to an emotion is dependent on how it is internally represented in our lexicon (Barsalou, 1999, 2003), including the complexity which characterises the relationships among lexical entries (e.g. Shaver et al., 1987; Storm & Storm, 1987).
Perspectives regarding how the emotion lexicon is structured differ. According to the prototype model (e.g. Shaver et al., 1987), the structure is based on idealised representations (i.e. prototypes) formed from repeated experiences, with some words more central than others (Fehr & Russell, 1984; Hupka et al., 1999). For example, in the fear prototype, anxiety might be more central than upset. This prototype model aligns with the basic and discrete theory of emotion in which all emotions are captured in a core set of classes, for which there exists a linguistic label in every language (Ekman, 1992; Russell et al., 2011). Alternatively, the emotion lexicon might be a taxonomy, organised hierarchically with broad features housed at the superordinate level and finer distinctions captured at each subordinate level (Storm & Storm, 1987). The superordinate level could indicate whether a word carries an emotion or not. Subordinate levels might differentiate valence, intensity, and so on. The characteristics or properties that lead to finer distinctions might differ according to context and people’s previous emotion experiences (Barrett, 2004). This taxonomic perspective is similar to the dimensional theory of emotion in which emotion is characterised by the degree to which various dimensions (e.g. valence, intensity) are activated (Russell, 1980). In other words, the final percept arises from a pattern of cues (Posner et al., 2005). These perspectives have been used to develop standardised lists of both emotional (e.g. cruel, delight, pleasant) and non-emotional (e.g. ambulance, chair, garage) English words that are structured according to the degree to which words represent emotion dimensions (Bradley & Lang, 1999) or according to the discrete emotion categories the words fit within (Stevenson et al., 2007; Strauss & Allen, 2008).
Regardless of the perspective one subscribes to, the emotion word people assign a stimulus is constrained by available response options. Many studies in emotion perception use a forced-choice paradigm in which participants are instructed to select from a list the word that best describes the stimulus. This list is typically limited to words representing four to six basic (universal) emotion categories (i.e. happy, sad, angry, fearful, disgust, surprise) (Airdrie et al., 2018; Grosbras et al., 2018; Hauschild et al., 2020; Melendez et al., 2020; Zupan & Babbage, 2017). This format permits ease of analysis, reproducibility, and comparison across studies but also constrains responses, potentially resulting in artificial and inflated agreement among participants (Barrett et al., 2011; Limbrecht-Ecklundt et al., 2013; Nelson & Russell, 2013) and masking nuances in how people conceptualise and interpret emotion (Turkstra et al., 2017).
The labels people assign when identifying emotion are important because they guide responses to others’ feelings (Torre & Lieberman, 2018), influencing interpersonal relationships (Trampe et al., 2015). In conversation, people use a wider variety of words to describe emotion than the four to six basic ones proposed by Ekman (1992), of which only one is clearly positive in valence. Turkstra et al. (2017) highlight the functional importance of complex (e.g. anxiety, amusement) and social emotions (e.g. pride, guilt) in everyday human interactions. Given the option, people may well use the labels for these emotions, in addition to those for basic emotions, to describe research stimuli. If they are to capture nuances in how people think about and perceive emotion, researchers must consider alternative paradigms and/or response options for examining emotion perception. This is of particular import in research addressing developmental trajectories, individual variability, and cultural differences. Better understanding of the breadth of labels people use to classify emotions can also inform assessment and treatment for clinical populations.
The aim of this project was to test the hypothesis that people will identify words as portraying complex or social emotions if given the opportunity. To do so, we conducted two studies. In the first, we tested our assumptions about the limitations of traditional forced-choice labelling tasks using basic emotion categories. Specifically, we sought to identify how well emotion words, previously generated via a free labelling task (Zupan & Babbage, 2008), fit within a set of basic emotion categories. We used these data to make evidence-based decisions about the stimulus words we asked people to identify in Study 2. In Study 2, we investigated the influence of providing a broader array of response options on emotion categorisation using a sub-set of words from Study 1. Overall, our goal was to contribute to a more nuanced understanding of how people conceptualise and categorise emotion words and phrases. This work extends studies focused on building standardised corpora of emotion words in English (Stevenson et al., 2007; Strauss & Allen, 2008) by expanding the possible response options beyond the basic emotion categories proposed by Ekman (1992). More importantly, this work has implications for developing response formats and scoring guidelines for studies that ask participants to rate and/or classify emotion stimuli.
2. Study 1
2.1. Participants
Participants were recruited from a Canadian university via institutionally approved campus advertisements and invitations posted to course websites. They received either monetary compensation ($10) or course credit for their participation. Only participants who were at least 18 years of age, spoke English as a primary language, and had no uncorrected vision deficits were eligible. Eighty-nine participants (17 males; 72 females), ranging in age from 18 to 54 years (M = 23.49; SD = 6.41), participated. The ratio of males to females in this sample is not balanced, but approximates that typically observed in introductory psychology courses of the type we recruited from (Dickinson et al., 2012). Given the imbalance, we did not conduct gender-based analyses. On average, participants had completed 16.07 (SD = 1.76) years of education. The majority of participants (79.8%) self-identified as Caucasian.
2.2. Stimuli
Emotion words and/or phrases (maximum six words) were generated via an unpublished study (Zupan & Babbage, 2008) in which 82 participants (34 males; 48 females) freely labelled facial (dynamic and static) and vocal emotion expressions. These emotion expressions had been elicited in 10 male and 10 female speakers using stimuli (film clips; novel excerpts; Zupan & Babbage, 2017) equally representing happy, sad, angry, fearful and neutral emotions. During this free labelling task, participants provided a total of 494 unique words/phrases (hereafter referred to as emotion words). Three words were erroneously included twice, resulting in a total of 497 words. As in Strauss and Allen (2008), the words were randomly divided into three lists, each consisting of 166–167 words. Each participant was presented with one list of words. The three duplicate words each appeared in different lists so no participant received any word twice. Ratings for the repeated words were combined in analyses.
2.3. Procedure
The cross-sectional design required that participants individually attend a single 60-minute face-to-face session. Each participant first completed a brief demographic questionnaire. Participants were then sequentially assigned to one of the three word lists in the order in which they entered the study, with 30 participants assigned to each of lists one and two, and 29 to list three. Using Cedrus SuperLab software, words were presented in size 36 black serif font on a white background via a 21.5-inch computer screen. To control for potential order effects, word order was randomised.
Using 9-point Likert scales, participants were prompted to rate each word first on intensity (1 = extremely low; 9 = extremely high), then valence (1 = extremely negative; 9 = extremely positive), then the degree to which the word represented each of five emotion categories (i.e. happy, sad, angry, fearful, neutral); the order of these categories remained consistent for each stimulus. The instructions for these ratings are provided in the Appendix. These five categories are considered distinctive without requiring complex appraisals, thoughts or judgments (Ekman & Cordaro, 2011) and are commonly used as the minimal response options in emotion studies (Airdrie et al., 2018; Ben-David et al., 2011; Zupan & Babbage, 2017). Evidence suggests that some emotion words present with neutral semantics (Larsen et al., 2006), hence the use of a neutral category. These categories also align with those used in previous studies that aimed to build standardised corpora of emotion words in English (Stevenson et al., 2007; Strauss & Allen, 2008). Following Strauss and Allen (2008), we opted to have participants rate words on valence and intensity. We used intensity rather than arousal, another important dimension of emotion perception, because the former is easier to present to participants (i.e. it does not require visual depictions). Though intensity and arousal are not the same, Laukka et al. (2005) report a strong correlation between the two.
Participants used a response keypad with only 10 keys; numbers one through nine appeared on two lines (equally spaced); a tenth key labelled ‘next’ appeared on a third line. Limiting and spacing the keys reduced the likelihood of participants mistyping. Participants were not provided feedback at any time.
2.4. Data analysis
Descriptive statistics were calculated for demographics, valence, intensity, and degree-of-fit ratings, but only degree-of-fit ratings are reported here. Mean degree-of-fit ratings ranged from 1 to 9 in each of five emotion categories; higher scores indicated a better fit for the stimulus word. Since nine was the highest possible score, we considered ratings of 8.00 or greater as indicating the emotion category provided an excellent fit for the stimulus word.
Next, we calculated a Simpson Diversity Index Score (SDIS) using degree-of-fit ratings, an approach that has been applied in other studies with a psychosocial focus (Koffer et al., 2016; Quoidbach et al., 2014; Ram et al., 2012; Zupan & Eskritt, 2022). The SDIS takes into account the emotion category that had the highest degree-of-fit rating and a frequency count for the number of times each emotion category was rated highest for each word. The resulting index is a nonparametric statistic representing the dispersion of scores across the emotion categories for each word. Dispersion values fall between 0 and 1, with values closer to 0 representing a more definite categorisation of the emotion. As the value of the SDIS increases, so does diversity across participant responses (Gregorius & Gillet, 2008; Zupan & Eskritt, 2022). Using existing guidelines, words with values between 0.1 and 0.4 were considered to have a low degree of diversity, words with values between 0.41 and 0.6 were classified as having moderate diversity and those with values of 0.61 or higher were considered to have widely dispersed ratings (Guajardo, 2015). Thus, a value of 0.61 or higher indicates that when asked to rate the degree to which a word fit each emotion category, participants rated numerous emotion categories similarly.
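To make the index concrete, the following is a minimal sketch of one way such a score could be computed. It assumes the Gini-Simpson form, 1 minus the sum of squared proportions, applied to the frequency with which each category was a participant's highest-rated category. The text does not state the exact formula used, so this form, the tie-breaking rule, and the function names are illustrative assumptions. The ratings shown are hypothetical, not study data.

```python
from collections import Counter


def top_category(ratings):
    """Return the category with the highest rating for one participant.

    Ties go to the first category listed (an assumption; the text does not
    say how within-participant ties were handled).
    """
    return max(ratings, key=ratings.get)


def simpson_diversity(top_categories):
    """Gini-Simpson index: 1 - sum(p_i ** 2) over the top-rated categories.

    This specific variant is assumed, not confirmed by the text.
    """
    counts = Counter(top_categories)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def diversity_band(sdis):
    """Map a score to the bands quoted in the text.

    Scores below 0.1 are treated as low here, which is an assumption.
    """
    score = round(sdis, 2)
    if score <= 0.40:
        return "low"
    if score <= 0.60:
        return "moderate"
    return "high"


# Hypothetical ratings from four participants for a single word.
participants = [
    {"Happy": 1, "Sad": 8, "Angry": 6, "Fearful": 2, "Neutral": 1},
    {"Happy": 1, "Sad": 7, "Angry": 5, "Fearful": 3, "Neutral": 1},
    {"Happy": 1, "Sad": 4, "Angry": 8, "Fearful": 2, "Neutral": 2},
    {"Happy": 1, "Sad": 9, "Angry": 3, "Fearful": 3, "Neutral": 1},
]
tops = [top_category(r) for r in participants]
sdis = simpson_diversity(tops)
print(tops, sdis, diversity_band(sdis))
```

In this hypothetical case three of four participants rate Sad highest and one rates Angry highest, so the proportions are 0.75 and 0.25 and the score is 1 - (0.5625 + 0.0625) = 0.375, which falls in the low-diversity band.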
2.5. Results
Appendix A in the Supplementary Material provides the mean degree-of-fit ratings for all 497 words for each emotion category as well as the SDIS; valence and intensity ratings are also provided in the Appendix. A results overview is provided in Table 1. Since participants rated each emotion category for each word, we also identified which emotion categories were rated as having the second highest degree of fit (see Fig. 1). A summary for each emotion category follows.
Abbreviation: SDI, Simpson diversity index.
2.5.1. Happy
A total of 128 words (25.75%), the most for any category, were identified as best fitting in the Happy category. Overall, mean degree-of-fit ratings for words categorised as Happy ranged from 4.40 (drunk; expectant) to 8.70 (happy). Thirteen words (10.16%) received a mean rating of at least 8.00. All 13 of these words were low in diversity as indicated by SDISs ranging from 0.06 (merry) to 0.36 (excited). Moderate diversity was evident in the words with degree-of-fit ratings less than 8.00. The category most commonly rated as the second best degree of fit was Neutral (n = 118; 92.19%), with mean ratings ranging from 1.87 (energetic) to 5.57 (a tease). One word, earnest, received the same mean rating (5.10) for both Happy and Neutral, but had a lower standard deviation for Happy (SD = 2.28) so was listed in that category.
2.5.2. Sad
Of the 88 words identified as Sad, seven (7.95%) received degree-of-fit ratings of 8.00 or higher, with ratings ranging from 8.03 (devastated) to 8.87 (sad). Only sad was classified as low diversity, with an SDIS of 0.36. Of the remaining six words with degree-of-fit ratings of 8.00 or higher, five were moderate in diversity with scores ranging from 0.48 (grief) to 0.59 (depressed); one (suicidal) had an SDIS (0.66) indicating high diversity. For forty-five (51.14%) of the 88 words, Angry was the category with the second highest degree of fit. Degree-of-fit ratings for Angry ranged from 3.37 (crest fallen) to 6.45 (disappointed).
2.5.3. Angry
Ten (9.34%) of the 107 words best categorised as Angry received degree-of-fit ratings higher than 8.00, ranging from 8.03 (ticked) to 8.76 (rage). Snickering had the lowest degree-of-fit score (3.97) in the category. Interestingly, angry received only the third highest degree-of-fit rating (8.60), though it had the lowest SDIS (0.12). Overall, the SDIS for the 10 words ranged from 0.12 to 0.59 (murderous), with five of the 10 words receiving scores indicative of moderate diversity. Sad was the category most identified as having the second best degree of fit, with ratings ranging from 2.83 (snarky) to 7.20 (betrayed) across a total of 59 (55.14%) words.
2.5.4. Fearful
Only 67 (13.48%) words were identified as Fearful, the lowest proportion of words across the five categories. One of these words (sly) had the lowest degree-of-fit rating in any category (3.43). Six words (8.95%) had degree-of-fit ratings of 8.00 (horrified) or higher, with frightened rated highest (8.40), and fearful rated as third highest (8.28). Only frightened had a low diversity of ratings across emotion categories as indicated by its 0.30 SDIS. The remaining five words rated as 8.00 or higher were moderate in diversity, with scores ranging from 0.45 (scared) to 0.57 (petrified). Across the Fearful words, Sad was most often the next best fit (n = 32; 47.76%), with ratings ranging from 3.27 (hesitant) to 6.72 (traumatised).
2.5.5. Neutral
One hundred and seven words (21.53%) were identified as Neutral. Overall, degree-of-fit ratings for Neutral were lower than for other categories, with the majority of words (n = 79; 73.83%) rated as 5.97 (unconcerned) or below. Only neutral received a degree-of-fit rating over 8.00 (M = 8.47). However, the SDIS for neutral (0.40) just met the threshold for low diversity. This was the lowest SDIS in this category. Three additional words were classified as moderate diversity with scores of 0.53 (bored), 0.54 (monotone), and 0.60 (cool). The remaining 103 words (96.26%) had high SDISs that ranged from 0.61 (blank) to 0.79 (e.g. nonplussed; pensive).
2.6. Discussion
The aim of Study 1 was to explore the extent to which a diverse array of emotion words fit within traditional basic emotion categories. Participants characterised a set of emotion words in a categorisation task that required they rate the degree to which each word fit five basic emotion categories. According to SDISs, many words received degree-of-fit ratings that were dispersed across multiple categories. Not surprisingly, given that participants had several negatively valenced labels to consider, this dispersion was particularly evident for words categorised into one of the three negatively-valenced categories.
The Happy category included the largest number of words (n = 128), most likely because it was the only positive response option. Because of the limited options, any positively-valenced word was likely to be categorised as Happy, Neutral being the only other feasible alternative. In fact, Neutral was consistently identified as having the second best degree of fit for Happy words. If one or more additional positively-valenced emotion categories had been provided as response options, the proportion of words discretely categorised as Happy may have been reduced and participants may not have rated Neutral as highly. For example, earnest had the same degree-of-fit score for both Happy and Neutral (5.10) suggesting that participants identified this word as one with positive valence, but that Happy was not necessarily a good fit. Words such as carefree, content, and satisfied also had similar Happy and Neutral ratings.
Negatively-valenced emotions are reported to frequently co-occur with one another (Samson et al., 2016) and this appeared to be the case in Study 1, particularly for words identified as representing Sad or Angry. For example, disappointed had degree-of-fit ratings of 7.52 for Sad and 6.45 for Angry. Neutral was rarely identified as the second best degree of fit for words that were categorised into one of the three negatively-valenced categories. In fact, Neutral was only identified as second-best degree of fit for 18 of the 262 words (7%) categorised as Sad, Angry, or Fearful. Overall, results suggest that valence may be an important factor in structuring the emotional lexicon (Kuperman et al., 2014). This is consistent with research in lexical decision-making that showed valence has a strong effect on word processing. Results also suggest that when provided with several feasible response options (e.g. Angry, Fearful, Sad) there is diversity in how people categorise emotion words, at least in the case of those that are negatively-valenced.
A large number of words were categorised as Neutral in Study 1, but overall degree-of-fit ratings were lower than for other categories. In fact, only one word (neutral) had a degree-of-fit rating of 8.00 or higher. It is possible that participants do not routinely consider Neutral as representing an affective state and therefore tended to rate this category lower overall. For example, the degree-of-fit rating for emotionless was relatively low (5.45), despite the fact the word inherently suggests neutrality. The overall low ratings for words categorised as Neutral seemed to be because Neutral was perceived as a ‘catch all’ category. Participants appeared to use this category for words that did not clearly signal an emotional state (e.g. hungry; quiet; reading), that tended to be better described as actions or behaviours (e.g. hurried, preoccupied) (Goddard, 2014), and those that described personality traits (e.g. reserved; tolerant) (Nettle & Penke, 2010).
In sum, results of Study 1 support our hypothesis that people’s conceptualisations of emotion words may be quite nuanced, as indicated by the number of words (particularly negatively-valenced ones) with high SDISs. We conclude that a small set of basic emotion categories is insufficient to capture the degree of distinction with which people conceptualise emotion words. The use of Neutral as a ‘catch all’ category, particularly as the second-best fit for Happy words, is consistent with this conclusion. The lack of positively-valenced options in Study 1 means we do not know the extent to which the pattern observed in the negatively-valenced words is present across the full spectrum of emotion words.
3. Study 2
Study 2 was conducted to explore the influence of a broader set of response options on categorisation of a subset of emotion words from Study 1. We expanded the array of positive response options in order to explore whether the overlapping rating pattern observed for negatively valenced emotions in Study 1 might also occur for positively-valenced emotions. We also included complex emotions to explore whether a broader set of response options would provide a more nuanced picture of how people characterise emotion words. Due to public health restrictions in place at the time, Study 2, unlike Study 1, was conducted online.
3.1. Participants
Participants were recruited online using snowball sampling on the authors’ social media platforms. We required that participants be at least 18 years of age and speak English as a primary language. All participants provided informed consent. A total of 113 participants began the study; however, six participants exited the survey before completing the demographic questions. Of the remaining 107 participants, only 56 (7 males; 49 females) proceeded beyond the demographic questions to the ratings. These 56 participants ranged in age from 19 to 72 years (M = 47.5; SD = 12.63) and had completed 16.8 (SD = 3.12) years of education on average. Thirty-four participants were from Canada, 12 from Australia, six from the United States, and four from the United Kingdom.
3.2. Stimuli
In selecting stimuli to be used in Study 2, we did not rely solely on degree of fit or SDISs from Study 1. We made this decision because it appeared that limited response options may have influenced Study 1 ratings. This was reflected in the data in two ways. First, review of individual participant data showed that for some words (e.g. astonished, apprehensive, eager, miserable, sympathetic) no categories were identified as a good fit. In these cases, participants either selected ‘1’ for all five categories or selected ‘1’ for all but one category and gave the fifth category a low rating. Second, the SDISs for words in the three negatively-valenced categories (Sad, Angry, Fearful) were moderate to high, with very few words receiving low scores. It is unclear whether the higher overall diversity was because words that fell into these categories could be interpreted as any one of these three negatively-valenced emotions, or whether participants identified the word as negative, but did not feel that any of these three basic categories accurately fit the word. For example, guilty was identified as Sad with a degree-of-fit rating of 6.45. However, Angry and Fearful had similarly high ratings of 5.34 and 6.03, respectively, resulting in an SDIS of 0.68.
To reduce the length of the task and minimise the likelihood of cognitive fatigue, a particular risk in online studies (Saleh & Bista, 2017), we opted to decrease the size of the stimulus set in Study 2. We carefully reviewed the Study 1 words and excluded the following types of words: those that were better characterised as relating to appearance or character than emotion (e.g. ugly, goofy, silly, dumb), those considered slang (e.g. snarky, miffed), phrases (e.g. creeped out, put upon), and words that matched the category names of the response options (e.g. happy, angry). Next, all three authors independently reviewed the word list for each emotion category, and identified words that they would use in the sentence ‘I feel [WORD]’. Only words identified by all three authors as fitting within that sentence frame were considered in the next step of the selection process. As was the case in Study 1, the total number of words remaining in each category varied widely, but Happy still included the highest total number (n = 23). Table 2 lists the words identified from Study 1 for possible use in Study 2, including mean degree-of-fit ratings for the five emotion categories and SDISs.
Next, the word list for each emotion category was analysed for word frequency to ensure, inasmuch as possible, that participants would be familiar with the words. Since Study 2 would be conducted online and include participants with English as a first language across different countries, we calculated word frequency as the average of the ratings from the Corpus of Contemporary American English (Davies, 2008) and the British National Corpus (Davies, 2004) (see Table 2). We then selected the 30% most frequently used words in each emotion category. This yielded a stimulus set of 24 words (identified in Table 2 using an *) distributed as follows: Happy (7); Sad (7); Angry (3); Fearful (6); Neutral (1).
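The frequency-based selection step can be sketched as follows. This is an illustrative outline only: the function names, the rounding rule used to turn 30% into a whole number of words, and the frequency values are all assumptions, since the text reports only that the two corpus ratings were averaged and the most frequent 30% of words per category were retained.

```python
def mean_frequency(coca_freq, bnc_freq):
    """Average the frequency ratings from the two corpora."""
    return (coca_freq + bnc_freq) / 2.0


def select_most_frequent(words, proportion=0.30):
    """Return the most frequent share of one category's candidate words.

    words: list of (word, coca_freq, bnc_freq) tuples.
    The number retained is the rounded product of list length and proportion,
    with at least one word kept (both choices are assumptions).
    """
    ranked = sorted(
        words,
        key=lambda w: mean_frequency(w[1], w[2]),
        reverse=True,
    )
    keep = max(1, round(len(ranked) * proportion))
    return [w[0] for w in ranked[:keep]]


# Hypothetical candidates with made-up frequency values.
candidates = [
    ("w1", 10.0, 20.0), ("w2", 50.0, 30.0), ("w3", 5.0, 5.0),
    ("w4", 40.0, 60.0), ("w5", 1.0, 3.0), ("w6", 25.0, 15.0),
    ("w7", 8.0, 2.0), ("w8", 70.0, 90.0), ("w9", 12.0, 12.0),
    ("w10", 0.5, 1.5),
]
selected = select_most_frequent(candidates)
print(selected)
```

Under this rounding assumption, a category with 23 candidate words would retain seven, which matches the number of Happy words reported, although the text does not confirm that this rule was the one applied.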
3.3. Procedure
The study was conducted online using the Survey Monkey platform (http://www.surveymonkey.com). After viewing study details and providing informed consent, participants confirmed they met inclusion criteria and then answered demographic questions regarding gender, age, country of residence, and years of education. Next, they were presented with a word and asked to rate how well the word fit a number of different categories (see the Appendix), using a scale of 1 to 9 (1 = not at all; 9 = extremely well). Response options were ordered alphabetically and remained consistent for each stimulus.
Thirteen categories were provided as response options. Included were an equal number of basic and complex emotions, and an equal number of positively- and negatively-valenced emotions. Neutral was also retained as a response option. To extend the basic emotion categories to six, we opted to include Disgust and Joy as response options. Disgust is well known as a basic emotion category (Ekman, 1992). Joy has also been identified as basic and although this category is understood to contain Happy, people appear to differentiate these words by intensity (Zupan & Eskritt, 2020). Surprise was not included because it cannot be clearly identified as positive or negative in valence (Schlegel et al., 2012; Zupan & Eskritt, 2020). The final six emotion categories represented complex emotions: four that were positively-valenced (Amusement, Contentment, Pride, Relief) and two that were negatively-valenced (Anxiety, Irritated).
Revilla and Ochoa (2017) report that, to maximise participant engagement, an online study should ideally take less than 20 minutes. In an effort to meet this target, participants were informed that they could rate as many of the categories as they wished for each word, but if they felt a word did not fit into a category, they could leave the rating blank and a rating of ‘1’ would be assumed. However, a minimum of three ratings was required for each word to increase the likelihood that participants were fully reflecting on the degree of fit across the range of response options. The study was expected to take approximately 15 minutes to complete.
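The handling of blank ratings described above could be implemented along the following lines. This is a sketch under stated assumptions: the function name and the error handling are hypothetical, and the example response is not study data. The category list follows the thirteen response options named in the text, ordered alphabetically.

```python
CATEGORIES = [
    "Amusement", "Angry", "Anxiety", "Contentment", "Disgust", "Fearful",
    "Happy", "Irritated", "Joy", "Neutral", "Pride", "Relief", "Sad",
]


def complete_ratings(raw, min_rated=3, default=1):
    """Fill blank categories with the assumed rating of 1.

    raw: dict mapping a category name to the 1-9 rating a participant gave;
    categories left blank are simply absent from the dict.
    Raises ValueError if fewer than `min_rated` categories were rated, if a
    category name is unknown, or if a rating falls outside 1-9.
    """
    unknown = set(raw) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    if len(raw) < min_rated:
        raise ValueError(f"at least {min_rated} ratings are required")
    for category, rating in raw.items():
        if not 1 <= rating <= 9:
            raise ValueError(f"rating for {category} must be between 1 and 9")
    return {category: raw.get(category, default) for category in CATEGORIES}


# Hypothetical response: three categories rated, ten left blank.
response = complete_ratings({"Happy": 8, "Joy": 7, "Contentment": 5})
print(response)
```

Treating blanks as 1 means every word has a complete set of thirteen ratings per participant, so means and diversity scores can be computed over the same categories for all words.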
3.4. Data analysis
The same analyses used in Study 1 were applied to Study 2.
3.5. Results
Table 3 lists the mean degree-of-fit rating for each emotion category for each of the Study 2 words, alongside the word’s corresponding SDIS. Of the 24 words, only half (n = 12; 50%) were identified as best fitting one of the same five categories used in Study 1, with four words identified as Happy, four as Sad, one as Angry, and three as Fearful. No words were categorised as Neutral.
Note: The bold values highlight the highest mean category response for each word.
Words were categorised into nine of the 13 possible categories. The diversity of ratings was higher overall in Study 2; only three words (nervous, afraid, scared) were found to have a moderate SDIS. The remaining 21 words had a high diversity as indicated by SDISs ranging from 0.62 (mad) to 0.91 (shocked). This included the positively-valenced words, which had been found to have lower diversity overall in Study 1. A summary of results follows, organised by the emotion category in which the word was initially placed in Study 1.
3.5.1. Words identified as Happy in Study 1
Four of the seven words (57%) identified as Happy in Study 1 were also identified as Happy in Study 2. The SDIS was higher for all four words, ranging from 0.70 to 0.87 in Study 2 compared to 0.36 to 0.56 in Study 1. Degree-of-fit ratings for Neutral for all four of these words were low, ranging from 1.06 to 1.15. Instead, Joy was identified as the second best degree of fit for three words (glad, excited, eager) and Contentment for the fourth (pleased). The remaining three words identified as Happy in Study 1 were categorised as complex, positively-valenced emotions in Study 2; two words (satisfied, peaceful) were identified as Contentment and the third word (confident) was identified as Pride. Happy was identified as the second-best degree of fit for all three of these words.
3.5.2. Words identified as sad in study 1
Four words (57%) were identified as Sad in both studies – down, hurt, disappointed, and upset. However, SDISs were high for all four words (Range = 0.64–0.81), including the word disappointed, which had only moderate diversity in Study 1. Angry was identified as the second best degree of fit for two of these four words (hurt, upset). The remaining words (overwhelmed, desperate, guilty) were all identified as Anxiety in Study 2; Fearful was the second best degree of fit.
3.5.3. Words identified as angry in study 1
Only the word mad was identified as Angry in both studies, with a degree-of-fit score of 8.10 in Study 1 and 8.69 in Study 2. The SDIS of this word increased from moderate diversity in Study 1 (0.44) to high diversity in Study 2 (0.62); Irritation was identified as the second best degree of fit. The remaining two words were identified as Irritated (frustration) and Anxiety (tense) in Study 2.
3.5.4. Words identified as fearful in study 1
Fearful was identified as the best degree of fit for three of the six words (50%) that had been categorised as Fearful in Study 1 – scared, afraid, and concerned. All three words had slightly higher degree-of-fit ratings for Fearful in Study 2 (Range = 6.11–8.57) compared to Study 1 (Range = 6.00–8.27). Diversity ratings were also similar across studies with moderate SDISs for scared (0.57) and afraid (0.54), and a high SDIS for concerned (0.70). Where ratings differed most was in the second best degree-of-fit category. In Study 1, participants chose Sad as the second-best category for all three words with ratings ranging from 4.17 to 4.47. However, in Study 2, Anxiety was identified as having the second best degree of fit with ratings similar to those for Fearful (Range = 6.11–7.76). In fact, concerned received the same mean degree-of-fit rating (6.11) for both Fearful and Anxiety (it was categorised as Fearful based on standard deviation). Two of the remaining three words identified as Fearful in Study 1 (worried, nervous) were categorised as Anxiety in Study 2; the third word, shocked, was categorised as Disgust.
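The categorisation rule referred to above, in which a word is assigned to the category with the highest mean degree-of-fit rating and ties are resolved using the standard deviation, can be sketched as follows. This is an illustration under stated assumptions rather than the analysis script used here: the function name `rank_categories` is hypothetical, means are compared at the two-decimal precision reported in the text, and the tie is assumed to go to the category with the smaller standard deviation (that is, greater agreement among raters), which the text does not state explicitly. The ratings are made up for the example.

```python
from statistics import mean, stdev
from typing import Dict, List


def rank_categories(ratings: Dict[str, List[int]]) -> List[str]:
    """Order categories from best to worst fit for a single word.

    Primary key: higher mean rating (rounded to two decimals).
    Tie-break (assumption): smaller sample standard deviation.
    Each category needs at least two ratings for stdev to be defined.
    """
    return sorted(
        ratings,
        key=lambda c: (-round(mean(ratings[c]), 2), stdev(ratings[c])),
    )


# Made-up ratings from five hypothetical participants (not study data).
illustrative = {
    "Fearful": [6, 6, 7, 6, 5],   # mean 6.0, tightly clustered
    "Anxiety": [9, 3, 6, 8, 4],   # mean 6.0, widely spread
    "Sad": [4, 5, 4, 3, 4],       # mean 4.0
}
ranking = rank_categories(illustrative)
best, second_best = ranking[0], ranking[1]
```

With these illustrative values, Fearful and Anxiety share the same mean, so the assumed tie-break places Fearful first and Anxiety second, analogous to the outcome reported for concerned.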
3.5.5. Words identified as neutral in study 1
The word calm, which had been categorised as Neutral in Study 1, was instead identified as Contentment in Study 2 with a degree-of-fit rating of 7.32. Neutral was still identified as the second-best degree of fit (5.93). SDIS was high (0.75) due to moderate degree-of-fit ratings for Happy (5.06) and Relief (4.56), suggesting an overall positive valence for this word.
3.6. Discussion
Overall, the diversity of ratings was higher in Study 2 than in Study 1, ranging from 0.54 to 0.91 (M = 0.70) for negatively-valenced words, and 0.70 to 0.87 (M = 0.75) for positive ones. This suggests that when participants are given a broader set of options, categorisation becomes more dispersed regardless of whether the word is positive or negative in valence. While this dispersion in ratings may suggest that people categorise words differently to one another, it may also reflect that even though a broader set of options was provided, there was still no one ‘best’ fit for the word. Future research might explore whether further increasing the set of response options leads to greater or lesser dispersion in ratings.
However, the final categorisation of words still remained limited to a small set of categories, particularly for positively-valenced emotions where words were only categorised into three of the possible eight categories – Contentment, Happy, Pride – but at least moderate ratings still occurred across multiple positive categories. For example, pleased, which was identified as Happy with a degree-of-fit rating of 7.24, attained ratings of four or higher (Range = 4.15–7.07) in five additional positively-valenced categories (Amusement, Contentment, Joy, Pride, Relief). This suggests that, as is the case with negative emotions, there may be overlap in how people perceive positive emotions. This was particularly notable across Happy, Contentment and Joy, which were identified as the best or second best degree of fit for seven of the eight words identified as positively-valenced in Study 2. Data suggest that these three emotion categories may form an emotion family, with all three words representing a similar, general feeling of positive well-being, differing only in the level of intensity portrayed (Crane & Gross, Reference Crane, Gross, Paia, Prada and Picard2007; Diener et al., Reference Diener, Smith and Fujita1995; Ekman, Reference Ekman1992).
The availability of more complex, lower intensity emotions in Study 2 also appeared to influence participant responses for negatively-valenced words. For instance, guilty, desperate, and overwhelmed had all been categorised as Sad in Study 1 when participants were given only basic emotion categories to rate, with Fearful identified as the second best degree of fit. In Study 2, all three words were categorised as Anxiety, the lower intensity alternative within the Fearful emotion family; Fearful remained the second best degree of fit. This same pattern was observed for words categorised as Angry or Irritated. This pattern suggests that offering participants different intensity options can provide further insight into their conceptualisation of emotion.
Unlike in Study 1, participants only identified Neutral as the second-best category for one of the positively-valenced emotion words – calm – a word that had been previously categorised as Neutral in Study 1. In Study 2, it was instead categorised as Contentment. Interestingly, the three words categorised as Contentment in Study 2 (satisfied, peaceful, calm) had the highest Neutral ratings (Range = 3.57–5.93) of all 24 words included in Study 2. Though not always labelled an emotion per se, Neutral is considered an affective state, either positive or negative, in which an individual feels nothing in particular (Gasper et al., Reference Gasper, Spencer and Hu2019). In other words, Neutral does not necessarily mean the absence of emotion or feeling. However, it has also been suggested that simply recognising an affective state as Neutral can result in feelings of pleasure (Gasper et al., Reference Gasper, Spencer and Hu2019). Providing participants with a positively-valenced, low-intensity option like Contentment may have enabled them to more accurately reflect their conceptualisation of words like calm. The purpose of the studies presented here was not to explore Neutral per se; however, in hindsight, we could have gained insight into this matter had we selected a subset of words identified as Neutral in Study 1 on the basis of valence ratings to explore whether participants would be more inclined to attach affect to these words if low-intensity descriptors were available.
4. Overall discussion
The overall aim of the two studies was to determine if emotion words categorised into a small set of basic emotion categories would retain that categorisation in the presence of a broader set of response options. In Study 1, participants were asked to categorise emotions into one of four basic emotion categories (Happy, Sad, Angry, Fearful) or Neutral. In Study 2, these categories were extended to a total of 13 options. Of the 24 words included in Study 2, only half were categorised using the same basic emotion categories as Study 1. The remaining 12 words were instead categorised across five of the broader response options, four of which represented complex emotions.
Overall, results suggest that there is considerable diversity in how people conceptualise words and that basic emotion categories may not sufficiently represent the way people think about and define emotion. The provision of more options had an interesting impact on how people categorise emotion words. First, when participants were provided the opportunity to utilise complex emotion categories, they did so. In fact, nearly half of the words in Study 2 (n = 11) were categorised into one of the complex emotion categories. Second, when participants were provided a greater array of emotion categories from which to choose, they rated words differently, sometimes considerably so. In some cases, words in Study 2 were categorised in entirely different emotion families than in Study 1. For example, guilty, which had been categorised as Sad in Study 1, was categorised as Anxiety in Study 2. This result was particularly surprising given that guilt has previously been identified as belonging to a family of related emotions that includes both anger and frustration (Roseman, Reference Roseman and Frijda1994). Though further study is required, findings like these suggest that reliance on a narrow range of basic emotion categories may obfuscate our understanding of how people conceptualise emotion, such that we fail to appreciate the nuance of which they are capable.
We had expected that having categories that represented more complex and disparate (e.g. Amusement, Pride) and/or lower intensity emotions (e.g. Anxiety) would result in more distinct ratings in one category over others. This did not occur. Instead, it appeared that participants primarily differentiated emotion broadly on the basis of valence, and were less discerning within that valence. This was particularly evident in the case of positively-valenced emotions where the emotion categories were primarily complex in nature. It may be that participants have more difficulty differentiating complex emotion categories or that words that are best represented by complex emotions tend to have more overlap in meaning and appraisal. For instance, the word pleased, which was categorised as Happy by participants in both studies, had moderate to high ratings for six different emotion categories in Study 2 including Amusement, Pride, and Relief. Although these categories might generally be considered quite distinct from one another, words such as pleased may simply have multiple meanings depending on how they are appraised. For example, pleased could easily fit the category Pride if interpreted as pertaining to increased feelings of self-worth related to an accomplishment (Lewis, Reference Lewis, Lewis, Haviland-Jones and Barrett2008). Similarly, pleased could fit the category Relief, describing a situation where an aversive situation has been resolved (Roseman, Reference Roseman and Frijda1994).
4.1. Limitations and future directions
Findings must be considered in light of two types of limitations. The first involves choice of emotion words, both in regard to response options and the word lists. In regard to response options, there are a few notable absences. Though Surprise is one of the six basic emotion categories, we opted not to include it, both because it is not clearly positive or negative and because we sought an equal number of words of each valence. However, future studies might consider including Pleasant Surprise, which has been successfully used in studies of vocal emotion recognition (Dupuis & Pichora-Fuller, Reference Dupuis and Pichora-Fuller2015), to make this distinction. We also did not include Compassion (considered a social emotion) or Contempt (a negative emotion, related to Anger and Disgust) (Ekman, Reference Ekman1992). Also, some of the words we had participants rate could be considered their own category, particularly if we look to more social emotions (e.g. guilt, worry). We acknowledge that inclusion of these words may have influenced the pattern of responding seen in Study 2. Future studies might vary the types of emotion words included as response options; specifically, we suggest studies include more social emotions (e.g. compassion) so that researchers can explore how such words are conceptualised. Results also suggest that people may conceptualise words as belonging to word families. Future studies might explore this (e.g. by asking participants to group words that belong together). Such studies have the potential to provide insight into hierarchies and the degree to which intensity influences how people conceptualise emotion words.
Participants were not provided with a ‘none’ or ‘I do not know’ option, something typical of forced-choice paradigms, which require participants to select only one option, regardless of how many they may attribute to the stimulus (Nelson & Russell, Reference Nelson and Russell2013). Instead, we allowed participants the flexibility to rate each stimulus according to its degree of fit with multiple response options. There may still have been times where participants felt uncertain of the meaning of the word; in such instances an escape option such as ‘none’, ‘I do not know’, or ‘unfamiliar’ could have reduced the likelihood of artificial agreements. However, other work (Frank & Stennett, Reference Frank and Stennett2001) suggests that the absence of an escape option is unlikely to have had much effect on the pattern of results. Finally, we acknowledge that order effects are possible since the order of response options remained consistent for each stimulus.
The word list, which was quite extensive in Study 1, was significantly reduced in Study 2. As a result, conclusions are based on a relatively small number of words in each category. This makes further investigation of our findings important. It was necessary to reduce the list in Study 2 for pragmatic reasons. To do this, we had to make choices (detailed previously) about which emotion words we would include. Though we used word frequency as a guiding principle, it is still possible that some words were unknown to participants. In future studies, researchers might consider including a response option that allows participants to indicate this. Also, for Study 2 we sought to keep word counts in each category in proportion to the category counts in Study 1. In retrospect, we should have selected an equal number of words from each category. The fact that we had relatively few Neutral words made it difficult to assess whether it was being used as a ‘catch all’ category or whether participants actually perceived some words as emotionally neutral. Finally, for Study 2 we chose words which people identified as fitting strongly into one of the five categories from Study 1. However, there were some emotion words where ratings in two categories were quite similar (e.g. sympathetic had similar Happy and Sad ratings, categories that were opposite in valence). Perhaps it is these not easily categorised words that require further investigation. Future studies involving such words might yield more valuable insights into how people conceptualise emotion words.
A second type of limitation concerns context. Participants were not provided any context to guide interpretation of word meaning, which may have contributed to the high diversity ratings. Fehr and Russell (Reference Fehr and Russell1984) suggest that the degree to which people rate an emotion as belonging to an emotion category is dependent upon their appraisal of that emotion, which itself depends on their interpretation of the situation in which the emotion occurs (Siemer et al., Reference Siemer, Mauss and Gross2007). It is possible that the absence of context in Studies 1 and 2 limited participants’ ability to categorise words. With no prescribed context, participants may have created their own, which would differ from person to person. For example, in Study 1 the primary rating for console was Sad, while the secondary rating was Happy. One would not expect the primary and secondary categories to be opposite in valence. However, perhaps respondents imagined a situation in which they were being consoled but appraised the valence of the word differently depending on whether they focused on the need to be consoled (negative) or its impact (positive). Individual differences in meaning appraisal could similarly have contributed to diversity scores in Study 2. To determine the extent to which the lack of context (or individual variation in inferred context) influences people’s interpretation of a word, future studies may consider using a standard carrier phrase (e.g. If someone said ‘I am feeling X’, I would categorise that word as meaning: happy, sad, etc.). The use of such a carrier phrase does not provide context indicative of a specific word meaning but could support more consistency in approach.
5. Conclusion
Results of the two studies presented here showed that the response options we provide participants for judgements of emotion stimuli influence the way they conceptualise and categorise emotion words. Though desirable for data analysis and replication purposes, the use of forced-choice with only four to six basic emotion categories is likely too constraining, failing to adequately capture how emotion words are conceptualised. It may not be possible to identify one ‘best’ list of response options; nevertheless, results suggest that a broader array of positive emotion words is needed. Further, it appears that including more complex or social emotion words would enable participants to reflect on emotion intensity. Extending response options has the potential to provide important insight into how people (of various cultures) conceptualise and label emotion across the lifespan, informing our understanding of developmental trajectories. It may also highlight deficits in emotion differentiation and enable us to better characterise people’s ability to interpret a broader scope of emotions associated with real-life contexts.
Supplementary Material
To view supplementary material for this article, please visit http://doi.org/10.1017/langcog.2022.24.
Competing interests
The authors declare none.
Appendix. Instructions given to participants for Study 1 and Study 2
A.1. Study 1
The following directions were provided following each stimulus:
• Rate the intensity level of the word from 1 (extremely low) to 9 (extremely high).
• Rate how positive or negative you think this word is from 1 (extremely negative) to 9 (extremely positive).
• Rate the degree to which the word fits the emotion category [Happy] from 1 (not at all) to 9 (extremely well).
A.2. Study 2
A.2.1. Initial instructions
For the following questions, you will be presented with a WORD and asked to rate how well that word fits a number of different emotion categories on a scale of 1 to 9, with 9 indicating the highest degree of fit and 1 indicating that the word does not fit into that category at all.
You can select a rating for as many categories as you wish for each word. If you do not feel the word fits into a category, you can leave that category blank and we will assume a rating of 1. However, you will be required to rate at least 3 emotion categories for each word, with a minimum rating of 1.
A.2.2. Instructions for each stimulus
• How well does the word [X] fit into each of the following emotion categories (1 = not at all; 9 = extremely well)?