Introduction
The 2018 indictments of multiple Volkswagen executives (Glinton & Gotbaum, Reference Glinton and Gotbaum2018) and the arrest of Audi’s CEO (Romo, Reference Romo2018) in connection with the Volkswagen emissions scandal serve as reminders of the importance of understanding how to identify and develop ethical leaders. A more recent example is Toyota Motor’s failure to report defects that interfered with cars’ control of tailpipe emissions, thereby endangering public health and the environment (Tabuchi, Reference Tabuchi2021). Knowing this violation was occurring, Toyota managers and leadership ignored the noncompliance in an effort to sell millions of vehicles. These examples highlight that, despite increasing attention, ethical dilemmas and scandals continue to plague organizations. Although these organizations have high standards, standards stemming from organizational leadership, employees within them behaved unethically, failing to follow reporting protocols and ultimately harming public health and safety. Certainly, unethical behavior is not isolated to the auto industry. Further examples can be seen in the tech industry with Elizabeth Holmes’ deceitful practices in the development of new healthcare technology (BBC, 2022), the UK government with Boris Johnson’s COVID-19 rule violations (CNN, Reference Picheta2022), and Balenciaga’s 2022 advertising campaign featuring children in bondage (NYTimes, Reference Paton, Friedman and Testa2022). In addition to these examples, research on ethical behavior in the workplace emphasizes the important role that leaders play in fostering an ethical environment (e.g., Brown & Treviño, Reference Brown and Treviño2006). As such, to foster more ethical workplaces, we must also understand how to foster ethical leadership. The goal of this article is to examine a practical intervention focused on the development of ethical persons and managers using a virtual ethical leadership assessment center.
Brown et al. (Reference Brown, Treviño and Harrison2005) define ethical leadership (EL) as the demonstration and promotion of ethical conduct: specifically, the demonstration of normative behaviors through personal action and interpersonal relationships, and the promotion of such conduct via communication and reinforcement. Normative behaviors include conduct such as honesty, trustworthiness, fairness, and care. The definition is purposefully broad, as normative behavior is context dependent. The other component of ethical conduct is promoting ethical behaviors via three elements. The first element is communication, in that leaders draw attention to and explicitly speak about ethics in the social environment while providing their followers with voice (Bass & Steidlmeier, Reference Bass and Steidlmeier1999). The second element is reinforcement, in which leaders set standards and reward and discipline followers accordingly (Gini, Reference Gini1998; Treviño et al., Reference Treviño, Brown and Hartman2003). Finally, the third element of ethical conduct is decision making, in which ethical leaders consider the consequences of their decisions and make fair choices that can be observed by others (Bass & Avolio, Reference Bass and Avolio2000; Howell & Avolio, Reference Howell and Avolio1992).
The inclusion of demonstration and promotion is key, implying that ethical leaders are moral persons, acting ethically themselves, as well as moral managers, encouraging their followers to engage in ethical behavior (Brown & Treviño, Reference Brown and Treviño2006; Treviño et al., Reference Treviño, Hartman and Brown2000). Although this is a burgeoning area of theoretical and empirical research, the lack of innovation in measurement hinders the growth of the field and limits the application of findings to organizational settings. Further, there is scant research on whether EL can be developed (Brown & Treviño, Reference Brown and Treviño2006). The present effort aims to address these issues by employing a longitudinal experimental design to assess the efficacy of receiving feedback based on an EL diagnostic assessment center (EL-DAC) for assessing and developing EL with respect to both moral person and moral manager dimensions.
Ethics assessment
Since its rise to popularity in the 1980s, the predominant means for assessing ethics in the workplace has been, and remains, integrity testing (Sackett & Wanek, Reference Sackett and Wanek1996). Integrity testing focuses on traits relevant to EL by assessing self-reports of overt behaviors, attitudes, or personality traits, which predict counterproductive workplace behaviors and unethical behavior (Sackett et al., Reference Sackett, Burris and Callahan1989; Ones & Viswesvaran, Reference Ones and Viswesvaran1998). Testing initially began with polygraphs, but the focus later turned toward oral and written integrity tests (Berry et al., Reference Berry, Sackett and Wiemann2007; Sackett & Wanek, Reference Sackett and Wanek1996), which displayed moderate predictive validity coefficients for managerial performance (r = .41; Schmidt et al., Reference Schmidt, Ones and Hunter1992). More recently, there have been efforts to assess integrity using situational judgment tests (SJTs; Becker, Reference Becker2005; de Meijer et al., Reference de Meijer, Born, van Zielst and van der Molen2010). Although validity evidence has been provided, integrity tests primarily focus on the traits associated with the moral person and, thus, fail to fully capture the EL construct.
Other efforts have been made to examine the decision making dimension of EL. For instance, Mumford et al. (Reference Mumford, Devenport, Brown, Connelly, Murphy, Hill and Antes2006) developed a series of ethical decision making SJTs in the sciences. These measures focus on meta-cognitive strategies (e.g., anticipating consequences) associated with ethical judgments. More centrally, Loviscky et al. (Reference Loviscky, Treviño and Jacobs2007) developed a measure of managerial moral judgment (MMJ). The MMJ was modeled after the Defining Issues Test, which is grounded in Kohlberg’s (Reference Kohlberg and Lickona1976) theory of moral development. The authors provide evidence bearing on the construct validity of the measure. More recently, Watts et al. (Reference Watts, Medeiros, McIntosh and Mulhearn2020) developed a measure of cognitive biases, providing predictive validity evidence for their role in managerial ethical decision making. These measures primarily focus on the moral person, as opposed to the moral manager.
There have also been efforts to expand this measurement to include dimensions of the moral manager. For example, Kalshoven et al. (Reference Kalshoven, Den Hartog and De Hoogh2011) aimed to assess both the moral person and moral manager components in their EL at Work Questionnaire (ELW). The ELW asks subordinates to respond to a series of questions regarding perceptions of a leader’s integrity, ethical guidance, and fairness, among others. More recently, Yukl et al. (Reference Yukl, Mahsud, Hassan and Prussia2013) developed the EL Questionnaire (ELQ). Similar to the ELW, the ELQ consists of 15 items such as, “My boss opposed the use of unethical practices to increase performance.” Taken together, these measures are effective in assessing one’s ethicality and morality as a person or manager, but fail to illuminate how organizations can develop ethical thinking and action among their current workforce.
Although these measures mark progress, measures of EL that can be used for selection and development purposes remain sparse. Some have argued that other, more innovative methods for assessing ethics may not be feasible. For instance, Loviscky et al. (Reference Loviscky, Treviño and Jacobs2007) argued that the measurement of moral judgment vis-à-vis assessment centers was impractical due to findings that scores may represent exercise ratings rather than traits (e.g., Guion, Reference Guion1998; Sackett & Dreher, Reference Sackett and Dreher1982). However, Loviscky et al. (Reference Loviscky, Treviño and Jacobs2007) focused on moral judgment rather than the behaviors and decision making strategies associated with EL. If these dimensions can be approximated with SJTs, it stands to reason that an assessee may similarly demonstrate related behaviors in ethically charged assessment center simulations. In another critique of assessment centers, Loviscky et al. argued that assessment centers are costly. Although we do not disagree with this point, assessment centers may be worth the high cost when organizations are interested in assessing and developing high-level talent (Thornton & Rupp, Reference Thornton and Rupp2006). The high costs, financial and otherwise, associated with unethical behavior may well warrant investment in procedures that more finely pinpoint leaders’ strengths and weaknesses with regard to ethical behavior. Such procedures may also open the door for developmental interventions that prevent costly scandals.
Expanding EL measurement to assessment centers offers an additional advantage. Although ethical behavior has traditionally been studied from the individual, or “bad apple,” perspective, research suggests that the environment, or the “barrel,” also plays a key role in how leaders behave (e.g., Kish-Gephart et al., Reference Kish-Gephart, Harrison and Treviño2010). An assessment center allows for the manipulation of environmental characteristics that approximate the culture and norms of a specific organization and may therefore provide a better estimation of how an assessee may behave under the unique circumstances of one organization or role. Despite the potential benefits, no research has explored whether EL can, in fact, be assessed using an assessment center. Based on the preceding discussion, we propose the following hypothesis:
Hypothesis 1: Measurement of EL in an assessment center will demonstrate convergent validity with traditional measures of ethics-related individual differences, EL behaviors, decision making, and integrity.
Moral person and moral manager dimensions
The moral person dimension consists of three pillars: traits, behaviors, and decision making (Brown et al., Reference Brown, Treviño and Harrison2005). Moving beyond traits, the literature suggests several processes associated with ethical behavior and decision making. For instance, Rest (Reference Rest1986) argued that ethical behavior results from recognizing an issue, making a judgment, and establishing intentions to act. This model has been generally accepted and incorporated with other models and findings (e.g., Jones, Reference Jones1991) to provide a framework for understanding how ethical decisions are made and actions are taken. Although foundational, this model leaves open the question of which decision making processes and strategies produce ethical judgments. Several processes have been proposed, including considering actions and implementation planning (Janis & Mann, Reference Janis and Mann1977), information processing and evaluation (Bommer et al., Reference Bommer, Gratto, Gravander and Tuttle1987), and perspective taking and information seeking (Pizarro & Bloom, Reference Pizarro and Bloom2003).
The moral manager dimension includes three pillars: communicating ethical standards and values to followers, rewarding ethical behavior and punishing unethical behavior, and role-modeling by making sound ethical decisions (Brown et al., Reference Brown, Treviño and Harrison2005; Treviño et al., Reference Treviño, Brown and Hartman2003). Thus, the moral manager dimension focuses on how leaders encourage ethical decision making and behavior in their followers. In this instance, the third pillar, role-modeling, can be subsumed into the moral person category as it focuses on the leader’s own ethical decision making. The other two dimensions, communication and reward/punishment, may both be directly observed in an assessment center context. Although there are measures of EL available, they are based on subordinate ratings, which may be problematic because subordinates may have limited opportunities to observe supervisor unethical conduct (Wexley & Youtz, Reference Wexley and Youtz1985). Thus, an assessment center may allow for more directly observable accounts of EL. Communication of ethical values and standards and leader discipline regarding ethical or unethical behavior are the final two dimensions assessed.
Although there have been several studies examining predictors (e.g., Mayer et al., Reference Mayer, Aquino, Greenbaum and Kuenzi2012), outcomes (e.g., Toor & Ofori, Reference Toor and Ofori2009), moderators (e.g., Kacmar et al., Reference Kacmar, Bachrach, Harris and Zivnuska2010), and mediators (e.g., Walumbwa & Schaubroeck, Reference Walumbwa and Schaubroeck2009) associated with EL, there has been little work examining how to effectively develop ethical leaders. Specifically, there is limited research exploring how to develop moral manager skills. This is a critical need because developing manager traits, such as integrity and honesty, is not feasible (Kirkpatrick & Locke, Reference Kirkpatrick and Locke1991). Thus, developing ethical leaders through altering traits is not a viable option. A more effective path forward may be through decision making and skill-based training efforts, which have shown some efficacy in the sciences and business (e.g., Medeiros et al., Reference Medeiros, Watts, Mulhearn, Steele, Mumford and Connelly2017; Watts et al., Reference Watts, Medeiros, Mulhearn, Steele, Connelly and Mumford2017). An assessment center may be particularly beneficial for developing ethical leaders, as it may provide rich sources of behavioral information, make use of relevant cases, and allow for feedback, all key components of successful ethics training programs (Medeiros et al., Reference Medeiros, Watts, Mulhearn, Steele, Mumford and Connelly2017; Watts et al., Reference Watts, Medeiros, Mulhearn, Steele, Connelly and Mumford2017). Specifically, in a training context, feedback allows trainees to know what they are doing appropriately and should continue, and what behaviors or decisions need to be changed or improved upon (Goldstein & Ford, Reference Goldstein and Ford2002). Providing feedback to assessees after an assessment center may work similarly, providing them with actionable information regarding their EL. This leads to our second hypothesis:
Hypothesis 2: Providing assessees with feedback on EL-DAC performance can improve scores on subsequent EL assessments.
In addition to convergent and predictive validity, research has also demonstrated the importance of face validity, or the assessee’s perception of job relatedness regarding the assessment center procedures (Kluger & Rothstein, Reference Kluger and Rothstein1993; Smither et al., Reference Smither, Reilly, Millsap, Pearlman and Stoffey1993). Put simply, face validity in this context refers to the extent to which assessees perceive the content of the assessment center as relevant to developing their EL skills. A commonly cited strength of assessment centers is their face validity, as it is an important dimension in testing attitudes and reactions (Kluger & Rothstein, Reference Kluger and Rothstein1993; Macan et al., Reference Macan, Avedon, Paese and Smith1994).
To our knowledge, no research has explored the development of EL in assessment centers or the use of online mediums to conduct assessment centers, thus highlighting the need to consider assessee reactions in this particular context as a marker of face validity. The examination of face validity is especially critical when studying a new application of assessment centers, as the underlying motive of assessment center research is to produce a practical product or procedure. Additionally, it is important to account for assessee reactions in an assessment center, as perceptions can affect subsequent performance in future assessments.
Reactions in favor of the EL-DAC may be expected due to the strong face validity of assessment centers in general; however, it is important to note that the EL-DAC has two unique features that set it apart from traditional assessment centers. The first is rather obvious: the construct of interest, EL. Traditional assessment centers seek to measure complex multidimensional competencies that, together, compose performance in a specific job. Although ethical leadership is indeed a complex multidimensional component of some jobs, the construct is especially difficult to define due to its context dependence. If the context of scenarios in the EL-DAC is unclear to assessees, assessees may not be able to evaluate and track their own efficacy. The right way of responding may not be apparent; therefore, the assessee may feel the EL-DAC lacks objectivity. The second feature is the online administration of the EL-DAC, which is similarly novel given that assessment centers have traditionally been conducted in person. As such, we explore the following research question:
Research Question: To what extent do assessees positively or negatively respond to the EL-DAC?
Method
Participant info
Twenty-four assessees were recruited over a two-year period. Assessees included undergraduate and graduate business students at a Midwestern university. On average, assessees had seven years of work experience. The sample consisted of eight men and nine women, with seven assessees choosing not to report their gender. Each assessee was offered $75 for their participation. Ethics board approval was obtained.
Ethical dimensions explication
We used an iterative approach to the explication of our behavioral dimensions. First, the primary investigators identified sub-dimensions of Treviño et al.’s (Reference Treviño, Hartman and Brown2000) moral manager and moral person dimensions. Each sub-dimension was defined and behavioral markers were identified. Although Treviño et al. (Reference Treviño, Hartman and Brown2000) provide clear theoretical definitions of the moral person and moral manager dimensions, as well as sub-dimensions, these definitions were not always well-suited for developing behavioral markers observable in an assessment center context. This was especially true for the moral person dimension. For example, “personal morality” and “doing the right thing” are theoretically important, and often touted as critical to EL, but “morality” and “right” remain vague terms that are difficult to define and observe in an applied setting. Indeed, whole fields of philosophy are dedicated to arguing about what is right and what is moral. As such, without clear consensus on what is right, wrong, and moral, these sub-dimensions are difficult to define and contextualize within an assessment center context. Further, traits, such as trustworthiness, are difficult to assess given their subjective nature.
Additionally, some of the sub-dimensions explicated by Treviño et al. (Reference Treviño, Hartman and Brown2000) used socially loaded labels and definitions (e.g., “personal morality”). Given the potential sensitivity around some of the phrasing, the primary investigators viewed these labels as inappropriate for the developmental nature of the assessment center. It is possible that retaining the original language could limit an assessee’s responsiveness to feedback. For example, research on biases demonstrates that most people view themselves as moral and having integrity (e.g., Dong et al., Reference Dong, van Prooijen and van Lange2019). Receiving feedback that one received a low score on personal morality may unnecessarily limit one’s willingness to hear that, or other, feedback. As such, we deemed it necessary to change some of the original language of the sub-dimensions to encourage assessee reflection.
The remaining dimensions were reviewed by three graduate students familiar with the EL literature and were revised for content, specificity, and clarity. The final list contained eight dimensions and can be viewed in Tables 1 and 2.
Exercise design
The primary investigators reviewed the assessment center literature for exercises relevant to EL to develop the two exercises that were eventually used in the assessment center. The first exercise identified was a case analysis. Cases from “Arthur Andersen Case Studies in Business Ethics,” hosted by Carnegie Mellon’s Tepper School of Business (Tepper School of Business, 2022), were reviewed and pilot tested by a small sample of undergraduates. During the pilot test, students read the cases and were asked to respond to three prompts: 1) describe the ethical dilemma in the scenario, 2) describe the steps you would take to resolve the dilemma, and 3) describe why you would choose to take these steps. The pilot test was conducted to ensure that the selected cases elicited responses that were relevant to EL and captured by the explicated dimensions. The first case involved a dilemma about whether to cover up or expose potentially damaging information about one’s firm related to a worker injury. The second case dilemma involved a conflict between protecting the environment versus protecting an employer’s reputation. Both cases asked assessees to take on a leadership role. Based on a review of the responses, the two cases and the three questions used in the pilot test were retained for the EL-DAC. Although all participants responded to both cases over the course of the study, the order in which cases were presented was randomized.
The second exercise borrowed components of Treviño et al.’s (Reference Treviño, Sutton and Woodman1985) inbox exercise focused on EL. We selected a combination of both relevant and irrelevant prompts to minimize the threat of demand characteristics. Using this exercise as a foundation, we then incorporated a role-play focused on sexual harassment claims. We chose to include this role play because sexual harassment is an ethical issue per the APA Ethics Code (APA, 2022). Specifically, while working through the inbox exercise, assessees were interrupted twice by a fictional administrative assistant who reported that there had been complaints of sexual harassment against one of the managers on the team. The role play was conducted by one of the assessors, who took on the role of the administrative assistant. A copy of the revolving door script can be found in the Appendix.
The EL-DAC
Due to the COVID-19 pandemic, the assessment center was conducted online. Unlike in traditional assessment centers, communication was one-on-one between the assessor and the assessee via email, Qualtrics surveys, and Zoom. One week prior to the assessment center, assessees received an email with expectations for their participation, as well as a link to complete the IRB consent form and pre-test. The pre-test included several measures of ethicality, including Becker’s (Reference Becker2005) Integrity Scale (α = .90–.95) and Watts et al.’s (Reference Watts, Medeiros, McIntosh and Mulhearn2020) biased attitudes scale (BiAS; α = .73–.85), as well as measures of Machiavellianism (Christie & Geis, Reference Christie and Geis1970), Narcissism (Crowe et al., Reference Crowe, Carter, Campbell and Miller2016), and Moral Identity (Aquino & Reed, Reference Aquino and Reed2002). Assessees were also asked to send a link to an EL survey, including the Ethical Leadership at Work Questionnaire (Kalshoven et al., Reference Kalshoven, Den Hartog and De Hoogh2011; α = .81) and the Ethical Leadership Questionnaire (Yukl et al., Reference Yukl, Mahsud, Hassan and Prussia2013; α = .74), to 3–5 members of their professional team.
When assessees logged in for their assessment center using Zoom, they were greeted by one assessor. Assessees first completed one of the two selected case studies with no time limit. They were asked to complete three open-ended questions in response to each case study. Next, assessees began their revolving door exercise using the Qualtrics survey tool. Each assessee was allotted 45 minutes to complete the exercise. Again, they were asked to provide open-ended responses to the exercise prompts. The assessee was interrupted twice, at the 10- and 20-minute marks. After completing the revolving door exercise, the assessee was asked to complete an online interview survey, which asked questions regarding the EL-specific components of the exercise.
Two weeks after completing the assessment center, assessees were sent a post-test survey. The post-test included all the original measures from the pre-test survey. Additionally, assessees were once again asked to send the EL survey to 3–5 team members.
Assessment center scoring and feedback
Exercises were scored independently by three organizational psychology graduate students using the EL dimensions (Treviño et al., Reference Treviño, Hartman and Brown2000). Raters were trained using frame-of-reference training on samples of responses until sufficient rater agreement was observed, and were retrained until the variance in their ratings did not exceed 1.0. Using a 5-point rating scale, the raters evidenced an average interrater agreement coefficient (rwg) above .90. In instances of substantial disagreement among raters, additional training was provided in the form of calibration meetings. Raters were provided participant responses shortly following participant completion of exercises and would individually rate the EL dimensions for each exercise (i.e., cases, inbox, role play). The dimensions used were the moral person and moral manager behavioral markers outlined in the sections above (e.g., integrity, self-awareness, role-modeling; Brown et al., Reference Brown, Treviño and Harrison2005; Treviño et al., Reference Treviño, Hartman and Brown2000).
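For readers unfamiliar with the agreement index used here, rwg compares the observed variance in raters’ scores with the variance expected under purely random responding (James, Demaree, & Wolf, 1984). The following Python sketch is illustrative only; the function name and the sample ratings are hypothetical and are not the study’s data.

```python
import statistics

def rwg(ratings, scale_points=5):
    """Single-item interrater agreement (James, Demaree, & Wolf, 1984):
    1 minus the ratio of the observed rating variance to the variance
    expected under a uniform (random-response) distribution."""
    expected_var = (scale_points ** 2 - 1) / 12  # 2.0 for a 5-point scale
    observed_var = statistics.pvariance(ratings)
    return 1 - observed_var / expected_var

# Hypothetical example: three raters scoring one assessee on one dimension
print(rwg([4, 4, 5]))  # close agreement yields a value near .89
print(rwg([3, 3, 3]))  # perfect agreement yields 1.0
```

For a 5-point scale, the expected uniform variance is (5² − 1)/12 = 2, so the index approaches 1.0 as raters converge and falls toward (or below) 0 as their ratings diverge.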
Rater scores were provided within 1–3 weeks of exercise completion. Aggregated (i.e., averaged) scores from these raters and exercises were used to populate the EL-DAC feedback form provided to assessees. The feedback form included the assessee’s score on each dimension, a rating of low, medium, or high for each score, and an example of how the dimension was (or was not) demonstrated during the exercises. For example, one assessee scored high on Concern for Others for the response “this is obviously very concerning becayse [sic] of the safety of all the employees and individuals of all parties involved,” and another scored high on Giving Voice for stating “I would be willing to listen to the voices of my team and hopefully bring some wisdom to the tough decision.” In contrast, other assessees scored low on Reward and Discipline by stating only “I appreciate you for reaching out,” and low on Concern for Norms by deferring decision-making power and responding, “I will inform upper management of the problem.”
Conditions
To test the efficacy of assessment center feedback for developing EL, assessees were randomly assigned to one of two conditions. In condition 1 (i.e., feedback), assessees received their feedback within 1 week of completing the assessment center and prior to completing the post-test survey. In condition 2 (i.e., no feedback), assessees received their feedback 3–4 weeks after their assessment center and only after completing the post-test survey. This design allowed for the comparison of post-test scores between groups that did and did not receive feedback before the post-test. To ensure fairness across assessees, all participants ultimately received feedback within 2 weeks of completing the post-test survey.
Results
EL-DAC convergent validity and feedback efficacy
Hypothesis 1: Convergent validity
Correlations were examined to assess convergent validity. To better observe the effectiveness of the revolving door exercise, the inbox and role-play components were examined separately. As such, three correlation matrices are presented in Tables 3 (case analysis), 4 (inbox), and 5 (role play). Given the low statistical power, none of the relationships were statistically significant. Nevertheless, patterns in the direction of correlation coefficients (positive vs. negative) provided some insight bearing on convergent validity. For example, correlations between moral identity and EL dimension scores for both the case analysis exercise (Table 3) and the inbox exercise (Table 4) trended in the positive direction for all eight EL dimensions. Similarly, depending on the exercise scored, Machiavellianism showed mostly negative correlational patterns with the EL dimensions. Finally, despite a few exceptions, the eight EL dimensions within each exercise tended to correlate in the positive direction with one another.
Hypothesis 2: Efficacy of Feedback
Hypothesis 2 focused on differences between those who received immediate feedback and those who received delayed feedback (after the post-test). As with the examination of Hypothesis 1, the shift in our data collection resulted in a sample likely underpowered for a statistical test of mean differences. Thus, it is not surprising that no statistically significant differences were observed between the feedback and no-feedback conditions for the integrity test or BiAS scores. Although most assessees had at least one team member successfully complete the pre-surveys of EL, only five assessees had at least one team member complete the post-surveys of EL. As such, no analyses could be conducted regarding others’ perceptions of assessee EL before and after the assessment center.
Qualitative reactions to revolving door exercise
The revolving door exercise consisted of both an inbox and a role-play. Although originally designed as a physical disruption to the assessee’s inbox exercise, COVID-19 circumstances necessitated moving the role-play to an online chat context. To our knowledge, the chat function has not been used in a formal assessment center in the literature. Observations regarding the revolving door exercise are threefold: (a) the chat elicited responses relevant to the dimensions, (b) the chat function reflects how many people now communicate at work, and (c) assessees sometimes did not notice and respond to the chat, potentially creating construct-irrelevant variance.
On the positive side, when assessees did notice the chat messages, their responses were related to the dimensions at hand. For instance, several assessees noted following up with formal investigations into the remarks, whereas others specifically noted punishments. Others noted needing to follow up on company policy before responding further. Still, some pushed the issue aside and suggested that it may take a while to resolve due to “more pressing issues.” The range of responses suggests that this exercise both tapped relevant dimensions and provided the expected variability across assessees.
Finally, assessees were asked to share their experience with the revolving door exercise during the post-exercise interview. One of the authors identified five themes in the 24 comments provided by assessees. The first two themes, positivity and challenge, emerged for the majority of assessees. Specifically, 75% of assessees commented that they had a positive reaction to the exercise (e.g., “I actually really enjoyed this activity”), and 87.5% commented about the challenging nature of the exercise. Constraints that were reported to increase the challenge of the exercise included limited time, lack of personal management experience, lack of organization-specific knowledge relevant to the scenario, and the difficulty of solving interpersonal and ethical issues. Nevertheless, some assessees (25%) noted the realism of experiencing such constraints in a management role, our third theme. For example, one assessee noted, “Since there was a time limit, it stimulates real life working condition to scramble and try to reply all emails in a short period of time.” The fourth theme, noted by 20.8% of assessees, was that the exercise stimulated self-reflection and learning (e.g., “I feel l learned about myself during this process and how I make consequential decision under pressure”). The fifth, and final, theme was ideas for improving the exercise (29.2% of assessees). For example, one assessee noted they could have completed the exercise more efficiently if they had the ability to “toggle between emails.” A full list of assessee comments can be viewed in Table 6.
Discussion
Experiential and quantitative results from this effort reveal several challenges and opportunities for assessing and developing EL using the assessment center approach. Regarding dimensions, the principal investigators were able to develop eight dimensions with behavioral anchors (see Brown et al., Reference Brown, Treviño and Harrison2005) observable during the assessment center. This is an important outcome because it was previously unclear whether observable behavioral markers of EL could be developed in this context. However, the dimensions also presented several challenges. Specifically, the principal investigators and the graduate research assistants noted the difficulty of observing moral person dimensions (e.g., integrity, self-awareness, concern for others) during the exercises. In contrast, both groups noted that the moral manager dimensions were displayed more frequently. As such, although the moral person and moral manager dimensions make appropriate theoretical distinctions, the moral person dimensions appear less appropriate for assessing observable behaviors. Finally, it is likely more useful for assessees to receive feedback on the moral manager dimensions than on the moral person dimensions. From a practical perspective, it is more appropriate and useful to provide feedback such as “Communicate about ethics with your team more” rather than “Have more integrity.”
Regarding assessee reactions to the revolving door exercise, it is noteworthy that chat platforms such as Microsoft Teams and Slack have become common methods of communication in the workplace. As such, the introduction of an interactive chat function adds a level of face validity to the assessment center. Although the present effort used the Zoom chat function, more realistic revolving door tasks could incorporate the more sophisticated, face-valid chat platforms workers actually use, such as Teams and Slack. Future research might also consider how to better represent the modern work experience in an inbox exercise. Just as the in-basket exercise evolved into the inbox exercise, perhaps now is the time to consider a more dynamic exercise that better reflects the multiple types of communication tools (e.g., email, chat, text) available to workers today.
It is important to note a major drawback of the exercise. Although participants were informed that some colleagues might use the chat function, some assessees did not notice the chat, or did not notice it until late in the exercise. Failing to notice the chat likely represents construct-irrelevant variance, as it is distinct from reading the chat and choosing not to respond. In this instance, we were unable to distinguish between these two types of responses and, as such, could not accurately capture each assessee’s response to the text-based role play. Using practice chats, or other familiarization methods, may be fruitful in future efforts using this approach.
Additionally, variability in assessee responses to the revolving door exercise suggests that the exercise was effective for eliciting a range of responses. This is particularly true for the role-play component of the exercise. As such, we encourage future researchers to continue to expand the arsenal of available assessment center exercises that incorporate modern technology and better reflect the way people live and work. For example, COVID-19 made remote and hybrid working a mainstay for many. Creating remote (such as this one) or hybrid assessment centers that allow assessees to use common platforms and technologies could create a more face-valid experience. However, it is still important to bear in mind potential downsides of using technology in assessment center exercises. Specifically, technology should not be used casually, as it can create construct-irrelevant variance. As such, technology and modern platforms should be used only to the extent that the technology is important on the job and required for performance.
In line with the point above, we note that our study design focuses primarily on work and leadership in a virtual context. Therefore, our observations may not generalize to assessing EL in assessment centers across all work contexts. Mesmer-Magnus & DeChurch (Reference Mesmer-Magnus and DeChurch2009) and De Guinea et al. (Reference De Guinea, Webster and Staples2012) identified key differences in social interactions in virtual versus in-person work. Namely, in virtual work, openness behaviors are expected to be less frequent, communication and knowledge sharing are expected to decrease, and conflict is expected to increase. Such social variables may affect how EL behaviors are enacted and detected as part of a virtual assessment center process. A related concern emerges when considering the predictive validity of EL assessment centers. Specifically, our approach was limited in its ability to examine others’ ratings of assessee EL. Given that peers and coworkers may have limited opportunities to observe an assessee’s unethical conduct, particularly in virtual work settings, coworker ratings of EL may be too contaminated or insufficient as a criterion for establishing the predictive validity of EL assessment centers.
The lack of empirical results regarding the efficacy of the assessment center feedback is disappointing. However, given the complexity of EL, its development may be difficult and may require a longer period of time than the present study could observe. As such, future research should consider a multi-wave longitudinal approach that tracks the development of EL over time.
Despite these results, correlational relationships suggest some promising patterns. For example, the largely negative relationships between the assessment center dimensions and Machiavellianism, and the positive relationships with moral identity, suggest that the dimensions may be appropriately assessing important ethics constructs. However, the results were inconsistent across constructs. As such, more research is needed to fully understand the potential of assessment centers for developing EL.
Of course, it is important to note that these correlations may not be generalizable. For example, it is plausible that different results may have been observed in a face-to-face setting. As methodological choices can introduce construct-irrelevant variance, it is possible that additional factors related to our use of a digital medium may have affected results. Although this approach was necessary due to the COVID-19 pandemic and, in fact, presented a unique opportunity to conduct the assessment center in a virtual environment, future research is needed to understand and compare the differences between an online and a face-to-face EL assessment center.
Our research offers an initial examination of the potential for using the assessment center technique to develop EL. Our experience and initial results suggest that it is possible to use this technique to assess EL. Specifically, assessment centers may be particularly useful for evaluating the “moral manager” dimension of ethical leadership. Assessing the “moral person” component proved much more difficult and as such, may be better suited for other assessment techniques. This notion was supported by empirical evidence, which demonstrated some patterns, albeit inconsistently, of relationships across ethics-related variables and the dimensions assessed.
Although this work presents preliminary findings and opens the way for future research, some suggestions for practice can be discussed. Organizations may seek to quantify EL to preempt damaging scandals. This study demonstrates that some critical qualities that may seem abstract (e.g., EL) can indeed be measured. Specifically, assessment centers can capture the moral manager components of EL; however, the moral person components of ethical leadership may be better evaluated through other assessments (e.g., survey response).
Our study also revealed that EL assessment center approaches can be adapted to incorporate interventions or pre/post tests. This point may be especially useful for designing EL development strategies. Moreover, our review of assessee feedback suggests that managers may even respond positively to live, high-fidelity evaluation of their ethical behavior. An example of the high-fidelity nature of our inbox/role-play exercise was the organizational chart provided to each assessee, which illustrated their position in the organization relative to other characters in the simulation. This organizational chart helped to simulate the power differences between positions that occur in real-world work relationships—power differences each assessee had to navigate to successfully demonstrate their EL.
Although conclusions are limited due to a small sample, initial empirical results suggest that the assessment center technique, when used to develop attitudes, may produce a backlash effect similar to that seen in other training contexts. As such, future research is needed to understand the trajectory of EL development using this and other techniques. As illustrated in the Toyota Motors example, ethical dilemmas continue to arise in modern organizations as tensions among profit goals, public health, and environmental concerns become salient. This research offers a preliminary response to these challenges in that virtual EL assessment centers may contribute to the future of ethical leader development.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/iop.2024.7.
Acknowledgments
Support for this research was provided by the Douglas W. Bray and Ann Howard Research Grant, administered by the Society for Industrial and Organizational Psychology (SIOP).