Stereotypes of older workers and perceived ageism in job ads: evidence from an experiment

Ian Burn; Daniel Firoozi; Daniel Ladd; David Neumark

doi:10.1017/S1474747222000270

Stereotypes of older workers and perceived ageism in job ads: evidence from an experiment

Published online by Cambridge University Press: 05 January 2023

Ian Burn ,

Daniel Firoozi ,

Daniel Ladd and

David Neumark

Show author details

Ian Burn: Affiliation:
University of Liverpool, Liverpool, UK
Daniel Firoozi: Affiliation:
University of California-Irvine, Irvine, California, USA
Daniel Ladd: Affiliation:
Quantitative Economic Solutions, LLC, Boston, Massachusetts, USA
David Neumark*: Affiliation:
University of California-Irvine, Irvine, California, USA
*: *Corresponding author. Email: [email protected]

Article contents

Abstract
Introduction
Conceptual framework and implications of the evidence
Studying job ads
Methods
Experimental evidence
Results
Conclusions and discussion
Footnotes
References

Rights & Permissions

Abstract

We explore whether ageist stereotypes in job ads are detectable using machine-learning methods measuring the linguistic similarity of job-ad language to ageist stereotypes identified by industrial psychologists. We then conduct an experiment to evaluate whether this language is perceived as biased against older workers searching for jobs. We find that job-ad language classified by the machine-learning algorithm as closely related to ageist stereotypes is perceived by experimental subjects as biased against older job seekers. These methods could potentially help enforce anti-discrimination laws by using job ads to predict or identify employers more likely to be engaging in age discrimination.

Keywords

Age discrimination age stereotypes hiring job search J18 J26

Type: Special Issue Article
Information: Journal of Pension Economics & Finance , Volume 22 , Issue 4 , October 2023 , pp. 463 - 489

DOI: https://doi.org/10.1017/S1474747222000270 [Opens in a new window]
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: Copyright © The Author(s), 2023. Published by Cambridge University Press

1. Introduction

Lengthening work lives for those able to work longer is an important part of the policy response to population aging. Reducing age discrimination in hiring is critical to achieving this goal, because many seniors transition to part-time or shorter-term ‘partial retirement’ or ‘bridge jobs’ at the end of their careers (Cahill et al., Reference Cahill, Giandrea and Quinn2006; Johnson et al., Reference Johnson, Kawachi and Lewis2009), or return to work after a period of retirement (Maestas, Reference Maestas2010). There is an extensive body of research testing for employer discrimination against older workers in labor markets, using correspondence studies to test for discrimination in hiring (e.g., Bendick et al., Reference Bendick, Jackson and Romero1997, Reference Bendick, Brown and Wall1999; Lahey, Reference Lahey2008; Farber et al., Reference Farber, Herbst, Silverman and von Wachter2019; Neumark et al., Reference Neumark, Burn and Button2019a, Reference Neumark, Burn, Button and Chehras2019b). This research focuses on measuring employer behavior – specifically, whether there is less hiring of qualified older workers – and generally finds evidence consistent with hiring discrimination against older workers.Footnote ¹ There is little work, however, that studies how workers respond to age discrimination in the labor market.

In this paper, we explore potential worker responses to one possible manifestation of age discrimination in the labor market – in particular, whether workers perceive job requirements using language related to ageist stereotypes as biased against older workers searching for jobs. If ageist stereotypes in job ads discourage older workers from applying for jobs, such language can have the same adverse outcome on the hiring of older workers as employers discriminating against older job applicants. Viewed in the context of labor market search, direct discrimination by employers reduces the likelihood that employers make an offer to an older worker, whereas discouraging them from applying reduces the likelihood that older workers find a match. Both thus reduce the arrival rate of job offers, lengthening unemployment durations – which are generally longer for older workers (Neumark and Button, Reference Neumark and Button2014), a problem that has been long noted in policy debate regarding age discrimination (U.S. Department of Labor, 1965).

We study how job-ad language using age-related stereotypes, or more blatantly ageist language, is interpreted by potential older job applicants, in two steps. First, we use machine-learning methods (partly developed in Burn et al., Reference Burn, Button, Munguia Corella and Neumark2022a) to identify phrases in job ads that are linguistically related to ageist stereotypes drawn from the industrial psychology literature.Footnote ² We use these phrases to construct typical job-ad language that reflects specific age stereotypes. Second, we conduct an experiment using an online panel (Amazon's MTURK), in which we ask whether respondents perceive this job-ad language, which the machine-learning algorithm classified as related to ageist stereotypes, as ageist – based on questions about whether stated job requirements are ‘biased against workers over the age of 50’. Our experimental evidence shows that job-ad sentences that are classified as closely related to ageist stereotypes by the machine-learning algorithm are also rated by experimental subjects as biased against older job seekers. Because we couch the language in the MTURK experiment in the context of job search, we believe our evidence speaks directly to the question of whether ageist language in job ads may discourage older workers from applying for jobs.

Utilizing ageist language in job ads to shape the applicant pool by discouraging older applicants has a potential benefit for discriminatory employers, because of the incentives created by age discrimination laws. A lower representation of older workers in their applicant pool can justify a lower representation of older workers among employees, making it easier to rebut an allegation of age discrimination in hiring. More generally, employers who do not want to hire older workers might, in order to avoid unnecessary search costs, discourage older workers from applying by signaling their ageism.

Our evidence does not directly address the actual behavior or intent of employers that might underlie the use of ageist stereotypes in job ads. Using such language to deter older job applicants could reflect taste discrimination or statistical discrimination. That is, employers may intentionally use this language to deter older workers from applying, either because of taste discrimination (a simple aversion to hiring older workers) or statistical discrimination (an assumption that older workers are not as productive), and this may ‘work’ because respondents perceive age stereotypes in job ads as directly reflecting age bias. Alternatively, employers could be stating actual job requirements; they may have no discriminatory intent in doing this, but may engage in statistical discrimination by assuming that older workers who apply for jobs with these requirements are less likely to meet the requirements. Older job searchers, knowing this, might perceive the job-ad language as ageist because job ads with this language are less likely to result in job offers when older workers apply.Footnote ³

However, we believe that this more subtle story of no discriminatory intent is not the operative one, and instead that employer behavior and respondent perceptions pertain to a desire to avoid hiring older workers. First, in the original correspondence study from which the job ads are drawn (Neumark et al., Reference Neumark, Burn and Button2019a), evidence of statistical discrimination based on age-related worker skills and characteristics was largely ruled out. Second, the treatment phrases we use based on the machine learning (described in more detail below) do not seem to describe skills that are notably different between workers over and under 50 (the dividing line for ‘older’ in our experiment). As examples, the phrases are things like ‘good communication and teamwork’, ‘accounting software systems like Netsuite…’ (software that is over 20 years old, and other software we mention is also older), and ‘lift 40 pounds’. Thus, our sense is that the job-ad language is perceived more as a signal of ageism than as a signal of job requirements that are strongly related to age. Third, our MTURK experiment specifically asks about perceptions of age bias.

Regardless, in each of these scenarios, our evidence has the potential to identify job-ad language that can deter job applications from older workers. As discussed in the next section, any of these scenarios could reflect age discrimination as either social scientists or the law define it.

Our paper demonstrates the promise of machine-learning methods to help reduce age discrimination in the labor market. We present two types of evidence as ‘proof of concept’. First, we verify that our machine-learning methods detect the presence of stereotyped language in our constructed job ads, even when only one sentence in the job ad is highly related to the ageist stereotype. Second, our main evidence, from our experiment, indicates that this age-stereotyped language is viewed as biased against older workers, which we believe indicates that older job seekers would be less likely to apply to job ads using such language.

Two recent papers have focused on how ageist stereotypes impact the job prospects of older workers. Burn et al. (Reference Burn, Button, Munguia Corella and Neumark2022a) show that ageist language in job ads helps predict age discrimination by employers, and van Borm et al. (Reference van Borm, Burn and Baert2021) show that employers use ageist stereotypes to help them evaluate resumes. In contrast, we focus on the potential responses of workers to job-ad language that reflects ageist stereotypes. By examining the behavior of workers rather than employers, we are the first in this literature to show that language reflecting age-related stereotypes is viewed as ageist by employees, and that machine-learning methods developed to identify age-related stereotypes have potential applications for enforcing non-discrimination protections for older workers.

It might seem unsurprising that job ads using ageist or age-stereotyped language are perceived as ageist by respondents. Indeed, one of our treatments uses language suggested by AARP that is sufficiently blatant (e.g., ‘energetic person’) that the results might not appear surprising at all. A real-world example that is similarly blatant is stating maximum experience levels in job ads. This occurred recently in Kleber v. Carefusion Corp., where the job ad requested ‘3 to 7 years (no more than 7 years) of relevant legal experience’, language that will clearly act to exclude many older applicants.Footnote ⁴

However, we emphasize two points that make the evidence much more interesting and applicable to real-world job ads that do not use this kind of blatant – and exceptional – language. First, we use phrases from actual job ads (approximately 14,000 job ads collected in the age discrimination correspondence study by Neumark et al., Reference Neumark, Burn and Button2019a), selecting phrases that appear in these ads and are semantically related to age stereotypes. As shown later in the paper, these phrases are far more subtle.Footnote ⁵ Second, the kinds of age-stereotyped phrases from the job ads that we use help predict age discrimination by employers, as measured in the correspondence study (Neumark et al., Reference Neumark, Burn and Button2019a; Burn et al., Reference Burn, Button, Munguia Corella and Neumark2022a). In other words, our methods can be used to identify actual age-stereotyped language, and the same methods we use in this paper to study perceptions of potential job applicants also classify job-ad language in a manner that helps predict employer discrimination. Thus, we can garner evidence on whether the same kind of job-ad language that is associated with discriminatory employers also might be likely to discourage older workers from applying for jobs by signaling age discrimination. This kind of discouragement could lead to age patterns in application and hiring data that understate age discrimination, including in correspondence studies of age discrimination if discriminatory employers do not discriminate as much against older applicants because ageist language has already reduced the number of older applicants.

2. Conceptual framework and implications of the evidence

Why might employers use stereotyped language in job ads? One hypothesis is that employers who discriminate based on age use stereotyped language to try to shape the applicant pool, to reduce the likelihood that age discrimination is detected. Using language that conveys positive stereotypes related to young workers might discourage older workers from applying (as might language conveying negative stereotypes related to older workers – although that seems less likely and is, in fact, less common in our data). This discouragement from applying would lead to the underrepresentation of older applicants in the applicant pool, and is potentially valuable to a discriminating employer because the probability of a hiring age discrimination claim and of an adverse outcome for the employer is smaller when the ratio of older applicants to younger applicants is lower.Footnote ⁶ Employers could use job-ad language this way regardless of the nature of age discrimination, and, in the case of statistical discrimination, whether or not the language is related to the assumptions they make about older workers (e.g., they might assume older workers will leave the firm sooner). In either case, employers might use ageist language in job ads to deter older workers from applying, but introduce this language via job requirements that are correlated with age, natural to use in job ads, and not so blatant as to make the age discrimination clear.

A second hypothesis, which is more complex, is also related to statistical discrimination. Different jobs may have different requirements, which could be stated in job ads. But employers may hold stereotypes about older job applicants' abilities to meet these job requirements – for example, assuming that older workers are less likely to be able to do the heavy lifting that a job requires, which may well be true on average but of course not of each applicant.

Both statistical and taste discrimination are illegal under US law. Not surprisingly, language in job ads that refers to age either explicitly or ‘mechanically’ is illegal in the United States. The US Code of Federal Regulations covering the ADEA currently states, ‘Help wanted notices or advertisements may not contain terms and phrases that limit or deter the employment of older individuals. Notices or advertisements that contain terms such as age 25 to 35, young, college student, recent college graduate, boy, girl, or others of a similar nature violate the Act unless one of the statutory exceptions applies’ (§1625.4).Footnote ⁷

The legality of less blatant job-ad language with job requirements that reflect age stereotypes and is associated with lower hiring of older workers is more complex. On the one hand, EEOC regulations state: ‘An employer may not base hiring decisions on stereotypes and assumptions about a person's race, color, religion, sex (including pregnancy), national origin, age (40 or older), disability or genetic information’ (see U.S. Equal Employment Opportunity Commission, n.d.a). On the other hand, job requirements that are based on factors related to age are not necessarily illegal. The legality of job requirements related to age generally requires an employer to show that the use of these requirements is based on a reasonable factor other than age (RFOA), even if that factor is correlated with age. An RFOA is defined as ‘a non-age factor that is objectively reasonable when viewed from the position of a prudent employer mindful of its responsibilities under the ADEA under like circumstances’ (see Federal Register, n.d.). In other words, the law recognizes that characteristics of workers that are related to age can sometimes be legitimate for employers to consider.

Indeed the law even goes further, as in some rare cases employers can use age as an explicit criterion if it is inherently related to a requirement for the job that is related to age but hard to assess independently. This requires that age can be shown to be a ‘bona fide occupational qualification’ (BFOQ) that is ‘reasonably necessary to the normal operation of the business’ (U.S. Equal Employment Opportunity Commission, n.d.b). A key example is Hodgson v. Greyhound Lines, Inc., where the company was sued for having a maximum hiring age. Greyhound prevailed by establishing that driving ability is essential to passenger safety, that older hires would be less safe drivers (because achieving maximum safety took 16–20 years of experience), that some abilities associated with safe driving deteriorate with age, and that these changes are not detectable by physical examination (which could otherwise be a substitute for an age criterion) (see U.S. Court of Appeals, 7th Circuit, 1974). The issue of the rights of older workers vs. public safety has figured prominently in court decisions regarding age as a BFOQ under the ADEA (Combs, Reference Combs1982).

Our evidence indicates that we can reliably detect age stereotyping in job ads, and that this job-ad language is interpreted as disadvantaging older applicants. This evidence can provide information and tools to parties that enforce age discrimination laws, by helping to identify job-ad language that may predict intent to discriminate on the basis of age in hiring and adverse impact on older job applicants. Secondarily, our paper makes a contribution regarding methods used in the literature in industrial psychology on employer and worker beliefs about stereotypes. Much of the previous literature in industrial psychology utilizes surveys to understand how employers view older workers or how workers view job ads. To incentivize the elicitation of respondents' true beliefs, we asked respondents to guess how the average respondent to our survey rated each statement, and respondents were paid based on how close to the true value their answers were. When asked to state their own beliefs, respondents in our survey were less likely to rate a statement as ageist. But when asked how they thought the average respondent would view the same statement, they rated statements as more ageist on average. These findings suggest that standard surveying methods that do not incentivize responses may lead to an underreporting of perceptions of ageism; one potential explanation is that it is not socially desirable to perceive the language we use as ageist (Cherry et al., Reference Cherry, Allen, Denver and Holland2015), since doing so indicates that one attributes the characteristics of people used in this language as applying more to older people.

Our evidence cannot speak directly to the question of taste vs. statistical discrimination or whether the job requirements would be viewed as legal. Indeed, we do not study employer behavior in our experiment, although we do use job-ad language from real employers. Instead, in our experiment we ask respondents if they perceive job-ad language pertaining to job requirements as ‘biased against workers over age 50’ (as explained in more detail below). A positive response could mean either that the language is perceived as directly reflecting age bias – aversion to hiring older workers – or that the language is perceived as ‘biased’ because it puts older workers at a disadvantage because they may be less likely to satisfy the stated job requirement. Similarly, our evidence does not speak to whether a stated job requirement would be legal. What our evidence does address is whether age stereotypes expressed in job ads likely signal to job applicants that older workers are less likely to be hired. Thus, our evidence can reveal the potential for employers to use job-ad language to discriminate against older workers in hiring, and the potential adverse impact on older job applicants. Challenges remain in fully understanding the behavior underlying the actual use of such language in real job ads, and the legality of doing so.

Nonetheless, evidence on age stereotypes in job language could help identify employers that may be discriminating based on age, providing an additional tool in identifying potential discriminators, above and beyond current enforcement mechanisms that rely on worker-initiated complaints focused on hiring outcomes.Footnote ⁸ This could be valuable for two reasons. First, the use of ageist stereotypes in job ads that discourage older applicants from applying for jobs could provide, to the EEOC, a potential indicator of age discrimination in hiring that could be investigated further. In addition, the EEOC might, in response to such evidence, issue guidance to employers to avoid language that might discourage older workers from applying.Footnote ⁹ Second, if employers use ageist stereotypes in job ads to discourage older workers from applying for a job, this can serve as a second source of age discrimination and hiring that has the same effect in lowering employment of older workers as direct hiring discrimination.

3. Studying job ads

Very few studies explore job ads, and fewer still focus on discrimination. Among studies of issues other than discrimination, Modestino et al. (Reference Modestino, Shoag and Balance2016) use text data from job ads to document that ‘downskilling’ occurred during the recovery from the Great Recession, with firms reducing skill requirements in their job ads. Deming and Kahn (Reference Deming and Kahn2018) use text data in job ads to measure how different skills relate to wages. Marinescu and Wolthoff (Reference Marinescu and Wolthoff2020) match text data from job ads to job application data to study the matching process between jobs and applicants. Focusing on discriminatory language, Kuhn and Shen (Reference Kuhn and Shen2013) and Kuhn et al. (Reference Kuhn, Shen and Zhang2018) explore how gender preferences feature explicitly or implicitly in job ads in China, Hellester et al. (Reference Hellester, Kuhn and Shen2020) explore age and gender preferences in job ads in China and Mexico, and Arceo-Gomez and Campos-Vazquez (Reference Arceo-Gomez and Campos-Vazquez2019) study gender and attractiveness preferences in job ads in Mexico. Kang et al. (Reference Kang, DeCelles, Tilcsik and Jun2016) study how potential job seekers respond (with minority job seekers ‘whitening’ their resumes) to job ads including text about valuing diversity.

Two studies, to date, connect the text of job ads to measured discriminatory behavior of employers.Footnote ¹⁰ Tilcsik (Reference Tilcsik2011) identifies words in job ads related to masculine stereotypes (decisive, aggressive, assertive, and ambitious) and links those to hiring outcomes in a correspondence study of discrimination against gay men. In the most systematic approach, Burn et al. (Reference Burn, Button, Munguia Corella and Neumark2022a) identify common age stereotypes from the research literature in industrial psychology, use machine learning to calculate the relationship between the text of the job ads and specific age stereotypes, and test whether job-ad language related to the stereotypes predicts hiring discrimination against older workers in a correspondence study. As already noted, the present paper builds on this prior work.Footnote ¹¹

There has been no research on whether the ageist language in job ads is perceived as ageist by potential job applicants. Obviously, the use of such language in job ads is much more troublesome if it is perceived as ageist and thus discourages older workers from applying for jobs. If this happens, it should be viewed as another dimension of age discrimination in hiring – one that has not been studied or detected in the research literature that tests for hiring discrimination, mainly using correspondence studies.Footnote ¹²

What is known about how job applicants read job ads focuses exclusively on gender bias. Gaucher et al. (Reference Gaucher, Friesen and Kay2011) found that job ads for male-dominated occupations used words associated with male stereotypes (such as ‘leader’, ‘competitive’, or ‘dominant’) more frequently than advertisements for female-dominated occupations, and women found job advertisements less appealing when they contained more masculine than feminine wording (Bem and Bem, Reference Bem and Bem1973; Gaucher et al., Reference Gaucher, Friesen and Kay2011). Chaturvedi et al. (Reference Chaturvedi, Mahajan and Siddique2021) use machine learning to study job ads, identifying words that are predictive of a gender preference; they find that wage offers are lower in jobs with language expressing a preference for women, whether directly, or implicitly through gendered language related to skills, personality traits, and flexible work.

4. Methods

4.1 Selecting the stereotypes

To select the stereotypes we study, we start with a list of 17 ageist stereotypes from the industrial psychology literature, identified in earlier research (Burn et al., Reference Burn, Button, Munguia Corella and Neumark2022a). We conducted a detailed review of the industrial psychology, communications, and related literature to identify age stereotypes that this research identifies as applying to workers in their 50s and 60s. We relied on studies that were more likely to cover more recent older cohorts, since age stereotypes may change over time (Gordon and Arvey, Reference Gordon and Arvey2004); we avoided studies published before the 1980s and studies of non-Western countries. We reviewed an extensive set of literature reviews and meta-analyses to identify the relevant studies, but we draw our stereotypes from papers that tested for stereotypes rather than papers that simply reported or aggregated the evidence on stereotypes from other studies. We compiled lists of the stereotypes that these studies identified as applying to older workers. Since studies often have similar stereotypes but phrase them differently, we grouped very similar stereotypes into aggregate categories in a similar manner to the literature review and meta-analysis papers (e.g., Posthuma and Campion, Reference Posthuma and Campion2007). To focus the analysis on stereotypes on which research agrees, we included a stereotype in our analysis only if at least two studies confirmed the stereotype. This process led to 17 stereotypes of older workers, listed in Table 1.Footnote ¹³

Table 1. Age stereotypes from industrial psychology literature

Note: See Burn et al. (Reference Burn, Button, Munguia Corella and Neumark2022a).

For our analysis of job ads, we selected a subset of these stereotypes that met the following criteria. First, the stereotype is commonly expressed in job-ad language about the ideal or preferred candidate's skills or attributes; we did not want to focus on stereotypes that are not often included in job ads (e.g., hearing and memory), even if, according to the industrial psychology literature, employers hold these stereotypes. Second, we focused on stereotypes for which we had evidence of a correlation between discrimination and the stereotype (from Burn et al., Reference Burn, Button, Munguia Corella and Neumark2022a).Footnote ¹⁴ Third, older workers should be aware that employers held the stereotype. As evidence, we drew on various reports put out by AARP; see Brenoff (Reference Brenoff2019) and Terrell (Reference Terrell2019). Our final list of stereotypes is three skills or abilities for which older workers are stereotyped as deficient: communication skills, physical ability, and technological skills.Footnote ¹⁵

Industrial psychology research focuses on the skills that employers desire in workers, but in which older workers are perceived deficient. In contrast, job ads rarely use negative formulations of a skill requirement, but instead turn the language to a positive formulation (e.g., ads will ask that a candidate be ‘adaptable’, rather than that they are ‘not stubborn’). When describing skills and requirements related to our stereotypes, employers use words like ‘outgoing’ (Stewart and Ryan, Reference Stewart and Ryan1982), ‘sociable’ (Kite et al., Reference Kite, Deaux and Miele1991), and ‘conversational skills’ (Schmidt and Boland, Reference Schmidt and Boland1986; Ryan et al., Reference Ryan, See, Meneer and Trovato1992) to describe communication skills. Physical ability is expressed using words like ‘energy’, ‘speed’, and ‘physical capability’ (Levin, Reference Levin1988; van Dalen et al., Reference van Dalen, Henkens and Schippers2009). Technological skills focus on the ability to use ‘new technology’ or on ‘technological competence’ (AARP, 2000; McGregor and Gray, Reference McGregor and Gray2002; McCann and Keaton, Reference McCann and Keaton2013).

4.2 Creating the treatment (stereotyped) and control job-ad language

We create two sets of phrases: treatment phrases and control phrases. The treatment and control job-ad sentences differ in the job requirements expressed and the type of language used in these phrases; we try as much as possible to have the treatment and control phrases describe similar skills, although we had to allow for some differences to be appropriate to the occupation (see Table 2).Footnote ¹⁶ Our control sentences express job requirements that are also appropriate for the job but use age-neutral language not related to these age-stereotyped skills or abilities, while our treatment sentences use language highly related to these ageist stereotypes.

Table 2. Control and treatment phrases by occupation

Note: See text for a description of how each sentence was created. The average cosine similarity score with the stereotype for each phrase (averaging over the cosine similarity score of each word contained in the phrase) is reported in parentheses. CSS, cosine similarity score.

Our main method for generating phrases and sentences highly related to ageist stereotypes uses measures of semantic similarity generated by machine-learning methods. Moreover, to isolate the effects of the different stereotypes, we used the results from these machine-learning methods to construct sentences that were highly related to only one of the three stereotypes we use. We calculate the semantic similarity of nearly one million (997,562) phrases from the approximately 14,000 job ads collected in Neumark et al. (Reference Neumark, Burn and Button2019a) to the communication skills, physical ability, and technology skills stereotypes, measuring semantic similarity by the ‘cosine similarity score’, a metric that ranges from −1 (completely unrelated) to 1 (identical).

We provide a brief overview, explanation, and example of these machine-learning methods and the cosine similarity score.Footnote ¹⁷ We first identify a corpus of the English language that we will use to measure how similar words and phrases are to each other. We use as the ‘corpus’ the entirety of English-language Wikipedia, which contains all words in the English language. With this corpus, the ‘input data’ are the sentences and paragraphs of Wikipedia, and we compute the semantic similarity between any two words based on the frequency with which they appear together in either sentences or paragraphs, a common procedure in computational linguistics. This procedure results in a vector space (we use 200 vectors), with each phrase (we use three-word phrases, or ‘trigrams’) being represented by weights on each one of these vectors. The vector weights are chosen by a typical machine-learning algorithm that iteratively selects these vector weights so as to accurately predict which phrases are near each other (in the same sentence or paragraph) in Wikipedia.

We use these vector weights to compute the similarity between all three-word phrases in our ads and the ageist stereotypes drawn from the industrial psychology literature. Using the vector weights for these computed from the Wikipedia corpus (which can always be created by breaking down phrases into words), we calculate the cosine similarity (CS) score between these trigrams from the job ads and each stereotype, defined as:

(1)$${\rm CS}( {{\rm trigram}, \;\;{\rm stereotype}} ) = \displaystyle{{{\rm dot\;product}( {{\rm trigram}, \;\;{\rm stereotype}} ) } \over {\Vert {{\rm trigram}} \Vert \Vert {{\rm stereotype}} \Vert }}$$

where ‘trigram’ and ‘stereotype’ in the equation refer to the vectors of weights.Footnote ¹⁸

The CS score varies between −1 and 1. A score of −1 means the words never appear in the same sentences or paragraphs in Wikipedia. As the CS score increases, the usage of the words becomes more similar; that is, they are used more often in the same sentences or paragraphs, suggesting that they are often used to discuss the same topic. This is what the literature defines as greater semantic similarity. If the words coincide perfectly, the CS score equals 1.

As an example, Figure 1a shows the distribution of CS scores of all three-word phrases (trigrams) with a particular stereotype; the distribution is centered above zero, which makes sense since we are looking at the text from job ads. To provide some examples, referring to the panel for communications skills, trigrams at the lower end of the distribution are highly unrelated (such as ‘Christmas season near’, with a score of around −0.3), and trigrams at the higher end are more closely related (such as ‘interactions excellent phone’, with a score of around 0.5). The examples provided in the other panels – for physical ability and technology – similarly show low CS scores for unrelated phrases and high CS scores for related phrases.

Note: (a) Figure reports the distribution of cosine similarity scores for all trigrams from the job ads with the indicated stereotypes. The higher the cosine similarity score, the more related the trigram is to the stereotype, with a minimum of −1 and a maximum of 1. The phrases in the boxes are examples of phrases located at that point in the distribution.

(b) Solid lines indicate the location of a control sentence in the cosine similarity score distribution. Dashed lines indicate the location of a treatment phrase (for the machine-learning treatments shown in Table 2).

(c) The dark points/lines are at the average cosine similarity score of the treatment and control phrases as shown in Table 2, for the indicated stereotype. The height of the right-hand dark point/line in each panel indicates the difference in the perceived ageism of the machine-learning treatment phrases relative to the control phrases for individuals over 50 (Table 4, column 9).

We use the list of words and phrases from our job ads to construct our treatment sentences. We iteratively edited the sentences to ensure that only the CS score of the manipulated stereotype substantively differed between the treatment and control sentences, whereas the CS scores for the other stereotypes listed in Table 1 (including the other two treatment stereotypes) were similar for the treatment and control sentences. For example, if the treatment language related to communication skills was also highly related to the stereotype about personality, we identified which words in the sentence were highly related to personality, and then we selected synonyms that were less related to personality. Our control sentences were created to express requirements for similar jobs without referring to ageist stereotypes about skills or abilities. We iteratively modified words and phrases that were highly related to our stereotypes to minimize the semantic similarity. The resulting sentences, for the treatment and control groups, are listed in columns (3) and (4) of Table 2, along with their CS scores with the stereotype.

Figure 1b repeats the histograms of CS scores from Figure 1a, but now overlaying the positions of these control and treatment phrases. The figure shows that the treatment phrases are a good deal higher in the distributions than the control phrases, indicating that the treatment phrases are more semantically similar to the stereotypes. Figure 1b and Table 2 illustrate that our experimental results based on the treatment phrases derived from our machine learning will not be ‘contrived’ results that pertain to unusual job-ad language designed to evoke the responses we find. In contrast, we are testing whether subtle shifts in language, which echo actual and reasonable job-ad language variation, affect perceived age bias that could in turn influence job application behavior.

We also use a second stereotype treatment that conveys bias by using ageist language identified by the AARP as related to communication skills, physical ability, and technology skills. We select three AARP examples that best correspond to our respective stereotypes: ‘cultural fit’, ‘energetic person’, and ‘digital native’ (Brenoff, Reference Brenoff2019; Terrell, Reference Terrell2019). We adapted the language to fit our job ads and created three sentences, one for each stereotype (Table 2, column (5)). Using the text about cultural fit, we created the sentence ‘You must be up-to-date with current industry jargon and communicate with a dynamic workforce’ to reflect stereotypes about communication skills, emphasizing the communication aspect of fitting in. Using the text about energetic persons, we created the sentence ‘You must be a fit and energetic person’ to reflect stereotypes about physical ability. Using the text about digital natives, we created the sentence ‘You must be a digital native and have a background in social media’ to reflect stereotypes about technology skills by emphasizing social media.

We thought it useful to use these blatant examples of stereotyped language suggested by AARP as a way of validating our methods and potentially learning more about perceptions of job-ad language related to age stereotypes. One might think that if these phrases are not identified as ageist, it is less likely that our more subtle sentences would be. If these blatant phrases were not perceived as ageist, then absence of evidence that the age-stereotyped phrases generated by the machine learning would point more to an uninformative experiment than to a true lack of perception of age bias in these latter phrases. Conversely, at the other extreme, the experimental responses to the AARP phrases might give a sense of the upper bound of perceived stereotyping we could expect. With reference to the earlier discussion of whether the job-ad language reflects outright age discrimination/bias or stereotyping, one might regard the AARP language as more clearly reflecting discrimination/bias.

On the other hand, note that – as shown in Table 2 – in every case, the CS score with the stereotype for the machine-learning phrase is higher than for the AARP language. This does not necessarily mean that the AARP language is less ageist; rather, the AARP language may be perceived as more ageist, while the language is less directly aligned with a particular ageist stereotype. For example, the AARP language we use for the communications stereotype includes ‘up-to-date’, ‘jargon’, and ‘dynamic’, all of which may reflect ageist stereotypes that are not strongly related to communications. This is not surprising since the AARP stereotypes were not chosen by machine learning with the goal of high semantic similarity with a specific stereotype and not with others. Indeed, as we noted, the AARP language used for communications is in fact described as ‘cultural fit’, which could be much broader.

It is also true that the treatment phrases for some stereotypes have higher semantic similarity with the targeted stereotype than do others. In particular, the CS score is always lowest for the treatment phrase corresponding to technological skills than for the treatment phrases corresponding to the other two stereotypes. However, the score is also lower for the technological skills control phrases. Thus, the impact of the difference between treatment and control phrases may not be expected to differ as much across stereotypes (if, in fact, they are perceived as ageist).

4.3 Validating the treatment vs. control differences

While the AARP language is blatant and should be perceived as ageist by workers, a first question is how well the stereotyped vs. neutral sentences generated from the machine-learning results lead to ads that convey the intended stereotypes. In the language of epidemiology, we would like our treatment sentences to have high ‘sensitivity’ (conveying ageist stereotypes) and ‘specificity’ (conveying information about the specific ageist stereotype intended).

To test whether our phrases are powerful enough to be detected in a job ad, we embedded the treatment and control sentences in job-ad templates we created to correspond to actual job ads. In particular, we created 12 templates per occupation using actual ads collected in Neumark et al. (Reference Neumark, Burn and Button2019a, Reference Neumark, Burn, Button and Chehras2019b) as a guide to creating our experimental job ads. For creating these templates, we supplemented the sample of ads from the correspondence study with recent real ads posted on the same job boards used in that study, to capture contemporaneous patterns of behavior. Figure 2 provides a few examples, and online Appendix A provides the full set of job-ad templates.

Figure 2. Job ad examples.

The treatment and control ads differ in the job requirements (denoted in bold with asterisks in each template), with three sentences assigned to be either a treatment phrase (stereotyped) or a control phrase (not stereotyped). As explained above, the requirements we manipulate have to do with a candidate's communication skills, physical ability, and technology skills. Our control phrases express job requirements that are also appropriate for the job but use age-neutral language not related to these age-stereotyped skills or abilities, while our treatment phrases use language highly related to these ageist stereotypes.

Figures 3–5 illustrate how the semantic similarity differs across the templates for the treatment and control job ads – for ads based on the machine-learning treatment phrases and the control phrases – and show that our treatment job ads do activate the intended stereotypes. Information on the distribution of all phrases found in the actual approximately 14,000 collected job ads is shown in grey, information for the ads with the treatment job-ad language is shown with dashed black lines, and information for the ads with the control neutral job-ad language with solid black lines. The figures show the median to 99th percentile range of the distribution of semantic similarity scores with the different stereotypes and the average (with plotting symbols).Footnote ¹⁹ We show these for the three stereotypes we study, and then averaged across the other stereotypes.Footnote ²⁰

Figure 3. Cosine similarity scores of administrative assistant templates (based on machine-learning treatments and the controls).

Note: Graphs display median to 99th percentile range of trigram semantic similarity scores for stereotypes for Administrative Assistant ads. The average trigram semantic similarity score for each stereotype is represented by the respective shape for each template. The category ‘Other’ is the average of the remaining stereotypes listed in Table 1. Control (‘neutral’) templates contain trigrams from the created ad templates with only non-stereotyped phrases included. Collected ads comprise trigrams from all Administrative Assistant job ads. Treatment templates contain trigrams from the created ad templates with the respective stereotyped phrase or phrases included.

Figure 4. Cosine similarity scores of retail sales templates (based on machine-learning treatments and the controls).

Note: Graphs display median to 99th percentile range of trigram semantic similarity scores for each stereotype for retail sales ads. The average trigram semantic similarity score for each stereotype is represented by the respective shape for each template. The category ‘Other’ shows the averages for the remaining stereotypes listed in Table 1. Control (‘neutral’) templates contain trigrams from the created ad templates with only non-stereotyped phrases included. Collected ads comprise trigrams from all retail sales job ads. Treatment templates contain trigrams from the created ad templates with the respective stereotyped phrase or phrases included.

Figure 5. Cosine similarity scores of security guard templates (based on machine-learning treatments and the controls).

Note: Graphs display median to 99th percentile range of trigram semantic similarity scores for each stereotype for security guard ads. The average trigram semantic similarity score for each stereotype is represented by the respective shape for each template. The category ‘Other’ is the average of the remaining stereotypes listed in Table 1. Control (‘neutral’) templates contain trigrams from the created ad templates with only non-stereotyped phrases included. Collected ads comprise trigrams from all security guard job ads. Treatment templates contain trigrams from the created ad templates with the respective stereotyped phrase or phrases included.

These figures display a few key results. First, the biased (treatment) job ads have considerably higher 99th percentiles of the semantic similarity scores with the targeted stereotypes than do the control job ads, as well as higher mean scores (and median scores, although less so). For example, this is apparent in Figure 3, looking at the bars and symbols for the physically able stereotype in the upper left-hand panel, the bars and symbols for the technology stereotype in the upper right-hand panel, and the bars and symbols for the communication stereotype in the lower left-hand panel. (In these three panels, we manipulate only the indicated stereotype, using the neutral language for the other two.) On the other hand, for the remaining stereotypes – the ones we do not manipulate – the control/neutral templates, treatment templates, and collected ads generally have similar medians, means, and 99th percentiles; see, for example, the bars for other in the upper right-hand panel in Figure 3. The implication of the differences in the means and especially the upper tails of the distributions is that the ads we write using the treatment sentences do, in fact, create ads with notably stronger age stereotypes for the ‘target’ stereotype we are trying to convey. But our manipulated treatments do not create similarity with the other stereotypes, as shown by the distribution of all other stereotypes in the job ads (besides the three we are trying to study). That is, our treatment ads only generate a shift in similarity for the stereotypes we are seeking to activate, hence isolating those stereotypes in the job ads.

This key result is also apparent from the lower right-hand panel in each figure, where we use the treatment ads in which we manipulate all three stereotypes at once. If one compares the bars and symbols for any of the three manipulated stereotypes in this panel to the corresponding bars and symbols in the first three panels, the results look almost identical. Again, this reinforces the conclusion that machine learning-generated semantic similarity scores are powerful enough to pick up the presence of stereotyped language, even when only one or a few sentences in the ad are actually related to the ageist stereotype.

In addition, the treatment effect is accentuated by using the control ads rather than simply using all the collected ads because the control ads are more neutral than the full set of collected ads. This difference is evidenced by the much lower values of the 99th percentiles for the control ads than for the collected ads for each of the stereotypes we manipulate, but not for the other stereotypes; for example, compare the bars for physically able and for other in the upper left-hand panel of Figure 3.

5. Experimental evidence

Our final step, and the key new contribution of this paper, was to conduct an experiment using Amazon MTURK to measure whether and to what extent job-ad language using phrases with high CS scores with age-related stereotypes – especially those phrases identified by machine-learning methods – are perceived as ageist by potential job applicants, including older applicants.

We recruited participants through the Amazon MTURK online platform. We restricted the sample to US residents. To guarantee that the median age of the sample was roughly 50, we used age-based quotas with a third of the sample in each of the following age bins: 25 to 35, 45 to 55, and 55+. Because the age bins are pre-set by Amazon MTURK, and MTURK's age data may not be up-to-date, we ask participants to self-report their age. The bins we use to collect self-reported age (25 to 35, 35 to 50, and 50+) broaden the MTURK bins to cover a gap in the age distribution and adjust the highest cutoff to 50, in line with our benchmark age for older workers in the survey. We did not balance the recruitment sample on race or gender.Footnote ²¹

Our experiment consisted of two parts, the baseline survey and the incentivized survey, which were conducted using Qualtrics. In the baseline survey, we recruited 50 respondents who met our criteria and passed the manipulation checks. The responses from the baseline survey were used solely to provide ‘correct’ answers for the next step when we incentivize our second group of respondents to predict the responses of this first group. For the incentivized survey, we recruited 151 respondents who met our criteria and passed the manipulation checks.Footnote ²²

In Table 3, we report the self-reported demographic composition of our sampled MTURK respondents (for the incentivized survey). The sample is relatively more-educated, white, and female than the US population as a whole. Consistent with the age-based quotas we set for recruiting participants, the sample is also relatively older, which may explain some of the differences in the other demographic characteristics.Footnote ²³ In line with our target, we generated a sample with a median age near 50, with roughly 55% of participants over that threshold.

Table 3. Demographics of MTURK sample

Note: MTURK participants self-reported their demographic characteristics. Respondents who selected two or more of the race and ethnicity categories were grouped into the ‘Two or more’ group.

5.1 Baseline survey

The baseline survey had three separate blocks of questions. In the first block of questions, subjects were asked to give their informed consent to participate in the survey (online Appendix Figure C2). In the second block, participants were shown a series of job requirements and told they were from job ads posted online. For each requirement, respondents were asked whether they personally agreed or disagreed with the statement that ‘[Treatment or control requirement] is biased against workers over 50’.Footnote ²⁴ Moreover, participants are told that the study in which they are participating ‘explores the effects of ageist language or age stereotypes in job ads, to understand how this affects who applies for jobs’.Footnote ²⁵ This language frames the survey in the context of job search – importantly, whether job seekers would perceive job-ad language as biased against older workers.

Within each of the blocks that asked about whether or not phrases were perceived as biased, there were three pages for each respective stereotype: communication, physical, and technology. All the treatment and control phrases were tested on their respective pages, but the pages were not labeled by stereotypes that respondents viewed. Respondents had to proceed sequentially through the pages and could not go back or skip between them. The ordering of the questions was randomized on each page. We adopted this approach to force respondents to compare treatment and control phrases in a specific stereotype category without prompting them about the specific age-related stereotype in question. The order of questions on a page was randomized for each respondent. Responses to the questions were given on a Likert scale with the options: strongly agree, somewhat agree, neither agree nor disagree, somewhat disagree, strongly disagree.

In the third block, we concluded by collecting the demographic characteristics of our sample (online Appendix Figure C3). They were asked to report their age, gender, education, and race/ethnicity.

5.2 Incentivized survey

The second survey was split into five blocks of questions. The first asked the participants to give their informed consent. The second block of questions repeated questions from the baseline survey and asked respondents about their own opinions on job requirements.

The next two parts formed the crux of the incentivized survey. In each part, the respondents were asked to guess how the participants in the baseline survey rated the job requirements, and they were rewarded for how correctly they guessed.Footnote ²⁶ Before starting each of these blocks of questions, respondents were sent to a landing page that emphasized the new prompt and cash incentive.

The third set of questions asked respondents to predict what the previous survey respondents answered when they were shown job requirements from the job ads.Footnote ²⁷ For each requirement, they were asked whether they thought previous respondents agreed or disagreed with the statement that ‘This job requirement is biased against workers over 50’. They were shown the same Likert scale shown to the baseline respondents. Respondents were informed that either the third set of questions in the survey or the fourth (described below) would be randomly selected for a cash incentive based on their answers to the questions in that part. They were told they would earn bonus pay, which was to be calculated based on how close they were to the correct answer. If they correctly predicted what the average participant said, they would earn $0.32 per question.Footnote ²⁸ Incorrect answers received less money, with the penalty increasing the further they were from the correct answer. Payouts were calculated using the quadratic scoring rule

(2)$$P_{iq} = 0.32-0.02 \times ( \overline {A_q} -A_{iq}) ^2, \;$$

where P _iq was the payout to individual i for question q based on the average response to question q by previous respondents ($\overline {A_q}$) and their answer about how previous respondents answered question q (A _iq).Footnote ²⁹ All the treatment and control phrases were tested on their respective pages, but the pages were not labeled by stereotypes that respondents viewed. Respondents had to proceed sequentially through pages and could not go back or skip between them. The question order was randomized within each page.

The fourth block of questions differed from the third in that we asked respondents to guess how the older participants in the baseline survey (those over 50 years old) responded.Footnote ³⁰ Before starting this block of questions, respondents were sent to a landing page that emphasized both the scoring rule and the age of respondents for whom they were guessing. The instructions read, ‘For each requirement, please state whether you think previous respondents over the age of 50 agree or disagree with the statement that “This job requirement is biased against workers over 50”’. The structure of this section was identical to the third block of questions.

5.3 Analyses

To test for differences in how the treatment and control phrases were perceived, we employ a series of regression models. We begin with a simple regression testing for differences in respondent beliefs (self-beliefs and beliefs about other respondents) between treatment and control phrases conditional on observable characteristics:

(3)$$A_{iq} = \alpha + \beta T_{iq} + X_i\delta + \varepsilon _{iq}\;$$

The ranking that individual i gave to question q (A _iq) is the dependent variable. We include controls for gender, level of education, race, and age (X _i). If respondents view the treatment phrases as more biased against older workers than the control phrases, we should find that β is less than zero, because the responses (A _iq) range from 1 for strongly agree to 5 for strongly disagree. If we observe β is positive, then this is evidence that treatment phrases were rated as less biased by respondents. In Equation (3), and all subsequent regressions, the standard errors are clustered at the respondent level.

Our next step is to explore two forms of heterogeneity in our estimated treatment effects.Footnote ³¹ The first examines whether respondents view treatment phrases differently depending on the stereotype to which the requirement is related. To test this, we define dummy variables for each of our three pairs of a stereotype treatment and the corresponding control (omitting one from the regression), and interactions between these dummy variables and the dummy variables for each stereotype treatment (communication skills, physical ability, or technology skills):

(4)$$\eqalign{A_{iq} = & \alpha + \beta _1( {T_q \times ComSkill_q} ) + \beta _2( {T_q \times PhysAb_q} ) + \beta _3( {T_q \times TechSkill_q} ) \cr & +\gamma _1( {PhysAb_q} ) + \gamma _2( {TechSkill_q} ) + \delta X_i + \varepsilon _{iq}\;} $$

Thus, the coefficients β _i, i = 1, 2, 3, capture how biased respondents view the treatment phrase for the stereotype relative to the control phrase.

The second form of heterogeneity we examine is the difference between the machine learning-derived treatment phrases and the AARP treatment phrases. To do this, we add an additional interaction for when the treatment phrase is the AARP phrase:

(5)$$\eqalign{A_{iq} = & \alpha + \beta _1( {T_q \times ComSkill_q} ) + \beta _2( {T_q \times PhysAb_q} ) + \beta _3( {T_q \times TechSkill_q} ) \cr & + \theta _1( {T_q \times ComSkill_q \times AARP_q} ) + \theta _2( {T_q \times PhysAb_q \times AARP_q} ) \cr & +\theta _3( {T_q \times TechSkill_q \times AARP_q} ) \cr & + \gamma _1( {PhysAb_q} ) + \gamma _2( {TechSkill_q} ) + \delta X_i + \varepsilon _{iq}} $$

The coefficients θ _i, i = 1, 2, 3, identify how much more biased respondents view the AARP treatment phrases relative to the machine learning-derived treatment phrases for the same stereotype. More importantly, perhaps, only with additional interactions added do we get separate estimates of the treatment effects that exclude the AARP language – i.e., the phrases based on machine learning from the job ads. Their effects are identified from the estimates of β _i, i = 1, 2, 3, in Equation (5). This is potentially important because the AARP phrases may be distant from what would be viewed as normal or acceptable job-ad language.

6. Results

Figure 6 provides a graphical depiction of the answers from the MTURK survey participants. Across the three blocks of the survey that solicited respondents' self-assessments of age-bias, their predictions of previous' respondents' answers, and their predictions of the answers of respondents over the age of 50, our results were consistent. The participants, on average, strongly disagreed with the notion that anyone would perceive the control phrases as biased against workers over the age of 50. Respondents rated the physical and technology-biased phrases derived from our CS score index as more biased than the control phrases, but viewed the communication skills stereotyped phrases as roughly identical to the controls. Opinions of the AARP-derived treatment phrases were starker, as all three were rated as far more age biased than their respective control counterparts.

Figure 6. Survey results.

Note: These numerical ratings reflect the degree to which survey respondents rated phrases as age-biased or not age-biased, with lower numbers indicating a greater bias against older workers. Likert scale ratings were translated to a numerical value such that ‘strongly agree’ mapped to 1, ‘somewhat agree’ mapped to 2, ‘neither agree nor disagree’ mapped to 3, ‘somewhat disagree’ mapped to 4, and ‘strongly disagree’ mapped to 5. The three categories: ‘self’, ‘others’, and ‘over 50’, refer to which group's opinions the MTURK respondents were asked to provide or predict in a given survey block. The average bias rating was collapsed on the treatment status of phrases (control, treatment, and AARP) as well as the category of the stereotype (communication, physical, or technology). Hence, each point in the figure reflects the average bias rating MTURK respondents gave to a given treatment status for a specific stereotype from the perspective of a given group of people. For example, the triangle in the first row of the figure indicates that when respondents were asked for their self-assessment of whether or not the physical stereotype control phrases were age-biased, they, on average, stated that they strongly disagreed.

The absence of evidence for bias for the language related to communication skills may reflect the fact that older workers are not always stereotyped as having worse communication skills, but are sometimes, as Table 1 showed, perceived as having better communication skills. In that sense, one might view the evidence of ageist ratings for the physical ability and technology-related stereotypes but not the communications stereotype as further confirmation that respondent perceptions accord with the industrial psychology literature. (Note that the CS scores from the machine learning do not detect positive vs. negative uses of the language.)

In Table 4, we estimate regression models for the survey responses that delve into more detail. In all cases, standard errors are clustered at the respondent level. In column (1), we report the estimated coefficient from a simple model of the responses for self-beliefs on a dummy variable for whether the response is to any treatment phrase. The estimated coefficient of −0.886 implies that responses are lower by almost one category of the Likert scale. Recall that the responses range from 1 for strongly agree to 5 for strongly disagree, so a negative estimate implies the phrase was perceived as more ageist. The estimate is strongly statistically significant. To help interpret the magnitude, the third number (in square brackets) reports the implied effect in terms of standard deviations of the responses.Footnote ³²

Table 4. Differences in beliefs by treatment (negative implies more biased against older workers)

Note: * indicates statistical significance of the coefficient. *p < 0.1, **p < 0.05, ***p < 0.01. † indicates significant differences between the same coefficients in the Beliefs about all respondents and the Self-beliefs models. †p < 0.1, ††p < 0.05, †††p < 0.01. In each regression, we include a constant and controls for gender, level of education, race, and age (not reported). Numbers in parentheses are robust standard errors clustered at the respondent level; numbers in brackets are coefficients normalized to standard deviations of the outcome variable. Negative numbers indicate higher levels of perceived bias against older workers, as the outcome ranges from 1 for ‘strongly agree’ to 5 for ‘strongly disagree’. The row labeled, e.g., Treatment × physical ability, reports the estimated effect of the machine-learning generated phrase for this stereotype, while the row labeled Treatment × physical ability × AARP reports the incremental impact of using the AARP treatment instead. In other words, for the machine-learning treatment only the first interaction variable equals one, while for the AARP treatment both interaction variables equal one.

Column (2) expands the specification to differentiate the treatment by the type of stereotype – communications, physical ability, or technology skills – without differentiating the machine-learning phrases from the AARP phrases. We find significant negative effects (implying more ageist phrases) for all three, with the largest estimate (whether looking at the coefficient or the standard deviation effect) for physical ability, followed by technology skills, and the smallest estimate for communications. In this model, we also include controls for the different stereotype phrases (whether treatment or control) so that the treatment × stereotype interactions measure the differences relative to the paired control phrase.

Column (3) expands the specification to differentiate between machine learning and AARP treatment phrases. In this column, the estimated coefficients of the treatment × stereotype coefficients measure the effects of the machine-learning phrases, and the treatment × stereotype × AARP estimates measure the differential effects of the AARP treatments relative to the machine-learning treatments. We see that in every case the estimated effects of the AARP treatments are larger, in the direction of more perceived bias. All of the estimated differences for the AARP phrases are statistically significant, and the magnitudes are considerably larger for the communications and technology skills stereotypes. For the machine-learning stereotypes, the estimated treatment effects for physical ability and technology skills are sizable, significant, and negative, while the effect for communications is near zero and insignificant (paralleling what we found in the raw data). On the one hand, this suggests that the phrases generated by machine learning for communications stereotypes do not evoke ageism as strongly, while the phrases in the AARP treatment do; on the other hand, recall the earlier caution that the stronger evidence for the AARP treatment for communications may not isolate the communications stereotype well.

Columns (4)–(6) report estimates of the same specifications, but for the beliefs about other respondents – i.e., how respondents think others would perceive the language. The qualitative pattern of estimates is the same, but the estimated impacts of the treatments are generally larger. This can be seen most simply by comparing the estimated treatment effects between columns (5) and (2). The estimated coefficients are substantially larger – especially for the physical ability and technology skills stereotypes – and in all three cases, the differences are strongly statistically significant (as indicated by the ‘daggers’). Comparing columns (6) and (3) indicates that the differences between self-beliefs and perceived beliefs of others for the physical ability and technology stereotypes are driven by the machine-learning phrases, as their estimated coefficients are substantially larger in column (6) than in column (3), whereas the estimated interactions with the AARP phrases are not uniformly larger or smaller. As noted earlier, the differences in responses for perceived beliefs of others and self-beliefs may reflect the incentives we offered in eliciting the former, which could counter social desirability biases.

Finally, columns (7)–(9) focus on the perceived beliefs of those over age 50. These estimates are similar to those in columns (3)–(6), suggesting that respondents did not particularly believe that older individuals were more likely to perceive the language as ageist.Footnote ³³ Of course, given that the stereotyped language was perceived as ageist, the impact on behavior would likely be stronger the older is the person reading job ads with these phrases.

Our last analysis compares the perceptions (self-beliefs, or of others) to the CS scores for the phrases we use (see Table 2), providing a useful graphical depiction of our survey results that captures many of our key points. Figure 7a does this for self-beliefs. Note the vertical axis is decreasing in the reported belief because a lower number implies stronger perceived ageism. Consider first the points plotted for control phrases. Referring back to Table 2, there are two control phrases for physical ability and two for communications, so there are two circles for each of these. But there are three circles for technology skills, for which there are three control phrases. The horizontal axis measures the CS scores for these, and they are clustered towards zero. The vertical height measures the perceived bias of these phrases, and – by design – they are low.Footnote ³⁴ The triangles are for the machine-learning phrases. As Table 2 showed, these generally have the highest CS scores with the corresponding stereotypes. The squares are for the AARP stereotypes, which generally have lower CS scores; there are only three of these plotted because there are only three phrases. Comparing the height of the triangles to the squares, we see that the AARP phrases are generally perceived as more biased, even though the CS scores with the stereotypes are lower.

Figure 7. Scatterplot of self-beliefs of age bias and cosine similarity (CS) scores. (a) Self-beliefs. (b) Others' perceptions. (c) Others' over 50s' perceptions.

Note: CSS, cosine similarity score. Figure plots MTURK respondents' average perceptions of age bias against CSS stereotype ratings from Table 2. Lower numbers on the y-axis indicate higher levels of perceived age bias. Higher CSS scores on the x-axis indicate higher average levels of semantic similarity of a phrase with its respective stereotype. Circular, triangular, and square markers represent control phrases, CSS treatment phrases, and AARP treatment phrases, respectively. Black (solid), blue (shaded), and red (unshaded) markers represent physical, technology, and communication phrases, respectively.

Figures 7b and c show the same kind of evidence of beliefs about others' perceptions in general, and then for others over age 50. The qualitative patterns are the same, but Figures 7b and c, in comparison to Figure 7a, show the stronger perceived bias reported when asked about others' perceptions. This is apparent in Figures 7b and c, for both the machine-learning phrases (triangles) and the AARP phrases (squares); there is no apparent shift for the control phrases. This has an interesting potential implication. If an older job applicant thinks others will perceive job-ad language as more age biased than the individual herself perceives the language as age biased, they might expect less competition from older applicants, which might boost the likelihood the person applies for a job relative to the case where her perceptions were the same as those she ascribes to others. This does not mean the age-stereotyped language will not deter the older applicant from applying for a job, but it does mean that the greater perceived age bias by others may mitigate the effect.Footnote ³⁵

6.1 Assessing actual job ads

To help the reader contextualize our results, we now connect the results to real job ads in a more concrete way. In particular, in Figure 1c, we return to the distributions of CS scores from the job ads, but we overlay, for each of the three stereotypes, the average treatment and control CS scores (for the machine-learning treatments), and we also show the estimated effect of the treatments on perceived ageism. The panels in this figure give a sense of how one might, in principle, relate our results to actual job-ad language in a set of job ads, by showing how our experimental manipulation and the effects of that manipulation are related to a large set of actual job ads. For example, language with CS scores near or above the levels of our experimental treatments could be flagged as potentially indicating discriminatory behavior; and as discussed earlier, our treatment relates to actual and reasonable job-ad language, not extreme phrases that are blatant (and perhaps very unlikely to be used). To be clear, though, we would not advocate for this being viewed as definitive evidence of discrimination, but rather – at most – as a potential indicator for further investigation into actual hiring behavior.

7. Conclusions and discussion

In this paper, we explore whether the type of ageist stereotypes used in job ads is detectable using machine-learning methods and whether this language is perceived as biased against older workers searching for jobs. This is important for three reasons. First, workers may respond to this language, with older workers applying to a narrower set of jobs or perhaps choosing not to apply at all, hence diminishing their job market opportunities. Second, the mechanism we study is plausible, as employers who want to discriminate against older workers but also want to avoid getting caught might manipulate job-ad language to discourage older workers from entering the applicant pool. And third, if ageist job-ad language can be detected by machine-learning methods, then these methods could, in principle, be used to help enforce anti-discrimination laws by helping flag job ads by employers who may be more likely to be engaging in age discrimination.

We use machine-learning methods to identify phrases in job ads that are linguistically related to standard ageist stereotypes from the industrial psychology literature. We use these phrases to construct typical job-ad language that reflects specific age stereotypes. We show that machine-learning methods are sensitive enough to detect the presence of stereotyped language, even when only one sentence in the job ad is highly related to the ageist stereotype. We then conduct an MTURK experiment that asks whether respondents perceive this job-ad language – which the machine-learning algorithm classifies as related to ageist stereotypes – as ageist. We also used some more blatant ageist phrases identified by AARP.

Our experimental evidence shows that sentences that are classified as closely related to ageist stereotypes by the machine-learning algorithm are generally perceived as ageist by respondents in our MTURK experiment (and more so when asked, with incentives, how they will be perceived by others). These results imply that the different age stereotypes we study capture real ageist sentiments and will be perceived as such by job applicants. As such, the use of ageist stereotypes in job ads may discourage older workers from applying for jobs, with similar adverse effects on employment of older workers as direct hiring discrimination by employers.

Although the AARP phrases were perceived as more ageist than those generated by our machine-learning methods, the latter were more directly and more distinctly related to specific ageist stereotypes. This is potentially significant from a policy perspective, as it implies that machine learning can be used to identify ageist stereotypes in job ads that pertain to specific stereotypes. Because the legality of an age stereotype in a job ad might hinge on whether the language pertains to a job requirement based on an RFOA, it is important to be able to ascertain the type of job requirement to which the language might refer. In contrast, the AARP phrases we use, while perceived as more ageist, are harder to tie to specific stereotypes and hence, perhaps, to specific job requirements. This distinction may be useful with regard to the issues of what our evidence implies about underlying behavior and how our evidence might assist in enforcing age discrimination laws. It seems safe to say that job-ad language with the same ageist flavor of the AARP phrases will fairly reliably help to identify employers engaging in illegal age discrimination. In contrast, job-ad language identified by machine-learning methods should be interpreted more cautiously, as it could reflect legitimate job requirements, and may not have as adverse an impact on older workers looking for jobs.

More generally, though, the machine-learning methods could be helpful in flagging potentially discriminatory behavior, providing the EEOC (or state agencies) with an additional tool in identifying potential discriminators, above and beyond current enforcement mechanisms that rely on worker-initiated complaints focused on hiring outcomes. To be sure, language flagged as ageist – especially when not overly blatant – should not be viewed, in and of itself, as indicating age discrimination, because the language could reflect actual job requirements that are correlated with age. Rather, such language could prompt additional statistical investigation, focused on two questions: First, are employers who use such job-ad language less likely to hire older workers who nonetheless apply, which could indicate statistical discrimination based on assumptions that older workers cannot meet the stated job requirements, or simple taste discrimination? Second, are the stated job requirements relevant to the job, or perhaps instead just used to discourage older workers from applying? In addition, this research can help inform EEOC guidance to employers on job-ad language to avoid potential age discrimination.

Supplementary material

The supplementary material for this article can be found at https://doi.org/10.1017/S1474747222000270.

Acknowledgements

This research was supported by the Sloan Foundation. Any views expressed are our own and not those of the Sloan Foundation. This experiment was approved by the UCI Institutional Review Board, HS# 2015-2107. We are very grateful for helpful comments from Thiemo Fester, Joanna Lahey, and Andrea Tesei.

Footnotes

¹ For example, Neumark et al. (Reference Neumark, Burn and Button2019a) study artificial applicants aged 29–31 (younger), 49–51 (middle-aged), and 64–66 (older). For women, there is a distinct pattern of the highest callback rates for the younger applicants, lower for the middle-aged applicants, and lowest for the older applicants. Compared to young applicants, the callback rate for older female applicants for administrative jobs was 47% lower (7.58% vs. 14.41%). In sales, the difference was a bit smaller – a 36% lower callback rate. For male job applicants in sales, security, and janitor jobs, there was a lower callback rate for older applicants than younger applicants, although the age pattern is not as consistent or pronounced across the three age groups.

² The job ads were collected as part of a large-scale correspondence study of age discrimination (Neumark et al., Reference Neumark, Burn and Button2019a). The present research, in turn, was used to develop a field experiment on how actual job applicants respond to ageist language in job ads (Burn et al., Reference Burn, Firoozi, Ladd and Neumark2022b).

³ Moreover, this would be a reasonable expectation, given the correspondence-study evidence from prior work that the kinds of age-stereotyped phrases from the job ads that we use help predict age discrimination by employers (Burn et al., Reference Burn, Button, Munguia Corella and Neumark2022a).

⁴ See Kleber v. Carefusion Corp. (http://www.aarp.org/content/dam/aarp/aarp_foundation/litigation/pdf-beg-02-01-2016/kleber-amended-complaint.pdf, viewed 8 November 2017). Surprisingly, the court ruled in favor of the defense in this case, reaching a new interpretation that the ADEA does not authorize job applicants to bring a disparate impact claim (Button, Reference Button, Czaja, Sharit and James2019).

⁵ Admittedly, the language in Kleber v. Carefusion Corp. was from a real-world job ad but was not subtle; but this is an extreme exception.

⁶ In legal cases, the most compelling data on hiring discrimination comes from comparing hiring rates of the group in question (e.g., older workers) relative to the applicant pool. Hiring charges under the U.S. Age Discrimination in Employment Act (ADEA) made up nearly 5% of total ADEA charges in 2020 – more than double the percentage under Title VII (protecting women, minorities, etc.) or the Americans with Disabilities Act. (This is based on authors' computations using EEOC statistics available at https://www.eeoc.gov/statistics/statutes-issue-charges-filed-eeoc-fy-2010-fy-2020, viewed 18 January 2022.) The representation of hires among applicants is important in anti-discrimination enforcement, as the EEOC uses a ‘4/5ths’ rule (the ratio of the selection rate for the group in question to the group with the highest selection rate) as ‘a practical means of keeping the attention of the enforcement agencies on serious discrepancies in rates of hiring, promotion and other selection decisions’ (U.S. Equal Employment Opportunity Commission, 1979).

⁷ European Union law also bars age discrimination. To the best of our knowledge it is less explicit about the forms of discrimination barred, and it also differs in not protecting older workers per se, but rather barring discrimination based on age generally. See Lahey (Reference Lahey2010) and European Commission (2000).

⁸ See https://www.eeoc.gov/how-file-charge-employment-discrimination.

⁹ If ageist language did less to discourage older workers from applying to discriminatory firms, the ability of the EEOC to identify potential discriminatory behavior from hires relative to applications would be increased.

¹⁰ Though they did not focus on job ads, Hanson et al. (Reference Hanson, Hawley and Taylor2011) and Hanson et al. (Reference Hanson, Hawley, Martin and Liu2016) study language used by mortgage originators and connect this language to their behavior. Hanson et al. (Reference Hanson, Hawley and Taylor2011) study subtle discrimination through ‘keywords’ used by landlords responding to prospective tenants. Hanson et al. (Reference Hanson, Hawley, Martin and Liu2016) had research assistants subjectively (and blindly) code the helpfulness and other characteristics of mortgage loan originator responses to prospective borrowers.

¹¹ In an early small study, Wax (Reference Wax1948) found that summer resorts in Ontario, Canada, were more likely to discriminate against Jewish customers (based on names) requesting accommodations if they used phrases like ‘restrictive clientele’ in their advertising.

¹² These include Baert et al. (Reference Baert, Norga, Thuy and Van Hecke2016); Bendick et al. (Reference Bendick, Jackson and Romero1997, Reference Bendick, Brown and Wall1999); Carlsson and Eriksson (Reference Carlsson and Eriksson2019); Farber et al. (Reference Farber, Silverman and von Wachter2017, Reference Farber, Herbst, Silverman and von Wachter2019); Lahey (Reference Lahey2008); Neumark et al. (Reference Neumark, Burn and Button2016, Reference Neumark, Burn and Button2019a, Reference Neumark, Burn, Button and Chehras2019b); and Riach and Rich (Reference Riach and Rich2006, Reference Riach and Rich2010).

¹³ See Burn et al. (Reference Burn, Button, Munguia Corella and Neumark2022a) for documentation of the sources used to identify these stereotypes and the larger set of phrases that correspond to them, and additional details about the literature search. Note that a few of these appear as both positive and negative stereotypes about older workers.

¹⁴ In addition, there is no circularity – in the sense of finding preordained results – in studying stereotypes for which we found this correlation. In Burn et al. (Reference Burn, Button, Munguia Corella and Neumark2022a), although we found that employers who discriminate by calling back older applicants at lower rates tend to use language related to the stereotypes we study in the present paper, we do not know why. In particular, in our view the most compelling and interesting hypothesis is that the same employers use this language to try to discourage older workers from applying. Perceptions of job-ad language with these stereotypes as biased against older workers would strengthen the plausibility of this hypothesis, and if the latter happens, then there may be ‘more’ discrimination occurring than what Neumark et al. (Reference Neumark, Burn and Button2019a) detect in the correspondence study, and more generally this form of discrimination may be missed by enforcement mechanisms and practices that focus only on hiring outcomes. Thus, there is direct interest in studying the stereotypes for which we found a relationship with age discrimination in hiring.

¹⁵ Communication skills are one of three stereotypes in Table 1 that appear both positively and negatively. Later, we discuss the implications of this for our evidence.

¹⁶ For example, for the machine learning treatment for communication skills, the phrase for administrative assistants is ‘You must have good communication skills and teamwork on tasks’, while for retail sales the phrase is ‘You must have good communication skills with customers and staff’. In contrast, the empirically age-neutral control phrase – ‘You must be good at working without supervision’ – is the same.

¹⁷ See Burn et al. (Reference Burn, Button, Munguia Corella and Neumark2022a) for a thorough discussion.

¹⁸ The || notation indicates the Euclidean norm – e.g., ||[x, y]^T|| = (x ² + y ²)^1/2.

¹⁹ We think very high percentiles are relevant because they are potentially associated with strongly stereotyped phrases in the job ads, which readers are more likely to notice than potentially very subtle language with lower semantic similarity with ageist stereotypes.

²⁰ We show results for each of the separate stereotypes in online Appendix Figures B1–B3.

²¹ Respondents to our surveys who met the sample restrictions were excluded if they failed a manipulation check as the first step of the surveys. This manipulation check acted as both a Turing test (ensuring our respondents were human) and a check for American English language fluency (to reduce the chance someone has masked their location). All questions were free response to prevent individuals or computer programs from clicking through and getting the right answers by accident. The first three questions test for English understanding by asking respondents to complete the analogy (e.g., Canine is to Dog as Feline is to ___, for which cat, kitten, Cat, or cats would all be correct). One of these questions (Pen is to Whiteout as Pencil is to ___) was designed to elicit different responses between American and UK English speakers. If participants listed ‘eraser’, the correct answer in American English, they were allowed to proceed, while participants who responded with ‘rubber’, the answer in UK English, were excluded. The last question was a free response that asked respondents to write two sentences of at least 140 characters telling us who their favorite band or artist is and why. These were checked after the fact to ensure the participant was paying attention and was capable of writing in coherent English sentences. The manipulation checks overall screened out nine respondents. See online Appendix Figure C1 for the manipulation checks.

²² We recruited 150 respondents initially – 50 in each age bin – but one person completed the survey and then refused to accept payment. Thus, we ended up with 151 subjects because the quota filling on MTURK only counts people who accepted payment.

²³ While the sample is not representative by age, our main interest is in how older job seekers perceive the job ads (or how others think they perceive them), not in population estimates for which weighting would be critical. That said, we also verified that responses were similar by age group. Results are available upon request.

²⁴ An example is shown in online Appendix Figure D1. In principle, an experiment like ours could also ask respondents questions about their views of true correlations of age (or other groups studied) with the worker characteristics captured in the treatment phrases, as opposed to whether the phrases just signal bias.

²⁵ See online Appendix C.

²⁶ As noted earlier, we do this to counter social desirability bias. As reported below, we find some evidence of this, with greater impacts of the machine-learning responses when asked to predict how others would respond.

²⁷ See online Appendix Figure D2.

²⁸ Average pay was $7.22.

²⁹ Respondents were told that their earnings, ‘would be calculated according to the formula: M = $0.32 − $0.02 × (average previous answer − your prediction)²’. To illustrate their payoff, respondents were told they would earn $0.24 if the correct answer was ‘somewhat agree’ and they guessed ‘somewhat disagree’.

³⁰ See online Appendix Figure D3.

³¹ We also examined heterogeneous differences in the treatment phrases across demographic groups. We found little evidence to support the hypothesis that different groups view job requirements differently. The pattern of results observed in Table 4 held by sex, age, education, and race. These results are available upon request.

³² We estimated the specifications in Table 4 using an ordered probit model as well, to account for the fact that our dependent variable is actually ordinal, rather than cardinal. This led to qualitatively very similar results, so we report the OLS results for simplicity. Results available upon request.

³³ As further evidence that younger and older people do not perceive the ageist phrases differently, the estimated effects of the treatments on self-beliefs of respondents aged 50 or under and over age 50 were very similar. Results available upon request.

³⁴ We do this analysis for the mean survey responses, rather than the regression coefficients, because we want to depict these for each occupation and the regressions do not estimate separate effects by occupation.

³⁵ Bursztyn et al. (Reference Bursztyn, González and Yanagizawa-Drott2000) describe a related result in a much different context. In particular, they show that young married men in Saudi Arabia are more supportive of women working outside the home than they think other similar men are. In this case, the authors do an experimental intervention to correct men's beliefs about others' beliefs, and find that doing so – making them aware that other men are more supportive – increases the likelihood that their wives apply for jobs outside the home, suggesting that agents can be responsive to perceptions about others' beliefs. Of course in our case, correcting the apparent misperception of how others perceive job ads could have an adverse effect, because in our context the misperception might encourage older job seekers to apply for jobs.

References

AARP (2000) American Business and Older Employees. Washington, DC: AARP.Google Scholar

Arceo-Gomez, EO and Campos-Vazquez, RM (2019) Double discrimination: is discrimination in job ads accompanied by discrimination in callbacks? Journal of Economics, Race, and Policy 2, 257–268.CrossRef Google Scholar

Baert, S, Norga, J, Thuy, Y and Van Hecke, M (2016) Getting grey hairs in the labour market. An alternative experiment on age discrimination. Journal of Economic Psychology 57, 86–101.CrossRef Google Scholar

Bem, SL and Bem, DJ (1973) Does sex-biased job advertising ‘aid and abet’ sex discrimination? Journal of Applied Social Psychology 3, 6–18.CrossRef Google Scholar

Bendick, M Jr., Jackson, CW and Romero, JH (1997) Employment discrimination against older workers: an experimental study of hiring practices. Journal of Aging & Social Policy 8, 25–46.CrossRef Google Scholar

Bendick, M Jr., Brown, LE and Wall, K (1999) No foot in the door: an experimental study of employment discrimination against older workers. Journal of Aging & Social Policy 10, 5–23.CrossRef Google Scholar PubMed

Brenoff, A (2019) 5 Ageist Phrases to be Aware of. Washington, DC: AARP.Google Scholar

Burn, I, Button, P, Munguia Corella, LF and Neumark, D (2022a) Older workers need not apply? Ageist language in job ads and age discrimination in hiring. Journal of Labor Economics 3, 613–667.CrossRef Google Scholar

Burn, I, Firoozi, D, Ladd, D and Neumark, D (2022b) Help really wanted? The impact of age stereotypes in job ads on applications from older workers. Working Paper No. 30287, National Bureau of Economic Research, Cambridge, MA.CrossRef Google Scholar

Bursztyn, L, González, AL and Yanagizawa-Drott, D (2020) Misperceived social norms: women working outside the home in Saudi Arabia. American Economic Review 110, 2997–3029.CrossRef Google Scholar

Button, P (2019) Population aging, age discrimination, and age discrimination protections at the 50th anniversary of the Age Discrimination in Employment Act. In Czaja, SJ, Sharit, J and James, JB (eds), Current and Emerging Trends in Aging and Work. New York, NY: Springer, pp. 163–188.Google Scholar

Cahill, KE, Giandrea, MD and Quinn, JF (2006) Retirement patterns from career employment. The Gerontologist 46, 514–523.CrossRef Google Scholar PubMed

Carlsson, M and Eriksson, S (2019) The effect of age and gender on labor demand – evidence from a field experiment. Labour Economics 59, 173–183.CrossRef Google Scholar

Chaturvedi, S, Mahajan, K and Siddique, Z (2021) Words matter: gender, jobs and applicant behavior. IZA Discussion Paper No. 14497.CrossRef Google Scholar

Cherry, KE, Allen, PD, Denver, JY and Holland, KR (2015) Contributions of social desirability to self-reported ageism. Journal of Applied Gerontology 34, 712–733.CrossRef Google Scholar PubMed

Combs, DH (1982) Striking a balance between the interests of public safety and the rights of older workers: the age BFOQ defense. Washington and Lee Law Review 39, 1371–1395.Google Scholar

Deming, D and Kahn, LB (2018) Skill requirements across firms and labor markets: evidence from job postings for professionals. Journal of Labor Economics 36, S337–S369.CrossRef Google Scholar

European Commission (2000) Council Directive 2000/78/EC of 27 November 2000 Establishing a General Framework for Equal Treatment in Employment and Occupation. Available at https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex%3A32000L0078.Google Scholar

Farber, HS, Silverman, D and von Wachter, TM (2017) Factors determining callbacks to job applications by the unemployed: an audit study. RSF: The Russell Sage Foundation Journal of the Social Sciences 3, 168–201.CrossRef Google Scholar

Farber, HS, Herbst, CM, Silverman, D and von Wachter, TM (2019) Whom do employers want? The role of recent employment and unemployment status and age. Journal of Labor Economics 37, 323–349.CrossRef Google Scholar

Federal Register (n.d.) Disparte Impact and Reasonable Factors Other Than Age Under the Age Discrimination in Employment Act. Available at https://www.federalregister.gov/documents/2012/03/30/2012-5896/disparate-impact-and-reasonable-factors-other-than-age-under-the-age-discrimination-in-employment (Accessed 15 September 2019).Google Scholar

Gaucher, D, Friesen, J and Kay, AC (2011) Evidence that gendered wording in job advertisements exists and sustains gender inequality. Journal of Personality and Social Psychology 101, 109–128.CrossRef Google Scholar PubMed

Gordon, RA and Arvey, RD (2004) Age bias in laboratory and field settings: a meta-analytic investigation. Journal of Applied Social Psychology 34, 468–492.CrossRef Google Scholar

Hanson, A, Hawley, Z and Taylor, A (2011) Subtle discrimination in the rental housing market: evidence from email correspondence with landlords. Journal of Housing Economics 20, 276–284.CrossRef Google Scholar

Hanson, A, Hawley, Z, Martin, H and Liu, B (2016) Discrimination in mortgage lending: evidence from a correspondence experiment. Journal of Urban Economics 92, 48–65.CrossRef Google Scholar

Hellester, MD, Kuhn, P and Shen, K (2020) The age twist in employers’ gender requests. Journal of Human Resources 55, 482–469.Google Scholar

Johnson, RW, Kawachi, J and Lewis, EK (2009) Older Workers on the Move: Recareering in Later Life. Washington, DC: AARP Public Policy Institute.Google Scholar

Kang, SK, DeCelles, KA, Tilcsik, A and Jun, S (2016) Whitened résumés: race and self-presentation in the labor market. Administrative Science Quarterly 61, 469–502.CrossRef Google Scholar

Kite, ME, Deaux, K and Miele, M (1991) Stereotypes of young and old: does age outweigh gender? Psychology and Aging 6, 19–27.10.1037/0882-7974.6.1.19CrossRef Google Scholar

Kuhn, P and Shen, K (2013) Gender discrimination in job ads: evidence from China. Quarterly Journal of Economics 128, 287–336.CrossRef Google Scholar

Kuhn, P, Shen, K and Zhang, S (2018) Gender-targeted job ads in the recruitment process: evidence from China. NBER Working Paper No. 25365. Cambridge, MA.CrossRef Google Scholar

Lahey, J (2008) Age, women, and hiring: an experimental study. Journal of Human Resources 43, 30–56.10.1353/jhr.2008.0026CrossRef Google Scholar

Lahey, J (2010) International comparisons of age discrimination laws. Research on Aging 32, 679–697.CrossRef Google Scholar PubMed

Levin, WC (1988) Age stereotyping: college student evaluations. Research on Aging 10, 134–148.CrossRef Google Scholar

Maestas, N (2010) Back to work: expectations and realizations of work after retirement. Journal of Human Resources 45, 718–748.CrossRef Google Scholar PubMed

Marinescu, IE and Wolthoff, R (2020) Opening the black box of the matching function: the power of words. Journal of Labor Economics 38, 535–568.CrossRef Google Scholar

McCann, RM and Keaton, SA (2013) A cross cultural investigation of age stereotypes and communication perceptions of older and younger workers in the USA and Thailand. Educational Gerontology 39, 326–341.CrossRef Google Scholar

McGregor, J and Gray, L (2002) Stereotypes and older workers: the New Zealand experience. Social Policy Journal of New Zealand 18, 163–177.Google Scholar

Modestino, AS, Shoag, D and Balance, J (2016) Downskilling: changes in employer skill requirements over the business cycle. Labour Economics 41, 333–347.CrossRef Google Scholar

Neumark, D and Button, P (2014) Did age discrimination protections help older workers weather the Great Recession? Journal of Policy Analysis and Management 33, 566–601.CrossRef Google Scholar

Neumark, D, Burn, I and Button, P (2016) Experimental age discrimination evidence and the Heckman critique. American Economic Review Papers and Proceedings 106, 303–308.CrossRef Google Scholar

Neumark, D, Burn, I and Button, P (2019a) Is it harder for older workers to find jobs? New and improved evidence from a field experiment. Journal of Political Economy 127, 922–970.CrossRef Google Scholar

Neumark, D, Burn, I, Button, P and Chehras, N (2019b) Do state laws protecting older workers from discrimination reduce age discrimination in hiring? Evidence from a field experiment. Journal of Law and Economics 62, 373–402.CrossRef Google Scholar PubMed

Posthuma, R and Campion, MA (2007) Age stereotypes in the workplace: common stereotypes, moderators, and future research directions. Journal of Management 35, 58–88.Google Scholar

Riach, PA and Rich, J (2006) An experimental investigation of age discrimination in the French labour market. IZA Discussion Paper No. 2522. Bonn, Germany.10.2139/ssrn.956389CrossRef Google Scholar

Riach, PA and Rich, J (2010) An experimental investigation of age discrimination in the English labor market. Annals of Economics and Statistics 99/100, 169–185.10.2307/41219164CrossRef Google Scholar

Ryan, EB, See, SK, Meneer, WB and Trovato, D (1992) Age-based perceptions of language performance among younger and older adults. Communication Research 19, 423–443.CrossRef Google Scholar

Schmidt, DF and Boland, SM (1986) Structure of perceptions of older adults: evidence for multiple stereotypes. Psychology and Aging 1, 255–260.CrossRef Google Scholar PubMed

Stewart, MA and Ryan, EB (1982) Attitudes toward younger and older adult speakers: effects of varying speech rates. Journal of Language and Social Psychology 1, 91–109.CrossRef Google Scholar

Terrell, K (2019) Age Bias That's Barred by Law Appears in Thousands of Job Listings. Washington, DC: AARP.Google Scholar

Tilcsik, A (2011) Pride and prejudice: employment discrimination against openly gay men in the United States. American Journal of Sociology 117, 586–626.10.1086/661653CrossRef Google Scholar PubMed

U.S. Court of Appeals, 7th Circuit (1974) Hodgson v. Greyhound Lines, Inc., 499 F.2d 859 (7th Cir. 1974). Available at https://casetext.com/case/hodgson-v-greyhound-lines-inc (Accessed 7 January 2022).Google Scholar

U.S. Department of Labor (1965) The Older American Worker – Age Discrimination in Employment. Report of the Secretary of Labor to the Congress under Section 715 of the Civil Rights Act of 1964. Washington, DC: U.S. Department of Labor.Google Scholar

U.S. Equal Employment Opportunity Commission (1979) Questions and Answers to Clarify and Provide a Common Interpretation of the Uniform Guidelines on Employee Selection Procedures. Federal Register, Vol. 40, No. 43, March 2. Available at https://www.eeoc.gov/laws/guidance/questions-and-answers-clarify-and-provide-common-interpretation-uniform-guidelines (Accessed 18 January 2022).Google Scholar

U.S. Equal Employment Opportunity Commission (n.d. a) Prohibited Employment Policies/Practices. Available at http://www1.eeoc.gov//laws/practices/index.cfm?renderforprint=1 (Accessed 15 September 2019).Google Scholar

U.S. Equal Employment Opportunity Commission (n.d. b) Fact Sheet: Age Discrimination. Available at https://www.eeoc.gov/laws/guidance/fact-sheet-age-discrimination (Accessed 7 January 2022).Google Scholar

van Borm, H, Burn, I and Baert, S (2021) What does a job candidate's age signal to employers? Labour Economics 71, 102003.CrossRef Google Scholar

van Dalen, H, Henkens, K and Schippers, JJ (2009) Dealing with older workers in Europe: a comparative survey of employers’ attitudes and actions. Journal of European Social Policy 19, 47–60.CrossRef Google Scholar

Wax, SL (1948) Discrimination by summer resorts in Ontario. Information and Comment. Montreal, Canada: Committee on Social and Economic Studies of the Canadian Jewish Congress, 7 June, pp. 10–13.Google Scholar

Table 1. Age stereotypes from industrial psychology literature

Table 2. Control and treatment phrases by occupation

Figure 1. (a) Distributions of cosine similarity (CS) scores. (b) Locations of treatment and control phrases in the CSS distribution of job ad phrases. (c) Comparing the distribution of CSS scores and perceived ageism by stereotype.Note: (a) Figure reports the distribution of cosine similarity scores for all trigrams from the job ads with the indicated stereotypes. The higher the cosine similarity score, the more related the trigram is to the stereotype, with a minimum of −1 and a maximum of 1. The phrases in the boxes are examples of phrases located at that point in the distribution.(b) Solid lines indicate the location of a control sentence in the cosine similarity score distribution. Dashed lines indicate the location of a treatment phrase (for the machine-learning treatments shown in Table 2).(c) The dark points/lines are at the average cosine similarity score of the treatment and control phrases as shown in Table 2, for the indicated stereotype. The height of the right-hand dark point/line in each panel indicates the difference in the perceived ageism of the machine-learning treatment phrases relative to the control phrases for individuals over 50 (Table 4, column 9).

Figure 2. Job ad examples.

Figure 3. Cosine similarity scores of administrative assistant templates (based on machine-learning treatments and the controls).Note: Graphs display median to 99th percentile range of trigram semantic similarity scores for stereotypes for Administrative Assistant ads. The average trigram semantic similarity score for each stereotype is represented by the respective shape for each template. The category ‘Other’ is the average of the remaining stereotypes listed in Table 1. Control (‘neutral’) templates contain trigrams from the created ad templates with only non-stereotyped phrases included. Collected ads comprise trigrams from all Administrative Assistant job ads. Treatment templates contain trigrams from the created ad templates with the respective stereotyped phrase or phrases included.

Figure 4. Cosine similarity scores of retail sales templates (based on machine-learning treatments and the controls).Note: Graphs display median to 99th percentile range of trigram semantic similarity scores for each stereotype for retail sales ads. The average trigram semantic similarity score for each stereotype is represented by the respective shape for each template. The category ‘Other’ shows the averages for the remaining stereotypes listed in Table 1. Control (‘neutral’) templates contain trigrams from the created ad templates with only non-stereotyped phrases included. Collected ads comprise trigrams from all retail sales job ads. Treatment templates contain trigrams from the created ad templates with the respective stereotyped phrase or phrases included.

Figure 5. Cosine similarity scores of security guard templates (based on machine-learning treatments and the controls).Note: Graphs display median to 99th percentile range of trigram semantic similarity scores for each stereotype for security guard ads. The average trigram semantic similarity score for each stereotype is represented by the respective shape for each template. The category ‘Other’ is the average of the remaining stereotypes listed in Table 1. Control (‘neutral’) templates contain trigrams from the created ad templates with only non-stereotyped phrases included. Collected ads comprise trigrams from all security guard job ads. Treatment templates contain trigrams from the created ad templates with the respective stereotyped phrase or phrases included.

Table 3. Demographics of MTURK sample

Figure 6. Survey results.Note: These numerical ratings reflect the degree to which survey respondents rated phrases as age-biased or not age-biased, with lower numbers indicating a greater bias against older workers. Likert scale ratings were translated to a numerical value such that ‘strongly agree’ mapped to 1, ‘somewhat agree’ mapped to 2, ‘neither agree nor disagree’ mapped to 3, ‘somewhat disagree’ mapped to 4, and ‘strongly disagree’ mapped to 5. The three categories: ‘self’, ‘others’, and ‘over 50’, refer to which group's opinions the MTURK respondents were asked to provide or predict in a given survey block. The average bias rating was collapsed on the treatment status of phrases (control, treatment, and AARP) as well as the category of the stereotype (communication, physical, or technology). Hence, each point in the figure reflects the average bias rating MTURK respondents gave to a given treatment status for a specific stereotype from the perspective of a given group of people. For example, the triangle in the first row of the figure indicates that when respondents were asked for their self-assessment of whether or not the physical stereotype control phrases were age-biased, they, on average, stated that they strongly disagreed.

Table 4. Differences in beliefs by treatment (negative implies more biased against older workers)

Figure 7. Scatterplot of self-beliefs of age bias and cosine similarity (CS) scores. (a) Self-beliefs. (b) Others' perceptions. (c) Others' over 50s' perceptions.Note: CSS, cosine similarity score. Figure plots MTURK respondents' average perceptions of age bias against CSS stereotype ratings from Table 2. Lower numbers on the y-axis indicate higher levels of perceived age bias. Higher CSS scores on the x-axis indicate higher average levels of semantic similarity of a phrase with its respective stereotype. Circular, triangular, and square markers represent control phrases, CSS treatment phrases, and AARP treatment phrases, respectively. Black (solid), blue (shaded), and red (unshaded) markers represent physical, technology, and communication phrases, respectively.

Burn et al. supplementary material

File 5 MB

Article contents

Stereotypes of older workers and perceived ageism in job ads: evidence from an experiment

Abstract

Keywords

1. Introduction

2. Conceptual framework and implications of the evidence

3. Studying job ads

4. Methods

4.1 Selecting the stereotypes

4.2 Creating the treatment (stereotyped) and control job-ad language

4.3 Validating the treatment vs. control differences

5. Experimental evidence

5.1 Baseline survey

5.2 Incentivized survey

5.3 Analyses

6. Results

6.1 Assessing actual job ads

7. Conclusions and discussion

Supplementary material

Acknowledgements

Footnotes

References

Burn et al. supplementary material

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests