A large body of research indicates that the nutrition environment, including the availability of food stores and restaurants within a community, and the quality and price of healthy food choices within these establishments, influences eating behaviour( Reference Burns, Bentley and Thornton 1 – Reference Morland, Filomena and Granieri 7 ). Food stores that sell foods and beverages that can be prepared at home are an important element of the nutrition environment, particularly in the light of evidence that suggests at-home food preparation is associated with better dietary intake and more family meals( Reference Berge, MacLehose and Larson 8 , Reference Monsivais, Aggarwal and Drewnowski 9 ).
Measures of the nutrition environment are necessary for understanding the factors influencing healthy eating behaviour( Reference Glanz, Sallis and Saelens 10 ). Researchers have used geographic-based measures to capture access to community food sources, including distance to the nearest supermarket, density of food stores within a given area and gravity-based measures that also incorporate travel time and modality( Reference Barnes, Bell and Freedman 11 – Reference Lytle and Sokol 18 ). The availability, price and quality of healthy options within food stores represent the consumer nutrition environment, which reflects what consumers actually encounter within a retail food store( Reference Glanz, Sallis and Saelens 10 ). However, measurement of the consumer nutrition environment can be challenging because of the large number of potential factors that are believed to be related to the purchase of healthy foods (e.g. price, promotions, placement of items within a store, range of choices, freshness, visibility of nutritional information)( Reference Glanz, Sallis and Saelens 10 , Reference Glanz, Sallis and Saelens 19 ). In-person audits have therefore been considered the gold standard to fully document what consumers can actually purchase inside a store( Reference Moudon, Drewnowski and Duncan 20 , Reference Gustafson, Sharkey and Samuel-Hodge 21 ).
Direct observational audits, such as the widely used Nutrition Environment Measures Survey in Stores (NEMS-S)( Reference Glanz, Sallis and Saelens 19 , Reference Partington, Menzies and Colburn 22 ), provide objective and rigorous assessments of the consumer nutrition environment( Reference Ohri-Vachaspati and Leviton 23 ). Trained raters use the NEMS-S to rate the price and availability of ten food categories and assess the quality of fresh fruits and vegetables in food stores( Reference Honeycutt, Davis and Clawson 24 ). The NEMS-S is one of the few audit tools with demonstrated reliability and validity and has been used repeatedly in research on the consumer nutrition environment( Reference Partington, Menzies and Colburn 22 , Reference Honeycutt, Davis and Clawson 24 ). However, onsite assessments with in-person audits are time-intensive and costly, especially when auditing a large number of stores across a wide geographic area( Reference Partington, Menzies and Colburn 22 ).
The vast amount of web-based information on social media holds promise as a cost-effective alternative to in-person audits of the consumer nutrition environment. Social media, such as Twitter, Facebook and Yelp, are web-based forms of communication where people share information and create content( Reference Nguyen, Meng and Li 25 ). A growing body of research has used geographically referenced social media to assess multiple aspects of the nutrition environment, including food-borne illness outbreaks( Reference Nsoesie, Kluberg and Brownstein 26 , Reference Harrison, Jorder and Stern 27 ), the relationship between healthy food-related Twitter posts and proximity to healthy food stores( Reference Chen and Yang 28 ), and neighbourhood indicators of healthy food availability( Reference Nguyen, Kath and Meng 29 , Reference Gomez-Lopez, Clarke and Hill 30 ). Researchers have documented associations between the prevalence of healthy food-related postings on social media and the socio-economic characteristics of the local neighbourhood( Reference Nguyen, Kath and Meng 29 , Reference Widener and Li 31 , Reference Ghosh and Guha 32 ). Recent work has also shown a relationship between food-related social media posts and county- and state-level health outcomes( Reference Nguyen, Meng and Li 25 , Reference Nguyen, McCullough and Meng 33 ).
However, no research to date has explored the use of social media for assessing the nutritional content and offerings inside a community food store. The purpose of the present study was to examine the feasibility of using Yelp to assess the consumer nutrition environment. Yelp (www.yelp.com) is a popular social media site that provides a platform for consumers to post reviews of local businesses and services. In 2017, Yelp averaged over 75 million unique users per month( 34 ). User-generated content includes an overall business rating (1 to 5 stars), cost rating (1 to 4 dollar signs, representing ‘inexpensive’, ‘moderate’, ‘pricey’ and ‘ultra high-end’, respectively) and detailed text reviews from users that capture cost, quality and other aspects of the business experience.
We used Yelp reviews to assess the consumer nutrition environment for sixty-nine grocery stores in the City of Detroit, Michigan, USA. Detroit is a rich focus for this work since it has experienced dramatic structural and economic decline since the 1950s, with consequences for the availability and quality of healthy food sources for local residents( Reference Zenk, Schulz and Israel 35 ). Using sentiment analysis( Reference Cambria, Schuller and Xia 36 ) we mined the Yelp review text for indicators of the consumer nutrition environment, including healthy food availability, price and quality. We then assessed the degree to which Yelp review metrics were consistent with NEMS-S scores obtained from a direct observation audit of these sixty-nine stores.
Methods
We focused on ‘full-line’ grocery stores, following the Michigan Department of Agriculture & Rural Development’s definition as ‘a store selling fresh produce, fresh meat, fresh bread, and fresh dairy’( 37 ). An enumerative list of 102 full-line grocery stores in Detroit, Michigan was obtained from Detroit Food Map (http://www.detroitfoodmap.com/), a community-based initiative that assesses the quality of food stores as access points for nutritious and healthy food options in Metropolitan Detroit. Trained raters conducted NEMS-S audits at 102 grocery stores from July 2015 to March 2016. Audits were not completed for two stores, leaving 100 stores with complete NEMS-S audits.
A total of sixty-nine of the 100 grocery stores with complete audit data had records on Yelp. The Yelp Application Program Interface (Yelp Fusion API; https://www.yelp.com/fusion) was used in February 2017 to request the online public information for each of these sixty-nine grocery stores. Metadata for each store (including name, address, number of reviews and store ratings) were retrieved using the Yelp API. Yelp review text was retrieved by separately downloading and parsing the review pages for each store.
NEMS-S measures
The NEMS-S scoring system considers the availability, quality and price of healthy choices within ten food categories: milk, fruits, vegetables, ground beef, hot dogs, baked goods, beverages, bread, potato chips and cereal( Reference Glanz, Sallis and Saelens 19 , Reference Glanz, Basil and Maibach 38 ). Separate scores for availability, price and quality are created and summed to create an overall total score( Reference Glanz, Sallis and Saelens 19 ). Higher scores indicate greater availability and quality and lower cost of healthy options.
NEMS-S availability scores are based on the number of different varieties of fruits and vegetables, as well as the presence of healthier options within each of the non-produce food categories (i.e. low-fat/skimmed milk, lean ground beef, fat-free hot dogs, low-fat baked goods, 100% juice or diet soda, wholegrain bread, baked chips and low-sugar cereal). The total availability score ranges from 0 to 27 with a higher score indicating a greater availability of healthier options.
NEMS-S quality scores are assigned for fruits and vegetables based on the proportion of produce that is rated either acceptable or unacceptable (score=3 if >75% acceptable; score=2 if 50–75% acceptable; score=1 if <50% acceptable). A total quality score is calculated by summing the scores for fruits and vegetables (range 0–6).
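The quality thresholds above can be sketched in a few lines of Python. This is an illustration only, not the official NEMS-S scoring code: the function names are hypothetical, and the treatment of a produce category with nothing in stock (scored 0 here, which would account for the lower bound of the 0–6 range) is an assumption rather than something stated in the text.

```python
def produce_quality_score(pct_acceptable):
    """Map the percentage of produce rated acceptable to a quality score.

    Thresholds follow the text: >75% -> 3, 50-75% -> 2, <50% -> 1.
    Assumption: None (no produce of this type in stock) scores 0.
    """
    if pct_acceptable is None:
        return 0
    if pct_acceptable > 75:
        return 3
    if pct_acceptable >= 50:
        return 2
    return 1


def total_quality_score(fruit_pct, vegetable_pct):
    """Sum the fruit and vegetable quality scores (range 0-6)."""
    return produce_quality_score(fruit_pct) + produce_quality_score(vegetable_pct)
```

For example, a store with 80% acceptable fruit and 60% acceptable vegetables would receive 3 + 2 = 5 under this sketch.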
NEMS-S price scores are based on the relative price of the healthier option within each non-produce food category (e.g. skimmed milk v. whole milk). The scoring system assigns negative values if the cost of healthier options is greater than the cost of comparable regular options( Reference Glanz, Sallis and Saelens 19 ).
Total NEMS-S scores were calculated by summing the availability, price and quality scores for five key food categories (milk, fruits, vegetables, ground beef and bread). A maximum score of 39 reflects greater availability of, and relatively cheaper prices for, more healthful or recommended food choices and quality produce. Because not all stores had healthier options for sale in all five food categories, total NEMS-S scores were available for only fifty-six of the sixty-nine stores.
Food prices (in US dollars) were calculated for the healthy items in each of six food categories most relevant for nutrition (low-fat milk, fruits, vegetables, lean ground beef, wholegrain bread and cereal). Because not all items were available in all stores, only forty-five stores had data on healthy food prices. An overall total healthy food price for each store was created by generating Z-scores to standardize the prices across the different food categories. The total food price was calculated by summing the Z-scores of each food category for each store.
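The standardization step can be illustrated as follows. This is a minimal sketch under stated assumptions: the data layout (one list of prices per food category, stores in the same order) and the function names are hypothetical, the sample standard deviation is assumed because the text does not say which estimator was used, and only stores with complete price data are included, as in the study.

```python
from statistics import mean, stdev


def zscores(values):
    """Standardize a list of prices to mean 0 and (sample) SD 1."""
    m = mean(values)
    s = stdev(values)  # assumption: sample SD; the text does not specify
    return [(v - m) / s for v in values]


def total_price_z(prices_by_category):
    """Sum each store's Z-scores across food categories.

    prices_by_category: dict mapping category name -> list of prices,
    one per store, with stores in the same order in every list.
    """
    per_category = [zscores(prices) for prices in prices_by_category.values()]
    return [sum(store_scores) for store_scores in zip(*per_category)]
```

Standardizing within each category before summing keeps an expensive category such as ground beef from dominating a cheaper one such as bread in the total.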
Yelp measures
For each store, summary Yelp metrics were calculated for overall store rating (number of stars out of 5), cost rating (number of dollar signs out of 4) and total number of reviews. A sentiment analysis was then conducted with the Yelp review text to capture the perceptions of the consumer nutrition environment with respect to the dimensions of the NEMS-S (food availability, price, quality). Sentiment related to the overall shopping experience was also included to tap other dimensions of the consumer nutrition environment not assessed by the NEMS-S (e.g. customer service, cleanliness, spaciousness, crowdedness) that could be important for food store choices( Reference Krukowski, Sparks and DiCarlo 39 ).
Sentiment analysis refers to the use of natural language processing techniques to systematically identify and quantify information and opinions from web-based textual information( Reference Cambria, Schuller and Xia 36 , Reference Jensen, Jensen and Brunak 40 – Reference Grafton, Yu and Carrell 42 ). We used a food-related subset of the Linguistic Inquiry and Word Count (LIWC) tool( Reference Pennebaker, Chung and Ireland 43 ) to generate a comprehensive list of 594 keywords, phrases and adjacency expressions capturing both positive and negative sentiment about (i) food availability, (ii) price, (iii) quality and (iv) shopping experience. We first eliminated LIWC keywords that did not correspond to the NEMS-S categories (e.g. words describing food preparation: ‘braised’, ‘scalded’). We then manually examined the review phrases containing frequently used keywords (>10 occurrences) to confirm that they were being used to describe concepts in the NEMS-S. If needed, keywords were expanded to better capture sentiment, such as by adding phrases and adjacency expressions (e.g. ‘not very fresh’ v. ‘fresh’). Table 1 shows examples of positive and negative sentiment keywords in each of the four dimensions, along with illustrative examples from the Yelp review text.
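The role of adjacency expressions can be shown with a small keyword matcher. The three-entry lexicon below is hypothetical and stands in for the 594-entry list, and the matching strategy (longest phrase first, non-overlapping matches) is one plausible way to keep ‘not very fresh’ from also being counted as ‘fresh’; it is not necessarily the procedure used in the study.

```python
import re

# Hypothetical mini-lexicon for the food quality dimension.
QUALITY_LEXICON = {
    "not very fresh": "negative",
    "fresh": "positive",
    "rotten": "negative",
}


def build_pattern(lexicon):
    """Compile one regex that tries longer phrases before shorter ones."""
    phrases = sorted(lexicon, key=len, reverse=True)
    alternation = "|".join(re.escape(p) for p in phrases)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def count_sentiment(text, lexicon):
    """Count positive and negative keyword hits in a review text."""
    pattern = build_pattern(lexicon)
    counts = {"positive": 0, "negative": 0}
    # finditer returns non-overlapping matches, so a phrase that has
    # already matched is not scanned again for its shorter keywords.
    for match in pattern.finditer(text):
        counts[lexicon[match.group(0).lower()]] += 1
    return counts
```

Applied to ‘The produce was not very fresh, but the bread was fresh.’, this sketch records one negative and one positive hit rather than two positive hits.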
For each dimension, we calculated a positive sentiment score and a negative sentiment score based on the proportion of positive and negative keywords, respectively, in each store’s reviews. To adjust for the number of reviews per store and the length of each review, we normalized the sentiment scores using the Okapi BM25 method (a text retrieval algorithm), which scales the total number of keywords in a store’s review by the total number of words in the review relative to the average length of the review across all sixty-nine stores( Reference Jones, Walker and Robertson 44 , Reference Robertson 45 ). Weighted proportions were then averaged across all reviews in each store to capture the average positive and negative sentiment on food availability, price, quality and general shopping experience. Net sentiment was also computed by subtracting the average negative sentiment score from the average positive sentiment score for each dimension. All sentiment scores are expressed in percentages.
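The length normalization and net sentiment steps might look like the sketch below. It uses the standard Okapi BM25 term-frequency component with the conventional defaults k1 = 1.2 and b = 0.75; these parameter values, the input layout and the function names are assumptions, since the text does not report the exact weighting, and the output is an unscaled weight rather than the percentages reported in the study.

```python
def bm25_tf(keyword_count, review_len, avg_len, k1=1.2, b=0.75):
    """BM25 term-frequency weight with document-length normalization.

    keyword_count: number of sentiment keywords found in the review
    review_len:    number of words in the review
    avg_len:       average review length across all stores
    k1, b:         conventional BM25 defaults (assumed, not from the study)
    """
    length_norm = k1 * (1 - b + b * review_len / avg_len)
    return keyword_count * (k1 + 1) / (keyword_count + length_norm)


def store_net_sentiment(reviews, avg_len):
    """Average positive and negative weights over one store's reviews.

    reviews: list of (n_words, n_positive, n_negative) tuples.
    Returns (average positive, average negative, net = positive - negative).
    """
    pos = [bm25_tf(p, n, avg_len) for n, p, _ in reviews]
    neg = [bm25_tf(q, n, avg_len) for n, _, q in reviews]
    avg_pos = sum(pos) / len(pos)
    avg_neg = sum(neg) / len(neg)
    return avg_pos, avg_neg, avg_pos - avg_neg
```

Under this weighting, a keyword in a review twice the average length contributes less than the same keyword in an average-length review, so long reviews do not inflate a store's sentiment score simply by containing more words.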
Statistical analyses
Descriptive statistics (means and standard deviations) were used to summarize the consumer nutrition environment according to the NEMS-S scores and the Yelp review metrics. Pearson correlation coefficients (ρ) were used to assess the agreement of the sentiment scores from the Yelp reviews with the NEMS-S availability scores, total NEMS-S scores and food price Z-scores. All analyses were conducted using the statistical software package SAS version 9.4. Statistical significance of the correlation coefficients was assessed with an α level of 0·05.
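Although the analyses were run in SAS, the correlation itself is easy to reproduce. The following plain-Python sketch (function names hypothetical) computes the Pearson coefficient for two equal-length lists of store-level values, together with the t statistic on n − 2 degrees of freedom that is conventionally used to test it; converting t to a P value would require a t distribution from a statistics library and is omitted here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r ** 2))
```

Because the number of stores with complete data differs by measure (sixty-nine, fifty-six or forty-five), the same coefficient can correspond to different levels of significance depending on which pair of variables is compared.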
Results
Summary statistics for the consumer nutrition environment in these sixty-nine grocery stores are presented in Table 2. NEMS-S availability scores ranged from 8 to 17, with a mean score of 12·9 out of a possible score of 27, reflecting the limited range of healthier options available (Table 2). This low availability was also reflected in the total NEMS-S scores for five key food categories in the NEMS-S (milk, fruits, vegetables, ground beef, bread), which consider both availability and price as well as produce quality. On average, total NEMS-S scores were 18·3 out of a possible score of 39.
NEMS-S, Nutrition Environment Measures Survey in Stores.
Food prices across six food categories (milk, fruits, vegetables, ground beef, bread, cereal) were captured in Z-score US dollars (mean=0·5, sd=8·0). Results of unstandardized price comparisons (not shown in Table 2) indicated that healthier food options tended to be more expensive than less healthy options. Over two-thirds of the stores sold whole-wheat bread, lean ground beef, 100% juice and low-fat chips at higher prices than their less-healthy counterparts.
The mean number of Yelp reviews per store was 20·6 (range 1–172), with half the stores having fewer than five reviews (Table 2). The mean store rating on Yelp was 3·3 stars out of a possible 5 stars. Mean store cost, reflected through the number of dollar signs on each store’s Yelp page, was 1·6, representing ‘inexpensive’ to ‘moderate’ prices according to Yelp.
Table 2 presents the average percentage of positive and negative sentiment words (normalized) in the Yelp review text across each of the four dimensions (availability, price, quality, overall shopping experience). In general, reviews contained more sentiment words about food availability and the overall shopping experience than about food price and food quality. In addition, positive sentiment in the reviews was greater than negative sentiment for each dimension. For example, the average proportion of positive sentiment words for food availability was 34·4% in the Yelp reviews compared with 7·6% for negative sentiment. The net availability sentiment (26·8%) reflects the dominance of positive words related to food availability over negative words. Similarly, for the overall shopping experience, almost 26% of the review text contained positive words and 8·4% contained negative words, for a net general sentiment of 17·5% on average.
Table 3 presents the Pearson correlation coefficients for the NEMS-S scores and the Yelp review metrics. (Stratified analyses were also conducted by the number of Yelp reviews per store (≤5 v. >5 reviews), but there was no difference in the pattern of results. Therefore, only the unstratified results are presented in Table 3.) Stores with higher availability scores on the NEMS-S tended to be rated as more expensive on Yelp (i.e. more dollar signs; ρ=0·444, Table 3). Similarly, higher total NEMS-S scores (ρ=0·501), which incorporate produce quality and relative food price in addition to availability, also tended to be rated as more expensive on Yelp. These positive correlations suggest that stores selling a greater variety of healthier food choices with high-quality produce tended to be rated as more expensive. The number of Yelp dollar signs was also positively correlated with the actual food prices in the stores determined through direct observation (ρ=0·462) and negative sentiment about food prices in Yelp review text was positively correlated with higher food prices (ρ=0·413, Table 3).
Correlation was statistically significant: *P<0·05, **P<0·01, ***P<0·001.
Higher store food prices (Z-score US dollars) were positively associated with higher store ratings (i.e. number of stars; ρ=0·317) and a greater number of reviews posted for that store (ρ=0·454, Table 3). Thus, more expensive stores tended to be those that were rated highly and also those for which people posted more reviews on Yelp. This is consistent with the positive association between the total NEMS-S scores and the number of reviews (ρ=0·447, Table 3), suggesting that people post more reviews for stores that have higher-quality and more healthy foods, which also tend to be pricier.
A greater availability of healthier food choices (reflected in higher NEMS-S availability scores) was negatively associated with positive review sentiment on food quality (ρ=−0·239) and positively associated with more negative sentiment on the shopping experience (ρ=0·253, Table 3). This suggests that the availability score in the NEMS-S (which is driven by all ten categories of foods, including chips, soda and hot dogs) may not reflect food quality in the eyes of the consumer and stores selling these foods may not provide a quality shopping experience. There were no statistically significant correlations between the food quality review sentiment and total NEMS-S scores (which factor in produce quality only, Table 3).
Discussion
The present study is one of the first to assess the feasibility of using social media to assess the consumer nutrition environment. While social media is increasingly used to assess the nutrition environment with respect to food-borne illness outbreaks( Reference Nsoesie, Kluberg and Brownstein 26 ) and state-level health outcomes( Reference Nguyen, Meng and Li 25 ), no research has explored the potential of using social media to assess the nutritional content and offerings inside food stores. Using sentiment analysis, a method of computational linguistics and text analysis, we compared Yelp review text with NEMS-S scores collected through in-person audits on the availability, price and quality of healthy food options in sixty-nine grocery stores in Detroit, Michigan, USA.
We found that grocery stores that were rated as more expensive on Yelp tended to have higher observed food prices documented during the in-person audit. A larger number of dollar signs on a store’s Yelp page was positively correlated with higher observed food prices for fruits, vegetables, milk, beef and bread. Similarly, more negative sentiment expressed about food prices in the Yelp review text was associated with higher food prices observed in the in-person audit. Thus, Yelp review text and overall cost ratings show promise as a reasonable barometer of the cost of healthy food options actually observed in local grocery stores.
We found no correlation between NEMS-S availability scores and Yelp review text sentiment about food availability. Nor did we find any correlation between total NEMS-S scores and Yelp review text sentiment on food price, food quality or food availability (all of which factor into the total NEMS-S score). Thus, Yelp reviews do not appear to be capturing the availability and quality of healthy food choices within stores, as reflected in the NEMS-S scoring system. However, we did find that stores reviewed as more costly on Yelp (reflected through negative price sentiment and a greater number of dollar signs) were more likely to have higher NEMS-S availability scores and higher total NEMS-S scores. This is consistent with other research demonstrating the cost burden of purchasing healthy food( Reference Rao, Afshin and Singh 46 ) and suggests that grocery stores with a greater availability of quality, healthy food choices also tended to be rated on social media as more expensive overall.
We also showed that Yelp reviews can be used to capture additional metrics about the consumer nutrition environment that are not included in the NEMS-S, including general store appearance, cleanliness and service, which have been shown to be important for consumer food store choices( Reference Krukowski, Sparks and DiCarlo 39 , Reference Young, Swanson and Craven 47 ). While the NEMS-S assesses the quality of fruits and vegetables only, Yelp reviews contained information about the quality of other food categories (e.g. the quality of meat)( Reference Krukowski, Sparks and DiCarlo 39 , Reference Krukowski, West and Harvey-Berino 48 ) and on the overall shopping experience.
Yelp reviews contained more positive sentiment than negative sentiment, suggesting that reviewers post more positive than negative comments about grocery stores. Stores with more Yelp reviews and higher store ratings (as reflected by the number of stars on Yelp) also tended to have higher observed food prices. Thus, online users may be more inclined to review higher-quality, more expensive grocery stores. Alternatively, if these higher-quality stores are located in more socio-economically affluent neighbourhoods, then a greater number of reviews could reflect the tendency of residents in these neighbourhoods to post more reviews on social media. In supplemental analyses (not shown) we found that grocery stores located in more affluent neighbourhoods (captured through census tract indicators of higher education, higher household income and more residents with professional occupations) had significantly more Yelp reviews. Conversely, those stores located in more socio-economically disadvantaged neighbourhoods (captured through census tract indicators of poverty, unemployment and more residents on public assistance income) had significantly fewer reviews, consistent with the concept of the ‘digital divide’( Reference Chang, Bakken and Brown 49 ). Differences in both computer literacy and Internet access across these types of neighbourhoods may drive the number of online reviews about the consumer nutrition environment( Reference Jensen, King and Davis 50 , Reference Henly, Tuli and Kluberg 51 ).
Limitations
Yelp reviews were not available for all 100 grocery stores in our sample. The thirty-one stores without reviews were significantly more likely to be in socio-economically disadvantaged census tracts and in areas with low broadband adoption( Reference Veinot, Goodspeed and Clarke 52 ), suggesting that social media may be useful for assessing the consumer nutrition environment only in less disadvantaged areas. Further research in communities other than Detroit is needed to determine the extent to which social media is selectively used across communities with different characteristics. In addition, because we conducted a large number of statistical tests in this exploratory study, some of the statistically significant correlations may have arisen by chance.
Conclusion
Despite these limitations, the present study is among the first attempts to consider whether information gleaned from social media could be useful for evaluating the consumer nutrition environment. Researchers today can draw on an expanding set of data about the nutrition environment, including social media websites like Yelp, which rely on user contributions( Reference Gomez-Lopez, Clarke and Hill 30 ). The study suggests that, while Yelp cannot replace in-person audits for collecting detailed information on the availability, quality and cost of specific food items, Yelp does hold promise as a cost-effective means to gather information on the overall cost and quality of the consumer nutrition environment. Simple metrics like the number of dollar signs on Yelp or the overall star rating may be useful for researchers or practitioners who are unable to conduct in-person audits in community food stores because of time or cost constraints. Even for those able to conduct detailed observational audits, Yelp may serve to supplement NEMS-S metrics of the consumer nutrition environment by providing other indicators of store and food quality relevant for consumer nutrition.
Acknowledgements
Financial support: This work was funded in part by the Gordon and Betty Moore Foundation through a grant to the University of Michigan (grant number GBMF3943); the Alfred P. Sloan Foundation (grant number 2014-5-05 DS); MCubed and the University of Michigan Office of Research; and the University of Michigan Rackham Graduate School via the Social Sciences Annual Institute Competition Round 6. The funders had no role in the design, analysis or writing of this article. Conflict of interest: None. Authorship: Y.S., P.C., I.N.G.-L., V.G.V.V. and T.C.V. contributed to the conception and design of the study, data acquisition, and data analysis and interpretation; the writing and revision of the article for important intellectual content; and approved the final version of the submitted manuscript. A.B.H. contributed to data acquisition and the revision of the article for important intellectual content, and read and approved the final version of the submitted manuscript. D.M.R., R.G. and V.J.B. contributed to the interpretation of the data and analysis, revised the article for important intellectual content, and read and approved the final version of the submitted manuscript. Ethics of human subject participation: Not applicable.