In nutritional epidemiology, the hierarchical classification of food items into broader categories plays a critical role when examining associations between food consumption and health(Reference Fardet, Rock and Bassama1). For these purposes, international classification systems such as the European Food Safety Authority’s FoodEx2 classification(2) or FAO/INFOODS(Reference Charrondiere, Stadlmayr and Haytowitz3) have been developed to improve the availability and reliability of dietary data obtained from traditional nutrition surveys. These classification systems also make results easier to compare and reproduce across countries and allow researchers to harmonise their data and food composition databases in a transparent way(Reference Fardet, Rock and Bassama1,Reference Finglas, Berry and Astley4).
The use of grocery food purchase data from food retailers’ customer loyalty cards in academic research has gained increasing attention in recent years(Reference Bandy, Adhikari and Jebb5–Reference Sørensen, Nielsen and Møller8). Grocery purchase data describe what, when, where and at what price food has been bought, with or without a personal identifier tag(Reference Vuorinen, Erkkola and Fogelholm7). Customer loyalty card data always include at least some personal data of the person who made the purchase. Hence, loyalty card data provide a unique opportunity to obtain vast amounts of detailed data automatically and objectively over time on different card holders’ grocery purchases(Reference Vepsäläinen, Nevalainen and Kinnunen9). These data can be used to monitor the nutritional quality of food purchases(Reference Vepsäläinen, Nevalainen and Kinnunen9,Reference Lintonen, Uusitalo and Erkkola10), which can inform health policy actions (e.g. a sugar tax)(Reference Erkkola, Kinnunen and Vepsäläinen11) aimed at steering food consumption toward healthier options and eventually improving public health nutrition(Reference Teng, Jones, Mizdrak and Signal12,Reference Vall Castelló and Lopez Casasnovas13). Moreover, the data can be used for monitoring and evaluating policies, as well as dietary, environmental, social and economic sustainability(Reference Jenneson, Pontin and Greenwood6,Reference Meinilä, Hartikainen and Tuomisto14). Present and future diets should reduce global health risks, such as cardiovascular diseases, type 2 diabetes and cancer, and reduce the environmental impact of food systems(Reference Willett, Rockström and Loken15). For evidence-based decision-making, robust and timely information on population-level health and environmental behaviours, which can also be obtained from loyalty card data, is essential(Reference Beaglehole, Bonita and Horton16).
Food retailers commonly use classification systems that are based on logistics or product placement on the shelf, and these do not necessarily reflect products’ nutritional profiles(Reference Pauler and Dick17,Reference Zhong, Xu and Wang18). Therefore, to harness the full potential of customer loyalty card data for scientific research, thousands of grocery products need to be reclassified into categories that are meaningful for nutrition and health research. Although purchases of single foods can be used for research purposes, some research objectives (e.g. comparing food purchase data with individuals’ food consumption measured using traditional dietary assessment methods) may necessitate working at less granular levels(Reference Vepsäläinen, Nevalainen and Kinnunen9). This requires a hierarchical structure for grocery products, built using a suitable theory-based approach(Reference Carlson, Lino and Fungwe19,Reference Todd, Mancino and Leibtag20).
Only a few of the classification methods used for groceries have been transparently described, such as the Convenience Food Classification Scheme(Reference Peltner and Thiele21) and the NOVA classification(Reference Monteiro, Levy and Claro22). The Convenience Food Classification Scheme includes three convenience categories based on the degree of processing, the culinary skills required to transform the bought food into a meal, the time needed for meal preparation, the time needed after consumption (e.g. cleaning up and washing dishes) and the context in which a food or meal is consumed (e.g. snack or ready-made meal)(Reference Peltner and Thiele21). The NOVA classification assigns food products to groups based on the degree of processing: Group 1 – Unprocessed or minimally processed foods; Group 2 – Processed culinary ingredients; Group 3 – Processed foods; Group 4 – Ultra-processed foods, that is, industrial formulations or foods prepared by the industry, packaged, ready for consumption and with a high content of salt, sugars and fat(Reference Monteiro, Levy and Claro22). Although nutritional quality is associated with the above classification criteria (convenience and degree of processing), nutrition as such was not the starting point of either scheme. Moreover, even though suitable tools such as the Nutrient-Rich Food Index (NRFI)(Reference Drewnowski23) and the Grocery Purchase Quality Index-2016(Reference Brewster, Durward and Hurdle24) are available for evaluating whether a classification succeeds in reflecting the nutritional quality of grocery purchases, this type of evaluation is rarely done(Reference Vadiveloo, Juul and Sotos-Prieto25,Reference Wu, Fuchs and Lian26). We argue that a clear, explicit, openly available and critically evaluated grocery product classification is needed to advance research on the health and environmental impacts of grocery purchases.
We received a large-scale (n 47 066 card holders) longitudinal customer loyalty card (LoCard) data set from the largest food retailer in Finland (market share about 46 %)(Reference Nevalainen, Erkkola and Saarijärvi27). Since the original product grouping used by the retailer was designed for retail purposes, our first challenge was to design and compile a meaningful product grouping appropriate for nutrition and health research. Thus, the purpose of this paper is to compile, describe and test a reclassification of products suitable for nutrition and health research (labelled as the LoCard Food Classification, hereafter LCFC) and to make it openly accessible. To achieve this, we describe how the reclassification process was conducted and illustrate its feasibility by examining variation in nutritional quality within chosen food groups.
Methods
Hierarchy of the food retailer’s original grocery product groups
The food retailer’s original hierarchical structure included 3574 grocery product groups. This study describes how these groups were reclassified in the LCFC. Since the purpose of the LCFC was to serve nutrition and health research, a key guiding principle in its development was to design a group hierarchy in which within-group variation in nutritional quality decreases as granularity increases. Neither this process nor the analyses for this paper involved any use of the customer loyalty card data or other personal data from human participants. Further, the process did not use sales data.
The retailer’s product group hierarchy was based on logistics or product placement on the shelf. Consequently, at its most granular level the retailer’s hierarchy included information such as flavour, form of storage and package size (e.g. ‘citrus lemonades, canned, stored in fridge’, ‘cola-flavoured drinks, 4 bottles, stored in room temperature’) and packing of the products (e.g. ‘cream cheese, pre-sliced’ or ‘whole breads, pre-sliced’). From a nutritional and health perspective, this information was irrelevant and could be excluded.
For most of the product groups, the name of the product group gave adequate information to understand the nutritional quality of that product group for our research purposes. For example, even though we did not know the brand names of beverages, the product group name indicated whether a beverage had added sugar (e.g. ‘cola-flavoured drink, no sugar, canned’). Coca-Cola Zero and Pepsi Max are nutritionally equivalent for our purposes (neither contains added sugar). Similarly, regular Pepsi, Coke, Fanta and Sprite are all sugar-sweetened soft drinks, which form a generic but well-defined and nutritionally homogeneous class whose defining nutrient is added sugar at 10 % of weight.
Our main challenge was to reclassify grocery product groups whose nutritional quality was not obvious from the name, especially when the main ingredient was unclear. Examples of such groups include ‘Other meat’, ‘Ready-made salads’, ‘Hamburgers’ and ‘Pizzas’. This was further complicated by the lack of comprehensive food item-level information on the product groups due to business secrecy. Fortunately, we received a sample of 26 000 food items including their product name, EAN code, package size and original product group. This information helped us to reclassify most of the foods. Additionally, we used the retailer’s online food purchasing service, which provided information about the food items, such as product name, within the grocery product groups.
Principles and selected examples of LoCard Food Classification hierarchy
We built a four-level hierarchical classification of product groups in the LCFC. LCFC Class 1 (LCFC-1) had the lowest granularity and was subsequently divided into subclasses of increasing granularity: LCFC-2, followed by LCFC-3 and LCFC-4, which had the highest granularity. An example of the LCFC hierarchy is given in Fig. 1. The whole LCFC is openly available at https://doi.org/10.5281/zenodo.7781352.
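The four-level structure can be represented as a path of up to four labels per retailer product group. The Python sketch below is a minimal illustration, not the format of the published Zenodo file: the retailer group names, the class `LcfcPath` and its methods are hypothetical, and the LCFC-3 label ‘Skimmed milk’ is an illustrative placeholder. Because many groups stop at LCFC-2, the sketch assumes that an analysis at a finer level falls back to the most granular label available for that group.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class LcfcPath:
    """Hypothetical record of one retailer product group's LCFC labels.

    LCFC-3 and LCFC-4 are optional because many groups were classified
    only at LCFC-1 and LCFC-2.
    """
    lcfc1: str
    lcfc2: str
    lcfc3: Optional[str] = None
    lcfc4: Optional[str] = None

    def labels(self) -> Tuple[Optional[str], ...]:
        return (self.lcfc1, self.lcfc2, self.lcfc3, self.lcfc4)

    def at_level(self, level: int) -> Optional[str]:
        """Label at LCFC level 1-4, or None if the group stops earlier."""
        if not 1 <= level <= 4:
            raise ValueError("LCFC levels run from 1 to 4")
        return self.labels()[level - 1]

    def label_for_analysis(self, level: int) -> str:
        """Assumed fallback: the most granular label at or above `level`."""
        if not 1 <= level <= 4:
            raise ValueError("LCFC levels run from 1 to 4")
        for lvl in range(level, 1, -1):
            label = self.at_level(lvl)
            if label is not None:
                return label
        return self.lcfc1


# Hypothetical retailer group names mapped to LCFC paths.
hierarchy = {
    "pea soup, ready-made": LcfcPath("Plant protein products",
                                     "Peas, beans, lentils and soya"),
    "muesli, high fibre": LcfcPath("Cereals and bakery products",
                                   "Breakfast cereals", "High-fibre cereal"),
    "milk, skimmed": LcfcPath("Milk and dairy products",
                              "Liquid milk products", "Skimmed milk"),
}

# Number of retailer product groups per LCFC-1 class.
groups_per_lcfc1 = Counter(path.lcfc1 for path in hierarchy.values())
print(hierarchy["pea soup, ready-made"].label_for_analysis(3))
```

Keeping the unused levels as `None`, rather than copying the parent label down, preserves the information that a group was deliberately not subdivided.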
In LCFC-1, the grocery product groups were reclassified into 38 groups based on healthiness(28) and main ingredients(Reference Reinivuo, Hirvonen and Ovaskainen29) (see Table 1). Our approach to ‘healthiness’ (as a proxy for nutritional quality) was based on the Nordic Nutrition Recommendations(28). Food groups with a recommendation to limit intake, including those high in sugar, saturated fat or salt, were separated from foods recommended for inclusion in the diet. The latter include fruits, vegetables and berries; whole-grain cereal products; low-fat dairy; fish and seafood; plant-based meat alternatives; nuts and seeds; and oils and margarine.
Compared with the retailer’s original product group hierarchy, the LCFC-1, for example, moves plant-based protein products out of the meat product group, where the retailer had placed them, into their own main class. This LCFC-1 group was named ‘Plant protein products’ and included processed legume products, such as those mimicking meat, as well as unprocessed lentils, peas and beans. Within LCFC-1, we also formed a separate group for plant-based dairy-like products including, for instance, soy and oat milk.
Classification to LCFC-2 was guided by the type of foods in the product group, the purpose of use of the product groups and food culture. For example, the LCFC-1 group ‘Milk and dairy products’ was further classified into ‘Cheeses’, ‘Ice creams’ and ‘Liquid milk products’ (Table 1). Similarly, edible fats were classified into ‘Butter and fat blends’, ‘Margarine’, ‘Vegetable oils’ and ‘Cooking fat’. The purpose of use of the product groups was also considered in the reclassification. For example, the purpose of use for nuts may vary depending on whether they are plain nuts, which are often used in salads; chocolate-coated nuts, which can be used as sweets; or salted nuts, which may be used like other salty snacks. Therefore, plain nuts were classified under ‘Dried fruits and nuts’, chocolate-coated nuts under ‘Sweets and chocolates’ and salted nuts under ‘Snacks’ at the LCFC-1 level.
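The purpose-of-use rule for nuts can be written as a small keyword classifier over product group names. This is a hypothetical sketch: the function name, the keywords and the English group names are assumptions (in the study the reclassification was done by the researchers, not by an automated rule), and only the three LCFC-1 target classes come from the text. Whole-word matching is used so that ‘unsalted’ is not mistaken for ‘salted’.

```python
import re

# Ordered rules: the first matching keyword decides the LCFC-1 class.
# The keywords are illustrative assumptions; the target classes follow the text.
NUT_RULES = (
    (re.compile(r"\bchocolate\b", re.IGNORECASE), "Sweets and chocolates"),
    (re.compile(r"\bsalted\b", re.IGNORECASE), "Snacks"),
)
DEFAULT_NUT_CLASS = "Dried fruits and nuts"  # plain nuts


def lcfc1_for_nut_group(group_name: str) -> str:
    """Assign a nut product group to an LCFC-1 class by purpose of use."""
    for pattern, lcfc1 in NUT_RULES:
        if pattern.search(group_name):
            return lcfc1
    return DEFAULT_NUT_CLASS


for name in ("Almonds, chocolate-coated", "Peanuts, salted",
             "Cashew nuts, unsalted", "Walnuts"):
    print(f"{name} -> {lcfc1_for_nut_group(name)}")
```

Checking ‘chocolate’ before ‘salted’ means that a chocolate-coated salted nut would be treated as a sweet; that ordering is itself a classification judgement.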
Classification of traditional Finnish ready-made pea soup is an example of how we considered the national food culture in LCFC-2. The most common pea soup contains small amounts of meat (< 5 %), but green pea is the main ingredient. Pea soup is traditionally served on Thursdays in lunch restaurants, and it is one of the main contributors to the consumption of legumes among the Finnish population. Therefore, we classified it under ‘Peas, beans, lentils and soya’ at the LCFC-2 level, which falls under the broader ‘Plant protein products’ category at the LCFC-1 level, rather than as a red meat product.
At the most granular levels (LCFC-3 and LCFC-4), nutritional quality and carbon footprint were used to guide the classification when reasonable (Table 1). For breads and breakfast cereals, milk and dairy products, and alcoholic beverages, we used their fibre, fat and alcohol content, respectively, to guide the classification at the LCFC-3 level. To be classified as a high-fibre cereal, a product had to contain at least 6 % fibre (6 g per 100 g), as defined by the European Food Safety Authority(30). For milk, we used 1 % and 3 % fat cut-offs to separate skimmed, semi-skimmed and whole milk. For other dairy products, low fat was defined as < 1 % fat. Alcoholic beverages were classified using the following cut-offs for their alcohol content, based on the Finnish alcohol legislation: ≤ 1·2 %, 1·3–2·8 %, 2·9–3·5 %, 3·6–4·7 % and 4·8–5·5 %(Reference Uusitalo, Nevalainen and Rahkonen31). For some foods, such as cheeses, it would have been desirable to use a cut-off based on fat content, but this was possible for only some of the cheeses because of the retailer’s grocery product grouping: the retailer grouped most of the cheeses by package size, processing and flavouring. In addition to nutritional content, we used carbon footprint as another basis for classification when within-food group variation in the carbon footprint of the foods was large. In other words, if the nutrition-based categorisation was not detailed enough to differentiate between foods whose carbon footprints differed in magnitude, we refined the categorisation further. For example, because the average carbon footprint of beef is much greater than that of pork(Reference Hartikainen and Pulkkinen32), we classified different types of red meat separately in LCFC-4. The details of assigning carbon footprints have been described previously(Reference Meinilä, Hartikainen and Tuomisto14).
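The LCFC-3 cut-offs above can be expressed as simple threshold functions. In this Python sketch the function names are hypothetical, and the handling of values exactly at a boundary (e.g. whether 1·0 % fat counts as skimmed or semi-skimmed) is an assumption, because the text gives only the cut-off values. Alcohol contents above 5·5 % fall outside the listed bands and are reported as such rather than guessed.

```python
def fibre_class(fibre_g_per_100g: float) -> str:
    """High-fibre cereal if fibre is at least 6 g per 100 g (6 %)."""
    return "high-fibre cereal" if fibre_g_per_100g >= 6.0 else "low-fibre cereal"


def milk_class(fat_pct: float) -> str:
    """Assumed boundaries: < 1 % skimmed, 1 to < 3 % semi-skimmed, >= 3 % whole."""
    if fat_pct < 1.0:
        return "skimmed"
    if fat_pct < 3.0:
        return "semi-skimmed"
    return "whole"


def is_low_fat_dairy(fat_pct: float) -> bool:
    """Other dairy products: low fat means < 1 % fat."""
    return fat_pct < 1.0


# Inclusive upper bound (% alcohol) and label of each band from the text.
ALCOHOL_BANDS = (
    (1.2, "<= 1.2 %"),
    (2.8, "1.3-2.8 %"),
    (3.5, "2.9-3.5 %"),
    (4.7, "3.6-4.7 %"),
    (5.5, "4.8-5.5 %"),
)


def alcohol_band(alcohol_pct: float) -> str:
    """Return the first band whose upper bound is not exceeded."""
    for upper, label in ALCOHOL_BANDS:
        if alcohol_pct <= upper:
            return label
    return "> 5.5 % (no band given in the text)"


print(milk_class(1.5), alcohol_band(4.5), fibre_class(7.0))
```

Treating each cut-off as the upper bound of one class keeps the classes non-overlapping; under this assumption, values that fall between the listed bands (e.g. 1·25 %) are assigned to the next band up.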
For some of the retailer’s grocery product groups, the LCFC remained a compromise due to the lack of detailed information at the food item level. For example, we classified pizzas under cereals and bakery products because the retailer had categorised them by whether they were fresh or frozen, or had a thick or thin crust, but not by whether they were, for example, vegetarian or meat pizzas. Thus, we considered the main ingredient in the pizzas to be wheat (cereals). Other examples of such groups include ‘Other canned foods’, ‘Warm dish service’ and ‘Other ready-made soups’. Eventually, only 38 grocery product groups (about 1 % of all product groups) were left unclassified under ‘Miscellaneous’ at the LCFC-1 level.
Last, we added tobacco products as a group of their own, because tobacco is an important product group to examine alongside food and alcohol products in health research.
Examining the LCFC hierarchy in terms of nutrition quality
The retailer’s grocery product groups were linked with their respective nutrient content using the Finnish Food Composition Database Fineli®, version 20 (www.fineli.fi)(Reference Reinivuo, Hirvonen and Ovaskainen29). Fineli’s open database includes 4232 food items and dishes, 1370 basic ingredients and 55 nutrients. For each product group, we selected the food item from Fineli that best represented the product group. In most cases, the name of the grocery product group contained enough information for us to select the food from Fineli (e.g. pineapple, oat milk, ketchup). Otherwise, we used the sample of item-level product data received from the retailer to decide which Fineli food best described the group. If Fineli did not contain a food that described the product group sufficiently, food composition databases of other countries (e.g. the Swedish and US databases) were used.
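The linkage step amounts to an ordered lookup: Fineli first, then other national databases. The Python sketch below only records that priority order; the mapping tables, database labels and food identifiers are hypothetical placeholders, and in the study the matches themselves were chosen by expert judgement rather than computed.

```python
from typing import Mapping, Optional, Sequence, Tuple

# Each source: (database label, {retailer product group -> reference food}).
Source = Tuple[str, Mapping[str, str]]


def link_reference_food(group: str,
                        sources: Sequence[Source]) -> Optional[Tuple[str, str]]:
    """Return (database, reference food) from the first source with a match."""
    for db_label, mapping in sources:
        food = mapping.get(group)
        if food is not None:
            return db_label, food
    return None  # group is left without nutrient content


# Placeholder identifiers only; not real database entries.
sources = [
    ("Fineli", {"pineapple": "FINELI_FOOD_A", "oat milk": "FINELI_FOOD_B"}),
    ("Other national database", {"product group X": "FOREIGN_FOOD_C"}),
]
print(link_reference_food("oat milk", sources))
```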
Of the 3574 grocery product groups, 3368 were linked with nutrient content. Tobacco products and vitamin and mineral supplements (122 groups), spices and condiments (25 groups), miscellaneous (38 groups) and 21 other product groups were left without nutrient content, either because no representative food could be found in the composition databases or because the group did not include foods with relevant nutrient content.
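The counts above can be cross-checked with a few lines of arithmetic. This Python sketch uses only numbers stated in the text; it confirms that the four excluded categories account for exactly the 3574 − 3368 = 206 unlinked groups and that the 38 ‘Miscellaneous’ groups are about 1 % of all product groups.

```python
total_groups = 3574
linked = 3368

# Product groups left without nutrient content, as listed in the text.
unlinked = {
    "tobacco products and vitamin and mineral supplements": 122,
    "spices and condiments": 25,
    "miscellaneous": 38,
    "other product groups": 21,
}

# The excluded categories should account for every unlinked group.
assert total_groups - linked == sum(unlinked.values())

share_linked = linked / total_groups
share_misc = unlinked["miscellaneous"] / total_groups
print(f"{share_linked:.1%} linked; {share_misc:.2%} left in Miscellaneous")
```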
To examine how well our hierarchical reclassification reflects the nutritional quality of the grocery product groups, we calculated an NRFI for each grocery product group and examined its distribution at each LCFC level, following the principles of Drewnowski et al.(Reference Drewnowski and Fulgoni33,Reference Fulgoni, Keast and Drewnowski34). NRFI is a validated method of nutrient profiling that aims to provide an overall nutrient density score based on selected nutrients(Reference Drewnowski and Fulgoni33,Reference Fulgoni, Keast and Drewnowski34). We calculated the NRFI per 100 g of product using 11 nutrients, of which eight were regarded as positive (protein, fibre, PUFA, calcium, iron, vitamin D, vitamin C and folate) and three as negative (SFA, sucrose and salt) in terms of anticipated health effects. The recommended values used for the 11 nutrients were taken from the Finnish nutrition recommendations, which are the same as the Nordic Nutrition Recommendations(28) except for salt, for which the recommended value is 5000 mg in the Finnish recommendations (6000 mg in the Nordic Nutrition Recommendations).
Among the 11 nutrients that we included in the NRFI, intakes of fibre, PUFA and vitamin D have been identified as relatively low at the population level in Finland(Reference Valsta, Kaartinen and Tapanainen35). In contrast, high intakes of SFA and salt have been public health concerns among the Finnish population for decades. Intakes of iron and folate have been low among Finnish women of childbearing age. Including these nutrients in the NRFI was therefore justified. It should be noted that the NRFI does not weight the nutrients differently(Reference Drewnowski and Fulgoni33,Reference Fulgoni, Keast and Drewnowski34). The openly available LCFC hierarchy also includes the values for the 11 nutrients and the NRFI values.
Recommended values for calculating the percentage of daily recommendation (DR%) are as follows:
Protein = 90 g (corresponding to 15 % of energy in a 2400 kcal diet)
Fibre = 25 g
Polyunsaturated fat (PUFA) = 20 g (corresponding to 7·5 % of energy in a 2400 kcal diet)
Calcium (Ca) = 800 mg
Iron (Fe) = 9 mg
Vitamin D = 10 µg
Vitamin C = 75 mg
Folate = 300 µg
Sucrose = 60 g (corresponding to 10 % of energy in a 2400 kcal diet)
Saturated fat (SFA) = 26·7 g (corresponding to 10 % of energy in a 2400 kcal diet)
Salt = 5000 mg
Equation 1: Positive score = (DR% protein + DR% fibre + DR% PUFA + DR% Ca + DR% Fe + DR% Vit D + DR% Vit C + DR% folate)/8
Equation 2: Negative score = (DR% sucrose + DR% SFA + DR% salt)/3
Equation 3: NRFI = positive score – negative score
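Equations 1–3 translate directly into code. In this Python sketch, DR% is computed as the nutrient amount per 100 g divided by the recommended value, i.e. as a proportion rather than multiplied by 100; this scaling is an assumption, chosen because it appears consistent with the magnitude of the NRFI medians reported in the Results (e.g. 0·15). Folate is taken as 300 µg, since a 300 mg reference value would be implausible. The nutrient keys and function names are hypothetical, and the example input is synthetic, not Fineli data.

```python
from typing import Mapping

# Daily reference values from the list above (folate in µg; see lead-in).
RECOMMENDED = {
    "protein_g": 90.0,
    "fibre_g": 25.0,
    "pufa_g": 20.0,
    "calcium_mg": 800.0,
    "iron_mg": 9.0,
    "vitamin_d_ug": 10.0,
    "vitamin_c_mg": 75.0,
    "folate_ug": 300.0,
    "sucrose_g": 60.0,
    "sfa_g": 26.7,
    "salt_mg": 5000.0,
}
POSITIVE = ("protein_g", "fibre_g", "pufa_g", "calcium_mg",
            "iron_mg", "vitamin_d_ug", "vitamin_c_mg", "folate_ug")
NEGATIVE = ("sucrose_g", "sfa_g", "salt_mg")


def dr(nutrients_per_100g: Mapping[str, float], key: str) -> float:
    """Share of the daily recommendation in 100 g of product (no cap)."""
    return nutrients_per_100g[key] / RECOMMENDED[key]


def nrfi(nutrients_per_100g: Mapping[str, float]) -> float:
    """NRFI = mean DR of positive nutrients - mean DR of negative nutrients."""
    positive = sum(dr(nutrients_per_100g, k) for k in POSITIVE) / len(POSITIVE)
    negative = sum(dr(nutrients_per_100g, k) for k in NEGATIVE) / len(NEGATIVE)
    return positive - negative


# Synthetic example: only fibre (12.5 g) and salt (500 mg) are non-zero.
example = dict.fromkeys(RECOMMENDED, 0.0)
example.update(fibre_g=12.5, salt_mg=500.0)
print(round(nrfi(example), 4))
```

No cap is applied to the DR values, in line with the index having no upper limits; a capped variant would replace each ratio with `min(ratio, 1.0)`. Missing nutrients raise a `KeyError` instead of being silently treated as zero.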
Boxplots showing the median NRFI (horizontal line in the box), the lower and upper quartiles (lower and upper edges of the box) and whiskers extending to the most extreme values within 1·5 × the inter-quartile range from the box were drawn for each group at the different LCFC hierarchy levels using R statistical software(36).
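The boxplot elements described above can be computed as follows. This Python sketch uses `statistics.quantiles` with `method="inclusive"`, which corresponds to R’s default quantile definition (type 7). R’s `boxplot()` actually draws Tukey hinges from `fivenum()`, which can differ slightly from these quartiles in small groups, so the sketch approximates rather than reproduces the published figures. The function name is hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def box_stats(values: Sequence[float]) -> Dict[str, object]:
    """Median, quartiles and 1.5 x IQR whiskers for one group's NRFI values."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers: List[float] = sorted(
        v for v in values if v < low_fence or v > high_fence)
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


print(box_stats([1.0, 2.0, 3.0, 4.0, 100.0]))
```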
Results
Figure 2 gives an overall representation of the hierarchy of the LCFC from the retailer’s grocery product groups to LCFC1–3. Similar figures for all hierarchy levels, including the number of product groups at each level, can be found in online Supplementary material 1. The whole detailed classification structure of LCFC1–4 is available at https://doi.org/10.5281/zenodo.7781352. Most of the grocery product groups were classified only at LCFC1–2, and LCFC3–4 were used when needed, for example, to distinguish foods with different nutritional or carbon footprint profiles. Therefore, not all foods were classified at the most granular levels.
The largest groups at LCFC-1 (in terms of the number of grocery product groups within the class), such as ‘Alcoholic beverages’, ‘Red and processed meat’, ‘Cereals and bakery products’ and ‘Milk and dairy products’, accounted for more than two-fifths (1509 of 3574) of all the retailer’s grocery product groups (Fig. 2). This was mostly due to the original, highly granular classification in the retailer’s hierarchy. Most of the remaining groups came from ‘Plant protein products’, ‘Sugar-sweetened beverages’, ‘Fish and seafood’, ‘Vegetables’, ‘Poultry and poultry dishes’, ‘Low-sugar beverages’, ‘Sweets and chocolates’, ‘Bottled water and mineral water’ and ‘Baby foods’ (listed from the largest to smallest group). These were the next largest LCFC-1 groups, each containing 100–200 grocery product groups (1249 retailer’s grocery product groups in total). The smallest 25 LCFC-1 groups each included fewer than 100 grocery product groups, 816 retailer’s grocery product groups in total.
To illustrate the extent to which the LCFC succeeded in reflecting the nutritional quality of the grocery product groups, Fig. 3 shows the medians and the variation in the NRFI values of the food groups at the LCFC-1 level. In general, when the groups at LCFC-1 were ranked by their median NRFI value, the order was consistent with the expected nutritional quality of the groups. Grocery product groups under ‘Dried fruits and nuts’ (median = 0·15), ‘Fish and seafood’ (median = 0·06) and ‘Eggs’ and ‘Fruit juice’ (median = 0·05) were nutrient-rich according to their median NRFI values. In contrast, ‘Edible fat’ (median = –0·11), ‘Jam and marmalade’ (median = –0·12), ‘Sweets and chocolate’ (median = –0·24) and ‘Plant-based dairy-like products’ (median = –0·25) were less nutrient-rich, as indicated by the negative index (Fig. 3). In general, many foods high in sugar, fat and/or salt are at the lower end of the NRFI scale (left side of the x-axis), while foods recommended in dietary guidelines(Reference Todd, Mancino and Leibtag20) tend to be positioned higher (right side).
As seen in the boxplots, the variation in NRFI within the food groups at LCFC-1 was large, as nearly all food groups extend to both sides of the zero line that separates more nutrient-rich from less nutrient-rich food groups (Fig. 3). The mean sd of NRFI at LCFC-1 was 0·21. ‘Edible fat’ and ‘Sauces’ had the largest sd (edible fat: sd = 0·35 index points, number of product groups n 22; sauces: sd = 0·35, n 45), followed by ‘Meal ingredients’ (sd = 0·27, n 33) and ‘Red and processed meat’ (sd = 0·16, n 399) (Fig. 3). Less variation was found in the beverage groups (sd = 0·01–0·03, n 66–452), ‘Eggs’ (sd = 0·01, n 9), ‘Mayonnaise salad’ (sd = 0·03, n 16) and ‘Fruits and berries’ (sd = 0·06, n 60).
As an example, we examined the NRFI values more closely within ‘Cereals and bakery products’ at the LCFC-2 level (Fig. 4). There was less variation than at the LCFC-1 level, and many of the food groups within ‘Cereals and bakery products’ were, on average, more clearly above or below the zero line. The mean sd of NRFI at LCFC-2 was 0·10. We then continued the example by selecting ‘Breakfast cereals’ from the LCFC-2 groups within ‘Cereals and bakery products’. Moving further to the LCFC-3 level in ‘Breakfast cereals’, variation was reduced further within each food group (‘high-fibre cereal’ and ‘low-fibre cereal’), with a mean sd of NRFI of 0·08.
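The homogeneity check reported here, a mean of within-group standard deviations that shrinks at finer LCFC levels, can be sketched as below. The sketch assumes the sample sd, skips groups with fewer than two product groups and averages the group sds without weighting; the paper does not state these choices, so they are assumptions. The record format and function name are hypothetical, and the example labels and scores are synthetic.

```python
import statistics
from collections import defaultdict
from typing import Iterable, Optional, Sequence, Tuple

# One record per retailer product group: (LCFC labels from level 1 down, NRFI).
Record = Tuple[Sequence[Optional[str]], float]


def mean_within_group_sd(records: Iterable[Record], level: int) -> float:
    """Unweighted mean of the NRFI sd within each LCFC group at `level`."""
    by_group = defaultdict(list)
    for labels, score in records:
        label = labels[level - 1] if len(labels) >= level else None
        if label is not None:
            by_group[label].append(score)
    # statistics.stdev needs at least two values per group.
    sds = [statistics.stdev(s) for s in by_group.values() if len(s) >= 2]
    return statistics.fmean(sds)


# Synthetic records: one LCFC-1 class with two LCFC-2 subclasses.
records = [
    (("Class 1", "Subclass 1a"), 0.00),
    (("Class 1", "Subclass 1a"), 0.20),
    (("Class 1", "Subclass 1b"), 1.00),
    (("Class 1", "Subclass 1b"), 1.20),
]
print(mean_within_group_sd(records, 1), mean_within_group_sd(records, 2))
```

Whether the mean should be weighted by group size is a design choice: an unweighted mean lets small groups count as much as large ones, which matches reporting one sd per group.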
Discussion
The main purpose of this study was to compile, describe and test a reclassification of grocery product groups (LCFC) that could serve as a well-grounded basis for future examination of associations between grocery purchase data, dietary quality, sustainability and health outcomes. The LCFC hierarchy contains four levels, of which the broadest, LCFC-1, includes food groups such as ‘Vegetables’ and ‘Alcoholic beverages’. The division into the three more detailed subclass levels was based on the grocery product group’s type, quality (e.g. fibre or fat content), purpose of use, processing, carbon footprint and national food culture. As expected, the nutrient profiles (defined by NRFI) showed more within-group variation in the nutritional quality of the food groups at LCFC-1 than in the subclasses LCFC-2 to LCFC-4. This indicates that the more granular subclasses are better suited to, and indeed a prerequisite for, examining associations between grocery purchases and dietary quality(Reference Fardet, Rock and Bassama1,Reference Astrup and Monteiro37).
To place our classification in an international context, it is essential to refer to the Classification of Individual Consumption According to Purpose (COICOP). This international reference classification of household expenditure has been developed by the United Nations Statistics Division(38). In its broadest (least granular) structure, food and non-alcoholic beverages form one of 15 classes (code 01), which is further divided hierarchically into 16 subclasses (01.x) and 68 sub-subclasses (01.x.x). Our broadest classification, LCFC-1, hence falls between the granularity of these two COICOP levels.
COICOP is the basis for the British Living Costs and Food Survey (LCS)(Reference Rafferty and Walthery39), which uses 5-digit codes. The LCS classification covers a broad range of living costs, and ‘food’ is one of the second-level groups. The LCS classification then proceeds to more granular levels, from ‘class’ (e.g. bread and cereal) to ‘COICOP expenditure code’ (e.g. rice), and finally to ‘COICOP-plus code’, which is close to our LCFC-4 granularity. The LCS classification has also been used in the UK to assess dietary patterns using supermarket transaction data(Reference Clark, Shute and Jenneson40). In that study, the researchers used 15 broad groups and 82 more detailed categories to identify purchase clusters as indicators of dietary patterns. To improve the comparability of international reports and scientific research, it is crucial to openly share detailed classification descriptions, because classifications built on similar hierarchical principles can still differ slightly in their groupings.
Only a few studies have carefully described their justification and process for classifying food purchase data for studying diet quality and health-related outcomes(Reference Carlson, Lino and Fungwe19–Reference Monteiro, Levy and Claro22,Reference Uusitalo, Nevalainen and Rahkonen31–Reference Fulgoni, Keast and Drewnowski34). The Food Price Database created by the Center for Nutrition Policy and Promotion of the US Department of Agriculture for the National Food Plans(Reference Carlson, Lino and Fungwe19) is one of the most extensive and oldest classification systems for grocery purchase data. The classification divides 4152 individual foods into 58 food categories and five broad food groups based on similarity of nutrient content, food costs, number of cup or ounce equivalents in MyPyramid(Reference Britten, Lyon and Weaver41) and use in meals. The Quarterly Food-at-Home Price Database (QFAHPD) was developed after the Food Price Database to fill the gap in available food price data and to support research on the economic determinants of diet quality and health outcomes(Reference Todd, Mancino and Leibtag20). Foods were categorised into seven main food groups and further into 26 separate categories based on the 2005 Dietary Guidelines(42). The finest level, with 52 categories, distinguishes the processing level (e.g. fresh, canned or frozen).
Like our classification, the Food Price Database(Reference Carlson, Lino and Fungwe19) and QFAHPD(Reference Todd, Mancino and Leibtag20) reflect dietary guidelines. The reports also discuss the challenges of these classifications. For example, the QFAHPD report pointed out the difficulty of classifying foods that are composed of several ingredients. Classifying mixed foods was also one of our main challenges, and our solution was the same as in the QFAHPD: creating a ‘Miscellaneous’ class. However, we tried to minimise the number of grocery product groups in this class. This may have resulted in greater within-group variation in the nutritional quality of the food groups at each level compared with the QFAHPD. This cannot be ascertained, as the nutritional quality of the QFAHPD categories has not been examined.
Despite the extensive classifications in the Food Price Database and QFAHPD, we decided to create a new classification for our purposes. The main reasons were cultural differences and our research purposes. Namely, although Finland, like the USA, is a high-income economy, there are still differences in our food cultures and grocery food supply (e.g. type of bread and oil used). Moreover, the primary purpose of our LoCard grocery purchase data is to study interactions between food healthiness, environmental impact and price within the context of sociodemographic background and of intentional (e.g. new taxation of foods) and sporadic (e.g. COVID-19, Ukraine crisis) transformations; hence, the new LCFC classification is needed to support this research context.
Other classifications that have been well described are the NOVA classification(Reference Monteiro, Levy and Claro22) and the Convenience Food Classification Scheme(Reference Peltner and Thiele21). However, as explained in the Introduction, these classifications differ considerably from our principles and would not have suited our purpose of linking purchases primarily with health impacts. The concept of ultra-processed foods does not allow for nutritionally robust food grouping(Reference Braesco, Souchon and Sauvant43): for example, industrially produced high- and low-fibre breads are both classified as ultra-processed foods, despite their different nutritional profiles.
We argue that the LCFC could be directly applied in the Nordic and Baltic countries with rather similar food environments. We recommend, however, adapting the classification to the national or regional food culture when it is used in other countries or in multinational studies. The present ‘big data’ era gives many possibilities, but comparable use of data may be a challenge in international collaboration. Hence, there is a need for transparency and international classification ‘libraries’, perhaps also linked to food and diet ontologies(Reference Andrés-Hernández, Blumberg and Walls44). Therefore, it is recommended that any new classifications are openly presented and shared among the science community.
Strengths and limitations
Our starting point was a grocery product grouping received from the food retailer and its original classification hierarchy. The most obvious limitation affecting our classification method was that we could not classify at the most detailed level (product level). This led to some compromises and required more assumptions about the food items within the grocery product groups. For example, frozen pizzas were classified under cereals because we had no information about whether they were meat or vegetarian pizzas.
In our evaluation of the nutrient quality of the classifications, there are possible weaknesses that need to be discussed. First, although the NRFI is a well-established method to profile groceries based on their nutrient content, it has methodological weaknesses(Reference Drewnowski and Fulgoni33,Reference Fulgoni, Keast and Drewnowski34) . The choice of nutrients included in the index is subject to the researchers’ discretion. Moreover, ranking of the foods by NRFI varies depending on the selection of nutrients in the index, and the equation used can also impact the outcome(Reference Fulgoni, Keast and Drewnowski34). It should also be noted that a difference in NRFI is difficult to interpret in a quantitative way, particularly when different kinds of foods are compared.
In our study, we used 11 nutrients to profile all food groups, but one could also have examined the food groups at the LCFC-1 level and created a separate index for each food group, including only the nutrients relevant to it. This might have reflected the nutritional quality of the food groups better. For example, vegetables are generally perceived as very healthy, but they are not the main sources of iron, vitamin D, fibre or protein. Thus, judging vegetables by their content of these nutrients is not relevant. Indeed, the class ‘Vegetables’ had a relatively low NRFI, which does not reflect the true nutritional quality of this group.
In addition, NRFI does not have any upper or lower limits, meaning that the underlying assumption of the index is ‘the more nutrients the better’. In practice, this is not true: nutrient intake exceeding the recommended value brings no additional health benefits. This becomes relevant especially when nutrient profiling is examined together with environmental impacts. For example, in our results, plant-based protein products received a relatively low NRFI value, even though the use of these products may be advisable from an environmental perspective(Reference Clark, Springmann and Rayner45).
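One way to encode an upper limit, shown here only as a hypothetical variant and not as the NRFI used in this study, is to truncate each encouraged nutrient's contribution at 100 % of its reference value, so that content above the reference adds nothing further to the score. The reference values and the food below are made-up examples.

```python
from typing import Dict, Iterable, Optional

# Hypothetical reference values; NOT those used in this study.
REFERENCE: Dict[str, float] = {"vitamin_c_mg": 75.0, "fibre_g": 25.0, "sugar_g": 50.0}


def nrf_score(food: Dict[str, float], encourage: Iterable[str],
              limit: Iterable[str], cap: Optional[float] = None) -> float:
    """NRF-style score; if `cap` is given, each encouraged nutrient
    contributes at most `cap` percent of its reference value."""
    def pct(n: str) -> float:
        return 100.0 * food.get(n, 0.0) / REFERENCE[n]

    def encouraged(n: str) -> float:
        value = pct(n)
        return min(value, cap) if cap is not None else value

    return sum(encouraged(n) for n in encourage) - sum(pct(n) for n in limit)


# Made-up food very rich in vitamin C (per 100 g).
food = {"vitamin_c_mg": 300.0, "fibre_g": 5.0, "sugar_g": 5.0}
encourage, limit = ["vitamin_c_mg", "fibre_g"], ["sugar_g"]

uncapped = nrf_score(food, encourage, limit)           # vitamin C counts as 400 %
capped = nrf_score(food, encourage, limit, cap=100.0)  # vitamin C counts as 100 %
print(uncapped, capped)
```

The cap applies only to the encouraged nutrients; the limiting-nutrient term is left unbounded in this sketch, which is one design choice among several.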
We chose NRFI to examine how well we succeeded in classifying the data with respect to dietary quality(Reference Drewnowski23). Other options for nutrient profiling were available, such as the Grocery Purchase Quality Index-2016(Reference Brewster, Durward and Hurdle24), which has been shown to be associated with the Healthy Eating Index at both the food group and total diet levels. A scoring system has also been developed for the QFAHPD to measure the overall quality of grocery purchases, and it has been tested against the Healthy Eating Index(Reference Volpe and Okrent46). However, NRFI is well known and widely used in nutrient profiling, and it allows all food groups that could be linked to the food composition database to be examined. As pointed out above, we do not claim that NRFI is better than other profiling systems, and whichever system is chosen will always affect the results(Reference Drewnowski and Fulgoni47). Nevertheless, based on the profiling results, our classification was logical: food classes assumed to have relatively better nutritional quality, such as fruits and vegetables, got higher index values than foods considered to have low nutritional quality, such as sweets or chocolate. Further, our results suggest that food classes became more homogeneous in their nutrient profiles at the more detailed levels.
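The homogeneity check can be illustrated by comparing the spread of index values within classes at a coarser and a finer hierarchy level. The sketch below takes the population standard deviation within each class and averages it across classes without weighting; the class labels and index values are hypothetical and are not LCFC classes or results from this study.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Tuple

# Hypothetical records: (level-1 class, level-2 class, index value of a product group).
records: List[Tuple[str, str, float]] = [
    ("A", "A1", 40.0), ("A", "A1", 44.0),
    ("A", "A2", 10.0), ("A", "A2", 14.0),
    ("B", "B1", -20.0), ("B", "B1", -24.0),
    ("B", "B2", -40.0), ("B", "B2", -36.0),
]


def mean_within_class_sd(rows: List[Tuple[str, str, float]], level: int) -> float:
    """Unweighted mean of within-class population SDs at hierarchy `level`
    (0 = coarser level, 1 = finer level)."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        groups[row[level]].append(row[2])
    return mean(pstdev(values) for values in groups.values())


coarse = mean_within_class_sd(records, level=0)
fine = mean_within_class_sd(records, level=1)
print(f"coarse: {coarse:.2f}, fine: {fine:.2f}")
```

A mean weighted by the number of product groups or by purchase volume would be an equally reasonable choice; the sketch keeps the simplest version.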
Since we had the grocery purchase data at the grocery product group level, we had to select one food from the Finnish Food Composition Database to represent the nutrient content of each grocery product group. Again, because we did not know exactly which types of grocery items some of the grocery product groups contained, the selected food may not always have been the optimal reference food. A future improvement to this approach could be to select 3–5 of the most purchased foods that represent the grocery product group and are also among the foods most consumed by the Finnish population, and to assign the average nutrient values of those foods to the grocery product group.
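The improvement proposed above can be sketched in two steps: keep only candidate foods that are also commonly consumed, take the 3–5 most purchased of them, and average their nutrient values. The `Candidate` fields, the food names and all numbers are hypothetical; in practice the nutrient values would come from the Finnish Food Composition Database, and the consumption criterion would presumably come from population dietary survey data.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Candidate:
    name: str                    # hypothetical food name
    purchases: int               # purchase count within the grocery product group
    commonly_consumed: bool      # among the most consumed foods in the population?
    nutrients: Dict[str, float]  # nutrient content per 100 g


def select_reference_foods(cands: List[Candidate], k_min: int = 3,
                           k_max: int = 5) -> List[Candidate]:
    """Return up to k_max of the most purchased, commonly consumed foods."""
    eligible = sorted((c for c in cands if c.commonly_consumed),
                      key=lambda c: c.purchases, reverse=True)
    chosen = eligible[:k_max]
    if len(chosen) < k_min:
        raise ValueError(f"only {len(chosen)} eligible foods; need at least {k_min}")
    return chosen


def average_profile(foods: List[Candidate]) -> Dict[str, float]:
    """Arithmetic mean of each nutrient across the selected foods."""
    keys = foods[0].nutrients.keys()
    if any(f.nutrients.keys() != keys for f in foods):
        raise ValueError("all foods must report the same nutrients")
    return {k: mean(f.nutrients[k] for f in foods) for k in keys}


candidates = [
    Candidate("yoghurt_a", 1200, True, {"protein_g": 4.0, "sugar_g": 10.0}),
    Candidate("yoghurt_b", 900, True, {"protein_g": 3.5, "sugar_g": 12.0}),
    Candidate("yoghurt_c", 800, False, {"protein_g": 9.0, "sugar_g": 3.0}),
    Candidate("yoghurt_d", 500, True, {"protein_g": 4.5, "sugar_g": 11.0}),
]
chosen = select_reference_foods(candidates)
print([c.name for c in chosen], average_profile(chosen))
```

A purchase-weighted mean would be a natural alternative to the plain average described in the text.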
Last, the Finnish food retail market is very concentrated: the two largest chains together account for more than 80 % of the market. The reclassification was based on data from a single food retailer. However, although product categories are designed and managed somewhat differently across chains, the overall selection is very similar across the major food chains. We therefore worked with a food selection that is representative of the entire Finnish food market and do not consider this a major weakness of the LCFC. The selection of foods is likely to differ in other countries, but the grouping principles described in this paper still apply.
Conclusion and recommendations
We have shown the multiple steps and the amount of work needed to hierarchically classify grocery product groups for research on nutrition, health and environmental impacts. Based on nutrient profiling with the NRFI, the nutritional quality of the LCFC classes was logical from a health viewpoint. The decrease in the variation of nutritional quality within LCFC classes at higher granularity was reassuring and indicates good potential for using the classification in studies linking food purchase data with health, environmental and sociodemographic variables and expenditure (price). Hence, we have shown that even without brand-level information, food purchase data can be classified in a meaningful way.
Customer loyalty card data hold manifold potential for improving the understanding of individuals’ food purchase profiles and motives. Furthermore, they can contribute to better means of designing food systems that promote healthy food selection(Reference Muir, Dhuria and Roe48). Retail stores are core environments in such a food system, with the potential to promote both healthy and unhealthy food selection. The UK government has launched ground-breaking new legislation to change retailers’ food marketing strategies, starting from October 2022(49). Loyalty card data have a clear and strong potential for evaluating the immediate and long-term effects of such a policy action. As also concluded by Clark et al.(Reference Clark, Shute and Jenneson40), loyalty card data offer exceptional possibilities for research on multiple aspects of grocery food selection, with broad societal implications.
Acknowledgements
This study was funded by the Juho Vainio Foundation (grant to M. E., #202200480). The LoCard study is funded by the Academy of Finland (research grant to M. F. and J. N., #350 852). Linking the purchase data with the food composition database was funded by the Juho Vainio and Yrjö Jahnsson foundations (grants to J. M., #202100202 and #20207300, respectively).
N. K. did the analysis and had the main responsibility for writing and finalising the manuscript. S. K. had the main responsibility for classifying the foods and linking them with a food composition database. M. E., H. V., J. M. and M. F. were part of the group of nutrition experts who contributed to creating the classification method. J. M. supervised the linking of the purchase and food composition data. Further, M. F., J. N., H. S. and M. E. participated in data acquisition and curation, project administration and obtaining resources. All authors participated in commenting on and revising the manuscript, and all have read and approved the final version of the manuscript.
M. F. is a member of the S Group societal responsibility advisory board. The membership is not compensated. H. V. has received a fee from the S Group for a collaboration that included offering professional advice to influencers and writing a blog post on interpreting the nutrition calculator in the S Group’s mobile app. The other authors have nothing to declare.
Supplementary material
For supplementary material/s referred to in this article, please visit https://doi.org/10.1017/S0007114524000710