
Comparison of cluster and principal component analysis techniques to derive dietary patterns in Irish adults

Published online by Cambridge University Press:  25 June 2008

Áine P. Hearty*
Affiliation:
Institute of Food and Health, Agriculture and Food Science Centre, University College Dublin, Belfield, Dublin 4, Ireland
Michael J. Gibney
Affiliation:
Institute of Food and Health, Agriculture and Food Science Centre, University College Dublin, Belfield, Dublin 4, Ireland
*Corresponding author: Á. P. Hearty, fax +353 17161147, email [email protected]

Abstract

The aims of the present study were to examine and compare dietary patterns in adults using cluster and factor analyses and to examine the impact of the format of the dietary variables on the pattern solutions (i.e. expressed as grams/day (g/d) of each food group or as the percentage contribution to total energy intake). Food intake data were derived from the North/South Ireland Food Consumption Survey 1997–9, which was a randomised cross-sectional study of 7 d recorded food and nutrient intakes of a representative sample of 1379 Irish adults aged 18–64 years. Cluster analysis was performed using the k-means algorithm and principal component analysis (PCA) was used to extract dietary factors. Food data were reduced to thirty-three food groups. For cluster analysis, the most suitable format of the food-group variable was found to be the percentage contribution to energy intake, which produced six clusters: ‘Traditional Irish’; ‘Continental’; ‘Unhealthy foods’; ‘Light-meal foods & low-fat milk’; ‘Healthy foods’; ‘Wholemeal bread & desserts’. For PCA, food groups expressed as g/d were found to be the most suitable format, and this revealed four dietary patterns: ‘Unhealthy foods & high alcohol’; ‘Traditional Irish’; ‘Healthy foods’; ‘Sweet convenience foods & low alcohol’. In summary, cluster analysis and PCA identified similar dietary patterns when presented with the same dataset. However, the two dietary pattern methods required a different format of the food-group variable, and the most appropriate format of the input variable should be considered in future studies.

Type
Full Papers
Copyright
Copyright © The Authors 2008

Traditionally, nutrition research has focused on the detailed examination of nutrients and dietary components. In more recent years, however, public health nutrition has seen a marked move from research at the nutrient level to the food level, driven by the accepted concept that people eat foods, not nutrients. This food-level approach aims to facilitate a more holistic assessment of dietary behaviour, and is therefore better placed to provide tangible food-based dietary advice to the general public(Reference Tucker, Chen, Hannan, Cupples, Wilson, Felson and Kiel1). However, foods are consumed in many complex combinations and studies of individual foods can be difficult to interpret because of strong intercorrelations(Reference Randall, Marshall, Graham and Brasure2–Reference Newby and Tucker4). Another major issue with this type of research is that dietary patterns cannot be measured directly. Nevertheless, two fundamentally different analytical approaches have been developed for measuring dietary patterns in epidemiological studies: a priori and a posteriori methods. A priori methods explore the data using predefined combinations of foods in a dietary index, such as the healthy eating index(Reference Kennedy, Ohls, Carlson and Fleming5). A posteriori methods explore the available data post hoc by either factor or cluster analysis and produce dietary patterns in which nutritional variables (i.e. foods) are reduced to a smaller number of variables(Reference Newby and Tucker4).

Factor analysis, specifically principal component analysis (PCA), is a frequently used exploratory approach to identify dietary patterns in a population(Reference van Dam6, Reference Martinez, Marshall and Sechrest7). New dietary and food pattern variables are derived on the basis of the correlation matrix of the original food variables and individuals receive a factor score (principal components (PC) score) for each of the derived factors. In cluster analysis, dietary data are reduced to patterns based on individual differences in mean dietary intakes. Cluster analysis creates patterns that are mutually exclusive, as each subject in the present study can belong to only one cluster. Multiple clustering algorithms exist. Of these, k-means is probably the most popular non-hierarchical clustering method used in the literature.

Factor and cluster procedures have been compared in relation to their ability to predict disease risk in a few studies(Reference Newby, Muller and Tucker8, Reference Kant, Graubard and Schatzkin9), and they showed that the results from the dietary score and factor analysis techniques were comparable. Bamia et al. (Reference Bamia, Orfanos and Ferrari10) and Crozier et al. (Reference Crozier, Robinson, Borland and Inskip11) also compared cluster and PCA methods in relation to elderly Europeans and young British women, respectively. They both found close similarities between patterns derived using either method. However, Crozier et al. (Reference Crozier, Robinson, Borland and Inskip11) found only two meaningful dietary patterns (labelled as ‘Prudent’ and ‘High-energy’), and they felt that the dichotomous variable produced by cluster analysis was less informative than the continuous variable produced by PCA. As there are very few studies that have compared cluster and PCA, and as these few studies have been in specific population groups, there still exists the need to directly examine these two methods in a more general representative sample.

In the studies of dietary patterns, food variables should be selected so that the emerging patterns make sense from a dietary perspective. Various formats of the food-group variable can be utilised, and it has been advised that research is needed on how the treatment of dietary variables affects the dietary pattern solution(Reference Bailey, Gutschall, Mitchell, Miller, Lawrence and Smiciklas-Wright12).

Therefore, the aims of the present study were to examine and compare the dietary patterns in a representative sample of Irish adults using both the cluster and factor analyses and to examine the impact of the format of the dietary variables on the pattern solutions (i.e. expressed as either g/d of each food group or as the percentage contribution to total energy intake (%TE) from each food group).

Methods

Study sample details

The North/South Ireland Food Consumption Survey was a randomised cross-sectional study of food and nutrient intakes of a representative sample of adults aged 18–64 years from the Republic of Ireland and Northern Ireland between 1997 and 1999(Reference Kiely, Flynn, Harrington, Robson and Cran13, Reference Harrington, Robson, Kiely, Livingstone, Lambe and Gibney14). The dietary survey was completed by 1379 adults (662 males and 717 females; response rate was 63 %), and food intake was collected using a semi-quantitative 7 d food diary. Food and nutrient analysis was conducted using Weighed Intake Software Program (Tinuviel Software, Anglesey, UK). Weighed Intake Software Program uses data from McCance & Widdowson's ‘The Composition of Foods’(Reference Holland, Welch, Unwin, Buss, Paul and Southgate15) plus supplemental volumes to generate nutrient intake data.

Food groups

Food data were reduced to thirty-three food groups to ease the interpretation of cluster and factor components. For the amount of each food group consumed, only edible fraction weights of all foods were considered in the present analysis. Foods expressed as the dry-weight version were corrected to represent the amount as consumed. Two beverage food groups were created to differentiate between the types of beverages consumed in the studies: low energy (e.g. water, tea, coffee, sugar-free drinks, sugar-free squashes) and high energy (e.g. soft drinks, squashes). Whole milk, low-fat milk and fruit juices remained in separate food groups. For both cluster and PCA, food groups were expressed as either g/d intake or the %TE from each food group (Appendix table).
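To make the two variable formats concrete, the following minimal sketch converts one respondent's food-group intakes from g/d to %TE. The function name, the three food groups shown and the energy densities (kcal/g) are illustrative assumptions and are not values from the survey.

```python
def percent_energy(grams_per_day, kcal_per_gram):
    """Return each food group's percentage contribution to total energy intake."""
    # Energy from each food group (kcal/d) = intake (g/d) * energy density (kcal/g).
    energy = {g: grams_per_day[g] * kcal_per_gram[g] for g in grams_per_day}
    total = sum(energy.values())
    if total == 0:
        raise ValueError("total energy intake is zero")
    return {g: 100.0 * e / total for g, e in energy.items()}


# Hypothetical respondent: intakes in g/d and assumed energy densities in kcal/g.
intake = {"white bread": 100.0, "whole milk": 300.0, "potatoes": 200.0}
density = {"white bread": 2.5, "whole milk": 0.5, "potatoes": 1.0}
pct = percent_energy(intake, density)
```

In this made-up example, whole milk accounts for half of the weight consumed but only a quarter of the energy, which illustrates how the same diet can look quite different under the two formats.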

Identification of dietary patterns

Cluster analysis

Cluster analysis was performed using the k-means algorithm. This is based on geometric similarity, which gives a measure of the Euclidean distance from each record to the cluster centre, and from each cluster to the others. A series of steps was taken to select the most suitable number of clusters for the analysis. First, several runs were conducted with a varying number of clusters. For each run, cluster proximities for each cluster centre were examined and the number of iterations per cluster was increased to minimise error in cluster membership and to ensure that the model had converged to a solution. Clusters were also run without outliers to help find the best cluster solutions. Finally, the resulting clusters were examined for sensible patterns to establish the robustness of the clusters.
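The general procedure can be sketched as follows. This is a plain k-means (Lloyd's algorithm) on standardised variables, written for illustration only; it is not the SPSS Clementine implementation used in the study, and the function names, the starting centres and the six two-variable records are hypothetical.

```python
import math


def standardise(rows):
    """Z-score each column (mean 0, SD 1) before clustering."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1))
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def kmeans(rows, centres, max_iter=100):
    """Plain k-means with Euclidean distance and caller-supplied starting centres."""
    centres = [list(c) for c in centres]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record joins its nearest cluster centre.
        new_labels = [min(range(len(centres)),
                          key=lambda k: math.dist(row, centres[k]))
                      for row in rows]
        if new_labels == labels:
            break  # memberships stopped changing, so the solution has converged
        labels = new_labels
        # Update step: each centre moves to the mean of its current members.
        for k in range(len(centres)):
            members = [row for row, lab in zip(rows, labels) if lab == k]
            if members:
                centres[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


# Hypothetical %TE from two food groups for six respondents.
data = [[30, 5], [28, 6], [32, 4], [6, 25], [5, 27], [4, 26]]
z = standardise(data)
labels, centres = kmeans(z, centres=[z[0], z[3]])
```

Because the number of clusters is fixed by the number of starting centres, repeating the call with different numbers of centres mirrors the multiple runs described above.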

Principal component analysis

PCA was used to extract dietary factors on the basis of the correlation of the thirty-three food groups consumed. PC with eigenvalues of ≥ 1·5 were retained and the retained factors were orthogonally rotated by the varimax method, so that the factors were uncorrelated, making them easier to interpret. Each rotated PC was interpreted based on the food groups with loadings of ≥ 0·25 or ≤ − 0·25, which were considered as significantly contributing to a pattern(Reference Togo, Heitmann, Sorensen and Osler16). Factor scores were also saved for each PC for each respondent. The scores represent standardised variables with mean 0 and standard deviation 1.
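The two decision rules described above (the eigenvalue cut-off and the loading threshold) can be expressed in a few lines. This is a sketch under stated assumptions: the function names, the five eigenvalues and the three loadings are hypothetical, and the extraction and varimax rotation themselves are not reproduced here.

```python
def retained_components(eigenvalues, cutoff=1.5):
    """Indices of principal components whose eigenvalue meets the cut-off."""
    return [i for i, ev in enumerate(eigenvalues) if ev >= cutoff]


def salient_food_groups(loadings, threshold=0.25):
    """Food groups whose rotated loading is >= threshold or <= -threshold."""
    return {food: w for food, w in loadings.items() if abs(w) >= threshold}


# Hypothetical eigenvalues for five standardised variables; they sum to 5,
# the total variance of a correlation matrix with five variables.
eigenvalues = [2.0, 1.6, 0.8, 0.4, 0.2]
kept = retained_components(eigenvalues)
# Share of total variance explained by the retained components.
explained = sum(eigenvalues[i] for i in kept) / len(eigenvalues)

# Hypothetical rotated loadings on one retained component.
loadings = {"chips": 0.61, "vegetables": -0.31, "soups": 0.10}
salient = salient_food_groups(loadings)
```

With a correlation-based PCA the eigenvalues sum to the number of input variables, so the explained share is simply the retained eigenvalues divided by that number.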

Statistical analysis

SPSS® version 12 for Windows (SPSS® Inc., Chicago, IL, USA) was used for data manipulation and basic statistical analysis of the datasets. SPSS Clementine® version 9.0 (SPSS® Inc.) was used to conduct the data reduction analysis, and this software standardises all variables before analysis. Differences in the mean percentage contribution of each food group across clusters and the differences in the mean nutrient intake across clusters were evaluated using one-way ANOVA. One-way ANOVA was also used to test for significant differences in mean nutrient intakes across quartiles of factor components. Where statistically significant effects were encountered (P < 0·05), comparisons of mean nutrient intakes were made using Scheffé's post hoc multiple comparison test. For values that did not satisfy Levene's test for homogeneity of variance, the Tamhane post hoc multiple comparison test was used(Reference Coakes and Steed17). Mean PC scores were also calculated for each of the six clusters.
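For readers unfamiliar with the test, the F statistic of a one-way ANOVA can be computed as in the sketch below. The function name and the three small groups of values are hypothetical; the P value (which requires the F distribution) and the post hoc tests named above are deliberately omitted.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical nutrient intakes for respondents in three clusters.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```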

In order to determine whether the clusters and PC of the dietary patterns found in the present study were comparable, binary logistic regression analysis was used. A model was constructed to predict the odds of being in quartile 4 (Q4) of each PC (dependent variable) based on the membership of one of the six clusters (independent variable). A binary variable ‘yes/no’ for being in Q4 of each PC was generated. Confounding factors (i.e. sex, age, social class and smoking) were also adjusted for and the model was run for different scenarios based on these confounders. OR, comparing the outcome of the dependent variable with the reference value, and the corresponding 95 % CI were calculated.
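As a simplified illustration of the quantity being estimated, the sketch below computes an unadjusted OR and its 95 % CI (Woolf's log method) from a 2 × 2 table of cluster membership against Q4 membership. A logistic regression with cluster membership as the only predictor yields the same crude OR; the adjusted ORs reported in the present paper additionally account for the confounders and cannot be reproduced this way. The function name and all counts are hypothetical.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald-type 95 % CI.

    a: cluster of interest, in Q4      b: cluster of interest, not in Q4
    c: reference cluster, in Q4        d: reference cluster, not in Q4
    """
    ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - z * se)
    upper = math.exp(math.log(ratio) + z * se)
    return ratio, lower, upper


# Hypothetical counts, not taken from the survey.
ratio, lower, upper = odds_ratio(a=60, b=40, c=30, d=120)
```

An interval that excludes 1 indicates that membership of the cluster is associated with the odds of being in Q4 relative to the reference cluster.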

Results

Cluster analysis

Dietary patterns derived by cluster analysis using either format of the food-group variable were compared (Table 1). When the food-group variables were expressed as g/d, five clusters were found to best represent dietary patterns. However, one of these clusters, representing 39 % of the sample (n 541), did not have any dominating food groups, making this major dietary pattern difficult to interpret. In contrast, when the food groups were expressed as %TE, a six-cluster solution was found to best represent dietary patterns and these clusters were fully interpretable based on their contributing food groups; these clusters were therefore chosen for the rest of the analysis presented in the present paper. These clusters were labelled as ‘Traditional Irish’, ‘Continental’, ‘Unhealthy foods’, ‘Light-meal foods & low-fat milk’, ‘Healthy foods’ and ‘Wholemeal bread & desserts’. The differences in %TE per food group per cluster are depicted in Table 2. The mean daily nutrient intakes were also compared across clusters (Table 3). A summary of the cluster profiles is as follows.

Table 1 Comparison of the dietary patterns derived by cluster and principal component analysis (PCA) methods using two forms of the food-group variable, g/d and percentage contribution to daily energy intake

* k-Means algorithm used for the cluster analysis.

n is in relation to the frequency of subjects in clusters only; for solutions derived using the g/d format, there were five outliers, so the total sample size represented by these clusters was 1374.

All principal components described here had eigenvalues >1·5, and factor loadings for food groups were >0·25.

Table 2 The dietary profile of the six clusters as described by the percentage contribution of each food-group variable to total energy intake*

(Mean values and standard deviations)

a,b,c,d,e Unlike superscript letters denote significant differences between clusters at P < 0·05.

* Percentage intake of energy from each of the food groups.

Cluster 1, ‘Traditional Irish’; cluster 2, ‘Continental’; cluster 3, ‘Unhealthy foods’; cluster 4, ‘Light-meal foods and low-fat milk’; cluster 5, ‘Healthy foods’; cluster 6, ‘Wholemeal bread and desserts’.

Important food groups in the cluster.

Table 3 Comparison of daily nutrient intakes between the six clusters*

(Mean values and standard deviations)

TE, total energy contribution.

a,b,c,d,e Unlike superscript letters denote significant differences between clusters at P < 0·05.

* Cluster 1, ‘Traditional Irish’; cluster 2, ‘Continental’; cluster 3, ‘Unhealthy foods’; cluster 4, ‘Light-meal foods and low-fat milk’; cluster 5, ‘Healthy foods’; cluster 6, ‘Wholemeal bread and desserts’.

Cluster 1: ‘Traditional Irish’

This cluster was the most prevalent cluster (22 % of the sample). It is characterised by providing a relatively high %TE from white bread, whole milk, eggs, butter and spreads, potatoes, red meat and confectionery, which are foods consumed frequently as part of the Irish diet. This cluster was associated with having high energy (relative to clusters 4 and 5) and added sugar intakes (relative to clusters 2 and 4–6) and a high %TE from saturated fat (relative to clusters 2–5).

Cluster 2: ‘Continental’

This cluster is characterised by providing a relatively high %TE from rice and pasta, savouries, cheese, red-meat dishes, poultry dishes, alcoholic beverages, savoury snacks and sauces and low intakes of potatoes and red meat. This cluster was associated with having a low fibre intake (relative to clusters 5 and 6) and low %TE from carbohydrate (relative to clusters 1 and 4–6).

Cluster 3: ‘Unhealthy foods’

This cluster is characterised by providing a relatively high %TE from chips, fruit juices, meat products, sugars and preserves, savoury snacks and high-energy beverages, which, apart from fruit juices, are usually considered as ‘Unhealthy’ foods. This cluster was associated with having a high %TE from fat (relative to clusters 4 and 5) and a low %TE from protein (relative to all other clusters). It was also found to have low calcium (relative to cluster 4), low folate (relative to clusters 5 and 6) and vitamin C (relative to cluster 5).

Cluster 4: ‘Light-meal foods & low-fat milk’

This cluster is characterised by providing a relatively high %TE from foods often associated with light meals, i.e. white bread, low-fat milk, cheese, sugars and preserves and soups. This cluster was associated with having a high intake of calcium (relative to clusters 1–3).

Cluster 5: ‘Healthy foods’

This was the smallest cluster (8·5 % of the population). It is characterised by providing a relatively high %TE from wholemeal bread, breakfast cereals, yogurts, low-fat spreads, vegetables, fruit, fish, poultry and low-energy beverages. These foods are generally associated with providing a healthy diet. This cluster had the lowest energy intake, the lowest %TE from total fat and saturated fat, and the lowest intake of added sugars, while it also had the highest %TE from protein and carbohydrate of all the clusters. It also had a high intake of folate (relative to clusters 3 and 6) and a high vitamin C intake (relative to cluster 3).

Cluster 6: ‘Wholemeal bread & desserts’

This cluster is characterised by providing a relatively high %TE from wholemeal bread, biscuits and cakes, and ice cream and desserts. This cluster had a high fibre intake (relative to clusters 1–4).

Principal component analysis

When the food-group variable was expressed as %TE, the dietary patterns derived using PCA were difficult to interpret as only a few foods were found to have high factor loadings (i.e. >0·25) per PC (Table 1). However, when the food-group variable was expressed as g/d, many food groups had high factor loadings, thus making the patterns easier to interpret and label. Therefore, the food groups based on the g/d format are used for the rest of the analysis presented in the present paper. For the g/d format, the extraction of eigenvalues >1·5 produced four dietary patterns, which explained 28·5 % of the total variance. These were labelled as ‘Unhealthy foods & high alcohol’, ‘Traditional Irish’, ‘Healthy foods’ and ‘Sweet convenience foods & low alcohol’ (Table 4). For each factor, quartiles of the component scores were calculated and nutrient intakes were compared across them; the highest quartile (Q4) for each is presented in Table 5. The profiles of each PC are as follows.

Table 4 Loading weights from each food group per extracted principal component (PC)*

* Extraction method: principal component analysis using varimax rotation.

Total sample components: PC 1, ‘Unhealthy foods and high alcohol’; PC 2, ‘Traditional Irish’; PC 3, ‘Healthy foods’; PC 4, ‘Sweet foods and breakfast cereal’.

Factor loadings are only displayed for values ≤ − 0·25 or ≥ 0·25; some food groups are excluded as they did not load onto any factor retained.

Table 5 Daily nutrient intakes compared across highest quartile (Q4) of each principal component (PC)*

(Mean values and standard deviations)

TE, total energy contribution.

a,b,c Unlike superscript letters denote significant differences across quartiles at P < 0·05.

* PC 1, ‘Unhealthy foods and high alcohol’; PC 2, ‘Traditional Irish’; PC 3, ‘Healthy foods’; PC 4, ‘Sweet foods and breakfast cereal’.

PC 1: ‘Unhealthy foods & high alcohol’

This pattern was characterised by high loadings for high-energy beverages, chips, meat products, savoury snacks, sugars and preserves, sauces, alcoholic beverages and savouries, and negative loadings for low-energy beverages and breakfast cereals. In relation to Q4, this PC was found to be high in energy (relative to PC 3 and 4), have a high %TE from total fat (relative to all other PC) and saturated fat (relative to PC 3), have the highest intake of added sugars and have the lowest %TE from protein and carbohydrate. It was also found to be low in zinc (relative to PC 2) and have the lowest vitamin C of all PC.

PC 2: ‘Traditional Irish’

This pattern was characterised by high loadings for potatoes, red meat, confectionery, butter and spreads, white bread, wholemeal bread, whole milk, alcoholic beverages, and by negative loadings for savoury snacks, sugars and preserves, fruit juices, and rice and pasta. In relation to Q4, this PC was also found to be high in energy (relative to PC 3 and 4), calcium (relative to PC 4) and folate (relative to PC 1 and 4), and it had the highest zinc intake relative to the other PC.

PC 3: ‘Healthy foods’

This pattern was characterised by high loadings for vegetables, fruit, fish, fruit juices, sauces, yogurts, wholemeal bread and low-energy beverages. In relation to Q4, this PC was found to have a high %TE from carbohydrate (relative to PC 1 and 4) and a lower %TE from saturated fat than other PC. It was also low in added sugars (relative to PC 1 and 2) but high in Southgate fibre (relative to PC 1 and 4), iron (relative to PC 4) and vitamin C (relative to PC 1 and 2).

PC 4: ‘Sweet foods & breakfast cereal’

This pattern was characterised by high loadings for biscuits and cakes, ice cream and desserts, confectionery, breakfast cereals, yogurts, and by a negative loading for alcoholic beverages. This PC had a low energy intake (relative to PC 1 and 2) and fibre intake (relative to PC 2 and 3), low iron (relative to PC 2 and 3), zinc (relative to PC 2) and folate intake (relative to PC 2 and 3), and it also had the lowest calcium intake.

Comparison of clusters and principal components

Mean PC scores were calculated for each of the six clusters, and these are illustrated in Fig. 1. The most striking features are in relation to clusters 1, 3 and 5. Cluster 1 scored the highest for PC 2 and the lowest for PC 3, indicating that close similarities exist between the dietary pattern ‘Traditional Irish’ derived by both methods, and that it is most different from the ‘Healthy’ PC. Cluster 3 scored the highest for PC 1, indicating also that for the ‘Unhealthy’ pattern, both cluster analysis and PCA derived very similar patterns. Cluster 5 scored the highest for PC 3 and the lowest for PC 1, again illustrating the close similarities for the ‘Healthy’ pattern (and also how dissimilar this pattern is from the ‘Unhealthy’ PC). Cluster 6 also shares many similarities with the ‘Healthy’ and ‘Traditional Irish’ PC, due to the intake of wholemeal bread in both PC. In relation to the other two clusters, mean PC scores were less striking, but for cluster 4, it appears that none of the factors had a positive score for this cluster, indicating that this cluster explains a dietary pattern not revealed by factor analysis.

Fig. 1 Principal component (PC) score compared across the six clusters of dietary patterns. PC 1, ‘Unhealthy foods and high alcohol’; PC 2, ‘Traditional Irish’; PC 3, ‘Healthy foods’; PC 4, ‘Sweet foods and breakfast cereal’. Cluster 1, ‘Traditional Irish’; cluster 2, ‘Continental’; cluster 3, ‘Unhealthy foods’; cluster 4, ‘Light-meal foods and low-fat milk’; cluster 5, ‘Healthy foods’; cluster 6, ‘Wholemeal bread and desserts’.
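The quantity plotted in Fig. 1 is the mean of each saved factor score within each cluster. A minimal sketch of that calculation for a single PC is given below; the function name, the cluster labels and the scores are hypothetical.

```python
from collections import defaultdict


def mean_score_by_cluster(cluster_labels, pc_scores):
    """Mean factor score of one PC within each cluster."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for label, score in zip(cluster_labels, pc_scores):
        sums[label] += score
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


# Hypothetical respondents: cluster membership and their scores on one PC.
clusters = [1, 1, 5, 5]
pc3_scores = [-0.5, -0.3, 1.2, 0.8]
means = mean_score_by_cluster(clusters, pc3_scores)
```

Because factor scores are standardised (mean 0, standard deviation 1), a cluster mean well above 0 indicates that its members follow that pattern more closely than the sample as a whole.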

In order to compare the cluster solutions and PCA, and to be able to quantify the relationship between them, a logistic regression model was run to predict Q4 of each principal component from each cluster with cluster 1 (‘Traditional Irish’) as the reference category (Table 6). The model was also adjusted for sex, age group, social class and smoking status, which were established as confounding factors. In relation to PC 1 (‘Unhealthy foods & high alcohol’), membership of cluster 3 (‘Unhealthy foods’) had the highest OR for predicting this (i.e. OR = 11·1). In relation to PC 2 (‘Traditional Irish’), all other clusters had lower odds of predicting it relative to the reference cluster. In relation to PC 3 (‘Healthy foods’), membership of cluster 5 (‘Healthy foods’) had the highest OR for predicting it (i.e. OR = 13·1). In relation to PC 4 (‘Sweet foods & breakfast cereal’), membership of cluster 2 (‘Continental’) had the highest OR for predicting it (i.e. OR = 3·9).

Table 6 Prediction of highest quartile (Q4) of each factor from each of the six cluster solutions using binary logistic regression

(Odds ratios and 95% confidence intervals)

PC, principal component.

* PC 1, ‘Unhealthy foods and high alcohol’; PC 2, ‘Traditional Irish’; PC 3, ‘Healthy foods’; PC 4, ‘Sweet foods and breakfast cereal’.

Cluster 1, ‘Traditional Irish’; cluster 2, ‘Continental’; cluster 3, ‘Unhealthy foods’; cluster 4, ‘Light-meal foods and low-fat milk’; cluster 5, ‘Healthy foods’; cluster 6, ‘Wholemeal bread and desserts’.

Model is adjusted for sex, age group, social class and smoking.

Discussion

An insight into the patterns of food intake may contribute to the successful implementation of dietary changes. Although there is a wealth of literature in the area of dietary patterns, there exist few published studies that have directly compared cluster analysis and PCA using the same study sample(Reference Bamia, Orfanos and Ferrari10, Reference Crozier, Robinson, Borland and Inskip11, Reference Costacou, Bamia, Ferrari, Riboli, Trichopoulos and Trichopoulou18). This is despite the fact that, when conducting dietary pattern analysis, probably the most important issue is choosing the most appropriate pattern analysis technique. There is also the need to clarify the issue of the appropriate format of the food-group variable, as different formats may impact on the dietary patterns derived, and thus make it difficult to compare results across different studies.

The most popular unsupervised or a posteriori methods of data reduction are cluster analysis and PCA. These methods are statistically quite different from each other. Whereas cluster analysis separates persons into mutually exclusive groups based on the differences in food intakes, factor analysis separates foods into groups based on correlations between foods, with persons receiving a score for each of the derived factors. From the present study, it was found that both methods derived quite similar and directly comparable dietary patterns for three of the dietary patterns (i.e. ‘Traditional Irish’, ‘Healthy’ and ‘Unhealthy’). However, for cluster 4, there appeared to be no relationship with any of the patterns derived by PCA, based on the mean PC score. This implies that PCA may not reveal all of the dietary patterns actually present in the dataset. This could be due to the subjective decisions applied in the present study (i.e. the eigenvalue cut-off), which resulted in choosing only the four PC presented here. Also, it has to be remembered that the four PC retained for further examination in the present study account for only 28·5 % of the total variance; the possibility therefore remains that other dietary patterns exist in the data. Schulze et al. (Reference Schulze, Hoffmann, Kroke and Boeing19) investigated the effect of PCA and variation in foods and nutrients associated with the dietary patterns in a sample of the European Prospective Investigation into Cancer and Nutrition-Potsdam cohort. They concluded that PC may explain food and nutrient intake quite differently, and therefore in some cases PCA may not uncover all dietary patterns in the dataset.

When the dietary patterns identified in the present study were compared using logistic regression analysis in order to quantify the relationships between them, it was also found that the ‘Unhealthy foods’, ‘Healthy foods’ and the ‘Traditional Irish’ clusters predicted the membership of similar patterns identified through PCA. Clusters 4 and 6 had OR of 2·1 and 2·9, respectively, for predicting the membership of PC 3 (‘Healthy’ pattern), which indicates that these clusters have properties similar to the ‘Healthy’ pattern, and that in the case for cluster 4, these were not picked up when the mean score alone was used. Cluster 2 had an OR of 3·9 for predicting the membership for PC 4, which supports the finding when the mean score was used. These results are similar to those found in a study by Costacou et al. (Reference Costacou, Bamia, Ferrari, Riboli, Trichopoulos and Trichopoulou18), who compared cluster analysis, PCA and a Mediterranean diet score in Greek adults, and they found that the Mediterranean dietary score closely predicted a Mediterranean dietary pattern derived by PCA.

Dietary patterns are inherently complex. Without more detailed analyses, they do not enable the specific identification of the particular dietary component within the pattern that may be responsible for the observed differences between population subgroups. Therefore, along with identifying the main patterns in the population sample, it is necessary to understand what these patterns mean, i.e. what foods they are high/low in, what nutrients they are high/low in, and also to profile the people who consume them. The choice of dietary pattern technique should be guided by considerations such as the expertise of the research group and, perhaps most importantly, the format of the output required. Most published studies that have used either of these two methods have not provided a clear rationale as to their choice of one of these methods over another. When choosing between using either cluster or factor analysis, it should be appreciated that they each approach the data from a different perspective, and thus answer different questions. Cluster analysis examines whether or not there are groups in the population that are distinctly different from one another, and if so, what typifies their diets. Factor analysis explores whether there are underlying patterns that explain variation in how people eat(Reference Moeller, Reedy, Millen, Dixon, Newby, Tucker, Krebs-Smith and Guenther20). Factors identified also do not refer to identifiable groups within the population, and hence do not give an indication of the prevalence of a particular type of diet(Reference Kant21). However, the actual procedure for PCA is more straightforward and logical as cluster analysis places considerable burden on the user in terms of selecting and deciding on the appropriate number of clusters.

Aside from uncovering the dietary patterns in the present study sample, the other aim was to explore the methodological analysis of dietary patterns using different formats of the food variable, as unfortunately there is no gold standard technique for this. The comparative evaluations of patterns derived from the two variable formats described in the present paper produced slightly different dietary patterns, and in this case it was found that the food-group variables expressed as %TE produced the most interpretable cluster solution, while food groups expressed in g/d produced the most interpretable principal components. Bailey et al. (Reference Bailey, Gutschall, Mitchell, Miller, Lawrence and Smiciklas-Wright12) recently reported that the most appropriate variable for providing interpretable and relevant cluster solutions in their study was the number of servings from food subgroups. Therefore, for all dietary patterning methods, a clear rationale for the format of the dietary variable should be established before analysis in order to identify the most relevant pattern solutions in a particular dataset.

The most beneficial information that can be gathered from the dietary pattern analyses is on what foods in combination are culturally acceptable to the population. Hypothetical ‘ideal’ diets, despite their advantageous health implications to a population, are relatively useless unless they can be incorporated into the culture of the society. It appears from the present study that there are three main dietary patterns that dominate irrespective of the analytical methods used. These are an ‘Unhealthy’, a ‘Traditional Irish’ and a ‘Healthy’ pattern. This coincides with a recent paper which found, from a comprehensive literature review of articles using factor and cluster analyses, that ‘Healthy’, ‘Traditional’ and ‘Sweets’ patterns are fairly reproducible across populations(Reference Moeller, Reedy, Millen, Dixon, Newby, Tucker, Krebs-Smith and Guenther20). The dietary patterns observed in the present study are also similar to those found in a previous study on a sample (non-representative) of Irish men and women(Reference Villegas, Salim, Collins, Flynn and Perry22). They found three clusters in their sample of Irish adults: a ‘Traditional diet’; a ‘Prudent diet’; an ‘Alcohol & convenience food’ cluster. However, using cluster analysis, the present study also found evidence for two other dietary patterns, i.e. a ‘Continental’ and a ‘Light-meal & low-fat milk’ pattern, which may be a particular feature of this Irish population alone.

In general, the majority of studies in the literature that have examined dietary patterns in different population groups have revealed two predominant and opposing patterns, often referred to as a ‘Prudent’ and a ‘Western’ dietary pattern(Reference Lopez-Garcia, Schulze, Fung, Meigs, Rifai, Manson and Hu23–Reference Gao, Chen, Fung, Logroscino, Schwarzschild, Hu and Ascherio27). The cluster and factor patterns labelled as ‘Healthy foods’ in the present study correspond quite closely to the ‘Prudent’ pattern, and elements of both the ‘Unhealthy foods’ and the ‘Traditional Irish’ clusters, and of the ‘Unhealthy foods & high alcohol’ and the ‘Traditional Irish’ factors, were similar to the ‘Western’ pattern. As the label suggests, the ‘Healthy’ patterns represent the ideal diet; however, only 8·5 % of the population fell into the ‘Healthy foods’ cluster. This indicates that there is much room for growth into this dietary pattern but also, most importantly, provides evidence that a healthy eating pattern is a part of Irish culture (albeit a relatively small one), and that recommendations based on this eating pattern should be acceptable to the population. The ‘Traditional Irish’ pattern, which was the most dominant cluster, had the highest energy, fat and sugar intakes. Shifting this dietary pattern towards a more ‘Healthy’ one is a realistic but difficult challenge because of the cultural position of this pattern in Irish society. Information concerning non-use of foods may prove to be useful, particularly for the design of successful nutrition intervention programmes. From both the cluster and factor analyses, it was found that, with regard to the ‘Healthy’ and ‘Unhealthy’ patterns, a high consumption of ‘Healthy’ foods was associated with a low consumption of ‘Unhealthy’ foods and vice versa.

Although cluster and factor methods are data driven and may be considered objective because they are conducted a posteriori, the analytical process is filled with subjective decisions. In the k-means cluster algorithm, the user needs to predefine the number of clusters, which can force the data into unrealistic clusters. To prevent this potential pitfall, many scenarios with varying numbers of clusters need to be run and the resulting cluster solutions compared and contrasted. In the PCA process, the user is also faced with many subjective decisions, such as the criteria used to determine how many factors to extract (e.g. the eigenvalue cut-off), and the method with which to rotate the selected factors. The majority of studies using PCA to derive dietary patterns in the literature have used criteria such as selecting principal components with eigenvalues of >1·0–1·5, orthogonal rotation and factor loadings of the food groups of >0·2 for pattern interpretation(Reference Michels and Schulze28). Therefore, the present paper followed similar criteria. Schulze et al. (Reference Schulze, Hoffmann, Kroke and Boeing29) described an approach to calculate factor scores as the sum of the standardised values of the six highest-loading foods on each factor. These simplified measures are proposed as a way to overcome a limitation of factor analyses, namely that their risk estimates are not comparable across different populations and studies. Another method of overcoming the reliance on subjective criteria is to use confirmatory PCA. In confirmatory PCA, only food groups with factor loadings above a defined cut-off are retained to measure a food pattern in order to isolate the ‘core’ of the pattern(Reference Newby, Weismayer, Akesson, Tucker and Wolk30). However, only a few studies have used confirmatory PCA in nutritional epidemiology(Reference Togo, Heitmann, Sorensen and Osler16, Reference Maskarinec, Novotny and Tasaki31–Reference Weismayer, Anderson and Wolk34).
Other subjective decisions revolve around the selection of the food groups, and even the manner in which the clusters and factors are ultimately labelled is based on subjective criteria and is liable to different interpretations(Reference Martinez, Marshall and Sechrest7, Reference Tseng, Breslow, DeVellis and Ziegler33, Reference McCann, Weiner, Graham and Freudenheim35–Reference Newby, Muller, Hallfrisch, Andres and Tucker39).
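The simplified score described above (a sum of standardised intakes of the highest-loading foods) can be sketched in Python as follows. This is an illustrative sketch only, not the implementation used by Schulze et al. or in the present study: the function names, the example intakes and the loadings are hypothetical, and the choice to subtract negatively loading food groups is an assumption of this sketch.

```python
from statistics import mean, pstdev


def standardise(values):
    """Return z-scores (using the population SD) for one food group."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def simplified_scores(intakes, loadings, n_top=6):
    """Sum the z-scores of the n_top food groups with the largest |loading|.

    intakes  : dict mapping food group -> list of intakes, one per subject
    loadings : dict mapping food group -> loading on a single component
    Assumption of this sketch: negatively loading groups are subtracted.
    """
    top = sorted(loadings, key=lambda g: abs(loadings[g]), reverse=True)[:n_top]
    z = {g: standardise(intakes[g]) for g in top}
    n_subjects = len(intakes[top[0]])
    return [
        sum((1.0 if loadings[g] > 0 else -1.0) * z[g][i] for g in top)
        for i in range(n_subjects)
    ]


# Hypothetical data: three food groups, four subjects (values invented).
intakes = {
    "vegetables": [100.0, 200.0, 300.0, 400.0],
    "fruit": [50.0, 150.0, 50.0, 150.0],
    "chips": [80.0, 60.0, 40.0, 20.0],
}
loadings = {"vegetables": 0.6, "fruit": 0.1, "chips": -0.5}

# Only three groups exist here, so the two highest-loading ones are used.
scores = simplified_scores(intakes, loadings, n_top=2)
print([round(s, 3) for s in scores])
```

Because the score depends only on which food groups are selected and not on the exact loading weights, it can in principle be recomputed in another dataset, which is the comparability argument made above.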

In summary, the present study has shown that cluster analysis and PCA, although statistically different methods, identify similar dietary patterns when presented with the same dataset, and that these patterns are directly comparable. However, caution needs to be applied to the subjective decisions involved when using PCA, as these can have a direct impact on the number and type of dietary patterns revealed in the data. Also, the two methods in the present study required different formats of the food-group variable, and therefore the format of the input variable should always be considered separately for each method. In terms of further analysis of the dietary pattern variables generated, cluster analysis produces mutually exclusive groups that are amenable to profiling, which gives this method a considerable advantage over PCA. However, as stated earlier, each research study will have its own specific hypotheses and aims, and it is these that should ultimately drive the choice of the dietary pattern analytical technique.

Appendix Table Intake of each food group for the total population expressed as g/d or as percentage contribution to total energy intake (% TE)

(Mean values and standard deviations)

* Intakes of all foods considered as edible fraction only.

Acknowledgements

We would like to acknowledge the Irish Department of Agriculture, Fisheries and Food for funding the present analysis. A. P. H. carried out the analysis for the present paper and drafted the manuscript. M. J. G. provided guidance for the analysis and reviewed the final drafts of the manuscript. The authors report no conflicts of interest.

References

1. Tucker, KL, Chen, H, Hannan, MT, Cupples, LA, Wilson, PW, Felson, D & Kiel, DP (2002) Bone mineral density and dietary patterns in older adults: the Framingham osteoporosis study. Am J Clin Nutr 76, 245–252.
2. Randall, E, Marshall, JR, Graham, S & Brasure, J (1990) Patterns in food use and their associations with nutrient intakes. Am J Clin Nutr 52, 739–745.
3. Hu, FB, Rimm, EB, Stampfer, MJ, Ascherio, A, Spiegelman, D & Willett, WC (2000) Prospective study of major dietary patterns and risk of coronary heart disease in men. Am J Clin Nutr 72, 912–921.
4. Newby, PK & Tucker, KL (2004) Empirically derived eating patterns using factor or cluster analysis: a review. Nutr Rev 62, 177–203.
5. Kennedy, ET, Ohls, J, Carlson, S & Fleming, K (1995) The healthy eating index: design and applications. J Am Diet Assoc 95, 1103–1108.
6. van Dam, RM (2005) New approaches to the study of dietary patterns. Br J Nutr 93, 573–574.
7. Martinez, ME, Marshall, JR & Sechrest, L (1998) Invited Commentary: factor analysis and the search for objectivity. Am J Epidemiol 148, 17–19.
8. Newby, PK, Muller, D & Tucker, KL (2004) Associations of empirically derived eating patterns with plasma lipid biomarkers: a comparison of factor and cluster analysis methods. Am J Clin Nutr 80, 759–767.
9. Kant, AK, Graubard, BI & Schatzkin, A (2004) Dietary patterns predict mortality in a national cohort: the National Health Interview Surveys, 1987 and 1992. J Nutr 134, 1793–1799.
10. Bamia, C, Orfanos, P, Ferrari, P, et al. (2005) Dietary patterns among older Europeans: the EPIC-elderly study. Br J Nutr 94, 100–113.
11. Crozier, SR, Robinson, SM, Borland, SE, Inskip, HM & SWS Study Group (2006) Dietary patterns in the Southampton Women's Survey. Eur J Clin Nutr 60, 1391–1399.
12. Bailey, RL, Gutschall, MD, Mitchell, DC, Miller, CK, Lawrence, FR & Smiciklas-Wright, H (2006) Comparative strategies for using cluster analysis to assess dietary patterns. J Am Diet Assoc 106, 1194–1200.
13. Kiely, M, Flynn, A, Harrington, KE, Robson, PJ & Cran, G (2001) Sampling description and procedures used to conduct the North/South Ireland Food Consumption Survey. Public Health Nutr 4, 1029–1035.
14. Harrington, KE, Robson, PJ, Kiely, M, Livingstone, MB, Lambe, J & Gibney, MJ (2001) The North/South Ireland Food Consumption Survey: survey design and methodology. Public Health Nutr 4, 1037–1042.
15. Holland, B, Welch, AA, Unwin, ID, Buss, DH, Paul, AA & Southgate, DAT (1995) McCance & Widdowson's the Composition of Foods. London: HMSO.
16. Togo, P, Heitmann, BL, Sorensen, TI & Osler, M (2003) Consistency of food intake factors by different dietary assessment methods and population groups. Br J Nutr 90, 667–678.
17. Coakes, SJ & Steed, LG (1999) SPSS without Anguish Versions 7.0, 7.5, 8.0 for Windows. Brisbane: John Wiley and Sons.
18. Costacou, T, Bamia, C, Ferrari, P, Riboli, E, Trichopoulos, D & Trichopoulou, A (2003) Tracing the Mediterranean diet through principal components and cluster analyses in the Greek population. Eur J Clin Nutr 57, 1378–1385.
19. Schulze, MB, Hoffmann, K, Kroke, A & Boeing, H (2001) Dietary patterns and their association with food and nutrient intake in the European Prospective Investigation into Cancer and Nutrition (EPIC)-Potsdam study. Br J Nutr 85, 363–373.
20. Moeller, SM, Reedy, J, Millen, AE, Dixon, LB, Newby, PK, Tucker, KL, Krebs-Smith, SM & Guenther, PM (2007) Dietary patterns: challenges and opportunities in dietary patterns research an experimental biology workshop, April 1, 2006. J Am Diet Assoc 107, 1233–1239.
21. Kant, AK (2004) Dietary patterns and health outcomes. J Am Diet Assoc 104, 615–635.
22. Villegas, R, Salim, A, Collins, MM, Flynn, A & Perry, IJ (2004) Dietary patterns in middle-aged Irish men and women defined by cluster analysis. Public Health Nutr 7, 1017–1024.
23. Lopez-Garcia, E, Schulze, MB, Fung, TT, Meigs, JB, Rifai, N, Manson, JE & Hu, FB (2004) Major dietary patterns are related to plasma concentrations of markers of inflammation and endothelial dysfunction. Am J Clin Nutr 80, 1029–1035.
24. Fung, TT, Hu, FB, Holmes, MD, Rosner, BA, Hunter, DJ, Colditz, GA & Willett, WC (2005) Dietary patterns and the risk of postmenopausal breast cancer. Int J Cancer 116, 116–121.
25. Michaud, DS, Skinner, HG, Wu, K, Hu, F, Giovannucci, E, Willett, WC, Colditz, GA & Fuchs, CS (2005) Dietary patterns and pancreatic cancer risk in men and women. J Natl Cancer Inst 97, 518–524.
26. Wu, K, Hu, FB, Willett, WC & Giovannucci, E (2006) Dietary patterns and risk of prostate cancer in US men. Cancer Epidemiol Biomarkers Prev 15, 167–171.
27. Gao, X, Chen, H, Fung, TT, Logroscino, G, Schwarzschild, MA, Hu, FB & Ascherio, A (2007) Prospective study of dietary pattern and risk of Parkinson disease. Am J Clin Nutr 86, 1486–1494.
28. Michels, KB & Schulze, MB (2005) Can dietary patterns help us detect diet-disease associations? Nutr Res Rev 18, 241–248.
29. Schulze, MB, Hoffmann, K, Kroke, A & Boeing, H (2003) Risk of hypertension among women in the EPIC-Potsdam study: comparison of relative risk estimates for exploratory and hypothesis-oriented dietary patterns. Am J Epidemiol 158, 365–373.
30. Newby, PK, Weismayer, C, Akesson, A, Tucker, KL & Wolk, A (2006) Longitudinal changes in food patterns predict changes in weight and body mass index and the effects are greatest in obese women. J Nutr 136, 2580–2587.
31. Maskarinec, G, Novotny, R & Tasaki, K (2000) Dietary patterns are associated with body mass index in multiethnic women. J Nutr 130, 3068–3072.
32. Togo, P, Osler, M, Sorensen, TI & Heitmann, BL (2004) A longitudinal study of food intake patterns and obesity in adult Danish men and women. Int J Obes Rel Metabol Disord 28, 583–593.
33. Tseng, M, Breslow, RA, DeVellis, RF & Ziegler, RG (2004) Dietary patterns and prostate cancer risk in the National Health and Nutrition Examination Survey epidemiological follow-up study cohort. Cancer Epidemiol Biomarkers Prev 13, 71–77.
34. Weismayer, C, Anderson, JG & Wolk, A (2006) Changes in the stability of dietary patterns in a study of middle-aged Swedish women. J Nutr 136, 1582–1587.
35. McCann, SE, Weiner, J, Graham, S & Freudenheim, JL (2001) Is principal components analysis necessary to characterise dietary behaviour in studies of diet and disease? Public Health Nutr 4, 903–908.
36. Terry, P, Hu, FB, Hansen, H & Wolk, A (2001) Prospective study of major dietary patterns and colorectal cancer risk in women. Am J Epidemiol 154, 1143–1149.
37. Osler, M, Helms Andreasen, A, Heitmann, B, Hoidrup, S, Gerdes, U, Morch Jorgensen, L & Schroll, M (2002) Food intake patterns and risk of coronary heart disease: a prospective cohort study examining the use of traditional scoring techniques. Eur J Clin Nutr 56, 568–574.
38. Dixon, LB, Balder, HF, Virtanen, MJ, et al. (2004) Dietary patterns associated with colon and rectal cancer: results from the dietary patterns and cancer (DIETSCAN) project. Am J Clin Nutr 80, 1003–1011.
39. Newby, PK, Muller, D, Hallfrisch, J, Andres, R & Tucker, KL (2004) Food patterns measured by factor analysis and anthropometric changes in adults. Am J Clin Nutr 80, 504–513.

Table 1 Comparison of the dietary patterns derived by cluster and principal component analysis (PCA) methods using two forms of the food-group variable, g/d and percentage contribution to daily energy intake

Table 2 The dietary profile of the six clusters as described by the percentage contribution of each food-group variable to total energy intake*† (Mean values and standard deviations)

Table 3 Comparison of daily nutrient intakes between the six clusters* (Mean values and standard deviations)

Table 4 Loading weights from each food group per extracted principal component (PC)*

Table 5 Daily nutrient intakes compared across highest quartile (Q4) of each principal component (PC)* (Mean values and standard deviations)

Fig. 1 Principal component (PC) score compared across the six clusters of dietary patterns. PC 1, ‘Unhealthy foods and high alcohol’; PC 2, ‘Traditional Irish’; PC 3, ‘Healthy foods’; PC 4, ‘Sweet foods and breakfast cereal’. Cluster 1, ‘Traditional Irish’; cluster 2, ‘Continental’; cluster 3, ‘Unhealthy foods’; cluster 4, ‘Light-meal foods and low-fat milk’; cluster 5, ‘Healthy foods’; cluster 6, ‘Wholemeal bread and desserts’.

Table 6 Prediction of highest quartile (Q4) of each factor from each of the six cluster solutions using binary logistic regression (Odds ratios and 95 % confidence intervals)