
Missing portion sizes in FFQ – alternatives to use of standard portions

Published online by Cambridge University Press:  10 November 2014

Rasmus Køster-Rasmussen*
Affiliation:
The Research Unit for General Practice and Section of General Practice, Department of Public Health, University of Copenhagen, Øster Farimagsgade 5, 1014 Copenhagen, Denmark; Clinical Institute, University of Southern Denmark, Odense, Denmark
Volkert Siersma
Affiliation:
The Research Unit for General Practice and Section of General Practice, Department of Public Health, University of Copenhagen, Øster Farimagsgade 5, 1014 Copenhagen, Denmark
Thorhallur I Halldorsson
Affiliation:
Faculty of Food Science and Nutrition, School of Health Sciences, University of Iceland, Reykjavik, Iceland; Centre for Fetal Programming, Department of Epidemiology Research, Statens Serum Institut, Copenhagen, Denmark
Niels de Fine Olivarius
Affiliation:
The Research Unit for General Practice and Section of General Practice, Department of Public Health, University of Copenhagen, Øster Farimagsgade 5, 1014 Copenhagen, Denmark
Jan E Henriksen
Affiliation:
Clinical Institute, University of Southern Denmark, Odense, Denmark; Department of Endocrinology, Odense University Hospital, Odense, Denmark
Berit L Heitmann
Affiliation:
Institute of Preventive Medicine, Capital Region, Bispebjerg and Frederiksberg Hospital, Copenhagen, Denmark; The Boden Institute of Obesity, Nutrition, Exercise & Eating Disorders, University of Sydney, Sydney, New South Wales, Australia; National Institute of Public Health, University of Southern Denmark, Odense, Denmark
*Corresponding author: Email [email protected]

Abstract

Objective

Standard portions or substitution of missing portion sizes with medians may generate bias when quantifying the dietary intake from FFQ. The present study compared four different methods to include portion sizes in FFQ.

Design

We evaluated three stochastic methods for imputation of portion sizes based on information about anthropometry, sex, physical activity and age. Energy intakes computed with standard portion sizes, defined as sex-specific medians (median), or with portion sizes estimated with multinomial logistic regression (MLR), ‘comparable categories’ (Coca) or k-nearest neighbours (KNN) were compared with a reference based on self-reported portion sizes (quantified by a photographic food atlas embedded in the FFQ).

Setting

The Danish Health Examination Survey 2007–2008.

Subjects

The study included 3728 adults with complete portion size data.

Results

Compared with the reference, the root-mean-square errors of the mean daily total energy intake (in kJ) computed with portion sizes estimated by the four methods were (men; women): median (1118; 1061), MLR (1060; 1051), Coca (1230; 1146), KNN (1281; 1181). The equivalent biases (mean error) were (in kJ): median (579; 469), MLR (248; 178), Coca (234; 188), KNN (−340; 218).

Conclusions

The methods MLR and Coca provided the best agreement with the reference. The stochastic methods allowed for estimation of meaningful portion sizes by conditioning on information about physiology, and they were suitable for multiple imputation. We propose to use MLR or Coca to substitute missing portion size values or when portion sizes need to be included in FFQ without portion size data.

Type
Research Papers
Copyright
Copyright © The Authors 2014 

FFQ are commonly used in large-scale nutritional epidemiology studies, but some FFQ do not have questions about portion sizes( Reference Osler, Heitmann and Gerdes 1 – Reference Bazzano, He and Ogden 3 ). Details concerning portion sizes or missing portion size values are rarely accounted for in scientific publications, but when calculating the dietary intake from an FFQ, standard portion sizes are often applied.

The absence of portion size questions in an FFQ can be regarded as a missing data problem. Using standard portion sizes is methodologically equivalent to applying median portion sizes to all subjects. These may be sex-specific, but portion size depends on several factors other than sex, such as age, BMI and physical activity( Reference Noethlings, Hoffmann and Bergmann 4 ). Hence, the standard portion size used may well be the same for a young, physically active man as for an elderly, sedentary man.

Substituting unknown portion sizes with standard sizes may thus under- or overestimate the ‘true’ intake in certain segments of the population( Reference Greenland and Finkle 5 – Reference Rubin and Schenker 7 ). It is now well recognized that missing data are most rationally accounted for through multiple imputation techniques, rather than with deterministic imputations like medians, to avoid flawed (too narrow) confidence intervals( Reference Rubin 8 , Reference Sterne, White and Carlin 9 ). Multiple imputation requires an adequate method for imputation, i.e. a method with error and bias as low as possible.

In the present paper we describe how physiologically meaningful portion sizes can be estimated from information on age, sex, physical activity, weight and height by imputation from participants with complete data or from another FFQ data set with portion sizes (from a comparable population). We invented the ‘comparable categories’ method (Coca) and improved the ‘k-nearest neighbours’ (KNN) and the multinomial logistic regression (MLR) methods by making them suitable for multiple imputation. The basic idea of these advanced imputation methods is that, instead of using a median value to substitute missing data, one may condition on other information available in the data set to better estimate a reasonable portion size.

In the present study the dietary intake computed with standard portion sizes (the sex-specific median values), or with portion sizes determined by the MLR, Coca or KNN method, was compared with a reference dietary intake, which was computed with the originally self-reported portion sizes that were quantified by a photographic food atlas embedded in the FFQ.

Experimental methods

The Danish Health Examination Survey collected dietary data from 18 065 adult Danes in 2007–2008 using an Internet-based, 267-item FFQ( Reference Eriksen, Gronbaek and Helge 10 ). This diet inventory has been used in many Danish population studies( Reference Tjonneland, Haraldsdottir and Overvad 2 , 11 ). In the Danish Health Examination Survey, the FFQ was extended with a photographic food atlas consisting of eleven picture series placed at the end of the questionnaire in order to quantify the portion sizes( 11 ). The portion size food atlas was developed by the Danish Veterinary and Food Administration. The picture series covered thirty-nine items (foods or meals), each classified into four or six portions of varying sizes. For instance, six photos showed increasing serving sizes of corn flakes in a bowl, and the accompanying portion size item was used to quantify all cereal frequency items (muesli, etc.). Another series with six photos of increasing serving sizes of a meat main meal was accompanied by five portion size items covering hamburger steak, steak, beef, fish or poultry. The remaining series of photographs covered bread, toppings for rye bread (eight items), toppings for white bread (eight items), warm stew with meat (three items), potatoes (four items), pasta, rice, vegetable dishes (four items), mixed salad, chocolate and candy. The actual weight in grams of the food in the selected picture was multiplied by the frequency to obtain the total intake of the food. Leisure-time physical activity was self-reported with the International Physical Activity Questionnaire in four classes, where class 1 was hard training multiple times per week and class 4 was inactive behaviour( Reference Ekelund, Sepp and Brage 12 ). We defined classes 1+2 as active and classes 3+4 as sedentary. Anthropometric measures were obtained by clinical examination in 9384 subjects.
The present study population consisted of the 3728 subjects with complete information on anthropometry and portion sizes (no missing values). The characteristics of the study participants are described in Table 1. The involved institutions’ review boards have approved the study proposal.
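As a rough illustration of the intake calculation described above (portion weight in grams multiplied by frequency, then converted to energy with an energy density from a food composition table and summed over foods), the Python sketch below uses made-up item names, portion weights, frequencies and energy densities. It is not FoodCalc and uses no values from the Danish food composition tables.

```python
# Minimal sketch of the intake arithmetic; every number here is hypothetical.


def daily_intake_g(portion_g: float, times_per_day: float) -> float:
    """Daily intake of one food: portion weight (g) multiplied by frequency (times/day)."""
    return portion_g * times_per_day


def total_energy_kj(items: dict[str, tuple[float, float, float]]) -> float:
    """Sum energy over foods; items maps name -> (portion_g, times_per_day, kj_per_g)."""
    return sum(daily_intake_g(p, f) * kj for p, f, kj in items.values())


# Hypothetical example: a bowl of cereal once a day and rye bread twice a day.
example = {
    "cereal": (40.0, 1.0, 15.0),     # 40 g/day x 15 kJ/g = 600 kJ
    "rye bread": (50.0, 2.0, 9.0),   # 100 g/day x 9 kJ/g = 900 kJ
}
print(total_energy_kj(example))  # 1500.0
```

Because the portion weight enters every term multiplicatively, any error in an assumed portion size propagates directly into the energy estimate for that food.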

Table 1 Characteristics of the subjects with complete portion size data, included in the present study, compared with the excluded subjects with incomplete portion size data, Danish Health Examination Survey 2007–2008

Statistical methods

We analysed four methods of imputing portion size. The subjects were randomly divided (SAS procedure: proc surveyselect) into two data sets: (i) a learning data set A (n 1864) for generating data for imputation; and (ii) a test data set B (n 1864) for analysing the validity of the imputed data. For data set B the ‘mean daily total energy intake’ (TE) was computed with the complete set of authentic self-reported portion sizes and this TE served as the reference.

The population sex-specific medians were used as standard portion sizes. With each of the three stochastic imputation methods, we imputed portion sizes from data set A to data set B and used these estimated portion sizes to compute a new TE. The procedure was repeated ten times on different random splits of the data, yielding ten TE values for each imputation method.

The mean TE from each imputation method was then compared with the reference TE by determining the bias (defined as the mean error) and the root-mean-square error (RMSE). In the present paper the ‘error’ is defined as the reference value minus the estimated value. Spearman’s ρ was used to compare the ranking of the subjects by the reference TE with their ranking by the TE calculated with imputed portion sizes. T statistics were used to assess whether the bias in TE depended on the level of TE (Fig. 1). Energy and nutrient intakes were computed with FoodCalc® ( 13 ) and the Danish national food composition tables( 14 ).
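The agreement measures above can be sketched in Python as follows. The sign convention (error = reference minus estimate, so a positive bias means underestimation) follows the text; Spearman’s ρ is computed in the standard way, as the Pearson correlation of average ranks. The TE values are hypothetical, and in practice a library routine such as `scipy.stats.spearmanr` would normally replace the hand-written ranking.

```python
import math
from statistics import mean


def errors(reference, estimate):
    # Error = reference - estimate; positive values mean the imputation underestimates.
    return [r - e for r, e in zip(reference, estimate)]


def bias(reference, estimate):
    """Bias = mean error."""
    return mean(errors(reference, estimate))


def rmse(reference, estimate):
    """Root-mean-square error."""
    return math.sqrt(mean(d * d for d in errors(reference, estimate)))


def ranks(values):
    # 1-based average ranks; tied values share the mean of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman_rho(x, y):
    # Pearson correlation computed on the ranks.
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


ref = [9000, 10000, 11000, 12000]   # hypothetical reference TE (kJ)
est = [8500, 9800, 10500, 11200]    # hypothetical TE with imputed portions (kJ)
print(bias(ref, est), rmse(ref, est), spearman_rho(ref, est))
```

Note that RMSE and Spearman’s ρ answer different questions: a method can rank subjects almost perfectly (high ρ) while still shifting everyone’s intake by a constant (non-zero bias).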

Fig. 1 Total energy intake (TE) computed with the reference portion sizes (x-axis) is plotted against the difference between the reference TE and the TE computed with the portion sizes from each imputation method (y-axis): (a) median imputation in men (B=0·15, se=0·008, T=17·7); (b) MLR imputation in men (B=0·10, se=0·010, T=9·5); (c) KNN imputation in men (B=0·04, se=0·013, T=2·7); (d) Coca imputation in men (B=0·11, se=0·013, T=8·5); (e) median imputation in women (B=0·16, se=0·010, T=17·1); (f) MLR imputation in women (B=0·11, se=0·011, T=10·5); (g) KNN imputation in women (B=0·12, se=0·013, T=9·4); (h) Coca imputation in women (B=0·11, se=0·012, T=8·8). In this variation of a Bland–Altman plot, the x-axis denotes the reference value (and not the mean), as the error pertains solely to the imputed measure. The horizontal lines denote zero, the mean difference, +2 sd and −2 sd. B=the slope of a regression line: y=Bx+c. T=B/se; thus T denotes the tendency to underestimate portion sizes in subjects with high TE (and to overestimate them in subjects with low TE). Higher values of T denote stronger tendencies; significance is implicit, as T>1·96 implies P<0·05. Note: a positive value on the y-axis indicates an underestimation of the reference energy intake (imputation method: median, standard portion sizes, defined as sex-specific medians; MLR, multinomial logistic regression; KNN, k-nearest neighbours; Coca, ‘comparable categories’)
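The slope B, its standard error and T=B/se in the caption correspond to an ordinary least squares regression of the difference (reference minus imputed TE) on the reference TE. The sketch below applies the usual OLS formulas to hypothetical data; it is an illustration under that assumption, not the authors’ SAS code. With several hundred subjects per sex, |T| above about 1·96 corresponds to two-sided P<0·05.

```python
import math


def slope_t(reference, estimate):
    """OLS of d = reference - estimate on reference; returns (B, se(B), T = B / se)."""
    x = list(reference)
    d = [r - e for r, e in zip(reference, estimate)]
    n = len(x)
    mx, md = sum(x) / n, sum(d) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxd = sum((xi - mx) * (di - md) for xi, di in zip(x, d))
    b = sxd / sxx                      # slope B
    c = md - b * mx                    # intercept c in d = B*x + c
    ssr = sum((di - (b * xi + c)) ** 2 for xi, di in zip(x, d))
    se = math.sqrt(ssr / (n - 2) / sxx)  # residual variance on n - 2 df
    return b, se, b / se


# Hypothetical TE values (kJ): the imputed TE falls further short as TE rises.
reference = [6000, 8000, 10000, 12000, 14000]
imputed = [6300, 7900, 9800, 11300, 13100]
b, se, t = slope_t(reference, imputed)
```

A positive B means the underestimation grows with the reference intake, which is the pattern the caption describes for all four methods.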

The four imputation methods were:

  1. The ‘median’ method or ‘standard portion sizes’. Imputation of median values is equivalent to applying a standard portion size, as it implies uniform portion sizes for all subjects (here thirty-nine medians, one for each of the thirty-nine portion size items). In this method we used the sex-specific median values from the entire sample (data sets A+B) to define thirty-nine sex-specific standard portion sizes in data set B (using the sex-specific medians from data set A only would induce bias, as explained in the online supplementary material, chapter 4). Based on earlier reports and physiological reasoning, we hypothesized that portion sizes depend on age, sex, physical activity, weight and height( Reference Noethlings, Hoffmann and Bergmann 4 , Reference Clapp, McPherson and Reed 6 ). Individual data on these five variables are readily available in most epidemiological studies, and they informed the following three, more advanced imputation methods, which are all based on stochastic principles:

  2. The ‘comparable categories’ (Coca) method. The subjects were divided into thirty-two categories. Supplemental Table S1 in the online supplementary material demonstrates how the categories were created by five successive binary splits (2^5 = 32): first by level of physical activity (active or sedentary), then by height at its approximate median (166 cm), then by sex, and finally at the rough median values of weight (74 kg) and age (48 years). Each of these categories contains individuals sharing approximately the same physiological characteristics, e.g. in category 13 everyone was sedentary, >166 cm, female, <74 kg and <48 years. For each subject in data set B, the portion sizes were replaced by the complete set of portion sizes of one randomly chosen subject from the same ‘comparable category’ in data set A.

  3. The ‘k-nearest neighbours’ (KNN) method( Reference Parr, Hjartaker and Scheel 15 ). A missing portion size in data set B was substituted by a random value from the k (a predefined number) most similar observations (‘neighbours’) in data set A. Similarity is defined as proximity, measured by the Euclidean distance between the informing variables (here age, sex, physical activity, weight and height). Whereas traditional KNN would impute the portion size most prevalent among the k neighbours, our version of KNN imputed a random value from the k neighbours with probability proportional to their proximity, making it suitable for multiple imputation. Values of k above 20 yielded no additional accuracy.

  4. The ‘multinomial logistic regression’ (MLR) method. MLR models were constructed based on data set A: age, weight and height were continuous covariates, sex and physical activity were categorical covariates, and the portion sizes were the categorical outcomes. Portion sizes in data set B were then drawn at random from the predicted probabilities of the portion size categories, obtained by inserting the data set B values for age, weight, height, sex and level of physical activity into the fitted regression model.
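A minimal Python sketch of the Coca idea, assuming the approximate cut-offs given in the text (166 cm, 74 kg, 48 years): five binary splits define 2^5 = 32 categories, and each subject in the test data set receives the complete portion size vector of one random donor from the same category in the learning data set. The category numbering and the handling of values exactly at a cut-off are assumptions and do not reproduce the numbering in Supplemental Table S1; `coca_category`, `build_donor_pools` and `coca_impute` are hypothetical names, not the authors’ SAS code.

```python
import random
from collections import defaultdict

# Approximate medians quoted in the text; >= vs > at the cut-off is assumed.
CUTOFFS = {"height_cm": 166, "weight_kg": 74, "age_y": 48}


def coca_category(active, height_cm, female, weight_kg, age_y):
    """Five binary splits -> one of 2**5 = 32 categories, coded 0..31 (hypothetical coding)."""
    bits = (not active,
            height_cm > CUTOFFS["height_cm"],
            female,
            weight_kg >= CUTOFFS["weight_kg"],
            age_y >= CUTOFFS["age_y"])
    code = 0
    for bit in bits:
        code = code * 2 + int(bit)
    return code


def build_donor_pools(learning):
    """learning: iterable of (covariates_dict, portion_vector) from data set A."""
    pools = defaultdict(list)
    for cov, portions in learning:
        pools[coca_category(**cov)].append(portions)
    return pools


def coca_impute(cov, pools, rng):
    # Copy the complete portion size vector of one random donor in the same category.
    pool = pools.get(coca_category(**cov))
    if not pool:
        raise ValueError("empty category: merge categories or adjust cut-offs")
    return list(rng.choice(pool))


learning = [
    ({"active": False, "height_cm": 170, "female": True, "weight_kg": 65, "age_y": 30}, [3, 2, 4]),
    ({"active": True, "height_cm": 180, "female": False, "weight_kg": 85, "age_y": 55}, [5, 4, 6]),
]
pools = build_donor_pools(learning)
subject = {"active": False, "height_cm": 168, "female": True, "weight_kg": 70, "age_y": 40}
print(coca_impute(subject, pools, random.Random(1)))  # [3, 2, 4]: the only donor in that category
```

Copying the whole vector from a single donor, rather than drawing each item separately, keeps the correlation between one person’s portion sizes across foods; repeating the draw with different random states gives the multiple imputed data sets.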

The set-up was run in the SAS statistical software package version 9·2, but the methods can be implemented in any statistical software. SAS code for KNN, MLR and Coca, and a wrapper for (linear) regression analysis that combines the results from multiple data sets imputed (by any method), is given in the online supplementary material.
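The proximity-weighted KNN draw described in method 3 can be sketched in Python as below. The text does not specify how proximity is turned into probabilities or how the informing variables are scaled, so this sketch assumes standardized covariates (with sex and physical activity coded 0/1) and weights of 1/(Euclidean distance + a small constant); `knn_draw` is a hypothetical name. The default k=20 reflects the statement that larger k gave no extra accuracy.

```python
import math
import random


def knn_draw(target, donors, k=20, rng=None, eps=1e-6):
    """Draw one portion size value for `target` from its k nearest donors.

    target: covariate vector (assumed standardized; sex and activity coded 0/1).
    donors: list of (covariate_vector, portion_value) from the learning data set.
    Weights are assumed to be 1 / (distance + eps), so closer donors are more likely.
    """
    rng = rng or random.Random()
    # The k donors with the smallest Euclidean distance to the target.
    nearest = sorted(donors, key=lambda d: math.dist(target, d[0]))[:k]
    weights = [1.0 / (math.dist(target, cov) + eps) for cov, _ in nearest]
    values = [value for _, value in nearest]
    # One random draw with probability proportional to the weights.
    return rng.choices(values, weights=weights, k=1)[0]


# Hypothetical donors: two close neighbours with category 3, one distant with 6.
donors = [([0.0, 0.0], 3), ([0.1, 0.0], 3), ([5.0, 5.0], 6)]
print(knn_draw([0.0, 0.0], donors, k=2, rng=random.Random(0)))  # 3: both of the 2 nearest donors have 3
```

Drawing at random, instead of taking the most frequent value among the neighbours, is what lets repeated runs produce different imputed data sets for multiple imputation.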

Results

More women than men participated in the Danish Health Examination Survey. The subjects included in the present study were a little younger than the excluded subjects. Furthermore, the included men were more active and the included women were slightly heavier. However, differences were numerically small (Table 1).

Overall, compared with the reference energy intakes, the RMSE were similarly low with the median and MLR methods, and similarly higher with Coca and KNN. The bias of the median method was numerically larger than that of any of the other methods (Table 2). KNN had a negative bias in men (overestimating the portion sizes) but a positive bias in women (underestimating the portion sizes). The biases of MLR and Coca were similarly low in both men and women.

Table 2 Mean daily energy intake among 3728 adults with complete portion size data (reference), compared with energy intakes calculated with portion sizes derived from four imputation methods, Danish Health Examination Survey 2007–2008

RMSE, root-mean-square error; bias, mean error; median, sex-specific median imputation which is equivalent to using sex-specific standard portion sizes; Coca, ‘comparable categories’; KNN, k-nearest neighbours; MLR, multinomial logistic regression; Ref., referent category.

The four methods were compared by their ability to predict the reference. The reference energy intakes were computed with a set of complete reported portion sizes. The results presented are mean values of ten imputations with each method (on random splits of the data). Note that a positive bias indicates an underestimation of the reference and a negative bias indicates an overestimation.

More results are presented in the online supplementary material (Supplemental Table S2), including non-sex-specific standard portion sizes and different versions of Coca (with different informing variables and fewer categories). Results for selected micronutrients and macronutrient subtypes were essentially similar to those for macronutrients (results not shown).

All of the methods had high Spearman rank correlations, but median and MLR imputation performed slightly better than KNN and Coca. All correlations were >0·90 and all confidence intervals lay between 0·89 and 0·97 (see online supplementary material, Supplemental Table S3).

Figure 1 illustrates how all methods resulted in a bias of TE that depended on TE, i.e. an underestimation of TE in subjects with a high energy intake and an overestimation of TE in subjects with a low energy intake. The magnitude of this dependence (the T value) was markedly higher with median imputation than with the other methods. Figure 2 shows that when stratifying by BMI group, age group and physical activity class, the accuracy of the imputation methods varied more among men than among women. The mean total energy intake was 12·5 MJ when calculated with the maximum portion sizes for all subjects and 7·5 MJ with the minimum portion sizes for all subjects. Thus, up to 40 % of the calculated energy intake was potentially determined by the portion sizes. In practice, however, Fig. 2 indicates that the mean energy intakes calculated with the different methods differed by up to 2 MJ (18 %) in men and by up to 0·75 MJ (9 %) in women.
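The 40 % figure is the spread between the maximum- and minimum-portion intakes relative to the maximum-portion intake:

```latex
\frac{12.5\,\text{MJ} - 7.5\,\text{MJ}}{12.5\,\text{MJ}} = \frac{5.0}{12.5} = 0.40 = 40\,\%
```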

Fig. 2 Mean daily total energy intake is plotted against BMI (a, b), age (c, d) and level of physical activity (e, f), separately for men (a, c, e) and women (b, d, f). The reference (——) is computed with the originally reported portion sizes. The total energy intake has been computed with portion sizes determined by four different imputation methods: —□—, median (equivalent to sex-specific standard portions); —L—, MLR (multinomial logistic regression); —○—, Coca (‘comparable categories’); ——, KNN (k-nearest neighbours). The results presented are mean values of ten imputations with each method (on random splits of the data)

Discussion

Overall, the MLR method provided the best agreement with the reference dietary intake. However, the differences between the stochastic methods were small and the confidence intervals of the bias in MLR and Coca overlapped in most segments of the data. In MLR and Coca the bias did not differ substantially between men and women, whereas in KNN the bias was negative in men and positive in women. The median method (equivalent to sex-specific standard portion sizes) had relatively low RMSE but was inferior to the other methods in terms of bias. All of the methods underestimated the reference dietary intake, except KNN, which overestimated the portion sizes in men. The use of standard portion sizes systematically underestimated the energy intake of subjects with large portion sizes, a bias that diminished, for instance, differences in dietary intake between age groups. For example, a young man was assigned the same standard portion size as an elderly man, even though age is a determinant of energy intake, as demonstrated in Fig. 2 and by the fact that age is an input variable in calculating the BMR( Reference Frankenfield, Roth-Yousey and Compher 16 ). This bias may well affect parameter estimates in multivariate analyses( Reference Eekhout, de Vet and Twisk 17 ). On the other hand, the median method had the highest Spearman rank correlation with the reference. However, its confidence interval overlapped with that of MLR, and Coca and KNN also correlated highly with the reference energy intake.

Figure 2 demonstrates that all imputation methods were better at predicting portion sizes in women than in men. The greater variation in men is in part explained by their higher energy intake, but probably also by a greater variation in portion sizes among men.

Evaluation of the methods

We used ‘sex-specific median imputation’ as ‘standard portions’. Standard portions can of course be defined differently, but any deterministic portion size will contain the same sort of bias and the median sizes were probably a reasonable choice.

The simple Coca method worked surprisingly well and, compared with the other stochastic methods, its run time was much shorter. Depending on the size of the learning data set and the number of categories, empty or very small categories may occur. This can be solved by adjusting the cut-off values used in the dichotomization or by merging related categories. The relatively basic categorization can probably be refined to improve performance. More considerations about the different versions of the methods are presented in the online supplementary material.

External validity

The variables physical activity, sex, age, height and weight informed the three stochastic imputation methods; consequently, the three methods had access to the same information. We also tested the methods with resting heart rate and ‘number of potatoes with warm meals’ as additional informing variables. Adding the latter made all of the methods perform slightly better, and adding resting heart rate made all of them perform slightly worse, but in both cases the methods performed approximately equally relative to each other. The present five informing variables were chosen as they are readily available in most data sets.

The external validity of the methods may be questioned, as the included subjects differed slightly from the excluded subjects. However, the question is not whether the included and the excluded subjects were comparable, but rather whether the relationship between physiology and portion sizes differed between them, which does not seem plausible.

Our reference or ‘gold standard’ was calculated from self-reported FFQ data with varying portion sizes and did not take information bias into account. It is well documented that self-reported values reflect true intakes only to some degree and that reporting of specific macronutrients may be differentially biased according to sex, weight and BMI( Reference Heitmann and Lissner 18 , Reference Fraser, Yan and Butler 19 ). All of the methods were affected by this reporting bias. Median and MLR are model-based, so the reporting error entered the model and affected all imputations alike, i.e. possible over- and under-reporting will be spread out over the whole data set. In contrast, Coca and KNN imputations are based on pairing similar individual observations, and hence a systematic error will persist within the corresponding segments of the data.

Missing single values

For FFQ with individual portion size questions, the MLR, Coca or KNN method can be used to substitute missing single values. In the Danish Health Examination Survey, from which the present data derive, 17·7 % of the responses to the portion size questions were missing, which is not uncommon in an FFQ( Reference Subar, Kipnis and Troiano 20 ). Currently, most studies probably ‘fill in the blanks’ with median values or standard portions( Reference Eekhout, de Boer and Twisk 21 ). As demonstrated in the present study, median imputation generates bias. If only a few values are missing the resulting bias may be negligible, but the impact of median imputation bias increases with the number of missing values. If one of the stochastic methods is used for imputation of single missing values, a comparable data set is always at hand: the subset of data with no missing values. We have supplied Coca SAS code for this use in the online supplementary material.
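For single missing items, one possible adaptation, assumed here rather than taken from the supplementary SAS code, is to keep every observed answer and fill only the missing entries from one randomly chosen complete-case donor in the same comparable category (for example a Coca category). `fill_missing_items` is a hypothetical name, and unanswered items are represented as `None`.

```python
import random


def fill_missing_items(record, complete_donors, rng):
    """Return a copy of `record` with None entries filled from one random donor.

    record: portion size categories for one subject, None where unanswered.
    complete_donors: portion vectors with no missing values, taken from
    subjects in the same comparable category (an assumption of this sketch).
    """
    if None not in record:
        return list(record)          # nothing to impute
    donor = rng.choice(complete_donors)
    # Observed answers are kept; only the gaps are taken from the donor.
    return [d if r is None else r for r, d in zip(record, donor)]


rng = random.Random(0)
print(fill_missing_items([2, None, 4], [[1, 5, 1]], rng))  # [2, 5, 4]
```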

FFQ without portion sizes

MLR or Coca may be used to include portion sizes in FFQ without individual portion size questions. In this case the portion sizes will have to be imputed from a comparable data set with portion sizes. Traditional FFQ have often later been extended with portion size questions, and if the populations are similar, data from the newer semi-quantitative FFQ can be used as the learning data set. We have also supplied SAS code for this use in the online supplementary material.

Multiple imputation

When applying multiple imputation, the multivariate analyses are run on multiple (e.g. ten) data sets, each with different imputed values. The resulting parameter estimates are then the mean values of the ten analyses( Reference Rubin and Schenker 7 ). In the present paper we did not test our imputation methods’ ability to predict parameter estimates, but solely their ability to predict the reference TE, using ten imputations for each method. The online supplementary material provides SAS code for regression modelling across multiple imputed data sets.
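Pooling across imputed data sets is usually done with Rubin’s rules: the pooled estimate is the mean of the per-imputation estimates, as stated above, and its variance adds the mean within-imputation variance to (1 + 1/m) times the between-imputation variance. A minimal sketch, with degrees-of-freedom adjustments omitted; `pool_rubin` is a hypothetical name, not the supplementary SAS wrapper.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool m per-imputation estimates and standard errors with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)                      # pooled point estimate
    w = mean(se ** 2 for se in std_errors)       # mean within-imputation variance
    b = variance(estimates)                      # between-imputation variance (divisor m - 1)
    total = w + (1 + 1 / m) * b                  # total variance
    return q_bar, math.sqrt(total)


# Hypothetical regression coefficients and standard errors from three imputed data sets.
est, se = pool_rubin([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
print(est, se)
```

The between-imputation term is what deterministic (median) imputation lacks: with identical imputations it is zero, which is why single imputation yields confidence intervals that are too narrow.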

In summary

MLR and Coca are both valuable methods for including portion sizes in FFQ or substituting missing portion size values. The KNN method seemed less attractive due to its differential bias in men and women and its relatively high RMSE. In general, these three stochastic methods allowed for estimation of meaningful portion sizes by conditioning on information about physiology, and they were suitable for multiple imputation. Application of sex-specific standard portion sizes introduced more bias than the other methods tested and diminished, for instance, differences in energy intake related to age. We propose to use the MLR or Coca method to substitute missing portion size values or when portion sizes need to be included in FFQ without portion size data.

Acknowledgements

Acknowledgements: The authors thank Jesper Lauritsen and the Danish Diet, Cancer and Health project for developing the freeware FoodCalc®. Financial support: The Danish Health Examination Survey (DANHES) was funded by the Ministry of the Interior and Health and the Tryg Foundation. The survey was carried out by the National Institute of Public Health, University of Southern Denmark. The present work was supported by the Danish PhD School of Molecular Metabolism, Region Southern Denmark, University of Southern Denmark; the Research Unit for General Practice in Copenhagen, Denmark; and the A.P. Møller Foundation for Advancement of Medical Science. The funders had no role in the design, analysis or writing of this article. Conflict of interest: None. Authorship: R.K.-R., V.S., T.I.H., N.d.F.O., J.E.H. and B.L.H. participated in formulating the research questions and in designing the study; B.L.H. provided the data; R.K.-R., V.S. and T.I.H. performed the statistical analyses; R.K.-R., V.S., T.I.H., N.d.F.O., J.E.H. and B.L.H. analysed the results and contributed to the writing and editing of the manuscript draft; R.K.-R. wrote the manuscript. All authors read and approved the final manuscript. Ethics of human subject participation: The DANHES study was approved by the Danish local ethics committees and the Danish Data Protection Agency. The involved institutions’ review boards have approved the present study proposal.

Supplementary material

To view supplementary material for this article, please visit http://dx.doi.org/10.1017/S1368980014002389

References

1. Osler, M, Heitmann, BL, Gerdes, LU et al. (2001) Dietary patterns and mortality in Danish men and women: a prospective observational study. Br J Nutr 85, 219–225.
2. Tjonneland, A, Haraldsdottir, J, Overvad, K et al. (1992) Influence of individually estimated portion size data on the validity of a semiquantitative food frequency questionnaire. Int J Epidemiol 21, 770–777.
3. Bazzano, LA, He, J, Ogden, LG et al. (2002) Fruit and vegetable intake and risk of cardiovascular disease in US adults: the first National Health and Nutrition Examination Survey Epidemiologic Follow-up Study. Am J Clin Nutr 76, 93–99.
4. Noethlings, U, Hoffmann, K, Bergmann, MM et al. (2003) Portion size adds limited information on variance in food intake of participants in the EPIC-Potsdam study. J Nutr 133, 510–515.
5. Greenland, S & Finkle, WD (1995) A critical look at methods for handling missing covariates in epidemiologic regression analyses. Am J Epidemiol 142, 1255–1264.
6. Clapp, JA, McPherson, RS, Reed, DB et al. (1991) Comparison of a food frequency questionnaire using reported vs standard portion sizes for classifying individuals according to nutrient intake. J Am Diet Assoc 91, 316–320.
7. Rubin, DB & Schenker, N (1991) Multiple imputation in health-care databases: an overview and some applications. Stat Med 10, 585–598.
8. Rubin, DB (1987) Multiple Imputations for Nonresponse in Surveys. New York: Wiley & Sons.
9. Sterne, JA, White, IR, Carlin, JB et al. (2009) Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ 338, b2393.
10. Eriksen, L, Gronbaek, M, Helge, JW et al. (2011) The Danish Health Examination Survey 2007–2008 (DANHES 2007–2008). Scand J Public Health 39, 203–211.
11. National Institute of Public Health, University of Southern Denmark (2007) Danish Health Examination Survey FFQ. http://www.si-folkesundhed.dk/upload/kost-spørgeskema.pdf (accessed October 2014).
12. Ekelund, U, Sepp, H, Brage, S et al. (2006) Criterion-related validity of the last 7-day, short form of the International Physical Activity Questionnaire in Swedish adults. Public Health Nutr 9, 258–265.
13. Lauritsen, J & Danish Diet, Cancer and Health project (2013) FoodCalc®. http://www.ibt.ku.dk/jesper/foodcalc/ (accessed October 2014).
14. Danish Veterinary and Food Administration (2013) Danish national food composition tables. http://www.foodcomp.dk/download/Den_lille_levnedsmiddeltabel-4udg.pdf (accessed October 2014).
15. Parr, CL, Hjartaker, A, Scheel, I et al. (2008) Comparing methods for handling missing values in food-frequency questionnaires and proposing k nearest neighbours imputation: effects on dietary intake in the Norwegian Women and Cancer study (NOWAC). Public Health Nutr 11, 361–370.
16. Frankenfield, D, Roth-Yousey, L & Compher, C (2005) Comparison of predictive equations for resting metabolic rate in healthy nonobese and obese adults: a systematic review. J Am Diet Assoc 105, 775–789.
17. Eekhout, I, de Vet, HC, Twisk, JW et al. (2014) Missing data in a multi-item instrument were best handled by multiple imputation at the item score level. J Clin Epidemiol 67, 335–342.
18. Heitmann, BL & Lissner, L (1995) Dietary underreporting by obese individuals – is it specific or non-specific? BMJ 311, 986–989.
19. Fraser, GE, Yan, R, Butler, TL et al. (2009) Missing data in a long food frequency questionnaire: are imputed zeroes correct? Epidemiology 20, 289–294.
20. Subar, AF, Kipnis, V, Troiano, RP et al. (2003) Using intake biomarkers to evaluate the extent of dietary misreporting in a large sample of adults: the OPEN study. Am J Epidemiol 158, 1–13.
21. Eekhout, I, de Boer, RM, Twisk, JW et al. (2012) Missing data: a systematic review of how they are reported and handled. Epidemiology 23, 729–732.