Anthropometric variables are often used to estimate laboratory-measured body composition(Reference Bellisari, Roche, Heymsfield, Lohman, Wang and Going1). This validation methodology involves using cross-sectional data and multiple regression to develop valid prediction equations to estimate a referent percentage fat (body fat percentage; BF%) criterion(Reference Sun, Chumlea, Heymsfield, Lohman, Wang and Going2). The initial research approach was to develop population-specific equations using relatively homogeneous samples, such as young and middle-aged men and women(Reference Katch and Michael3–Reference Pollock, Hickman, Jackson, Kendrick and Dawson8). In the 1970s, researchers(Reference Durnin and Wormsley9–Reference Jackson, Pollock and Ward11) published what have been termed ‘generalised body composition equations’. The generalised equations used large, variable samples of men and women and modelled the data to account for age and the non-linear relationship between body density and skinfold fat. The Jackson–Pollock (JP) generalised equation validation research(Reference Jackson and Pollock10, Reference Jackson, Pollock and Ward11) has been cited over 1300 times in the scientific literature and the men's study was reproduced in 2004 as a British Journal of Nutrition citation classic(Reference Trayhurn12).
The original JP generalised equations(Reference Jackson and Pollock10, Reference Jackson, Pollock and Ward11) were published with data obtained in the 1970s. Since then, the American population has become more racially and ethnically diverse and the prevalence of obesity has increased(Reference Flegal and Troiano13). Race/ethnic diversity has been shown to be associated with body composition variation(Reference McDowell, Fryar, Hirsch and Ogden14). The subjects used to develop the original JP generalised equations were predominantly white men and women and the body composition reference criterion was hydrodensitometry-determined body density(Reference Going, Heymsfield, Lohman, Wang and Going15) converted to BF% with the Siri two-component (2-C) model(Reference Siri, Brozek and Hanschel16). Multicomponent models(Reference Lohman, Harris, Teixeira and Weiss17–Reference Wang, Shen, Withers, Heymsfield, Heymsfield, Lohman, Wang and Going19) have replaced the 2-C model as the reference criterion. The changes in body composition and race/ethnic composition of the American population and the adoption of multicomponent body composition reference criteria raise questions concerning the validity and accuracy of the JP generalised equations when used with contemporary men and women. Our purpose was to cross-validate the generalised equations on samples of non-Hispanic white, Hispanic and African–American men and women using dual-energy X-ray absorptiometry (DXA) as the BF% referent criterion (BF%-DXA).
Methods
Samples
Subjects were drawn from the 5-year Training Intervention and Genetics of Exercise Response (TIGER) study. The diverse sample consisted of 706 women and 423 men who ranged in age from 17 to 35 years. The race/ethnic breakdown of the total sample was: non-Hispanic white (white), 37·1 %; Hispanic white (Hispanic), 28·8 %; African–American, 34·1 %. The TIGER subjects were students enrolled at the University of Houston (Houston, TX, USA) who agreed to participate in the TIGER study. The published JP descriptive data used to develop the original generalised equations were compared with the TIGER subjects. The JP men and women came from two general sources: students, faculty and staff at Wake Forest University (Winston-Salem, NC, USA) and patients and research volunteers at the Cooper Institute (Dallas, TX, USA). The racial/ethnic composition of these men and women was not reported, but nearly all were white. All TIGER subjects completed a written informed consent before being measured. All procedures were approved by the protection of human subjects committees at the University of Houston and Baylor College of Medicine.
Measurement methods
The cross-sectional TIGER data were obtained from the baseline visit. Height was determined with a stadiometer (Seca Road Rod; Seca Corp., Hanover, MD, USA) and weight was measured with a digital scale (Seca 770). Each subject reported birth date, sex, and race/ethnicity using a coded self-report demographic form. Whole-body DXA scans were completed on two Hologic units (Hologic, Bedford, MA, USA). The data for the year 2003 were obtained on a Hologic Delphia-A unit (adult whole body software v. 11·2) at the body composition laboratory at the US Department of Agriculture/Agricultural Research Service (USDA/ARS) Children's Nutrition Research Center (Baylor College of Medicine, Houston, TX, USA). The DXA scans for the final 4 years were obtained on a Hologic Discovery W instrument (adult whole body software QDR version 12·3; Hologic) housed in the Obesity Research Center in the Health and Human Performance Department (University of Houston, Houston, TX, USA). The same trained technicians administered all DXA scans. The instruments were calibrated daily with a spine criterion and weekly with a step calibrator, as described by the manufacturer. All female participants completed a criterion urine pregnancy test before DXA testing to ensure that they were not pregnant. Subjects were asked to lie in the supine position and remain still; the entire scan was completed in less than 6 min. Whole-body (minus the head) fat mass, lean mass, bone mass and BF% were determined using the software supplied by the manufacturer. Following the recommendation of Lohman & Chen(Reference Lohman, Chen, Heymsfield, Lohman, Wang and Going18), DXA-measured body mass was compared with body mass measured with the digital scale. The correlation between scale-measured and DXA-determined body mass was 0·997 (standard error of the estimate (see) 1·6 kg).
The JP sum of three-skinfold generalised equations(Reference Jackson and Pollock10, Reference Jackson, Pollock and Ward11) was applied to the TIGER data to estimate BF% from Siri's two-component percentage fat equation (BF%-GEN). The male skinfold sites were chest, abdomen and thigh. The three female sites were triceps, supra-ilium and thigh. The TIGER skinfold measurement methods replicated those used in the original work(Reference Baumgartner, Jackson, Mahar and Rowe20, Reference Jackson and Pollock21). The generalised equations have functions to estimate body density from the quadratic form of the sum of the three skinfolds in combination with age. Estimated body density was converted to BF% using the Siri 2-C model(Reference Siri, Brozek and Hanschel16). The published validity correlations and see expressed in the metric of body density and Siri 2-C BF% are: men, R 0·905 (see 0·008 kg/l, 3·40 %); women, R 0·842 (see 0·008 kg/l, 3·92 %)(Reference Jackson and Pollock10, Reference Jackson, Pollock and Ward11).
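The two-step calculation described above (body density from the quadratic sum of skinfolds and age, then conversion with the Siri 2-C model) can be sketched in Python. The coefficients below are not given in the present text; they are the values commonly cited for the published JP sum-of-three-skinfold equations and the Siri equation, and should be checked against the original papers before use. The function names are illustrative only.

```python
def jp_body_density(sum3_mm, age_yr, sex):
    """Estimate body density (kg/l) from the sum of three skinfolds (mm) and age (years).

    Coefficients are the commonly cited JP values (an assumption here,
    not taken from the present text).
    """
    s = sum3_mm
    if sex == "male":      # chest + abdomen + thigh
        return 1.10938 - 0.0008267 * s + 0.0000016 * s ** 2 - 0.0002574 * age_yr
    if sex == "female":    # triceps + supra-ilium + thigh
        return 1.0994921 - 0.0009929 * s + 0.0000023 * s ** 2 - 0.0001392 * age_yr
    raise ValueError("sex must be 'male' or 'female'")


def siri_bf_percent(density):
    """Convert body density (kg/l) to BF% with the Siri 2-C model."""
    return (4.95 / density - 4.50) * 100.0


def bf_gen(sum3_mm, age_yr, sex):
    """BF%-GEN: JP body density converted to Siri 2-C BF%."""
    return siri_bf_percent(jp_body_density(sum3_mm, age_yr, sex))


if __name__ == "__main__":
    # Hypothetical subjects, for illustration only.
    print(round(bf_gen(60.0, 25, "male"), 1))
    print(round(bf_gen(60.0, 25, "female"), 1))
```

The quadratic term is what allows the equations to follow the non-linear relationship between body density and skinfold fat over a wide range of fatness.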
Statistical methods
STATA software (version 10; StataCorp LP, College Station, TX, USA)(22) was used for all statistical analyses. ANOVA evaluated the mean differences between the JP and TIGER samples, and among the three ethnic/race groups. The cross-validation of the generalised equations on the TIGER men and women involved several steps. First, product–moment correlations examined the relationship between BF%-GEN and BF%-DXA. General linear models (GLM) defined the relationship between BF%-GEN and BF%-DXA and examined the effect of sex. The method outlined by Pedhazur(Reference Pedhazur23) was used to test for homogeneity of the male and female regression slopes and intercepts. The accuracy of the BF%-GEN was examined graphically by the Bland–Altman method(Reference Altman and Bland24, Reference Altman and Bland25). GLM determined if race/ethnic group accounted for BF%-DXA variance independent of BF%-GEN, and if race/ethnic group interacted with BF%-GEN.
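The first cross-validation step, the product–moment correlation and line of best fit between BF%-GEN and BF%-DXA, can be sketched as follows. This is a minimal Python illustration, not the STATA procedures used in the study; the function names are hypothetical and the data in the usage example are invented.

```python
from math import sqrt


def _mean(values):
    return sum(values) / len(values)


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ols_line(x, y):
    """Least-squares slope and intercept for y regressed on x."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def standard_error_of_estimate(x, y):
    """Residual standard deviation of the simple regression (n - 2 df)."""
    slope, intercept = ols_line(x, y)
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return sqrt(sse / (len(x) - 2))


if __name__ == "__main__":
    # Invented values: predictor is BF%-GEN, criterion is BF%-DXA.
    bf_gen = [12.0, 18.5, 24.0, 30.5, 35.0]
    bf_dxa = [15.0, 20.0, 27.5, 32.0, 38.5]
    print(pearson_r(bf_gen, bf_dxa))
    print(ols_line(bf_gen, bf_dxa))
    print(standard_error_of_estimate(bf_gen, bf_dxa))
```

A slope near 1·0 and an intercept near 0 would place the line of best fit on the line of identity; a high correlation alone does not guarantee this.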
Results
Table 1 gives the descriptive statistics of the men and women for both samples. ANOVA confirmed that TIGER men and women differed (P < 0·001) from the JP subjects on all variables. The male and female trends were similar. The TIGER men and women were younger, shorter and heavier. The TIGER women and men were 2·5 and 3·0 cm shorter, respectively, than the JP women and men. The TIGER men were 6·75 kg heavier than the JP men, and the weight difference for women was nearly 13 kg. These height and weight differences produced significant BMI differences. The mean BMI differences were: men, 2·82 kg/m2; women, 5·36 kg/m2. The proportion of men and women who exceeded the BMI overweight criterion of ≧ 25 kg/m2 was higher for the TIGER subjects. Nearly 62 % of the TIGER men were overweight compared with 42 % of the JP men. Just 6 % of the JP women had a BMI ≧ 25 kg/m2 compared with nearly 46 % for the TIGER women. The mean differences for the sum of three skinfolds for the TIGER and JP men and women were about 10 mm and 20 mm, respectively. These skinfold fat differences produced BF%-GEN differences of 0·9 % for men and nearly 8 % for women. Table 1 gives the BF%-DXA descriptive statistics for the TIGER samples and the difference between BF%-DXA and BF%-GEN (BF%-Diff). The BF%-DXA of the TIGER subjects was higher than BF%-GEN. The mean BF%-Diff was 1·32 % (t (422) = 7·51; P < 0·001) for men and 2·99 % (t (705) = 19·55; P < 0·001) for women.
BF%-GEN, body fat percentage from Siri's two-component percentage fat equation; BF%-DXA, body fat percentage from dual-energy X-ray absorptiometry; BF%-Diff, difference between BF% from dual-energy X-ray absorptiometry and BF% from Siri's two-component percentage fat equation.
Table 2 gives the descriptive statistics for the TIGER men and women contrasted by race/ethnic group. The Table includes the group sample sizes and the means and standard deviations. ANOVA with Bonferroni contrasts(22) confirmed that white and African–American men and women were taller than Hispanic men and women (P < 0·001). The mean weight of the African–American women was significantly higher than white and Hispanic women (P < 0·001). The BMI of white women was significantly lower than African–American women (P = 0·003), but within chance variation of Hispanic women (P = 0·152). The mean weight (P = 0·572) and BMI (P = 0·528) of the male race/ethnic groups was not significantly different. The women's race/ethnic group differences were within chance variation for the sum of skinfold fat (P = 0·281) and BF%-GEN (P = 0·335), but the mean BF%-DXA of Hispanic women was significantly (P < 0·001) higher than white and African–American women. The sum of skinfolds, BF%-GEN and BF%-DXA means of the African–American men were significantly (P < 0·01) lower than the means of white and Hispanic men. The BF%-GEN and BF%-DXA means of white and Hispanic men were not significantly different.
Fig. 1 gives the BF%-DXA and BF%-GEN bivariate plots of the TIGER men and women's data. Provided are the male and female regression lines (line of best fit) and, for reference, the line of identity (slope = 1·0, intercept = 0). The correlation between BF%-DXA and BF%-GEN for the males and females combined was 0·91. When contrasted separately, the correlations were 0·85 for women and 0·93 for men. GLM were used to test for differences between the slopes and intercepts of the BF%-DXA and BF%-GEN male and female regression lines. This analysis showed that the male and female slopes (0·78 v. 0·88) were significantly different (F (1, 1125) = 125·62; P < 0·001). The women's line of best fit (grey line) was significantly steeper than the male line (black line). Fig. 2 gives the male and female Bland–Altman plots(Reference Altman and Bland24, Reference Altman and Bland25) of the BF%-DXA and BF%-GEN differences and averages. For reference, the male and female regression lines are provided, along with a dashed line marking a mean difference of 0. The slope (b = 0·03) of the female line (grey line) was not significantly different from 0 (P = 0·160), but the male slope of − 0·17 (black line) was significantly different from 0 (P < 0·001). The data in Fig. 2 documented that BF%-GEN underestimated BF%-DXA over all levels of the women's average of BF%-DXA and BF%-GEN. The men's generalised equation underestimated BF%-DXA at the lower levels of the average of BF%-DXA and BF%-GEN, that is, below about 27 %. The 95 % limits of agreement were − 5·14 to 11·1 % for women and − 5·58 to 8·51 % for men.
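The quantities behind a Bland–Altman plot can be sketched in Python as follows. The sketch assumes the conventional definition of the 95 % limits of agreement (mean difference ± 1·96 sd of the differences); the text does not state the multiplier used. The function name and the example data are hypothetical.

```python
from math import sqrt


def bland_altman(criterion, estimate):
    """Summarise agreement between a criterion and an estimate.

    Returns the mean difference (bias), the sd of the differences,
    the 95 % limits of agreement (bias +/- 1.96 sd, assumed convention)
    and the slope of the differences regressed on the pairwise averages.
    """
    diffs = [c - e for c, e in zip(criterion, estimate)]
    avgs = [(c + e) / 2.0 for c, e in zip(criterion, estimate)]
    n = len(diffs)
    bias = sum(diffs) / n
    sd = sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    mean_avg = sum(avgs) / n
    saa = sum((a - mean_avg) ** 2 for a in avgs)
    slope = sum((a - mean_avg) * (d - bias) for a, d in zip(avgs, diffs)) / saa
    return {
        "bias": bias,
        "sd": sd,
        "lower": bias - 1.96 * sd,
        "upper": bias + 1.96 * sd,
        "slope": slope,
    }


if __name__ == "__main__":
    # Invented BF%-DXA (criterion) and BF%-GEN (estimate) values.
    bf_dxa = [22.0, 25.0, 31.0, 18.0]
    bf_gen = [20.0, 22.0, 28.0, 17.0]
    print(bland_altman(bf_dxa, bf_gen))
```

A slope near 0 indicates a bias that is constant across the range of fatness, as reported for the women; a non-zero slope indicates a bias that changes with the level of BF%, as reported for the men. Under the 1·96 convention, the women's reported limits (− 5·14 to 11·1 %) would imply an sd of the differences of about 4·1 %.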
Table 2 gives the BF%-Diff means and standard deviations contrasted by race/ethnic group and sex. The BF%-Diff means for Hispanic and African–American men and women were larger than for white men and women, suggesting that race/ethnic group was a source of prediction bias. The GLM was used to determine if race/ethnic group accounted for BF%-DXA variance independent of BF%-GEN. Table 3 gives these analyses. Provided are two GLM models for men and women: GLM I includes just BF%-GEN; GLM II adds race/ethnic group to the model. Race/ethnic group was entered into the GLM as a categorical variable using white men and women as the referent groups (regression coefficient = 0). Race/ethnic group accounted for 0·64 % of additional BF%-DXA variance beyond BF%-GEN for men and 2·73 % for women. These increases were statistically significant for both men (F (4, 419) = 15·85; P < 0·001) and women (F (4, 702) = 57·66; P < 0·001). Post hoc analysis showed that, for the same BF%-GEN, the generalised equation overestimated the BF%-DXA of African–American men by 0·81 % and underestimated the BF%-DXA of Hispanic men by 0·73 %. These effects were small, but statistically significant (P < 0·03). The regression weight (b = 2·22) for Hispanic women was significantly (P < 0·001) different from zero, but the African–American regression weight of − 0·35 was within chance variation (P = 0·303) of white women. The tests for the BF%-GEN × race/ethnic group interaction were not statistically significant for either males (F (2, 417) = 0·51; P = 0·600) or females (F (2, 700) = 0·78; P = 0·460).
*P < 0·01, **P < 0·03.
The Bland–Altman (Fig. 2) and GLM analysis (Table 3) documented that the methods were highly correlated, but the generalised equations did not accurately estimate BF%-DXA. The GLM was used to derive calibration equations to estimate BF%-DXA from the JP sum of three skinfolds. Table 4 gives these calibration models for men and women. The JP equations in the Siri 2-C BF% metric are provided for reference. Like the original JP analyses that used Siri 2-C BF% as the dependent variable, these analyses showed that the relationship between BF%-DXA and skinfold fat was quadratic (P < 0·001). Table 4 shows that the linear and quadratic regression weights for the calibrated models are similar to the JP regression weights. Unlike the original JP analyses, age was not statistically significant for either men (P = 0·834) or women (P = 0·423). Provided in Table 4 are race/ethnic group-specific equations. The race/ethnic group bias was accounted for by summing the intercepts of white men and women with the significant race/ethnic group regression coefficient. The calibration model fit statistics (R, see) were slightly better than the original published JP statistics(Reference Jackson and Pollock10, Reference Jackson, Pollock and Ward11). The male and female residual distributions were graphically examined with histograms and standardised normal probability plots(22). The graphic plots (data not presented) showed there were no observed deviations from a normal distribution.
see, Standard error of the estimate; BF%-GEN, body fat percentage from Siri's two-component percentage fat equation; ∑F, sum of triceps, supra-ilium and thigh skinfolds; ∑M, sum of chest, abdomen and thigh skinfolds.
Discussion
These results showed that BF%-GEN and BF%-DXA were highly correlated. The male and female correlations in Table 3 were consistent with the validity correlations of 0·905 and 0·842 reported in the JP original research(Reference Jackson and Pollock10, Reference Jackson, Pollock and Ward11). A correlation is a measure of association and not accuracy between the two measures. Equation accuracy involves analysing measurement error and determining the level of agreement between the two measures(Reference Altman and Bland24, Reference Altman and Bland25). The limits of agreement analyses showed that the generalised equations gave biased BF%-DXA estimates when applied to these samples of diverse men and women.
Figs. 1 and 2 show that sex was a source of measurement bias. This suggests that the cohort differences may be a source of the sex bias. The data in Table 1 compared the body composition differences between the TIGER and JP men and women. Both BMI and skinfold fat of the TIGER men and women were significantly and substantially higher than those of the JP subjects. A comparison of BMI means of the JP men with Second National Health and Nutrition Examination Survey (NHANES II) 1976–80 data(Reference Flegal and Troiano13) showed that the body composition of the JP and NHANES II men was similar. The mean BMI of the NHANES II men was 24·3 kg/m2 for the 20–29 years age group. The mean BMI of JP men (age about 33 years) was 24·5 (95 % CI 24·2, 24·8) kg/m2. The BMI means of the NHANES II women(Reference Flegal and Troiano13) for 20–29 and 30–39 years age groups were 23·1 and 24·9 kg/m2, respectively. The mean BMI of the JP women was much lower (21·1 (95 % CI 20·8, 21·3) kg/m2). This difference supports the conclusion that the JP women were leaner than the general American population. A comparison of the TIGER men and women with Centers for Disease Control and Prevention (CDC) anthropometric reference data obtained in the years 1999–2002(Reference McDowell, Fryar, Hirsch and Ogden14) suggested that the body composition of the TIGER samples was representative of American adults. The BMI of the TIGER men (27·3 (95 % CI 26·8, 27·9) kg/m2) was within chance variation of the mean BMI of 27·0 kg/m2 for the 1999–2002 NHANES American men aged 20–29 years. The mean BMI for the 1999–2002 NHANES women was 26·8 kg/m2, within chance variation of the TIGER mean of 26·4 (95 % CI 25·9, 26·9) kg/m2. The TIGER data exhibited the same race/ethnic group trends as the CDC anthropometric reference data(Reference McDowell, Fryar, Hirsch and Ogden14). The mean BMI of the TIGER men differed among race/ethnic groups by less than 0·1 kg/m2, while the BMI of Hispanic and African–American women was higher than that of white women.
The NHANES data support the assumption that the TIGER men and women were representative of contemporary Americans and that the generalised equations were developed on men who were representative of the US population at the time that the generalised equations were developed, but on women who were leaner than the general population.
The GLM analyses (Table 3) confirmed that race/ethnic group was a source of prediction bias independent of BF%-GEN. With BF%-GEN statistically controlled, the BF%-DXA of Hispanic women was systematically underestimated by 2·22 (95 % CI 1·50, 2·95) % compared with white and African–American women. Compared with white men, the BF%-GEN prediction bias was 0·73 (95 % CI 0·07, 1·39) % for Hispanic men and − 0·81 (95 % CI − 1·52, − 0·11) % for African–American men. Several investigators(Reference Chang, Wu, Chang, Yao, Yang, Wu and Lu26–Reference Jackson, Stanforth, Gagnon, Rankinen, Leon, Rao, Skinner, Bouchard and Wilmore31) have reported that BF% estimated with BMI regression equations developed using data from white men and women yielded biased estimates for non-white race groups. This bias has been attributed to variation in bone mineral content(Reference Wagner and Heyward32) and body build(Reference Deurenberg, Yap and van Staveren28, Reference Deurenberg, Deurenberg-Yap, Wang, Lin and Schmidt33–Reference Guricci, Hartriyanti, Hautvast and Deurenberg35). We examined the likelihood that bone mineral content was a source of the bias with these TIGER subjects. Bone mineral content, expressed as the percentage of total DXA weight, was added as an independent variable. Adding bone mineral content altered the men's GLM results presented in Table 3. The African–American effect for men was no longer statistically significant (P = 0·749), but the effect for Hispanic men remained. Adding bone mineral content to the women's GLM did not influence the women's results. These post hoc GLM analyses suggested that bone mineral content variation was a source of bias for African–American men, but not Hispanic men or women. 
Race/ethnic group bias has been documented(Reference Chang, Wu, Chang, Yao, Yang, Wu and Lu26–Reference Jackson, Stanforth, Gagnon, Rankinen, Leon, Rao, Skinner, Bouchard and Wilmore31) relating BMI to BF%, but this is the first study, to our knowledge, that documented a race/ethnic group bias with skinfold prediction equations.
While the sample differences in body composition shown in Table 1 were a likely source of prediction bias, the use of different BF% referent criteria was another potential source of bias. The criterion for the JP generalised equations(Reference Jackson and Pollock10, Reference Jackson, Pollock and Ward11) was hydrostatically measured body density(Reference Going, Heymsfield, Lohman, Wang and Going15) converted to BF% with the Siri 2-C model(Reference Siri, Brozek and Hanschel16). The referent criterion for the TIGER subjects was BF%-DXA, a multicomponent model(Reference Lohman, Chen, Heymsfield, Lohman, Wang and Going18). A search of the literature found twenty-three sets of data(Reference Ball and Altena36–Reference Withers, LaForgia, Pillans, Shipp, Chatterton, Schultz and Leaney45) with published means for BF%-DXA and either body density or Siri 2-C BF%. Hydrostatic weighing was used to measure body density for eighteen of the paired datasets and air-displacement plethysmography was used for the remaining five. The sample sizes ranged from ten to 160 subjects and the mean ages ranged from 21 to 49 years. To control for variation in sample size, each mean was multiplied by its sample size (mean × n = ΣX). The calculated ΣX values for each dataset were summed and used to compute the mean for all 1177 subjects and for men and women. The mean difference between DXA and Siri 2-C BF% for all men and women was small, just 0·04 %. When compared by sex, the grand mean of the 473 females was 1·86 %, compared with − 0·72 % for the 704 men. This showed that the Siri 2-C BF% tended to underestimate BF%-DXA of women, which is consistent with the present results.
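The sample-size weighting described above (mean × n = ΣX, summed over datasets and divided by the total n) can be sketched in Python. The function name and the example values are hypothetical; the twenty-three published dataset means are not reproduced here.

```python
def pooled_mean(means, sizes):
    """Sample-size-weighted grand mean of published dataset means.

    Each mean is multiplied by its n to recover that dataset's sum (sum of X);
    the sums are added and divided by the total number of subjects.
    """
    if len(means) != len(sizes):
        raise ValueError("means and sizes must have the same length")
    total_sum = sum(m * n for m, n in zip(means, sizes))
    return total_sum / sum(sizes)


if __name__ == "__main__":
    # Invented mean DXA minus Siri 2-C BF% differences and sample sizes.
    mean_diffs = [2.0, -1.0, 0.5]
    n_subjects = [10, 30, 60]
    print(pooled_mean(mean_diffs, n_subjects))
```

Weighting by n prevents a small study with an extreme mean from carrying the same influence as a large study.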
Other investigators have reported that the generalised equations underestimated multicomponent BF% measurements. Clasey et al. (Reference Clasey, Kanaley, Wideman, Heymsfield, Teates, Gutgesell, Thorner, Hartman and Weltman37) reported that the generalised equations underestimated four-compartment BF% by 5·9 % with seventy-six young and older men and women. When examined by age and sex groups, the differences ranged from 2·4 % for young women to 7·9 % for older women. Peterson et al. (Reference Peterson, Czerwinski and Siervogel46) reported that the JP women's equation underestimated the four-compartment BF% of ninety-one women by 6·6 %. Ball et al. (Reference Ball, Altena and Swan47) reported that the generalised equation underestimated BF%-DXA by slightly more than 3 % in 160 men who ranged in age from 18 to 62 years. The results of these studies showed that the JP generalised equations underestimated multicomponent BF% of contemporary adults, which is consistent with our findings.
These findings suggest that the inaccuracy of the generalised equations may also be due to using different BF% referent criteria. To examine this more closely, the TIGER data for just the white men and women were combined with the JP data. GLM evaluated the relationship between sum of skinfold fat and measured BF%, controlling for sex and age. The GLM dependent variable was the BF% referent criterion (Siri 2-C BF% or BF%-DXA) and the independent variables were the linear and quadratic sum of skinfolds, and sample (JP and TIGER). Age was included as a covariate and sample as a categorical variable. The GLM fit statistics for the combined JP and TIGER white data were R 0·91, see 3·3 % for men and R 0·87, see 3·9 % for women. The analysis of the women's data showed that the sample × skinfold fat interaction was not statistically significant (F (2, 506) = 1·18; P = 0·279), but the sample effect was significant (F (1, 506) = 20·71; P < 0·001). An examination of the trends showed that, with age controlled, BF%-DXA of white women was 2·86 % higher than Siri 2-C BF% for the same sum of triceps, supra-ilium and thigh skinfolds. In contrast, the men's sample × skinfold interaction was statistically significant (F (1, 580) = 25·30; P < 0·001). Examining the men's GLM results showed that, with age controlled, the sum of chest, abdomen and thigh skinfolds of 90 mm estimated both Siri 2-C BF% and BF%-DXA at 25 %. For a sum of skinfolds < 90 mm, the Siri 2-C BF% estimates were lower than BF%-DXA estimates. Estimated Siri 2-C BF% values of 10, 15 and 20 % represented BF%-DXA values of 13·1, 17·3 and 21·7 %. The analysis of the JP sample with the white TIGER men and women also showed that the bias was a function of sex. The generalised equations underestimated the BF%-DXA of both men and women, but the prediction bias of the generalised equations was consistent over all BF% levels of women and below about 25 % for men. This trend was also shown with the Bland–Altman analyses provided in Fig. 2.
These data suggested that the sources of inaccuracy of the generalised equations were not just due to JP and TIGER sample differences, but also different BF% referent criteria. The equations in Table 4 provide an accurate calibration method of BF%-DXA for white, Hispanic and African–American young men and women.
The results of the present study showed that the JP generalised equations were highly correlated with BF%-DXA, but lacked accuracy, the generalised equations systematically underestimating BF%-DXA. The GLM analysis documented that race/ethnic group was an independent source of the prediction bias. Compared with white men and women, the generalised equations systematically underestimated the BF%-DXA of Hispanic men and women, and overestimated the BF%-DXA of African–American men. Public health data document that the American population is becoming more obese and diverse(Reference Flegal and Troiano13, Reference McDowell, Fryar, Hirsch and Ogden14). The results of the present study demonstrate that the generalised equations need to be re-examined and updated as the body composition characteristics of populations evolve. The calibration equations provide a valid and accurate statistical model to estimate the BF%-DXA of white, Hispanic and African–American men and women, aged 17–35 years. Further calibration research is needed with older men and women.
Acknowledgements
Support for the present study was provided by the National Institute of Diabetes and Digestive and Kidney Diseases/National Institutes of Health grant R01-DK062148 and by US Department of Agriculture/Agricultural Research Service (USDA/ARS) contract 6250-51000-046. A. S. J. designed the study, conducted the data analyses and prepared the manuscript. K. J. E. and B. K. M. were responsible for the DXA measurements and reviewed and edited the submitted manuscript. M. H. S. was responsible for database management and aided in the data analyses. M. S. B. aided in the design, data analyses and writing the manuscript. The authors have no professional relationships with companies or manufacturers who might benefit from the results of the present study.