Undernutrition is a serious global public health problem that results in high mortality and overall disease burden(Reference Black, Victora and Walker1) and is common among under-five children, particularly in low- and middle-income countries(Reference Black, Victora and Walker1,Reference Black, Allen and Bhutta2). Even though global rates have declined, undernutrition rates remain high among children in sub-Saharan Africa(Reference Svedberg3,Reference Tzioumis, Kay and Bentley4), with Eastern Africa, including Ethiopia(6), having one of the highest stunting rates (exceeding 30 %)(5). In Ethiopia, under-five stunting (low height-for-age) decreased from 58 % in 2000 to 38 % in 2016, a reduction of about one-third. In addition, under-five underweight (low weight-for-age) declined from 41 % to 24 % during the same period(6–Reference Negash, Whiting and Henry9). Despite these achievements, which followed an improvement in food security due to several government policy interventions(Reference Van der Veen and Tagel10), undernutrition among children remains very high, making it difficult to achieve Ethiopia’s commitment under the Seqota Declaration to end child undernutrition by 2030(11). This may be caused by a myriad of factors including population pressure, drought, disease outbreaks, chronic poverty, pre- and post-harvest crop losses(Reference Endalew, Muche and Tadesse12) and increasing food prices(Reference Nandy, Daoud and Gordon13), which constrain food security and nutritional status in the country.
Several studies have examined the spatial variations and determinants of undernutrition among under-five children in Ethiopia using traditional analytical approaches(Reference Negash, Whiting and Henry9,Reference Alemu, Ahmed and Yalew14–Reference Umeta, West and Verhoef16). Most of these studies focussed only on specific parts of the country, such as rural parts of the Tigray and Somali regions, or were limited to specific localities(Reference Hagos, Hailemariam and WoldeHanna17) and are therefore not nationally representative. The few studies(Reference Gurmu and Etana18,Reference Sohnesen, Ambel and Fisker19) that provide evidence on the spatial variations in undernutrition among children in Ethiopia mainly focussed on stunting and overlooked other indicators of child undernutrition, such as wasting and underweight.
Machine learning (ML) is a powerful approach that intersects artificial intelligence and statistical learning in the process of discovering unknown relationships or patterns(Reference Alghamdi, Al-Mallah and Keteyian20). Modern ML algorithms have shown superior predictive ability in addressing classification problems when compared with classical statistical models. Various ML algorithms have been applied in medical research(Reference Choi, Kim and Yoo21–Reference Zhao, Healy and Rotstein24). For instance, ML algorithms such as random forest (RF), support vector machines and artificial neural networks have been used to predict the status of diseases such as acute appendicitis and diabetes(Reference Choi, Kim and Yoo21,Reference Yu, Liu and Valdez22). A related study in Bangladesh has shown that the RF algorithm was superior to other ML algorithms such as linear discriminant analysis, k-nearest neighbours (k-NN), support vector machines and logistic regression(Reference Talukder and Ahammed25). Moreover, a study in Nigeria used Bayesian Additive Regression Trees to show that maternal education decreases severe child undernutrition when mothers acquire 10 years of education or more(Reference Kraamwinkel, Ekbrand and Davia26). Nevertheless, a scoping review conducted by Kino et al.(Reference Kino, Hsu and Shiba27) has shown that among the large volume of social determinants of health studies published annually, only a few used ML techniques, which leaves room for further research of this kind. In addition, most of these ML studies used United States data, which points to the need to explore public health concerns in other parts of the world(Reference Kino, Hsu and Shiba27). Accordingly, in this study, we used various ML algorithms that have not been extensively used in previous studies to predict child undernutrition determinants in Ethiopia.
Ultimately, a comparison of five ML algorithms was illustrated for three indicators of child undernutrition (stunting, wasting and underweight). The study initially presented a spatial map for under-five nutritional status in Ethiopia to provide an overview of child undernutrition disparities across the regions of the country. The main goal of this study is to provide evidence on the best predictive algorithm for child undernutrition risk factors in Ethiopia. This study will provide much understanding of how the various indicators of child undernutrition vary with space and the risk factors that underlie these variations, which would be necessary for targeting programs and interventions given the limitation of resources in the country.
Methods
Data source
This study uses data from the 2016 Ethiopian Demographic and Health Survey. The 2016 Ethiopian Demographic and Health Survey is currently the latest in the country and is part of the worldwide Demographic and Health Survey series, which is conducted every 5 years. It is a nationally representative household survey that collects data on a broad range of population and health issues to enhance maternal and child health in Ethiopia(6). The Ethiopian Demographic and Health Survey used a multi-stage stratified sampling procedure to select respondents from households in a total of 624 clusters(6). The study sample is limited to 9471 children below age five. This was based on retrospective information obtained from mothers about the BMI of their children within the 5 years preceding the survey (2011–2016).
Study variables and measurements
The outcomes of interest in this study are under-five stunting, wasting and underweight status. Z-scores of anthropometric measurements – height-for-age (stunting), weight-for-age (underweight) and weight-for-height (wasting) – were used to evaluate nutritional status. According to WHO, undernutrition indicators are determined by the following standard measures: stunting: height-for-age < –2 sd; wasting: weight-for-height < –2 sd and underweight: weight-for-age < –2 sd of the WHO Child Growth Standards median(28,Reference de Onis, Borghi and Arimond29). Severely stunted, wasted and underweight children were those whose height-for-age, weight-for-height and weight-for-age Z-scores, respectively, were below −3 sd. This study, thus, considered all three undernutrition indicators to predict childhood undernutrition determinants. The outcomes were binary coded as 1 for stunted, wasted or underweight if the respective criterion was met, and 0 otherwise. A set of covariates was considered as possible risk factors for childhood undernutrition in Ethiopia (see Appendix). In the ML algorithms, we incorporated as many variables as possible from the DHS that had a low percentage of missing data. The only variables excluded from the study were those with more than 50 % missing data, because of their impact on the performance of the algorithms.
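The binary coding of the three outcomes can be illustrated with a short sketch. The analysis itself was carried out in R; the Python function below, its name and the example Z-scores are hypothetical and serve only to show how the −2 sd and −3 sd cut-offs translate into 0/1 indicators.

```python
def classify_undernutrition(haz, waz, whz, cutoff=-2.0):
    """Return 0/1 indicators for one child from anthropometric Z-scores.

    haz, waz and whz are the height-for-age, weight-for-age and
    weight-for-height Z-scores relative to the WHO Child Growth
    Standards median. A value strictly below the cutoff is coded 1.
    """
    return {
        "stunted": int(haz < cutoff),
        "underweight": int(waz < cutoff),
        "wasted": int(whz < cutoff),
    }


# Hypothetical Z-scores, for illustration only (not survey data).
child = classify_undernutrition(haz=-2.4, waz=-1.1, whz=-0.3)

# The severe forms use the same rule with a cutoff of -3 sd.
severe = classify_undernutrition(haz=-3.2, waz=-3.5, whz=-1.0, cutoff=-3.0)

print(child)
print(severe)
```

Using one function with a configurable cut-off keeps the moderate and severe definitions consistent with each other.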
Analytic strategy
The R programming language (version 3.6.0)(30) and the R packages caret(Reference Kuhn31) and caretEnsemble(Reference Mayer and Knowles32) were used to perform the data processing and analysis. Five ML algorithms (xgbTree, generalised linear model (GLM), NNet, RF, k-NN) were applied to determine the predictive power of ML algorithms and to identify the top-20 most important determinants for each of childhood undernutrition indicators (stunting, wasting and underweight).
Logistic regression
The binomial GLM is typically used to analyse binary data and is commonly used as an inferential tool in population health research, but it also can be used as a binary classification algorithm. No tuning is needed for GLM because the algorithm has no hyperparameters and assumes a logit relationship between response and predictors.
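To make the logit relationship concrete, the following minimal sketch shows how a fitted binomial GLM turns a linear predictor into a probability and a class label. The coefficients and feature values are hypothetical, chosen only for illustration; they are not estimates from this study, and the 0·5 threshold is an assumption.

```python
import math


def inverse_logit(z):
    """Map a linear predictor z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(intercept, coefficients, features):
    """Probability of the positive class under a logit link."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return inverse_logit(z)


# Hypothetical coefficients for two binary predictors (illustration only).
intercept = -1.0
coefficients = [0.8, -0.5]
features = [1, 0]

prob = predict_probability(intercept, coefficients, features)

# Assumed decision rule: classify as positive when probability >= 0.5.
predicted_class = int(prob >= 0.5)

# Exponentiated coefficients are the odds ratios used for inference.
odds_ratios = [math.exp(b) for b in coefficients]

print(round(prob, 3), predicted_class, [round(o, 2) for o in odds_ratios])
```

The same fitted coefficients therefore serve both purposes mentioned above: exponentiated, they are interpretable odds ratios, and passed through the inverse logit, they give a classifier.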
Random forest
RF is a supervised ensemble learning method based on decision trees(Reference Ho33). The RF algorithm repeatedly resamples the training data set, each time using a random subset of predictor variables to grow a classification or regression tree. After many such trees are formed, the predictive contribution of each variable is measured, and the best set of variables is obtained. It is flexible and fast, and can be used for both classification and regression.
Extreme gradient boosting
xgbTree is a scalable ensemble technique that has been demonstrated to be a reliable and efficient ML challenge solver(Reference Bentéjac, Csörgő and Martínez-Muñoz34). The xgbTree is chosen because it uses an efficient and scalable implementation of the gradient boosting framework and supports various objective functions, including regression, classification and ranking(Reference Chen, He and Benesty35). It has better control against overfitting by using more regularised algorithm formalisation, in comparison to prior algorithms. It has a high rate of success in Kaggle competitions, particularly for structured features(Reference Chen and Guestrin36).
Neural networks
Neural networks represent a method of statistical learning based on the model of neurons in the brain. In some sense, they can be thought of as nonlinear regression based on how the observed data can affect the outcome. Visually, however, they can be seen as layers of inputs and outputs. Weighted combinations of the inputs are created and put through a function (e.g. the sigmoid function) to produce the next layer of inputs(Reference Clark37). The next layer goes through the same process to produce either another layer or to predict the output, which is the final layer. All the layers between the input and output are usually referred to as ‘hidden’ layers. Some of the strengths include having good prediction generally, incorporating the predictive power of different combinations of inputs and having tolerance for correlated inputs(Reference Clark37).
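The layer-by-layer computation described above can be sketched as a single forward pass through one hidden layer. The weights, biases and inputs below are hypothetical values for illustration; a real network would learn them from the training data, and this is not the NNet configuration used in the study.

```python
import math


def sigmoid(z):
    """Squash a weighted sum into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """Compute one layer: weights[j] holds the weights feeding unit j."""
    return [
        sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


# Hypothetical parameters: 2 inputs -> 2 hidden units -> 1 output unit.
hidden_weights = [[0.5, -0.3], [0.2, 0.8]]
hidden_biases = [0.0, -0.1]
output_weights = [[1.0, -1.0]]
output_biases = [0.0]

inputs = [1.0, 2.0]
hidden = layer_forward(inputs, hidden_weights, hidden_biases)
output = layer_forward(hidden, output_weights, output_biases)[0]

print(hidden, output)
```

The output of the hidden layer becomes the input of the next layer, which is the repeated process the text refers to.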
k-nearest neighbours
k-NN is a robust and adaptive classification algorithm that is part of the supervised ML family. It is a non-parametric algorithm that does not rely on any strict assumptions about the underlying data. The decision boundary of the algorithm depends on a few input points and their particular positions. Thus, the classification of new cases is based on a similarity or the use of observations in the training set that are closest in metric space(Reference Hastie, Tibshirani and Friedman38).
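A minimal sketch of this idea, assuming Euclidean distance and a simple majority vote among the k closest training observations, is given below. The toy points and labels are hypothetical and are not drawn from the survey data.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest neighbours."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-feature training set with two well-separated classes.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
                (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_points, train_labels, (1.1, 1.0)))
print(knn_predict(train_points, train_labels, (3.0, 3.0)))
```

Because no model is fitted in advance, all of the work happens at prediction time, when distances to the stored training observations are computed.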
ML approach
Following the standard methods for ML techniques, the data were split into two sets (training and testing) to learn from the data, train the classification algorithms and identify patterns within the data. Once the algorithms were trained, they were applied to the test data set, and algorithm accuracy was assessed. The algorithms were trained under two splits (60 % train, 40 % test; and 70 % train, 30 % test), and the more reasonable outcome was observed with the widely used split of 70 % train and 30 % test. Thus, the training set consisted of 70 % of the observed data while the remaining 30 % of the cases were held out as a test or validation set. Five ML algorithms (xgbTree, GLM, NNet, RF, k-NN) were applied by using a sample of 70 % of the individuals in each group (training data set, n 5147) and validated in the remaining 30 % (test data set, n 1716). Cases with missing values were excluded when running the ML algorithms. All algorithms were trained using 10-fold cross-validation on the training set, and performance was estimated on the testing set.
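The splitting scheme can be sketched as follows. The study used the R package caret for this step; the Python helpers below, their names, the random seed and the toy size of 100 records are assumptions made purely to illustrate a 70/30 hold-out split followed by 10-fold cross-validation within the training portion.

```python
import random


def train_test_split(indices, train_fraction=0.7, seed=42):
    """Shuffle the indices and cut them into training and test sets."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def k_fold_indices(indices, k=10):
    """Yield (training, validation) index lists for each of k folds."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i
                    for idx in fold]
        yield training, validation


# Hypothetical data set of 100 records, identified by index.
train_idx, test_idx = train_test_split(range(100))
splits = list(k_fold_indices(train_idx, k=10))

print(len(train_idx), len(test_idx), len(splits))
```

The test indices are never touched during cross-validation, so the held-out performance estimate is not influenced by the tuning done inside the folds.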
Combining algorithms into ensemble predictions
To increase the accuracy of the algorithms, we used ‘Stacking’, one of the most popular methods for combining the predictions from different algorithms. With stacking, multiple algorithms (typically of differing types) are built, and a supervisor algorithm is then trained to learn how best to combine the predictions of the primary algorithms(Reference Hastie, Tibshirani and Friedman38). Thus, in this study, the predictions of the selected caret algorithms (xgbTree, GLM, NNet, RF, k-NN) were combined using stacking.
Algorithm evaluation
To verify the algorithm’s performance in terms of classifications, a confusion matrix (also known as an error matrix) is used. A confusion matrix of a binary classification is a two-by-two table showing values of True Negatives, False Negatives, True Positives and False Positives resulting from predicted classes of data. The confusion matrix allows the measures of rates such as prediction accuracy, sensitivity and specificity(Reference Brownlee39).
Accuracy
Accuracy is the basis for estimating the performance of any predictive algorithm. It is the ratio of correct predictions to the total number of data points evaluated. This study reports the best accuracies obtained by the ML algorithms after applying feature selection and k-fold cross-validation.
Sensitivity
Sensitivity is the proportion of real positive cases that are predicted as positive (true positives). It is also termed recall. The remaining real positive cases are incorrectly predicted as negative (false negatives), and their proportion is the false-negative rate.
Specificity
Specificity is the proportion of real negative cases that are predicted as negative (true negatives). The remaining real negative cases are incorrectly predicted as positive (false positives), and their proportion is the false-positive rate.
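These three measures follow directly from the four cells of the confusion matrix. The sketch below uses hypothetical actual and predicted labels (1 = positive, 0 = negative) to show the calculation; the function names are illustrative and the values are not results from this study.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity (recall) and specificity from the counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical labels for ten children (illustration only).
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

counts = confusion_counts(actual, predicted)
metrics = classification_metrics(*counts)
print(counts, metrics)
```

In this toy example, accuracy is 0.7 while sensitivity and specificity differ (0.75 and about 0.67), which is why the three measures are reported separately.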
Cohen’s κ
The κ statistic (or value) is a metric that compares an Observed Accuracy with an Expected Accuracy (random chance). The κ statistic is used not only to evaluate a single classifier but also to compare several classifiers with one another. The calculation of the Observed Accuracy and Expected Accuracy, which is usually illustrated using a confusion matrix, is important for understanding the statistic. Landis and Koch(Reference Landis and Koch40) provide the following scale for interpreting the values of this statistic: 0 indicates no agreement, 0–0·20 slight, 0·21–0·40 fair, 0·41–0·60 moderate, 0·61–0·80 substantial and 0·81–1 almost perfect agreement.
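Total accuracy is simply the sum of true positives and true negatives, divided by the total number of items, that is: Accuracy = (TP + TN)/(TP + TN + FP + FN), where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively.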
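The κ statistic described above can be computed from the same four confusion-matrix counts using the standard definition κ = (observed accuracy − expected accuracy)/(1 − expected accuracy). The sketch below is illustrative: the counts are hypothetical, and mapping values at or below zero to 'no agreement' is an assumption made here to match the scale quoted in the text.

```python
def cohens_kappa(tp, tn, fp, fn):
    """Cohen's kappa for a two-by-two confusion matrix."""
    n = tp + tn + fp + fn
    observed = (tp + tn) / n
    # Chance agreement: product of marginal proportions for each class.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (observed - expected) / (1 - expected)


def landis_koch_label(kappa):
    """Label kappa on the Landis and Koch scale quoted in the text."""
    if kappa <= 0:
        return "no agreement"  # assumption: values <= 0 grouped together
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical counts (illustration only): accuracy 0.85, chance 0.50.
kappa = cohens_kappa(tp=40, tn=45, fp=5, fn=10)
print(round(kappa, 2), landis_koch_label(kappa))
```

Unlike raw accuracy, κ discounts the agreement that would be expected by chance given how often each class occurs and is predicted.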
Results
Descriptive results
Of the 9471 children below 5 years in the study sample, 38·4 % were reported to be stunted, 10 % were wasted and 23·3 % were underweight. Close to half of the children (46·6 %) experienced some form of malnutrition (were either stunted, wasted or underweight). About half of the children (50·4 %) were aged less than 30 months, and the majority (64·6 %) belonged to mothers aged less than 20. More than half of the children (51·9 %) were males. Two-thirds (67·2 %) of these children were born at home, with the remaining children (32·8 %) being born in health facilities. About 46 % of the children were from poor households, while 89 % resided in rural settings. The majority were at least third-order births (65·4 %) and were born after a birth interval of 2–4 years (55·8 %). Also, about 44 % of the children did not have access to an improved water source while about 91 % of them had no access to improved toilet facilities. Further, about 45 % of them were children of mothers with two children (parity 2) while 17·4 % of them were children of mothers with three or more children (Table not shown).
Spatial distribution of childhood undernutrition indicators
Figure 1 presents a visualisation of the spatial variations of the three childhood undernutrition outcomes. The results show considerable regional variations in stunting, wasting and underweight as measures of undernutrition in the country. It is visually clear that the Amhara, Benishangul-Gumuz, Affar and Dire Dawa regions were the most affected by stunting, with Gambella and Somali being the least affected regions. Wasting was most prevalent in the eastern part of the country, comprising the Somali and Affar regions, followed by Gambella and Benishangul-Gumuz, among others, in the west. The Amhara and Southern Nations, Nationalities and Peoples (SNNP) regions were, however, least affected by wasting. Underweight was most prevalent in the Affar region in the northeast and the Benishangul-Gumuz region in the western part of the country, and least prevalent in the Gambella region. Severe stunting, wasting and underweight showed similar patterns of variation, even though at comparatively lower levels (Fig. 2).
Predictive algorithms for child undernutrition indicators
Stunting
The under-five stunting prediction accuracy was found to be low for all algorithms, between 62·9 and 67·7 % on the test set, although the xgbTree had the highest overall accuracy (Table 1). The xgbTree had relatively higher sensitivity, meaning that it was accurate at distinguishing the stunted cases from the non-stunted cases, but had low specificity, meaning that it was not good at discerning the non-stunted cases. Other metrics show that the algorithm is relatively better at predicting both positive (stunted) and negative (non-stunted) cases. The algorithm was able to correctly identify 72 % of the stunted children, which suggests that it was relatively better at predicting the stunted cases. The GLM algorithm showed slightly lower accuracy (65·5 %) than the xgbTree but higher accuracy than the other ML algorithms (Table 1, Fig. 3).
Wasting
The under-five wasting prediction accuracy was again found to be highest for the xgbTree, with a slightly higher level of accuracy (88 %) (Table 1). Interestingly, all the selected algorithms showed more or less similar accuracy. The best-predicting algorithm (xgbTree) was able to correctly identify 88·2 % of the wasted cases, which indicates slightly lower power than the GLM algorithm in predicting the wasted cases. The GLM algorithm, however, showed a slightly lower overall accuracy (87·0 %) (Table 1, Fig. 4).
Underweight
As with stunting and wasting, the xgbTree algorithm was found to have the highest predictive ability (75·7 %), with a sensitivity of 77·5 % and specificity of 55·5 %. The k-NN algorithm showed the lowest performance, with accuracy, sensitivity and specificity of 73·0 %, 74·6 % and 57·1 %, respectively (Table 1, Fig. 5).
The important determinants of childhood undernutrition indicators
As described in the above section, the results indicated that the xgbTree algorithm was the best for all three outcomes (stunting, wasting and underweight) in terms of accuracy and the area under the receiver operating characteristic curve (AUC-ROC). Based on this most accurate algorithm, the top-20 important variables out of the thirty-seven variables used are presented according to their mean decrease in Gini (Figs 6–8).
Interestingly, the top five most important variables varied across the three indicators of undernutrition. For stunting, time to water source (time_to_water), child age 30+ months (child_age_Greater_than_30_months), number of under-five children (no_u5_children), television ownership (has_tv.yes) and small birth size (child_size.Small) were the top-five important variables. For wasting, child age 30+ months (child_age_Greater_than_30_months), poorest wealth status (wealth_index.poorest), time to the water source (time_to_water), Somali ethnicity (ethnicity.Somali) and small birth size (child_size.Small) were found to be the top-five important variables. Likewise, time to the water source (time_to_water), no maternal education (mother education0.Noeducation), small birth size (child_size.Small), child age 30+ months (child_age_Greater_than_30_months) and maternal underweight status (mother_bmi.Underweight) were shown to be the top-five important variables predicting childhood underweight status. Time to the water source, child age 30+ months and small birth size were the top-five variables common to all three outcomes.
Discussion
Our descriptive findings show that there are substantial variations in all three nutritional indicators (stunting, wasting and underweight) among the regions in Ethiopia. Stunting is most prevalent in northern regions such as Affar and Amhara, and in the western region of Benishangul-Gumuz, but least prevalent in Gambella and Somali in the south-west and south-east, respectively. For wasting, the prevalence is highest in the Somali region but lowest in the Amhara region. Also, underweight is most prevalent in the Affar region but least prevalent in Gambella. Evidence of similar geographical variability in stunting, wasting and underweight has been shown in Ethiopia(Reference Alemu, Ahmed and Yalew14). It has been shown that food diversity and the number of meals that children eat per day play a significant role in stunting and underweight, while food insecurity also has an important role to play in wasting(Reference Motbainor, Worku and Kumie41). Food insecurity is prevalent in regions such as Amhara, Affar and Tigray, and calorie intake per adult has been found to decrease in Benishangul-Gumuz and Amhara in recent years(42). Reductions in the number of meals per day have also been shown to be common in these regions, which are more frequently affected by drought and are targets of the Productive Safety Net Programme(Reference Endalew, Muche and Tadesse12,Reference Negash43), despite the observed positive effects of various policy interventions on food security in some regions(Reference Van der Veen and Tagel10). These considerable regional disparities in the nutrition indicators have profound implications for the nutritional status of under-five children in the country.
Regarding the predictive algorithms, the xgbTree algorithm appeared to have the highest predictive accuracy for all the undernutrition outcomes. It is noteworthy, however, that even though the traditional logistic regression algorithm (GLM) showed lower predictive accuracy compared to the xgbTree and the RF, the advantage it has over the others is that its results are readily interpretable in terms of the estimated predictors in the algorithm. A variety of ML approaches have also been applied to other health issues, including childhood anaemia(Reference Khan, Chowdhury and Islam44) and nutritional status(Reference Khare, Kavyashree and Gupta45), and have demonstrated high-quality and valid predictions.
Findings from the best-predicting algorithm (xgbTree) show that the key factors underlying undernutrition are diverse across the three indicators of undernutrition. Nevertheless, time to the water source, child age greater than 30 months and small birth size appear to be the most common important predictors across the three indicators. Water sources that can be accessed in a shorter time – such as pipe-borne water – are typically located within households and are usually better and safer for drinking and use. Hence, shorter or easier access to water sources has been shown to be associated with reduced risk of undernutrition, particularly wasting and stunting, among children(Reference Cardoso, Allwright and Salvucci46,Reference Kamiya47), while the source of drinking water is an important predictor of child nutritional status(Reference Habyarimana48). Furthermore, it appears that children who are 30 months old and beyond have an increased risk of all kinds of undernutrition outcomes, particularly stunting and wasting. The importance of a child’s age in predicting the undernutrition status of children is well documented in the literature(Reference Kamiya47–Reference Akombi, Agho and Merom50) and provides support for the findings of this study. The child’s size at birth also appears to play an important role in determining childhood nutritional status, with children of a small birth size being at a considerable disadvantage in undernutrition risk. Similar evidence of this effect has been shown in the literature(Reference Poda, Chien-Yeh and Chao49,Reference Aheto, Keegan and Taylor51,Reference Masibo52) and directly supports the findings of this study.
Furthermore, the number of under-five children in the household and television ownership have shown top-five importance for stunting alone but have rarely been documented by previous studies. Also, we find evidence of considerable disadvantage in wasting risk among children from poor households in Ethiopia. Much research in sub-Saharan Africa has shown that poor household wealth is significantly associated with child undernutrition(Reference Poda, Chien-Yeh and Chao49,Reference Akombi, Agho and Merom50,Reference van den Bold, Quisumbing and Gillespie53). As might be expected, poorer households may have difficulty providing the sufficient nutritious food that their under-five children need for growth and development. In this study, belonging to an ethnic minority such as the Somalis also emerges as one of the top five important factors for wasting risk alone, even though this has seldom been shown in the literature.
In addition, the findings show that a lack of maternal educational attainment confers increased risk of childhood underweight. Children of educated women have considerably reduced underweight risk(Reference Boah, Azupogo and Amporfro54), possibly because highly educated women are likely to have greater access to better employment opportunities, with better salaries and benefits that may help them to afford good nutrition for their children. This has crucial implications for child undernutrition and further underscores the need to increase women’s education to enhance child health outcomes in developing countries(Reference Gurung55). Further, we find that children of underweight mothers have a considerable disadvantage in underweight risk. This supports the findings of many studies, particularly in sub-Saharan Africa(Reference Poda, Chien-Yeh and Chao49,Reference Boah, Azupogo and Amporfro54). This may appear unsurprising, as under-five children are likely to be exposed to the same risk factors faced by their underweight mothers. The importance of the sex of children has also emerged in this study, with male children appearing to be more disadvantaged in undernutrition risk than females, which directly supports the extant literature in sub-Saharan Africa(Reference Abubakar, Uriyo and Msuya56,Reference Sulaiman, Bushara and Elmadhoun57). This may reflect culturally based differences in the treatment of the two sexes.
The findings of this study have implications for the relevance of ML algorithms in population health research. Similarly, several studies have confirmed the usefulness of ML for population health research and policy decision making in various areas including child undernutrition(Reference Kraamwinkel, Ekbrand and Davia26), women’s height(Reference Daoud, Kim and Subramanian58), CVD risks(Reference Manuel, Tuna and Bennett59) and mortality(Reference Allen, Mataraso and Siefkas60) as well as defining treatment effects in epidemiological studies(Reference Wiemken and Kelley61) which highlights how ML is increasingly being applied to predict population health outcomes(Reference Morgenstern, Buajitti and O’Neill62). These findings may also be useful in bias reduction(Reference Allen, Mataraso and Siefkas60) as ML methods can accurately quantify uncertainty when data are scarce, as can be found in sub-Saharan Africa.
This study has a few potential limitations. While algorithms with high representation power risk overfitting noisy training data, algorithms with lower power may suffer from underfitting and thus fail to capture the regularity in the training data set. Underfitting is usually caused by insufficient data or a high-bias algorithm (i.e. the algorithm being too simple to capture a complicated hypothesis function)(Reference Bagui, Fang and Kalaimannan63). In this study, the overall lower predictive ability observed, especially in the case of stunting, may reflect underfitting related to a small study sample size. In this situation, little can be done to improve predictive power, except to gather more data (more records, more features), for example by considering data from previous survey years (Ethiopian Demographic and Health Survey 2000–2016), and/or to switch algorithms. There is also a limitation in the interpretability of results. Unlike the traditional logistic regression algorithm (GLM), where the estimated population parameters are interpretable in terms of odds ratios and other parameters, results from the other ML algorithms are less interpretable because they do not yield directly interpretable parameters of this kind. Notwithstanding, ML algorithms have been widely touted for their predictive power, and this study provides a valuable contribution to the undernutrition literature in the context of ML.
Conclusions
This study shows considerable regional variations in childhood undernutrition and how commonly used ML algorithms could be applied to predicting child stunting, wasting and underweight determinants in Ethiopia. The findings show that the xgbTree algorithm offers better predictive accuracy than the traditional algorithm GLM. Furthermore, the best-predicting ML algorithm has shown diverse combinations of important predictors for stunting, wasting and underweight, even though there are a few common top-five predictors among them. The algorithms may, therefore, be useful to child nutrition and other population health researchers, and aid workers among other stakeholders, particularly where large data are available. The study, thus, provides evidence on how the ML approach can be leveraged to better predict the underlying risk factors of childhood undernutrition among other population health outcomes. This may create a better understanding of a child’s nutritional status and help to develop more effective policies to advance childhood nutritional status in the country. The findings reinforce the need for committed efforts to improve upon access to potable water supply and food security, as well as the socio-economic wellbeing of women in Ethiopia. There is also the need for policies and interventions to put special focus on children of small birth size, children who are over 30 months old and children of underweight mothers.
Acknowledgements
Acknowledgements: This study is based on data from the DHS Program. Financial support: No funding was received for this study. Conflict of interest: The authors declare that they have no conflict of interest. Authorship: F.H.B. conceived and designed the study. F.H.B. and C.S.S. performed the analysis with technical support from S.H.N. F.B. wrote the initial draft of the manuscript with technical support from S.H.N. and C.S.S. All authors critically reviewed the intellectual content of the manuscript and then approved the final version for submission. Ethics of human subject participation: Not applicable.
Supplementary material
For supplementary material accompanying this paper visit https://doi.org/10.1017/S1368980021004262