Regular, reliable information on the body weight (BW) of animals is essential for sustainable animal husbandry and breeding. Accurate BW determination or estimation enables more precise calculation of feed allocation, makes it easier to decide drug doses, and helps to identify the most appropriate slaughter time and likely market price (Tırınk et al., 2023a). Unfortunately, BW is rarely measured by small farmers because they lack weighscales (Lukuyu et al., 2016; Tebug et al., 2018). Of the various methods for measuring or estimating BW, the weighscale is the most accurate, but producers prefer it less because it is cumbersome, slow, expensive to implement and stressful for the animals (Wangchuk et al., 2018). Visual measurement techniques such as image analysis, on the other hand, require mathematical models to predict features such as BW and are currently applicable mainly in research studies (Stajnko et al., 2008; Altay and Delialioğlu, 2022; Coşkun et al., 2023a). Therefore, there is a need to develop other practical methods that are low-cost and easy for small farmers to apply (Dingwell et al., 2006; Oliveira et al., 2013).
Alternative methods do exist, based on biometric measures such as withers height (WH), heart girth (HG), hip width (HW), rump height (RH) and body length (BL) (Heinrichs et al., 1992; Dingwell et al., 2006; Lesosky et al., 2012; Bretschneider et al., 2014; Lukuyu et al., 2016; Herrera-López et al., 2018; Putra et al., 2020).
Milk production systems in the tropical regions of Mexico typically use crosses of Bos taurus and Bos indicus breeds, with forage as the sole or main source of feed for maintenance and milk production, and supply around 20% of the milk consumed in the country (Magaña et al., 2006; Rojo-Rubio et al., 2009; Román-Ponce et al., 2013).
Some studies have evaluated the relationship between biometric measures and BW in cross-bred cattle (Reis et al., 2008; Mota et al., 2013; Oliveira et al., 2013; Franco et al., 2017) as well as buffalo (Ramos-Zapata et al., 2023; Cruz-Tamayo et al., 2024), but models have not yet been developed for animals of this type under the conditions of the humid tropics of Mexico, nor have the available models been evaluated for local applicability.
Studies over the last two decades have predicted BW using multiple linear regression analysis. However, such regression analyses are often inadequate for prediction because of non-linearity (Ruchay et al., 2021). Various machine-learning approaches have been applied to predict the BW of cattle, sheep, camels and goats. A common finding of these studies is that machine-learning algorithms can capture linear or nonlinear relationships between BW and biometric traits accurately and reliably (Ruchay et al., 2021). However, studies reporting the prediction of BW of tropical dairy cows through machine-learning methods are limited. Therefore, this research aimed to model the relationship between BW and biometric measures in Holstein × Zebu crossbred cows using the multivariate adaptive regression splines (MARS) algorithm.
Material and methods
Data recording, study site, animals and handling
BW and biometric measurements were recorded in 157 crossbred dairy cows (Holstein × Zebu). The cows were between 3 and 6 years old and grazed paddocks of star grass (Cynodon nlemfuensis) and humidicola grass (Brachiaria humidicola), without supplementation. The data were collected on the commercial farm ‘Rancho la Esperanza’, located at 17°36′27″N, 93°11′35″W, 120 masl, at 10 km along the Juárez-Reforma road, in the municipality of Juárez, Chiapas, in southern Mexico.
Biometric measurements were expressed in cm and recorded as described by Oliveira et al. (2013) and Bretschneider et al. (2014). These were: heart girth (HG), withers height (WH), rump height (RH), hip width (HW), body length (BL) and diagonal body length (DBL). We used a flexible fibreglass tape (Truper®) and a large 65 cm caliper (Haglof®, Sweden). The animals were weighed on a fixed platform scale with a capacity of 2000 kg and an accuracy of 1 kg.
Statistical analysis
The multivariate adaptive regression splines (MARS) algorithm is a non-parametric regression procedure that can describe linear, nonlinear and interaction effects between the explanatory variables and the response within a cause-and-effect relationship. The most important advantage of this algorithm is that it does not need to meet the assumptions that the classical regression approach requires (Eyduran et al., 2019; Akin et al., 2020; Coşkun et al., 2023b; Tırınk et al., 2023b). The procedure generates basis functions in a stepwise manner, considering all possible interaction effects among candidate knots and explanatory variables (Arthur et al., 2020). The first stage is called the forward pass, and the second is the backward pass. In the forward pass, the algorithm starts with an intercept-only model and iteratively adds the basis functions that give the lowest training error. This process typically produces an over-fitted and overly complex model (Friedman, 1991; Eyduran et al., 2019). Although the model built in the forward pass fits the training data well, it may be unstable and generalize poorly to new data.
In the backward pass, the basis functions that contribute least to the fit of the model are removed one at a time, which resolves the overfitting problem (Zaborski et al., 2019; Arthur et al., 2020; Faraz et al., 2021). The MARS equation used to estimate BW from the explanatory variables can be given as:
(1)$$\hat{y} = \beta_0 + \sum_{m=1}^{M} \beta_m \prod_{k=1}^{K_m} h_{km}\left(X_{v(k,m)}\right)$$
where $\hat{y}$ is the predicted BW value, $\beta_0$ is the intercept of the model, $\beta_m$ is the coefficient of the mth basis function, M is the number of basis functions, $K_m$ is the parameter limiting the order of interaction, $h_{km}(X_{v(k,m)})$ is the basis function of the prediction model and v(k,m) is the index of the explanatory variable in the kth component of the mth product. Basis functions that reduce model performance after these two stages are eliminated by means of the generalized cross-validation (GCV) error, whose equation is given below (Eyduran et al., 2019; Zaborski et al., 2019; Çanga and Boğa, 2022):
(2)$${\rm GCV}(\lambda) = \frac{\sum_{i=1}^{n} (y_i-\hat{y}_i)^2}{\left(1-\frac{M(\lambda)}{n}\right)^2}$$
where n is the sample size of the training set, $y_i$ is the observed value of BW, $\hat{y}_i$ is the predicted value of BW and M(λ) is the penalty term that accounts for the complexity of a model containing λ terms.
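The hinge-shaped basis functions and the GCV criterion described above can be illustrated with a short Python sketch. It is not the authors' R workflow: the function names `hinge` and `gcv` are hypothetical, and the GCV form used here, the residual sum of squares divided by (1 − M(λ)/n)², is an assumption about the formula the text refers to.

```python
import numpy as np

def hinge(x, knot, direction=1):
    """MARS hinge basis function.

    direction=+1 gives max(0, x - knot); direction=-1 gives max(0, knot - x).
    """
    x = np.asarray(x, dtype=float)
    return np.maximum(0.0, direction * (x - knot))

def gcv(y, y_hat, penalty):
    """Assumed GCV form: RSS / (1 - M(lambda)/n)^2, with `penalty` = M(lambda)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = y.size
    rss = np.sum((y - y_hat) ** 2)
    return rss / (1.0 - penalty / n) ** 2

# Toy values (not study data): a pair of mirrored hinges around a knot at 187.
x_demo = np.array([185.0, 187.0, 190.0])
right = hinge(x_demo, 187.0, direction=1)    # active above the knot
left = hinge(x_demo, 187.0, direction=-1)    # active below the knot

# Toy GCV: a larger penalty inflates the error of a more complex model.
gcv_demo = gcv([1.0, 2.0, 3.0], [1.0, 2.0, 4.0], penalty=1)
```

A pair of mirrored hinges at the same knot lets the fitted slope differ on either side of the knot, which is how MARS represents a piecewise-linear response.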
Before the MARS procedure is applied, multicollinearity among the explanatory variables must be tested. The data were then divided into training and test sets in three proportions (80:20, 70:30 and 65:35). In the training process, 10-fold cross-validation was used to choose the best MARS model among the 180 MARS models tested (degree = 1:6 and nprune = 2:38). The goodness-of-fit criteria, whose equations are given below, were used to compare the performance of the models obtained from the MARS algorithm for the training and test sets at the different proportions (Grzesiak and Zaborski, 2012; Eyduran et al., 2019; Olfaz et al., 2019; Zaborski et al., 2019; Tırınk et al., 2023a, 2023b).
1. Pearson correlation coefficient (r):
(3)$$r = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum (x_i-\bar{x})^2 \sum (y_i-\bar{y})^2}}$$
2. Root-mean-square error (RMSE):
(4)$${\rm RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i-\hat{y}_i)^2}$$
3. Standard deviation ratio (SDR):
(5)$${\rm SD}_{\rm ratio} = \frac{s_m}{s_d}$$
4. Performance index (PI):
(6)$${\rm PI} = \frac{{\rm rRMSE}}{1+r}$$
5. Global relative approximation error (RAE):
(7)$${\rm RAE} = \sqrt{\frac{\sum_{i=1}^{n} (y_i-\hat{y}_i)^2}{\sum_{i=1}^{n} y_i^2}}$$
6. Mean absolute percentage error (MAPE):
(8)$${\rm MAPE} = \frac{1}{n}\sum_{i=1}^{n} \left\vert \frac{y_i-\hat{y}_i}{y_i} \right\vert \times 100$$
7. Akaike information criterion (AIC):
(9)$$\left\{ \begin{array}{ll} {\rm AIC} = n \ln\left[\dfrac{1}{n}\sum_{i=1}^{n} (y_i-\hat{y}_i)^2\right] + 2k, & \text{if } n/k > 40 \\ {\rm AIC}_c = {\rm AIC} + \dfrac{2k(k+1)}{n-k-1}, & \text{otherwise} \end{array} \right.$$
where n is the sample size of the training set, k is the number of explanatory variables in the model, $y_i$ is the observed value of BW, $\hat{y}_i$ is the predicted value of BW, $s_d$ is the standard deviation of BW and $s_m$ is the standard deviation of the errors of the model. The best model was defined as the one with the smallest AIC, RMSE, SDratio, MAPE, RAE, PI and CV values for both sets and the highest R² and r values (Tatliyer, 2020).
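The goodness-of-fit criteria listed above can be sketched in Python as follows. This is an illustration only, not the authors' R workflow (which used the ‘ehaGoF’ package). The function name `goodness_of_fit` and the toy numbers are hypothetical, rRMSE is assumed to be the RMSE expressed as a percentage of the mean observed BW, and standard deviations are assumed to be sample standard deviations.

```python
import numpy as np

def goodness_of_fit(y, y_hat, k):
    """Goodness-of-fit criteria for observed y and predicted y_hat.

    k is the number of explanatory variables in the model.
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = y.size
    resid = y - y_hat

    r = np.corrcoef(y, y_hat)[0, 1]                       # Pearson correlation
    rmse = np.sqrt(np.mean(resid ** 2))                   # root-mean-square error
    sd_ratio = np.std(resid, ddof=1) / np.std(y, ddof=1)  # s_m / s_d
    rrmse = 100.0 * rmse / np.mean(y)                     # assumed definition of rRMSE
    pi = rrmse / (1.0 + r)                                # performance index
    rae = np.sqrt(np.sum(resid ** 2) / np.sum(y ** 2))    # relative approximation error
    mape = 100.0 * np.mean(np.abs(resid / y))             # mean absolute percentage error

    aic = n * np.log(np.mean(resid ** 2)) + 2 * k
    if n / k <= 40:                                       # small-sample correction (AICc)
        aic += 2 * k * (k + 1) / (n - k - 1)

    return {"r": r, "RMSE": rmse, "SDR": sd_ratio, "PI": pi,
            "RAE": rae, "MAPE": mape, "AIC": aic}

# Toy BW values in kg (not study data).
observed = np.array([400.0, 450.0, 500.0, 550.0])
predicted = np.array([410.0, 440.0, 510.0, 540.0])
metrics = goodness_of_fit(observed, predicted, k=1)
```

Apart from r, lower values of every criterion returned here indicate a better fit, which matches the selection rule stated in the text.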
Descriptive statistical evaluation was carried out with the R software (R Core Team, 2018). Descriptive statistics for all variables were tabulated with the ‘psych’ package (Revelle, 2017). To show the relationship between explanatory and response variables, Pearson's correlation coefficients were calculated with the ‘PerformanceAnalytics’ package (Peterson and Carl, 2020). To test for multicollinearity, the variance inflation factor function of the ‘car’ package was used (Fox and Weisberg, 2019). The MARS algorithm was run with the ‘caret’ package for the different proportions (Kuhn, 2022). The performance criteria for all models were computed with the ‘ehaGoF’ package (Eyduran, 2020).
Results
Table 1 shows the calculated descriptive statistics for the data. The coefficient of variation (CV %) was calculated to be lower than 30% for all traits, meaning that the measured data were reliable for the data analysis.
BW, Body weight; HG, Heart girth; WH, Withers height; RH, Rump height; HW, Hip width; BL, Body length; DBL, diagonal body length.
Pearson's correlation analysis was performed to describe the relationship between explanatory and response variables. The estimated Pearson correlation coefficients and their significance levels are given in Figure 1. All correlation coefficients in Figure 1 were statistically significant. The greatest correlation coefficient was found between BW and HG. In addition, HW and WH had higher coefficients with BW than with each other. The lowest correlation coefficients with BW were found for BL (0.51), DBL (0.47) and RH (0.37). Although the correlation coefficients among the other variables were relatively low, the relationships were still positive.
Before implementing the MARS algorithm, multicollinearity was assessed. To this end, the variance inflation factors (VIF) of the explanatory variables were calculated. Values were 2.18, 2.16, 1.46, 1.80, 1.42 and 1.40 for HG, WH, RH, HW, BL and DBL, respectively. Since all values were below 10, there was no multicollinearity problem that would cause overfitting, and hence the MARS algorithm could be applied.
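As an illustration of the multicollinearity check, the sketch below computes variance inflation factors in Python. It relies on the standard identity that the VIF of each predictor equals the corresponding diagonal element of the inverse of the predictors' correlation matrix; this is an equivalent formulation, not the implementation in the ‘car’ package. The function name `vif` and the simulated data are hypothetical.

```python
import numpy as np

def vif(X):
    """Variance inflation factors for the columns of X (rows = animals).

    VIF_j = 1 / (1 - R_j^2), obtained here as the j-th diagonal element
    of the inverse correlation matrix of the predictors.
    """
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Simulated predictors (not study data): b is correlated with a, c is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = a + rng.normal(scale=0.5, size=200)
c = rng.normal(size=200)
X_demo = np.column_stack([a, b, c])

vif_demo = vif(X_demo)
# Rule of thumb used in the text: values below 10 indicate no serious multicollinearity.
no_problem = bool(np.all(vif_demo < 10))
```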
To compare the models obtained through the MARS algorithm, model comparison criteria were applied for the different proportions of the training and test sets, as shown in Table 2. The model evaluation criteria showed that the greatest predictive power was obtained for the 80:20 training/test proportion, which had the lowest AIC values for both sets. The R² and r values for this model were 0.836 and 0.711 for the training and test sets, respectively. In addition, relative importance values were calculated for all proportions, as shown in Table 3. HG was the most influential variable for determining BW in all proportions.
HG, heart girth; WH, withers height; HW, hip width; DBL, diagonal body length.
The best predictive model was provided by the 80:20 proportion. Equation (10) shows that the BW can be described with the five basis functions in the MARS prediction model.
Accordingly, the first term of the selected MARS prediction model was an intercept with a coefficient of 476.671. The second term was for HG, with a cutpoint of 187 cm and a negative coefficient of −6.183. The third term (HG − 187) also had a cutpoint of 187 cm, with a coefficient of 4.249. The fourth term, which was the third basis function, was for HW, with a cutpoint of 50 cm and a coefficient of 5.929; that is, each 1 cm change in HW relative to the 50 cm cutpoint changed the predicted BW by 5.929 kg. The fifth term was for DBL, with a cutpoint of 116 cm and a coefficient of 5.472.
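As a rough illustration of how the terms described above combine into a prediction, the Python sketch below evaluates the model from the reported cutpoints and coefficients. The function name `predict_bw` is hypothetical. The direction of each hinge is an assumption: only the (HG − 187) term is written out in the text, so the second term is assumed to be max(0, 187 − HG) and the HW and DBL terms are assumed to be active above their cutpoints.

```python
def predict_bw(hg, hw, dbl):
    """Predicted body weight (kg) from heart girth, hip width and diagonal
    body length (cm), using the coefficients reported in the text.

    Hinge directions are assumptions, except for the (HG - 187) term.
    """
    def hinge(value):
        return max(0.0, value)

    return (476.671
            - 6.183 * hinge(187 - hg)     # assumed: active below the HG cutpoint
            + 4.249 * hinge(hg - 187)     # active above the HG cutpoint
            + 5.929 * hinge(hw - 50)      # assumed: active above the HW cutpoint
            + 5.472 * hinge(dbl - 116))   # assumed: active above the DBL cutpoint

# At the cutpoints every hinge is zero, so the prediction equals the intercept.
bw_at_knots = predict_bw(187, 50, 116)

# Hypothetical animal: HG 190 cm, HW 55 cm, DBL 120 cm.
bw_example = predict_bw(190, 55, 120)
```

Under these assumptions, a cow measured exactly at the three cutpoints is predicted to weigh 476.671 kg, and each centimetre above a cutpoint adds the corresponding coefficient.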
Discussion
The dearth of studies on predicting BW from biometric measurements in Holstein × Zebu crossbred cattle is a challenge for researchers and producers alike. Only a few studies have focused on this issue, with most of the research conducted on Holstein or Zebu breeds. Nevertheless, some studies do exist that use some of the same biometric measurements as ours (HG, WH, HW, BL) (Reis et al., 2008; Mota et al., 2013; Oliveira et al., 2013; Franco et al., 2017). Our correlation coefficients among biometric measurements were lower than those reported by Putra et al. (2020) for Pasundan cows, whilst in our data the BW predictive power of WH was higher, but that of BL and HG was lower, than in Ongole cows (Putra, 2020). Bene et al. (2007) compared different beef cattle breeds. Our correlation between BW and WH was higher than theirs for Angus, Hereford and Hungarian Simmental, whereas in our data RH was less useful as a predictor. We obtained correlations between BW and BL similar to those they reported (Bene et al., 2007). Our best correlation (HG) was lower than that reported by Kashoma et al. (2011) for Tanzanian shorthorn Zebu cattle. In another study using the same crossbred type of cattle as ours, Mota et al. (2013) found high correlation coefficients between BW and HG, hip height and rump height. On the other hand, our data may be more reliable since we used far more cattle (156 compared to 24).
Reis et al. (2008) report that the accuracy of estimating BW can be affected by breed, age, body size, body condition and physiological state. Franco et al. (2017) reported an R² of 0.83 between BW and HW in Holstein crossbred heifers. These authors conclude that although HW was highly correlated with BW, it showed a low R² with a high coefficient of variation when compared with other variables such as body length, hip height and rump height. Using HG and WH to estimate the BW of dairy cows in low-input systems in Senegal, Tebug et al. (2018) reported that R² varied from 0.77 to 0.94; they also reported that the RMSE of the developed models corresponded to 9.4 to 12.33% (29.27 to 39.24 kg) of the average BW of animals. Also, Bretschneider et al. (2014) determined that the RMSE of their model was 5.8% of the average BW (15.95 kg). Mota et al. (2013) concluded that the correlations between measurements and body development of heifers with different parentages are distinct, and that specific equations are necessary for predicting body weight. Additionally, Tedde et al. (2021) indicated that estimating BW through biometric measurements can be approached as a regression problem, where the input features are the body measurements and the target value is the BW that the regression model predicts.
Ruchay et al. (2021) stated that the random forest regression algorithm, one of the machine learning methods, was the most effective algorithm for predicting the BW of Hereford cows and may be more effective than traditional models. Similarly, Dang et al. (2022) determined that the light generalized boosted regression tree-based model had the best performance, and intended to use these findings to develop a method for indirectly estimating the live weight of Hanwoo cows using machine vision technology that measures ten different body features. Celik (2019) compared MARS, Chi-square automatic interaction detection (CHAID), exhaustive-CHAID and classification and regression tree (CART) algorithms for predicting BW in Pakistani goats. Goodness-of-fit criteria were used to evaluate model performance, and the model obtained with the MARS procedure was reported to be the most reliable among the models studied. Our evaluation criteria yielded similar results to theirs. Canga (2022) used MARS to predict hot carcass weight from several features, and once again the MARS algorithm gave results similar to those of the current study in terms of the model comparison criteria.
We can conclude that the use of statistical procedures for predicting BW from biometric measurements is an important and useful tool that can be applied to crossbred cattle. They are relatively easy to use and require minimal effort. The MARS algorithm is particularly useful. It is a non-parametric approach, which allows for the incorporation of non-linear relationships between the independent and dependent variables. This makes it especially useful for predicting body weight from body measurements, as these can often be subject to non-linear relationships. The MARS algorithm also provides an efficient and accurate way to construct prediction models, without the need to perform many variable transformations. Furthermore, MARS allows for the incorporation of both continuous and categorical variables, thus making it an ideal method for predicting BW from biometric measurements.