Introduction
Turfgrass is widely grown in urban landscapes, including athletic fields; commercial and residential lawns; golf courses; roadsides; and parks (Pincetl et al. 2019). Turf provides various benefits, including evaporative cooling in urban areas, soil remediation, absorption of atmospheric pollutants, and beautification of residential and nonresidential landscapes (El-Haggar and Samaha 2019; Stier et al. 2013). Nevertheless, weed competition is a severe constraint in turf management. Weeds compete with turfgrasses for environmental resources such as sunlight, moisture, and soil nutrients (Hamuda et al. 2016; Liu and Bruch 2020), reducing turf aesthetics and functionality (Monteiro 2017; Pincetl et al. 2019). Weed management in turfgrass landscapes has traditionally relied heavily on broadcast herbicide application (McCullough et al. 2015; McElroy and Martins 2013), although weeds almost always occur in nonuniform, patchy distributions (Dai et al. 2019; Yu et al. 2019a); as a result, herbicide is applied to areas where weeds do not occur. The excessive use of synthetic herbicides poses a potential risk to human health and may result in environmental pollution (Alengebawy et al. 2021; Hasanuzzaman et al. 2020; Mennan et al. 2020; Yu et al. 2019b).
For example, atrazine, a photosystem II inhibitor, is commonly used in warm-season turfgrasses, yet it is frequently detected in groundwater (Yu and McCullough 2016). Consequently, it has been classified as a restricted-use pesticide in the United States (USEPA 2023). Manual spot spraying of herbicide can reduce herbicide input but is time-consuming and labor-intensive, and thus impractical for large landscape areas (Kakarla et al. 2022).
Machine vision–based precision herbicide application technologies offer a viable way to minimize herbicide use and weed control costs (Jin et al. 2022b; Partel et al. 2019; Shuping et al. 2023; Upadhyay et al. 2024a, 2024b). Traditional machine learning methods analyze plant imagery using features such as color (Tang et al. 2016), morphology (Perez et al. 2000), texture (Bakhshipour et al. 2017), and hyper- or multispectral signatures (Jiang et al. 2020; Pantazi et al. 2016) to identify target weeds or to distinguish crops from weeds. Nevertheless, detecting and differentiating weeds within crops is inherently difficult because of their similarity in color and morphology (Al-Badri et al. 2022; Hasan et al. 2021).
In recent years, improvements in graphics processing unit (GPU) computing capabilities have greatly advanced the development of deep convolutional neural networks (DCNNs) (Krichen 2023; Tulbure et al. 2022). Many innovations, in areas such as activation functions, parameter optimization, model size, and inference architecture, have been explored to further enhance DCNN performance (Khanam et al. 2024). DCNNs have shown impressive capabilities for weed detection in turf environments (Jin et al. 2024; Xie et al. 2021; Yu et al. 2019b). For instance, previous research demonstrated the efficacy of an object detection network (DetectNet) and image classification neural networks (including AlexNet, GoogLeNet, and VGGNet) for detecting weeds in bermudagrass [Cynodon dactylon (L.) Pers.] and perennial ryegrass (Lolium perenne L.) turfgrasses. The findings highlighted that image classification neural networks excelled at detecting images containing broadleaf and grassy weeds within turfgrass (Jin et al. 2022a; Yu et al. 2019a, 2019c, 2020). Nevertheless, deep learning–based methods for weed detection in turf gain accuracy at the expense of increased computational load and reduced detection speed, which limits their practical application. Many studies have demonstrated that models developed on high-performance computers often have excessive parameters, which hinders efficient inference on terminal devices (Chen and Ran 2019; El-Rashidy et al. 2020; Shakarami et al. 2021; Yang et al. 2022). Consequently, it remains challenging to develop approaches for identifying weeds in turf that balance model accuracy with real-time detection.
Knowledge distillation is a neural network compression technique that aims to reduce network size while maintaining or even enhancing performance (Hinton et al. 2015). In knowledge distillation, a small student model is typically supervised by a large teacher model (Ba and Caruana 2014; Buciluǎ et al. 2006; Hinton et al. 2015; Urban et al. 2017). A knowledge distillation system consists of three fundamental components: the knowledge, the distillation algorithm, and the teacher–student architecture. The student model learns to emulate the teacher model, achieving competitive or even superior performance. Existing distillation algorithms typically use a fixed temperature hyperparameter in the softmax layer to control the smoothness of the output distribution and, thereby, the difficulty of the loss minimization process (Li et al. 2023). In recent years, there has been growing research on knowledge distillation in agriculture, for applications such as crop segmentation (Angarano et al. 2023), fruit and vegetable defect detection (Cai et al. 2024; Nithya et al. 2022; Zhou et al. 2023), and plant leaf segmentation (Jung et al. 2022).
Knowledge distillation offers a method for converting complex weed detection models into lightweight versions, facilitating deployment on resource-constrained mobile or embedded devices without sacrificing performance. This research hypothesized that applying knowledge distillation to turf weed detection could enhance weed detection performance while optimizing the use of limited time and computational resources, thereby improving the efficiency of developing effective neural network models. This approach shows significant promise for smart weeding robots, boosting their capability for real-time and precise herbicide application. A key factor in knowledge distillation is the hyperparameter temperature (T), which plays a crucial role in balancing the knowledge transfer between the teacher and student models. Therefore, the objectives of this research were to (1) assess the performance of three teacher models in detecting weeds in turf across datasets of different scales, (2) compare the results of three student models at different temperatures after knowledge distillation to determine their respective optimal temperatures, and (3) evaluate three student models individually at their respective optimal temperatures to identify the most suitable model for practical application.
Materials and Methods
Dataset
The experimental images in this research were captured at different times in various turf landscapes containing diverse weed species. Some images were captured in spring 2021 with a Panasonic® digital camera (model DMC-ZS110) at two locations in China: sod farms in Jiangning District, Nanjing City, Jiangsu Province (31.95°N, 118.85°E) and sod farms in Shuyang, Jiangsu Province (34.12°N, 118.79°E). Others were obtained in autumn 2018 with a SONY® Cyber-Shot Digital Still Camera (model DSC-HX1) at two locations in the United States: the University of Georgia Griffin Campus in Griffin, GA, USA (33.26°N, 84.28°W), and multiple golf courses in Peachtree City, GA, USA (33.39°N, 84.59°W). The turf species at these locations was bermudagrass, and the most commonly observed weed species were dallisgrass (Paspalum dilatatum Poir.), dandelion (Taraxacum officinale F.H. Wigg. ssp. officinale), doveweed [Murdannia nudiflora (L.) Brenan], Florida pusley (Richardia scabra L.), lawn pennywort (Hydrocotyle sibthorpioides Lam.), old world diamond flower (Oldenlandia corymbosa L.), purple nutsedge (Cyperus rotundus L.), smooth crabgrass [Digitaria ischaemum (Schreb.) Schreb. ex Muhl.], and white clover (Trifolium repens L.). The camera was set to automatic mode for exposure, focus, and white balance. Images were captured at a height that resulted in a ground-sampling distance of 0.05 cm pixel−1 under varying lighting conditions, including clear, cloudy, and partially cloudy weather. All images were taken at a 16:9 aspect ratio with a resolution of 1,920 by 1,080 pixels.
The raw images were initially cropped into sub-images of 240 by 240 pixels using IrfanView (v. 5.50, Irfan Skiljan, Jajce, Bosnia). As shown in Figure 1, each resulting image block was then assigned to one of two classes: “weed,” representing sub-images containing weeds, and “turf,” representing sub-images without weeds. As shown in Table 1, 5,000 images per class were selected to create the small training dataset D-10k, and each class was expanded to 10,000 images for the large training dataset D-20k. In addition, 500 images per class were set aside for the validation dataset, and another 500 images per class were allocated for the testing dataset.
a Images at a resolution of 240 × 240 pixels were used for training, validation, and testing.
b D-20k indicates the training dataset with 10,000 “Turf” class images and 10,000 “Weed” class images. D-10k indicates the training dataset with 5,000 “Turf” class images and 5,000 “Weed” class images.
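The text does not describe how each 1,920 × 1,080 frame was divided into 240 × 240 sub-images. As a minimal sketch, assuming non-overlapping crops that start at the top-left corner and discard partial tiles at the edges (both assumptions), the crop boxes could be generated as follows. The hypothetical helper `tile_boxes` returns boxes in the (left, upper, right, lower) order accepted by Pillow's `Image.crop`.

```python
def tile_boxes(width, height, tile=240):
    """Return (left, upper, right, lower) boxes for non-overlapping tile x tile crops.

    Assumption: partial tiles at the right and bottom edges are discarded.
    """
    boxes = []
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            boxes.append((left, top, left + tile, top + tile))
    return boxes


boxes = tile_boxes(1920, 1080)
print(len(boxes))  # 8 columns x 4 full rows = 32 tiles
print(boxes[-1])   # (1680, 720, 1920, 960)
```

Under these assumptions each frame yields 32 tiles; because 1,080 / 240 = 4.5, the bottom 120-pixel strip would be dropped.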
Neural Network Models
In the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2015, the winning network, ResNet (He et al. 2016), introduced “residual blocks” to address vanishing gradients and the degradation of training accuracy in very deep neural networks. Instead of directly learning the complete underlying mapping H(x) from inputs to outputs, each block learns only the residual F(x) = H(x) − x and outputs F(x) + x through an identity shortcut connection. Because a block can approximate the identity simply by driving F(x) toward zero, this reformulation makes very deep networks substantially easier to train.
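To make the residual idea concrete, the following toy sketch implements out = ReLU(F(x) + x) with F(x) = W2 ReLU(W1 x). It is an illustration only: the helper names are hypothetical, plain Python lists stand in for convolutions, and batch normalization is omitted. With all residual weights at zero, the block reduces to the identity for non-negative inputs.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def linear(weights, v):
    """Matrix-vector product W @ v, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """Toy residual block: out = relu(F(x) + x), where F(x) = W2 relu(W1 x)."""
    fx = linear(w2, relu(linear(w1, x)))
    return relu([f + xi for f, xi in zip(fx, x)])  # identity shortcut adds x back


x = [0.5, 1.0, 2.0]
zeros = [[0.0] * 3 for _ in range(3)]
print(residual_block(x, zeros, zeros))  # [0.5, 1.0, 2.0]: zero residual gives identity
```

The design point is that the shortcut carries x through unchanged, so the layers only have to learn a correction to it.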
DenseNet (Huang et al. 2017) shares the goal of easing the training of deep neural networks through shortcut connections, but it emphasizes “dense connectivity”: within each dense block, every layer receives the concatenated feature maps of all preceding layers as input and passes its own feature maps to all subsequent layers. This design promotes extensive feature reuse and improves feature propagation and gradient flow throughout the model, contributing to more effective training of deep networks.
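As an illustrative calculation (not part of the study), the channel bookkeeping of dense connectivity can be sketched in Python. The sketch assumes the standard DenseNet-BC settings described by Huang et al. (2017) and used in common implementations: growth rate k = 32, 64 channels entering the first dense block, block layouts of (6, 12, 24, 16) for DenseNet121 and (6, 12, 48, 32) for DenseNet201, and transition layers that halve the channel count. The helper names are hypothetical.

```python
def dense_block_channels(in_channels, num_layers, growth_rate=32):
    """Input channels seen by each layer of a dense block, and the block's output channels."""
    per_layer_inputs = [in_channels + growth_rate * l for l in range(num_layers)]
    out_channels = in_channels + growth_rate * num_layers  # all feature maps are concatenated
    return per_layer_inputs, out_channels


def densenet_final_channels(block_config, growth_rate=32, init_channels=64, compression=0.5):
    channels = init_channels
    for i, num_layers in enumerate(block_config):
        _, channels = dense_block_channels(channels, num_layers, growth_rate)
        if i != len(block_config) - 1:  # a transition layer follows every block except the last
            channels = int(channels * compression)
    return channels


print(densenet_final_channels((6, 12, 24, 16)))  # DenseNet121 -> 1024
print(densenet_final_channels((6, 12, 48, 32)))  # DenseNet201 -> 1920
```

Because each layer only adds k new channels, later layers stay narrow while still seeing every earlier feature map.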
EfficientNet (Tan and Le 2019) is a family of eight convolutional neural network models, B0 to B7. EfficientNet achieves high efficiency by scaling network depth, width, and input resolution together with a single compound coefficient, rather than scaling one dimension at a time, while keeping the model size small. Compound scaling first uses a grid search to determine the relationship among the scaling dimensions of the baseline network under a fixed resource constraint. The resulting scaling factors are then applied to scale the baseline network up to the target network. The primary building block of EfficientNet is the MBConv module, an inverted bottleneck that first expands and then compresses the channels and uses depth-wise separable convolutions to reduce the number of parameters efficiently.
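The compound scaling rule can be sketched numerically. Tan and Le (2019) scale depth by α^φ, width by β^φ, and resolution by γ^φ under the constraint α·β²·γ² ≈ 2, so that FLOPs grow by roughly 2^φ, and they report α = 1.2, β = 1.1, and γ = 1.15 for the B0 baseline. The sketch below applies these values with a hypothetical helper; the released B1 to B7 configurations use their own rounded width, depth, and resolution settings, so the printed numbers only approximate them.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # coefficients reported for the EfficientNet-B0 baseline


def compound_scale(phi, base_resolution=224):
    """Depth, width, and resolution multipliers for compound coefficient phi."""
    depth = ALPHA ** phi  # multiplier on the number of layers
    width = BETA ** phi   # multiplier on the number of channels
    resolution = base_resolution * GAMMA ** phi
    flops_factor = (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi  # FLOPs scale ~ d * w^2 * r^2
    return depth, width, resolution, flops_factor


for phi in range(6):
    d, w, r, f = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution {r:.0f}px, FLOPs x{f:.1f}")
```

Width and resolution enter the FLOPs count squared, which is why their coefficients are smaller than the depth coefficient.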
Knowledge Distillation
Complex models have large parameter spaces, which enhance their performance and generalization capabilities. Knowledge distillation (Hinton et al. 2012) leverages the knowledge acquired by a complex model to guide the training of a smaller model, compensating for the limited expressive capacity of the latter and thereby improving its performance.
Assuming a reliable teacher model is available, the probabilities that the teacher model outputs for each category serve as “soft labels” for training the student model, whereas the actual image labels are the “hard labels.” Classification models typically use a softmax layer to compute the probability of each output category, as follows, where $q_i$ is the output probability of class $i$ and $z_i$ is the output logit of class $i$:
$$q_i = \frac{\exp(z_i)}{\sum_j \exp(z_j)} \tag{1}$$
Using the teacher model's softmax output directly as the soft label is not practical. When the softmax output distribution has low entropy, the probabilities of the negative (incorrect) classes are close to 0, so their contribution to the loss function becomes negligible. Therefore, a “temperature” variable is introduced, and the softmax is computed as follows, where $T$ is the temperature:
$$q_i = \frac{\exp(z_i/T)}{\sum_j \exp(z_j/T)} \tag{2}$$
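After the temperature factor T is introduced, the soft targets produced by the softmax classifier better preserve the relative probabilities among classes: higher values of T yield a smoother distribution that reveals how the teacher ranks the incorrect classes, and T = 1 recovers the standard softmax.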
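A minimal sketch of the temperature-scaled softmax in plain Python shows how raising T flattens a confident two-class prediction. The helper name `softmax` and the logits are hypothetical; at T = 1 the function is the ordinary softmax.

```python
import math


def softmax(logits, T=1.0):
    """Temperature-scaled softmax; T = 1 gives the standard softmax."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtracting the max logit avoids overflow and leaves the result unchanged
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


teacher_logits = [6.0, 1.0]  # hypothetical logits for ("turf", "weed")
for T in (1, 2, 5):
    print(T, [round(p, 3) for p in softmax(teacher_logits, T)])
# 1 [0.993, 0.007]
# 2 [0.924, 0.076]
# 5 [0.731, 0.269]
```

At T = 1 the "weed" probability is nearly zero and contributes little to a cross-entropy loss; at T = 5 it carries visible information about how the teacher relates the two classes.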
Knowledge distillation requires both a teacher model and a student model, and the final loss is a linear weighting of two cross-entropy terms: a hard loss between the student's predictions and the true labels, and a soft loss between the student's predictions and the teacher's soft labels. The training process is depicted in Figure 2. The soft loss can mitigate the overfitting of student models to the hard labels (Cho and Hariharan 2019). The final loss function is given as Equation 3, where $p_j$ is the output of the teacher model, $p_i$ is the output of the student model, $y$ is the true label, CE is the cross-entropy function, and $\lambda$ is the hyperparameter that weights the two loss terms.
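A minimal sketch of the combined loss, assuming the common form in which λ weights the hard-label term and (1 − λ) weights the soft-label term, is

$$\mathcal{L} = \lambda\,\mathrm{CE}(y, p_i) + (1 - \lambda)\,\mathrm{CE}(p_j, p_i)$$

where $p_i$ and $p_j$ are computed with the temperature-scaled softmax. Which term λ multiplies is an assumption here. Hinton et al. (2015) also multiply the soft term by T², exposed below as the optional `scale_t2` flag; whether the authors did so is not stated. The helper names are hypothetical, and the hard term uses the student's T = 1 probabilities, which coincide with $p_i$ at the optimal T = 1.

```python
import math


def softmax(logits, T=1.0):
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, pred, eps=1e-12):
    """CE(target, pred) = -sum_k target_k * log(pred_k)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, pred))


def distillation_loss(student_logits, teacher_logits, label, T=1.0, lam=0.5, scale_t2=False):
    """Hypothetical form of the loss: lam * CE(y, p_i) + (1 - lam) * CE(p_j, p_i)."""
    y = [1.0 if k == label else 0.0 for k in range(len(student_logits))]  # one-hot hard label
    hard_loss = cross_entropy(y, softmax(student_logits, 1.0))
    p_j = softmax(teacher_logits, T)  # teacher soft labels
    p_i = softmax(student_logits, T)  # student soft predictions
    soft_loss = cross_entropy(p_j, p_i)
    if scale_t2:
        soft_loss *= T ** 2  # gradient-magnitude correction suggested by Hinton et al. (2015)
    return lam * hard_loss + (1.0 - lam) * soft_loss


# Hypothetical logits for classes ("turf", "weed"); the true label 1 means "weed"
print(round(distillation_loss([0.5, 1.5], [-2.0, 3.0], label=1, T=1.0, lam=0.5), 4))
```

The soft term is smallest when the student's distribution matches the teacher's, so minimizing it pulls the student toward the teacher while the hard term keeps it anchored to the true labels.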
Experimental Environment and Procedure
This study examined three teacher models, ResNet101, DenseNet201, and EfficientNetB5, and three student models, ResNet18, DenseNet121, and EfficientNetB0. The student models have shallower and less complex architectures than the teacher models. Taking ResNet as an example, the number following the model name indicates its depth: ResNet101 has 101 weighted layers (convolutional plus fully connected), whereas ResNet18 has only 18 (He et al. 2016).
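The depth numbers can be checked with a short worked count. The sketch assumes the standard stage layouts from He et al. (2016): ResNet18 stacks [2, 2, 2, 2] basic blocks of two 3 × 3 convolutions, ResNet101 stacks [3, 4, 23, 3] bottleneck blocks of three convolutions, and by convention the 1 × 1 projection shortcuts are not counted. The helper name is hypothetical.

```python
def resnet_depth(blocks_per_stage, layers_per_block):
    """Weighted layers = stem convolution + convolutions in residual blocks + final fully connected layer."""
    return 1 + layers_per_block * sum(blocks_per_stage) + 1


print(resnet_depth([2, 2, 2, 2], 2))   # ResNet18: 1 + 2 * 8 + 1 = 18
print(resnet_depth([3, 4, 23, 3], 3))  # ResNet101: 1 + 3 * 33 + 1 = 101
```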
In this study, a total of 48 image classification neural networks were trained and tested, comprising 8 teacher models and 40 student models with varying temperatures and structures. First, teacher-model weights pretrained on the ImageNet dataset (Deng et al. 2009) were loaded into the corresponding model architectures for transfer learning. The fully connected output layer of each of the three teacher models was adapted for binary classification, and the models were fine-tuned. To assess their performance, each teacher model was trained separately on two datasets of different sizes: the larger dataset, D-20k, and the smaller dataset, D-10k. This allowed a comparative analysis of model performance across datasets of different scales. Subsequently, knowledge distillation was used to transfer the acquired knowledge to lightweight student models. Validation accuracy and model stability were compared across temperature settings to determine the optimal teacher and student models. The training hyperparameters for the different experimental setups are presented in Table 2. All models were trained and tested with the open-source PyTorch deep learning framework (v. 1.8.1, Facebook, San Jose, CA, USA) on a workstation equipped with an NVIDIA GeForce RTX 3080 Ti GPU and 64 GB of memory.
a SGD, stochastic gradient descent.
Evaluation
For both teacher and student image classification neural networks, the results were organized in a binary confusion matrix with four outcomes: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). Here, the weed-free class is treated as positive: TP is the number of weed-free samples correctly predicted as weed-free, and TN is the number of weed-infested samples correctly predicted as containing weeds. TP and TN therefore reflect correct recognition of the weed condition. Conversely, FP is the number of weed-infested samples incorrectly predicted as weed-free, and FN is the number of weed-free samples incorrectly predicted as containing weeds; FP and FN thus capture prediction errors in recognizing the weed condition. The performance of the neural networks was evaluated from the confusion matrices using accuracy (ACC), precision, recall, F1 score, and the Matthews’ correlation coefficient (MCC) (Sokolova and Lapalme 2009).
ACC measures the proportion of correctly classified samples in a given dataset and was calculated as follows:
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \tag{4}$$
Precision measures the proportion of samples predicted as positive that are truly positive and was computed as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{5}$$
Recall measures the proportion of truly positive samples that the neural network correctly identifies and was calculated as follows:
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{6}$$
F1 score measures the overall performance of the neural network as the harmonic mean of precision and recall and was determined as follows:
$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{7}$$
MCC quantifies the quality of a predictive model's classifications. It provides a more balanced assessment, with values between −1 and 1: a score of 1 indicates perfect prediction, 0 indicates performance no better than random prediction, and −1 indicates total disagreement between predictions and actual outcomes. It was calculated as follows:
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{8}$$
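The five metrics can be computed together from one confusion matrix. A minimal sketch with a hypothetical helper `binary_metrics`, treating the weed-free class as positive; returning 0.0 when a denominator is zero is an assumption, since the text does not say how degenerate cases were handled. The counts in the usage example are hypothetical, not results from this study.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """ACC, precision, recall, F1 score, and MCC from binary confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"acc": acc, "precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical counts for a 1,000-image test set with 500 images per class
metrics = binary_metrics(tp=495, tn=487, fp=13, fn=5)
print({name: round(value, 3) for name, value in metrics.items()})
```

MCC uses all four cells of the matrix, so unlike precision or recall alone it drops sharply if either class is systematically misclassified.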
Furthermore, frames per second (FPS) is a widely used speed metric in computer graphics and vision; here it denotes the number of images that a neural network model processes and classifies in one second (Stewart et al. 2021). Higher FPS values indicate faster image classification and, therefore, stronger real-time processing performance.
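The timing protocol used in the study (batch size, warm-up, and preprocessing) is not described. As a minimal sketch under those unknowns, FPS can be estimated by timing repeated single-image predictions; `predict` is a hypothetical stand-in for a model's forward pass. When timing a PyTorch model on a GPU, `torch.cuda.synchronize()` should be called before reading the clock, because CUDA kernels run asynchronously.

```python
import time


def measure_fps(predict, images, warmup=10):
    """Estimate frames per second for a per-image prediction function."""
    for img in images[:warmup]:  # warm-up runs are excluded from the timing
        predict(img)
    start = time.perf_counter()
    for img in images:
        predict(img)
    elapsed = time.perf_counter() - start
    return len(images) / elapsed


# Hypothetical stand-in "model" that sums the pixel values of a 240 x 240 tile
tiles = [[1.0] * (240 * 240) for _ in range(20)]
fps = measure_fps(lambda tile: sum(tile), tiles)
print(f"{fps:.1f} FPS")
```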
Finally, model size was utilized as an important metric for comparing the parameter scale and complexity level of the teacher and student models.
Results and Discussion
Teacher Model Performance
In the present study, the three teacher models were trained on the D-20k and D-10k datasets, and their weed detection performance was evaluated on the same validation and testing datasets, as shown in Tables 3 and 4. Additionally, Figure 3 shows the confusion matrices of the teacher models on the testing dataset, providing a more detailed view of their classification outcomes. In general, the weed detection neural networks performed slightly better on the testing dataset than on the validation dataset. Specifically, after training on D-20k, the accuracies of ResNet101, EfficientNetB5, and DenseNet201 on the testing dataset increased by 0.8%, 0.4%, and 0.8%, respectively. Similarly, after training on D-10k, their accuracies increased by 1.3%, 0.6%, and 0.7%, respectively. All three teacher models performed consistently well, achieving ACC values of 0.974 or higher in distinguishing turf from weeds on both training datasets. This outcome could be attributed to the characteristics of the dataset: empirical observations suggested that even D-10k contained a sufficiently diverse set of images for the models to adapt effectively to the data.
a D-20k indicates the training dataset with 10,000 “Turf” class images and 10,000 “Weed” class images. D-10k indicates the training dataset with 5,000 “Turf” class images and 5,000 “Weed” class images.
b ACC, accuracy.
c MCC, Matthews’ correlation coefficient.
d FPS, frames per second.
For the teacher model ResNet101, training on the two datasets resulted in no significant differences in performance: its ACC, precision, recall, F1 score, and MCC values were consistent across D-20k and D-10k. Notably, the F1 score of ResNet101 on the testing dataset reached 0.987, only 0.6% lower than that of EfficientNetB5 on the same testing dataset. These findings suggest that ResNet101 is relatively insensitive to training dataset size and remains robust even with limited training data. Furthermore, ResNet101 showed a pronounced advantage in processing speed when identifying sub-images containing weeds, operating at 554.5 FPS compared with 375.2 FPS for EfficientNetB5, an approximately 1.48-fold increase. In summary, while EfficientNetB5 achieved the highest accuracy and F1 score, ResNet101 achieved a substantially higher FPS. This trade-off between accuracy and speed is critical for real-time applications such as precision herbicide spraying, where timely detection is essential.
Figure 4 depicts the progression of ACC for each teacher model trained on datasets of different sizes. All models started at high ACC levels because of their pretraining on the ImageNet database (Deng et al. 2009), which gave them substantial generalization capabilities. Once training began and the weights were adjusted, ACC rose rapidly during the initial stage. After 30 epochs, the curves showed only marginal fluctuations before stabilizing above 95% accuracy. Notably, on both D-20k and D-10k, the ACC values of the teacher models consistently exceeded 80%, with some models exceeding 98%.
Comparing Figures 4A and 4B, ResNet101 showed a consistently rising ACC curve when trained on both D-20k and D-10k. After 100 training epochs on D-20k, EfficientNetB5 achieved ACC values 0.7% and 1% higher than those of DenseNet201 and ResNet101, respectively. When trained on D-10k, EfficientNetB5 achieved ACC values 1.1% and 1.3% higher than those of DenseNet201 and ResNet101, respectively. During the stable period, the ACC curve of EfficientNetB5 consistently remained above those of the DenseNet and ResNet models, possibly owing to its compound scaling, which balances the model's width, depth, and resolution to make the most of the available computational resources.
Student Model Performance
The validation results of the student models after knowledge distillation under different T settings are presented in Table 5. Notably, the temperature was intended to balance the soft and hard target losses (Cho and Hariharan 2019). The optimal temperature for knowledge distillation was 1 for the ResNet, DenseNet, and EfficientNet student models on both D-20k and D-10k. At T = 1, the temperature-scaled softmax reduces to the standard softmax, so the teacher's soft labels were used without additional smoothing. These findings suggest that the models have sufficient capacity and generalization ability to perform the turf–weed classification task without further softening of the teacher outputs. Therefore, no additional temperature adjustment was needed in the student models to balance capability and complexity. In some cases, higher temperature settings may even degrade model performance (Wei et al. 2022).
a D-20k indicates the training dataset with 10,000 “Turf” class images and 10,000 “Weed” class images. D-10k indicates the training dataset with 5,000 “Turf” class images and 5,000 “Weed” class images.
b ACC, accuracy.
c MCC, Matthews’ correlation coefficient.
An optimal temperature of 1 indicates that the student models learned directly from the teachers' unsoftened prediction probabilities, closely replicating the teachers' predicted distributions and adhering to their decisions. Overall, the optimal temperature of 1 observed across all three models suggests that the student models effectively inherited and leveraged the knowledge of the teacher models, facilitating model deployment and application.
Figure 5 shows the fluctuations in ACC during training of the student models at different T settings, together with the corresponding teacher models (red lines) for comparison. Comparing T settings for the same model, each student model showed notable fluctuations during the initial stages of training at T = 4 and T = 5, and after entering the stabilization phase, its ACC remained lower than under the other temperature settings. In addition, the ResNet student model had the highest initial ACC values, exceeding 0.85 with minimal curve fluctuation, suggesting that it benefited most from knowledge distillation.
Table 6 lists the evaluation metrics of each student model at its optimal temperature on the testing dataset, and Figure 6 presents the corresponding confusion matrices. Most model errors were caused by misclassifying sub-images in the “weed” category as weed-free; only a few sub-images containing turf only were erroneously identified as containing weeds. These results show that the student models could reliably detect weeds growing in turf.
a D-20k indicates the training dataset with 10,000 “Turf” class images and 10,000 “Weed” class images. D-10k indicates the training dataset with 5,000 “Turf” class images and 5,000 “Weed” class images.
b ACC, accuracy.
c MCC, Matthews’ correlation coefficient.
d FPS, frames per second.
A comparison of Tables 4 and 6 shows that the teacher model ResNet101 had a model size of 340.8 MB, whereas the distilled student model ResNet18 had a size of 260.2 MB. Similarly, the EfficientNet model size decreased from 227.9 MB (EfficientNetB5) to 146.9 MB (EfficientNetB0), and the DenseNet model size decreased from 146.3 MB (DenseNet201) to 130.1 MB (DenseNet121). In all three cases, the smaller student architectures had fewer parameters and lower complexity, and knowledge distillation allowed them to retain high accuracy, demonstrating its effectiveness.
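The relative size reductions implied by these figures can be worked out directly. This short calculation uses only the model sizes quoted above; the helper name is hypothetical.

```python
def size_reduction(teacher_mb, student_mb):
    """Absolute (MB) and relative (%) size reduction from teacher to student."""
    saved = teacher_mb - student_mb
    return saved, 100.0 * saved / teacher_mb


reported_sizes = [  # (teacher, student, teacher MB, student MB) as quoted from Tables 4 and 6
    ("ResNet101", "ResNet18", 340.8, 260.2),
    ("EfficientNetB5", "EfficientNetB0", 227.9, 146.9),
    ("DenseNet201", "DenseNet121", 146.3, 130.1),
]
for teacher, student, t_mb, s_mb in reported_sizes:
    saved, pct = size_reduction(t_mb, s_mb)
    print(f"{teacher} -> {student}: {saved:.1f} MB smaller ({pct:.1f}%)")
# ResNet101 -> ResNet18: 80.6 MB smaller (23.7%)
# EfficientNetB5 -> EfficientNetB0: 81.0 MB smaller (35.5%)
# DenseNet201 -> DenseNet121: 16.2 MB smaller (11.1%)
```

In relative terms, the EfficientNet pair shows the largest reduction and the DenseNet pair the smallest.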
Relative to the teacher models, ResNet18, EfficientNetB0, and DenseNet121 showed substantial FPS increases on D-10k of 207.9, 315.5, and 55.9 FPS, respectively. This indicates that the student models were lighter, processed images faster, and used computational resources more efficiently. ResNet18 had the highest FPS, at 762.4, indicating faster inference and better suitability for real-time classification than the other neural networks. Moreover, among the three student models evaluated at their respective optimal temperatures, ResNet18 performed best. On the large dataset D-20k, ResNet18 achieved ACC, F1 score, and MCC values of 0.991, 0.991, and 0.982, respectively, exceeding those of EfficientNetB0 by 0.9%, 0.9%, and 1.8% and those of DenseNet121 by 0.6%, 1.1%, and 2.2%, respectively. On the small dataset D-10k, ResNet18 achieved ACC, F1 score, and MCC values of 0.989, 0.989, and 0.978, respectively, exceeding those of EfficientNetB0 by 0.2%, 0.2%, and 0.4% and those of DenseNet121 by 0.9%, 0.9%, and 1.8%, respectively. Overall, the distilled student model ResNet18 achieved a good balance between ACC and efficiency, making it well suited to the binary classification of turf–weed images.
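The FPS gains can be turned into speedup ratios with simple arithmetic. This sketch assumes that the teacher FPS values quoted earlier for ResNet101 (554.5) and EfficientNetB5 (375.2) are the D-10k values to which these gains apply; under that assumption it reproduces the 762.4 FPS reported for ResNet18. The DenseNet pair is omitted because the DenseNet201 FPS is not quoted in the text. The helper name is hypothetical.

```python
teacher_fps = {"ResNet101": 554.5, "EfficientNetB5": 375.2}  # assumed to be the D-10k values
fps_gain = {"ResNet18": 207.9, "EfficientNetB0": 315.5}      # gains reported on D-10k


def student_speedup(teacher, student):
    """Student FPS and its ratio to the teacher FPS."""
    t = teacher_fps[teacher]
    s = t + fps_gain[student]
    return s, s / t


for teacher, student in [("ResNet101", "ResNet18"), ("EfficientNetB5", "EfficientNetB0")]:
    s, ratio = student_speedup(teacher, student)
    print(f"{student}: {s:.1f} FPS, {ratio:.2f}x {teacher}")
# ResNet18: 762.4 FPS, 1.37x ResNet101
# EfficientNetB0: 690.7 FPS, 1.84x EfficientNetB5
```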
Other studies provide context for these results. Ghofrani and Toroghi (2022) used knowledge distillation to improve the accuracy of a small client-side model for plant disease recognition, achieving 97.58% ACC. Similarly, Wei et al. (2022) applied knowledge distillation during neural network training and achieved 98.7% ACC on the Oxford102 flower dataset. Zhou et al. (2023) developed a surface defect detection system for carrot (Daucus carota L.) combine harvesting based on multistage knowledge distillation, achieving an accuracy of 90.7%. Although these tasks and datasets differ from ours, the distilled student model ResNet18 in this research was competitive, with an ACC of 98.9%, and effectively balanced ACC and efficiency for the binary classification of turf–weed images.
Various herbicides, such as synthetic auxins (e.g., 2,4-D, dicamba, and MCPP) (McElroy and Martins 2013; Reed et al. 2013), acetyl-CoA carboxylase inhibitors (e.g., clethodim, sethoxydim, and fenoxaprop-P-ethyl) (McCullough et al. 2016; Tate et al. 2021), and protoporphyrinogen oxidase inhibitors (e.g., sulfentrazone) (Brosnan et al. 2020; Yu et al. 2018), are used for weed control in turf. Precise application can significantly reduce herbicide use, lowering costs and mitigating adverse environmental impacts. In a recent study, Jin et al. (2023) evaluated a smart sprayer prototype designed for precision herbicide application in turf. Their study employed DCNN models with a vast number of parameters and achieved an F1 score exceeding 0.989 in identifying weeds in turf. However, the research did not extend to real-time precision spraying, primarily because of the slow inference speed.
To the best of our knowledge, no previous research has explored knowledge distillation for developing lightweight and efficient models for weed detection in turf. Our results suggest that knowledge distillation from teacher models to student models offers three key advantages: (1) feasibility with a relatively small training dataset of 5,000 images per class, (2) a substantial reduction in model size, and (3) the capability for real-time weed detection. Further research is needed to evaluate knowledge-distilled models in the machine vision subsystem of smart sprayers for real-time weed detection and precision herbicide spraying in turfgrass landscapes.
In summary, knowledge distillation can achieve strong weed detection performance in turf while balancing accuracy and efficiency. All three teacher models performed similarly when trained on datasets of different scales (D-20k and D-10k). Each of the three student models performed best at T = 1, demonstrating reliable weed detection in turf. Moreover, ResNet18 achieved ACC values of ≥0.989 and MCC values of ≥0.978 while maintaining FPS rates of ≥742.9. Both accuracy and speed are essential for accurate and efficient weed detection in real-world scenarios. Therefore, we conclude that ResNet18 delivered the best results and was best suited for subsequent deployment on resource-constrained devices, although EfficientNetB0 and DenseNet121 had smaller model sizes. Compared with DCNN models that carry a substantial computational workload, knowledge distillation can reduce model size and inference latency through teacher–student learning, thereby facilitating real-time weed detection and precision spraying. Additional research is ongoing to optimize the distillation algorithm and to match different model structures in teacher–student pairs.
Funding statement
This work was supported by the Key R&D Program of Shandong Province, China (ZR202211070163), the National Natural Science Foundation of China (Grant No. 32072498), the Taishan Scholar Program, and the Weifang Science and Technology Development Plan Project (Grant No. 2024ZJ1097).
Competing interests
The authors declare no conflicts of interest.