Background
The delineation of the gross tumour volume (GTV) and organs at risk (OARs) on computed tomography (CT) images is an important part of the radiotherapy planning process. Failure to accurately define the GTV and OARs could result in a geographical tumour miss and unnecessary dose to normal tissue, eventually leading to poor local control and more side effects. Despite the importance of this process, the manual definition of the GTV and OARs is prone to observer variation and has often been described as ‘the weakest link in radiotherapy’ Reference Njeh1,2 . Various factors have been found to contribute to observer variation. The contouring difficulty depends on the structure being contoured. For example, some structures, like the lung and bladder, have clear boundaries that allow for consistent delineation, while others, such as the parotids and small bowel, present ambiguous borders, leading to higher variability in interpretation among different observers Reference Mercieca, Belderbos and Herk3 . Moreover, the healthcare professionals responsible for contouring may have different training backgrounds Reference Arculeo, Miglietta and Nava4,Reference Shessel and Rosewall5 , experience levels Reference Giraud, Elles and Helfre6 and visual perception skills, leading to variations in identifying and outlining specific structures. In many radiotherapy centres, the delineation of OARs is often delegated to radiographers or dosimetrists. However, the clinical oncologist is ultimately responsible for checking and approving the contours 2,Reference Arculeo, Miglietta and Nava4,Reference Shessel and Rosewall5 . Although this division of roles can streamline the planning process, it can further contribute to the variability in contouring practices across different institutions.
Studies have shown that protocols Reference Lin, Lapen and Sherer7 , training Reference Tsang, Hoskin and Spezi8 and peer review 2 can reduce the contouring variability of both tumours and OARs. Several automated tools have also been developed to facilitate the contouring process. Atlas-based autosegmentation (ABAS) deforms atlases of anatomy previously defined on a reference image onto a new patient image to create a new structure set that fits the patient’s anatomy. Numerous studies have found that this method can reduce the delineation time and the interobserver variability in the delineation of some OARs Reference Tao, Yi and Chen9,Reference Hu, Byrne and Archibald-Heeren10 . The main limitation of this technology is that the department needs to develop a library of plans to be used for the contouring process Reference Hu, Byrne and Archibald-Heeren10 . More recently, artificial intelligence (AI), in particular deep learning technology, has shown great potential in addressing the challenges associated with manual contouring. Numerous deep learning-based commercial contouring solutions with pre-trained models are now available on the market Reference Robert, Munoz and Moreau11 . Studies have shown that deep learning technology can further improve the accuracy of the contours when compared with ABAS Reference Lustberg, van Soest and Gooding12,Reference van Dijk, Van den Bosch and Aljabar13 . However, despite the potential of AI technology to improve consistency and reduce the contouring time, its implementation in clinical practice has been slow, mainly owing to challenges encountered during the training and validation of these algorithms, variable accuracy results, lack of resources, staff acceptance and data protection issues Reference Robert, Munoz and Moreau11,Reference Victor Mugabe14 .
In our department, the radiographers perform the contouring of most OARs, while straightforward GTVs are contoured by higher specialist trainee doctors, and complex GTVs are delineated by the clinical oncologist. For complex cases, a radiologist is consulted if deemed necessary by the clinical oncologist. However, the clinical oncologist is ultimately responsible for checking and approving all the contours before treatment planning. A recent internal audit revealed considerable interobserver variation (average Hausdorff distance of 11.7 mm) in the definition of the GTV for patients undergoing head and neck radiotherapy. This finding prompted a consultation process to evaluate how the contouring process could be improved at our local department. Therefore, this study aimed to evaluate the challenges encountered by radiographers working at the local oncology department during the contouring of specific OARs. In addition, the study evaluated the radiographers’ perception of measures such as AI, training and peer review that could be used to address this problem.
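For readers unfamiliar with the metric quoted for the internal audit, the following minimal sketch shows how a Hausdorff distance between two observers' contours can be computed. The audit does not state which variant was used, so both the maximum and one common "average" definition are shown; the function names and the contour coordinates are hypothetical and are not taken from the audit data.

```python
from math import dist


def directed_distances(source, target):
    """For each source point, the distance to its nearest target point."""
    return [min(dist(p, q) for q in target) for p in source]


def hausdorff_distance(contour_a, contour_b):
    """Maximum Hausdorff distance: the worst-case mismatch between contours."""
    return max(
        max(directed_distances(contour_a, contour_b)),
        max(directed_distances(contour_b, contour_a)),
    )


def average_hausdorff_distance(contour_a, contour_b):
    """Mean of the two average nearest-point distances (one common definition)."""
    a_to_b = directed_distances(contour_a, contour_b)
    b_to_a = directed_distances(contour_b, contour_a)
    return (sum(a_to_b) / len(a_to_b) + sum(b_to_a) / len(b_to_a)) / 2


# Hypothetical contour points (in mm) drawn by two observers on one slice.
observer_1 = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
observer_2 = [(0.0, 0.0), (10.0, 0.0), (10.0, 13.0), (0.0, 14.0)]

print("Maximum Hausdorff distance (mm):", hausdorff_distance(observer_1, observer_2))
print("Average Hausdorff distance (mm):", average_hausdorff_distance(observer_1, observer_2))
```

In practice, contours are sampled far more densely and in three dimensions; the sketch only illustrates why a larger value indicates poorer agreement between observers.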
Methodology
Research design
The study used a quantitative, prospective, non-experimental, cross-sectional research design with a self-administered survey.
Development of the data collection tool
The online questionnaire was developed by the researcher using the Google Forms platform. The questionnaire content was informed by relevant literature findings 2,Reference Mercieca, Belderbos and Herk3 and was divided into five sections, A to E (Supplementary Data 1). The first section (A) consisted of six questions designed to assess the professional background, contouring experience and training received by the radiographers. In the second section (B), the radiographers were asked to rate the difficulties encountered during the contouring of specific OARs and to rate their agreement with specific statements related to contouring challenges (e.g., delineation time, workflows and subjectivity). The third and fourth sections (C and D) assessed the current contouring tools used by the radiographers and their perception of the use of AI software in delineating OARs. In the final section (E), the radiographers were asked to indicate their agreement with the introduction of measures (e.g., time allotted to the task, training, peer review and the introduction of AI software) that could be used to facilitate the contouring process. In addition, an open-ended question was included to allow participants to offer further insights and suggestions regarding methods that could be implemented to facilitate and improve the quality of OAR delineation at the local radiotherapy department. Sections A and C and the last part of section D consisted of multiple-choice questions, while section B, the first part of section D and section E consisted of Likert scale-type questions in which the radiographers were asked to rate the difficulty level of a task or their agreement with specific statements using a 5-point Likert scale ranging from 1 (not difficult/do not agree) to 5 (very difficult/strongly agree).
Validation of the questionnaire
Four academic radiographers (two from Malta, one from Ireland and one from England) with more than 5 years of experience were asked to evaluate every question of the questionnaire for clarity and relevance in relation to the aims and objectives of the study using a Likert scale ranging from 1 (not clear/not relevant) to 4 (very clear/highly relevant). Furthermore, the experts were asked to make suggestions to improve the questionnaire. The content validity index for each question (I-CVI) was then used to determine the level of agreement between the four experts. The I-CVI was calculated by dividing the number of experts who rated the question as relevant (3) or highly relevant (4) by the total number of experts. The average content validity of the questionnaire was calculated by dividing the sum of the I-CVI values by the total number of questions. The CVI score ranges from 0 to 1, and a score of 0.78 or higher indicates a good level of agreement Reference Gilbert and Prion15 . All items within the questionnaire had an I-CVI of 0.8 or higher, and the average CVI for the instrument was 0.88, which indicates that the questionnaire is relevant. The overall clarity score for the questionnaire was 0.83. Nevertheless, in response to the expert comments, certain questions were reworded to improve clarity.
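The content validity calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' actual workflow: the function names and the expert ratings are hypothetical, and only the two formulas (experts rating an item 3 or 4 divided by the number of experts, then the mean of the item values) come from the text.

```python
from statistics import mean


def item_cvi(ratings, relevant_threshold=3):
    """I-CVI: share of experts who rated the item 3 or 4 on the 4-point scale."""
    agreeing = sum(1 for rating in ratings if rating >= relevant_threshold)
    return agreeing / len(ratings)


def average_cvi(ratings_by_item):
    """Average CVI: sum of the I-CVI values divided by the number of items."""
    return mean(item_cvi(ratings) for ratings in ratings_by_item.values())


# Hypothetical relevance ratings from four experts (not the study data).
example_ratings = {
    "Q1": [4, 4, 3, 4],
    "Q2": [4, 3, 3, 2],
    "Q3": [3, 4, 4, 4],
}

for question, ratings in example_ratings.items():
    print(question, "I-CVI =", item_cvi(ratings))
print("Average CVI =", round(average_cvi(example_ratings), 2))
```

With a panel of four experts, the I-CVI of any single item can only take the values 0, 0.25, 0.5, 0.75 or 1.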
Reliability testing and pilot study
Prior to the data collection process, two radiographers were asked by The Society of Radiographers Malta (SRM) to test the questionnaire for intra-rater reliability. The radiographers were asked to complete the same questionnaire twice, 1 week apart. Given that the sample size for the study was predicted to be small, the same radiographers who tested the questionnaire for reliability also took part in the pilot study to ensure the smoothness of the data collection process.
Kendall’s tau test was used to assess intra-rater reliability. The test yielded a coefficient (τ) of 0.98 with p < 0.001, indicating satisfactory test–retest reliability. The radiographers did not comment on the questionnaire or data collection process; thus, no changes were made to the questionnaire 16 .
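As an illustration of the test–retest check, the sketch below computes Kendall's tau-b (the variant that corrects for tied ranks, which are common with Likert data) for one rater's answers in two rounds. It assumes the answers are paired question by question; the source does not describe how the responses were paired or pooled, the function name and data are hypothetical, and the significance test is not reproduced here.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(first, second):
    """Kendall's tau-b between two paired lists, with a correction for ties."""
    if len(first) != len(second):
        raise ValueError("both rounds must contain the same number of answers")
    concordant = discordant = ties_first = ties_second = 0
    for (x1, y1), (x2, y2) in combinations(zip(first, second), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both rounds: contributes to neither term
        if dx == 0:
            ties_first += 1  # tied in the first round only
        elif dy == 0:
            ties_second += 1  # tied in the second round only
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denominator = sqrt(
        (concordant + discordant + ties_first)
        * (concordant + discordant + ties_second)
    )
    if denominator == 0:
        raise ValueError("tau is undefined when a round has no variation")
    return (concordant - discordant) / denominator


# Hypothetical 5-point Likert answers from one rater, one week apart.
round_1 = [1, 2, 3, 4, 5, 2, 3, 4]
round_2 = [1, 2, 3, 4, 5, 2, 4, 4]

print("Kendall's tau-b:", round(kendall_tau_b(round_1, round_2), 2))
```

A coefficient close to 1 indicates that the rater ordered the items almost identically in both rounds.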
Data collection
The target population for this study consisted of all radiographers (therapy or dually qualified) working at the radiotherapy department in Malta. The accessible population included all members of the target population who were willing to complete the online questionnaire. The data collection was conducted between February 2023 and March 2023. The SRM acted as an intermediary for the study and invited all radiographers working in Malta to participate in this study via email. The email explained the purpose of the study and provided the radiographers with a weblink to complete the questionnaire. No incentive was offered to complete the questionnaire. However, multiple reminders were distributed by the SRM to improve the response rate.
Ethical considerations
The Faculty of Health Science Research Ethics Committee (FREC) of the University of Malta granted approval to perform this study. No personally identifiable information was collected in the questionnaire, and completing the questionnaire constituted consent to participate in the study.
Data analysis
The Statistical Package for the Social Sciences (SPSS) software was used to analyse the data. The categorical data were summarised as percentages, while the continuous data were summarised as means ± standard deviations. The Kruskal–Wallis test was used to evaluate the impact of experience (less than 5 years, more than 5 years) and frequency of practising contouring (rarely, occasionally, almost every day and every day) on the mean OAR difficulty score. For all statistical tests, a p-value below 0.05 was deemed statistically significant. The open-ended question was evaluated using content analysis.
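The group comparison described above can be illustrated with a small, self-contained implementation of the Kruskal–Wallis test. The study itself used SPSS; this sketch is only an assumption-laden illustration of the underlying calculation (tie-corrected H statistic with a chi-square approximation for the p-value), and the function names and scores are hypothetical.

```python
from math import erfc, exp, gamma, sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based ranks in the tie
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def chi2_survival(x, df):
    """P(X > x) for a chi-square variable with integer degrees of freedom."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        return exp(-half) * sum(half ** i / gamma(i + 1) for i in range(df // 2))
    tail = sum(half ** (i - 0.5) / gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1))
    return erfc(sqrt(half)) + exp(-half) * tail


def kruskal_wallis(*groups):
    """Return the tie-corrected H statistic and its approximate p-value."""
    pooled = [value for group in groups for value in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, offset = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[offset:offset + len(group)])
        total += rank_sum ** 2 / len(group)
        offset += len(group)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    tie_sizes = [pooled.count(value) for value in set(pooled)]
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all scores are identical")
    h /= correction
    return h, chi2_survival(h, len(groups) - 1)


# Hypothetical mean difficulty scores per radiographer (not the study data).
less_than_5_years = [2.4, 2.1, 1.9, 2.3, 2.0]
more_than_5_years = [1.8, 1.7, 2.2, 1.6, 1.9, 2.0]

h_statistic, p_value = kruskal_wallis(less_than_5_years, more_than_5_years)
print(f"H = {h_statistic:.2f}, p = {p_value:.3f}")
```

With only two groups, as in the experience comparison, the Kruskal–Wallis test is equivalent to the Mann–Whitney U test without continuity correction. The chi-square approximation is also less reliable for very small groups, which is worth bearing in mind for subgroup analyses in a sample of this size.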
Results
The professional background of the participants
The local radiotherapy department currently employs 29 radiographers, of whom 23 (79.3%) completed the questionnaire. The professional characteristics of the participants are summarised in Table 1. Most of the participants (65.22%) had more than 5 years of experience working as radiographers at the local oncology hospital. However, most participants (52.17%) declared that they were occasionally assigned contouring duties. Only three (13.04%) of the participants attended additional post-graduate training on contouring. These courses were provided by professional bodies (n = 2) or were conducted as part of a Master of Science in Radiotherapy (n = 2).
Perceived difficulty level contouring specific OARs
The mean difficulty rating score for all OARs was 1.95 ± 0.33, which indicates that the radiographers identified slight difficulties when contouring the OARs. However, some OARs, such as the parotid, stomach and brain stem, were found to be somewhat difficult to contour, with a mean rating score of 3 or higher. The mean OARs difficulty delineation score for each organ is shown in Fig. 1.
Factors affecting the perceived delineation difficulty for specific OARs
Table 2 summarises the impact of experience on the perceived difficulty of contouring specific OARs. Overall, the radiographers with more than 5 years of experience found the contouring of OARs slightly less difficult than radiographers with less than 5 years of experience (mean score: 1.87 versus 2.03, p = 0.241); however, the difference was not statistically significant. The same pattern was observed for the majority (14/20) of the OARs.
OAR, organs at risk; n, number of participants; Std, standard deviation. A score of 1 indicates not difficult and a score of 5 indicates very difficult.
Table 3 summarises the impact of the frequency of contouring duties on the perceived difficulty of contouring specific OARs. The radiographers who were frequently assigned contouring duties (mean score: 1.82 ± 0.3) found the contouring of OARs slightly less difficult than radiographers who were assigned contouring duties occasionally (mean score: 1.88 ± 0.21) or rarely (mean score: 2.38 ± 0.35). However, the difference narrowly missed statistical significance (p = 0.055).
OAR, organs at risk; n, number of participants; Std, standard deviation. A score of 1 indicates not difficult and a score of 5 indicates very difficult.
* indicates statistically significant.
Evaluation of factors that lead to contouring challenges
The time required to contour OARs was perceived as the most challenging factor (mean score 4.52 ± 0.65), followed by distraction with other duties (mean score 4.38 ± 0.70), long screen time (mean score 4.00 ± 0.64) and subjectivity of contouring (mean score 3.90 ± 0.55). Overall, the radiographers agreed that the time allotted by management to perform the task was sufficient (mean score 3.90 ± 0.62). For the sufficiency of experience, training and image quality and the suitability of the contouring software, the ratings tended towards neutral, with mean scores ranging from 3.14 to 3.44 (Fig. 2).
Tools used by the radiographers to contour OARs
All radiographers made use of manual and semi-automated tools to contour the OARs. None of the radiographers used AI-based software to contour OARs. The majority (n = 18, 78.3%) of the radiographers reported being aware of the availability of AI software that could be used to contour OARs, but only 5 (21.74%) reported having used what they regarded as AI software, namely the Elekta ABAS software, which is available at the department. It should be noted, however, that the ABAS software does not use an AI algorithm to contour OARs but rather deformable image registration, which deforms the anatomy from a user-defined reference atlas onto a new patient’s CT data. Two of the radiographers rated this software as good, and the rest rated it as average.
Radiographers’ perception of the use of AI software for contouring OARs
Although this technology was not used locally, most radiographers (n = 18, 78.26%) were aware of the existence of AI software for contouring OARs. The perception of radiographers on the use of AI software is summarised in Fig. 3. Overall, the radiographers agreed that the AI software was too laborious to train. They also disagreed that AI software is generally very accurate and that the contours generated require minimal editing. The radiographers did not consider data protection an issue for the use of AI software. Moreover, they indicated that they lacked the training and confidence to use such software.
Techniques that could be used to facilitate the contouring process
The radiographers agreed that the introduction of peer review, training on contouring, training on AI, internal and external audits, more detailed protocols and the purchasing of automated contouring software could facilitate and improve the contouring of OARs with all statements receiving a mean score of 3.9 or higher (Fig. 4).
A total of seven participants provided additional suggestions to improve the contouring of OARs at the local oncology department. One participant emphasised the necessity of adhering to national contouring guidelines. In addition, three respondents emphasised the importance of additional training on contouring and AI. Four of the participants highlighted the need to introduce AI contouring software, with one of the participants stressing that this is now becoming increasingly important with the imminent implementation of the MR-Linac within the local radiotherapy department.
Discussion
The challenges radiographers encounter during the definition of OARs can vary between departments. Since the delineation of OARs is an emerging role for radiographers, very few studies have been published on the matter Reference Oliveira, Barbosa and Couto17,Reference Oliveira, Barbosa, Couto, Bravo, Khine and McNair18 . In this study, we performed the first survey to evaluate the contouring challenges encountered by radiographers at the local oncology hospital in Malta. The findings from our survey hold significant implications for informing the hospital management about the necessity of implementing novel workflows, training programmes and best practices in OAR delineation, particularly with the imminent introduction of the MR-Linac.
As expected, the radiographers found more difficulties contouring OARs with more complex shapes and boundaries, such as the parotid, small bowel, stomach and brain stem, when compared with structures with less complex shapes and boundaries, such as the femur, bladder and body outline. Studies have found that the level of experience Reference Giraud, Elles and Helfre6 and professional background of clinical oncologists Reference Peng, Chen and Shen19 can impact the consistency of the contours. In our study, radiographers with less than 5 years of experience and those who were rarely assigned OAR delineation duties found the contouring task slightly more difficult. However, the difference was not statistically significant for either parameter, possibly due to the small sample size.
All radiographers received most of their OAR contouring training during their undergraduate course programme. Only three radiographers pursued further post-graduate training, either through a Master’s degree programme or by enrolling in a course provided by a professional body. Due to the limited number of participants, it was not possible to statistically evaluate the impact of training on the challenges encountered contouring OARs. It is important to acknowledge that multiple training interventions may be required to reach an adequate level of competency in contouring Reference Konert, Vogel and Everitt20 . Moreover, despite the wide range of contouring courses available, most contouring training courses have been predominantly targeted towards oncologists, and there is a notable scarcity of courses specifically designed to enhance radiographers’ contouring skills. As a result, radiographers may have limited opportunities to receive specialised training in this critical aspect of radiation therapy planning Reference Oliveira, Barbosa and Couto17 .
The radiographers identified the introduction of peer review as the most needed intervention to improve the quality of OAR delineation in the department. While multiple professional bodies advocate for the implementation of peer review to reduce the occurrence of gross contouring errors and facilitate training interventions to address contouring inconsistencies, not all departments possess adequate resources to incorporate this process into routine clinical practice 2,Reference Chera, Mazur, Adams and Marks21,Reference Caissie, Rouette and Jugpal22 .
The long contouring time, distraction with other duties and the long screen time were identified as the main challenges when contouring OARs. More frequent staff rotations could enable radiographers to maintain their skills while minimising the fatigue caused by prolonged screen time. Studies have shown that the introduction of AI-based contouring may also reduce the contouring time and the observer variation Reference Lustberg, van Soest and Gooding12,Reference van Dijk, Van den Bosch and Aljabar13 . Although most radiographers were aware of the availability of automated software to contour OARs, only five radiographers reported using such software. However, when asked to elaborate, the radiographers stated that they used ABAS, which does not use AI to delineate OARs. A licence for ABAS has been available in the department for many years. Nevertheless, this software was never implemented clinically as the vendors did not provide atlases with the system, and the radiographers lacked the expertise and found the software too laborious to train. Overall, the radiographers did not have highly positive views on using AI for contouring OARs. The radiographers agreed that AI models are very laborious to train and often generate contours that lack accuracy and require too much editing. This negative perception might have been based on the department’s experience using ABAS. Recent studies have shown that deep learning algorithms can reduce the delineation time and improve the accuracy of OAR contours compared to manual contouring Reference van Dijk, Van den Bosch and Aljabar13 . Moreover, AI software vendors are now providing trained AI models that can be more easily implemented within the radiotherapy workflows Reference Robert, Munoz and Moreau11 .
The findings of our survey carry significant implications for the clinical practices at our local oncology department. Survey responses underscore the imperative need to establish an ongoing training programme within the department aimed at ensuring that all radiographers maintain their competency in this specialised field. To effectively design and implement this programme, collaboration among oncologists, physicists and radiographers with expertise in this domain is essential. The proposed training course could be structured as a self-study programme, wherein radiographers are tasked with contouring OARs across a variety of clinical cases. In order to enhance learning, automatic feedback mechanisms should be integrated, utilising quantitative metrics to provide radiographers with real-time performance evaluations. Radiographers who do not meet the specified criteria for proficiency in contouring should receive additional training and guidance from senior radiographers or clinical oncologists. In addition, regular contouring audits should be conducted as part of the quality assurance programme of the department. These audits would enable the management to identify specific areas that require improvement within the department’s contouring practices. Furthermore, whenever feasible, OAR contouring should be a focal point of discussion during multidisciplinary team meetings. Finally, as AI-based contouring is expected to become more established, there is clearly a need for the hospital to invest in this technology and provide training to enable radiographers to use this technology safely.
This study has some limitations that have to be acknowledged. The small sample size in our study may have limited the statistical power of our analyses, particularly for the subgroup analysis. Moreover, we did not gauge the perception of dosimetrists and doctors on the challenges encountered while contouring OARs. Therefore, larger multicentre studies that include all professionals involved in the contouring process are required to improve the generalisability of the research findings. Although an open-ended question was included, the survey was predominantly based on closed-ended questions, which may have limited the collection of in-depth information about the subject. In order to avoid having a very long questionnaire, some of the questions aimed at gaining a general overview of the contouring challenges for all OARs rather than asking detailed questions for each OAR. As a result, we could not gain a full understanding of the specific challenges encountered by the radiographers for the delineation of each of the OARs. Finally, surveys rely on the participants’ perceptions and self-assessment of their contouring practices. Without direct observation, it is not possible to validate contouring accuracy or identify discrepancies between self-reported practices and actual behaviour. It is important to note that the majority of the questionnaires were completed by radiographers who were not regularly involved in delineation and lacked previous exposure to ABAS or deep learning contouring systems. As a result, some participants may have had difficulty recalling the challenges encountered during the contouring process. Moreover, they may have rated these systems based on perception rather than on actual experience or knowledge.
Therefore, further research is recommended to assess the interobserver variation within the department and the impact of introducing peer review and training on the quality of the OAR contours. Future studies should make use of interviews to gain more in-depth information about the subject. Moreover, workshops could be held to enable the radiographers to gain a better understanding of AI-based contouring technology, and the true impact of these workshops on the radiographers’ confidence and contouring accuracy should then be assessed.
Conclusion
Overall, the radiographers encountered slight difficulties when contouring OARs. Nevertheless, some OARs, such as the parotid, stomach and brain stem, were deemed more difficult to contour. The introduction of peer review and training were identified as the two key factors required to improve the quality of the contours. The introduction of AI-based contouring software could also reduce the delineation time. However, training is essential to facilitate its smooth introduction into clinical workflows.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S1460396923000432.
Author contributions
Mr. Antoine Attard collected the data, analysed the data and wrote the manuscript. Dr. Susan Mercieca supervised the project and edited the final manuscript.
Competing interests
The authors have no conflict of interest to declare.