Introduction
Adaptive radiotherapy (ART) accounts for inter-fractional anatomical change, e.g. weight loss, growth or shrinkage of the primary tumour and lymph nodes, and set-up issues. ART aims to preserve target coverage whilst reducing overdosage of healthy tissue and organs at risk (OARs). Reference Brock1 It is commonly required for head and neck (H&N) treatments, where significant weight loss is observed. Reference Chen, Daly and Cui2–Reference Barker, Garden and Ang7 However, weight loss and other demographic, disease and treatment factors are not reliable predictors of the need for ART. Reference Figen, Çolpan Öksüz and Duman8 In addition, the point during a course of treatment at which replanning is judged to be required is variable. Reference Figen, Çolpan Öksüz and Duman8 There is a paucity of guidance on methods for identifying patients who require ART, which has led to differing implementations. Studies report replan rates of between 5 and 25%. Reference Figen, Çolpan Öksüz and Duman8–Reference van Beek, Jonker and Hamming-Vrieze11 A survey assessing ART adoption found lack of staff resources to be the primary reason for low replan rates, Reference Lee, Schettino and Nisbet12 and methods to automate ART are of great interest. Reference Alves, Dias and Rocha13 One proposed technique is a ‘dose of the day’ calculation, Reference Veiga, McClelland and Moinuddin14,Reference Allen, Yeo and Hardcastle15 in which the original plan is recalculated on the patient’s cone-beam CTs (CBCTs) to verify its continued suitability. This approach faces two main obstacles: the limited accuracy of CBCT Hounsfield unit (HU) information and the requirement for auto-segmentation.
CBCT HU accuracy is degraded by lower image contrast and artefacts relative to conventional CT. Reference Washio, Ohira and Funama16 This leads to uncertainty in electron density (ED) conversion and hence in dose calculation. Several approaches have been proposed for generating ‘synthetic CT scans’ (sCTs), which synthesise ED information for the new geometry. Reference Allen, Yeo and Hardcastle15,Reference O’Hara, Bird and Al-Qaisieh17,Reference Chen, Liang and Shen18 These include atlas-based methods, bulk-density assignment, deformable registration and machine learning.
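To illustrate why HU inaccuracy matters for dose calculation, the sketch below converts HU to relative electron density (RED) with a piecewise-linear calibration curve, the general form of lookup a TPS applies. The calibration points are hypothetical values chosen for illustration. They are not this centre's curve; real curves are measured per scanner with a density phantom.

```python
import numpy as np

# Hypothetical HU -> relative electron density (RED) calibration points.
CALIB_HU = np.array([-1000.0, -100.0, 0.0, 100.0, 1000.0, 2000.0])
CALIB_RED = np.array([0.0, 0.9, 1.0, 1.07, 1.5, 2.0])


def hu_to_red(hu):
    """Piecewise-linear interpolation of RED from HU (values clamped at the curve ends)."""
    return np.interp(hu, CALIB_HU, CALIB_RED)


# A +50 HU error in soft tissue (true value 0 HU) biases the RED used for dose calculation.
true_red = hu_to_red(0.0)
biased_red = hu_to_red(50.0)
print(f"RED error: {100 * (biased_red - true_red) / true_red:.1f}%")  # prints: RED error: 3.5%
```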
The second barrier is the requirement for contour generation. Manual segmentation remains the gold standard for target and OAR delineation; however, there is significant research effort towards automatic segmentation due to its increased efficiency. Two commonly employed techniques are deformable registration, Reference Castadot, Lee and Parraga19,Reference Al-Mayah, Moseley and Hunter20 and deep learning (DL). Reference Harrison, Pullen and Welsh21
A recent study at this centre created a proof-of-concept pipeline aimed at identifying prostate patients who require ART. Reference Russell, O’Hara and Andersson22 It used DL to generate sCT scans, deformably propagated contours from the planning CT (pCT) and recomputed the original plan on each sCT. Dose-volume histogram (DVH) statistics from each fraction were then analysed to identify dosimetric changes. When applied to retrospective patients, the pipeline showed high sensitivity (0·83) and specificity (0·88) relative to the clinical decision.
In this work, we assess the feasibility of such a pipeline for H&N cancers. These treatments have more complex target volumes, often with multiple dose levels. High-dose volumes are often close to critical OARs, and scans may be complicated by dental implants, artefacts and bolus. Another significant difference is anatomy outside the CBCT field of view (FoV), typically the shoulders and top of the head, which affects dose calculation accuracy. Because H&N anatomy is substantially more challenging than prostate anatomy, the conclusions of the prostate study cannot be assumed to apply to H&N treatments. This work therefore aimed to develop the pipeline so that it was suitable for complex H&N anatomy, including resolving issues such as limited-FoV CBCTs and bolus. It also aimed to assess the technical achievability, accuracy and clinical utility of the pipeline in a real-world setting.
Method
Patient selection and data acquisition
One hundred retrospective patients, randomly selected from those treated between 2016 and 2023 using a clinical database interrogation tool, were assessed for inclusion in this study. The inclusion criteria covered all patients who received external beam radiotherapy for radical H&N treatment of any cancer type, stage and dose prescription. This ensured that any patient eligible for ART in the clinical workflow would be eligible in this study. No criteria were placed on whether patients were undergoing chemotherapy or had feeding tubes.
Patients were excluded if they were treated with beams other than volumetric modulated arc therapy (VMAT), were rescanned due to inability to tolerate treatment, did not finish treatment, required modifications to their bolus during treatment or had incomplete CBCT data. From the 100 cases, 14 eligible non-ART cases and 12 eligible ART cases were identified. These 26 patients were considered sufficient to demonstrate the proof-of-principle feasibility of the pipeline. At this centre, patients receive daily CBCT scans for the first four fractions, followed by weekly CBCTs. Daily imaging may be performed for patients in whom significant anatomical change is identified or whose original plan is close to specific OAR tolerances.
Planning CT scans were acquired on a Philips Brilliance Big Bore scanner with parameters 120 kVp, 106 mAs and 1·2 × 1·2 × 2 mm resolution. CBCT scans were acquired on an Elekta XVI system with parameters 120 kVp, 20 mAs, 1·0 × 1·0 × 1·0 mm resolution and the S20 collimator. All plans were 6 MV flattening-filter-free (FFF) VMAT, planned using the Monaco treatment planning system (TPS). In the current adaptive pathway, on-treatment CBCTs are visually compared against the original pCT. If changes are suspected to be significant, the images are reviewed offline by a multidisciplinary team (MDT) of physicists and clinicians. If critical organs are potentially over tolerance or the target volume is suspected to be underdosed, a rescan-CT is acquired. Contours are then propagated onto it and amended by the treating clinician before replanning.
Pipeline construction
The pipeline was written in Python and designed to run within the RayStation 12A TPS scripting interface, following five steps:
1. Rigid and deformable registrations were created between the pCT and each CBCT.
2. sCTs were generated from each CBCT using RayStation’s ‘Virtual CT’ functionality.
3. Contours were propagated from the pCT to each sCT.
4. The original plan was recomputed on each sCT.
5. A dosimetric assessment was performed.
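The five steps can be outlined as a loop over a patient's CBCTs. This is a hedged sketch of the control flow only. The step functions are hypothetical callables supplied by the caller, standing in for the corresponding RayStation scripting operations, whose actual API is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class FractionResult:
    """Outcome of the dose-of-the-day check for one CBCT."""
    cbct_id: str
    failed_goals: List[str] = field(default_factory=list)

    @property
    def failed(self) -> bool:
        # A fraction fails if at least one mandatory goal is not met.
        return len(self.failed_goals) > 0


def run_pipeline(
    pct: Any,
    cbcts: Dict[str, Any],
    register: Callable[[Any, Any], Any],        # step 1: rigid + deformable registration
    make_sct: Callable[[Any, Any, Any], Any],   # step 2: synthetic CT from the CBCT
    propagate: Callable[[Any, Any], Any],       # step 3: contour propagation from the pCT
    recompute: Callable[[Any, Any], Any],       # step 4: recalculate the original plan
    evaluate: Callable[[Any, Any], List[str]],  # step 5: names of failed goals
) -> List[FractionResult]:
    results = []
    for cbct_id, cbct in cbcts.items():
        reg = register(pct, cbct)
        sct = make_sct(pct, cbct, reg)
        rois = propagate(reg, sct)
        dose = recompute(sct, rois)
        results.append(FractionResult(cbct_id, evaluate(dose, rois)))
    return results


# Usage with dummy stand-ins, where each "dose" is just a number.
res = run_pipeline(
    pct="pCT",
    cbcts={"fx1": 1.0, "fx5": 1.05},
    register=lambda p, c: None,
    make_sct=lambda p, c, r: c,
    propagate=lambda r, s: None,
    recompute=lambda s, rois: s,
    evaluate=lambda d, rois: ["CTV D50%"] if d > 1.02 else [],
)
print([(r.cbct_id, r.failed) for r in res])  # [('fx1', False), ('fx5', True)]
```

Passing the steps in as functions keeps the orchestration testable without a TPS. It also makes it easy to swap, for example, the sCT method.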
During step 1, a frame-of-reference (FoR) registration was created between the pCT and each CBCT by importing the on-set shifts. Including the actual day-to-day set-up variations in this way models the dose delivered over the course of treatment more accurately. Deformable registrations were then created, deforming the pCT to each CBCT.
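Importing an on-set shift amounts to building a rigid transform between the two frames of reference. The following is a minimal sketch that assumes translation-only couch corrections in millimetres, with rotations omitted. It also adopts a hypothetical sign convention in which the shift maps pCT coordinates to the treated position. The function names are illustrative.

```python
import numpy as np


def shift_to_transform(shift_mm):
    """4x4 homogeneous matrix translating points by the on-set shift (dx, dy, dz) in mm."""
    t = np.eye(4)
    t[:3, 3] = shift_mm
    return t


def apply_transform(t, point_mm):
    """Map a 3D point through a homogeneous transform."""
    p = np.append(np.asarray(point_mm, dtype=float), 1.0)
    return (t @ p)[:3]


# Hypothetical correction: 2 mm lateral, -1.5 mm vertical, 0.5 mm longitudinal.
T = shift_to_transform([2.0, -1.5, 0.5])
print(apply_transform(T, [0.0, 0.0, 0.0]))
```

A six-degree-of-freedom couch correction would add a rotation block to the upper-left 3x3 of the matrix. Conventions for rotation order and sign vary between systems.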
During step 2, CBCT scans were converted into ‘Virtual CTs’ using inbuilt RayStation functionality. This generates a joint histogram of the pCT and CBCT, from which a conversion between CBCT intensity and HU is derived. An intermediate ‘Corrected CBCT’ is generated using this conversion, and the deformable registration between the pCT and CBCT produces a deformed CT. Mismatching low-density regions in the deformed CT are replaced with values from the Corrected CBCT, producing the ‘Virtual CT’. The result models anatomy outside the CBCT FoV whilst correcting for low-signal artefacts in the CBCT images. Reference 23 The dosimetric accuracy of this technique has been validated in a previous study. Reference O’Hara, Bird and Al-Qaisieh17
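The joint-histogram mapping and compositing described above can be illustrated with NumPy. This is a simplified sketch of the general idea under our own assumptions, and it is not RayStation's Virtual CT implementation, whose details are not given here. The assumptions are:

- Voxel-wise co-registered arrays.
- The histogram mode per CBCT bin is used as the mapping.
- A fixed low-density threshold and mismatch tolerance apply.
- Replacement is restricted to a supplied CBCT FoV mask.

```python
import numpy as np


def cbct_to_hu_lut(cbct, ct, bins=64):
    """CBCT-intensity -> HU lookup from the joint histogram of co-registered voxels.

    For each CBCT intensity bin, the HU bin with the highest joint count (the mode)
    is taken as the mapped value; empty CBCT bins are skipped.
    """
    hist, cb_edges, hu_edges = np.histogram2d(cbct.ravel(), ct.ravel(), bins=bins)
    cb_centres = 0.5 * (cb_edges[:-1] + cb_edges[1:])
    hu_centres = 0.5 * (hu_edges[:-1] + hu_edges[1:])
    occupied = hist.sum(axis=1) > 0
    mapped_hu = hu_centres[np.argmax(hist, axis=1)]
    return cb_centres[occupied], mapped_hu[occupied]


def virtual_ct(cbct, deformed_ct, lut, fov_mask, low_hu=-400.0, tol_hu=300.0):
    """Composite a simplified 'virtual CT' (thresholds are hypothetical).

    Starts from the deformed pCT, which also supplies anatomy outside the CBCT FoV.
    Inside the FoV, where the deformed CT and corrected CBCT disagree by more than
    tol_hu and either indicates low density (our interpretation of 'mismatching
    low-density regions'), the corrected-CBCT value is used instead.
    """
    corrected = np.interp(cbct, lut[0], lut[1])
    mismatch = np.abs(deformed_ct - corrected) > tol_hu
    low_density = (deformed_ct < low_hu) | (corrected < low_hu)
    replace = mismatch & low_density & fov_mask
    return np.where(replace, corrected, deformed_ct)
```

Using the histogram mode rather than the mean makes the mapping less sensitive to voxels that are misregistered across tissue boundaries.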
To ensure that bolus was correctly modelled and not deformed on the virtual CT, an additional ‘expanded bolus’ structure was created for patients with bolus. This was done by expanding the pCT bolus structure by 1 cm in all directions and cropping it 0·2 cm from the External. The structure was rigidly copied onto each CBCT and used as a controlling region of interest (ROI) when generating the deformable registrations and sCTs.
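The ROI operations can be sketched on binary voxel masks. This minimal NumPy sketch makes the following assumptions:

- Voxels are isotropic.
- The margin is spherical.
- "Cropped 0·2 cm from the External" is interpreted as removing any voxel within 0·2 cm of the External ROI. This interpretation is ours.

In practice this would be done with the TPS's own ROI algebra tools.

```python
import numpy as np


def ball_offsets(radius_vox):
    """Integer (dz, dy, dx) offsets lying inside a sphere of the given radius in voxels."""
    r = int(np.floor(radius_vox))
    span = range(-r, r + 1)
    return [(dz, dy, dx) for dz in span for dy in span for dx in span
            if dz * dz + dy * dy + dx * dx <= radius_vox ** 2]


def dilate(mask, radius_vox):
    """Binary dilation of a 3D boolean mask by a spherical margin (no wrap-around)."""
    r = int(np.floor(radius_vox))
    padded = np.pad(mask, r)  # zero padding stops np.roll wrapping voxels round
    out = np.zeros_like(padded)
    for shift in ball_offsets(radius_vox):
        out |= np.roll(padded, shift, axis=(0, 1, 2))
    return out[tuple(slice(r, s - r) for s in padded.shape)]


def expanded_bolus(bolus, external, voxel_mm, expand_mm=10.0, crop_mm=2.0):
    """Expand bolus by expand_mm, then drop voxels within crop_mm of the External (assumed meaning)."""
    grown = dilate(bolus, expand_mm / voxel_mm)
    keep_out = dilate(external, crop_mm / voxel_mm)
    return grown & ~keep_out
```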
During step 3, OAR and clinical target volume (CTV) contours were propagated onto the sCT using deformable image registration. This method was considered sufficiently accurate following an independent evaluation against manual clinician-drawn contours. In that evaluation, the propagated CTV and OAR contours achieved Dice similarity coefficients (DSC) > 0·8; these results are not included in this manuscript.
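For reference, the Dice similarity coefficient is DSC = 2|A ∩ B| / (|A| + |B|) for two contours A and B. The following is a minimal voxel-mask implementation. Returning 1.0 for two empty masks is our own convention.

```python
import numpy as np


def dice(a, b):
    """Dice similarity coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(a, b).sum() / total


# Two 10 x 10 squares offset by one row overlap in 90 pixels: DSC = 180 / 200 = 0.9.
a = np.zeros((20, 20), dtype=bool)
a[0:10, 0:10] = True
b = np.zeros((20, 20), dtype=bool)
b[1:11, 0:10] = True
print(dice(a, b))
```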
In step 4, the original plan was recomputed on each sCT.
In step 5, the DVH statistics were extracted and compared against pre-defined goals to determine clinical acceptability of the computed dose distribution.
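The per-fraction check can be sketched as follows, treating a fraction as failed if at least one goal is unmet. This is a hedged illustration rather than the pipeline's code. It assumes equal-volume voxels and approximates Dx% (the minimum dose received by the hottest x% of a structure) with a percentile. The `Goal` class, ROI names and threshold values are hypothetical; the actual mandatory goals are those listed in Table 1.

```python
from dataclasses import dataclass

import numpy as np


def dose_at_volume(dose_in_roi, volume_pct):
    """Dx%: minimum dose received by the hottest x% of the ROI (equal-volume voxels)."""
    return float(np.percentile(dose_in_roi, 100.0 - volume_pct))


@dataclass
class Goal:
    name: str
    roi: str
    volume_pct: float
    limit_gy: float
    at_least: bool  # True: Dx% >= limit (coverage); False: Dx% <= limit (hot spot or OAR)

    def passes(self, dose_by_roi):
        value = dose_at_volume(dose_by_roi[self.roi], self.volume_pct)
        return value >= self.limit_gy if self.at_least else value <= self.limit_gy


def failed_goals(goals, dose_by_roi):
    """Names of unmet goals; the fraction fails if this list is non-empty."""
    return [g.name for g in goals if not g.passes(dose_by_roi)]


# Illustrative goals only (hypothetical 54 Gy elective and 65 Gy CTV D50% pCT values).
goals = [
    Goal("Elective CTV D95%", "elective_ctv", 95.0, 0.95 * 54.0, at_least=True),
    Goal("CTV D50%", "ctv", 50.0, 1.02 * 65.0, at_least=False),
]
doses = {
    "ctv": np.full(1000, 65.5),
    "elective_ctv": np.linspace(50.0, 56.0, 1001),
}
print(failed_goals(goals, doses))  # ['Elective CTV D95%']
```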
Table 1 shows the mandatory clinical goals used for treatment planning at this centre and assessed by this pipeline. Coverage was assessed using CTVs rather than planning target volumes (PTVs). The PTV margin exists to ensure that the CTV receives its prescribed dose despite random set-up errors, so PTV goals would be expected to fail on the sCTs and render the pipeline ineffective. Similarly, only OAR constraints were analysed, rather than planning organ-at-risk volume (PRV) constraints. A fraction was considered to have failed if ≥1 goal was not met. H&N plans commonly include elective nodal volumes prescribed to lower doses. To differentiate them, the prescription-dose CTV is denoted ‘CTV’, whilst all lower-dose CTVs are denoted ‘Elective CTV’.
Table 1. Mandatory goals assessed by the pipeline. CTV and Elective CTV represent the different dose levels: the primary CTV and the elective nodal CTVs, respectively.

Pipeline validation against rescan-CTs
Pipeline performance was validated by comparison against patients’ rescan-CTs. These contain contours manually delineated by clinicians and are free of the uncertainties associated with sCTs. They therefore represent the most accurate available model of the patient’s geometry during treatment.
For this assessment, the dataset was restricted to the 12 patients who received ART and hence had a rescan-CT. Each patient’s rescan-CT was imported into the TPS and rigidly registered to the original pCT, and the original plan was recalculated on it. The same goals applied by the pipeline (Table 1) were used to determine whether the original plan would still be within tolerance if delivered to the rescan-CT anatomy. This created a ‘gold standard’ marker for whether the patient required ART.
A receiver operating characteristic (ROC) curve quantitatively assessed the pipeline’s predictive power relative to this new ‘gold standard’. The curve plotted the pipeline’s sensitivity (true-positive rate) against 1 − specificity (false-positive rate), relative to the gold-standard rescan-CT result, for each threshold option. The threshold is the number of failed sCTs (scans on which at least one goal failed) required for the pipeline to recommend ART. The area under the curve (AUC) provides a quantitative measure of the pipeline’s predictive power. Sensitivity and specificity were then plotted separately against threshold to identify the optimum threshold.
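The threshold sweep behind this ROC analysis can be sketched in plain Python, where a patient is flagged for ART when their number of failed sCTs reaches the threshold. The text does not state how the optimum threshold was read from the sensitivity and specificity curves. The sketch uses Youden's J (sensitivity + specificity - 1) only as one common choice. The example cohort is hypothetical.

```python
def roc_points(n_failed, needs_art):
    """(threshold, sensitivity, specificity) for each integer threshold t.

    The pipeline recommends ART when a patient's failed-sCT count is >= t.
    Both classes must be present in needs_art.
    """
    pos = sum(needs_art)
    neg = len(needs_art) - pos
    points = []
    for t in range(0, max(n_failed) + 2):  # t = 0 flags everyone; max + 1 flags no one
        pred = [n >= t for n in n_failed]
        tp = sum(p and y for p, y in zip(pred, needs_art))
        tn = sum((not p) and (not y) for p, y in zip(pred, needs_art))
        points.append((t, tp / pos, tn / neg))
    return points


def auc(points):
    """Trapezoidal area under the ROC curve (x = 1 - specificity, y = sensitivity)."""
    xy = sorted((1 - spec, sens) for _, sens, spec in points)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(xy, xy[1:]))


def youden_threshold(points):
    """Threshold maximising sensitivity + specificity - 1 (lowest threshold wins ties)."""
    return max(points, key=lambda p: p[1] + p[2] - 1)[0]


# Hypothetical cohort: failed-sCT counts and gold-standard labels.
n_failed = [0, 1, 2, 3, 0, 0, 1]
needs_art = [False, True, True, True, False, False, False]
pts = roc_points(n_failed, needs_art)
print(round(auc(pts), 3), youden_threshold(pts))  # 0.958 1
```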
Clinical assessment
Data from all 26 patients were used to assess the clinical utility of the pipeline and determine whether it could identify which patients would benefit from ART. For non-ART patients, every CBCT scan was analysed, whereas for ART patients only CBCTs prior to the decision to replan were assessed. The previous analysis was repeated with the pipeline output compared against the clinical decision made (ART or non-ART) rather than the ‘gold standard’ result, where the clinical decision was based on subjective visual assessment of anatomical changes versus the pCT.
Results
Pipeline assessment against rescan-CT
De-identified patient demographics, including tumour stage and treatment details (dose and fractionation, chemotherapy, use of bolus and CBCT schedule), can be found in the supplementary information. Figure 1 shows the final pipeline output for each ART patient. Each square represents a CBCT scan, placed on the x-axis according to the fraction on which it was acquired. The colour of each square represents the number of mandatory goals the pipeline identified as failing.

Figure 1. The fractions on which rescan patients received cone-beam CT scans and the result of the pipeline assessment. The colour of each box represents the number of failed mandatory goals identified by the pipeline in that fraction. Anonymised patient IDs have been coloured to represent the outcome of the ‘gold standard’ rescan-CT assessment, where green denotes pass and red denotes fail. Each patient had a rescan-CT after the fraction number of their last coloured box; for example, ANON17 had a rescan-CT after fraction five and before fraction six.
Anonymised patient IDs have been coloured to represent the ‘gold standard’ rescan-CT assessment result. If ≥1 goal failed when the original plan was recalculated on the rescan-CT, the patient ID is shown in red. Of the 12 ART patients, 5 did not fail any goals when the original plan was recalculated on their rescan-CT.
Figure 2(a) shows an ROC curve assessing the pipeline’s predictive power compared to the rescan-CT assessment. The AUC was found to be 0·78, representing good predictive power. By plotting the sensitivity and specificity at each threshold value (Figure 2(b)), the optimum threshold was found to be 1 failed CBCT (sensitivity 0·83 and specificity 0·67).

Figure 2. (a) A receiver operating characteristic (ROC) curve assessing the sensitivity and specificity of the pipeline against the rescan result at different thresholds. Thresholds are the number of failed cone-beam CT scans that would correspond to a replan result. (b) Variation of the sensitivity and specificity with the applied threshold.
The failed goals identified by the pipeline were also compared to the goals identified on the ‘gold standard’ rescan-CT assessment. For patients identified as requiring ART by both the pipeline and the rescan assessment, at least one failed clinical goal matched in every instance.
Clinical assessment
Figure 3 shows the pipeline result for every patient. The top 14 patients (whose IDs are shown in green) did not receive ART, whereas the bottom 12 patients did.

Figure 3. The fractions on which patients received cone-beam CT scans and the result of the pipeline assessment. The top 14 patients (anonymised IDs shown in green) did not receive a replan, whereas the bottom 12 patients (IDs shown in red) did. Note that the green and red colours on the patient IDs have different meanings in this plot and in Figure 1.
An ROC curve (Figure 4(a)) compared the pipeline output with the clinical decision made (ART or non-ART). The AUC was 0·48, suggesting that the number of failed CBCTs identified by the pipeline had no predictive value for whether a patient was identified clinically for ART. Using the same pipeline threshold of 1 failed CBCT, the pipeline sensitivity and specificity relative to the clinical decision were 0·58 and 0·47, respectively.

Figure 4. (a) An ROC curve assessing the sensitivity and specificity of the pipeline against the clinical decision at different thresholds. Thresholds are the number of failed CBCT scans that would correspond to a replan result. (b) Variation of the sensitivity and specificity with different pipeline thresholds.
To further investigate the current ART process, the clinical goals identified by the pipeline as failing have been plotted in Figure 5, showing the number of CBCT scans identified as failing each goal, with each bar split into unique patients.

Figure 5. A bar chart showing the number of CBCT scans identified by the pipeline as failing each mandatory goal. Goals failing on patients who were identified clinically as requiring ART are shown in orange, and goals failing on patients who did not receive ART are shown in green. Each bar is subdivided by unique patient. For example, for the CTV D2% goal, one ART patient had six failing scans, another had two, and two further ART patients had a single failing scan each.
Only four distinct clinical goals failed across the entire patient cohort: one relating to poor nodal coverage, one to high spinal canal dose and two to excessive dose in the primary CTV. Of the six patients with failed CTV D2% goals, four received ART clinically, including both patients who showed failed goals on multiple CBCT scans. Failed goals relating to high spinal canal dose were identified only in non-ART patients. Twenty-seven CBCT scans were flagged for this concern, although they largely originated from three patients’ treatments. All patients with CBCTs showing a CTV D50% more than 2% above the pCT value were identified clinically for ART, as were two-thirds of patients with hot spots in the CTV.
Conclusions
The fundamental challenge in this study, and similar investigations, is the lack of a reliable gold standard against which to compare. On-set clinical decisions can be highly subjective, and often, there is no definitively correct answer about whether (and when) a patient should receive ART. Many confounding factors may contribute, which are not modelled by this pipeline, such as poor mask fit, or the timing of anatomical changes. A patient exhibiting weight loss near the end of treatment is unlikely to be rescanned unless changes are large; however, a smaller change may instigate ART if observed earlier. Other factors, including a clinical assessment of the likelihood of ongoing weight loss or whether changes will stabilise, are difficult to model.
Due to the subjectivity of the current decision process, a new ‘gold standard’ was defined by recalculating the original plan on patients’ rescan-CTs. Because rescan-CTs contain none of the dosimetric uncertainties associated with sCT generation and carry clinically approved contours, they represent the most accurate measure of a plan’s suitability. This analysis highlighted the subtleties of ART decision-making, since 5/12 ART patients showed no failed goals when the original plan was recalculated on their rescan-CT.
Pipeline assessment against rescan-CTs
When the pipeline output was compared against this new ‘gold standard’, it showed good predictive power (AUC 0·78; optimum sensitivity 0·83 and specificity 0·67). In cases where the pipeline and rescan both identified failed clinical goals, ≥1 goal matched in 100% of cases. This gives confidence in the pipeline’s performance and suggests it could be used to improve ART assessments. The main limitation of this ‘gold standard’ is the potential for differences in patient positioning between the pCT and rescan-CT. The rescan-CT represents a snapshot in time and, unlike the CBCTs, does not account for variations in patient set-up over the course of treatment. Lastly, the assessment relies on the selected clinical goals being a complete summary of the ART requirements.
Clinical assessment
When the pipeline performance was evaluated over a wider cohort containing ART and non-ART patients and assessed against the clinical decision, it showed no predictive power (sensitivity 0·58, specificity 0·47). This is notable given our confidence in the pipeline’s performance and given that previous work Reference Russell, O’Hara and Andersson22 demonstrated significant predictive power for prostate patients. Here, the pipeline involved substantial additional complexity. This arose both from the anatomical differences between H&N and prostate volumes and from resolving challenges such as anatomy outside the FoV and the inclusion of bolus. These additional complexities in assessing H&N cases are likely to have increased the subjectivity of the current decision-making process. They may also have introduced inconsistencies between the priorities and goals applied by this pipeline and the justifications used on-set. One potential benefit of this pipeline could be identifying patients who require ART but may not be identified through current clinical processes.
Comparing the failed goals identified gives insight into the sensitivities of the current clinical process. It showed good sensitivity to poor nodal coverage (Elective CTV D95% ≤ 95%) and generic weight loss (CTV D50% ≥ pCT + 2%).
Failed clinical goals relating to high spinal canal dose were identified only in non-ART patients. On manual inspection, the high-dose regions lay safely outside the spinal cord in all of these patients; the failed goals reflect the cautious approach to contouring vital structures at this centre. In total, 3/4 non-ART patients who failed spinal canal goals received periods of daily imaging due to concern, which contributed to the higher number of failed goals. This could suggest that current practice lacks sensitivity to this risk. Patients 3 and 10 are of particular interest. Both received daily CBCT due to the close proximity of high doses to the optics (54 Gy) and spinal canal (48 Gy), respectively. For patient 3, daily CBCT worked well to optimise patient position and prevent OARs from breaching their tolerances. Patient 10, by contrast, had an erratic set-up and struggled with their mask. Clinical notes recorded that a higher OAR dose tolerance was accepted given the patient’s challenges. In both cases, the pipeline would have provided quantitative data to support clinical decision-making, illustrating its potential value.
In total, 16 of the 20 failed CTV D2% goals in non-ART patients originated from a single patient. On closer inspection, this hot spot arose partly from a variation in bolus construction: the bolus was slightly smaller than planned. This shows the potential for such pipelines to identify errors beyond shape change, such as poor bolus construction and set-up errors.
Challenges and future perspectives
One key challenge when designing this pipeline was the speed and unpredictability of weight loss. Two patients identified by the rescan assessment as requiring ART had just one failed CBCT, immediately before the rescan request, suggesting rapid weight loss. These patients contribute to the low optimum pipeline threshold of one failed CBCT. This threshold is too sensitive for clinical use, since poor positioning on a single fraction would cause the pipeline to recommend ART. A further limitation is that the pipeline assesses patients fraction by fraction. In some cases, failed sCTs occur towards the end of treatment when, considering the cumulative whole-course dose, the goal would very likely still be met.
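The gap between per-fraction and whole-course assessment can be shown with a toy calculation. The sketch assumes each fraction's dose has already been mapped onto a common voxel grid, for instance via the deformable registration, so that doses can be summed voxel-wise. It uses hypothetical doses and a hypothetical D2% limit of 107% of prescription. Because D2% is not additive, the whole-course value is computed on the summed dose rather than by summing per-fraction D2% values.

```python
import numpy as np


def d2(dose):
    """D2%: minimum dose to the hottest 2% of voxels (equal-volume voxels assumed)."""
    return float(np.percentile(dose, 98.0))


n_fx = 30
planned = np.full(500, 2.1)            # Gy per fraction in each CTV voxel (hypothetical)
fractions = [planned.copy() for _ in range(n_fx)]
for fx in (28, 29):                     # hot spot in the last two fractions only
    fractions[fx][:50] = 2.5            # 10% of voxels receive 2.5 Gy

limit_per_fx = 1.07 * 2.1               # hypothetical per-fraction D2% limit
per_fx_fail = [d2(f) > limit_per_fx for f in fractions]
total = np.sum(fractions, axis=0)       # voxel-wise accumulated dose
course_fail = d2(total) > 1.07 * n_fx * 2.1
print(sum(per_fx_fail), course_fail)    # 2 False
```

Here two fractions fail individually, yet the hottest voxels accumulate 63.8 Gy against a whole-course limit of about 67.4 Gy.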
In this analysis, only CBCT scans acquired before the decision to replan were assessed, thereby modelling the information available at the time of the decision. However, when all treatment fractions were included, the sensitivity increased to 1·0 for thresholds of one and two CBCTs. This was because more patients had multiple failed CBCTs in the interval between identification for ART and approval of a new plan. The specificity was unchanged. This implies that local clinical practice is more sensitive to these changes than the pipeline would be if a higher threshold were selected.
Future pipeline modifications could increase sensitivity to these changes by assessing for steep changes in dose statistics and accounting for the time between adjacent scans. Given the wide variety of H&N treatments, this study should be repeated with a larger cohort of patients for a more thorough validation of performance.
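One way such a check might look, as a hypothetical sketch, is to normalise the change in a dose statistic between adjacent CBCTs by the number of days between them. Pairs whose rate exceeds a limit are then flagged. The choice of metric, the per-day limit and the function name are illustrative assumptions, not part of the pipeline described here.

```python
from datetime import date


def steep_changes(scans, max_change_pct_per_day):
    """Flag adjacent scan pairs whose metric changes faster than a limit.

    scans: list of (acquisition date, metric value) tuples, e.g. CTV D50% in Gy.
    Returns the date of the later scan in each flagged pair.
    """
    scans = sorted(scans)
    flagged = []
    for (d0, v0), (d1, v1) in zip(scans, scans[1:]):
        days = (d1 - d0).days
        if days <= 0:
            continue  # same-day repeat scans: no meaningful rate
        rate_pct = 100.0 * (v1 - v0) / v0 / days
        if abs(rate_pct) > max_change_pct_per_day:
            flagged.append(d1)
    return flagged


# Hypothetical weekly CBCTs: a 1.2 Gy rise in one week (~0.26%/day) is flagged.
weekly = [(date(2023, 1, 2), 65.0), (date(2023, 1, 9), 65.2), (date(2023, 1, 16), 66.4)]
print(steep_changes(weekly, max_change_pct_per_day=0.1))
```

Normalising by the interval means a given change is weighted more heavily when it occurs over a few days of daily imaging than over a weekly gap.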
This preliminary study highlights the complexity of implementing ART for complex sites and the risks of using a pipeline as a standalone decision-maker. However, the pipeline's main benefit lies in its ability to model delivered dose distributions and to alert staff to potential areas of concern. The additional information provided would be a valuable aid to decision-making, helping to standardise the current pathway and reduce its subjectivity.
Acknowledgements
None.
Financial support
This work was performed under a research agreement between Leeds Cancer Centre and RaySearch Laboratories, and the work was funded by Cancer Research UK for the Leeds Radiotherapy Research Centre of Excellence (RadNet: C19942/A28832).
Competing Interests
The authors declare none.