There is increasing demand for evaluation of public health policy to demonstrate effectiveness(Reference Petticrew, Cummins and Ferrell1–3). Such policies, however, tend to be complicated with multiple interacting components, complex partnerships and external factors which have the potential to reduce the impact of the intervention, and this provides a challenge for evaluation(Reference Pawson and Tilley4–Reference Campbell, Fitzpatrick and Haines6). Unlike most clinical studies, the theory underpinning a policy and the specific aims and objectives are not always clear. Issues associated with evaluating policy interventions are still debated(Reference Craig, Dieppe and Macintyre5), but it is increasingly recognised that a different approach from that for clinical studies is needed(Reference Tones7–Reference Bonner10). Often a wide range of strategically chosen outcomes need to be identified which are consistent with and sensitive to the purpose of the intervention(Reference Campbell, Fitzpatrick and Haines6, Reference Nutbeam11, Reference Green and South12). A single outcome may not give adequate assessment of the policy intervention, or may ask too much of the intervention(Reference Hawe, Degeling and Hall13, Reference Fishbein14). Realistic expectations are also needed for what policy interventions can deliver within a specified time frame, particularly as they often involve a process of social change(Reference Hunter15, Reference Wimbush and Watson16).
Several different methods have been used to try to conceptualise this complexity (e.g. logic models(Reference Kubisch, Brown and Chaskin17, Reference Milstein and Wetterhall18), concept mapping(Reference Trochim19)), with a common aim to identify short-, medium- and long-term outcomes, showing a logical progression of change. Concept mapping, which brings together knowledge and experiences of stakeholders to develop a common framework for evaluation(Reference Trochim19), is particularly useful for long-term programmes, such as public health policies, where there is limited theoretical and empirical knowledge of the time to impact. The estimated time for a policy intervention to have a measurable impact is often not specified, which can lead to unrealistic expectations of the outcome and to uncertainty about whether the intervention has failed or the outcomes have simply not yet occurred(Reference Hawe, Degeling and Hall13, Reference Tones and Tilford20, Reference Sridharan, Campbell and Zinzow21). Concept mapping has been used in the evaluation and planning of a number of public health policies (e.g. evaluation of the Smoking, Health and Social Care (Scotland) Act 2005, planning health improvement initiatives(Reference Trochim, Milstein and Wood22) and determining intervention strategies to increase physical activity(Reference Kelly, Baker and Brownson23)).
Despite raised awareness of diet and health and the number of community-based interventions aimed at improving dietary intakes, there is a lack of well-planned evaluations(2, Reference Kraemer and Winkleby24). The UK House of Commons Health Committee(3) recently highlighted this, stating that ‘even where evaluation is carried out, it is usually “soft”, amounting to little more than examining processes and asking those involved what they thought about them’.
The aim of the project described here was to develop an evaluation framework with a timeline for assessing the impact of nutrition-related policy interventions; the development used the implementation of the Schools (Health Promotion and Nutrition) (Scotland) Act 2007(25) in primary schools since it is an example of typical government policy, but the intention was that the framework would have wider relevance. The Act applies to both primary and secondary schools but the focus for the project was primary schools (pupils aged 5–11 years). The legislation covers a range of health promotion issues as well as setting regulations for all food and drink provided in schools to meet nutritional standards. The main principles of the Act are shown in Table 1; more detailed information (e.g. the nutrient regulations, restriction of specific food and drinks, and the health promotion within schools) can be found on the Scottish government website (www.scotland.gov.uk/Topics/Education/Schools/HLivi/foodnutrition). This policy is similar to others being implemented in other countries and is typical of many public health policies which have a number of objectives and actions within a single policy(Reference Pandey and Wright26, Reference Lee, Rainey and Chun27). There was also sufficient detail to allow stakeholders with a range of expertise and prior knowledge of the Act to participate.
Methods
Participants
Concept mapping is a stakeholder-driven method. Stakeholders from scientific research, national and local government, the National Health Service (NHS), schools, community organisations and parents with children in primary school participated in the study. This wide range of stakeholders collectively had expert knowledge of the day-to-day schools context, research knowledge associated with schools-based health promotion, behaviour change and nutrition issues, as well as knowledge on policy processes and their intended aims. Stakeholder participation varied at different stages of the project, as described in subsequent sections. Key stakeholders were sent a written invitation to participate in the sorting and rating stage of the project (n 291), with additional invitations sent out via national and local nutrition organisations in the UK (Food and Health Alliance, The Nutrition Society, British Dietetic Association, Scottish Colloquium on Food and Feeding).
Concept mapping process
Concept mapping is a mixed-methods approach, combining qualitative research methods with multivariate statistical analyses, identifying and grouping key outcomes of an intervention. The process has six stages: (i) defining the evaluation question; (ii) brainstorming the possible outcomes; (iii) sorting and rating the outcomes; (iv) data analysis; (v) interpretation of the results; and (vi) development of a timeline.
Evaluation question
The aim of the project was to identify the likely outcomes resulting from the implementation of the Schools Act in primary schools. The following statement was developed for stakeholders to complete in the brainstorming session: ‘I think the introduction of the Schools (Health Promotion and Nutrition) (Scotland) Act 2007 in primary schools will …’.
Brainstorming: generate outcome statements
Three methods were used to generate outcome statements: a workshop, semi-structured interviews and focus groups. The workshop was run with nineteen stakeholders from research and policy sectors across Scotland. Following a presentation describing in detail the Schools Act and purpose of the workshop, participants were asked to individually generate between six and eight outcomes that completed the statement ‘I think the introduction of the Schools (Health Promotion and Nutrition) (Scotland) Act 2007 in primary schools will …’, then to work in groups to agree and refine the list of statements. The groups included a mix of stakeholders from different sectors, with varying prior knowledge of the Act. Statements from all the groups were collated. Semi-structured interviews were carried out with stakeholders from local government departments involved with the implementation of the Schools Act (education (n 2), catering (n 3) and health (n 4)) in Scotland. Interviewees described their role and previous experience of nutrition and health promotion policies in schools and were asked to describe what they thought would be the likely outcomes from the implementation of the Act. Interviews were transcribed and statements describing the outcomes extracted independently by two researchers (J.I.M., J.L.) and then combined. The opinions of parents with children in primary school were explored through two focus groups (six parents per group). The discussion focused on their views of school lunches and health promotion in schools, their knowledge of the Schools Act and the changes that they anticipated with the implementation of the Act.
Structuring: sorting and rating the statements
All the statements from the workshop, interviews and focus groups were combined (removing duplications) and reviewed both internally and externally, reducing the final list to eighty-five possible outcomes. A larger group of stakeholders was then sent information about the project (including the concept mapping methodology), a detailed description of the Schools Act and instructions for completing the sorting and rating. They were asked to first sort the statements into groups based on the similarity between statements in a way that made sense to them. Each statement could be placed in only one group and the statements could not all be put in a single group or grouped simply as positive or negative, but had to be grouped into themes. Next they rated each of the statements in terms of the relative likelihood of the outcome occurring, the importance of the outcome and the anticipated earliest time to measurable impact. Both the likelihood and importance were rated on a 5-point Likert scale from 1 (not at all) to 5 (extremely). Time to impact was rated on a 7-point scale with yearly intervals from 0 (immediate) to 6 (6 years or more). The sorting and rating exercise was completed online using the Concept System Global software (Concept Systems Inc., Ithaca, NY, USA), enabling a larger, more geographically dispersed group to participate. Stakeholders provided background information on age, sex, the type of sector they worked in, their area of expertise, their knowledge of and role in the Schools Act and whether they had children in primary school.
Representation: data analysis
Individual matrices were generated from the sorted data showing groupings by individuals, which were then aggregated into a similarity matrix to show the number of participants who sorted each pair of statements together. Using multidimensional scaling (MDS), two-dimensional coordinates were created for each statement on the concept map to give a visual representation of the data. The distance between the points (i.e. statements) on the map represents the similarity between the statements, with those sorted together by more of the participants being closer together. The stress value was calculated for the MDS as a measure of the goodness-of-fit of the map (lower values indicate a better fit). Hierarchical cluster analysis was performed using the coordinates produced by the MDS to group the statements. An iterative process reducing the number of clusters from an initial twenty clusters down to eight was examined for interpretability. Data from fifteen stakeholders were excluded because they had not grouped the statements according to the instructions described above. Data were analysed using the Concept System software.
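The aggregation step described above can be illustrated with a minimal sketch. It is not the Concept System implementation: the function names and the toy sorting data (three participants, five statements) are hypothetical, and the sketch assumes each participant's sort is supplied as a list of groups of statement indices. It shows only how individual binary matrices are summed into the similarity matrix that would then be passed to MDS.

```python
from itertools import combinations


def individual_matrix(groups, n_statements):
    """Binary matrix for one participant: 1 where two statements share a group."""
    matrix = [[0] * n_statements for _ in range(n_statements)]
    for group in groups:
        for s in group:
            matrix[s][s] = 1  # a statement is always grouped with itself
        for a, b in combinations(group, 2):
            matrix[a][b] = 1
            matrix[b][a] = 1
    return matrix


def similarity_matrix(all_sorts, n_statements):
    """Sum the individual matrices: each cell counts the participants
    who placed that pair of statements in the same group."""
    total = [[0] * n_statements for _ in range(n_statements)]
    for groups in all_sorts:
        matrix = individual_matrix(groups, n_statements)
        for i in range(n_statements):
            for j in range(n_statements):
                total[i][j] += matrix[i][j]
    return total


# Hypothetical data: three participants sorting five statements (indices 0-4).
sorts = [
    [[0, 1], [2, 3, 4]],
    [[0, 1, 2], [3, 4]],
    [[0], [1, 2], [3, 4]],
]
sim = similarity_matrix(sorts, 5)
for row in sim:
    print(row)
```

In this toy example statements 3 and 4 were sorted together by all three participants, so they would sit closest together on the map, whereas statements 0 and 4 were never sorted together.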
The average likelihood, importance and time to impact ratings were calculated for each statement and each cluster. The average rating for each cluster was compared between stakeholder groups using general linear models in the SPSS statistical software package version 17·0 (SPSS Inc., Chicago, IL, USA). The groups were based on their type of work: research (academic researchers with knowledge of scientific studies, 35 %); policy (national government or government organisations involved in policy making, 19 %); and implementation (local government, NHS, schools and community groups responsible for implementing legislation, 44 %).
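As a rough illustration of the between-group comparison, the sketch below computes the F statistic for a one-way ANOVA, which is the simplest special case of a general linear model with a single categorical factor. It is a stand-in under stated assumptions, not the SPSS analysis reported here: the function name and the ratings are hypothetical, no covariate adjustment is made and no P value is computed.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict of group name -> ratings."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Variation of individual ratings around their own group mean.
    ss_within = sum(
        (x - mean(values)) ** 2
        for values in groups.values()
        for x in values
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical time-to-impact ratings (years) for one cluster.
ratings_by_group = {
    "research": [1, 2, 3],
    "policy": [2, 3, 4],
    "implementation": [5, 6, 7],
}
f_stat, df_b, df_w = one_way_anova_f(ratings_by_group)
print(f_stat, df_b, df_w)
```

A large F relative to its degrees of freedom indicates that the group means differ by more than the within-group spread would suggest; pairwise post hoc tests would then be needed to identify which groups differ.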
Interpretation: development of the timeline
The average rating of the anticipated time to impact for each cluster was used to develop the timeline. A participant's time to impact rating for an individual statement was excluded if that participant had rated the outcome as ‘not at all’ likely to occur, because a time estimate was considered meaningless for an outcome the participant did not expect to occur. The number excluded per statement ranged from 0 to 35 ratings (median per statement, 3).
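The exclusion rule and averaging can be sketched as follows. This is a minimal illustration with hypothetical names and data. It assumes that each rating is a (likelihood, time to impact) pair, that ‘not at all’ corresponds to a likelihood of 1 on the 5-point scale, and that the cluster average is the mean of the statement means; the text does not specify how the cluster average was formed.

```python
from statistics import mean

NOT_AT_ALL_LIKELY = 1  # lowest point on the 5-point likelihood scale


def statement_time_to_impact(ratings):
    """Mean time to impact for one statement, dropping ratings from
    participants who rated the outcome 'not at all' likely.
    `ratings` is a list of (likelihood, time_to_impact) pairs."""
    kept = [time for likelihood, time in ratings if likelihood != NOT_AT_ALL_LIKELY]
    return mean(kept) if kept else None


def cluster_time_to_impact(statements):
    """Return (average, minimum, maximum) of the statement means in a cluster.
    Assumption: the cluster average is the mean of the statement means."""
    means = [
        m
        for m in (statement_time_to_impact(r) for r in statements.values())
        if m is not None
    ]
    return mean(means), min(means), max(means)


# Hypothetical cluster with two statements.
cluster = {
    "statement A": [(4, 2), (1, 6), (3, 4)],  # the (1, 6) rating is excluded
    "statement B": [(5, 1), (2, 3)],
}
average, earliest, latest = cluster_time_to_impact(cluster)
print(average, earliest, latest)
```

The minimum and maximum statement means give the range spanned by a cluster on a timeline, and the average gives its central position.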
Results
Participant characteristics
Table 2 shows the characteristics of the stakeholders who completed the online rating exercise (n 102). It was not possible to estimate the exact response rate since the authors did not have access to the number of additional invitations emailed via professional societies. One hundred and fifty-four stakeholders signed up to the website and 111 completed the sorting (72 %); 102 completed all three sets of ratings (66 %).
Concept maps
Over 300 statements were generated; many of these were duplications, and the list was reduced to eighty-five unique statements. From the cluster analysis a consensus was reached that the thirteen-cluster option provided logical groupings. Positioning of the statements and clusters is shown on the concept map (Fig. 1), with clusters comprising between three and thirteen statements. The stress value for the MDS was 0·251, which was comparable with previous studies (0·205 to 0·365(Reference Kane and Trochim28)).
The clusters included positive and negative aspects of health, diet, knowledge, social norms and economics (Table 3). Nine clusters were positive outcomes, i.e. improvements in the uptake of school meals, diet and food intake, healthy dietary choices, food literacy, food culture, local economy, educational benefits, long-term health and well-being. Four clusters were negative or unintended consequences, i.e. undesirable eating behaviours, adverse economic consequences, negative attitudes and behaviours and a range of unintended consequences (e.g. bullying, food waste, reduction in family meals together).
Ratings by cluster
‘Increase food literacy’ (cluster 4) was rated as the most likely outcome, followed by ‘improve school meals and uptake’ (cluster 7) and ‘improve food culture’ (cluster 5). Negative and unintended consequences (clusters 11, 12 and 13) were rated as less likely to occur. Clusters tended to have higher ratings for importance than likelihood, with the exception of ‘adverse economic consequences’ (cluster 10). Figure 2 shows the relationship between importance and likelihood ratings for individual statements.
Timeline
Using the average earliest anticipated time for impact for each cluster, it was possible to identify short-, medium- and long-term outcomes (Fig. 3). The arrows intercepting the timeline (x-axis) represent the average rating for each cluster. The width of each box is set by the minimum and maximum rating for individual statements within the cluster. For example, in cluster 1 the average time to impact was 5·02, ranging from 4·22 (dental health) to 5·71 (health of the general population and reduce costs to the NHS). The timeline serves to illustrate realistic expectations for change. Although there is a clear order for the clusters based on the average time to impact there is considerable overlap between them. The timeline does not serve as a precise single time point to assess change but a realistic guide for approximate estimates when change would be anticipated. Different levels of shading of the clusters in Fig. 3 illustrate the likelihood of the outcome, with darker shading illustrating a higher likelihood.
Time to impact by stakeholder group
The anticipated time to impact for each of the clusters was compared between stakeholder groups. A significant difference in the anticipated time to impact was found between groups for all of the clusters except clusters 1 (‘long-term health gains and impact’), 11 (‘unintended consequences in school and on the family’) and 13 (‘undesirable eating behaviours’; Table 4). Post hoc tests showed significant differences between the implementation and research groups for clusters 2, 3, 4, 5, 6, 7, 8 and 12 and between the implementation and policy groups for clusters 4, 5, 7, 8, 9 and 10. In all cases the implementation group anticipated that the time to impact would be longer. The research and policy groups did not differ significantly on average time to impact for any of the clusters. The proportion of stakeholders with detailed prior knowledge about the Schools Act varied between groups (11 % for the research group, 58 % for the policy group and 65 % for the implementation group), but adjusting for this did not alter the differences found between the groups.
*Differences between groups (post hoc comparisons using Tukey’s Honestly Significant Difference test): a, research v. implementation; b, policy v. implementation (P < 0·05).
Discussion
A systematic process was used to conceptualise and develop a timeline for evaluating nutrition policy in schools. This analysis grouped a complex range of behavioural, socio-cultural, health and economic outcomes into clusters and estimated realistic expectations of the time lag between implementation and impact on the range of outcomes. A common problem in evaluating public health policy is that programme theories (if stated) seldom specify the length of time for observed behavioural changes and long-term health gains to emerge, which can lead to unrealistic expectations(Reference Hawe, Degeling and Hall13, Reference Sridharan, Campbell and Zinzow21). In the present study, stakeholders anticipated that on average it would take over 5 years to detect any measurable change in long-term health outcomes (e.g. obesity, dental health and diabetes), which was 2 years longer than the time anticipated for dietary changes. This presents a huge challenge for both policy makers and researchers trying to determine the impact of a long-term policy within a relatively short policy cycle. It also highlights the importance of short- and intermediate-term outcomes as proximal indicators of whether the policy intervention is likely to produce the desired long-term outcome. Furthermore, initial success can often create the conditions for further success; for example, children need to try a new food, made available to them through school meals, before they decide if they like it and subsequently include it in their diet or not, and it is important to monitor these small changes.
The time to impact of the thirteen clusters provided a logical sequence of how the intervention might change behaviours, shift social norms and ultimately impact on health outcomes. The analysis revealed that the time to impact for the clusters often overlapped, because they were made up of several outcomes. For example, the food literacy cluster included knowledge, attitudes and food choices. Measurable changes in awareness and knowledge of a healthy diet were anticipated within 2 years, while for social norms, preferences and food choice the anticipated time was over 3 years (cluster average 2·7 years). The time points that emerged in this framework typically spanned several months or years. The mean time points therefore were intended to serve as a guide to indicate when to expect change, not to identify an exact time point.
With the high prevalence of obesity among children and adults(Reference Bromley, Bradshaw and Given29, Reference Craig and Mindell30) there is a growing pressure for policies and community interventions to tackle this major public health problem(Reference Hill, Peters and Wyatt31–Reference Swinburn33). Although the aim would be to reduce the prevalence of obesity, using weight status as the primary outcome of population-based interventions, without short- and medium-term indicators, would need a very long period of follow-up to reach a conclusion about the impact of the intervention. Kropski et al.(Reference Kropski, Keckley and Jensen34) recently reviewed fourteen school-based obesity prevention programmes which used weight or BMI as a primary or secondary outcome. The review concluded that the evidence for school-based interventions was weak, but the authors stated the ‘time required for small behavioural changes in diet or physical activity to manifest in anthropometric measures may exceed the duration of the studies reviewed’. The duration of the studies included in the review ranged from 6 months to 6 years, with the median duration of 1·8 years. This contrasts with results from our study, where it was anticipated that the earliest time to impact for a reduction in the prevalence of overweight or obesity was over 5 years. The need for short- and intermediate-term outcomes is supported by the conclusions of other reviews stating that the duration for evaluation of impact on weight or BMI for many studies is too short and that a broader range of outcomes needs to be measured(Reference Kropski, Keckley and Jensen34, Reference Summerbell, Waters and Edmunds35).
In general, stakeholders working in the community and involved in implementing policy in practice anticipated the time to impact would be longer than that anticipated by those working in research or policy. This difference may reflect their experiences of implementing complex policy interventions in different settings. A researcher’s evidence base is likely to come from studies carried out under ideal situations where a controlled intervention is fully implemented; this is likely to be in contrast to stakeholders implementing public health policies in the ‘real world’ where there is less control over the intervention, which may result in only partial implementation, and other external factors(Reference Pawson and Tilley4, Reference Kelly, Baker and Brownson23, Reference Judge and Bauld36, Reference Leviton37). This highlights the importance of including the expertise and experience of stakeholders, because if the evaluation timeline was based only on empirical evidence from studies carried out under ideal conditions it is likely that it would underestimate the real time to impact. Policies need time to be fully implemented and embedded in practice, before judgements about their impacts on distal outcomes such as dietary changes and health outcomes are made(Reference Tones7, Reference Nutbeam11, Reference Hunter15, Reference Wimbush and Watson16).
Rating the likelihood of the outcome occurring was an important factor. It can help decide which outcome variables would be most useful to monitor and gives an indication of the size of impact of different outcomes. This can reduce the risk of measuring outcomes that may be thought to be important but unlikely to have an impact. Reducing the cost to the NHS and reducing the prevalence of type 2 diabetes among children were both rated highly on importance but low on likelihood, suggesting that these would not be good indicators to include in the evaluation of this policy. Unintended and negative consequences tended to be rated as less likely to occur than behaviours more closely associated with action of the Act. The low level of the importance given to the economic outcomes may reflect a bias towards a more health-oriented group of stakeholders.
The use of concept mapping would be considered too detailed for simple, well-controlled and well-defined interventions with a clear theory underpinning the intervention and predictable time to impact, but highly relevant for policy interventions where the theory and empirical data are often lacking. Ideally all policies should have a sound theory underpinning them(Reference Kelly, Stewart and Morgan38), but in practice this is not always the case and therefore this type of approach to evaluation is needed(Reference Sridharan, Campbell and Zinzow21). The increasing demand for evidence-based policy requires better evaluation of policy and, as such, appropriate methods to develop evaluations.
This method for developing an evaluation framework has strengths and limitations. A stakeholder-driven approach combines knowledge of the empirical evidence with ‘real-world’ perspective and experiences of working directly with the intended beneficiaries of policy interventions. This serves to give a more realistic, plausible and achievable evaluation timeline to guide data collection and the evaluation. Including this in the planning stage gives stakeholders and policy developers a better understanding of the intervention and the logical progression. One strength of the current study was that a diverse range of stakeholders participated, giving different perspectives and collectively providing expert knowledge of the day-to-day schools context, research knowledge associated with health promotion, behaviour change and nutrition issues, and knowledge of policy processes. Some stakeholders had little prior knowledge about the Schools Act before participating, but they were given detailed information about it before completing each stage of the process. Furthermore, no significant differences in the timeline were found between stakeholders with little or no prior knowledge of the Act before the study and those who had detailed prior knowledge.
A general limitation of any model developed for policy evaluation is that it assumes a stable environment, which does not take into account implementation failure. Therefore, a cautionary note has to be considered with the final timeline as it is based on the assumption that the intervention was fully implemented as intended; but in contrast to research studies this does not necessarily happen with all policy interventions. This could delay changes in the outcomes on the timeline. It was interesting to note that negative outcomes (e.g. economic, uptake of school meals) were considered likely to emerge sooner than positive health or dietary change outcomes. Stakeholders expected some indicators to get worse before they got better (e.g. uptake of school meals) and this is important to note within a public policy context that is currently underpinned by tight fiscal constraints. Interestingly this trend has now been observed for uptake since the implementation of the Act; an initial reduction in uptake was observed in the year that changes were first made but over subsequent years there has been a recovery and increase in uptake(39). A populated timeline of outcomes, derived from an exercise like this, may assist policy makers to make the case to continue a public health policy intervention, in challenging economic conditions, that may not yet have yielded positive health-related results but is considered likely to do so in the longer term. Conversely it can help to detect possible negative impacts of a policy (such as economic issues and implications), especially in the early stages post implementation, and help provide a more realistic picture of the overall impact. A potential limitation of the study was that we did not include the views of children as a stakeholder group, which could have given an additional dimension during the brainstorming. In addition, completing the rating and sorting exercise was time-consuming, which can place a burden on stakeholders.
In summary, the purpose of the current study was to develop a timeline for evaluating nutrition-related policy interventions and it has demonstrated an approach to developing a conceptual framework. It produced a time-to-impact ‘road map’ connecting a broad range of outcomes to better understand the likely impact. The Schools Act provided a multilevel example of a policy intervention which not only aims to change individual behaviour through education to develop knowledge and understanding about healthy eating, but is also concerned with developing supportive organisational and cultural changes through local environmental modifications. This integrated health promotion approach can form the basis of strategies for tackling obesity through other settings such as the workplace. We believe our study offers a way of gaining a better understanding of the range of outcomes that can be affected by policy, as well as identifying potential unintended consequences. The timeline could be adapted and used as a guide for future evaluations. It offers policy makers and researchers a way of identifying possible progress towards challenging long-term public health goals, within the constraints of relatively short government policy cycles.
Acknowledgements
Sources of funding: The study was funded by the Rural and Environment Research Analysis Directorate (Scottish Government). Conflicts of interest: The authors have no conflicts of interest to declare. Author contributions: J.I.M. conceived the study and all the authors contributed to the design and interpretation of the results; J.I.M. and J.L. conducted the study and data analysis. J.I.M. was responsible for writing the manuscript and J.L., F.D., A.L., C.C. and G.M. commented on and edited the manuscript. Acknowledgements: The authors would like to thank all the stakeholders who gave their time to participate in the project.