Exploiting variation in individual outcomes to assess how individuals respond to dietary interventions
Randomised controlled trials (RCT) are the ‘gold standard’ approach for identifying associations between a dietary intervention and one or more health or disease outcomes at the population level. In principle, the randomisation of participants between the intervention and control groups evenly distributes the characteristics of the participants and helps to reduce bias and confounding introduced by other known and unknown factors. Any observed differences in the outcome(s) between the groups should, in theory, reflect the effects of the interventions delivered to each group. However, in practice, all randomised trials of dietary interventions face some degree of bias, including different types of selection, sample or measurement bias, because most are conducted in the real world with a limited sample size(Reference Krauss1). A common source of bias is the participants’ baseline characteristics (e.g. age and health status), which are often unevenly distributed between the intervention and control groups, regardless of the randomisation technique used. Although more complex randomisation techniques (e.g. restricted or stratified) can be used, not all characteristics will have been measured, and there is a limit to how many variables the sample can be stratified by, inevitably leading to unbalanced groups with regard to one or more potentially relevant characteristics(Reference Hoare2).
Complex confounding factors that describe the environment, (epi)genetics, nutritional status, food behaviours, gut microbiota and the metabolome are also often left unmeasured. Not only can these be responsible for significant variations in responses to interventions between individuals at a specific point in time(Reference Brennan and de Roos3), they may also cause fluctuations in the outcome (e.g. plasma TAG, blood pressure and heart rate) in each participant over time. Consider, for example, an RCT of a dietary intervention in which blood pressure and heart rate are assessed at the start and end of the study period in both groups. If a participant shows elevated blood pressure and/or heart rate compared with baseline, this change could have resulted from a particularly salty dinner eaten the night before, stress, a recently completed strenuous bout of exercise or even a sudden change in outside temperature, rather than from the dietary intervention (Figure 1). These unacknowledged variables, which could also affect the outcome, can lead to variability in responses unrelated to the interventions, potentially causing spurious associations. These same factors that cause variation in individual responses can, however, be exploited to assess, or predict, how individuals respond to dietary interventions using more fit-for-purpose study designs.
The field of precision nutrition ultimately aims to develop more comprehensive and dynamic nutritional recommendations based on individual variables, including genetics, the microbiome, metabolic profile, health status, physical activity, dietary patterns, food environment and socio-economic and psychosocial characteristics(Reference Berciano, Figueiredo and Brisbois4). Such ambitions can be realised by exploring existing large datasets and by conducting new intervention studies using appropriate study designs. Here we review where these methods have been applied so far.
Modelling data to identify variables that can explain responses
Statistical models quantify the strength of an association between several independent variables and an outcome, providing insight into how multiple drivers, such as baseline variables, work together to cause a change in an outcome. Modelling is an important tool; however, when a study measures or records numerous independent variables, it can be difficult to determine which should be included in the model. Variable selection refers to the process of choosing the most relevant independent variables to include in a regression model, which helps to improve model performance and to avoid overfitting by reducing the number of independent variables included(Reference Heinze, Wallisch and Dunkler5). For a statistical model to work, both the input data and the choice of the model need to be considered; inaccurate or incomplete data, or a non-optimal method, will produce inaccurate results and erroneous associations.
Independent variables could be selected a priori based on existing evidence of potential associations with an outcome. However, if researchers are faced with an overwhelming number of variables, or if there is uncertainty about where associations may exist, statistical methods can assist with variable selection. The importance of variable selection was highlighted in a secondary data analysis that investigated changes in concentrations of plasma TAG after fish oil supplementation(Reference Potter, Horgan and Wanders6), using original data from the FINGEN study(Reference Caslake, Miles and Kofler7). The FINGEN study had previously shown that males and those with the apo E4 genotype had the greatest TAG-lowering response to fish oil intervention. However, the authors of the FINGEN study also specifically reported substantial response heterogeneity between participants(Reference Caslake, Miles and Kofler7,Reference Madden, Williams and Calder8). In the secondary data analysis(Reference Potter, Horgan and Wanders6), four variable selection and analysis methods – forward stepwise selection, backward stepwise selection, least absolute shrinkage and selection operator (LASSO) and the Boruta algorithm – were applied to five datasets imputed by multiple imputation, using a pooled regression on a per-outcome basis. The final model, and therefore the optimal variable selection and analysis method for that outcome, was chosen based on the lowest validation set root mean squared error. Different variable selection methods were found to be optimal for different outcomes, highlighting the need to consider the choice of selection method. For example, models generated by LASSO identified higher baseline plasma insulin concentrations and lower pre-intervention TAG concentrations as the strongest predictors of plasma TAG change after fish oil intervention. On the other hand, backward stepwise selection identified being older, being female and having lower baseline levels of plasma EPA and DHA as the strongest predictors of plasma EPA and DHA change after fish oil intervention(Reference Potter, Horgan and Wanders6).
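As a purely illustrative sketch of how two of these selection methods might be applied to a single imputed dataset in R, the code below uses the glmnet and MASS packages; the object and column names (e.g. dat, tag_change) are hypothetical and are not those of the FINGEN analysis.

```r
library(glmnet)  # LASSO
library(MASS)    # stepAIC() for backward stepwise selection

# dat: one (already imputed) dataset in which tag_change is the outcome and all
# other columns are candidate baseline predictors (age, sex, genotype, baseline
# TAG, insulin, ...)
x <- model.matrix(tag_change ~ ., data = dat)[, -1]  # predictor matrix, intercept column removed
y <- dat$tag_change

# LASSO with the penalty chosen by cross-validation; predictors with non-zero
# coefficients at the selected penalty are the 'selected' variables
cv_fit    <- cv.glmnet(x, y, alpha = 1)
sel_lasso <- coef(cv_fit, s = "lambda.min")

# Backward stepwise selection starting from the full linear model
full_fit <- lm(tag_change ~ ., data = dat)
sel_back <- stepAIC(full_fit, direction = "backward", trace = FALSE)

# Candidate models from each method would then be compared on a held-out
# validation set, e.g. by root mean squared error, before a final model is
# chosen for each outcome, as in the secondary analysis described above.
```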
To explore how certain individuals might respond to a given intervention, researchers are using increasingly advanced statistical approaches and machine-learning algorithms. These are mostly based on the measurement of multiple outcomes and independent variables, often including extensive questionnaires and the application of omics technologies, in order to identify the strongest predictors of change in clinically relevant outcomes at an individual level. Thus far, most studies have focussed on postprandial blood glucose levels because robust technologies exist to measure blood glucose continuously over longer periods of time, and because this outcome is relevant, for example, to the management of glycaemic control in diabetes(Reference Maiorino, Signoriello and Maio9). To ensure sufficient statistical power, such studies typically involve large cohorts containing hundreds of participants and require advanced statistical approaches and machine-learning algorithms to be applied to datasets in which the number of participants is greater than the number of predictors. Pioneering work by Zeevi and colleagues(Reference Zeevi, Korem and Zmora10) enabled the development and validation of personalised algorithms to reduce individual postprandial glycaemic responses. But is this personalised/precision concept working in terms of increasing the efficacy of diets to improve health outcomes? A subsequent large-scale intervention study (the Personal Diet study) applied these algorithms to provide a personalised diet targeting the postprandial glucose response in adults with abnormal glucose metabolism and obesity. However, the personalised diets resulted in no more weight loss and no greater reduction in glycaemic variability and HbA1c levels, compared with a generic low-fat diet, after 6 months(Reference Popp, Hu and Kharmats11). A second study by Berry and colleagues, the PREDICT-1 study, identified the main predictors of individual glucose responses, including meal composition, meal context, genetics, serum glucose markers and the microbiome(Reference Berry, Valdes and Drew12). The results of an efficacy study of a personalised nutrition programme, based on the algorithms of the PREDICT-1 study, were recently published(Reference Bermingham, Linenberg and Polidori13). There were some positive but modest results: an average weight loss of 2.5 kg over 4 months, but no changes in various other biomarkers, including blood pressure, insulin, glucose and postprandial TAG. The control group was also perhaps not a control group in the strictest sense of the term – they were simply given standard dietary advice and a helpline to call. It should be noted, however, that weight loss is a very ambitious goal, with the outcome also influenced by a significant number of behavioural factors, which are rarely taken into account in prediction models. It has also been questioned whether postprandial glucose is a relevant health outcome, especially in those who do not have diabetes, and whether healthy participants would benefit from flattening their blood glucose curves(Reference Blaak, Antoine and Benton14,Reference Jarvis, Cardin and Nisevich-Bede15). Recently, a study showed that individual postprandial continuous glucose monitoring responses to duplicate meals were unreliable in adults without diabetes(Reference Howard, Guo and Hall16).
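The published algorithms themselves are considerably more complex, but a generic, hedged sketch of the underlying idea – predicting an individual’s postprandial glucose response from meal and participant features with a gradient-boosted model – might look as follows in R; all object and column names (e.g. meals, carb_g, iauc, microbiome_pc1) are hypothetical and do not correspond to any of the published studies.

```r
library(xgboost)

# meals: one row per meal per participant, with meal and participant features
# and iauc, the postprandial glucose response (incremental area under the curve)
features <- c("carb_g", "fat_g", "protein_g", "fasting_glucose",
              "bmi", "age", "microbiome_pc1", "microbiome_pc2")
x <- as.matrix(meals[, features])
y <- meals$iauc

# Gradient-boosted regression relating features to the glucose response
fit <- xgboost(data = x, label = y, nrounds = 200, max_depth = 4, eta = 0.05,
               objective = "reg:squarederror", verbose = 0)

# Predicted responses for candidate meals can then be ranked for a given
# individual, which is, in broad terms, how algorithm-based personalised
# dietary advice is generated
pred <- predict(fit, as.matrix(new_meals[, features]))
```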
The application of novel precision nutrition study designs
Over the past decade, we have seen the development of novel precision nutrition approaches as an innovative way to measure the efficacy of dietary interventions to improve individual health outcomes, rather than, or in addition to, population health. It is important to realise that these novel precision nutrition approaches answer different research questions and provide different levels of evidence compared with RCT. Precision nutrition approaches target the personal aspect of nutrition information and aim to give people a better idea of whether a dietary intervention or change will work for them. The presumed higher effectiveness of a precision nutrition approach in achieving this is based on the expectation that certain individuals may particularly benefit from certain dietary interventions, depending on factors such as their environment, (epi)genetics, nutritional status, food behaviours, gut microbiota and the metabolome(Reference Brennan and de Roos17). But how good are we at investigating what influences individual responses to personalised diets?
Within the precision nutrition context, the repeated collection of outcome and potential explanatory variable measurements enhances the precision of association estimates and is key to understanding which factors affect a participant’s response to an intervention(Reference de Roos18). For example, the PRECISE study assessed whether supplementation with bilberry and grape seed extract for 12 weeks improved cardiometabolic outcomes in individuals at risk of developing type 2 diabetes mellitus (T2DM), with 14 participants acting as their own controls in a randomised, placebo-controlled crossover design(Reference Grohmann, Walker and Russell19). Over the course of 12 weeks, multiple measurements of glycated Hb (HbA1c), 2-h oral glucose tolerance tests, total cholesterol, LDL-cholesterol and HDL-cholesterol were taken, and blood glucose levels were assessed continuously. No significant changes in any of these outcomes were observed between groups, but bilberry and grape seed extract significantly reduced ambulatory blood pressure over 24 h, to a level comparable with the effect of anti-hypertensive drug treatments. The significant reduction in blood pressure may have been an early marker of a beneficial effect on glucose metabolism – a recent individual participant meta-analysis assessing the effect of blood pressure lowering on the risk of new-onset type 2 diabetes found that a systolic blood pressure reduction of 5 mmHg reduced the risk of T2DM by 11%(Reference Nazarzadeh, Bidel and Canoy20). In addition, 8 of the 14 participants were identified as blood pressure ‘responders’ to bilberry and grape seed extract. These responders had significantly higher levels of phenylpropionic and phenyllactic acids in their faecal samples and a higher proportional abundance of Fusicatenibacter-related bacteria in their baseline stool samples, which may provide an explanation for the blood pressure response(Reference Grohmann, Walker and Russell19). However, these results at the individual level will need to be tested in further studies, as they were obtained using simple statistical approaches (e.g. ANOVA), which do not account for the fact that multiple measurements of the same individual over time are not independent but correlated.
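As an illustration of an analysis that does respect this correlation, a linear mixed-effects model with a random intercept per participant (and, where the data support it, a random treatment slope) could be fitted to the repeated measurements. The sketch below uses the lme4 package with hypothetical variable names; it is not the analysis of the actual PRECISE dataset.

```r
library(lme4)

# long: one row per participant per assessment, with columns
#   id         participant identifier
#   treatment  factor: extract vs placebo
#   week       week of measurement within each treatment period
#   sbp24      24-h ambulatory systolic blood pressure
fit <- lmer(sbp24 ~ treatment + week + (1 | id), data = long)
summary(fit)

# A random treatment slope, (1 + treatment | id), would additionally allow the
# treatment effect itself to vary between participants, which is the quantity
# of interest when identifying 'responders'.
```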
Novel study designs for nutrition research – the potential of N-of-1
An N-of-1 study is the opposite of a large-scale study aimed at making inferences about a population. Instead, it is a study specifically designed to generate statistically robust results using data collected from an individual, with the goal of drawing conclusions that apply only to that person. There are two main types of N-of-1 studies: observational N-of-1 studies, which simply monitor a participant over time without introducing an intervention, and interventional N-of-1 studies, which compare one intervention with a baseline period, or compare two or more interventions, and test hypotheses; the focus of this review is on interventional studies. Typically, an interventional N-of-1 study takes repeated measurements and/or recordings of both the dependent variable (outcome) and independent variables of interest, in real time and in the context of day-to-day activities, to determine associations between the independent variables and the outcome(s). There are different ways an interventional N-of-1 study can be designed, depending on the aims of the study and the reversibility of the intervention, with the possibility of using a design with a randomised sequence of treatments in which the individual is their own control(Reference Vieira, McDonald and Araújo-Soares21).
Comparing interventional N-of-1 designs and randomised controlled trials
A key distinction between an RCT, or other common group-level interventional designs such as crossover designs, and an interventional N-of-1 study is the research question one intends to answer. An RCT aims to understand the effect(s) of an intervention(s) within the recruited sample that could be generalised to the population represented by the sample. Conversely, the primary aim of an interventional N-of-1 study is to understand how an individual responds to different interventions, whether personal behavioural and lifestyle factors also influence variation in the outcome(s) and, potentially, whether these factors interact with the intervention(s).
The focus of an N-of-1 study is on the individual, and whilst studies have been published on single participants, a series of N-of-1 studies is more common(Reference Hawksworth, Chatters and Julious22,Reference Gabler, Duan and Vohra23). Generally, N-of-1 studies have recruited fewer than 10 participants, whereas current protocols aim to recruit around 20 participants(Reference Hawksworth, Chatters and Julious22). Data from multiple N-of-1 studies can be aggregated in a meta-analysis to estimate the effect of the intervention(s) in a population assumed to be represented by the sample of individuals studied(Reference Zucker, Ruthazer and Schmid24,Reference Senn25), much like analysing independent RCT in a meta-analysis. Based on various assumptions, including a statistical significance level of 0.05 and a power of 80%, it has been suggested that data from up to approximately 45 individual N-of-1 studies might be required for such an analysis(Reference Senn26). Data from N-of-1 trials can also be used to help predict the effect of a treatment in a future person, which would provide improved methods to personalise treatment for individual patients. However, this type of analysis needs to be able to accurately estimate variation in between-patient differences and, to do this, may require up to 100 individuals to be studied in the first place(Reference Senn26). In a medical context, it may not be possible to recruit so many patients(Reference Senn26); however, in nutrition, where the general population may often be the target, inclusion and exclusion criteria might be less constrained, and while undeniably challenging, a study of this size might be more achievable for the purpose of developing personalised nutrition strategies.
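One simple, hedged illustration of such an aggregation is a two-stage approach analogous to a meta-analysis of independent trials: estimate the treatment effect separately within each individual and then pool the estimates with a random-effects model. The sketch below uses the metafor package; the dataset and column names are hypothetical, the treatment is assumed to be a two-level factor (A/B) received by every participant and, for simplicity, the within-person time structure discussed later is ignored.

```r
library(metafor)

# nof1: one row per participant per measurement occasion, with columns
#   id, treatment (factor with levels "A" and "B") and outcome
per_person <- lapply(split(nof1, nof1$id), function(d) {
  fit <- lm(outcome ~ treatment, data = d)             # within-person treatment effect
  est <- summary(fit)$coefficients["treatmentB", ]
  data.frame(estimate = est["Estimate"], se = est["Std. Error"])
})
per_person <- do.call(rbind, per_person)

# Random-effects pooling: the pooled estimate describes the population, while
# the heterogeneity (tau^2) describes between-person variation in response
pooled <- rma(yi = per_person$estimate, sei = per_person$se, method = "REML")
summary(pooled)
```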
The flexibility offered by a series of N-of-1 interventional studies – their ability to answer research questions pertaining to the individual and the population – could make them more desirable than an RCT in certain contexts. By choosing appropriate control or placebo interventions, applying randomisation, counterbalancing and blinding, as has been suggested in guidelines for designing N-of-1 studies(Reference Kravitz, Duan, Duan, Eslick and Gabler27), interventional N-of-1 studies will be able to conclude whether any of the independent variables describing time, the intervention, the environment or an individual’s behaviour over time are associated with variation in the outcome(s). This is a distinct advantage of N-of-1 studies over RCT.
While N-of-1 studies and RCT have obvious differences, it is important to understand that certain methods and principles relevant to RCT also apply to N-of-1 studies. The concept of statistical power still applies and is derived from the number of repeated measurements taken in an individual (per intervention if the study is an intervention study) rather than the number of people recruited. A minimum of 50 repeated measures per intervention has been suggested, and the potential for missing data should be considered(Reference Vieira, McDonald and Araújo-Soares21). Randomisation can be used to generate intervention delivery sequences and to allocate participants to these sequences when multiple sequences are involved, with counterbalancing being an alternative or complementary method. Exerting more control over the order in which the interventions are delivered can also control for potential confounding introduced through order effects – the effect of one treatment on the effect of the following treatment. Similarly, washout periods between treatments should be implemented to reduce or eliminate the risk of carryover effects, the residual effects of an intervention beyond its assigned duration, as in a crossover trial, although analytical solutions exist when it is not possible or convenient to include washout periods(Reference Liao, Qian and Kronish28). Other publications (see(Reference Kravitz, Duan, Duan, Eslick and Gabler27,Reference Potter, Vieira and de Roos29)) have described and discussed these points in a more comprehensive manner with regard to the underlying theory and practical applications.
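Because power depends on the number of repeated measurements per intervention, it can be explored by simulation before a study starts. The sketch below simulates a deliberately simplified two-intervention N-of-1 study with autocorrelated noise; the effect size, noise level and autocorrelation are assumed values chosen for illustration only, and the naive linear model used for testing ignores the autocorrelation that a real analysis would need to account for (e.g. via lagged terms, as discussed below).

```r
set.seed(1)

power_nof1 <- function(n_per_arm, effect = 0.5, sd = 1, rho = 0.3, nsim = 2000) {
  reject <- replicate(nsim, {
    treatment <- rep(c(0, 1), each = n_per_arm)                              # A then B, simplified
    e <- as.numeric(arima.sim(list(ar = rho), n = 2 * n_per_arm, sd = sd))   # AR(1) noise
    y <- effect * treatment + e
    summary(lm(y ~ treatment))$coefficients["treatment", "Pr(>|t|)"] < 0.05
  })
  mean(reject)   # proportion of simulations detecting the effect at p < 0.05
}

power_nof1(20)   # power with 20 measurements per intervention
power_nof1(50)   # power with the suggested 50 measurements per intervention
```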
Designing an N-of-1 study
Most discussion surrounding intervention order in an N-of-1 study focuses on the comparison between several interventions: how long each intervention is delivered for within a block (also referred to as a cycle), how interventions are ordered within a block and how multiple blocks are arranged across the course of a study. It is also apparent that many published N-of-1 studies compare only two treatments (see(Reference Hawksworth, Chatters and Julious22,Reference Lillie, Patay and Diamant30)), although comparisons of three treatments do exist(Reference Hawksworth, Chatters and Julious22,Reference Nurmi, Knittle and Naughton31). In nutrition research, comparison of more than two interventions may be required – for example, when researchers wish to study the effects of individual components of a diet, or components of a single dietary item. Based on the assumption that an appropriate length of time to deliver a specific intervention is one week, a doubly counterbalanced design for three interventions, which seems to instinctively follow on from the design AB-BA-BA-AB (a common example in which each letter denotes a unique intervention, and a hyphen denotes the transition between blocks), could be ABC-CBA-CBA-ABC. However, in this design, intervention B is never delivered in consecutive weeks, so to appropriately account for order effects, one might need to add two more blocks, for example, ABC-CBA-CBA-ABC-ACB-BCA. This study is now at least 18 weeks long, and given that washout periods are yet to be incorporated and that participants would have to complete more than one data point per day to achieve at least 50 data points per intervention, the burden of this study on the participant may be deemed too great. An alternative approach would be to combine randomisation and counterbalancing: four of the six possible orders of three treatments could be randomly selected and then counterbalanced such that no treatment is delivered in consecutive weeks, for example, ABC-BCA-CBA-BAC. A participant would still have to complete at least two measurements per day to achieve at least 50 per intervention; however, the study length is favourably reduced to 12 weeks.
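A small sketch of this randomisation-plus-counterbalancing step is given below: four of the six possible orders of three interventions are sampled repeatedly until a 12-week sequence is found in which no intervention is delivered in consecutive weeks. The code is purely illustrative.

```r
set.seed(42)
orders <- c("ABC", "ACB", "BAC", "BCA", "CAB", "CBA")  # all six orders of three interventions

repeat {
  blocks    <- sample(orders, 4)              # randomly select four block orders
  seq_weeks <- unlist(strsplit(blocks, ""))   # the resulting week-by-week sequence (12 weeks)
  # accept only sequences in which no intervention crosses a block boundary,
  # i.e. no intervention is delivered in two consecutive weeks
  if (all(head(seq_weeks, -1) != tail(seq_weeks, -1))) break
}
blocks      # the four selected block orders, e.g. of the form ABC-BCA-CBA-BAC
seq_weeks   # the week-by-week intervention sequence
```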
Imputation of missing data in N-of-1 studies
N-of-1 studies often aim to collect a large amount of data from many time points and many different variables from a single participant using ecological momentary assessments (EMA)(Reference Kravitz, Duan, Duan, Eslick and Gabler27,Reference Potter, Vieira and de Roos29). EMA involves the repeated sampling of a participant’s current behaviours and experiences in real time and in a natural environment, aiming to minimise recall bias and maximise ecological validity(Reference Shiffman, Stone and Hufford32). It is entirely possible that a final dataset may have rows of missing data associated with points in time when a participant was unable to return the information requested. This may depend on numerous factors such as the methods of collecting data (remotely or at a site visit), the frequency of data collection, a participant’s reduced engagement with the study at times or a random occurrence that prohibited data collection. However, to be able to appropriately apply statistical methodologies to an N-of-1 dataset, which is inherently time series data, one requires a complete set of data to ensure proper modelling of the time structure and autocorrelation patterns. Therefore, these missing data points require imputation, which could be performed in various ways. One could impute the missing data based solely on the principle of autocorrelation, the relationship between any given data point and those that preceded it, on a variable-by-variable basis; within R, a programming language for statistical computing and data visualisation, a package such as imputeTS can be used to achieve this(Reference Moritz and Bartz-Beielstein33). However, imputation in this manner would assume that variables are independent of one another, an assumption that is often unrealistic, especially in an N-of-1 study that is collecting data on numerous aspects of an individual’s life. There is also the risk that basing the imputation of missing values on those immediately prior could disproportionately emphasise the effect of autocorrelation during the analysis. An alternative approach would be to use multiple imputation. In R, there are various packages available for carrying out multiple imputation, such as Amelia II(Reference Honaker, King and Blackwell34) and MICE(Reference van Buuren and Groothuis-Oudshoorn35). Multiple imputation usually generates multiple complete datasets that can be analysed together using functions provided by the imputation package, with the estimates from each dataset being pooled to provide an overall effect(Reference McDonald, Vieira and Johnston36). With time series data, however, it may be necessary to build statistical models for each imputed dataset separately before deciding on a final model to apply to all imputed datasets. This might be required if the variables that are statistically significant are particularly sensitive to the imputation, and to permit appropriate investigation of the existing autocorrelation patterns within an independent variable or outcome, per imputed dataset.
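A minimal sketch of the multiple imputation route with the MICE package is shown below; the dataset and variable names (daily, outcome, intervention, stress, steps) are hypothetical, and in practice lagged versions of the variables (see the next section) would also be supplied so that the imputation respects the time structure.

```r
library(mice)

# daily: one row per day, containing the outcome, the intervention indicator and
# time-varying covariates (e.g. stress rating, step count), with missing values
imp <- mice(daily, m = 5, method = "pmm", seed = 123, printFlag = FALSE)

# Fit the same model in each of the five completed datasets and pool the estimates
fits   <- with(imp, lm(outcome ~ intervention + stress + steps))
pooled <- pool(fits)
summary(pooled)

# For time series data, models may instead be built separately for each completed
# dataset, obtained via complete(imp, action = "all"), before a final model
# specification is applied to all of them.
```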
Model variable selection
To determine the variables to be included in any final statistical model after multiple imputation, a decision would need to be made a priori, or in an exploratory analysis, based on the independent variables that are statistically significant in the models derived from each imputed dataset. It is also critical that autocorrelation is appropriately accounted for through the inclusion of lagged variables associated with both the independent and dependent variables. A lagged variable represents the value that a specific variable had at a previous time point (e.g. previous day, 2 h ago or 7 d ago) and, by being included in the model, allows the modelling of any existing autocorrelation within the variable’s consecutive data points. The construction of lagged variables has been described in detail elsewhere(Reference Kwasnicka, Inauen and Nieuwenboom37).
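A brief sketch of how lagged variables could be constructed and included in a model for a completed (imputed) daily dataset is given below, using the dplyr package; the column names are again illustrative only.

```r
library(dplyr)

daily <- daily %>%
  arrange(day) %>%
  mutate(outcome_lag1 = lag(outcome, 1),    # yesterday's outcome
         stress_lag1  = lag(stress, 1))     # yesterday's stress rating

# Lag-1 terms capture first-order autocorrelation; further lags (e.g. lag 7 for a
# weekly pattern) can be added, with acf()/pacf() plots guiding the choice
fit <- lm(outcome ~ intervention + stress + outcome_lag1 + stress_lag1,
          data = daily)
summary(fit)
```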
Strengths and limitations of N-of-1 studies
N-of-1 studies are relatively new to the field of human nutrition but have been applied in the fields of medicine, health psychology, education research and experimental economics for some time(Reference Hawksworth, Chatters and Julious22,Reference Gabler, Duan and Vohra23,Reference Lillie, Patay and Diamant30,Reference Kwasnicka, Inauen and Nieuwenboom37). The value of N-of-1 trials is in the opportunity they offer to include behavioural factors, which are important when considering the acceptability of, or compliance with, dietary interventions, especially in the context of health policies. For example, an N-of-1 design was used to assess the between-person variability in the psychological and social factors associated with daily alcohol consumption, in 25 adults with a history of alcohol dependence, using EMA(Reference Kwasnicka, Boroujerdi and O’Gorman38). This study, by fitting multi-level models to the longitudinal data, was able to identify individual factors that contributed to changes in alcohol consumption before and after the implementation of minimum unit pricing in Scotland, including psychosocial factors such as temptation. This was despite the fact that each participant provided only 27 responses on average (out of a possible 84 questionnaires that were sent daily for 12 weeks), that the average time in the study was only 64 out of 84 days, that the total response rate was only 48% and that only 15 out of 25 participants provided sufficient data for analysis(Reference Kwasnicka, Boroujerdi and O’Gorman38). Non-adherence to data collection, which results in missing data and may include lower response rates near the end of longitudinal data collection, may be due to the ‘burden’ of daily data collection being too high for some participants. This has been identified as one of the challenges of N-of-1 studies, although overall, adherence to EMA has been considered high(Reference Kwasnicka, Inauen and Nieuwenboom37). On the other hand, a main advantage of N-of-1 studies is that they offer the opportunity to tailor the intervention and data collection to an individual, increasingly involving wearables, smartphone sensors and app- or web-based data collection, which is believed to increase engagement and adherence to the intervention(Reference Kwasnicka, Inauen and Nieuwenboom37).
Conclusion
The adoption of new study designs and modelling approaches, including N-of-1 designs and statistical approaches that examine an individual’s response to interventions, will help researchers to make predictions and to better understand associations between dietary interventions and health outcomes, whether for individuals or for groups that share certain characteristics, more effectively and accurately in the future. An advantage of these study designs is that they can include time-varying factors, such as the environment, epigenetics, nutritional status, food behaviours, physical activity, gut microbiota and the metabolome, which cause variation in individual responses, and use this information to assess, or predict, how individuals respond to dietary interventions. N-of-1 studies are especially suitable for clinically relevant outcomes that vary over time, due to the precision obtained from repeated measurements. However, N-of-1 studies require rigorous consideration of specific elements, including randomisation and blinding, the number of measurements and intervention cycles, and the choice of appropriate outcomes that can be measured regularly or continuously using wearables and other remote data collection methods.
Author contributions
EP, RV and BdR wrote the paper. All authors read and approved the final manuscript.
Financial support
The research of BdR is funded through the Scottish Government Rural and Environment Science and Analytical Services Division (RESAS), Grant RI-B5-06. The PhD project of EP is funded by Lipton Teas and Infusions.
Competing interests
There are no conflicts of interest.