Utilising published and archived research
Hoogenboom (2020), in his editorial in The Journal of Agricultural Science, pointed out that during the present pandemic universities, research institutes and laboratories had closed, so many research projects had been discontinued. However, he also noted that a wealth of agricultural research was conducted during the past century, although its adoption has sometimes been slow.
The historical review below attempts to bring together some techniques for, and progress in, the analysis, interpretation and exploitation of existing research data.
Standard techniques
The statistical analysis of data is now required in many disciplines across numerous industries. However, much of the modern analysis of experiments arose from the need of agricultural experimenters at Rothamsted Experimental Station, UK, to distinguish between treatment effects and chance variation in field plots. Russell (1966), in his history of agricultural science and in particular of Rothamsted, of which he was director from 1912 to 1943, cited the landmark publication of Fisher (1925) on statistical analysis and experimental design.
The guiding principles of the work of Fisher and his colleague Yates were randomisation and replication, which allowed background variation to be estimated and the inherent variability of biological material, including crops and livestock, to be taken into account. Their early field experiments used randomised, replicated plots to allow for variation in soils and topography across the fields of Rothamsted. Fisher and Yates (1938) compiled the comprehensive statistical tables that users of these methods needed; with the computational technology of the 1930s, that must have been a monumental task. Later, in the digital age, the statistical package Genstat was developed at Rothamsted in 1968 and, at that time, ran only on large main-frame computers. It can now be downloaded by users for their own local use.
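One standard way of combining randomisation and replication is the randomised complete block design: every treatment appears once in each block (replicate), and its plot position within the block is chosen at random, so that differences between blocks (for example a soil gradient across a field) can be separated from treatment effects. The minimal Python sketch below illustrates only the randomisation step; the function name `randomised_block_layout` and the nitrogen rates are hypothetical, not taken from Fisher and Yates.

```python
import random


def randomised_block_layout(treatments, n_blocks, seed=None):
    """Return one independently shuffled copy of the treatments per block.

    Each block contains every treatment exactly once (replication), and the
    order of plots within each block is random (randomisation).
    """
    if n_blocks < 1:
        raise ValueError("need at least one block")
    rng = random.Random(seed)  # seeded generator so a layout can be reproduced
    layout = []
    for _ in range(n_blocks):
        block = list(treatments)  # copy, so the caller's list is untouched
        rng.shuffle(block)        # random plot order within this block
        layout.append(block)
    return layout


if __name__ == "__main__":
    # Hypothetical example: four nitrogen rates (kg N/ha) in three blocks.
    for i, block in enumerate(randomised_block_layout([0, 50, 100, 150], 3, seed=1), start=1):
        print(f"Block {i}: {block}")
```

Shuffling within blocks, rather than across the whole field, is the design choice that keeps each block a complete replicate.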
Computational and data collection hardware has changed almost beyond recognition since the time of Fisher and Yates. Drones and satellite imaging are used to collect field data, tractors carry on-board devices, and tagging devices monitor farm animals. Large, geographically dispersed research teams can stay in instant electronic contact without leaving their homes, even in a pandemic. This is very different from the research environment and working conditions of most of the 20th century.
Analytical procedures derived from, and building on, those of Fisher and Yates served agricultural researchers and users well for the next half century. Many texts giving guidance to researchers, students and other users have been published since 1938; two examples of the genre, written 40 years apart, may suffice: Bailey (1959) and Morris (1999).
An alternative
The Fisher and Yates approach to the analysis and understanding of experimental work has not, however, gone unchallenged. Matthews (1998) argued that it had been too successful in displacing subjectivity, and with it consideration of the older Bayesian statistics (named after Thomas Bayes, an 18th-century English clergyman). Matthews claimed that '… the axioms of probability reveal subjectivity to be a mathematically ineluctable feature of the quest for knowledge …'. Bayes's theorem updates the degree of belief in a theory in the light of the data, by combining a prior belief based on previous information with the probability of the observed data under that theory.
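In its simplest form, for a hypothesis H and observed data D, Bayes's theorem gives the posterior belief as P(H | D) = P(D | H) P(H) / [P(D | H) P(H) + P(D | not H) P(not H)]. The minimal Python sketch below (the function name and all probabilities are illustrative assumptions) shows the role of subjectivity that Matthews emphasised: identical data lead to different conclusions when the prior beliefs differ.

```python
def posterior_probability(prior_h, p_data_given_h, p_data_given_not_h):
    """Posterior P(H | D) from Bayes's theorem for a hypothesis H and its complement.

    prior_h            -- prior belief P(H), between 0 and 1
    p_data_given_h     -- likelihood of the data if H is true, P(D | H)
    p_data_given_not_h -- likelihood of the data if H is false, P(D | not H)
    """
    if not 0.0 <= prior_h <= 1.0:
        raise ValueError("prior_h must lie between 0 and 1")
    # Total probability of the data, averaged over H and not-H.
    evidence = p_data_given_h * prior_h + p_data_given_not_h * (1.0 - prior_h)
    if evidence == 0.0:
        raise ValueError("the data have zero probability under both hypotheses")
    return p_data_given_h * prior_h / evidence


if __name__ == "__main__":
    # Illustrative numbers: the data are four times as likely if a treatment
    # effect is real (H) as if it is not.
    for prior in (0.5, 0.1):
        print(f"prior {prior:.2f} -> posterior {posterior_probability(prior, 0.8, 0.2):.3f}")
```

With these numbers an even-handed prior of 0.5 gives a posterior of 0.8, whereas a sceptical prior of 0.1 gives only about 0.31 from the same data.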
However, there does not appear to have been any resurgence of interest in a Bayesian approach to agricultural experimentation, the analysis of which continues to be dominated by the methods of Fisher and Yates.
A change of objective
Generations of experimenters since Fisher and Yates have designed their experiments to detect significant differences and interactions between treatments. These could be described as difference questions, i.e. asking whether a difference between treatments exists and can be distinguished from chance. By the 1970s, it was recognised that, in some agricultural markets, differences too small to be statistically significant might, if real, be commercially significant and too consequential to be ignored. Therefore, in some cases, experiments intended to resolve difference questions were replaced by experiments designed to answer quantitative questions, that is, questions about the magnitude of differences and of responses to treatments.
This led to a change of direction, from testing significance to fitting and analysing response curves. Dillon (1977) suggested using the first derivative of a response function, i.e. the marginal response, to compare the incremental response with the incremental cost. This had profound effects on the design of some agricultural experiments, placing a premium on multiple levels of the independent variables so that response curves could be defined. Statisticians at Rothamsted took this principle further by designing experiments with more emphasis on the number of treatment levels than on replication at each level, occasionally even using incomplete replication in designs with multiple levels (Spechter et al., 1982; Hill et al., 1988).
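As an illustration of the marginal principle described above, suppose (an assumption for this sketch, not Dillon's general treatment) that yield responds quadratically to an input, Y = a + bX + cX² with c < 0. The incremental response is the first derivative dY/dX = b + 2cX, and profit is maximised where it equals the ratio of input price to output price, giving X* = (p_X/p_Y − b)/(2c). The function names, coefficients and prices below are hypothetical.

```python
def economic_optimum_quadratic(b, c, input_price, output_price):
    """Input level at which the marginal response of Y = a + b*X + c*X**2
    equals the input:output price ratio (the intercept a does not matter)."""
    if c >= 0:
        raise ValueError("c must be negative (diminishing returns) for a maximum")
    if input_price <= 0 or output_price <= 0:
        raise ValueError("prices must be positive")
    price_ratio = input_price / output_price
    # Solve b + 2*c*X = price_ratio for X.
    x_star = (price_ratio - b) / (2.0 * c)
    # If even the first unit of input does not pay for itself, apply none.
    return max(x_star, 0.0)


def biological_maximum_quadratic(b, c):
    """Input level at which yield itself peaks (dY/dX = 0)."""
    if c >= 0:
        raise ValueError("c must be negative for a maximum")
    return -b / (2.0 * c)


if __name__ == "__main__":
    # Hypothetical response: b = 20 kg grain per kg N, c = -0.05;
    # nitrogen costs 1.0 and grain sells for 0.2 per kg (price ratio 5).
    b, c = 20.0, -0.05
    print("economic optimum:", economic_optimum_quadratic(b, c, 1.0, 0.2))
    print("biological maximum:", biological_maximum_quadratic(b, c))
```

With these hypothetical values the economic optimum (150 kg N/ha) lies well below the biological maximum (200 kg N/ha), which is why the slope of the response curve, not only its peak, matters commercially.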
Response curve analysis
The interpretation of experimental results by response curve analysis led not only to the use of at least four graded levels of treatments but also to the use of least-squares curve-fitting packages to generate algebraic functions. Unfortunately, the ease of fitting linear regressions sometimes tempted authors to assume linearity, at least implicitly, when the scatter of the data points suggested no such thing. Results could be summarised in the form:
$$Y = a + bX \qquad (1)$$
where X is the independent agricultural input variable, Y the dependent yield variable, a the intercept and b the slope.
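The straight-line summary Y = a + bX is usually fitted by ordinary least squares. The plain-Python sketch below computes the intercept, slope and r² from the usual sums of squares and cross-products; the function name and the diminishing-returns data are hypothetical, chosen to show how a high r² can coexist with an implausible extrapolation.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of Y = a + b*X.

    Returns (a, b, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("X and Y must both vary")
    b = sxy / sxx                       # slope
    a = mean_y - b * mean_x             # intercept
    r_squared = sxy ** 2 / (sxx * syy)  # proportion of variance explained
    return a, b, r_squared


if __name__ == "__main__":
    # Hypothetical diminishing-returns data: yield (t/ha) v. fertiliser (kg/ha).
    x = [0, 50, 100, 150, 200]
    y = [2.0, 4.0, 5.2, 5.8, 6.0]
    a, b, r2 = fit_line(x, y)
    print(f"Y = {a:.2f} + {b:.4f} X, r^2 = {r2:.3f}")
    # Extrapolating well beyond the data ignores the obvious plateau.
    print(f"predicted Y at X = 400: {a + b * 400:.2f}")
```

Here the line gives r² of about 0.88, yet extrapolating it to X = 400 predicts a yield of about 10.5, far above the apparent plateau near 6.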
However, a linear regression that offered a good fit (i.e. a high value of r²) did not always make biological sense, often because extrapolating the line of slope b would predict unlikely, or even impossible, values of Y. To avoid oversimplified assumptions of linearity, curvature was often introduced by the least-squares fitting of polynomials, usually quadratic or third-order. An example of a third-order polynomial was that used by Marsden and Morris (1980) to describe the effects of environmental temperature on the voluntary energy intake (Y, kJ/bird per day) of laying hens
where T = dry bulb air temperature, °C.
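A third-order polynomial in T, of the kind used for Eqn (2), can be fitted by least squares with NumPy's `numpy.polyfit`, which returns coefficients highest power first. The sketch below assumes NumPy is available; the function names, temperature range and coefficients are invented for illustration and are not those of Marsden and Morris (1980). Because a cubic can turn sharply outside the data, the prediction helper refuses to extrapolate.

```python
import numpy as np


def fit_cubic(t, y):
    """Least-squares fit of Y = c0 + c1*T + c2*T**2 + c3*T**3.

    Returns (c0, c1, c2, c3), lowest power first.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    if t.size < 4:
        raise ValueError("a cubic needs at least four data points")
    # np.polyfit returns the highest-power coefficient first, so reverse it.
    return tuple(float(c) for c in np.polyfit(t, y, deg=3)[::-1])


def predict_cubic(coeffs, t_new, t_range):
    """Evaluate the fitted cubic, refusing to extrapolate outside the data."""
    t_min, t_max = t_range
    if not t_min <= t_new <= t_max:
        raise ValueError("temperature lies outside the range of the data")
    return sum(c * t_new ** k for k, c in enumerate(coeffs))


if __name__ == "__main__":
    # Invented, noise-free data generated from a known cubic over 10-30 degC.
    true_coeffs = (1500.0, -10.0, 0.5, -0.02)
    t = np.linspace(10.0, 30.0, 9)
    y = sum(c * t ** k for k, c in enumerate(true_coeffs))
    coeffs = fit_cubic(t, y)
    print("fitted coefficients:", [round(c, 4) for c in coeffs])
    print("predicted intake at 21 degC:", round(predict_cubic(coeffs, 21.0, (10.0, 30.0)), 1))
```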
Fitted curves should be used only when there is reason to believe that their curvature is biologically realistic. In the case of Eqn (2), the shape of the curve was plausible because, by the time of its publication, quantitative evidence on the effects of temperature on hens (e.g. Emmans and Charles, 1977; Sykes, 1977) had indicated the likely shape of the response.
It almost goes without saying that linear or curvilinear regressions should not be quoted when a simple plot of the data points shows random scatter with no systematic relationship between the independent and dependent variables. Yet examples slip through editorial nets into publication. Even an ecology text described by a distinguished reviewer as '… the standard advanced textbook in ecology for nearly 20 years …' (Begon et al., 2006) contains several examples of straight lines drawn through such scatter diagrams.
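Plotting the data remains the first check. A complementary numerical check, sketched below in plain Python, is a permutation test of the correlation between X and Y: if shuffling the Y values among the X values often produces a correlation as strong as the observed one, the data give no grounds for quoting a regression line. The permutation test is a technique chosen here for illustration; the function names and data are hypothetical.

```python
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between paired sequences x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("X and Y must both vary")
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the correlation between x and y.

    Shuffles y repeatedly, breaking any real pairing with x, and counts how
    often the shuffled |r| is at least as large as the observed |r|.
    """
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least three paired observations")
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so ties are not lost to floating-point rounding.
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            as_extreme += 1
    # Counting the observed arrangement itself keeps the p-value above zero.
    return (as_extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    y_trend = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.8, 16.2, 18.1, 19.9]
    y_scatter = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
    print("trend p =", permutation_p_value(x, y_trend))
    print("scatter p =", permutation_p_value(x, y_scatter))
```

A p-value close to 1 indicates that the apparent pattern is no stronger than random shuffling would produce, so a fitted line would be unjustified.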
Biological models
It was a short step from curve fitting to mathematical models of responses to agricultural treatment variables. The models were intended to give users accessible summaries of large amounts of information. They also allowed the effects of changes in treatment levels to be explored, at least within the range of the data. Two categories of biological model have been built to describe and, to a limited extent, predict agricultural responses. Descriptive models, also called empirical models, simply chain together a series of equations like Eqn (2) in an order assumed to be biologically reasonable. They are quick to build but biologically unsophisticated. An example is a simple model of the effects of nutritional and environmental variables on egg production by laying hens (Charles, 1984). Later, a simple model of the energy balance of ruminants (Charles et al., 1991), based on the climatic physiology published by Blaxter (1977), was used to estimate the responses of several classes of sheep and cattle to climatic factors, and hence to suggest approaches to shelter engineering (Charles, 1991).
Fundamental models, by contrast, start from biological principles or physical fundamentals and build up from there. A famous early example is the Penman equation for evaporation of water from soil and grass (Penman, 1948). More recent examples of fundamental models are the model of pig growth by Whittemore et al. (1988) and that of poultry growth by Emmans (1995). Fundamental models are less ephemeral than descriptive models, which are no better than the data they describe; those data may be superseded by later information or by genetic change.
Even descriptive models of biological processes of agricultural importance, such as growth, usually need more sophisticated mathematical functions than simple linear or polynomial ones. Wilson (1977) reviewed five functions, published between 1825 and 1966, for describing the growth of farmed animals. Fundamental models have often been based on the Gompertz double-exponential function for sigmoid growth, although Gompertz (1825) originally developed it for actuarial purposes. Wilson showed that it closely described the growth curve of chickens (Gallus domesticus), with body weight (kg) as the dependent variable and age (days) as the independent variable.
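One common parameterisation of the Gompertz function, assumed here (others appear in the literature), is W(t) = A·exp(−b·exp(−kt)), where A is mature body weight, b fixes the starting weight W(0) = A·e^(−b), and k is a rate constant. Differentiating gives the growth rate dW/dt = kb·exp(−kt)·W(t); growth is fastest at the inflection point t = ln(b)/k, where W = A/e. The Python sketch below codes these relations; the function names and parameter values are hypothetical, not Wilson's fitted values for chickens.

```python
import math


def gompertz_weight(t, mature_weight, b, k):
    """Body weight W(t) = A * exp(-b * exp(-k * t)), with A = mature_weight."""
    return mature_weight * math.exp(-b * math.exp(-k * t))


def gompertz_growth_rate(t, mature_weight, b, k):
    """Daily gain dW/dt = k * b * exp(-k * t) * W(t)."""
    return k * b * math.exp(-k * t) * gompertz_weight(t, mature_weight, b, k)


def gompertz_inflection(mature_weight, b, k):
    """Age of fastest growth, t = ln(b) / k, and the weight there, A / e."""
    if b <= 1:
        raise ValueError("b must exceed 1 for the inflection to occur after t = 0")
    return math.log(b) / k, mature_weight / math.e


if __name__ == "__main__":
    # Hypothetical parameters: A = 4.0 kg, b = 4.6 (so W(0) is about 0.04 kg),
    # k = 0.04 per day.
    A, b, k = 4.0, 4.6, 0.04
    for age in (0, 14, 28, 42, 56):
        print(f"day {age:2d}: {gompertz_weight(age, A, b, k):.3f} kg")
    t_i, w_i = gompertz_inflection(A, b, k)
    print(f"fastest growth at day {t_i:.1f}, weight {w_i:.2f} kg")
```

The double exponential is what gives the sigmoid shape: slow early growth, a single point of maximum gain, and an asymptotic approach to mature weight.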
Although simple functions such as polynomials are not always appropriate for modelling, polynomial approximations to more complex fundamental models were used to save memory in early models stored on the low-capacity portable devices of the time. For example, Fisher et al. (1973) published a fundamental model of the response of laying hens to essential amino acid intake. In the simple empirical model of Charles (1984), an inverse polynomial function was found to give a close approximation to the response curve of Fisher et al. (1973).
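The exact inverse polynomial used by Charles (1984) is not given here, so the sketch below assumes one common member of the family, the inverse quadratic X/Y = a + bX + cX², which rises to a maximum Y of 1/(b + 2√(ac)) at X = √(a/c) and then declines. Because X/Y is a quadratic in X, the coefficients can be estimated by ordinary least squares on the transformed response using NumPy's `numpy.polyfit`; this simple transformation changes how errors are weighted, so it is an approximation rather than a full treatment. The function names and the stand-in "complex model" curve are hypothetical.

```python
import numpy as np


def fit_inverse_quadratic(x, y):
    """Fit X/Y = a + b*X + c*X**2 by least squares on the transformed
    response X/Y. Returns (a, b, c). Requires X > 0 and Y > 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(x <= 0) or np.any(y <= 0):
        raise ValueError("X and Y must be positive")
    # np.polyfit returns the X**2 coefficient first.
    c, b, a = np.polyfit(x, x / y, deg=2)
    return float(a), float(b), float(c)


def inverse_quadratic(x, a, b, c):
    """Y = X / (a + b*X + c*X**2)."""
    x = np.asarray(x, dtype=float)
    return x / (a + b * x + c * x ** 2)


def peak_response(a, b, c):
    """Input level and maximum Y of the inverse quadratic (needs a, c > 0)."""
    if a <= 0 or c <= 0:
        raise ValueError("a and c must be positive for an interior maximum")
    return float(np.sqrt(a / c)), float(1.0 / (b + 2.0 * np.sqrt(a * c)))


if __name__ == "__main__":
    # Hypothetical stand-in for a more complex model's response curve.
    x = np.linspace(1.0, 40.0, 30)
    y_complex = 1.2 * (1.0 - np.exp(-0.15 * x))
    a, b, c = fit_inverse_quadratic(x, y_complex)
    error = np.max(np.abs(inverse_quadratic(x, a, b, c) - y_complex))
    print(f"a={a:.4f}, b={b:.4f}, c={c:.5f}, max abs error={error:.4f}")
```

Storing three coefficients instead of a full simulation is the memory saving that made such approximations attractive on low-capacity devices.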
Limitations of this historical review
This history covers progress in the field during the 20th century and some of the procedural foundations laid during that period. That does not, of course, imply that no relevant developments have occurred since.
Summary
In response to commercial users' need for greater sensitivity, experiments during the 20th century were sometimes redesigned: rather than testing whether differences between treatments were real, they aimed to quantify treatment effects. Data analysis and interpretation were therefore no longer dominated by searches for significant differences, but often involved fitting response curves.
Developments in hardware, in particular the advent of low-cost microcomputers, have made software that formerly ran only on large main-frame computers accessible to researchers and users at their desks or at home.
Yet we are still dealing with biological systems in plants, animals and soils. An understanding of their function and ecology remains important in designing experiments and interpreting their results.
This review raises a question: is it time to revisit Bayesian statistics, on the grounds that visionaries and innovators are prone to subjectivity? This is only a personal observation, though one with which I suspect Popper (1963) might have concurred.
The design of experiments and the analysis and interpretation of results are key disciplines in agricultural science. The principles and procedures developed for them during the 20th century should help future research to meet the challenge of feeding the global population for the rest of the 21st century.