A

Brian S. Everitt

Abortion rate:

The annual number of abortions per 1,000 women of reproductive age (usually defined as age 15–44 years). For example, in the USA in 1970 the rate was 5, in 1980 it was 25 and in 1990 it was 24. [Family Planning Perspectives, 1998, 30, 244–7.]

Abortion ratio:

The estimated number of abortions per 1,000 live births in a given year. Given by the ratio, 1,000 × (number of abortions divided by the number of live births). For example, in the USA in 1970 the ratio was 52, in 1980 it was 359, and in 1990 it was 344. [Family Planning Perspectives, 1998, 30, 244–7.]

Abscissa:

The horizontal axis (or x-axis) of a graph, or a particular point on that axis.

Absolute cause-specific risk:

Synonym for absolute risk.

Absolute deviation:

Synonym for average deviation.

Absolute risk:

Often used as a synonym for incidence, although also used occasionally for attributable risk, excess risk or risk difference. Defined more properly as the probability that a disease-free individual will develop a given disease over a specified time interval given current age and individual risk factors, and in the presence of competing risks. Absolute risk is a probability and consequently lies between 0 and 1. See also relative risk. [Kleinbaum, D. G., Kupper, L. L. and Morgenstern, H., 1982, Epidemiologic Research: Principles and Quantitative Methods, Lifetime Learning Publications, Belmont.]

Absolute risk reduction:

The proportion of untreated people who experience an adverse event minus the proportion of treated people who experience the event. For example, in a clinical trial of mammography, it was found that out of 129,750 women who were invited to begin having mammograms in the late 1970s and early 1980s, 511 died of breast cancer over the next 15 years, a death rate of 0.4 per cent. In the control group of 117,260 women who were not invited to have regular mammograms, there were 584 breast cancer deaths over the same period, a death rate of 0.5 per cent. So, the estimated absolute risk reduction is 0.1 per cent. See also relative risk and number needed to treat. [Straus, S. E., Glasziou, P., Scott Richardson, W. and Haynes, R. B., 2018, Evidence Based Medicine: How to Practice and Teach EBM, 5th ed., Churchill Livingstone, New York.]
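The arithmetic in the mammography example can be checked with a short sketch. It uses only the counts quoted above; the variable names are illustrative, and the number needed to treat is taken as the reciprocal of the absolute risk reduction.

```python
# Counts quoted in the mammography example above.
invited_n, invited_deaths = 129_750, 511
control_n, control_deaths = 117_260, 584

risk_invited = invited_deaths / invited_n   # roughly 0.4 per cent
risk_control = control_deaths / control_n   # roughly 0.5 per cent

# Absolute risk reduction: risk in the untreated group minus risk in the treated group.
arr = risk_control - risk_invited

# Number needed to treat (here, to invite): reciprocal of the absolute risk reduction.
nnt = 1 / arr

print(f"ARR = {100 * arr:.2f} per cent; NNT is about {nnt:.0f}")
```

On these figures roughly a thousand women would need to be invited to prevent one breast cancer death over the 15 years.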

Absorbing barrier:

See random walk.

Accelerated failure time model:

A general model for data consisting of survival times, in which explanatory variables measured on an individual are assumed to act multiplicatively on the timescale, and so affect the rate at which an individual proceeds along the time axis. Consequently, the model can be interpreted in terms of the speed of progression of a disease. This model, which simply regresses the logarithm of the survival time on the covariates, although used far less often than Cox’s proportional hazards model, might be a useful alternative in many situations because of this intuitive physical interpretation. [Collett, D., 2015, Modelling Survival Data in Medical Research, 3rd ed., Chapman & Hall/CRC, Boca Raton, FL.]
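One common way of writing the model, given here as a sketch in which the symbols are assumptions rather than notation from the cited text, takes T_i as the survival time of individual i, x_{i1}, ..., x_{ip} as the explanatory variables, \varepsilon_i as a random error and \sigma as a scale parameter:

```latex
\[
\log T_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \sigma \varepsilon_i
\quad\Longleftrightarrow\quad
T_i = \exp(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}) \, \exp(\sigma \varepsilon_i)
\]
```

The second form shows the multiplicative action on the timescale: a one-unit increase in x_{ij} multiplies the survival time by \exp(\beta_j).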

Acceptable quality level:

See quality control procedures.

Acceptable risk:

The risk for which the benefits of a particular medical procedure are considered to outweigh the potential hazards. For example, islet transplantation would help to control the many secondary effects of type 1 diabetes, but what is the appropriate level of risk to implement this technology responsibly considering the possible dangers from retroviruses? [Nature, 1998, 391, 326.]

Acceptance region:

A term associated with statistical significance tests, which gives the set of values of a test statistic for which the null hypothesis is to be accepted. Suppose, for example, that a z-test is being used to test the null hypothesis that the mean blood pressure of men and women is equal against the alternative hypothesis that the two means are not equal. If the chosen significance level of the test is 0.05, then the acceptance region consists of values of the test statistic z between −1.96 and 1.96. [Altman, D. G., 2018, Practical Statistics for Medical Research, Chapman & Hall/CRC, Boca Raton, FL.]
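The limits in the z-test example can be reproduced from the standard normal distribution. The sketch below uses Python's standard library; the function name in_acceptance_region is hypothetical.

```python
from statistics import NormalDist

alpha = 0.05  # two-sided significance level, as in the example above

# Upper critical value of the standard normal distribution.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

def in_acceptance_region(z: float) -> bool:
    """Return True when the observed z statistic lies between -z_crit and z_crit."""
    return -z_crit < z < z_crit

print(round(z_crit, 2))            # 1.96
print(in_acceptance_region(1.5))   # True
print(in_acceptance_region(2.3))   # False
```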

Accident proneness:

A personal psychological factor that affects an individual’s probability of suffering an accident. The concept has been studied statistically under a number of different assumptions for accidents:

Pure chance, leading to the Poisson distribution
True contagion, that is, the hypothesis that all individuals initially have the same probability of having an accident, but that this probability changes each time an accident happens
Apparent contagion, that is, the hypothesis that individuals have constant but unequal probabilities of having an accident

The study of accident proneness has been valuable in the development of particular statistical methodologies, although in the last two decades the concept has, in general, been out of favour. Attention now appears to have moved more towards risk evaluation and analysis. [Shaw, L. and Sichel, H. S., 1971, Accident Proneness, Pergamon Press, Oxford.]

Accrual rate in clinical trials:

The rate at which eligible patients are entered into a clinical trial, measured as people per unit time. Often disappointingly low, for reasons that may be both physician and patient related. Low accrual to adult oncology trials is a major barrier to progress in cancer therapy. [Journal of Clinical Oncology, 2001, 19, 3554–61.]

Accuracy:

The degree of conformity to some recognized standard value. See also bias.

Accuracy versus precision:

An accurate estimate is close to the quantity being estimated. A precise interval estimate is a narrow one, but it may not be accurate even when quoted to a large number of decimal places.

ACES:

Abbreviation for active control equivalence studies.

ACF:

Abbreviation for autocorrelation function.

ACORN:

Acronym for ‘a classification of residential neighbourhoods’. A system for classifying households according to demographic, employment and housing characteristics of their immediate neighbourhood. Derived by applying cluster analysis to 40 variables, including age, class, tenure, dwelling type and car ownership, used to describe each neighbourhood. [Dorling, D. and Simpson, S., 1999, Statistics in Society, Arnold, London.]

Acquiescence bias:

The bias produced by respondents in a survey who have the tendency to give positive answers, such as ‘true’, ‘like’, ‘often’ or ‘yes’ to a question. At its most extreme, the person responds in this way irrespective of the content of the question. Thus, a person may respond ‘true’ to two statements such as ‘I always take my medicine on time’ and ‘I often forget to take my pills’. See also end-aversion bias. [Journal of Intellectual Disability Research, 1995, 39, 331–40.]

Active control equivalence studies (ACES):

Studies that aim to demonstrate that an experimental treatment is equivalent in efficacy to a standard treatment. The justification for undertaking such studies is that even if the new treatment is no more effective than the existing treatment in alleviating a particular condition, it may still be of use for patients who are resistant to, or who simply cannot tolerate, the standard treatment. So clinical trials are sometimes undertaken when the object is simply to show that the new treatment is at least as good as the existing treatment. [Senn, S., 2008, Statistical Issues in Drug Development, 2nd ed., John Wiley & Sons, Chichester.]

Active control trials:

Clinical trials in which the new treatment is compared with some other active agent rather than a placebo. For example, a clinical trial investigating treatments for asthma might compare the long-acting beta-agonists salmeterol and formoterol with the shorter-acting beta-agonist salbutamol. [Senn, S., 2008, Statistical Issues in Drug Development, 2nd ed., John Wiley & Sons, Chichester.]

Active life expectancy (ALE):

Defined for a given age as the expected remaining years free of disability. In life expectancy the end point is death. In active life expectancy the end point is the loss of independence or the need to rely on others for assistance with daily activities. ALE is a useful index of public health and quality of life in a population. Interest in recent years has centred on whether current trends towards longer life expectancy have been accompanied by comparable increases in active life expectancy. See also disability-free life expectancy. [New England Journal of Medicine, 1983, 309, 1218–24.]

Activities of daily living scale (ADLS):

A scale designed to measure physical ability/disability, which is used in investigations of a variety of chronic disabling conditions such as arthritis. The scale is based on scoring responses to questions about mobility, self-care, grooming, and so on. See also Barthel index and health assessment questionnaire. [Journal of the American Medical Association, 1963, 185, 914–19.]

Actuarial statistics:

The statistics used by actuaries to evaluate risks, calculate liabilities and plan the financial course of insurance, pensions, and so on. An example is life expectancy for people of various ages, occupations, etc. See also life table. [Benjamin, B. and Pollard, J. H., 1993, The Analysis of Mortality and Other Actuarial Statistics, 3rd ed., Institute of Faculty of Actuaries, Oxford.]

Adaptation:

A heritable component of the phenotype that confers an advantage in survival and reproduction success; the process by which organisms adapt to environmental conditions. [Sham, P. C., 1998, Statistics in Human Genetics, Arnold, London.]

Adaptive cluster sampling:

A procedure in which an initial set of subjects is selected by some sampling process and, whenever the variable of interest of a selected subject satisfies a given criterion, additional subjects in the neighbourhood of that subject are added to the sample. [Biometrika, 1996, 84, 209–19.]

Adaptive randomization designs:

Randomized clinical trials in which treatment allocation probabilities are modified based on accumulating data in the trial to achieve selected experimental objectives while protecting the study from bias and preserving inferential validity of the trial results. Such designs have the potential to outperform traditional parallel group fixed randomization designs by treating trial participants more efficiently, identifying promising treatments more rapidly and minimizing unnecessary expenditures while maintaining validity and integrity of results. See also covariate adaptive randomization and Bayesian adaptive randomization. [Drug Information Journal, 2006, 40, 425–35.]

Adaptive treatment-switching design:

A design for clinical trials that allows the investigator to switch a patient’s treatment from an initial assignment to an alternative treatment if there is evidence of lack of efficacy or safety of the initial treatment. Such a design is commonly employed in cancer trials.

Adaptive randomization:

There are now many methods of adaptive randomization that can be used in clinical trials as an alternative to simple randomization but it might be wise to keep in mind that old piece of advice, ‘If it ain’t broke, don’t fix it’.

Addition rule for probabilities:

For two mutually exclusive events, that is, events that cannot occur together, the probability of either event occurring is the sum of the two individual probabilities. The rule extends in an obvious way to more than two mutually exclusive events. See also multiplication rule for probabilities.
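In symbols, with A and B denoting the two events (notation assumed here for illustration), the rule reads:

```latex
\[
P(A \cup B) = P(A) + P(B) \quad \text{when } A \cap B = \emptyset
\]
```

For a fair six-sided die, for instance, the probability of throwing a one or a two is 1/6 + 1/6 = 1/3.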

Additive effect:

A term used when the effect of administering two or more factors together is the sum of the effects that would be produced by each of the factors in the absence of the others.

Additive genetic variance:

The variance of a characteristic that can be explained by the additive effects of genes. [Sham, P. C., 1998, Statistics in Human Genetics, Arnold, London.]

Add-on trial:

A clinical trial that compares treatments, say A and B, in the presence of a standard treatment, say S, the randomized comparisons being S + A versus S + B. Under certain conditions, B may be a placebo version of A. Used routinely in AIDS trials. [Statistical Methods in Medical Research, 2002, 11, 1–22.]

Adequate subset:

A term used most often in regression analysis for a subset of the explanatory variables that is thought to be as adequate for predicting the response variable as the complete set of explanatory variables under consideration. See also all-subsets regression and selection methods in regression.

Adherence:

Synonym for compliance.

Adjectival scales:

Scales with adjectival descriptions and discrete or continuous responses. Two examples are:

1. Discrete response (participants may circle one)
How much pain are you suffering today?
Below average Average Above average
2. Continuous response (participants may mark anywhere on the line)
How satisfied are you with your treatment?

[Streiner, D. L. and Norman, G. R., 1989, Health Measurement Scales, Oxford University Press, Oxford.]

Adjusted R²:

The square of the multiple correlation coefficient adjusted for the number of parameters in the model under consideration. [Der, G. and Everitt, B. S., 2008, A Handbook of Statistical Analysis using SAS, 3rd ed., Chapman & Hall/CRC, Boca Raton, FL.]
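A widely used form of the adjustment is sketched below. The notation is an assumption: n is the number of observations, p the number of explanatory variables (excluding the intercept) and R² the unadjusted squared multiple correlation coefficient.

```latex
\[
R^2_{\mathrm{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
\]
```

With R² = 0.80, n = 50 and p = 5, for example, the adjusted value is 1 − 0.20 × 49/44, or approximately 0.777.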

Adjusted treatment means:

A term usually applied to the estimates of the treatment means in an analysis of covariance after they have been adjusted for the covariates of interest using the estimated relationship between the covariates and the response variable. [Fisher, L. D. and Van Belle, G., 2004, Biostatistics: A Methodology for the Health Sciences, 2nd ed., Wiley-Blackwell, New York.]

Adjusting for baseline:

The process of allowing for the effect of baseline characteristics, particularly a prerandomization measure of the response variable, on the response variable, usually in the context of a clinical trial. A number of methods might be used; for example, the analysis of simple change scores, the analysis of percentage change, or, in some cases, the analysis of more complicated variables, such as 100 × change/baseline. In general, it is preferable to use the adjusted variable that has least dependence on the baseline measure. In a longitudinal study in which the correlations between the repeated measures over time are moderate to large, using the baseline values as covariates in an analysis of covariance is known to be more efficient than analysing change scores. See also baseline balance. [Senn, S., 2008, Statistical Issues in Drug Development, 2nd ed., John Wiley & Sons, Chichester.]

Adjusting for baseline:

Change scores remain popular despite being less powerful than using analysis of covariance. It is difficult to think why.

ADLS:

Abbreviation for activities of daily living scale.

Administrative databases:

Databases derived from information collected routinely and systematically for purposes of managing a healthcare system. Such data can be used to examine admission procedures and lengths of stay and make comparisons across hospitals, communities and regions. [Grady, M. L. and Schwartz, H. A., 1992, Medical Effectiveness Research Data Methods, Department of Health and Human Services, Rockville, MD.]

Admixture in human populations:

The exchange of genes by breeding between members of different linguistic and cultural groups, or the sudden infusion of genes caused by large-scale migration. [Annals of Human Genetics, 1971, 35, 9–17.]

Adoption studies:

Studies involving subjects raised by non-biological parents. Such studies have played a prominent role in the assessment of genetic variation in human and animal traits. For example, if the shared environment is influential, then siblings raised in the same family should be more similar than adopted-away siblings (siblings reared apart). [Fuller, J. L. and Thompson, W. R., 1979, Foundations of Behavior Genetics, Mosby, St Louis, MO.]

Adverse event:

Any undesirable consequences experienced by a patient during a medical investigation or study, particularly a clinical trial. These can range from the minor, for example constipation, to the far more serious, for example a heart attack. Such events need to be noted on a case report. [Quality in Health Care, 2000, 9, 47–52.]

Aetiological fraction:

Synonym for attributable risk.

Age heaping:

A term applied occasionally to the collection of data on ages when these are accurate only to the nearest year, half-year or month. See also rounding. [Geographical Journal, 1992, 28, 427–42.]

Age–incidence curve:

A plot of age against the age-specific incidence rate for some disease or condition of interest. For example, for cancer of the uterine cervix, the curve rises steeply from puberty to age 40, after which it plateaus. [Proceedings of the National Academy of Sciences of the United States of America, 1977, 74, 1341–2.]

Ageing models:

Models for ageing and biological life span are important to plan for the future of society. The features of lifetime data in ageing research which distinguish such data from typical survival data are that (1) all individuals enter the study simultaneously, (2) a cohort is observed until the last member is dead and lifetime normally refers to the entire life span of each individual, (3) censoring is only seldom encountered and (4) the data are usually aggregated in a life table. [Age and Ageing, 2006, 35, 607–14.]

Age-of-onset estimation:

The estimation of the distribution, as a function of age, of the time a trait or condition first appears. For example, a psychiatrist might be interested in the age-of-onset of schizophrenia. Estimating age-of-onset is important in studies of disease aetiology and crucial to clinical management and in designing disease prevention studies. [Genetic Epidemiology, 1989, 6, 217–20.]

Age–period–cohort analysis:

A family of statistical techniques for understanding temporal trends in an outcome measure in terms of three related time variables: the subject’s age, the subject’s date of birth and the calendar period. Early methods employed informative graphical displays of the data, but recently more formal modelling techniques have been used to try to disentangle the separate contributions of each of the factors. See also Lexis diagram. [Annual Reviews of Public Health, 1991, 12, 425–57.]

Age-related reference ranges:

Range of values of a measurement of interest that identify the upper and lower limits of normality in some population, where the range varies according to the subject’s age. An example is shown in Figure 1. [Statistics in Medicine, 1993, 12, 917–24.]

Figure 1 Age-related 95% reference ranges for blood pressure in boys: systolic (solid lines); diastolic (dotted lines).

(Figure after that in age-related reference ranges in Encyclopaedic Companion to Medical Statistics, 2nd ed., eds. B. S. Everitt and C. R. Palmer, John Wiley & Sons, Chichester, 2005.)

Age–sex pyramid:

See population pyramid. [British Medical Journal, 1985, 291, 1391–3.]

Age–sex register:

A list of all patients or clients of a medical practice or service classified by age and sex. Such information is often needed for calculating, for example, age-specific birth rate, age-specific death rate and sex-specific death rate for conditions of interest. [British Medical Journal (Clinical Research Edition), 1984, 288, 1967.]

Age-specific birth rate:

The number of live births per 1,000 women in a specific age group. For example, in California in 1990, the rate for women aged 15–19 years was 11.4; in 1998, the corresponding figure was 11.2. [Child Trends, 2018.]

Age-specific death rate:

Death rate calculated for a specified age group. For example, for 20–30-year-olds:

$$\mathrm{DR}(20\text{–}30) = \frac{\text{number of deaths among 20–30-year-olds in a year}}{\text{average population size of 20–30-year-olds in the year}}$$

Calculating death rates in this way is usually necessary since such rates almost invariably differ widely with age, a variation not reflected in the crude death rate. In England and Wales in 1990, the age-specific death rates per 1,000 for men in four age groups were:

45–54 years: 4.8
55–64 years: 14.8
65–74 years: 39.5
75–84 years: 94.3

See also cause-specific death rates and standardized mortality rate. [Fisher, L. D. and Van Belle, G., 2004, Biostatistics, 2nd ed., Wiley-Blackwell, New York.]

Age-specific fertility rate:

The number of births occurring during a specified period to women of a specified age group, divided by the number of person-years lived during that period by women of that age group. For example, in the period 1990–5, the rate per 1,000 women in the 15–19 years age group in Africa was 136, in Asia 45 and in Europe 27. The corresponding figures for the 40–44-years age group were Africa 82, Asia 22 and Europe 5.

Age-specific incidence rate:

Incidence rates calculated within a number of relatively narrow age bands. For example, age is the most important risk factor for prostate cancer, with the incidence rate being very small for men below 45 years but about 1,000 per 100,000 at age 65. [American Journal of Epidemiology, 2000, 151, 1158–71.]

Age standardization:

A process of adjusting rates before they are compared in different populations, so as to minimize the effects of possible differences in age composition of the populations. See also standardized mortality rate.
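One common version of the process is direct standardization, in which the age-specific rates of each population are applied to a single standard population. The sketch below assumes that method; all rates and population counts are hypothetical numbers chosen for illustration.

```python
# Hypothetical age-specific death rates per 1,000 in two populations.
rates_a = {"45-54": 4.0, "55-64": 12.0, "65-74": 35.0}
rates_b = {"45-54": 5.0, "55-64": 14.0, "65-74": 30.0}

# Hypothetical standard population (number of people in each age band).
standard = {"45-54": 5000, "55-64": 3000, "65-74": 2000}

def directly_standardized_rate(rates, standard_population):
    """Weighted average of age-specific rates, weighted by the standard population."""
    total = sum(standard_population.values())
    weighted = sum(rates[band] * standard_population[band] for band in standard_population)
    return weighted / total

std_a = directly_standardized_rate(rates_a, standard)
std_b = directly_standardized_rate(rates_b, standard)
print(std_a, std_b)
```

Because both rates are computed with the same age weights, any remaining difference between them cannot be attributed to the age composition of the two populations.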

Agglomerative hierarchical clustering methods:

Methods of cluster analysis that begin with each individual defining a separate cluster and proceed by combining individuals and later groups of individuals into larger clusters ending when all the individuals have been combined into a single cluster. At each stage, the individuals or groups of individuals who are closest according to some particular definition of distance are joined. The whole series of steps can be summarized by a dendrogram. Solutions corresponding to a particular number of clusters are found by cutting the dendrogram at some level. See also average linkage clustering, complete linkage cluster analysis, single linkage clustering, Ward’s method and K-means cluster analysis. [Everitt, B. S., Landau, S., Leese, M. and Stahl, D., 2011, Cluster Analysis, 5th ed., John Wiley & Sons, Chichester.]
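The merging procedure can be sketched in a few lines for one-dimensional data. The version below assumes single linkage (distance between the closest members of two clusters) and stops at a requested number of clusters, which corresponds to cutting the dendrogram at some level; the function name and data are hypothetical.

```python
def single_linkage(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[p] for p in points]  # each individual starts as its own cluster
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # join the closest pair of clusters
        del clusters[j]
    return [sorted(c) for c in clusters]

groups = single_linkage([1.0, 1.2, 5.0, 5.1, 9.0], n_clusters=3)
print(groups)  # [[1.0, 1.2], [5.0, 5.1], [9.0]]
```

Other methods in the family (complete linkage, average linkage, Ward’s method) differ only in how the distance between two clusters is defined.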

Agresti’s alpha:

A generalization of the odds ratio for two-by-two contingency tables to larger contingency tables arising from data where there are different degrees of severity of a disease and differing amounts of exposure. [Agresti, A., 2010, Analysis of Ordinal Categorical Data, 2nd ed., Wiley-Blackwell, New York.]

Aickin’s measure of agreement:

A chance-corrected measure of agreement that is similar to the kappa coefficient but that uses a different definition of chance agreement. [Biometrics, 1990, 46, 293–302.]

Akaike’s information criterion:

An index often used as an aid in choosing the most suitable model for a set of observations. The index takes into account both the statistical goodness of fit and the number of parameters that have to be estimated to achieve this degree of fit, by imposing a penalty for increasing the number of parameters. Lower values of the index indicate the preferred model, that is, the one with the fewest parameters that still provides an adequate fit to the data. See also parsimony principle. [Psychometrika, 1987, 52, 345–70.]
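In its usual form the index is twice the number of estimated parameters minus twice the maximized log-likelihood. The sketch below uses that form; the two models and their log-likelihood values are hypothetical.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike's information criterion: 2k - 2 log L (lower is preferred)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical fits: model B fits slightly better but uses three more parameters.
aic_a = aic(log_likelihood=-120.0, n_params=3)
aic_b = aic(log_likelihood=-119.5, n_params=6)

print(aic_a, aic_b)  # 246.0 251.0
```

Here the small gain in fit of the larger model does not offset the penalty for its extra parameters, so the simpler model is preferred.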

Algorithm:

A well-defined set of rules that, when applied routinely, lead to a solution of a particular class of mathematical or computational problem.

Alias:

See confounding.

Allele:

One of two or more genes that may occur at a given location in the genes of an individual; essentially alternative forms of a gene occupying the same locus on a chromosome.

Allocation rule:

See discriminant analysis.

Allometric growth:

Changes in the shape of an organism associated with different growth rates of its parts. Shape changes in growing organs or whole organisms may be triggered by either biological or physical needs. [Bookstein, F. L., 1978, The Measurement of Biological Shape and Shape Change, Springer-Verlag, Berlin.]

All-subsets regression:

A form of regression analysis in which all possible models are compared and the ‘best’ selected using some appropriate index of performance of each model. If the number of explanatory variables is q, then there are a total of 2^q − 1 models to consider, since each explanatory variable can be either ‘in’ or ‘out’ of the model and the model with no explanatory variables is excluded. So, for example, with q = 10, a total of 1,023 models have to be considered. The leaps-and-bounds algorithm is generally used to make the approach computationally feasible when there is a large number of explanatory variables. [Rawlings, J. O., Pantula, S. G. and Dickey, D. A., 2013, Applied Regression Analysis: A Research Tool, 2nd ed., Springer, New York.]
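The count of 2^q − 1 candidate models can be confirmed by enumerating every non-empty subset of the explanatory variables. The sketch below does only the enumeration, not the model fitting; the variable names are hypothetical.

```python
from itertools import combinations

def all_subsets(variables):
    """Yield every non-empty subset of the explanatory variables."""
    for size in range(1, len(variables) + 1):
        yield from combinations(variables, size)

names = [f"x{i}" for i in range(1, 11)]  # q = 10 hypothetical explanatory variables
n_models = sum(1 for _ in all_subsets(names))

print(n_models)  # 1023
```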

Alpha (α):

The probability of a type I error. See also significance level.

Alpha spending function:

An approach to interim analysis for clinical trials that allows the control of the type I error rate while retaining flexibility in the number of interim analyses to be conducted and their timing. [Statistics in Medicine, 1996, 15, 1739–46.]

Alpha-trimmed mean:

A statistic for estimating the mean of a population that is less affected by the presence of outliers than the arithmetic mean. Involves dropping a proportion (alpha) of the observations from both ends of the sample before calculating the mean of the remainder. [Fisher, L. D. and Van Belle, G., 2013, Biostatistics, 2nd ed., John Wiley & Sons, New York.]
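A minimal sketch of the calculation follows. It assumes that alpha is the proportion dropped from each end and that the number dropped is rounded down; the function name and data are hypothetical.

```python
def trimmed_mean(values, alpha):
    """Mean after dropping a proportion alpha of the observations from each end."""
    if not 0 <= alpha < 0.5:
        raise ValueError("alpha must be at least 0 and less than 0.5")
    ordered = sorted(values)
    k = int(alpha * len(ordered))        # observations dropped from each tail
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

data = [2, 3, 3, 4, 4, 5, 5, 6, 6, 40]   # one gross outlier
print(trimmed_mean(data, 0))     # 7.8, the ordinary arithmetic mean
print(trimmed_mean(data, 0.1))   # 4.5, after dropping 2 and 40
```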

Alternate allocation:

A method of allocating patients to treatment groups in a clinical trial that places alternate patients into the groups. Not to be recommended since it is open to accusations of abuse, with, for example, the treating clinician having the possibility of manipulating which patient receives each treatment. [Everitt, B. S. and Wessely, S., 2008, Clinical Trials in Psychiatry, 2nd ed., Wiley-Blackwell, Chichester.]

Alternate allocation:

Never use this method of allocation if you wish your clinical trial to be taken seriously.

Alternative hypothesis:

The hypothesis against which the null hypothesis is tested, generally in the context of statistical significance tests. [Altman, D. G., 2018, Practical Statistics for Medical Research, Chapman & Hall/CRC, Boca Raton, FL.]

Analysis as randomized:

Synonym for intention-to-treat analysis.

Analysis of covariance:

Essentially an application of multiple linear regression in which some of the explanatory variables are categorical, often binary, for example treatment group, and others are continuous, for example age. The aim is to increase the sensitivity of the F-tests used in assessing treatment differences. [Altman, D. G., 2018, Practical Statistics for Medical Research, Chapman & Hall/CRC, Boca Raton, FL.]

Analysis of dispersion:

Synonym for multivariate analysis of variance.

Analysis of variance:

The separation of variation attributable to one factor from that attributed to others. By partitioning the total variance of a set of observations into parts due to particular factors, for example sex, treatment group, etc., differences in the mean values of the dependent variable can be assessed. The simplest analysis of this type involves a one-way design: a sample of individuals from a number of different populations is compared with respect to some outcome measure of interest. The total variance in the observations is partitioned into a part due to differences between the group means (between-groups sum of squares) and a part due to differences between subjects in the same group (within-groups sum of squares or residual sum of squares). The results of this division are usually arranged in an analysis of variance table. The equality of the means of the populations involved implies that the between-group and within-group variances are both estimating the same quantity, and this can be tested by an appropriate F-test. [Altman, D. G., 2018, Practical Statistics for Medical Research, Chapman & Hall/CRC, Boca Raton, FL.]
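The partition for a one-way design can be sketched as follows. The data are hypothetical and the function returns the two sums of squares together with the F statistic, that is, the ratio of the between-groups to the within-groups mean square.

```python
def one_way_anova(groups):
    """Return (between-groups SS, within-groups SS, F) for a one-way design."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]

    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-groups (residual) sum of squares: observations around their group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_stat

samples = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]  # three hypothetical groups
ss_b, ss_w, f_value = one_way_anova(samples)
print(ss_b, ss_w, f_value)  # 24.0 6.0 12.0
```

The F value would then be referred to an F distribution with 2 and 6 degrees of freedom.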

Analysis of variance:

The model underlying the analysis of variance is essentially the same as that involved in multiple linear regression, with the explanatory variables being dummy variables coding factor levels and interactions between factors.

Analysis of variance table:

See analysis of variance.

Analytic epidemiology:

A term for epidemiological studies, such as case-control studies, that obtain individual-level information on the association between disease status and exposures of interest.

ANCOVA:

Acronym for analysis of covariance.

Anecdotal evidence:

Evidence from case reports or observations on a single patient by a particular clinician rather than from systematically collected data. Although such evidence is not acceptable for drawing conclusions about treatments or therapies, it may be suggestive of procedures worthy of further investigation in an appropriate scientific manner, for example by way of a clinical trial. [Everitt, B. S. and Wessely, S., 2008, Clinical Trials in Psychiatry, 2nd ed., Wiley-Blackwell, Chichester.]

Anecdotal evidence:

Suitable only for tabloid journalists, not for serious medical researchers.

Angle transformation:

Synonym for arc-sine transformation.

Angular histogram:

A method for displaying circular data, which involves wrapping the usual histogram around a circle. Each bar in the histogram is centred at the midpoint of the group interval with the length of the bar proportional to the frequency of the group. Figure 2 shows such a display for arrival times on a 24-hour clock of 254 patients at an intensive care unit, over a period of 12 months. [Fisher, N. I., 1993, Statistical Analysis of Circular Data, Cambridge University Press, Cambridge.]

Figure 2 Angular histogram for arrival times at an intensive care unit.

(Reproduced by permission of Cambridge University Press from Statistical Analysis of Circular Data, by N. I. Fisher, 1995.)

Animal model:

A study carried out in a population of laboratory animals that is used to model processes comparable to those that occur in human populations.

ANOVA:

Acronym for analysis of variance.

Antagonism:

See synergism.

Anthropometry:

The subject that deals with the measurement of the size, weight and proportions of the human body. [Tanner, J. M., 1981, A History of the Study of Human Growth, Cambridge University Press, Cambridge.]

Antidependence models:

Models used in the analysis of longitudinal data that assume a particular form for the variance-covariance matrix of the repeated measurements, a form that arises from assuming that an observation at a particular time point is dependent on the values of a small number of the immediately preceding observations, but given these values, is conditionally independent of earlier observations. [Biometrics, 1986, 42, 805–20.]

Apgar score:

See Likert scale.

Approximation:

A result that is not exact but is sufficiently close to the truth to be of practical value.

A priori comparisons:

Synonym for planned comparisons.

Arc-sine transformation:

A transformation of a proportion designed to stabilize its variance and produce values more suitable for techniques such as analysis of variance and regression analysis. [Collett, D., 2003, Modelling Binary Data, 2nd ed., Chapman & Hall/CRC, Boca Raton, FL.]
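In its basic form the transformation takes the arc-sine of the square root of the proportion. The sketch below assumes that form, with the result in radians; some authors multiply by two.

```python
import math

def arcsine_transform(p: float) -> float:
    """Arc-sine square-root transformation of a proportion p between 0 and 1."""
    if not 0 <= p <= 1:
        raise ValueError("p must be a proportion between 0 and 1")
    return math.asin(math.sqrt(p))

for p in (0.0, 0.05, 0.5, 0.95, 1.0):
    print(p, round(arcsine_transform(p), 3))
```

The transformed values run from 0 to π/2, and proportions near 0 or 1 are spread out relative to those near 0.5.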

Area sampling:

A method of sampling in which a geographical region is subdivided into smaller areas (countries, villages, city blocks, etc.), some of which are then selected at random and subsampled or surveyed completely. [American Journal of Epidemiology, 2001, 153, 1119–27.]

Area under curve (AUC):

Often a useful way of summarizing the observations made on an individual over time, for example those collected in a longitudinal study or for a dose–response relationship. The measure is illustrated in Figure 3. Usually calculated by adding together the area under the curve between each pair of consecutive observations. The AUC is often a predictor of biological effects, such as toxicity or efficacy, and for measurements taken at regular intervals it is essentially equivalent to using the mean. See also C_max, T_max and response feature analysis. [International Journal of Clinical Pharmacology, Therapeutics and Toxicology, 1991, 29, 394–9.]

Figure 3 Time course of plasma concentration following a single oral administration of a drug: illustrates area under the curve, C_max and T_max.

(Taken with permission of the publisher Wiley from Encyclopedia of Biostatistics, volume 1, 2nd ed., edited by P. Armitage and T. Colton, 2005.)
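
A minimal sketch of the calculation, assuming that consecutive observations are joined by straight lines (the trapezium rule, one common choice). The function name and the concentration data are hypothetical.

```python
def auc_trapezoid(times, values):
    """Sum the trapezium areas between each pair of consecutive observations."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two (time, value) pairs of equal length")
    total = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        total += width * (values[i] + values[i - 1]) / 2
    return total


# Hypothetical plasma concentrations at 0, 1, 2, 4 and 8 hours after dosing.
times = [0, 1, 2, 4, 8]
conc = [0.0, 4.0, 6.0, 3.0, 1.0]

auc = auc_trapezoid(times, conc)    # 2 + 5 + 9 + 8 = 24.0
c_max = max(conc)                   # peak concentration
t_max = times[conc.index(c_max)]    # time at which the peak occurs
```

With equally spaced times, every interior observation receives the same weight, which is why the AUC then carries essentially the same information as the mean.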

Arithmetic mean:

See mean.

Armitage–Doll model:

A model for carcinogenesis in which the central idea is that the important variable determining the change in risk is not age but time. The model proposes that cancers of a particular tissue develop according to the following process:

A normal cell develops into a cancer cell by means of a small number of transitions through a series of intermediate steps.
Initially, the number of cells at risk is very large, and for each cell a transition is a rare event.
The transitions are independent of one another.

[LeCam, L. M. and Neyman, J., 1961, Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, CA.]

Artefactual associations:

See association.

Artificial intelligence (AI):

A discipline that attempts to understand intelligent behaviour in the broadest sense, by getting computers to reproduce it, and to produce machines that behave intelligently no matter what their underlying mechanism. (Intelligent behaviour is taken to include reasoning, thinking and learning.) See also artificial neural network and pattern recognition. [Russell, S. and Norvig, P., 2002, Artificial Intelligence: A Modern Approach, 2nd ed., Prentice Hall, Harlow.]

Artificial neural network:

A mathematical structure modelled on the human neural network and designed to attack many statistical problems, particularly in the areas of pattern recognition, multivariate analysis, learning and memory. The essential feature of such a structure is a network of simple processing elements (artificial neurons) coupled together (either in the hardware or the software) so that they can cooperate. From a set of inputs and an associated set of parameters, the artificial neurons produce an output that provides a possible solution to the problem under investigation. In many neural networks, the relationship between the input received by a neuron and its output is determined by a generalized linear model. Enthusiasts often assert that neural networks provide a new approach to computing, whereas sceptics point out that they may solve a few ‘toy’ problems but cannot be taken seriously as a general problem-solving tool. [Garson, G. D., 1998, Neural Networks: An Introductory Guide for Social Scientists, Sage, London.]

Artificial neural network:

Fashionable, but perhaps not the answer to as many problems as its advocates would have the rest of us believe.

Artificial neurons:

See artificial neural network.

Ascertainment bias:

A possible form of bias, particularly in retrospective studies, that arises from a relationship between the exposure to a risk factor and the probability of detecting an event of interest. In a study comparing women with cervical cancer and a control group, for example, an excess of oral contraceptive use amongst the cases might possibly be due to more frequent screening in this group. See also recall bias. [American Journal of Epidemiology, 2002, 155, 875–80.]

Asher’s paradox:

According to Richard Asher, a distinguished London physician, if you can believe fervently in the treatment you are suggesting to a patient, even though controlled studies show that it is quite useless, then your results are much better, your patients are much better, and your income is much better too. Asher goes on to suggest that this accounts for the remarkable success of some of the less gifted, but more credulous members of the medical profession, and also for the often violent dislike of statistics and controlled tests, which fashionable and successful doctors are accustomed to display. See also placebo effect.

ASN:

Abbreviation for average sample number.

As-randomized analysis:

Synonym for intention-to-treat analysis.

Assay:

An experiment designed to estimate the strength, kind or quality of some physical, chemical, biological, physiological or psychological agent by means of the response induced by that agent in living or nonliving matter. See also bioassay.

Assay run:

A set of consecutive measurements, readings or observations all based on the same batch of reagents.

Assigned treatment:

The treatment a patient in a clinical trial is designated to receive, as indicated at the time of enrolment. See also compliance, intention-to-treat analysis and pill count.

Association:

A general term to describe the statistical dependence between two variables; ‘positive association’ implies that values of one variable increase (decrease) as the other variable increases (decreases), and ‘negative association’ the reverse. Association can be measured in a variety of ways, for example, correlation coefficients, relative risk, odds ratio, phi coefficient and the Goodman and Kruskal measure of association. It needs to be remembered that an observed association, say between an exposure and a disease, may be artefactual, non-causal or causal. An artefactual association may, for example, arise because of bias in the study, and a non-causal association can occur if the disease causes the exposure rather than vice versa, or if both disease and exposure are associated with a third variable, known or unknown. See also Berkson’s fallacy and Simpson’s paradox. [Everitt, B. S., 1992, The Analysis of Contingency Tables, 2nd ed., Chapman & Hall/CRC, Boca Raton, FL.]

Association:

Remember, an association, if biologically plausible, may suggest a causal link but proof is only obtainable by experiment.

Assortative mating:

A process whereby biological parents are more similar for a phenotype trait than they would be if the mating occurred at random in the population. Common examples in human populations are height and intelligence. [Eugenics Quarterly, 1968, 15, 128–40.]

Assumptions:

The conditions under which statistical techniques give valid results. For example, analysis of variance generally assumes normality, homogeneity of variance, and independence of the observations.

Asymmetrical distribution:

A probability distribution or frequency distribution that is not symmetrical about some central value. A J-shaped distribution is an example.

Asymptotic method:

Synonym for large sample method.

Attachment level:

A common measure of periodontal disease levels given by the minimum distance between the cement–enamel junction (a reference point on the tooth) and the epithelial attachment. Usually measured in millimetres with a graduated blunt-end probe. [Journal of Periodontology, 2002, 73, 198–205.]

Attack rate:

A term often used for the incidence rate of a disease or condition in a particular group, or during a limited period of time, or under special circumstances such as an epidemic. A specific example would be one involving outbreaks of food poisoning, where the attack rates would be calculated for those people who have eaten a particular item and for those who have not. Calculated as the ratio of the number of people ill in the time period to the number of people at risk in the time period. An example involving a well-known brand of soft drink occurred in Belgium in 1999, in which 37 of 280 students were identified as cases and the attack rate was 13.2% overall, 15.6% among girls and 8.9% among boys.
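
The overall figure in the Belgian example can be checked directly. This is a minimal sketch and the function name is hypothetical. The counts for girls and boys are not given above, so only the overall rate is computed.

```python
def attack_rate(n_ill: int, n_at_risk: int) -> float:
    """Number of people ill divided by number at risk in the same period."""
    if n_at_risk <= 0:
        raise ValueError("number at risk must be positive")
    return n_ill / n_at_risk


# 37 cases among 280 students, as quoted in the entry.
overall = attack_rate(37, 280)
print(f"{overall:.1%}")  # prints 13.2%
```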

Attenuation:

A term applied to the correlation calculated between two variables when both are subject to measurement error to indicate that the value of the correlation between the true values is likely to be underestimated. See also regression dilution. [Fisher, L. D. and Van Belle, G., 2004, Biostatistics: A Methodology for the Health Sciences, 2nd ed., John Wiley & Sons, New York.]

Attributable fraction:

A measure that quantifies the proportional reduction in disease prevalence that would be achieved if a risk factor of interest could somehow be eliminated from the population. The measure depends jointly on relative risk and the prevalence of the risk factor within the general population. [Statistical Methods in Medical Research, 2001, 10, 159–93.]

Attributable risk:

A measure of the association between exposure to a particular factor and the risk of a particular outcome, calculated as

\[ \frac{\text{incidence rate among exposed} - \text{incidence rate among nonexposed}}{\text{incidence rate among exposed}} \]

Measures the amount of the incidence rate that can be attributed to one particular factor. Can be estimated from case-control studies and cross-sectional studies. For example, it has been reported that for lifetime smokers, 31% of all lung cancer is attributable to five nonsmoking risk factors. See also relative risk and preventable fraction. [American Journal of Epidemiology, 1995, 142, 1338–43.]
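
A minimal sketch of the formula given in this entry. The function name and the two incidence rates are hypothetical and chosen only for illustration.

```python
def attributable_risk(rate_exposed: float, rate_nonexposed: float) -> float:
    """(rate among exposed - rate among nonexposed) / rate among exposed."""
    if rate_exposed <= 0:
        raise ValueError("incidence rate among exposed must be positive")
    return (rate_exposed - rate_nonexposed) / rate_exposed


# Hypothetical incidence rates per person-year.
rate_exposed = 0.020
rate_nonexposed = 0.005

ar = attributable_risk(rate_exposed, rate_nonexposed)
print(f"{ar:.2f}")  # prints 0.75
```

With these invented rates, three-quarters of the incidence among the exposed would be attributed to the exposure.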

Attrition:

A term used to describe the loss of subjects over the period of a longitudinal study. Such phenomena may cause problems in the analysis of data from such a study. See also missing values. [Everitt, B. S. and Wessely, S., 2008, Clinical Trials in Psychiatry, 2nd ed., Wiley-Blackwell, Chichester.]

AUC:

Abbreviation for area under curve.

Audit in clinical trials:

The process of ensuring that data collected in complex clinical trials are of high quality. [Controlled Clinical Trials, 1995, 16, 104–36.]

Audit trail:

A computer program that keeps a record of changes made to a database.

Autocorrelation:

The internal correlation of the observations in a time series, usually expressed as a function of the time lag between observations. A plot of the value of the autocorrelation against the time lag is known as the autocorrelation function or correlogram and is a basic tool in analysis of time series data. An example of a medical time series and its associated correlogram is shown in Figure 4. [Chatfield, C., 2019, The Analysis of Time Series: An Introduction, 7th ed., Chapman & Hall/CRC, Boca Raton, FL.]

Figure 4 Levels of luteinizing hormone in blood samples taken from a healthy woman every 10 minutes (a) and the autocorrelation function with approximately 95% confidence limits for zero correlation (b).

(Taken with permission of the publisher Wiley from Encyclopedia of Biostatistics, volume 1, 2nd ed., edited by P. Armitage and T. Colton, 2005.)
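
A minimal sketch of how a correlogram might be computed, assuming the usual sample estimator in which every lag is scaled by the total sum of squares about the mean. The approximate 95% limits for zero correlation are taken here as ±1.96/√n, a common large-sample approximation. The function name and the series are hypothetical.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given non-negative lag."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series has zero variance")
    num = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return num / denom


# Hypothetical measurements taken at equally spaced times.
series = [2.1, 2.4, 3.0, 2.8, 2.2, 1.9, 2.3, 2.9, 3.1, 2.5]

acf = [autocorrelation(series, k) for k in range(5)]  # lags 0 to 4
limit = 1.96 / math.sqrt(len(series))                 # approx. 95% limits: +/- limit
```

Plotting `acf` against the lag, with horizontal lines at `±limit`, gives a correlogram of the kind described in this entry.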

Autocorrelation function (ACF):

See autocorrelation.

Auto-encoding:

Coding of clinical data by a computer program that matches original text to predetermined dictionary terms.

Autoregressive model:

A model used primarily in the analysis of time series in which an observation at a particular time is postulated to be a linear function of previous values in the series plus an error term. [Chatfield, C., 2019, The Analysis of Time Series: An Introduction, 7th ed., Chapman & Hall/CRC, Boca Raton, FL.]
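
A minimal sketch of the idea: the next value is a weighted sum of the p most recent values plus an error term. The function names, the coefficient values and the choice of normally distributed errors are assumptions made for illustration.

```python
import random


def ar_next(history, coeffs, error=0.0):
    """Next value: coeffs[0] * latest + coeffs[1] * the one before, ..., + error."""
    p = len(coeffs)
    if p < 1 or len(history) < p:
        raise ValueError("need at least one coefficient and p previous values")
    recent = history[-p:][::-1]  # most recent observation first
    return sum(c * x for c, x in zip(coeffs, recent)) + error


def simulate_ar(coeffs, n, sigma=1.0, seed=0):
    """Simulate n values, starting from zeros, with Gaussian errors (assumed)."""
    rng = random.Random(seed)
    series = [0.0] * len(coeffs)
    for _ in range(n):
        series.append(ar_next(series, coeffs, rng.gauss(0.0, sigma)))
    return series[len(coeffs):]


# Hypothetical first-order model: x_t = 0.6 * x_{t-1} + e_t.
simulated = simulate_ar([0.6], n=50)
```

For example, `ar_next([1.0, 2.0], [0.5, 0.25])` combines the latest value 2.0 and the preceding value 1.0 as 0.5 × 2.0 + 0.25 × 1.0 = 1.25.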

Available case analysis:

An approach to multivariate data containing missing values on a number of variables, in which means, variances and covariances are calculated from all available subjects with non-missing values on the variable (means and variances) or pair of variables (covariances) involved. Although this approach makes use of as many of the observed data as possible, it does have disadvantages. For example, the summary statistics for each variable may be based on different numbers of observations, and the calculated variance–covariance matrix may now not be suitable for methods of multivariate analysis such as principal components analysis and factor analysis. See also missing values and imputation. [Schafer, J. L., 1997, Analysis of Incomplete Multivariate Data, Chapman & Hall/CRC, Boca Raton, FL.]
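
A minimal sketch of the approach, with missing values coded as `None`. Means use every subject observed on the variable, and covariances use every subject observed on both variables. Computing the covariance about the means of those paired subjects is one of several possible conventions and is an assumption here. The function names and the data are hypothetical.

```python
def available_case_mean(values):
    """Mean over the non-missing values, and the number of values used."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values")
    return sum(observed) / len(observed), len(observed)


def available_case_cov(x, y):
    """Covariance over subjects observed on both variables, and their number."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs) / (n - 1)
    return cov, n


# Hypothetical data on four subjects; None marks a missing value.
x = [1.0, 2.0, None, 4.0]
y = [2.0, None, 6.0, 8.0]

mean_x, n_x = available_case_mean(x)   # uses 3 subjects
cov_xy, n_xy = available_case_cov(x, y)  # uses only the 2 complete pairs
```

The differing counts (`n_x` is 3 but `n_xy` is 2) show how each summary statistic can rest on a different set of subjects.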

Available case analysis:

Causes problems when the investigator wishes to apply, say, factor analysis to the data. Complete case analysis and imputation of the missing values are possible alternatives.

Average:

Used most often for the arithmetic mean of a sample of observations, but can also be used for other measures of location, such as the median.

Average age at death:

A flawed statistic for summarizing life expectancy and other aspects of mortality. For example, a study comparing average age at death for male symphony orchestra conductors and for the entire US male population showed that, on average, the conductors lived about four years longer. The difference is, however, illusory. Because age at entry for the US male population was birth, those who died in infancy and childhood were included in the calculation of the average life span, whereas only men who survived to become conductors could enter the conductor cohort. The apparent difference in longevity disappeared after accounting for infant mortality rate and perinatal mortality rate. [Andersen, B., 1990, Methodological Errors in Medical Research, Blackwell Scientific, Oxford.]

Average age at death:

Beware trying to prove that clinicians (or statisticians) live longer on the basis of this statistic.

Average deviation:

A little-used measure of the spread of a sample of observations, given by the average of the absolute values of the deviations of each observation from the sample mean.
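
A minimal sketch of the calculation. The function name and the sample are hypothetical.

```python
def average_deviation(observations):
    """Mean of the absolute deviations of each observation from the sample mean."""
    n = len(observations)
    if n == 0:
        raise ValueError("need at least one observation")
    mean = sum(observations) / n
    return sum(abs(x - mean) for x in observations) / n


# Hypothetical sample: the mean is 5, the absolute deviations are 3, 1, 1, 3.
sample = [2, 4, 6, 8]
avg_dev = average_deviation(sample)  # 2.0
```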

Average linkage clustering:

An agglomerative hierarchical clustering method that uses the average distance from members of one cluster to members of another cluster as its measure of intergroup distance. [Everitt, B. S., Landau, S., Leese, M. and Stahl, D., 2001, Cluster Analysis, 5th ed., John Wiley & Sons, Chichester.]
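
A minimal sketch of the intergroup distance used by the method, assuming Euclidean distance between individual points (other distance measures are possible). The function name and the two clusters are hypothetical.

```python
import math


def average_linkage_distance(cluster_a, cluster_b):
    """Average of all pairwise Euclidean distances between the two clusters."""
    if not cluster_a or not cluster_b:
        raise ValueError("both clusters must be non-empty")
    total = sum(math.dist(p, q) for p in cluster_a for q in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


# Hypothetical clusters of points in the plane.
cluster_a = [(0.0, 0.0), (2.0, 0.0)]
cluster_b = [(5.0, 0.0), (7.0, 0.0)]

# Pairwise distances are 5, 7, 3 and 5, so the average is 5.0.
d_ab = average_linkage_distance(cluster_a, cluster_b)
```

In an agglomerative procedure, the two clusters with the smallest such distance would be merged at each step.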

Average sample number (ASN):

A quantity often used to describe the performance of a sequential analysis and given by the expected value of the sample size required to reach a decision to accept the null hypothesis or the alternative hypothesis and therefore to discontinue sampling.

Axiom:

A statement that is considered self-evident and used as a foundation on which arguments can be based.
