Chapter 1 - Introduction

Published online by Cambridge University Press:  20 April 2023

Jos W. R. Twisk
Affiliation:
Amsterdam University Medical Centers

Summary

In Chapter 1 the different medical study designs are discussed and the difference between age, period and cohort effects is explained. Furthermore, some general information (e.g. prior knowledge, software used for the examples) needed to work through the book is provided. Finally, there is a short section in which the differences between the second and third editions are outlined.

Publisher: Cambridge University Press
Print publication year: 2023

1.1 Introduction

Longitudinal studies are defined as studies in which the outcome variable is repeatedly measured; i.e. the outcome variable is measured in the same subject on several occasions. In longitudinal studies, the observations of a subject over time are not independent of each other, and therefore it is necessary to apply special statistical methods that take into account the fact that the repeated observations within a subject are correlated. The definition of longitudinal studies used in this book implies that statistical methods such as survival analysis are beyond its scope. Strictly speaking, those methods are not methods for longitudinal data analysis because (in general) the outcome variable is an irreversible endpoint and is therefore measured on only one occasion: after the occurrence of an event, no further observations are carried out on that particular subject.

Why are longitudinal studies so popular these days? One reason is the general belief that longitudinal studies can solve the problem of causality. This is, however, a common misunderstanding and is only partly true. Table 1.1 shows the most important criteria for causality, which can be found in every epidemiological textbook. Only one of them is specific to a longitudinal study: the rule of temporality. There has to be a time-lag between the covariate (cause) and the outcome variable (effect); in time the cause has to precede the effect. The question of whether or not causality exists can only be (partly) answered in specific longitudinal studies (e.g. randomised controlled trials) and certainly not in all longitudinal studies. In Chapter 6 the problem of causality in observational longitudinal studies will be discussed, while Chapter 10 deals with the analysis of data from randomised controlled trials.

Table 1.1 Criteria for causality

Strength of the relationship
Consistency in different populations and under different circumstances
Specificity (cause leads to a single effect)
Temporality (cause precedes effect in time)
Biological gradient (dose–response relationship)
Biological plausibility
Experimental evidence

What, then, is the advantage of performing a longitudinal study? A longitudinal study is expensive and time-consuming, and the data are difficult to analyse. If there are no advantages over cross-sectional studies, why bother? The main advantage of a longitudinal study compared to a cross-sectional study is that the individual development of a certain outcome variable over time can be studied. In addition, the individual development of an outcome variable can be related to the individual development of particular covariates.

1.2 Study Design

Medical studies can be roughly divided into observational and intervention studies (see Figure 1.1). Observational studies can be further divided into case-control studies and cohort studies. Case-control studies are never longitudinal, in the way that longitudinal studies were defined in Section 1.1. The outcome variable (a dichotomous outcome variable distinguishing case from control) is measured only once. Furthermore, case-control studies are always retrospective in design. The outcome variable is observed at a certain time-point, and the covariates are measured retrospectively.

Figure 1.1 Schematic illustration of different medical study designs.

In general, observational cohort studies can be divided into prospective, retrospective and cross-sectional cohort studies. A prospective cohort study is the only cohort study that can be characterised as a longitudinal study. Prospective cohort studies are usually designed to analyse the longitudinal development of a certain outcome over time. It is sometimes argued that this longitudinal development concerns growth processes. However, in studies investigating the elderly, the process of deterioration is the focus of the study, whereas in other developmental processes, growth and deterioration can alternate. Moreover, in many studies one is interested not only in the actual growth or deterioration over time, but also in the longitudinal relationship between an outcome and several covariates. Intervention studies, e.g. randomised controlled trials, are by definition prospective, i.e. longitudinal. The outcome variable is measured at least twice (the classical pre-test, post-test design), and intermediate measurements are usually added to the research design in order to evaluate short-term and long-term effects of the particular intervention.

1.2.1 Observational Longitudinal Studies

In observational longitudinal studies investigating individual development, each measurement taken on a subject at a particular time-point is influenced by three factors: (1) age (time from date of birth to date of measurement), (2) period (time or moment at which the measurement is taken), and (3) birth cohort (group of subjects born in the same year). When studying individual development, one is mainly interested in the age effect. One of the problems of most of the designs used in longitudinal studies of development is that the main age effect cannot be distinguished from the period and cohort effects.
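Why the age effect is hard to separate can be seen from the arithmetic linking the three factors: for any measurement, age is (approximately) the period minus the birth year. The following minimal Python sketch illustrates this for a single birth cohort. The birth year and measurement years are hypothetical, and age is simplified to whole calendar years.

```python
# Hypothetical illustration: one birth cohort followed over three measurements.
def age_at_measurement(birth_year: int, measurement_year: int) -> int:
    """Age in whole years, ignoring the exact day of birth (a simplification)."""
    return measurement_year - birth_year

birth_year = 1990                        # hypothetical birth cohort
measurement_years = [2000, 2005, 2010]   # hypothetical periods

ages = [age_at_measurement(birth_year, year) for year in measurement_years]

# Within a single cohort, every step in period is matched by an identical step in age.
age_steps = [later - earlier for earlier, later in zip(ages, ages[1:])]
period_steps = [later - earlier
                for earlier, later in zip(measurement_years, measurement_years[1:])]

print(ages)
print(age_steps == period_steps)
```

Because age and period change in lockstep when one cohort is followed over time, any change in the outcome between two measurements could in principle be attributed to either of them.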

There is an extensive literature describing age, period and cohort effects (e.g. Lebowitz, 1996; Robertson et al., 1999; Holford et al., 2005). However, most of the literature deals with classical age–period–cohort models, which are used to describe and analyse trends in (disease-specific) morbidity and mortality (e.g. Kupper et al., 1985; Mayer and Huinink, 1990; Holford, 1992; McNally et al., 1997; Robertson and Boyle, 1998; Rosenberg and Anderson, 2010). In this book, the main interests are the individual development over time, and the longitudinal relationship between an outcome and several covariates. In this respect, period effects or time of measurement effects are often related to a change in measurement method over time, or to specific environmental conditions at a particular time of measurement. A hypothetical example is given in Figure 1.2. This figure shows the longitudinal development of physical activity with age. Physical activity patterns were measured with a five-year interval, and were measured during the summer in order to minimise seasonal influences. The first measurement was taken during a summer with normal weather conditions. During the summer when the second measurement was taken, the weather conditions were extremely good, resulting in activity levels that were very high. At the time of the third measurement, the weather conditions were comparable to the weather conditions at the first measurement, and therefore the physical activity levels were much lower than those recorded at the second measurement. When all the results are presented in a graph, it is obvious that the observed age trend is highly biased by the period effect at the second measurement.

Figure 1.2 Illustration of a possible time of measurement effect (dotted line: real age trend, solid line: observed age trend).

One of the most striking examples of a cohort effect is the development of body height with age. There is an increase in body height with age, but this increase is highly influenced by the increase in height of the birth cohort. This phenomenon is illustrated in Figure 1.3. In this hypothetical study, two repeated measurements were carried out in two different cohorts. The purpose of the study was to detect the age trend in body height. The first cohort had an initial age of five years; the second cohort had an initial age of 10 years. At the age of five, only the first cohort was measured; at the age of 10, both cohorts were measured; and at the age of 15, only the second cohort was measured. The body height obtained at the age of 10 is the average value of the two cohorts. Combining all measurements in order to detect an age trend will lead to a much flatter age trend than the age trends observed in both cohorts separately.
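The flattening described for the two-cohort design can be reproduced with a few numbers. The following Python sketch uses made-up mean heights (none of them come from the book) and assumes that the later-born cohort is taller at the same age, which is one way to read the cohort effect described here.

```python
# Hypothetical mean body heights (cm); none of these numbers come from the book.
cohort_1 = {5: 110.0, 10: 140.0}    # later-born cohort, measured at ages 5 and 10
cohort_2 = {10: 134.0, 15: 164.0}   # earlier-born cohort, measured at ages 10 and 15

def slope(points: dict) -> float:
    """Change in height per year between the first and last age in `points`."""
    ages = sorted(points)
    return (points[ages[-1]] - points[ages[0]]) / (ages[-1] - ages[0])

# Combine the cohorts per age; at age 10 the value is the average of both cohorts.
combined = {}
for age in sorted(set(cohort_1) | set(cohort_2)):
    values = [cohort[age] for cohort in (cohort_1, cohort_2) if age in cohort]
    combined[age] = sum(values) / len(values)

print(slope(cohort_1), slope(cohort_2))   # cohort-specific age trends
print(combined, slope(combined))          # observed (combined) age trend
```

With these assumed values each cohort grows by 6 cm per year, while the combined trend from age 5 to age 15 is shallower, because the average at age 10 and the value at age 15 both carry the lower level of the earlier-born cohort.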

Figure 1.3 Illustration of a possible cohort effect (dotted line: cohort specific, solid line: observed).

Both cohort and period effects can have an influence on the interpretation of results of longitudinal studies. An additional problem is that it is very difficult to disentangle the two types of effects. They can easily occur together. Logical considerations regarding the type of variable of interest can give some insight into the plausibility of either a cohort or a period effect. When there are (confounding) cohort or period effects in a longitudinal study, one should be careful with the interpretation of age-related results.

In studies investigating development, in which repeated measurements of the same subjects are performed, cohort and period effects are not the only possible confounding effects. The individual measurements can also be influenced by a changing attitude towards the measurement itself, a so-called test or learning effect. This test or learning effect, which is illustrated in Figure 1.4, can be either positive or negative.

Figure 1.4 Test or learning effects; comparison of repeated measurements of the same subjects with non-repeated measurements in comparable subjects (different symbols indicate different subjects, dotted line: cross-sectional, solid line: longitudinal).

One of the most striking examples of a positive test effect is the measurement of memory in older subjects. It is assumed that with increasing age, memory decreases. However, even when the time interval between subsequent measurements is as long as three years, an increase in memory performance with increasing age can be observed: an increase which is totally due to a learning effect (Dik et al., 2001).

1.3 General Approach

The general approach to explaining the statistical methods covered in this book is to take the research question as the basis for analysis. Although it may seem quite obvious, it is important to realise that a statistical analysis has to be carried out in order to obtain an answer to a particular research question. The starting point of each analysis will be a research question, and throughout the book many research questions will be addressed. The book is further divided into chapters according to the characteristics of the outcome variable. Each chapter contains extensive examples, accompanied by computer output, in which special attention will be paid to the interpretation of the results of the statistical analyses.

1.4 Prior Knowledge

Although an attempt has been made to keep the (complicated) statistical methods as understandable as possible, and although the basis of the explanations will be the underlying research question, it will be assumed that the reader has some prior knowledge about (simple) cross-sectional statistical methods such as linear regression analysis, logistic regression analysis, and analysis of variance.

1.5 Example

In general, the examples used throughout this book are taken from the same longitudinal dataset. The dataset is taken from the Amsterdam Growth and Health Longitudinal Study, an observational longitudinal study investigating the longitudinal relation between lifestyle and health in adolescence and young adulthood (Kemper, 1995).

This dataset consists of a continuous outcome variable (serum cholesterol in mmol/liter) which is measured six times on the same subjects. In the examples, in general, two covariates are used. The first is body fatness, which is operationalised by the sum of the thickness of four skinfolds; it is continuous and is also measured six times on the same subjects. The second is sex, which is dichotomous; it is measured only once and has the same value at all six repeated measurements.

In the chapter dealing with dichotomous outcome variables (i.e. Chapter 7), the continuous outcome variable cholesterol is dichotomised (i.e. the highest tertile versus the other two tertiles) and in the chapter dealing with categorical outcome variables (i.e. Chapter 8), the continuous outcome variable cholesterol is divided into three equal groups based on tertiles. Table 1.2 shows descriptive information for the variables used in the example.

Table 1.2 Descriptive information¹ for the data used in most of the examples

Time-point   Cholesterol (mmol/liter)   Sum of skinfolds (cm)   Sex
1            4.43 (0.67)                3.26 (1.24)             69/78
2            4.32 (0.67)                3.36 (1.34)             69/78
3            4.27 (0.71)                3.57 (1.46)             69/78
4            4.17 (0.70)                3.76 (1.50)             69/78
5            4.67 (0.78)                4.35 (1.68)             69/78
6            5.12 (0.92)                4.16 (1.61)             69/78

¹ For cholesterol and sum of skinfolds, the mean is given with the standard deviation in brackets; for sex, the numbers of males/females are given.

All the example datasets used throughout the book are available on request by .
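The tertile-based recoding of cholesterol described for Chapters 7 and 8 can be sketched in a few lines of Python. The values below are hypothetical and are not taken from the example dataset. The helper `tertile_groups` is an illustrative name, and the rank-based rule it uses is only one possible way to form tertiles; the text does not state how the tertiles were computed (for example, per time-point or pooled).

```python
# Hypothetical cholesterol values (mmol/liter); not taken from the example dataset.
cholesterol = [4.4, 3.6, 5.3, 4.1, 4.7, 3.9, 5.0, 4.2, 4.5]

def tertile_groups(values):
    """Assign each value to tertile 1, 2 or 3 by rank (ties get no special handling)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, index in enumerate(order):
        groups[index] = rank * 3 // len(values) + 1
    return groups

# Categorical outcome: three equally sized groups.
categorical = tertile_groups(cholesterol)

# Dichotomous outcome: highest tertile (1) versus the other two tertiles (0).
dichotomous = [1 if group == 3 else 0 for group in categorical]

print(categorical)
print(dichotomous)
```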

1.6 Software

Most of the example analyses in this book are performed in STATA (version 17). However, SPSS (version 26) is also used for some of the example analyses. STATA is chosen as the main software package for the longitudinal data analyses because almost all statistical analyses can be performed in STATA and because of the simplicity of its syntax and output. In Chapter 13, an overview (and comparison) will be given of other software packages such as R (version 4.0.3) and SAS (version 8). In all these packages, algorithms to perform longitudinal data analysis are implemented in the main software. Both syntax and output will accompany the overview of the different software packages.

1.7 Data Structure

It is important to realise that different statistical software packages need different data structures in order to perform longitudinal data analyses. In this respect a distinction must be made between a long data structure and a broad data structure. In a long data structure, each subject has as many data records as there are measurements over time, while in a broad data structure each subject has one data record, irrespective of the number of measurements over time (see Table 1.3).

Table 1.3 Illustration of two different data structures

Broad data structure
Id   Yt1   Yt2   Yt3   X1t1   X1t2   X1t3   X2
1    3     5     8     10     14     16     1
2    2     4     9     13     15     15     1
3    4     6     7     12     13     16     0

Long data structure
Id   Y   X1   X2   Time
1    3   10   1    1
1    5   14   1    2
1    8   16   1    3
2    2   13   1    1
2    4   15   1    2
2    9   15   1    3
3    4   12   0    1
3    6   13   0    2
3    7   16   0    3
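The relation between the two layouts in Table 1.3 can be made concrete with a small conversion routine. The sketch below is written in plain Python rather than in one of the packages used in the book; the function name `broad_to_long` and the dictionary-based records are illustrative choices, not the book's own syntax.

```python
# Broad layout (one record per subject), with the values from Table 1.3.
broad = [
    {"id": 1, "Yt1": 3, "Yt2": 5, "Yt3": 8, "X1t1": 10, "X1t2": 14, "X1t3": 16, "X2": 1},
    {"id": 2, "Yt1": 2, "Yt2": 4, "Yt3": 9, "X1t1": 13, "X1t2": 15, "X1t3": 15, "X2": 1},
    {"id": 3, "Yt1": 4, "Yt2": 6, "Yt3": 7, "X1t1": 12, "X1t2": 13, "X1t3": 16, "X2": 0},
]

def broad_to_long(records, n_times=3):
    """Return one record per subject per time-point."""
    long_records = []
    for record in records:
        for t in range(1, n_times + 1):
            long_records.append({
                "id": record["id"],
                "Y": record[f"Yt{t}"],      # time-dependent outcome
                "X1": record[f"X1t{t}"],    # time-dependent covariate
                "X2": record["X2"],         # time-independent covariate, repeated
                "time": t,
            })
    return long_records

long_data = broad_to_long(broad)
for row in long_data:
    print(row)
```

Each subject contributes three records to the long layout, and the time-independent covariate X2 is simply repeated on every record of that subject.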

1.8 What is New in the Third Edition?

In addition to changes made throughout the book to update the material and to make some of the explanations clearer, some new chapters have been added. In the new Chapter 5, hybrid models are introduced. Hybrid models are used to disentangle the between- and within-subjects interpretation of the regression coefficient obtained from a longitudinal data analysis. The new Chapter 6 contains a discussion regarding causality in observational longitudinal studies, while in the new Chapter 9, the analysis of outcome variables with floor or ceiling effects is discussed. In Chapter 10, ‘Analysis of Longitudinal Intervention Studies’, three new sections have been added: one section about an alternative repeated measures analysis to take into account regression to the mean; one section about the analysis of data from a stepped wedge trial design; and one section about the difference in difference method.

  • Introduction
  • Jos W. R. Twisk, Amsterdam University Medical Centers
  • Book: Applied Longitudinal Data Analysis for Medical Science
  • Online publication: 20 April 2023
  • Chapter DOI: https://doi.org/10.1017/9781009288002.002