Background
The initial phase of the Boston University Twin Project (BUTP) was a multisituation, multimethod, longitudinal investigation of genetic influences on the temperament dimension of activity level and related behaviors in early childhood. The sample and methods were described in Saudino and Asherson (2013). Our use of multiple measures of activity level (actigraphs, parent ratings and observer ratings) in this early BUTP study has provided unique evidence of situation-specific (Saudino & Zapfe, 2008), measure-specific (Saudino, 2009) and age-specific (Saudino, 2012) genetic influences on activity level; and genetic links between activity level and hyperactivity (Ilott et al., 2010; Ilott, Saudino, & Asherson, 2010), attention problems (Saudino et al., 2018) and shyness (Frazier-Wood & Saudino, 2017). We have also explored genetic and environmental contributions to a number of novel phenotypes in early childhood including elicited imitation (Fenstermacher & Saudino, 2007), inhibitory control (Gagne & Saudino, 2016), callous/unemotional behaviors (Flom & Saudino, 2017), autistic-like traits (Edelson & Saudino, 2009) and positive affect (Flom et al., 2018). Although data collection for this initial sample was completed in 2007, the dataset is still yielding novel contributions to the literature that relate to child activity and behavioral outcomes (e.g., Flom et al., 2019).
Here, we introduce phase II of the BUTP, which involves a new preschool sample longitudinally assessed at ages 3, 4 and 5 years. The second phase of the BUTP builds on our prior work but has a broader focus and answers very different developmental questions. While we still include measures of activity level in this study, our emphasis is on the development of temperament more broadly and on genetic and environmental contributions to growth in temperament. We are interested in understanding the factors that underlie variation in developmental trajectories of both temperament and parenting, and their links with developmental outcomes. Data collection for this study began in 2012 and ended in 2018. This study has yielded a rich dataset that includes multiple dimensions of child temperament and positive and negative emotional/behavioral outcomes, along with parenting assessed at three time points, using a multimethod approach. Data analyses exploring genetic and environmental contributions to growth in these domains are underway. In this article, we provide a detailed description of the sample, study procedures and measures in the hope of sparking future collaborations.
Recruitment and Sample Characteristics
As with the first phase of the BUTP, twins were recruited from birth records supplied by the Massachusetts Registry of Vital Records. Twins with birth weights below 1750 g, gestational ages below 34 weeks, or known developmental or health problems (e.g., autism, Down syndrome) were excluded. The average maternal age at the time of the twins’ birth was 34.4 years (range 20.5–48.4). Forty-seven percent of the mothers had some form of fertility treatment, which likely reflects the fact that Massachusetts is the state with the highest rate of assisted reproductive technology births, and is consistent with the demographics of our sample (Pew Research, 2018).
Sample
Table 1 summarizes the twin sample at each age. All twins within a pair were the same sex; this ensured that sex differences between dizygotic twin siblings did not contribute to behavioral differences between siblings. Three hundred and ten pairs of twins participated in the age 3 laboratory assessments; of these, 286 pairs (92.3%) were assessed again at age 4 and 274 pairs (88.4%) at age 5. Five of the families who did not return at age 4 and seven of those who did not return at age 5 provided questionnaire data; thus, we have some longitudinal data on 291 pairs at age 4 and 281 pairs at age 5. Although the sample was predominantly Caucasian (89.6%), ethnicity was generally representative of the Massachusetts population (1.6% Black, 1.9% Asian, 6.2% Mixed and 6.2% Hispanic or Latino). The parents were highly educated, with over 50% of the primary caregivers having a bachelor’s degree or higher. Socioeconomic status (SES) was primarily middle to upper-middle class, but ranged from low to high SES.
MZ=Monozygotic; DZ=Dizygotic.
a N = 291 pairs including five families with questionnaire data, but no age 4 laboratory visit.
b N = 281 pairs including seven families with questionnaire data, but no age 5 laboratory visit.
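The retention figures reported for the sample can be checked with a few lines of arithmetic. The sketch below is illustrative only: the pair counts are taken from the text, while the function and variable names are assumptions introduced here and are not part of the study's own analysis code.

```python
# Pair counts reported in the text (laboratory visits at each age).
LAB_PAIRS = {3: 310, 4: 286, 5: 274}
# Families with questionnaire data but no laboratory visit at that age.
QUESTIONNAIRE_ONLY = {4: 5, 5: 7}


def retention_pct(pairs_at_age: int, baseline_pairs: int) -> float:
    """Percentage of baseline pairs assessed again, rounded to one decimal."""
    return round(100 * pairs_at_age / baseline_pairs, 1)


def pairs_with_any_data(age: int) -> int:
    """Pairs with laboratory or questionnaire data at a given age."""
    return LAB_PAIRS[age] + QUESTIONNAIRE_ONLY.get(age, 0)


for age in (4, 5):
    # Prints "4 92.3 291" and then "5 88.4 281".
    print(age, retention_pct(LAB_PAIRS[age], LAB_PAIRS[3]), pairs_with_any_data(age))
```

Both percentages use the 310 pairs seen at age 3 as the denominator, which matches the figures given in the text.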
Study Procedure
Overview
Twins and one primary caretaker (95% mothers) visited the BUTP laboratory within approximately 1 month of the twins’ 3rd, 4th and 5th birthdays. Each assessment lasted 2½–3 h, during which the twins participated in a number of structured situations designed to assess multiple facets of child temperament and parent–child interactions. Observational measures and standardized tests assessing cognitive abilities, preschool readiness and prosocial behavior were also obtained. Tasks were arranged into four blocks and organized to minimize cognitive fatigue and/or negative affective carryover from one block to the next. Blocks were counterbalanced across first- and second-born twins. All assessments were video recorded for later behavioral coding. Within a twin pair, twins were individually assessed by different testers and behavioral ratings from video recordings were made by different coders. In addition to our behavioral assessments, at all ages, parents completed a battery of questionnaires designed to inform about child temperament, behavior problems, family characteristics and demographics. All procedures were approved by the Boston University Institutional Review Board, and primary caregivers provided informed consent.
Zygosity
At age 3, cheek scrapings were used to obtain DNA samples from twins. DNA extraction was performed at the Institute of Psychiatry (London, UK). Zygosity was determined via DNA analyses by genotyping 10 highly polymorphic simple sequence repeat markers in each member of a twin pair. For 10 families who declined to provide DNA samples, zygosity was determined using parents’ responses to physical similarity questionnaires.
Feedback to participants
With the exception of information about twins’ zygosity based on DNA analyses, families do not receive specific information about their children. However, the BUTP publishes annual newsletters, which are sent to all families who have participated in any of our research projects. The newsletter’s intent is to disseminate to a general (i.e., nonscientific) audience new information regarding our research findings.
Measures
Unless otherwise noted, the same measures were administered to twins at each age. This ensured that any observed changes in behavior were not due to methodological differences across age. Measures marked with an asterisk were also included in our Phase I sample, so data on these measures can potentially be combined across samples.
Laboratory-assessed temperament
The Laboratory Temperament Assessment Battery, Preschool Version (Lab-TAB; Goldsmith et al., 1995) was used to assess temperament within standardized and structured situations. The Lab-TAB Fear/Anger episodes (Stranger Approach and Imperfect Circles), Exuberance episodes (Popping Bubbles and Surprise), Activity episodes* (Corral of Balls, Arc of Toys and Fidgeting Video) and Interest/Persistence episodes (Bead Sorting and Coffee Pot) were used to elicit specific temperament behaviors. For each of these episodes, trained observers coded the video-recorded data for the dimensions of negative affect, positive affect, attention, persistence and social engagement, giving each dimension a global five-point behavioral rating based on the Bayley Behavior Rating Scale (Bayley, 2006). Summary scores for each dimension were obtained by averaging the behavioral ratings across the nine Lab-TAB episodes. Activity level was assessed with Mini Mitter Actical actigraphs*, attached one per limb by means of Tyvek wristbands. A composite activity score was formed based on the mean of the four limb scores. A laboratory-based measure of inhibitory control was also obtained using the Flanker Test from the NIH Toolbox: Early Childhood Cognitive Battery (Zelazo et al., 2013; see ‘Cognitive abilities’ below).
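As an illustration of the scoring just described, the sketch below averages five-point ratings across episodes to form a dimension summary score and averages four limb readings to form an activity composite. All data values, episode keys and function names are hypothetical. The text does not say how unrated episodes were handled, so skipping them here is an assumption.

```python
from statistics import mean
from typing import Mapping, Optional, Sequence


def dimension_summary(ratings: Mapping[str, Optional[int]]) -> float:
    """Average one dimension's 1-5 ratings across Lab-TAB episodes.

    Episodes without a rating (None) are skipped; this is an assumption,
    not a documented rule of the study.
    """
    observed = [r for r in ratings.values() if r is not None]
    if not observed:
        raise ValueError("no rated episodes for this dimension")
    if any(not 1 <= r <= 5 for r in observed):
        raise ValueError("ratings must lie on the five-point scale")
    return float(mean(observed))


def activity_composite(limb_scores: Sequence[float]) -> float:
    """Mean of the four limb actigraph scores (one device per limb)."""
    if len(limb_scores) != 4:
        raise ValueError("expected one score per limb (four in total)")
    return float(mean(limb_scores))


# Made-up positive-affect ratings for one child across the nine episodes.
positive_affect = {
    "stranger_approach": 2, "imperfect_circles": 2,
    "popping_bubbles": 5, "surprise": 4,
    "corral_of_balls": 4, "arc_of_toys": 3, "fidgeting_video": 3,
    "bead_sorting": 3, "coffee_pot": None,  # episode not rated
}
print(dimension_summary(positive_affect))              # 3.25
print(activity_composite([120.0, 100.0, 80.0, 60.0]))  # 90.0
```

Equal weighting of episodes and limbs follows the wording of the text; any standardization applied before averaging is not described there and is therefore left out.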
Tester-rated temperament
Following the laboratory visits, testers of each twin completed the Infant Behavior Record* (IBR; Bayley, 1969) to obtain behavioral ratings of temperament based on behaviors observed across the entire laboratory visit (i.e., including cognitive testing and other non-Lab-TAB activities). Factor analysis of the IBR has yielded three temperament dimensions: Activity, Affect/Extraversion and Task Orientation (Matheny, 1983). Because the IBR is a frequently used observer-rated measure of temperament in behavioral genetic research, including the earlier BUTP sample (e.g., Frazier-Wood & Saudino, 2017), the inclusion of this measure in the present study allows us to compare our data with previous findings and will help to address issues of replication.
Parent reports of temperament
Parents rated the temperament characteristics of each twin with the Children’s Behavior Questionnaire Short Form (CBQ-SF; Putnam & Rothbart, 2006). The CBQ-SF assesses 15 dimensions of temperament (Approach/Positive Anticipation, High-Intensity Pleasure, Smiling/Laughter, Activity Level, Impulsivity, Shyness, Discomfort, Fear, Anger/Frustration, Sadness, Soothability, Inhibitory Control, Attentional Focusing, Low-Intensity Pleasure and Perceptual Sensitivity), as well as three superfactors: Surgency, Negative Affectivity and Effortful Control.
Observed parent–child interaction
The primary caretaker was video recorded while separately interacting with each twin during an Etch-A-Sketch drawing task, a free play session, and during clean-up (approximately 10 min total). These parent–child interaction tasks have been widely used in studies in early and middle childhood (e.g., NICHD Study of Early Child Care and Youth Development). Observations were coded for parent behaviors (positive control, negative control, positive affect, negative affect and responsiveness), child behaviors (positive affect, negative affect, responsiveness, compliance, autonomy, on-task behavior and activity level) and dyadic interaction (reciprocity, conflict and cooperation) using the Parent–Child Interaction System (Deater-Deckard et al., 1997).
Parent reports of parenting behaviors
Parent reports of parenting were based on measures used in the Twins Early Development Study at similar ages (see Knafo & Plomin, 2006). Parents’ positive and negative affect toward each twin was assessed using the Parent Feelings Questionnaire* (Deater-Deckard, 1996). A measure of harsh discipline* was obtained via a widely used semistructured interview modified to a parent-report format (Deater-Deckard, 2000; Deater-Deckard et al., 1996). For each twin, parents rated the frequency of use for a variety of discipline strategies, yielding a child-specific global rating of harshness of discipline.
Behavior problems
Parents reported on their twins’ behavior problems using the Child Behavior Checklist for Ages 1½–5* (CBCL; Achenbach & Rescorla, 2000). In addition to the traditional CBCL scoring yielding three higher order scales (Internalizing, Externalizing and Total Behavior problems), seven syndrome scales (Emotionally Reactive, Anxious/Depressed, Somatic Complaints, Withdrawn, Attention Problems, Aggressive Behavior and Sleep Problems) and seven DSM-oriented scales (Affective Problems, Anxiety Problems, Pervasive Developmental Problems, Attention-Deficit/Hyperactivity Problems, Stress Problems, Autism Spectrum Problems and Oppositional Defiant Problems), we included scales of Callous/unemotional behaviors (Willoughby et al., 2011) and Irritability (Wiggins et al., 2014). Parents also reported twins’ behavior problems on the Strengths and Difficulties Questionnaire (SDQ; Goodman, 1997), which yields information on emotional symptoms, conduct problems, hyperactivity/inattention and peer relationship problems.
Prosocial behaviors
Parent ratings of twins’ prosocial behaviors (e.g., shares, considerate, kind, caring) were obtained via the Prosocial subscale of the SDQ. A sharing task provided an observational measure of prosocial behavior. This task, widely used in studies of child prosocial behavior (e.g., Blake & Rand, 2010), is a child version of the dictator game in which children are presented with 10 stickers and told that they can do whatever they want with them: keep them all, or give some or all of the stickers to their twin. The number of stickers given to the co-twin indexed sharing.
Academic readiness
The Bracken School Readiness Assessment, Third Edition (Bracken, 2007) provided a standardized measure of academic readiness. This test assesses knowledge of color, letters, numbers/counting, sizes, comparisons and shapes in children from 3 to 7 years and is a good predictor of student outcomes (Panter & Bracken, 2009).
Cognitive abilities
The NIH Toolbox Early Childhood Cognitive Battery (Zelazo et al., 2013) was used to assess executive functioning (inhibitory control and set shifting), receptive vocabulary and episodic memory. This battery, recommended for ages 3–6, is a series of computerized game-like tasks that include the Flanker, Dimensional Change Card Sort, Picture Vocabulary and Picture Sequence Memory subtests. The Flanker task assesses inhibitory control and attention by asking the child to focus on a target stimulus while inhibiting attention to stimuli flanking it. The Dimensional Change Card Sort is a set-shifting task that requires children to match a series of bivalent test pictures (e.g., yellow balls and blue trucks) to target pictures, first according to one dimension (e.g., color) and then, after a number of trials, according to the other dimension (e.g., shape). In the Picture Vocabulary task (receptive vocabulary), children were presented with a recording of a word and four photographic images on the computer screen and asked to select the picture that most closely matched the meaning of the word (Gershon et al., 2013). In the Picture Sequence Memory task (episodic memory), children were shown an arbitrary ordering of pictures and asked to reproduce the sequence (Bauer et al., 2013).
Household chaos
Parent perceptions of environmental confusion in the home were obtained using the Confusion, Hubbub, and Order Scale (Matheny et al., 1995). This brief measure assesses the degree of organization and calmness in the twins’ home.
Height* and weight*
Children were weighed on a digital scale and measured with a stadiometer wearing light indoor clothing and no shoes.
Hair cortisol (age 5 only)
Testers, using sterilized hair scissors, cut 30 mg of hair from each child’s posterior vertex as close as possible to the scalp. The 3 cm closest to the scalp was assayed for cortisol levels at the University of Massachusetts, Amherst. Human scalp hair grows at a rate of approximately 1 cm/month, so the 3-cm sample serves as an index of chronic cortisol output over the past 3 months.
Demographics*
Parents completed a demographic questionnaire regarding race/ethnicity, family composition, parent education and occupation, pregnancy and birth, and twins’ physical similarities, health and daycare.
Conclusions
The second phase of the BUTP comprises 1740 individual comprehensive laboratory-based assessments and approximately 5000 h of behavioral observations and has yielded a vast amount of data. This unique dataset allows us to address important questions regarding the implications and etiology of developmental change in temperament in preschoolers. We are currently planning to follow up this second cohort of twins pending funding.
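The assessment total follows directly from the number of pairs seen in the laboratory at each age. The sketch below is a quick consistency check rather than part of the study's analyses: the variable names are hypothetical, and using the stated visit length of 2½–3 h to bound the hours of observation is an assumption about how the approximate figure was reached.

```python
# Pairs with a laboratory visit at each age, as reported for the sample.
LAB_PAIRS_BY_AGE = {3: 310, 4: 286, 5: 274}
TWINS_PER_PAIR = 2
# Assumed bounds on visit length in hours (2.5 to 3 h per assessment).
VISIT_HOURS_RANGE = (2.5, 3.0)

# Each twin is assessed individually, so every pair contributes two assessments.
total_assessments = TWINS_PER_PAIR * sum(LAB_PAIRS_BY_AGE.values())
hours_low, hours_high = (total_assessments * h for h in VISIT_HOURS_RANGE)

print(total_assessments)      # 1740
print(hours_low, hours_high)  # 4350.0 5220.0
```

Under these assumptions the reported figure of approximately 5000 h falls inside the computed range.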
Financial support
The Boston University Twin Project (BUTP) is supported by grants MH062375 and HD068435 to Dr. Saudino.