Intelligence is one of the most frequently studied human behavioral traits. Over the past century it has motivated research across a diverse range of fields, including not only the behavioral sciences, but also genetics, neuroscience, molecular biology, and economics. It is one of the strongest known determinants of major life outcomes, such as educational attainment, occupational success, health and longevity (Deary et al., 2004; Gottfredson, 1997; Gottfredson & Deary, 2004; Neisser et al., 1996; Schmidt & Hunter, 2004). Over the past several decades, developments in multivariate statistical modeling coupled with the availability of large data sets collected in twins and relatives have allowed for the examination of the genetic and environmental etiology of individual differences in intelligence, and the more recent advances in genotyping and DNA sequencing have enabled the search for specific genetic variants underlying the observed variation (e.g., Benyamin et al., 2014; Davies et al., 2011; Franić et al., 2013; Najmabadi et al., 2011).
The findings emerging from twin and family studies have univocally indicated: (1) a role of genetic factors in the etiology of intelligence (e.g., Bouchard & McGue, 1981; Deary et al., 2006; Plomin & Spinath, 2004; Plomin et al., 2008), and (2) an age-dependent pattern of heritability, with individual differences in late adolescence and adulthood being more strongly influenced by genetic factors than those in childhood (the heritability estimates typically ranging from ~20% in infancy to ~40–50% in middle childhood and ~60–80% in adulthood; e.g., Bartels et al., 2002; Bishop et al., 2003; Boomsma & van Baal, 1998; Deary et al., 2006; Haworth et al., 2009; Hoekstra et al., 2007; McGue et al., 1993; Petrill et al., 2004; Plomin, 1986; Polderman et al., 2006). Environmental factors that contribute to similarity between family members (e.g., shared family environment) typically decline in etiological relevance throughout childhood and adolescence, while environmental factors that facilitate differentiation between family members appear to play a persistently modest to moderate role (e.g., Bartels et al., 2002; Boomsma & van Baal, 1998; Haworth et al., 2009).
The temporal stability of intelligence (i.e., the conservation of the rank order of individuals over time) is estimated to be fairly high, with around 45–60% of the variance in childhood being preserved over any given ~2-year interval (e.g., Bartels et al., 2002). This continuity in the observed individual differences is attributable predominantly to genetic factors, that is, to the expression of a single set of genes throughout development (e.g., Bartels et al., 2002; Bishop et al., 2003; Eaves et al., 1986; Hoekstra et al., 2007; Petrill et al., 2004; Rietveld et al., 2003). In addition to contributing to stability, genetic factors also generate change: age-specific genetic factors emerge at different ages, partly accounting for the lack of complete temporal stability. Environmental influences shared among family members, insofar as they are relevant, contribute mostly to stability, whereas the unshared environment contributes predominantly to change.
The aim of the present study is to contribute to the existing body of literature by providing one of the most comprehensive examinations of the genetic and environmental etiology of the observed stability of intelligence to date. We analyzed longitudinal twin data collected in four different studies on a total of 1,748 twins, measured across a developmental period spanning childhood and adolescence (5–18 years of age). In contrast to many of the previous examinations of the genetic and environmental stability of intelligence (but see Hoekstra et al., 2007; Rietveld et al., 2003), we examine the stability of verbal and non-verbal abilities separately. In addition, we examine the stability of general intelligence (e.g., Jensen, 1998; Spearman, 1904). Because the choice of the psychometric instrument used to assess intelligence is inevitably dependent on the age of the participant, and because we combined data from four different studies (comprising 14 different subprojects), there is considerable heterogeneity in the measurement instruments used to assess intelligence across the different samples and ages. This is not dissimilar to the situation in many other data registries, where longitudinal measures are often collected using different instruments across the life span. In twin registries in particular, this issue becomes especially prominent in the context of gene-finding studies (e.g., Flint, 2013; Goldstein et al., 2013; Visscher et al., 2012), where specific genetic variants contributing to the variation in the observed trait (i.e., the phenotype) are sought. Here, the definition of the ‘observed trait’, or phenotype, is of considerable relevance (e.g., van der Sluis et al., 2010): How does one define a single ‘observed trait’ to be used in the analyses, given multiple measures over time? The presence of longitudinal data collected using different psychometric instruments allows us to address the auxiliary question of how to optimally utilize the existing twin registry data on intelligence in the context of gene-finding studies (i.e., to examine whether data summarization is likely to diminish the power to detect genetic effects; see, e.g., van der Sluis et al., 2010; Minică et al., 2014; Medland & Neale, 2010).
In summary, the present study aims to: (1) assess the observed stability of verbal abilities, non-verbal abilities, and general intelligence, and (2) study the observed stability as a function of the underlying genetic and environmental factors. The structure of the dataset allows for an evaluation of how the results replicate and integrate across the different samples, and the presence of measures collected using multiple psychometric instruments allows us to address the practical question of how to optimally utilize the existing data in the context of gene-finding studies. Although the terms ‘intelligence’ and ‘cognitive ability’ have each been given a multitude of definitions (e.g., Jensen, 1998; Spearman, 1904), in the present article we use the two terms interchangeably.
Materials and Methods
Sample
The data were obtained from the Young Netherlands Twin Register (YNTR; van Beijsterveldt et al., 2013). The YNTR is a population-based register of Dutch twins born after 1986, recruited at birth and measured longitudinally at ages 1 through 18. The sample consisted of 1,748 twins (including 872 complete twin pairs: 399 monozygotic (MZ) and 473 dizygotic (DZ)), and comprised four longitudinally measured subsamples (sample sizes: 544, 226, 552, and 426 individuals). A detailed structure of the data is given in Figure S1 (see Supplementary Material). The twins were measured longitudinally at ages 5–18. This generated 4,641 data points in total: 1,946, 808, 1,076, and 811 data points were available for the four subsamples, respectively. Of the participants, 47.5% were male.
Measures
Cognitive abilities were assessed longitudinally, using the Revised Amsterdam Children Intelligence Test (RAKIT; Bleichrodt et al., 1984), Wechsler Intelligence Scale for Children (WISC-R and WISC-III; Sattler, 1992; Van Haasen et al., 1986; Wechsler et al., 2002), Raven's Standard and Advanced Progressive Matrices (SPM, APM; Raven, 1960; Raven et al., 1998), and the Wechsler Adult Intelligence Scale (WAIS; Stinissen et al., 1970; Wechsler, 1997), the choice of test being largely dependent on the age of the participants. Subscale scores were derived following the guidelines in the tests’ manuals (Bleichrodt et al., 1984; Sattler, 1992; Stinissen et al., 1970; Van Haasen et al., 1986; Wechsler, 1997; Wechsler et al., 2002): for RAKIT, a verbal (V) and a non-verbal (NV) score were defined; for the WISC and the WAIS, the Verbal Comprehension Index (VCI), Perceptual Organization Index (POI), and Freedom from Distractibility Index (FDI) were defined. For Raven's SPM and APM, the total score (defined as the total number of items answered correctly) was used in the analyses. Because the variances of the subscale scores across the different tests were quite heterogeneous in magnitude, to ease subsequent computation we standardized by dividing each variable by the product of its standard deviation and √5. This resulted in variances of an equal order of magnitude across the different tests.
Analyses
Genetic covariance structure modeling (Martin & Eaves, 1977) is the application of structural equation modeling (Bollen, 1989; Kline, 2005) to data collected in genetically informative samples, such as samples of twins (Franić et al., 2012; Neale & Cardon, 1992). In the classical twin design, the sample consists of MZ and DZ twin pairs. DZ twins share 50% of their segregating genes on average, while MZ twins share nearly their entire genome (Falconer & Mackay, 1996; van Dongen et al., 2012). The covariance structure of the phenotypes (i.e., observed traits) is typically modeled as a function of latent factors representing several sources of individual differences: additive genetic (A), shared environmental (C), and individual-specific environmental (E) sources (see Footnote 1). Additive genetic influences are modeled by one or more A factors, which represent the total additive effects of genes relevant to the phenotype. Based on quantitative genetic theory (Falconer & Mackay, 1996; Mather & Jinks, 1971), the A factors are expected to correlate 1 across MZ twins and 0.5 across DZ twins. Environmental influences affecting the phenotype of both twins in an identical way, thereby increasing their similarity beyond what is expected based on genetic resemblance alone, are represented by one or more C factors. Therefore, by definition, the C factors correlate at unity across twins (regardless of zygosity). All environmental influences causing phenotypic differences among family members are represented by one or more E factors. Thus, by definition, the E factors are uncorrelated across twins. Assuming an ACE model, the expected covariance structure in a multivariate twin model is thus:
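$$
\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}
= \begin{pmatrix} \Sigma_A + \Sigma_C + \Sigma_E & r_A \Sigma_A + \Sigma_C \\ r_A \Sigma_A + \Sigma_C & \Sigma_A + \Sigma_C + \Sigma_E \end{pmatrix}
$$
where, given $p$ phenotypes, $\Sigma_{11}$ ($\Sigma_{22}$) is the $p \times p$ covariance matrix of twin 1 (twin 2), $\Sigma_{12}$ ($\Sigma_{21}$) is the twin 1 – twin 2 $p \times p$ covariance matrix, and $\Sigma_A$, $\Sigma_C$ and $\Sigma_E$ are the additive genetic, shared environmental, and unique environmental $p \times p$ covariance matrices, respectively. The coefficient $r_A$ is the correlation between the additive genetic factors in twin 1 and twin 2 (1 in MZ and 0.5 in DZ twins).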
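As an illustration of this block structure, the following minimal sketch assembles the expected 2p × 2p twin-pair covariance matrix from given A, C, and E covariance matrices. The function names and the numerical values are hypothetical and chosen only for illustration; they are not estimates from the present study.

```python
def mat_add(*mats):
    """Element-wise sum of equally sized matrices (lists of lists)."""
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(m[i][j] for m in mats) for j in range(cols)] for i in range(rows)]


def mat_scale(c, m):
    """Multiply every element of matrix m by the scalar c."""
    return [[c * x for x in row] for row in m]


def twin_covariance(sigma_a, sigma_c, sigma_e, r_a):
    """Expected 2p x 2p covariance matrix of a twin pair under an ACE model.

    r_a is the correlation between the twins' additive genetic factors
    (1.0 for MZ pairs, 0.5 for DZ pairs).
    """
    within = mat_add(sigma_a, sigma_c, sigma_e)          # Sigma_11 = Sigma_22
    cross = mat_add(mat_scale(r_a, sigma_a), sigma_c)    # Sigma_12 = Sigma_21
    top = [w_row + c_row for w_row, c_row in zip(within, cross)]
    bottom = [c_row + w_row for c_row, w_row in zip(cross, within)]
    return top + bottom


# Hypothetical covariance matrices for p = 2 phenotypes (illustrative values only).
sigma_a = [[0.5, 0.2], [0.2, 0.4]]
sigma_c = [[0.2, 0.1], [0.1, 0.3]]
sigma_e = [[0.3, 0.0], [0.0, 0.3]]

mz = twin_covariance(sigma_a, sigma_c, sigma_e, r_a=1.0)
dz = twin_covariance(sigma_a, sigma_c, sigma_e, r_a=0.5)

for label, sigma in (("MZ", mz), ("DZ", dz)):
    print(label)
    for row in sigma:
        print("  ", [round(x, 3) for x in row])
```

The MZ and DZ matrices differ only in their off-diagonal (cross-twin) blocks, which is what allows the A and C components to be separated.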
In the present study, the temporal stability of intelligence (i.e., the stability of individual differences in performance on intelligence tests over time) and the temporal stability of genetic and environmental influences on intelligence (i.e., the degree to which the observed stability is attributable to the continuity of the genetic/environmental factors that affect intelligence over time) were modeled using the simplex model (Guttman, 1954; Jöreskog, 1970). An example of a simplex model is depicted in Figure 1. In this model, the data at occasion $t$ ($t = 1, \ldots, T$) are regressed on data at the preceding measurement occasion ($t-1$), and the regression coefficient $\beta_{t,t-1}$ obtained in this regression is used as an indicator of temporal stability. For instance, a high $\beta$ in the regression of verbal abilities at age 7 on verbal abilities at age 5 would indicate that the individual differences in verbal abilities are highly stable across this age span, that is, that the rank order of individuals is largely preserved. Thus, the variance of a measure at a given time point is modeled as a function of factors that are stable over time (e.g., the variance at time point $t$ is a function of the variance at time point $t-1$ and of the regression coefficient $\beta_{t,t-1}$: $\beta_{t,t-1}^2 \sigma^2_{t-1}$) and newly emerging factors that affect the phenotype at the given time point but were absent at the preceding time point. The variance of a measure at time point $t$ can thus be expressed as $\sigma^2_t = \beta_{t,t-1}^2 \sigma^2_{t-1} + \zeta_t$, where $\zeta_t$ denotes the variance due to innovation. A high $\beta_{t,t-1}$ in combination with low $\zeta_t$ indicates high temporal stability; conversely, a low $\beta_{t,t-1}$ and a high $\zeta_t$ indicate low stability, implying that the factors relevant to the phenotype at time $t-1$ decrease in relevance by time $t$, and newly emerging factors gain in relevance.
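The variance recursion described above can be sketched in a few lines. The function names and the parameter values below are hypothetical; the example only shows how the carried-over and innovation parts of the variance combine at each occasion.

```python
def simplex_variances(var_first, betas, innovations):
    """Variances implied by a first-order simplex model.

    var_first   : variance at the first occasion
    betas       : autoregressive coefficients beta_{t,t-1} for occasions 2..T
    innovations : innovation variances zeta_t for occasions 2..T
    """
    variances = [var_first]
    for beta, zeta in zip(betas, innovations):
        # sigma^2_t = beta^2 * sigma^2_{t-1} + zeta_t
        variances.append(beta ** 2 * variances[-1] + zeta)
    return variances


def stable_share(var_prev, beta, var_now):
    """Proportion of the variance at occasion t carried over from occasion t-1."""
    return beta ** 2 * var_prev / var_now


# Hypothetical values for three occasions (not estimates from the study).
betas = [0.6, 0.8]
innovations = [0.64, 0.36]
variances = simplex_variances(1.0, betas, innovations)

shares = [
    stable_share(variances[i], betas[i], variances[i + 1])
    for i in range(len(betas))
]
print("variances:", [round(v, 3) for v in variances])
print("stable shares:", [round(s, 3) for s in shares])
```

With these values every variance equals 1, so each stable share is simply one minus the innovation variance at that occasion.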
In a simplex model with $p$ observed variables, the expected $p \times p$ covariance matrix $\Sigma$ equals $(I - B)^{-1} \Psi \left[(I - B)^{-1}\right]^{\top}$, where $I$ is a $p \times p$ identity matrix, $B$ is a $p \times p$ matrix containing the autoregressive coefficients ($\beta$s) in the model, and $\Psi$ is a $p \times p$ matrix containing the variances and covariances (for the first measurement occasion) and the residual variances and covariances (for all the subsequent measurement occasions) of the observed variables. Thus, for the first three measurement occasions in Figure 1:
where $\sigma^2$ denotes variance, $\zeta$ denotes residual variance, and $c$ denotes (residual) covariance. Further,
where $\beta_{t,t-1}$ is the regression coefficient in the linear regression of a variable at time $t$ on a variable at time $t-1$ (e.g., $\beta_{V7V5}$ denotes the regression of variable V7 on variable V5).
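To make the matrix expression concrete, the sketch below computes the expected covariance matrix for a deliberately simplified case: a single variable V measured at three occasions (labeled V5, V7, and V10 here), with no cross-lagged paths. This is an assumption made for illustration and is not the model of Figure 1; the helper names and parameter values are likewise hypothetical.

```python
def matmul(a, b):
    """Product of two matrices given as lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def inverse_i_minus_b(b):
    """(I - B)^-1 computed as I + B + B^2 + ... + B^(n-1).

    The series is finite because B is strictly lower triangular in a
    simplex model (each occasion is regressed only on earlier ones).
    """
    n = len(b)
    total, power = identity(n), identity(n)
    for _ in range(n - 1):
        power = matmul(power, b)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total


def simplex_covariance(b, psi):
    """Sigma = (I - B)^-1 Psi [(I - B)^-1]^T."""
    inv = inverse_i_minus_b(b)
    return matmul(matmul(inv, psi), transpose(inv))


# Hypothetical parameters: rows/columns ordered V5, V7, V10.
b = [
    [0.0, 0.0, 0.0],
    [0.6, 0.0, 0.0],   # regression of V7 on V5
    [0.0, 0.8, 0.0],   # regression of V10 on V7
]
psi = [
    [1.0, 0.0, 0.0],   # variance at the first occasion
    [0.0, 0.64, 0.0],  # residual (innovation) variance at the second occasion
    [0.0, 0.0, 0.36],  # residual (innovation) variance at the third occasion
]

sigma = simplex_covariance(b, psi)
for row in sigma:
    print([round(x, 3) for x in row])
```

In this example the covariance between the first and third occasions equals the product of the two autoregressive coefficients (times the initial variance), which is the characteristic simplex pattern of correlations that shrink with temporal distance.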
To assess the contributions of genes and the environment to the observed stability and change in intelligence scores, a genetic adaptation of the simplex model was used (Boomsma & Molenaar, 1987; Boomsma et al., 1989; Franić et al., 2012; Neale & Cardon, 1992). In genetic adaptations of the simplex model, in contrast to modeling a single time series, the phenotype is modeled as a function of several (genetic and environmental) latent time series. For instance, in a model containing only additive genetic and unique environmental latent factors (AE model; Figure 2; see Footnote 2), the phenotypic variable V measured at age $t$, $V_t$, is related to the additive genetic and unshared environmental factors $A_t$ and $E_t$ ($t = 1, \ldots, T$), and simplex models, or first-order autoregressions, are specified to account for the stability and change at the level of A and E (e.g., $\sigma^2_{E_t} = \beta_{E_{t,t-1}}^2 \sigma^2_{E_{t-1}} + \zeta_{E_t}$; see Footnote 3). The expected covariance structure of the phenotype(s) is thus:
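$$
\Sigma = \begin{pmatrix} \Sigma_A + \Sigma_E & r_A \Sigma_A \\ r_A \Sigma_A & \Sigma_A + \Sigma_E \end{pmatrix}
$$
where (assuming the latent factors are expressed on the same scale as the phenotype) the covariance matrices $\Sigma_A$ and $\Sigma_E$ are modeled as follows:
$$
\Sigma_A = (I - B_A)^{-1} \Psi_A \left[(I - B_A)^{-1}\right]^{\top}, \qquad
\Sigma_E = (I - B_E)^{-1} \Psi_E \left[(I - B_E)^{-1}\right]^{\top}
$$
with $B_A$ and $\Psi_A$ ($B_E$ and $\Psi_E$) denoting the matrices of autoregressive coefficients and of (residual) variances and covariances of the additive genetic (unique environmental) time series.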
This means that one can assess the contributions of genetic and environmental factors to the observed stability and the change in stability. The phenotypic covariance between consecutive time points may be due to genetic influences ($\beta_{A_{t,t-1}} \neq 0$), environmental influences ($\beta_{E_{t,t-1}} \neq 0$), or both ($\beta_{A_{t,t-1}} \neq 0$ and $\beta_{E_{t,t-1}} \neq 0$). Likewise, any lack of stability may be due to either or both sources of individual differences. For instance, intermediate phenotypic stability (e.g., a correlation of 0.5) may be due to perfect genetic stability ($\zeta_{A_t} = 0$), in combination with complete environmental instability ($\beta_{E_{t,t-1}} = 0$).
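The closing example can be checked numerically. The sketch below assumes two measurement occasions, a standardized phenotype, and equal A and E variances of 0.5 at the first occasion; these values, and the function name, are hypothetical and serve only to reproduce the scenario described in the text.

```python
import math


def phenotypic_correlation(var_a, var_e, beta_a, beta_e, zeta_a, zeta_e):
    """Correlation between two consecutive occasions under an AE simplex model.

    var_a, var_e   : A and E variances at the first occasion
    beta_a, beta_e : autoregressive coefficients of the A and E series
    zeta_a, zeta_e : innovation variances of A and E at the second occasion
    """
    var_a_next = beta_a ** 2 * var_a + zeta_a
    var_e_next = beta_e ** 2 * var_e + zeta_e
    # A and E are uncorrelated, so the covariance is the sum of both paths.
    covariance = beta_a * var_a + beta_e * var_e
    total_first = var_a + var_e
    total_next = var_a_next + var_e_next
    return covariance / math.sqrt(total_first * total_next)


# Perfect genetic stability (no A innovation) and no environmental carry-over.
r = phenotypic_correlation(
    var_a=0.5, var_e=0.5,
    beta_a=1.0, beta_e=0.0,
    zeta_a=0.0, zeta_e=0.5,
)
print(round(r, 3))
```

Under these assumptions the entire phenotypic covariance between the two occasions runs through the genetic series, while the environmental variance at the second occasion consists of innovation alone.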
The present analyses were designed to examine the degree of phenotypic stability of intelligence, and assess the contributions of genes and the environment to the observed stability and change. This was achieved by fitting simplex models (described in the Supplementary Material) to intelligence test subscale scores: RAKIT V and NV, WISC and WAIS VCI, POI, and FDI, and Raven sum scores. In addition to modeling the subscale scores, the stability of general cognitive ability (i.e., g; Jensen, 1998; Spearman, 1904) was assessed. The g factor was defined as a first-order factor underlying performance on the different subscales at a given age, and the temporal stability of g was examined at both the phenotypic level and the genetic and environmental level. Thus, overall, four different types of models were fitted: (a) phenotypic simplex models, (b) phenotypic simplex models with a g factor, (c) ACE simplex models, and (d) ACE simplex models with a g factor. These models were fitted to each of the four samples separately, resulting in 16 distinct sets of results. To accommodate any possible mean differences across the sexes, means were modeled separately for males and females in all analyses.
Results
For concision, the results pertaining to Sample 1 are presented in detail, while the results pertaining to the other three samples are summarized and discussed in view of their compatibility with the results in Sample 1. The full list of results (i.e., the parameter estimates obtained for all the four samples) is given in the Supplementary Material. Figure 3 displays the results obtained for Sample 1. For ease of interpretation, the results we present are fully standardized; that is, the variance of each (observed and latent) variable is 1. Stability is expressed as the proportion of variance of a variable at age t explained by the variables at t-1; this proportion is easily obtainable by subtracting the magnitude of innovation variance from the total variance, that is, as 1-ζt. A different standardization (allowing for a comparison of the relative magnitude of the A, C, and E variance components) is presented at the end of the Results section. The stability of the different subscales at a given age was largely comparable; thus, whenever possible, we describe general trends. When this is not warranted, we address the stability of the subscales separately.
Phenotypic Simplex Model
The temporal stability of intelligence subscales, as assessed using a phenotypic simplex model (upper left panel, Figure 3), is in the intermediate range, varying from 34% to 66% in Sample 1. Averaging over the subscale stabilities at each given age gives the mean stabilities of 38%, 43%, 43%, and 54% at the age intervals 5–7, 7–10, 10–12, and 12–18, respectively, indicating that the phenotypic stability of intelligence increases with age. This is especially evident if one considers that the time interval between the last two measurement points (ages 12 to 18) is more than twice the average time interval between the remaining consecutive measurement points, and that the correlation between measurement points is expected to decrease as an exponential function of their temporal distance, given equal stability over time. Thus, with stability being constant over age, one would expect a drop in the stability estimate from the observed 43% in the 10–12 interval to around 3.5% in the 12–18 interval; however, the actual stability estimate in the 12–18 interval is a high 54%, indicating a sharp increase in stability over this period. The cross-lag regression coefficients (e.g., RAKIT V to RAKIT NV) were generally small in magnitude compared to the main regression coefficients (e.g., RAKIT V to RAKIT V); estimates of variance explained by any single cross-lagged relationship ranged from 0.2% to 5.8% (see Figure 3 for estimates). Notably, the stability remained moderate to high despite the use of different tests (RAKIT, WISC, and WAIS).
In Sample 2, the average subscale stability at age 12 was 40%, an estimate comparable to the 43% stability at the same age interval in Sample 1. In Sample 3, the average subscale stabilities between the ages of 5–12 and 12–17 were 18% and 44%, respectively. An estimate of 18% in the 7-year interval prior to age 12 implies that, were the time intervals equal to those in Sample 1 (an average of 2.3 years prior to age 12), the stability would be estimated at 42%, highly consistent with the estimate obtained in Sample 1. The estimate of 44% in the 5-year interval between the ages of 12 and 17 implies that the stability would equal 58%, given a test-retest interval comparable to that of Sample 1 (i.e., 2.5 years). Thus, the temporal stability of intelligence as estimated in Sample 3 increases with age, and is consistent in both its magnitude and its observed increase with that estimated in Samples 1 and 2. In Sample 4, the mean subscale stability between the ages 15 and 18 is estimated at 30%. This is lower than the estimates obtained for the other samples; however, in Sample 4 the Raven sum score alone is used as a predictor of the three WAIS subscales (Figure S3 in Supplementary Material). Thus, while the 30% estimate may reflect a lower temporal stability, it may also be attributable to the relatively low correlation between the WAIS subscales and the Raven.
Overall, the phenotypic subscale analyses indicate moderate to high stability of individual differences in intelligence across childhood and adolescence. The stability increases with age; i.e., the individual differences in intelligence become increasingly stable as individuals transition from childhood to adolescence. Notably, the stability remains in the intermediate to high range despite the variation in the instruments used to assess intelligence, and the results replicate well despite the differences in tests and measurement intervals across the four different samples.
Phenotypic Simplex Model With a g Factor
The upper right panel of Figure 3 shows the phenotypic simplex model with a g factor fitted to Sample 1. On average, the g factor explained around 37%, 31%, 38%, 47%, and 55% of subscale variance at ages 5, 7, 10, 12, and 18, respectively (possibly indicating an increasing role of g in intelligence over time, but also possibly reflecting the differences in the tests used). The temporal stability of the g factor is remarkably high: nearly the entire inter-individual variation at a given age can be predicted by the variation at the preceding age. The residual, subscale-specific variation displays a modest degree of stability over time: 20% on average. It should, however, be noted that this is a lower-bound estimate of residual stability, as the estimates of the subscale-specific variation also include measurement error.
In Sample 2, the g factor explained an average of 47% and 37% of subscale variance at ages 9 and 12, respectively. The stability of g from age 9 to age 12 was around 80%, and the stability of the residual scores was modest (~15%), as in Sample 1. In Sample 3, the stability estimates were somewhat lower (42% and 65% at intervals 5–12 and 12–17, respectively). Note, however, that stability estimates of 42% and 65% over 7 and 5 years, respectively, imply that the stability would have been estimated at around 75% and 80%, respectively, had the time intervals been comparable to those of Samples 1 and 2 (2–2.5 years). In Sample 4, the stability was estimated at 60%. Again, it should be borne in mind that in Sample 4, Raven alone was used as a predictor of all three WAIS subscales; therefore, the lower stability estimate may reflect a lower temporal stability in Sample 4, but may also be due to the relatively low correlation between the Raven sum score and the WAIS subscale scores.
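The re-expression of the Sample 3 estimates over shorter intervals can be approximated with a simple rule. The sketch below assumes that the proportion of variance preserved decays as a constant power of elapsed time, so that a stability observed over one interval can be rescaled to another by raising it to the ratio of the two interval lengths. This rule and the function name are assumptions made for illustration; the procedure actually used is described in the Supplementary Material, which is not reproduced here. The target intervals of 2.3 and 2.5 years are likewise assumed values within the 2–2.5 year range mentioned in the text.

```python
def reexpress_stability(stability, observed_years, target_years):
    """Rescale a stability (proportion of variance preserved) to another interval.

    Assumes a constant rate of decay over time, so that
    stability_target = stability_observed ** (target_years / observed_years).
    """
    return stability ** (target_years / observed_years)


# g-factor stabilities reported for Sample 3: 42% over 7 years, 65% over 5 years.
early = reexpress_stability(0.42, observed_years=7, target_years=2.3)
late = reexpress_stability(0.65, observed_years=5, target_years=2.5)

print(round(early, 2), round(late, 2))
```

Under this assumed rule the two rescaled values come out at roughly 0.75 and 0.81, in line with the figures of around 75% and 80% given in the text for these two estimates.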
In summary, the phenotypic stability of g over childhood and adolescence is high, and exceeds the stability of individual subscales. The g factor explains around 30–55% of subscale variance, regardless of the test used. Across all four samples, the stability of the subscale-specific scores is modest (around 15–20%).
ACE Simplex Model
The lower left panel of Figure 3 shows the parameter estimates obtained for the ACE simplex model fitted to Sample 1. As evident from the figure, the additive genetic influences on intelligence are highly stable; the stability estimates range from approximately 90% to 100%, and display an increase with age. Therefore, the genes that influence intelligence in early childhood overlap largely, if not entirely, with those that affect it throughout childhood and adolescence. Cross-lag regressions across measurement points and residual correlations within measurement points are fairly low, indicating that the genetic factors affecting verbal abilities are largely distinct from those affecting non-verbal abilities, both within and across measurement points. The stability of common environmental influences is in the intermediate range, and differs per subscale: the common environmental stability of verbal abilities before age 10 is considerably higher than the common environmental stability of non-verbal abilities in this period; at later ages, however, the difference in stability between the subscales appears to disappear. However, as the magnitude of C component is small and decreases over time (see end of the Results section), the apparent differences in subscale stabilities are likely attributable to the unreliability of the relevant parameters. The unique environmental influences display virtually no stability over time; the stability estimates are close to zero at all time points.
In Sample 2, the additive genetic influences are highly stable (over 90% on average), the common environmental stability is high (~85% on average), and unique environmental stability is low (12% on average). Similarly, in Sample 3, the additive genetic stability is high (close to 100% except for the FOI subscale at age 12, the stability of which is estimated at 38%), the common environmental stability is estimated at 47% in the 5–12 interval and 100% in the 12–17 interval, and the E stability is virtually zero. In Sample 4, the A influences are estimated to be around 80% stable, the C influences around 60% stable, while the E influences display virtually zero stability.
Overall, the results indicate a high additive genetic stability (largely 90–100%), a moderate to high common environmental stability, and a near-complete absence of unique environmental stability for both verbal and non-verbal abilities. The cross-subscale (e.g., verbal-nonverbal) stability is consistently low.
ACE Simplex Model With a g Factor
The ACE simplex model with a g factor fitted to Sample 1 is shown in the lower right panel of Figure 3. In this sample, the additive genetic g factor explains around 60% of the additive genetic subscale variance and displays nearly perfect stability: 100% at most time points. Similarly, the additive genetic subscale residuals generally display a high temporal stability. The common environmental g factor explains around 40% of the common environmental subscale variance. The stability of common environmental influences appears to increase after age 10: the stability estimates are 5%, 24%, 100%, and 100% at ages 7, 10, 12, and 18, respectively. However, the magnitude of the C (and E) variance is relatively small, and thus the reliability of the C (and the E) stability parameters is likely low. Overall, the residual C stability is estimated to be high. The unique environmental component displays an opposite pattern to the common environmental component: the stability of g before age 10 is high, and declines substantially thereafter. However, the unique environmental g factor explains only 16% of the unique environmental subscale variance on average; the rest is explained by the subscale-specific E factors, which display virtually no stability.
In Sample 2, the additive genetic g factor explains 52% of the additive genetic subscale variance, and is 70% stable on average. Similarly, the residuals are highly stable (85%). The Cg factor displays complete stability, and explains 90% of the C subscale variance. The Eg factor explains only 24% of the unique environmental subscale variance, and is 30% stable on average, with highly unstable residuals. In Sample 3, the additive genetic g factor explains 70% of the subscale variance and is 93–100% stable. The Cg factor explains around 60% of the C subscale variance, and declines in stability from 100% at ages 5–12 to 34% at ages 12–17. Again, however, the variance in the C stability estimates is likely due to the small magnitude of C. The E subscale variance was only modestly explained by Eg (~25%), and displayed stability neither at the g level, nor at the residual level. In Sample 4, the Ag, Cg, and Eg factors explained around 76%, 52%, and 11% of their respective variance, and were 100%, 100%, and 16% stable, respectively.
In summary, the Ag, Cg, and Eg factors explained an average of ~65%, ~60%, and ~20% of the A, C, and E variance, respectively. The Ag factor was highly stable over time (mostly close to 100%), with highly stable residuals. The Cg factor was generally highly stable (close to 100%), with some exceptions (ages 5–10 in Sample 1 and ages 12–17 in Sample 3); however, considering the small magnitude of the C variance component, these exceptions likely reflect the unreliability of the estimates. The Eg factor displayed modest stability (around 35% on average), but explained only around 20% of the E variance, the remainder of the variance being entirely unstable (close to 0%) across all samples.
Magnitude of Variance Components
The relative magnitude of the A, C, and E variance components, as estimated in the ACE simplex models and averaged over subscales at each age, is depicted in Figure S2 (Supplementary Material). An age-related increase in heritability accompanied by a relative decline in common environmental variance, expected based on the literature (e.g., Bartels et al., Reference Bartels, Rietveld, Van Baal and Boomsma2002; Bishop et al., Reference Bishop, Cherny, Corley, Plomin, DeFries and Hewitt2003; Boomsma & van Baal, Reference Boomsma and van Baal1998; Deary et al., Reference Deary, Spinath and Bates2006; Haworth et al., Reference Haworth, Wright, Luciano, Martin, De Geus, Van Beijsterveldt and Davis2009; Hoekstra et al., Reference Hoekstra, Bartels and Boomsma2007; McGue et al., Reference McGue, Bouchard, Iacono, Lykken, Plomin and McClearn1993; Petrill et al., Reference Petrill, Lipton, Hewitt, Plomin, Cherny, Corley and DeFries2004; Plomin, Reference Plomin1986; Polderman et al., Reference Polderman, Gosso, Posthuma, van Beijsterveldt, Heutink, Verhulst and Boomsma2006), is evident in Samples 1, 3, and 4. In Sample 2, where only two measurement points were available (ages 9 and 12), this trend was not apparent. The absence of a trend can presumably be attributed to the brevity of the test-retest interval.
Integrated Results
Figures 4 and 5 depict estimates of standardized variance components and A, C, and E stabilities, respectively, obtained across all four samples and shown for verbal and non-verbal abilities separately. Unlike the data in Figure 5, the data in Figure 4 did not appear to show considerable deviations from linearity; therefore the general trends are represented by linear regression lines weighted by sample size in Figure 4 and by a smoothing function (lowess function as implemented in R; R Core Team, 2013) in Figure 5. Consistent with Figure S2, an increase in the relative magnitude of additive genetic variance accompanied by a decrease in common (and, to some extent, unique) environmental variance is evident from Figure 4. Figure 5 indicates an increase in stability of all three components over time, and suggests that the observed phenotypic stability is driven primarily by additive genetic factors, with unique environment contributing primarily to change. Note that, for comparability, Figure 5 re-expresses the stability estimates on a scale on which all measurement points are equidistant (6 years). As explained earlier (see Supplementary Material), stability estimates depend on the time interval used for estimation, and therefore the absolute magnitude of the stability estimates is not interpretable in itself. The choice of time interval used to re-express the estimates is therefore arbitrary; a 6-year period was chosen in this case because, with smaller (e.g., 1-year) time intervals, the stability estimates reach an upper bound, making it impossible to distinguish between the stability of the different variance components (i.e., the C and E stability estimates increase, whereas the A stability estimates readily hit the upper bound of 1). Finally, Table 1 gives all available estimates of the phenotypic, genetic and environmental correlations obtained under an ACE simplex model, for verbal and non-verbal abilities separately.
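To illustrate why the choice of reference interval matters, the following sketch re-expresses a stability coefficient estimated over one interval on a different interval. It assumes a stationary first-order autoregressive process, in which a standardized stability coefficient b over d years corresponds to b^(k/d) over k years. This is a simplifying assumption made for illustration, not necessarily the procedure described in the Supplementary Material; the function name and all input values are hypothetical.

```python
def rescale_stability(b, interval_years, target_years=6.0):
    """Re-express a standardized stability coefficient on another interval.

    Assumes a stationary first-order autoregressive process, so that the
    coefficient over `target_years` equals b ** (target_years / interval_years).
    """
    if not 0.0 <= b <= 1.0:
        raise ValueError("b must lie in [0, 1]")
    if interval_years <= 0 or target_years <= 0:
        raise ValueError("intervals must be positive")
    return b ** (target_years / interval_years)


# Hypothetical 3-year stabilities for the three variance components.
three_year = {"A": 0.95, "C": 0.70, "E": 0.30}

# Compare a short (1-year) and a long (6-year) reference interval.
for years in (1.0, 6.0):
    rescaled = {
        name: round(rescale_stability(b, 3.0, years), 3)
        for name, b in three_year.items()
    }
    print(f"{years:.0f}-year scale: {rescaled}")
```

Under this assumption, moving to a shorter interval pushes every coefficient toward 1 and compresses the differences between components, whereas a longer interval spreads them out.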
Again, it is evident that the observed stability of intelligence is driven primarily by additive genetic factors, with common environment contributing both to stability and change, and the unique environment predominantly generating change.
Note: The correlations are given for verbal (below diagonal) and non-verbal (above diagonal) abilities separately.
Discussion
The present study examined the stability of verbal abilities, non-verbal abilities, and general intelligence across childhood and adolescence, and assessed the genetic and environmental etiology of this stability. Other questions included the feasibility of combining results on multiple types of intelligence tests administered in a longitudinal design with the aim of utilizing the combined score in the context of gene-finding studies, and the relationship between different types of intellectual abilities over time (and the genetic/environmental etiology thereof).
The results indicate an intermediate to high phenotypic stability of individual differences in intelligence across the developmental period under study, with an increase in stability as individuals transition from childhood to adolescence. General intelligence, defined as a first-order latent factor underlying subscale performance at a given age, explained around 30–55% of variance in subscale performance and displayed high temporal stability, exceeding that of individual subscales. The phenotypic stability appears to be driven primarily by genetic factors: the additive genetic influences were highly to entirely stable. The environment shared by family members appeared to contribute to stability to a moderate degree, while environmental factors unique to family members contributed mainly to innovation (i.e., to temporal instability). Similarly, the observed stability in the g factor was driven primarily by genetic factors: the additive genetic g factor displayed near complete stability, the common environmental g factor was generally stable but explained less of the phenotypic variance than the Ag factor, while the unique environmental g factor was modestly stable but explained only a minor fraction of the phenotypic variance in g.
An age-related increase in heritability accompanied by a relative decline in common environmental variance, expected based on the literature (e.g., Bartels et al., Reference Bartels, Rietveld, Van Baal and Boomsma2002; Bishop et al., Reference Bishop, Cherny, Corley, Plomin, DeFries and Hewitt2003; Boomsma & van Baal, Reference Boomsma and van Baal1998; Deary et al., Reference Deary, Spinath and Bates2006; Haworth et al., Reference Haworth, Wright, Luciano, Martin, De Geus, Van Beijsterveldt and Davis2009; Hoekstra et al., Reference Hoekstra, Bartels and Boomsma2007; McGue et al., Reference McGue, Bouchard, Iacono, Lykken, Plomin and McClearn1993; Petrill et al., Reference Petrill, Lipton, Hewitt, Plomin, Cherny, Corley and DeFries2004; Plomin, Reference Plomin1986; Polderman et al., Reference Polderman, Gosso, Posthuma, van Beijsterveldt, Heutink, Verhulst and Boomsma2006), was observed. In addition, the cross-subscale stability was consistently low, indicating a small to non-existent contribution of one domain of intelligence to another over time.
The stability of intelligence remained in the intermediate to high range despite the variation in the instruments used to assess it, and the results replicated well across the four samples despite the variation in tests and the time intervals used for estimation. The former relates to a common situation in data registries (e.g., twin registries), where data are often collected using a number of different psychometric instruments, the choice of test often being dependent on participants’ age. Given the increased accessibility of genotyping and sequencing technologies and the consequent increase in the use of twin registry data in gene-finding studies, the question of how to optimally combine the existing longitudinal data in defining the phenotype for such studies is gaining in relevance. In this context, there are two prominent issues: (1) the actual modeling of a measured genetic variant in multivariate data; and (2) the accommodation of family members in the analysis. The latter does not pose a problem, as the methods and software for family-based gene-finding studies are well developed (e.g., Chen & Abecasis, Reference Chen and Abecasis2007; Lippert et al., Reference Lippert, Listgarten, Liu, Kadie, Davidson and Heckerman2011; Minică et al., Reference Minică, Dolan, Hottenga, Willemsen, Vink and Boomsma2013, Reference Minică, Dolan, Kampert, Boomsma and Vink2014; Purcell et al., Reference Purcell, Neale, Todd-Brown, Thomas, Ferreira, Bender and Daly2007). The former is potentially more problematic, as full multivariate phenotypic modeling of family data is not computationally feasible, or perhaps even desirable. There are many possible loci of a genetic variant effect in a multivariate model, and therefore many possible models to consider.
The present results, as pertaining to the longitudinal genetic covariance structure, suggest that a simple phenotypic sum score based on the repeated measures within a cognitive domain (e.g., verbal) should not result in any appreciable loss of information in a genetic association study (see Minică et al., Reference Minică, Boomsma, Van Der Sluis and Dolan2010). Whether one should sum over cognitive domains is a different question. The genetic g factor accounted for about 60% of the genetic variance of the subtest scores. Summing over domains will only improve the power to detect a genetic variant if it contributes to this common genetic variance. Rather than running the risk of missing genetic variants, it is advisable to carry out gene-finding studies for each domain separately. One can still arrive at an omnibus test of the genetic variant (i.e., address the question of whether the genetic effect generalizes over domains) by combining the statistical results (van der Sluis et al., Reference van der Sluis, Posthuma and Dolan2013).
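The argument about summing over domains can be made concrete with a small calculation. The sketch below assumes two standardized domain scores with correlation rho and a variant with effects b1 and b2 (in standard deviation units) on each; the standardized effect on the unit-weighted sum score is then (b1 + b2) / sqrt(2 + 2*rho). The function name and all numerical values are hypothetical and chosen only for illustration.

```python
import math


def sum_score_effect(b1, b2, rho):
    """Standardized effect of a variant on the sum of two standardized scores.

    b1, b2: effects on each domain score (in SD units); rho: their correlation.
    For unit-variance scores, Var(X1 + X2) = 2 + 2*rho.
    """
    if not -1.0 < rho <= 1.0:
        raise ValueError("rho must lie in (-1, 1]")
    return (b1 + b2) / math.sqrt(2.0 + 2.0 * rho)


rho = 0.5  # hypothetical correlation between verbal and non-verbal scores

shared = sum_score_effect(0.05, 0.05, rho)   # variant acts on both domains
specific = sum_score_effect(0.05, 0.0, rho)  # variant acts on one domain only

print(f"shared variant:   {shared:.4f} (vs 0.05 in each domain)")
print(f"specific variant: {specific:.4f} (vs 0.05 in its own domain)")
```

Under these assumptions the sum score amplifies the effect of a variant that acts on both domains and dilutes the effect of a domain-specific variant, which is the rationale for analyzing each domain separately and combining the statistical results afterwards.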
While we believe that the high genetic stability provides a reasonable justification for summing over repeated measures within an individual, we note that this recommendation is limited in several ways. First, it applies to the present longitudinal results as obtained in the repeated measures design. From the point of view of power, a cross-sectional design may be preferable (and is certainly more efficient and cheaper to implement). However, the exact relationship between power and design is beyond the present scope (Minică et al., Reference Minică, Boomsma, Van Der Sluis and Dolan2010, do consider different multivariate designs). Second, the recommendation is based strictly on the present choice to model the covariance structure by means of autoregressive and cross-lagged modeling. This approach is informative with respect to stability, but does not consider developmental change from the point of view of individual growth curves (Ramsden et al., Reference Ramsden, Richardson, Josse, Thomas, Ellis, Shakeshaft and Price2011). We did not consider growth curve modeling as our IQ test scores were age-corrected, meaning that the present data were not informative with respect to individual developmental growth curves. Finally, the results presented here were based on the standard genetic simplex model, in which A, C and E are assumed to be uncorrelated sources of individual differences. Whether this assumption (e.g., the absence of genotype-environment covariance) is valid to a reasonable approximation is an open question. Any genotype-environment covariance is unlikely to undermine our recommendations concerning data summarization in gene-finding studies. However, a representation involving phenotype to environment transmission, typically envisaged as smart children contributing to their own ‘smart’ environment (a.k.a. niche picking; Eaves et al., Reference Eaves, Last, Martin and Jinks1977), is possible (Dolan et al., Reference Dolan, de Kort, Kan, van Beijsterveldt, Bartels and Boomsma2014a, Reference Dolan, de Kort, van Beijsterveldt, Bartels and Boomsma2014b).
Supplementary Material
To view supplementary material for this article, please visit http://dx.doi.org/10.1017/thg.2014.26.
Acknowledgments
We would like to thank all NTR participants. This work was supported by the Netherlands Organization for Scientific Research (NWO 668.772; NWO 433-09-220; NWO 051.02.060, NWO-MagW 480-04-004; NWO/SPI 56-464-14192) and the European Research Council (ERC-230374).