
Causal complexity in human research: On the shared challenges of behavior genetics, medical genetics, and environmentally oriented social science

Published online by Cambridge University Press:  11 September 2023

James W. Madole
Affiliation:
Department of Psychology, University of Texas at Austin, Austin, TX, USA; VA Puget Sound Health Care System, Seattle, WA, USA
K. Paige Harden
Affiliation:
Department of Psychology, University of Texas at Austin, Austin, TX, USA

Abstract

We received 23 spirited commentaries on our target article from across the disciplines of philosophy, economics, evolutionary genetics, molecular biology, criminology, epidemiology, and law. We organize our reply around three overarching questions: (1) What is a cause? (2) How are randomized controlled trials (RCTs) and within-family genome-wide association studies (GWASs) alike and unalike? (3) Is behavior genetics a qualitatively different enterprise? Throughout our discussion of these questions, we advocate for the idea that behavior genetics shares many of the same pitfalls and promises as environmentally oriented research, medical genetics, and other arenas of the social and behavioral sciences.

Type: Authors' Response
Copyright: © The Author(s), 2023. Published by Cambridge University Press

R1. Introduction

When opposing groups of intelligent, highly educated, competent scientists continue over many years to disagree, and even to wrangle bitterly about an issue they regard as important, it must sooner or later become obvious that the disagreement is not a factual one.… If this is, as I believe, the case, we ought to consider the roles played in this disagreement by semantic difficulties arising from concealed differences in the way different people use the same words, or in the way the same people use the same words at different times; … and by differences in their conception of what is an important problem and what is a trivial one, or rather what is an interesting problem and what is an uninteresting one. (Lehrman, 1970, pp. 18–19)

Behavior genetics is a topic about which scientists and scholars have continued over many years to disagree, sometimes bitterly. One feature that makes this rancorous debate curious is that the overarching conclusion of behavior genetics – namely, that there are causal genetic effects on human behavioral differences – is often described as obvious and uninteresting. Turkheimer, for instance, wrote off the “absurd null hypothesis” that genetic effects on behavior are zero, asking: “Does anyone actually believe this?” Even the most negativistic commentator (Burt) began with the premise that “genetic differences matter for human social outcomes – achievements, behavior, physical health, personality – in a complex, context-sensitive way.” Yet our attempt to describe how one might go about conceptualizing, identifying, and leveraging causal genetic effects on human behavioral differences – the very effects that, we are told, everyone already believes exist – inspired 23 widely divergent commentaries.

We consider the diversity of the commentators' opinions, and the intensity of their sentiments, a sign of success: our paper surfaced profound disagreements about what, in the study of human behavior, constitutes an important problem versus a trivial one, an interesting problem versus an uninteresting one, and – to add to Lehrman's list – an in-practice difficult problem versus an in-principle impossible one. Commentators contested nearly every single one of our arguments, but not everyone disagreed with the same arguments, and the commentators also disagreed with each other. Further complicating matters, a few commentators disagreed with themselves, advancing contradictory points within the space of their short replies, and a few expressed agreement-masquerading-as-disagreement – that is, they wrote as if they were in sharp conflict with our target article, but they were, in fact, restating positions we also advanced.

Our synthesis of these disparate viewpoints is necessarily imperfect (like behavior genetics itself!): it flattens multidimensional arguments and glosses over nuance. We encourage readers to (re)read the individual commentaries to reconstitute the details that have been lost. In particular, we recommend the commentaries by Bourrat; Ross, Kendler, & Woodward (Ross et al.); Lynch, Brown, Strasser, & Yeo (Lynch et al.); Syed; and Durlauf & Rustichini, as we thought these authors brought fresh and incisive perspectives to a topic that can often recycle the same stale points and counterpoints. With these caveats and recommendations in mind, the commentaries on our target article can be read as speaking to three overarching questions, which we will consider in turn:

  1. What is a cause?

  2. How are randomized controlled trials (RCTs) and within-family genome-wide association studies (GWASs) alike and unalike?

  3. Is behavior genetics a qualitatively different enterprise?

R2. What is a cause?

Even before genomic data entered the mix, commentators disagreed about what, exactly, made something a “cause,” and what was needed to infer a causal relationship. Ross et al. aptly summarized our treatment of causal inference:

M&H adopt a broadly “interventionist” treatment of causation – the minimal condition for some factor C to count as a cause for an outcome E is that if, hypothetically, unconfounded manipulations of C were to be performed these would lead to changes in E. In the familiar case of a randomized experiment, this leads to the conclusion that an average causal effect (ACE) is a legitimate causal notion. M&H observe that an ACE can be present even though C does not have a uniform effect, even though a similar ACE may not be present in populations different from the population from which the experimental sample was drawn, and even though the experiment tells us nothing about the mechanism by which Cs cause Es. We agree.

Their endorsement of our perspective on causation is not surprising (but is reassuring), as our understanding of causation was strongly informed by Woodward's (2005) previous work, particularly his book Making Things Happen. This interventionist perspective on what makes something a cause continues the lineage of Holland's (1986) decree: “No causation without manipulation” – even if that manipulation can happen only hypothetically, that is, in the form of a thought experiment.

It turns out that not everyone is a Holland acolyte. Some commentators emphasized regularity accounts of causation (Mill, 1843/2002), which prioritize concepts like temporal precedence and the repeated co-occurrence of X and Y. Hart & Schatschneider defined causation as follows: “the cause must precede the effect, second, the cause must be related to the effect, and third, we can find no plausible explanation for the effect other than the cause.” Likewise, Tarchi, Merola, Castellini, & Ricca (Tarchi et al.) suggested that “[r]ecent developments in the regularity theory of causation (based on the premises that causes are regularly followed by their effects) have allowed for a precise estimation and quantification of confidence in causal relationships.” Other commentators noted or implied that causation could (should?) be conceptualized in terms of prediction (Shen & Feldman) or mechanism (Smith & Downes).

A complete defense of a broadly interventionist treatment of causation is obviously beyond the scope of this reply. “What is appropriately considered a cause?” and “What is appropriately considered evidence for a cause?” are questions that occupy entire careers. More simply, we have two recommendations to readers who are sifting through the various comments.

First, we ask you to reflect on how you typically answer those questions when the causes under consideration are not genetic in nature. Imagine, for instance, that you were reviewing a paper showing that children whose families were randomly selected to receive housing vouchers to move out of low-income neighborhoods were more likely, on average, to attend college (Chetty, Hendren, & Katz, 2016). Would you object if the authors concluded that their results were consistent with a causal effect of neighborhood environment on educational attainment? Even if the effect increased college attendance by only 2.5 percentage points? Even if you couldn't perfectly predict who would go to college just from knowing whether they moved? Even if the authors could only speculate about the mechanisms linking neighborhood characteristics with educational attainment? Even if the neighborhoods to which people moved were all different from one another, such that everyone in the “treatment” group experienced quite different treatments?

Your answers to such questions are informative about what you already think is, and is not, necessary to infer causation, and what you believe to be the scope of that causal inference. We suggest keeping these priors in mind when considering the question of what it means for genes to be causal. Consistency might be the “hobgoblin of little minds” (Emerson, 1841/1993), but in this case, we think some semantic and conceptual consistency about what makes something a “cause” is important for empirical design, statistical interpretation, scientific theory-building, and policy application. Referring back to the Lehrman quotation that we used as an epigraph to this reply, we urge you to avoid using the word “cause” differently at different times.

Second, we encourage you to remember that the binary judgment of whether or not X is a cause of Y is not the only judgment on the table. As Cerezo reminded us in her comment, there are “metaphysical tools” in our toolbox for describing and differentiating among types of causes (see also Kinney, 2019; Ross, 2021, 2022). As we described in our target article, and as we will continue to describe in this reply, inferring that X caused Y is one of the first steps toward understanding the X–Y relationship, not the last.

R3. How are RCTs of environmental interventions and within-family GWASs alike and unalike?

In our target article, we described how humans have two copies of every gene, and offspring inherit, at random, just one of them. Because genotypes are randomly “assigned,” conditional on the parental genotypes, average phenotypic differences between family members who have been randomly assigned to different genotypes are conceptually analogous to the average difference between people who have been randomly assigned to treatment and control groups in a randomized controlled trial (RCT). That is, some genetic designs, because they capitalize on the randomness of genetic inheritance within families, can estimate the average causal effect of genotypes on phenotypes.
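To make the logic of that analogy concrete, here is a minimal simulation sketch (ours, not drawn from the target article or any commentary; the effect size, sample size, and variable names are illustrative assumptions). It shows how comparing siblings who, by the luck of meiosis, inherited different alleles can recover an average causal effect even when shared family background also shapes the phenotype.

```python
# Toy within-family "natural experiment" (illustrative assumptions throughout).
import numpy as np

rng = np.random.default_rng(0)
n_families = 50_000
true_allele_effect = 0.3  # assumed average causal effect of carrying the allele

family_background = rng.normal(size=n_families)    # shared (confounding) environment
sib1_allele = rng.integers(0, 2, size=n_families)  # each sibling inherits the
sib2_allele = rng.integers(0, 2, size=n_families)  # variant at random

def phenotype(allele):
    # phenotype = genetic effect + family background + individual-specific noise
    return true_allele_effect * allele + family_background + rng.normal(size=n_families)

y1, y2 = phenotype(sib1_allele), phenotype(sib2_allele)

# Within-family contrast: regress sibling differences in phenotype on sibling
# differences in genotype; the shared family background cancels out.
d_geno, d_pheno = sib1_allele - sib2_allele, y1 - y2
estimate = np.polyfit(d_geno, d_pheno, 1)[0]
print(f"estimated average causal effect: {estimate:.2f} (true value: {true_allele_effect})")
```

The family background term cancels in the sibling difference, which is the within-family analogue of randomization balancing confounders across treatment arms in an RCT.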

Many of the commentators focused on what they perceived to be the limits of that analogy. Most commonly, they described desirable features of RCTs that they held to be lacking in a within-family genetic study, claiming that (1) only RCTs involve a manipulation of the causal stimulus, (2) RCTs have greater uniformity of the causal stimulus, and (3) RCTs give more actionable information. A few commentators took a different tack, pointing out the ways in which within-family genetic studies share undesirable qualities of RCTs, in particular that (4) RCTs make an unsatisfactory trade-off between internal validity and external validity. Although many of these points were incisive, we believe that the gap between within-family genetic studies and nearly all RCTs in the social and behavioral sciences is smaller than most commentators acknowledged.

R3.1. Which designs involve a randomized manipulation?

Some commentators disagreed with the fundamentals of our analogy. For instance, Hart & Schatschneider objected to comparing within-family genetic studies to RCTs on the grounds that the former “do not have a manipulation,” whereas Kaplan & Bird implied that only RCTs have a “randomizing element.” We disagree. Although there is no artificial manipulation of the genome in human behavioral genetics, the conception of every human involves a natural randomized manipulation of genetic material. Indeed, it's ironic that Hart & Schatschneider warned against “forcing the language of experiments … on within-family designs,” because the language of experiments actually comes from genetics! In his foundational work on experimental designs, Fisher named experimental “factors” after “Mendelian factors,” and strove to create randomization schemes that mimicked the randomization of genetic inheritance. In their comment, Pingault, Fearon, Viding, Davies, Munafò, & Davey Smith (Pingault et al.) quoted Fisher (1952) on exactly this point:

The parallel drawn by [Madole & Harden] was made explicitly by Fisher who established a direct filiation between the (artificially) randomized design he theorized and the (natural) randomization of genetic material at conception, in his words: “the factorial method of experimentation, now of lively concern so far afield as the psychologists, or the industrial chemists, derives its structure and its name, from the simultaneous inheritance of Mendelian factors.” (emphasis added)

R3.2. Which designs are complicated by causal stimulus heterogeneity?

In our target article, we discussed the ways in which causal effects can be non-uniform, in that they can produce heterogeneous effects across individuals because of moderation by other causal factors. Lynch et al. incisively raised the issue of another source of heterogeneity, causal stimulus heterogeneity, when not everyone in the “treatment” group receives the same causal treatment. Causal stimulus heterogeneity is patently a problem when using polygenic scores (PGSs), which aggregate effects across very many single-nucleotide polymorphisms (SNPs): two people might have equivalently high or low PGS values, but arrive there via non-overlapping sets of SNPs. Lynch et al. cleverly analogized a PGS to “a drug with thousands of ingredients of small efficacy, where each pill has one ingredient or an alternative at random according to a defined chance procedure.”
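A small sketch of that point (ours; the number of SNPs and the uniform weights are illustrative assumptions): because a PGS is a weighted sum of effect alleles, two people can arrive at identical scores through completely disjoint sets of variants.

```python
# Illustration of causal stimulus heterogeneity in a polygenic score:
# identical "doses" built from non-overlapping "ingredients."
import numpy as np

n_snps = 1_000
weights = np.full(n_snps, 0.01)  # assume equal, small per-SNP GWAS weights

person_a = np.zeros(n_snps, dtype=int)
person_b = np.zeros(n_snps, dtype=int)
person_a[:500] = 1               # carries the first 500 effect alleles
person_b[500:] = 1               # carries the other 500

pgs_a, pgs_b = weights @ person_a, weights @ person_b
overlap = int(np.sum((person_a == 1) & (person_b == 1)))
print(f"PGS A = {pgs_a:.2f}, PGS B = {pgs_b:.2f}, shared effect alleles: {overlap}")
# prints: PGS A = 5.00, PGS B = 5.00, shared effect alleles: 0
```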

Ross et al. raised a similar concern, noting that it applies not only to PGSs, but also to associations with individual SNPs, because each SNP “tags” information about other genetic variants (including unmeasured variants) that are in linkage disequilibrium (LD) with the focal SNP included in a genome-wide association study (GWAS). They provided a similar analogy to Lynch et al.:

Assuming the random nature of meiosis, a GWAS corresponds to a huge number of different randomized treatments in the population: e.g., A versus C at SNP1, G versus T at SNP2 and so on. … Indeed, matters are even more complex since haplotypes are randomized not SNPs. We might perhaps conceptualize this as the assignment of randomized bottles to subjects, each containing a mixture of different drugs.

Related concerns were raised by Borger, Weissing, & Boon (Borger et al.) (“… thousands of single nucleotide polymorphisms (SNPs) are considered simultaneously”), and by Pingault et al., who also analogized polygenic influences to a drug cocktail, the formulation of which differs across people (“… in this case, the ‘treatment’ is not well defined (in content or timing) … [it is] like a drug RCT consisting of the simultaneous administration of hundreds of compounds”).

Lynch et al. correctly pointed out that one can still conclude, on the basis of an appropriately randomized study, that a heterogeneous causal stimulus “works,” in that it has a non-zero average treatment effect. But figuring out how it works is especially challenging:

The high causal stimulus heterogeneity is likely to produce non-uniform causal pathways from the very first steps, thus making it difficult or impossible to trace mechanisms from particular drug ingredients [i.e., from particular genetic loci] given only associations between treatments [i.e., PGSs] and outcomes.

We agree that causal stimulus heterogeneity does make mechanistic understanding considerably more difficult. But, as we will further explain, we do not think this is a problem that is unique, or uniquely difficult, to the study of genetic causes.

Most discussions of causal stimulus heterogeneity implicitly or explicitly contrasted the messiness of PGSs and SNP arrays with the supposed homogeneity of the causal stimulus in RCTs. Lynch et al., for instance, wrote: “In most RCTs, individuals in the treatment group receive the same, or as similar as possible, treatment or causal stimulus, such as a drug or educational intervention (causal stimulus homogeneity).” Siegel made a similar claim, writing that “RCTs of treatments that prove to be highly efficacious directly demonstrate their greater uniformity of therapeutic effects.”

From our perspective as clinical psychologists who have worked to deliver “empirically supported” psychotherapy to patients in clinical practice, who have been therapists on RCTs of novel psychotherapeutic interventions, and who have implemented educational interventions in our own classrooms, these characterizations of the alleged homogeneity of environmental interventions sound disconnected from the reality of social and behavioral science. Environmental interventions are, on the whole, more like a PGS than they are like lithium: they are cocktails “with thousands of ingredients of small efficacy,” the formulation of which differs across people.

This is perhaps most obvious in the case of therapeutic and educational interventions delivered one-on-one. In an RCT of cognitive behavioral therapy, for instance, the “underlying treatment … may in a real sense differ for every single unit,” because the content of every session is tailored specifically to the individual (Smith, 2022, p. 656). But even interventions that are not psychotherapeutic might be, in practice, implemented in a highly idiosyncratic way (see, e.g., an ethnography of welfare case workers by Watkins-Hayes, 2009). And, many interventions in the social and behavioral sciences package together multiple services, not all of which are taken up (or taken up in the same way) by every participant. Consider the High/Scope Perry Preschool Program (HPPP), which we discussed in our target article. This intensive intervention combined preschool attendance for 2.5 hours a day, 5 days a week, for 2 years; weekly 1.5-hour home visits by teachers; and monthly small-group meetings for parents. Given the complexity of the intervention, “one is still left without actually being able to pinpoint what it was in the Perry Preschool Project that actually influenced later adult outcomes” (Schneider & Bradford, 2020, p. 52).

In this way, a binary variable reflecting the presence or absence of intent-to-treat in an environmental RCT does not always, or even usually, give us granular information about the relevant difference-maker(s) or assure that the relevant difference-maker(s) are experienced homogeneously across people. Rather, a binary treatment indicator often represents a gross simplification (Heiler & Knaus, 2021). The situation with oft-cited naturally occurring environmental exposures may be of even lower resolution: “treatments” like being drafted into the military (Angrist, 1990) or living in a region of Holland occupied by the Nazis (Stein, Susser, Saenger, & Marolla, 1972) are hardly homogeneous or “well-defined.”

Thus, even as philosophers and plant geneticists extol the homogeneity of environmental interventions in the social and behavioral sciences, interventionists themselves paint quite a different picture: “We must expect, study and capitalize on the heterogeneity that characterizes most effects in science” (Bryan, Tipton, & Yeager, 2021, p. 986). In fact, environmental interventionists sound remarkably like behavioral geneticists: “The researcher faces a tough trade-off between interpretability and statistical power or, put differently, between learning about the effects of the underlying heterogeneous treatments and the sample size available for studying each treatment” (Smith, 2022, p. 656). These quotes illustrate that the problem of causal stimulus heterogeneity, while definitely a formidable challenge to mechanistic understanding, is not a challenge that is unique to the study of genetic causes, but is rather a difficulty that besets most studies in the social and behavioral sciences.

In light of this shared challenge, we wholeheartedly agree with Bondarenko that the “‘second-generation’ goals of causal inquiry in the context of human behavior cannot be achieved by genetics alone, nor do genetically informed research designs provide the only possible path toward mechanistic understanding.” Indeed, at no point in our target article did we suggest that genetics can achieve anything alone, nor did we suggest (as Smith & Downes alleged) that we “hope to supplant” environmental studies. Although we are optimistic that deeper measurements of the genome and further advances in fine-mapping and gene prioritization methods will result in a higher resolution understanding of genetic difference-makers, we also think that behavior genetics should incorporate more of the conceptual and methodological tools developed by environmental interventionists who are taking heterogeneity seriously, including methods for causal inference when there are multiple versions of treatment (VanderWeele, 2022; VanderWeele & Hernan, 2013). As Durlauf & Rustichini highlighted, formal theoretical structures are needed to reveal sources of heterogeneity and generate deeper causal explanations of how biopsychosocial systems produce human behavior. Again, we agree with commentators like Taylor, Weiss, & Marshall that the “genome [is] one resource (among many) used by the developmental system to grow,” and indeed, our entire discussion of average difference-makers as non-unitary causes is based on the idea that genes “operate within intricate causal systems” (target article, abstract). As we will discuss next, we also believe that, by virtue of being a difference-making component of the causal system, genetic data may play a key role in helping identify complex etiological models.

R3.3. Which designs give actionable information?

One word that recurred throughout the commentaries was “actionable.” Turkheimer intimated that knowledge of genetic causes was not actionable because the effect sizes are too small: “The actionable part of the [genotype-phenotype] correlation, estimated as a real number in the form of a PGS, is … under 5% for even the most studied traits…” The 5% figure refers to the within-family effect size of an educational attainment PGS on years of education in North American samples who have “European” genetic ancestry. Effects of this magnitude were trivialized as “weak” and “small and indeterminate.”

We disagree with this characterization. An R² of 5% is approximately equal to a correlation (r) of 0.22, or to (assuming equal-sized groups) a Cohen's d of 0.46. By comparison, a study of Swedish children who were adopted into better socioeconomic circumstances found that adoption increased IQ scores by around 4.5 points, or a Cohen's d of 0.34, relative to the children's siblings who were reared by their biological parents (Kendler, Turkheimer, Ohlsson, Sundquist, & Sundquist, 2015). This effect of “family environment” (certainly a heterogeneous causal stimulus!) was described as “a significant advantage in IQ” (Kendler et al., 2015, p. 4612). Another study of children randomized to foster care, rather than to severely deprived institutional care, found that foster care increased average IQ scores at age 12 by d = 0.41 (Almas, Degnan, Nelson, Zeanah, & Fox, 2016). Yet another study examined the effects of a major educational reform in Sweden, which increased the number of years of compulsory schooling, abolished academic tracking at grade six, and rolled out a unified national curriculum (Meghir & Palme, 2005). Researchers leveraged the gradual implementation of the reform across different areas in Sweden and concluded that the reform increased years of schooling by about 3.5 months, d ≈ 0.2. What these examples illustrate is that the estimated causal effect of the educational attainment PGS rivals the effect sizes that we observe when people experience radically sweeping changes to their environmental context.
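For readers who want to check the conversion themselves, here is the arithmetic behind those figures (a quick sketch using the standard formulas for converting R² to r and, assuming two equal-sized groups, to Cohen's d):

```python
# Convert an R-squared of 5% into the correlation and Cohen's d values quoted above.
import math

r_squared = 0.05
r = math.sqrt(r_squared)              # r ≈ 0.22
d = 2 * r / math.sqrt(1 - r_squared)  # equal-sized groups assumed; d ≈ 0.46
print(f"r = {r:.2f}, d = {d:.2f}")
```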

Most effect sizes for specific environmental interventions are even smaller than 5%, and this is exactly what we would expect given the causal complexity of human behavior. Kraft (2020), for instance, reviewed all of the educational studies funded by the U.S. government's Investing in Innovation fund, and found that the median effect size was d = 0.03. They concluded: “effects of 0.15 or even 0.10 SD should be considered large and impressive” (p. 248). Yeager and Dweck (2020) similarly summarized: “In the real world, single variables do not have huge effects. Not even relatively large, expensive, and years-long reforms do. If psychological interventions can get a meaningful chunk of a .20 effect size on real-world outcomes in targeted groups, reliably, cost-effectively, and at scale, that is impressive” (p. 1281). We agree. Our conclusion that genetic effect sizes are, for some phenotypes, impressive does not stem from grandiosity about genetics, but rather from humility about the difficulty of specifying any cause, artificially manipulated or naturally varying, that accounts for even a few percentage points of the variance in complex behavior.

Others pointed out that, unlike the results of some RCTs, causal genetic effects are not directly actionable because we cannot artificially manipulate the genome on the basis of that information. Pingault et al. wrote: “Thus, while RCTs can provide actionable evidence of a specific intervention's efficacy, a within-family genetic association only indicates the effect of inheriting one variant or another.” Markon similarly pointed out that “counterfactual theory … takes an unactionable epistemological stance: even if a counterfactual account informs about what would have occurred had things been different, it does not inform about what one can do now, given things as they are.” Kaplan & Bird concurred: “[t]he ‘shallowness’ of the causal knowledge gained in RCTs does not prevent them from being useful guides to practice … the situation in behavior genetics is nothing like this. Unlike in the case of RCTs, we cannot change the genetic variants associated with the phenotypic variation – and even if we could, doing so would be wildly irresponsible.” We made a similar point in our target article, writing that “even if we concede that, at a conceptual level, genes could cause average differences in human behavior, at a practical level, it is not readily apparent what we would do with this knowledge.… [W]e cannot (and should not) readily apply knowledge of genetic causes to change the genomes of large swathes of the population in the hopes of changing their outcomes” (target article, sect. 1.2, para. 6).

Given that we are not planning to change people's genotypes, Kaplan & Bird asked, “what would we gain from even an accurate finding that a particular genetic variant was associated with downstream effects?” This is a curious question to pose, as it is precisely the question that we address, at length, in the target article on which they are supposedly commenting. Our short answer is that we agree with Lewontin (1974) (quoted by Turkheimer): “The analysis of causes in human genetics is meant to provide us with basic knowledge we require for correct schemes of environmental modification and intervention” (p. 409).

And, so far, the project of devising correct schemes of environmental modification and intervention has been far less successful than commonly imagined (Kraft, 2020). Given that there is clearly room for improvement in identifying environmental intervention targets, we agree with Bondarenko that one “important role for genetic data” is as “controls in the study of environmental variables … when applied with care, genetic controls may help address some of the worries that our ‘first-generation’ knowledge of environmental factors does not meet a stringent epistemic standard.” We also agree with commentators who pointed out that genetic results can be usefully exploited in Mendelian randomization studies (Pingault et al.) for the study of phenotypic causation (Pingault, Richmond, & Davey Smith, 2022), which “given their speed and relative low cost” are a “useful first-step to guide future randomization/intervention studies” (Haworth & Wootton).

Finally, simply knowing that something has genetic causes – even if you can't identify them or manipulate them – can change, and has changed, clinical practice. For example, genetic research on substance use disorders (SUDs) contributed to a paradigm shift in conceptualizing addiction as a chronic disease that resides within the body of the individual rather than as a moral deficit that resides within their will (Hall, Carter, & Forlini, 2015; Volkow & Koob, 2015). Now, educating patients about genetic effects on addiction is a standard part of psychoeducation that can reduce stigma and increase motivation for treatment (Hassan et al., 2021; Ray, 2012). Causal knowledge has the power to produce important changes in our intentions and actions as scientists and clinicians, whether those causes can be manipulated or not.

R3.4. Which designs have external validity?

Most commentators who were critical of our target article had the intuition that RCTs in the social and behavioral sciences were valuable research endeavors, and objected to our comparing within-family genetic studies to them. A few commentators, however, brought up that RCTs also have their limitations, most prominently, that they sacrifice external validity and generalizability for the sake of internal validity. Shen & Feldman, for instance, pointed out that causal knowledge built from within-family comparisons does not necessarily generalize on a population scale. Borger et al. also decried the “limited ability…to generalize,” and Siegel warned that GWAS results “may not apply to other ethnic groups, a non-uniformity that may exacerbate health disparities.” Syed offered a particularly trenchant summary of the problem:

It is a fact of the design that RCTs sacrifice external validity for the sake of internal validity, being high in efficacy, showing promising results in trials, but low in effectiveness, or lack of results when translated to real-life conditions … RCTs have been further criticized … for their lack of inclusion of racial/ethnic minorities and thus limited generalizability.

For those interested in “humans in general” (Byrne & Olson), causal knowledge that is not built from culturally and demographically representative samples – and therefore not expected to apply equally across subgroups – is inherently flawed. As Syed said, “no diversity in, no causes at all.” In contrast, others were less interested in “humans in general” knowledge, and were instead concerned with the “individual making important life-choices” (Miller). And, in an interesting counterpoint to Syed's commentary, Bourrat suggested that, in some cases, “local causal knowledge can be more useful for explanation and intervention than more generalisable knowledge.” Similar to Cerezo's emphasis on triggering conditions, Bourrat pointed out that accounting for context can reveal specificities about a causal relationship that are masked when aggregating more generally.

A medical example of the unique contribution of local causation comes from oncology research, where chemotherapy was found to have no average impact on patient survival in a cohort of individuals with stage IB lung cancer, but rather improved patient survival only in those with a tumor size greater than 4 centimeters (Strauss et al., 2008). This is what Bakermans-Kranenburg and Van Ijzendoorn (2015) referred to as the “hidden efficacy of interventions.” The effect of chemotherapy on patient survival cannot be expected to generalize across all individuals with stage IB lung cancer, but it makes a significant difference for some people.

This example illustrates that locality is not always a curse and that non-portable causes can still sometimes be useful. Finding that a genetic effect holds within a particular ancestral group but not another, or is manifest under certain social conditions but not others, or applies within families but not between them, can be valuable because it allows us to build causal knowledge within family or cultural systems.
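To make the arithmetic of such hidden efficacy concrete, here is a toy sketch (ours, with made-up numbers, not the figures from Strauss et al.) showing how a real subgroup benefit can wash out of the average treatment effect:

```python
# Hidden efficacy: a near-zero average effect can conceal a real subgroup benefit.
subgroup_share  = {"large tumor": 0.40, "small tumor": 0.60}   # hypothetical shares
subgroup_effect = {"large tumor": 0.25, "small tumor": -0.17}  # hypothetical effects

average_effect = sum(subgroup_share[g] * subgroup_effect[g] for g in subgroup_share)
print(f"average treatment effect: {average_effect:+.3f}")  # prints -0.002, essentially null
```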

The tension between Syed's and Bourrat's commentaries – both of which we think make incisive and valuable points – highlights what Richard Levins described as “the contradictory desiderata of generality, realism, and precision…” (Levins, 1966, p. 431). The more we attempt to generalize, the more we collapse over dimensions of variability that meaningfully influence the causal relationship; the more we specify local variables, the more we restrict the applicability of our findings (see Yarkoni [2022] and related commentaries for a discussion about the precision of estimation/breadth of generalization trade-off).

With these contradictory desiderata in mind, we strongly agree with Syed that randomization is not enough: who is being randomized across what dimensions of human experience? We endorse their conclusion that “diversity is central to first-generation studies” and that “the racial/ethnic diversity of samples included in behavior genetic studies… must be central to any effort to build generalizable causal knowledge.” Similarly, we agree with Byrne & Olson that behavior geneticists have a responsibility to make “analytic choices” that “enhance the visibility of context,” in particular, sampling across sociocultural contexts that have heretofore been largely excluded from genetics research. And, we agree with Eftedal & Thomsen that “[b]ehavioral genetics should broaden its empirical scope beyond single-culture WEIRD samples.”

Accordingly, we applaud recent efforts within the field to conduct GWASs in non-European ancestral groups (Gulsuner et al., 2020; Pereira, Mutesa, Tindana, & Ramsay, 2021), advance methodology to increase the integration of diverse samples in GWASs (Mathur et al., 2022), improve discovery of within-family genetic effects (Howe et al., 2022), develop equitable partnerships among international institutions that promote resource sharing and shared-infrastructure development (Martin et al., 2022), and build more representative biobanks like the Trans-Omics for Precision Medicine Program that allow for genetic discovery within cultural subgroups (Popejoy & Fullerton, 2016). These initiatives represent important steps toward building more representative causal knowledge. Finally, we also echo Tarchi et al.'s warning that an absolute requirement for a genetic study to have a causal identification strategy or to provide mechanistic understanding might have undesirable consequences for representation in genetics: “to consider ‘association studies’ as secondary or even detrimental should be critically evaluated,” lest we exacerbate problems of exclusion in genetic research.

R4. Is behavior genetics a qualitatively different endeavor?

Although most commentators offered critical refinements to our target article, suggesting ways to improve future studies or adding nuance to the interpretation of existing effects, Burt denounced the entire enterprise of behavior genetics as “impracticable.” In particular, they argued, while genetics “has the potential to advance understanding of human health and disease,” it was not “appropriate” to apply genetic methods to the study of “complex, social, non-disease achievements or behaviors.” (Kaplan & Bird similarly distinguished between what they referred to as “disease GWAS” and “sociobehavioral GWAS.”) In contrast, we find the distinction between “disease” and “complex, social (non-disease)” behavior artificial, and think it is conceptually incoherent to champion the virtues of medical genetics and criticize the utility of behavior genetics in the same breath.

Despite the distinction between “disease” and “complex, social non-disease” phenotypes being central to their argument, Burt gives only a circular, “I know it when I see it” non-definition of what should be, in their view, off-limits to genetic study: “complex social traits are defined by social context and thus irreducibly social.” But, the distinction between “disease” and “social (non-disease)” phenotypes is as contentious, fluid, and historically and culturally contingent as the distinction between art and obscenity. Consider again the example of substance use disorders (SUDs) (a common subject of behavioral genetic research). In the last two decades, it has become increasingly popular to view SUDs under a medical model (Leshner, 1997). The American Society of Addiction Medicine (ASAM) defines addiction as “a primary, chronic disease of brain reward, motivation, memory and related circuitry” (ASAM, 2017). Yet many scholars have pushed back on the medicalization of SUDs (not to mention other psychiatric conditions) (Borsboom, Cramer, & Kalis, 2019; Heilig et al., 2021), arguing that it is “reductively inattentive to individual values and social context” (Courtwright, 2010, p. 144). This debate is not without stakes: legal challenges to the incarceration of individuals with substance-related charges have hinged on the question of whether SUDs are “diseases” or “behaviors” (Commonwealth v. Eldred, 2018). The most commonly used diagnostic system in North American psychiatry, the DSM-5, offers no clarity, defining SUDs as a constellation of biological (e.g., physiological withdrawal), behavioral (e.g., disengaging from hobbies to protect use), and social (e.g., use interfering with relationships) symptoms. In the current debates about whether SUDs should be viewed as a disease or as an “irreducibly social” behavior, we hear echoes of previous and ongoing debates about how to best understand, for example, melancholy, Asperger's, psychosis, and sexual orientation. Burt would have us believe that they can resolve these debates by fiat.

The distinction that Burt draws between “complex social (non-disease)” versus “disease” maps onto their distinction between “downward social causation,” in which “sociocultural forces … sort and select individuals based on genetically influenced traits” versus “upward genetic causation,” which operates “from genetic differences to trait differences through biological pathways.” The former is said to produce “artificial” genetic associations; the latter “authentic” ones.

This pat story neglects the role of “downward social causation” in disease and disabilities. Monogenic retinitis pigmentosa, for instance, is a rare disease that causes progressive loss of sight; an article in Genome Medicine summarized that the “first symptoms are retinal pigment on fundus evaluation, …eventually leading to legal blindness in a few decades” (Ayuso & Millan, 2010, p. 1). What an interesting phrase, “legal blindness”! Blindness is defined by visual acuity. What makes blindness “legal” is whether legislators and policymakers deem the loss of visual acuity to be sufficiently severe that one, for example, qualifies for Social Security disability benefits, can no longer operate an automobile, or gets special tax exemptions. A monogenic disorder with a well-defined biological pathway causes legal blindness; legal blindness is an artificially bounded category that is created when sociocultural forces – like the IRS – sort and select individuals based on their genetically influenced traits. Legal blindness is genetically caused and is “irreducibly social.”

And so is every other human phenotype. “Upward genetic causation” and “downward social causation” are always operating, on every human phenotype, because humans are social animals. Genes act on our bodies; society acts on our bodies. Society changes our biology; our biology changes how society responds to us. Every aspect of human life reflects these two streams of influence. Sometimes one stream is a raging torrent; sometimes the other is a trickle. But, regardless of their relative width and depth and ferocity, where the streams of influence mix and mingle is, for us, the most scientifically interesting point of study. Confluences are sacred.

R5. Parting thoughts

In closing, let us revisit a quote from Lynch et al.: “The high causal stimulus heterogeneity is likely to produce non-uniform causal pathways from the very first steps, thus making it difficult or impossible to trace mechanisms from particular drug ingredients [i.e., from particular genetic loci] given only associations between treatments [i.e., PGSs] and outcomes” (emphasis added).

The “or” in their sentence captures a core disagreement surfaced in this set of commentaries: difficult or impossible? We think that it will, in practice, be difficult – very difficult – to trace the mechanisms by which polygenic causal signals make a difference for human behavior, and to leverage that knowledge to improve human lives. We do not think that it is, in principle, impossible.

Considered from one angle, our rejection of epistemological skepticism is not the least bit surprising: we are psychologists, a profession that, by definition, presupposes that we can know things scientifically about why humans think, feel, and behave the way they do, and strives to use that knowledge, even when it is incomplete and flawed, to change how humans think, feel, and behave. We are writing, after all, in a journal called Behavioral and Brain Sciences. What is striking about these commentaries is that some working scientists appear to embrace a highly selective epistemological skepticism: knowing, in their view, is impossible, but only knowing about genetic influences on behavior, not about genetic influences on “disease,” or about environmental influences on behavior.

We think this selective skepticism is incoherent, because, as we've described in this reply, we see the conceptual and practical difficulties in understanding genetic influences on behavior as largely of a piece with other subfields. As Turkheimer (2012) so pithily summarized: “Genome-wide association studies of behavior are social science” (our emphasis added). If we are going to continue our audacious attempts to study free-range humans scientifically, then we will necessarily grapple with causal stimulus heterogeneity and trade-offs between internal and external validity and small effect sizes and opaque mechanisms and uncertain generalizability and “downward” social causation and causal complexity, whether our work seeks to understand genetic causes or environmental ones, whether our work focuses on diseases or behaviors.

If we treat complexity as a “dead end” (Plomin & Daniels, 1987), if we dismiss attempts to understand it as “far-fetched” (Kaplan & Turkheimer, 2021), if we arbitrarily declare some areas of inquiry “inappropriate,” then we run the risk of missing out on meaningful progress. “Even if we never understand biology completely…we can understand enough to interfere” (Hayden, 2010, p. 667). The only way forward is to muddle through.

References

Almas, A. N., Degnan, K. A., Nelson, C. A., Zeanah, C. H., & Fox, N. A. (2016). IQ at age 12 following a history of institutional care: Findings from the Bucharest Early Intervention Project. Developmental Psychology, 52, 1858.
Angrist, J. D. (1990). Lifetime earnings and the Vietnam era draft lottery: Evidence from social security administrative records. American Economic Review, 80, 313–336.
Ayuso, C., & Millan, J. M. (2010). Retinitis pigmentosa and allied conditions today: A paradigm of translational research. Genome Medicine, 2, 1–11.
Bakermans-Kranenburg, M. J., & Van Ijzendoorn, M. H. (2015). The hidden efficacy of interventions: Gene×environment experiments from a differential susceptibility perspective. Annual Review of Psychology, 66, 381–409.
Borsboom, D., Cramer, A. O., & Kalis, A. (2019). Brain disorders? Not really: Why network structures block reductionism in psychopathology research. Behavioral and Brain Sciences, 42, E2. doi:10.1017/S0140525X17002266
Bryan, C. J., Tipton, E., & Yeager, D. S. (2021). Behavioural science is unlikely to change the world without a heterogeneity revolution. Nature Human Behaviour, 5, 980–989.
Chetty, R., Hendren, N., & Katz, L. F. (2016). The effects of exposure to better neighborhoods on children: New evidence from the moving to opportunity experiment. American Economic Review, 106, 855–902.
Commonwealth v. Eldred (2018). 480 Mass. 90. MA: Supreme Judicial Court. Retrieved from https://law.justia.com/cases/massachusetts/supreme-court/2018/sjc-12279.html
Courtwright, D. T. (2010). The NIDA brain disease paradigm: History, resistance and spinoffs. BioSocieties, 5, 137–147.
Emerson, R. W. (1993). Self-reliance, and other essays. Dover (original work published in 1841).
Fisher, R. (1952). Statistical methods in genetics. Heredity, 6, 1–12 (reprinted in 2010, International Journal of Epidemiology, 39, 329–335). https://doi.org/10.1093/ije/dyp379
Gulsuner, S., Stein, D. J., Susser, E. S., Sibeko, G., Pretorius, A., Walsh, T., … McClellan, D. J. (2020). Genetics of schizophrenia in the South African Xhosa. Science (New York, N.Y.), 367, 569–573.
Hall, W., Carter, A., & Forlini, C. (2015). The brain disease model of addiction: Is it supported by the evidence and has it delivered on its promises? The Lancet Psychiatry, 2, 105–110.
Hassan, A. N., Ragheb, H., Malick, A., Abdullah, Z., Ahmad, Y., Sunderji, N., & Islam, F. (2021). Inspiring Muslim minds: Evaluating a spiritually adapted psycho-educational program on addiction to overcome stigma in Canadian Muslim communities. Community Mental Health Journal, 57, 644–654.
Hayden, E. C. (2010). Human genome at ten: Life is complicated. Nature, 464, 664–667.
Heiler, P., & Knaus, M. C. (2021). Effect or treatment heterogeneity? Policy evaluation with aggregated and disaggregated treatments. arXiv preprint arXiv:2110.01427.
Heilig, M., MacKillop, J., Martinez, D., Rehm, J., Leggio, L., & Vanderschuren, L. J. (2021). Addiction as a brain disease revised: Why it still matters, and the need for consilience. Neuropsychopharmacology, 46, 1715–1723.
Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81, 945–960. doi:10.1080/01621459.1986.10478354
Howe, L. J., Nivard, M. G., Morris, T. T., Hansen, A. F., Rasheed, H., Cho, Y., … Davies, N. M. (2022). Within-sibship genome-wide association analyses decrease bias in estimates of direct genetic effects. Nature Genetics, 54, 581–592.
Kaplan, J. M., & Turkheimer, E. (2021). Galton's Quincunx: Probabilistic causation in developmental behavior genetics. Studies in History and Philosophy of Science, 88, 60–69.
Kendler, K. S., Turkheimer, E., Ohlsson, H., Sundquist, J., & Sundquist, K. (2015). Family environment and the malleability of cognitive ability: A Swedish national home-reared and adopted-away cosibling control study. Proceedings of the National Academy of Sciences, 112, 4612–4617.
Kinney, D. (2019). On the explanatory depth and pragmatic value of coarse-grained, probabilistic, causal explanations. Philosophy of Science, 86, 145–167.
Kraft, M. A. (2020). Interpreting effect sizes of education interventions. Educational Researcher, 49, 241–253.
Lehrman, D. S. (1970). Semantic and conceptual issues in the nature–nurture problem. In Aronson, L. R. (Ed.), Development and evolution of behavior: Essays in memory of T. C. Schneirla (pp. 17–52). WH Freeman.
Leshner, A. I. (1997). Addiction is a brain disease, and it matters. Science (New York, N.Y.), 278, 45–47.
Levins, R. (1966). The strategy of model building in population biology. American Scientist, 54, 421–431.
Lewontin, R. C. (1974). Annotation: The analysis of variance and the analysis of causes. American Journal of Human Genetics, 26, 400–411.
Martin, A. R., Stroud, R. E., Abebe, T., Akena, D., Alemayehu, M., Atwoli, L., … Chibnik, L. B. (2022). Increasing diversity in genomics requires investment in equitable partnerships and capacity building. Nature Genetics, 54, 740–745.
Mathur, R., Fang, F., Gaddis, N., Hancock, D. B., Cho, M. H., Hokanson, J. E., … Johnson, E. O. (2022). GAWMerge expands GWAS sample size and diversity by combining array-based genotyping and whole-genome sequencing. Communications Biology, 5, 1–9.
Meghir, C., & Palme, M. (2005). Educational reform, ability, and family background. American Economic Review, 95, 414–424.
Mill, J. S. (2002). A system of logic. University Press of the Pacific (original work published in 1843).
Pereira, L., Mutesa, L., Tindana, P., & Ramsay, M. (2021). African genetic diversity and adaptation inform a precision medicine agenda. Nature Reviews Genetics, 22, 284–306.
Pingault, J. B., Richmond, R., & Davey Smith, G. (2022). Causal inference with genetic data: Past, present, and future. Cold Spring Harbor Perspectives in Medicine, 12, a041271.
Plomin, R., & Daniels, D. (1987). Why are children in the same family so different from one another? Behavioral and Brain Sciences, 10, 1–16.
Popejoy, A. B., & Fullerton, S. M. (2016). Genomics is failing on diversity. Nature, 538, 161–164.
Ray, L. A. (2012). Clinical neuroscience of addiction: Applications to psychological science and practice. Clinical Psychology: Science and Practice, 19, 154–166. https://doi.org/10.1111/j.1468-2850.2012.01280.x
Ross, L. N. (2021). Causal concepts in biology: How pathways differ from mechanisms and why it matters. The British Journal for the Philosophy of Science, 72, 131–158. doi:10.1093/bjps/axy078
Ross, L. N. (2022). Cascade versus mechanism: The diversity of causal structure in science. PhilSci Archive. Retrieved from http://philsci-archive.pitt.edu/20215/1/Ross_Cascade.pdf
Schneider, B., & Bradford, L. (2020). What we are learning about fade-out of intervention effects: A commentary. Psychological Science in the Public Interest, 21, 50–54. doi:10.1177/1529100620935793
Smith, J. A. (2022). Treatment effect heterogeneity. Evaluation Review, 46, 652–677. doi:10.1177/0193841X221090731
Stein, Z., Susser, M., Saenger, G., & Marolla, F. (1972). Nutrition and mental performance: Prenatal exposure to the Dutch famine of 1944–1945 seems not related to mental performance at age 19. Science (New York, N.Y.), 178, 708–713.
Strauss, G. M., Herndon, J. E., Maddaus, M. A., Johnstone, D. W., Johnson, E. A., Harpole, D. H., … Green, M. R. (2008). Adjuvant paclitaxel plus carboplatin compared with observation in stage IB non-small-cell lung cancer: CALGB 9633 with the Cancer and Leukemia Group B, Radiation Therapy Oncology Group, and North Central Cancer Treatment Group Study Groups. Journal of Clinical Oncology, 26, 5043–5051.
Turkheimer, E. (2012). Genome wide association studies of behavior are social science. In K. S. Plaisance & T. A. C. Reydon (Eds.), Philosophy of behavioral biology (pp. 43–64). Springer.
VanderWeele, T. J. (2022). Constructed measures and causal inference: Towards a new model of measurement for psychosocial constructs. Epidemiology (Cambridge, Mass.), 33, 141–151.
VanderWeele, T. J., & Hernan, M. A. (2013). Causal inference under multiple versions of treatment. Journal of Causal Inference, 1, 1–20.
Volkow, N. D., & Koob, G. (2015). Brain disease model of addiction: Why is it so controversial? The Lancet Psychiatry, 2, 677–679.
Watkins-Hayes, C. (2009). The new welfare bureaucrats: Entanglements of race, class, and policy reform. University of Chicago Press.
Woodward, J. (2005). Making things happen: A theory of causal explanation (paperback). Oxford University Press.
Yarkoni, T. (2022). The generalizability crisis. Behavioral and Brain Sciences, 45, E1. doi:10.1017/S0140525X20001685
Yeager, D. S., & Dweck, C. S. (2020). What can be learned from growth mindset controversies? American Psychologist, 75, 1269–1284.