Introduction
Birth weight is an important marker of current and future health, and has been used in many epidemiological studies of determinants of health and disease from childhood through adulthood to old age.Reference Kuh, Ben-Shlomo, Lynch, Hallqvist and Power 1 , Reference Baker, Olsen and Sørensen 2 Some studies have recorded birth weight directly in official records,Reference Leon, Lithell and Vågerö 3 but many studies rely on recalled birth weight reported by the participants or their mothers.Reference Rich-Edwards, Stampfer and Manson 4 Several studies have found that maternal recall is fairly accurate, even years after the birth,Reference Casey, Rieckhoff and Beebe 5 , Reference Tate, Dezateux and Cole 6 but to our knowledge there has been no systematic review to establish whether this finding is consistent across all published studies. This systematic review and meta-analysis of published observational studies aimed to determine the agreement between birth weight recalled by parent or self any time after birth, and the actual birth weight recorded in official records.
Methods
Data sources
We followed the Meta-Analyses of Observational Studies in Epidemiology (MOOSE) guidelines for the conduct,Reference Stroup, Berlin and Morton 7 and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines for the reporting,Reference Moher, Liberati, Tetzlaff and Altman 8 of this systematic review. M.G.Z. performed the literature search on MEDLINE, EMBASE and Cumulative Index to Nursing and Allied Health Literature (CINAHL) from inception to May 2015 using terms as both keywords and indexing (MeSH) terms: birth weight AND (mental recall OR self-report) AND (recorded OR actual OR verified) (full strategy: Supplementary material 1). We also searched reference lists and performed a forward citation search of all included papers.
Study selection
We included studies in the systematic review which addressed the question: ‘Does recalled birth weight correlate with recorded birth weight?’ We included both self and parental recall, with no restriction on time from birth. We excluded studies that did not report a ‘gold standard’ for birth weight (recorded in official document, e.g. birth certificate or birth register). We excluded individuals with specific mental or physical illnesses to ensure results were applicable to the general population, but included control groups if these were reported separately. We excluded studies that selected participants on the basis of abnormal births (e.g. low birth weight or preterm) as a high-risk pregnancy or birth may affect frequency of measurement, and influence maternal recall, but included studies that included all unselected births. We excluded studies which only categorized birth weight into two or three categories. There were no exclusions by age, sex, socioeconomic status, ethnicity or country, or language of publication. M.G.Z., S.M. and T.H.M. independently identified studies for inclusion, resolving any disagreements by consensus, and/or discussion with S.D.S. and R.M.R. The protocol is available by contacting the authors.
Data extraction
M.G.Z. and T.H.M. independently extracted relevant information on study characteristics (Table 1) and results (Table 2) directly to Excel spreadsheets. This included factors which may influence recall of birth weight, that is, time since birth, method of recall (questionnaire or interview) and parity. Each paper was assessed qualitatively for major sources of bias or confounding.
Notes to Tables 1 and 2: BW, birth weight; ICC, intraclass correlation; LBW, low birth weight; MD, mean difference; NR, not recorded.
Where data were reported in oz, they were converted to g (1 oz=28 g).
Where the mean difference is negative, recorded BW was larger than recalled BW.
a Included in meta-analysis of mean difference.
b Included in meta-analysis of correlation only.
c Correlation coefficient/κ reported for categorical analysis, therefore not included in meta-analysis.
Where data were not published, we contacted authors twice by email and post. We received a response with data from two (Sou,Reference Sou, Chen, Hsieh and Jeng 9 data included; Tehranifar,Reference Tehranifar, Liao, Flom and Terry 10 some required data not collected, therefore not included) and two further stating that data were not available. If there was no response, we estimated values using data or figures in the paper, for example, the standard deviation of mean differences.Reference O’Sullivan, Pearce and Parker 11 – Reference Catov, Newman and Kelsey 13 Where studies reported some form of correlation between the measures (Pearson’s r, Spearman’s ρ, ICC or κ) this was used in the main analysis if calculated on continuous (individual) birth weight measures, but not if calculated using categories of birth weight. Where more than one measure was reported, we used Pearson’s r.
Where no correlation measure was reported, we used the summary estimate from the other studies as described below. Jaspers et al.Reference Jaspers, de Meer, Verhulst, Ormel and Reijneveld 14 reported an upper CI which appeared too large (0.16 pounds=80 g), given the mean difference of 25 g and the lower interval of 10 g. We contacted the author but have not received a reply, so have used an upper CI of 40 g.
The main quality assessment was the risk of bias in recall of birth weight due to access to the gold standard (e.g. birth certificate). We categorized risk of bias as high if the subjects had access to this document at the time of the study, low if they did not have access, or unclear if this was not reported but possible (for example, a telephone interview where the parent would have had access to birth records kept at home).
Meta-analysis
The meta-analysis was conducted with Comprehensive Meta Analysis V3.3 (Biostat, Englewood, CO, USA) using inverse variance weighting and the method of moments for random effects.Reference DerSimonian and Laird 15 This means that the impact of the sample size is proportional to its square root. The main analysis summarized the mean difference in grams between measured and recalled birth weight.
Accurately calculating the variance of the difference requires knowledge of the correlation between recalled and measured birth weight.
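As a minimal sketch of why the correlation matters: for paired measurements, the variance of the difference is the sum of the two variances minus twice the covariance. The function name and the example values below are illustrative assumptions, not data from any included study.

```python
import math

def se_mean_difference(sd_recalled, sd_recorded, r, n):
    """Standard error of the mean paired difference (recalled minus recorded)."""
    # Var(X - Y) = Var(X) + Var(Y) - 2 * r * SD(X) * SD(Y)
    var_diff = sd_recalled**2 + sd_recorded**2 - 2.0 * r * sd_recalled * sd_recorded
    return math.sqrt(var_diff / n)

# Hypothetical study: both SDs are 500 g and n = 200 (illustrative values only)
se_with_r = se_mean_difference(500.0, 500.0, 0.90, 200)
se_ignoring_r = se_mean_difference(500.0, 500.0, 0.0, 200)
print(round(se_with_r, 1), round(se_ignoring_r, 1))
```

In this hypothetical case, treating the two measures as uncorrelated would overstate the standard error roughly threefold, which is why a plausible correlation is needed for studies that did not report one.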
The first step was to produce a summary estimate of the correlation from those studies that reported it. The summary estimate was then used in the main analysis for those studies that did not report a correlation.
A preliminary fixed effects analysis revealed high levels of heterogeneity (I²=80%); we therefore report summary effects from random effects models.
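The pooling steps described above (inverse variance weights, a method-of-moments estimate of between-study variance, and I² from the fixed effects analysis) can be sketched as follows. This is a generic DerSimonian and Laird calculation under our own naming, not the internal implementation of Comprehensive Meta Analysis; the function name, the returned dictionary keys and the use of a 1.96 normal quantile for the 95% CI are assumptions made for illustration.

```python
import math

def random_effects_pool(effects, variances):
    """Pool study effects using DerSimonian-Laird random-effects weights."""
    k = len(effects)
    w = [1.0 / v for v in variances]                  # fixed-effect (inverse variance) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    # I^2: percentage of total variation attributed to heterogeneity
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # Method-of-moments estimate of the between-study variance tau^2
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]    # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return {
        "pooled": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "tau2": tau2,
        "i2": i2,
    }

# Hypothetical mean differences (g) and their variances, for illustration only
result = random_effects_pool([10.0, -5.0, 20.0], [25.0, 16.0, 100.0])
```

When the studies agree closely, tau² is estimated as zero and the result coincides with the fixed effects estimate; as heterogeneity grows, the weights become more equal across studies.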
Sensitivity analyses were conducted for (1) recall bias (only including studies without recall bias); (2) time elapsed since birth (only including those >1 year); (3) parity correction (only including studies which corrected for parity); (4) studies using estimated values; (5) study sample size (omitting the two largest studies and conducting a leave-one-out analysis); (6) the estimated correlation between measures (using the values of the 95% CI in place of the summary estimate).
Subgroup analyses were conducted for (1) self v. parental recall; (2) metric v. imperial units of measurement; (3) high v. low and middle income countries. The first two were pre-specified, whereas the third was post-hoc, suggested by a reviewer. Meta-regression was used to explore further significant subgroup differences.
Results
From 962 abstracts, 147 full-text articles were assessed (Fig. 1), and 40 studies were included in the qualitative synthesis (Table 1),Reference Tate, Dezateux and Cole 6 , Reference Sou, Chen, Hsieh and Jeng 9 – Reference Jaspers, de Meer, Verhulst, Ormel and Reijneveld 14 , Reference Pyles, Stolz and Macfarlene 16 – Reference Bat-Erdene, Metcalfe, McDonald and Tough 48 with 23 samples from 19 studies included in the meta-analysis of correlation, and 29 samples from 26 studies included in the meta-analysis of mean difference.Reference Tate, Dezateux and Cole 6 , Reference Sou, Chen, Hsieh and Jeng 9 , Reference O’Sullivan, Pearce and Parker 11 – Reference Jaspers, de Meer, Verhulst, Ormel and Reijneveld 14 , Reference Pyles, Stolz and Macfarlene 16 , Reference Donoghue and Shakespeare 17 , Reference Porteous, Meskin, Proshek and Ten Bensel 19 , Reference Hoekelman, Kelly and Zimmer 20 , Reference Burns, Moll, Rost and Lauer 25 , Reference Seidman, Slater, Ever-Hadani and Gale 26 , Reference Lumey, Stein and Ravelli 28 – Reference Gaskin, Walker, Forrester and Grantham-McGregor 32 , Reference Lederman and Paxton 34 – Reference Boeke, Marin and Oliveros 45 , Reference Gayle, Yip and Frank 47 Only four non-English papers were identified, and from non-expert translation three did not appear eligible, and oneReference Victora, Barros, Martines, Béria and Vaughan 23 was included in narrative review only (Table 1).
Qualitative synthesis
In total, 40 studies were eligible for inclusion in the systematic review (Table 1). They were heterogeneous: sample size in the recalled group ranged from 14 to 46,637 (median 257); year of publication ranged from 1935 to 2013; the majority were from the United States (18 studies) and Europe (13 studies); birth information was mostly reported by mothers (31 samples), self (eight samples) or either parent (five samples). Two studies reported both mother and self-report.Reference Troy, Michels and Hunter 30 , Reference Lucia, Luo, Gardiner, Paneth and Breslau 41 The time to recall for parental report varied from 3 weeks to 96 years, and for self-report from 27 to 78 years. Data collection was by interview (20 studies, including three by telephone), questionnaire (17 studies) or both. Recorded data were from clinical (hospital or birth register) records (33 studies), birth certificates (four studies), or research databases collected at birth (four studies). The majority reported metric measures (g); where imperial measures were used we converted to metric (1 oz=28 g). Note that one study used ‘Dutch modern pounds’ (1 pound=500 g).Reference Jaspers, de Meer, Verhulst, Ormel and Reijneveld 14
There were 10 samples from nine studies,Reference Tehranifar, Liao, Flom and Terry 10 , Reference Hoekelman, Kelly and Zimmer 20 , Reference Victora, Barros, Martines, Béria and Vaughan 23 – Reference Wilcox, Gold and Tuboku-Metzger 27 , Reference Lule, Webb and Ndibazza 46 , Reference Bat-Erdene, Metcalfe, McDonald and Tough 48 which did not provide data for meta-analysis. These included from 47 to 2552 mothers (median 99) (Table 1) and generally reported good agreement within birth weight categories (Table 2), with over 50% of participants reporting agreement within 25 g (1 oz),Reference Hoekelman, Kelly and Zimmer 20 , Reference Victora, Barros, Martines, Béria and Vaughan 23 and 70–90% agreeing within 100 g.Reference Hoekelman, Kelly and Zimmer 20 – Reference Eaton-Evans and Dugdale 24 , Reference Wilcox, Gold and Tuboku-Metzger 27 , Reference Gayle, Yip and Frank 47 , Reference Bat-Erdene, Metcalfe, McDonald and Tough 48 The majority of studies were small (n<200), with an unclear risk of bias (i.e. most studies did not report whether or not the informant had access to a recorded birth weight). Bat-Erdene et al.Reference Bat-Erdene, Metcalfe, McDonald and Tough 48 (n=2552) estimated maternal recall at up to 3 months compared with electronic health records and found that 11.1% had exact recall, and 88.4% were within 50 g; Victora et al.Reference Victora, Barros, Martines, Béria and Vaughan 23 (n=1800) in Brazil at 9–15 months found that 60% of mothers recalled the exact weight.
The largest study by far was eligible for meta-analysis: Gayle et al.Reference Gayle, Yip and Frank 47 (n=46,637) followed up participants in the Tennessee Women, Infants and Children Supplemental Feeding Program in the United States, and found that 70.6% of mothers had exact recall, and 89% were within 28 g. This study included 20% preterm and 7.4% low birth weight births, but we did not exclude it as these groups were not intentionally oversampled. The time to recall was not reported, though the authors reported that there was no difference in recall whether the child’s age was greater or less than 1 year. There was no access to the electronic health record. Lower accuracy was associated with infant’s low birth weight, poor birth outcome, poorer education, black race, single marital status and age <18 years. Mothers reported a 0.2 oz (6 g) lower mean birth weight compared with birth certificates.
Most studies do not report the proportion who were unable to recall birth weight: in Allen et al.Reference Allen, Ellison, Dos Santos Silva, De Stavola and Fentiman 40 this was 47% (Table 2). In summary, included studies find that almost 90% of mothers recall birth weight to within 1–2 oz (Table 2).
Meta-analysis
We included 23 samples from 19 studies (total n=7406) in the meta-analysis of correlation, and 29 samples from 26 studies (total n=72,114) in the meta-analysis of differences in birth weight (Tables 1 and 2). Three studiesReference Catov, Newman and Kelsey 13 , Reference Gaskin, Walker, Forrester and Grantham-McGregor 32 , Reference Lucia, Luo, Gardiner, Paneth and Breslau 41 had two sets of data which allowed separate analysis: two age groups;Reference Gaskin, Walker, Forrester and Grantham-McGregor 32 first v. subsequent births;Reference Catov, Newman and Kelsey 13 maternal v. self recallReference Lucia, Luo, Gardiner, Paneth and Breslau 41 (Table 2). Sample size ranged from 14 to 46,637, median 265.
Correlation
There was a strong correlation between recalled and recorded birth weight, estimated as 0.90 (CI 0.86–0.93) (Fig. 2). This estimate of the correlation was used in the main analysis for studies that did not report a correlation.
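A summary correlation of this kind is commonly obtained by pooling on the Fisher z scale and back-transforming. The sketch below assumes that approach, which the text does not state explicitly, and uses fixed-effect weights for brevity; a random effects version would add the between-study variance to each study's variance. The function name and input values are hypothetical.

```python
import math

def pool_correlations(rs, ns):
    """Inverse-variance pooling of correlations on the Fisher z scale."""
    zs = [math.atanh(r) for r in rs]      # Fisher z transform of each correlation
    ws = [n - 3 for n in ns]              # weight = 1 / var(z), with var(z) = 1 / (n - 3)
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    se = math.sqrt(1.0 / sum(ws))
    # Back-transform the pooled z and its 95% limits to the correlation scale
    pooled = math.tanh(z_bar)
    ci = (math.tanh(z_bar - 1.96 * se), math.tanh(z_bar + 1.96 * se))
    return pooled, ci

# Hypothetical correlations and sample sizes, for illustration only
pooled_r, ci_r = pool_correlations([0.85, 0.92, 0.90], [120, 300, 80])
```

Working on the z scale keeps the confidence limits inside the range −1 to 1 and makes them asymmetric around the pooled value, as with the interval reported above.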
Differences in absolute birth weight
The absolute effect size of the difference in birth weight between recalled and recorded was very small, not statistically significant, and unlikely to be clinically important: 1.4 g (−4.0 to 6.9 g) (Fig. 3).
Sensitivity analysis
Sensitivity analyses to assess the effect of (1) recall bias; (2) time elapsed since birth; (3) parity correction; (4) studies using estimated values all showed little effect on the results (Supplementary Figs 1–4). Leaving out the two very large studies, Gayle (n=46,637) and Tate (n=11,890), yielded a summary estimate of 5.82 g (−4.36, 16.00). A leave-one-out analysis showed that no other study affected the summary estimate by more than 2 g (Supplementary Fig. 5). For eight studies, we used a summary estimate of the correlation. We therefore also performed sensitivity analyses in which we substituted the upper and lower 95% limits of the estimated correlation (0.93 and 0.85) for those studies that did not report one. The results (mean difference, 95% CI) were 1.88 g (−3.64, 7.41) and 0.96 g (−4.50, 6.39) for the upper and lower limit, respectively.
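A leave-one-out analysis simply re-pools the data with each study omitted in turn. The sketch below uses a plain inverse-variance (fixed-effect) mean to stay short, whereas the analysis reported here used random effects models; the function names and input values are hypothetical.

```python
def inverse_variance_mean(effects, variances):
    """Fixed-effect pooled estimate using inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)

def leave_one_out(effects, variances):
    """Return the pooled estimate obtained when each study is omitted in turn."""
    estimates = []
    for i in range(len(effects)):
        rest_effects = effects[:i] + effects[i + 1:]
        rest_variances = variances[:i] + variances[i + 1:]
        estimates.append(inverse_variance_mean(rest_effects, rest_variances))
    return estimates

# Hypothetical mean differences (g) and variances, for illustration only
effects = [10.0, -5.0, 20.0, 2.0]
variances = [25.0, 16.0, 100.0, 4.0]
overall = inverse_variance_mean(effects, variances)
shifts = [est - overall for est in leave_one_out(effects, variances)]
```

Each entry of `shifts` shows how far the summary estimate moves when the corresponding study is dropped, which is the quantity compared against the 2 g threshold above.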
Subgroup analyses
Subgroup analysis by informant and units of measurement yielded subgroup estimates that were not significantly different (Supplementary Figs 6 and 7). In contrast, the analysis by country income category revealed a striking difference. Recalled birth weight in low and middle income countries appears to overestimate recorded birth weight by around 80 g (95% CI 57–103 g) (Fig. 4). The income categorization explained 77% of between-study variance, but unexplained variance was still moderately high (I²=48%).
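The proportion of between-study variance explained by a covariate is usually computed by comparing the between-study variance (tau²) before and after the covariate is added. The sketch below assumes that definition; the tau² values are hypothetical and were chosen only so that the result matches the 77% reported above.

```python
def variance_explained(tau2_total, tau2_residual):
    """Proportion of between-study variance explained by a covariate."""
    if tau2_total <= 0:
        return 0.0  # no between-study variance to explain
    # Truncate at zero: a covariate cannot explain a negative share
    return max(0.0, 1.0 - tau2_residual / tau2_total)

# Hypothetical tau^2 values (g^2) without and with the income covariate
proportion = variance_explained(1000.0, 230.0)
print(f"{proportion:.0%}")
```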
Risk of bias
Most studies were observational cohort studies of good quality with little evidence of major sources of bias or confounding factors. Some studies analyzed subgroups to determine if there were subgroups with higher or lower errors. Inclusion and exclusion criteria were generally not well reported. The main source of bias was the possibility that participants were not blinded to the recorded birth weight (e.g. birth certificate), and for most studies it was unclear whether or not participants had access to such records. One excluded studyReference Troude, L’Hélias and Raison-Boulley 49 explicitly asked parents to copy results from a personal child health record. Results were essentially unchanged if we excluded studies where access to the birth weight record was possible (difference in means −0.04 g; CI −5.6 to 5.5 g).
Discussion
This systematic review of 40 studies (total n=78,997 births) and meta-analysis in 29 samples from 26 studies (total n=72,114) shows that recalled birth weight has excellent agreement with recorded birth weight: pooled estimate of correlation in 23 samples from 19 studies (total n=7406 births) was 0.90 (95% CI 0.86–0.93), with a small absolute difference: range from −86 to +129 g; random effects estimate 1.4 g (95% CI −4.0–6.9 g). There was no evidence for an effect of self or parental recall, age at recall or time elapsed since birth event on the validity of recalled birth weight. There was, however, evidence of higher recalled birth weight of 80 g (95% CI 57–103 g) in low or middle income countries, in post-hoc analysis.
The majority of the studies included reported high agreement, with a small (clinically insignificant) absolute difference. In studies which reported findings in categories, rather than absolute values, over 50% of participants reported agreement within 25 g (1 oz). If a 100 g error was tolerated, most studies reported agreement between 70 and 90%. Some of the differences may be due to reporting (rounding) errors: if reporting in imperial measures to the nearest ounce, the margin of reporting error could be up to 56 g (2 oz).
A strength of our study is that a systematic and comprehensive review process, devised with an experienced librarian, reported in line with PRISMA guidelines, was followed for this review. Two reviewers independently assessed eligibility of the titles, abstracts and full-text studies. We were able to conduct a meta-analysis of a significant number of studies with a large pooled sample size. Studies only including clinical populations, for example, mental or physical illnesses were excluded. We did this to ensure that our results were generalizable to the general population. Future systematic reviews can establish if the findings are similar in clinical subgroups.
However, there are some potential limitations of our study. The search terms were broad, and it is possible we have missed some potentially eligible studies. We also excluded studies that categorized births into three or fewer groups. The studies are heterogeneous in terms of size, countries, ethnicities, age groups, methodology (e.g. data collection methods, gold standard used), and reporting of statistical analysis. However, we performed sensitivity and subgroup analyses to assess several potential influences on results, for example, imperial v. metric measurement, sample size, time since birth, first born v. subsequent birth, and self v. parental recall, and found that there was no statistically significant influence on results. We also assessed the effect of the two largest studies: removing them increased the summary estimate from 1.4 g to 5.8 g, but neither of these is clinically significant. A further limitation is that the majority of studies were small, and the overall results are predominantly affected by a few large studies (in qualitative analysisReference Victora, Barros, Martines, Béria and Vaughan 23 , Reference Gayle, Yip and Frank 47 , Reference Bat-Erdene, Metcalfe, McDonald and Tough 48 , in meta-analysisReference Tate, Dezateux and Cole 6 , Reference Jaspers, de Meer, Verhulst, Ormel and Reijneveld 14 , Reference Araujo, Dutra and Hallal 42 , Reference Adegboye and Heitmann 43 ). However, the smaller studies had similar findings in qualitative review and meta-analysis.
Any validation study is limited by the data available: here, we required both the availability of a historical record, and an individual’s recall. Clinical records may not be accessible in some countries, and accurate data may not be recorded, particularly in home births. Recovery of recorded birth weights could be as low as 10%. Historical records require transcription from hand-written ledgers for electronic analyses. Birth certificates include birth weights in some countries (e.g. United States) but not all. Recall rates, where reported, were variable, for example, self-recall 24%Reference Andersson, Niklasson and Lapidus 12 or 46%.Reference Kemp, Gunnell, Maynard, Davey Smith and Frankel 37 This may vary for several reasons, for example, by country: in Africa up to 25% could not recall birth weight;Reference Lule, Webb and Ndibazza 46 due to maternal or fetal factors such as maternal education;Reference Gayle, Yip and Frank 47 or due to neonatal complications.Reference Seidman, Slater, Ever-Hadani and Gale 26 Furthermore, there are many methods of reporting the agreement between two measures.
We report correlation and mean difference, but acknowledge that the overall correlation coefficient is limited as a measure of agreement: it measures the strength of the relationship between two variables, not the agreement between them; it is unaffected by the scale of measurement (e.g. grams or kilograms); it depends on the range of the measurements; it may mask variability within subgroups, or in certain parts of the distribution.Reference Andersson, Niklasson and Lapidus 12 , Reference Bland and Altman 50 The Pearson correlation coefficient is, however, required to correctly estimate the variance of the mean difference, so we would suggest that authors of future studies include this along with other measures of agreement.
We did not assess risk of bias using formal tools: there is currently no consensus on the best method of quality assessment for observational studies. The major source of potential bias was whether the individual had access to the recorded birth weight: for example in Catov et al.Reference Catov, Newman and Kelsey 13 , the mother brought in the birth certificate at the time of interview, which was used as the record of actual birth weight. However, the results were similar in studies where there was no access to the recorded birth weight. Some studies suggest that recall may be more accurate within some ethnic, socioeconomic or clinical subgroups.Reference Tate, Dezateux and Cole 6 , Reference Andersson, Niklasson and Lapidus 12 We did not extract data relating to this, and many studies did not report these data.
Birth weight from historical records has been used in many epidemiological studies, particularly relating to the Developmental Origins of Health and Disease.Reference Kuh, Ben-Shlomo, Lynch, Hallqvist and Power 1 , Reference Baker, Olsen and Sørensen 2 It is debated whether recalled birth weight is sufficient to explore the influence of early life factors as part of life course epidemiology. However, it is still widely used, and the findings from this systematic review and meta-analysis suggest that recalled birth weight can be reliably used as an estimate of actual birth weight where birth records are not available, for example as a risk factor for later disease.Reference Kuh, Ben-Shlomo, Lynch, Hallqvist and Power 1 , Reference Baker, Olsen and Sørensen 2 Recalled birth weight also appears valid in low birth weight and preterm births, as part of population studies, but future studies should explore whether there are different rates of recall in clinical subgroups. There is insufficient evidence to confidently extrapolate this finding to low income countries, and future studies should explore whether the reported recall of higher birth weight in low and middle income countries is replicated, and explore potential reasons for this.
Conclusion
This systematic review and meta-analysis suggests that where birth weight is recalled, it can confidently be used as a reliable estimate of actual birth weight, particularly in high income countries.
Acknowledgments
Thanks to Jillian Hosie for administrative support in sending letters to request missing data and Sheila Fisken for advice on search strategy. Thanks to Sou and Tehranifar for replying to request for additional information.
Financial Support
This work did not receive external funding. The University of Edinburgh Centre for Cognitive Aging and Cognitive Epidemiology (S.D.S.) is part of the cross council Lifelong Health and Wellbeing Initiative (G0700704/84698). Funding from the Biotechnology and Biological Sciences Research Council, Engineering and Physical Sciences Research Council, Economic and Social Research Council, Medical Research Council; and British Heart Foundation (R.M.R.) and Tommy’s (R.M.R.) is gratefully acknowledged. G.D. is funded by the Medical Research Council (MC_UU_12017-13).
Conflicts of Interest
None.
Supplementary Material
To view supplementary material for this article, please visit https://doi.org/10.1017/S2040174416000581