There is an inextricable and inexorable link between us and the environment. Our physiological responses and adaptations, most especially changes in body water, water-seeking behaviour and appetite, are profoundly influenced by ambient conditions. Yet, despite their critical importance, ambient conditions are rarely accounted for in nutritional studies. At the same time, many statistical models are published with low goodness-of-fit values (Reference Lemetais, Melander and Vecchio1–Reference McKenzie, Perrier and Guelinckx3), if goodness-of-fit is reported at all (Reference Kant, Graubard and Atchison4–Reference Rosinger, Lawman and Akinbami10). Low goodness-of-fit, corresponding to a large proportion of unexplained variance, leads to models with reduced inferential support. This is a classic quandary in statistics: a potentially misleading result from an under-specified model. However, the quandary is avoidable in nutritional science, as we can and ought to include environmental data in analyses pertaining to hydration and nutrition.
Appropriately, many laboratory- and some field-based investigations use a range of strategies to control for environmental factors (Reference McKenzie, Perrier and Guelinckx3, Reference Armstrong, Johnson and Munoz11–Reference Figaro and Mack15). These practices likely stem from documented impacts of environmental factors, such as, but not limited to, ambient temperature, hours of sunlight exposure and relative humidity, on water intake behaviours and body water balance. For example, residents in the southern USA exhibited a 20 % increase in water intake during the summer v. winter months (Reference Heller, Sohn and Burt16); residents of several European countries exhibited increased urine and plasma osmolality with high environmental temperature exposure despite reduced physical activity patterns (Reference Mora-Rodriguez, Ortega and Fernandez-Elias17); and Greek residents were more likely to be in the lowest and highest water balance categories, suggesting some do not effectively compensate or might overcompensate for total water losses (Reference Malisova, Bountziouka and Panagiotakos8). Optimal water balance might also be threatened by sunlight exposure (Reference Carter, Muller and Roberts18) and high relative humidity (Reference McCutcheon, Geor and Hare19, Reference Maughan, Otani and Watson20). Beyond hydration, benefits of environmental data inclusion extend to nutritional and other areas of inquiry such as vitamin D and bone health (Reference Holick21–Reference Cashman23), appetite and dietary intake (Reference Cashman23, Reference Glerup, Mikkelsen and Poulsen24), skin cancer (Reference Lefkowitz and Garland25, Reference Brash, Rudolph and Simon26) and more (Reference Wacker and Holick27–Reference Magin29). Also, there is a real and important trade-off between controlled design and study generalizability, to say nothing of the impracticality of controlling environmental variables.
Furthermore, there is a risk associated with poor control: failure to recognize the imperfections of a control creates an opportunity for unrecognized sources of variance to contaminate the data set. Thus, in many cases, a controlled design is not tenable. Yet in the majority of cases where a control does not clearly add value, an environmental covariate likely does.
For investigators engaging in de novo data collection, there is an increasing number of options for environmental data acquisition via wearable sensors (Reference Chung, Na and Lee30). Where wearable sensors are possible, the resultant data can be impressively high-resolution and therefore highly informative (Reference Wininger and Pidcoe31). Where sensor-based data collection is not feasible, it may be possible to leverage web-available climate archives for information pertaining to weather at a given location at a given time. As a supplement to the present commentary, we provide a strategy for implementing such a data fusion approach. Our specific goal in the commentary is not only to call attention to the need for investigators to control or adjust for environmental factors in their analyses, but also to provide a tool that enables this. Below, we give an overview of our approach: a computer script that accesses a large, web-based weather archive and merges selected environmental data with an empirical data set such as might be created in a hydration or nutritional study.
The code, provided in Supplemental File S1 and described in Supplemental Files S2–S4 (see online supplementary material), is written in the R programming language. R is freely available software and the code is completely self-contained, with only two requisite steps for operation: (i) installing the required R packages listed just above the User Input section; and (ii) modifying the input data (in the User Input section), which are the Participant Identifier, the Latitude and Longitude of each participant, and the Date of observation. The package installation is a one-time process; the participant information can be changed ad hoc. With these modest start-up operations completed, the code can be executed immediately, needing modification only to add or remove parameters, or to access a different database. The output of this code is a display of a fully merged data set with temperature, dew point, precipitation (daily representative values), distance between the participant and the accessed weather station, altitude and hours of daylight (Supplemental File S4). For a basic tutorial on R operation, including detailed background on all operations written into this code, please refer to a primer published recently by one of us (Reference Prokop and Wininger32).
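The merge described above can be illustrated with a minimal sketch. It is written in Python rather than R and is not the authors' supplemental script; it only shows one plausible way to attach the nearest station's daily record to each participant observation. The station identifiers, coordinates, field names, units and weather values below are hypothetical placeholders, and the nearest-station rule (great-circle distance by the haversine formula) is an assumption rather than a documented detail of the supplement.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, adequate for station matching


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude-longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def merge_weather(participants, stations, daily_weather):
    """Attach the nearest station's daily record to each participant observation."""
    merged = []
    for p in participants:
        # Pair every station with its distance to the participant, keep the closest.
        distance_km, nearest = min(
            ((haversine_km(p["lat"], p["lon"], s["lat"], s["lon"]), s) for s in stations),
            key=lambda pair: pair[0],
        )
        # A station may have no record for that date; missing values stay None.
        record = daily_weather.get((nearest["station_id"], p["date"]), {})
        merged.append({
            **p,
            "station_id": nearest["station_id"],
            "station_km": round(distance_km, 1),
            "altitude_m": nearest["altitude_m"],
            "temp_c": record.get("temp_c"),
            "dew_point_c": record.get("dew_point_c"),
            "precip_mm": record.get("precip_mm"),
        })
    return merged


# Hypothetical inputs: one participant, two stations, one daily record.
participants = [
    {"participant_id": "P01", "lat": 41.76, "lon": -72.67, "date": "2019-07-15"},
]
stations = [
    {"station_id": "STN_A", "lat": 41.94, "lon": -72.68, "altitude_m": 53.0},
    {"station_id": "STN_B", "lat": 40.78, "lon": -73.97, "altitude_m": 40.0},
]
daily_weather = {
    ("STN_A", "2019-07-15"): {"temp_c": 26.0, "dew_point_c": 18.0, "precip_mm": 0.0},
}

merged = merge_weather(participants, stations, daily_weather)
for row in merged:
    print(row)
```

In this sketch the weather records are supplied as an in-memory dictionary; in practice they would be retrieved from a web archive, and the retrieval step is deliberately left out because its details depend on the chosen database.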
Our choice of database was mostly in the interest of usefulness: the National Climatic Data Center is a well-organized repository of millions of weather records, with substantial quality control, and has been utilized in many scientific applications. Through the National Climatic Data Center's participation in the World Meteorological Organization’s consortium of data-sharing entities, this particular data set furnishes records from more than 25 000 stations in 249 countries and allows facile reference by latitude–longitude. These data are accessible for non-commercial use across the globe, although many countries warehouse their own copies of the World Meteorological Organization data set, so it is not strictly necessary to access these data from a server based in the USA. For that matter, this code can be altered to point to any data set, including non-World Meteorological Organization resources, with appropriate modifications to accommodate differences in data file structure. Run times will vary depending on many factors, but in test runs using a Windows 7 32-bit PC with an Intel Core i5-2400 CPU @ 3.10 GHz and 3 GB RAM, web retrieval consistently averaged 1.3–1.6 s per page, i.e. approximately 40–45 pages per min.
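The reported retrieval rate can be turned into a rough planning estimate. The short Python sketch below converts seconds per page into pages per minute and into total retrieval time; the figure of 1000 pages is a hypothetical workload, not a number from the commentary, and how many pages a given study needs depends on the archive's file structure.

```python
def pages_per_minute(seconds_per_page):
    """Convert an average retrieval time per page into a throughput."""
    return 60.0 / seconds_per_page


def estimated_minutes(n_pages, seconds_per_page):
    """Total retrieval time in minutes for a given number of pages."""
    return n_pages * seconds_per_page / 60.0


# Reported range of average retrieval times (seconds per page).
fastest, slowest = 1.3, 1.6

rate_high = pages_per_minute(fastest)  # about 46.2 pages per minute
rate_low = pages_per_minute(slowest)   # 37.5 pages per minute

# Hypothetical workload of 1000 pages.
n_pages = 1000
best_case = estimated_minutes(n_pages, fastest)   # about 21.7 minutes
worst_case = estimated_minutes(n_pages, slowest)  # about 26.7 minutes

print(f"{rate_low:.1f} to {rate_high:.1f} pages per minute")
print(f"{best_case:.1f} to {worst_case:.1f} minutes for {n_pages} pages")
```

The direct conversion gives roughly 37.5 to 46 pages per minute, which is broadly in line with the approximate range quoted above.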
Data resolution will vary between databases and between locales: some regions have a high density of weather stations reporting atmospheric conditions with high frequency; in other cases, it may only be possible to find a distant weather station with intermittent reporting. Furthermore, there are many reasons why a given individual's exposures may not be captured exactly by a weather repository: the individual might be travelling beyond his/her specified coordinates or may have spent all day indoors, and it may not be known whether the individual was engaging in vigorous exercise, etc. We note that while we are publishing our code as an R script, there is no restriction on which specific coding approach can be used. Every programming language known to the authors allows web-based retrieval. We selected R on the basis of its popularity, accessibility, and widespread documentation and support. Lastly, we note that it is not strictly necessary to use an automated script to add environmental data to an empirical data set: for small data sets, this can be done manually; however, for sufficiently large data sets, this becomes a strenuous task, with increasing opportunity for human error.
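One way to act on the resolution caveat above is to flag merged records whose weather values come from a distant station or are missing altogether, so that they can be inspected or handled in a sensitivity analysis. The Python sketch below is an illustration only: the field names, the sample records and the 50 km cut-off are hypothetical choices, not recommendations from the commentary.

```python
def flag_record(record, max_station_km=50.0,
                required=("temp_c", "dew_point_c", "precip_mm")):
    """Return a list of quality flags for one merged participant-weather record."""
    flags = []
    distance = record.get("station_km")
    # Unknown or large distances both cast doubt on how local the reading is.
    if distance is None or distance > max_station_km:
        flags.append("distant_station")
    # Intermittent reporting shows up as missing daily values.
    missing = [field for field in required if record.get(field) is None]
    if missing:
        flags.append("missing:" + ",".join(missing))
    return flags


# Hypothetical merged records (placeholder values).
records = [
    {"participant_id": "P01", "station_km": 20.0,
     "temp_c": 26.0, "dew_point_c": 18.0, "precip_mm": 0.0},
    {"participant_id": "P02", "station_km": 140.5,
     "temp_c": 31.0, "dew_point_c": None, "precip_mm": None},
]

flagged = {r["participant_id"]: flag_record(r) for r in records}
for participant_id, flags in flagged.items():
    print(participant_id, flags if flags else "ok")
```

Flags of this kind do not address indoor time, travel or exercise, which a weather archive cannot capture; they only make the station-level limitations visible in the merged data set.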
The interplay between heat stress and water and nutrient balance is a growing area of focus (Reference Malisova, Bountziouka and Panagiotakos8, Reference Westerterp, Plasqui and Goris14, Reference Glerup, Mikkelsen and Poulsen24). Other environmental variables, for example humidity and length of day, seem likely to add further value as covariates in statistical models related to intake of food and water. Seasonality corrections are common in laboratory studies (Reference McKenzie, Perrier and Guelinckx3, Reference Armstrong, Johnson and Munoz11–Reference Figaro and Mack15) but are not nearly as ubiquitous as they ought to be. Given that we are already regularly publishing and building on hydration models with poor goodness-of-fit, it is curious that we are not merging data related to ambient conditions as a matter of course. Any additional explained variance can only be helpful.
Acknowledgements
Financial support: The work was supported by the College of Education, Nursing, and Health Professions of the University of Hartford, which had no additional role. Conflict of interest: The authors have no conflicts of interest to report. Authorship: C.X.M.: design and writing, editing and approval of the manuscript. M.W.: design, code creation and proofing, writing, editing and approval of the manuscript. Ethics of human subject participation: Not applicable, this work does not involve human participants.
Supplementary material
To view supplementary material for this article, please visit https://doi.org/10.1017/S1368980019003343