The ability to store, process, merge, analyze, and share the increasingly large volumes of data produced by experiments and simulations is an ongoing challenge for materials research, even as the wealth of data available represents exciting new possibilities for materials design. Six US federal departments and agencies recently announced more than $200 million in new research and development efforts related to managing and evaluating large data sets, as part of a new initiative led by the Obama Administration’s Office of Science and Technology Policy (OSTP). These new efforts have the potential to significantly accelerate the pace of materials discovery and development.
In March, OSTP Director John Holdren introduced the Big Data Research and Development Initiative. The initiative aims to advance the core technologies required to contend with huge volumes and varieties of data in order to accelerate discovery in science and engineering, strengthen national security, and transform teaching and learning. The $200 million in new research and development (R&D) includes projects within the National Science Foundation (NSF), Department of Energy (DOE), National Institutes of Health (NIH), Department of Defense (DOD), Defense Advanced Research Projects Agency (DARPA), and the US Geological Survey (USGS).
Over the last decade, new technologies and tools have vastly increased the amount of materials-related data produced even in small, individual experiments, and those data hold exciting potential. However, managing those data and using them to make scientific inferences remains challenging for the materials community, as well as the larger scientific community. “We are generating all of this data, but data is not the same thing as information, knowledge, and better decision making,” said Tom Kalil, Deputy Director for Policy at OSTP.
One of the main data challenges facing the materials research community is being able to combine diverse data streams, said Ian Robertson, Division Director for Materials Research in the Mathematical and Physical Sciences Directorate of NSF. The materials community produces data from a variety of different sources and techniques and over different time and length scales. Each of these data streams may shed some light on the structure of a material, for example, but combining multiple streams could lead to a much clearer understanding of the big picture. “We need to learn how to put all of those data pieces together in order to tackle the larger problem,” said Robertson.
Other data challenges facing the materials community include storing and visualizing large data sets and creating a collaborative culture where data sharing between research groups is encouraged, said Robertson. The materials research community could learn a lot about meeting these challenges from fields like physics, biology, and astronomy, which are further ahead in these areas, he said. The Big Data Initiative also addresses these areas. By investing in the development of tools and techniques to better manage and manipulate data, the initiative will free up researchers to focus more on the analysis and interpretation of the data they collect.
The efforts announced in the Big Data Initiative take many different forms, but one of the main goals of the initiative is to focus on data management tools and techniques that can be used broadly. For example, DOE is providing $25 million to establish the Scalable Data Management, Analysis, and Visualization Institute, which will develop management and visualization tools for use with the Department’s supercomputers. NSF is funding a $10 million “Expeditions in Computing” project, based at the University of California–Berkeley, that aims to integrate machine learning, cloud computing, and crowdsourcing to create more powerful techniques for turning data into information.
The Big Data Initiative is a response to a 2010 report by the President’s Council of Advisors on Science and Technology (PCAST), which concluded that the federal government is under-investing in networking and information technology (NIT) R&D. The study found that previous federal investments in these areas have paid off tremendously in terms of economic competitiveness, national security, and quality of life, but that changes in the NIT landscape require increased federal investment.
“The investments that our Nation has made in NIT R&D are among the best investments that our Nation has made,” reads the report. But it cautions, “the NIT research landscape is changing rapidly and dramatically. . . . These changes will require additional resources—some combination of new funds and redirected existing funds—along with additional attention by multiple Federal agencies.”
Federal departments and agencies, along with businesses, universities, and professional societies, are also addressing materials-related data challenges as part of another OSTP initiative, the Materials Genome Initiative. The Materials Genome Initiative was announced in June 2011 and aims to cut in half the time it takes to discover, develop, and manufacture new materials, and to dramatically reduce the associated cost. This initiative has led to new investments, for example, in computational tools, simulation software, materials databases, and standards for handling digital data.
Beyond the Materials Genome Initiative, OSTP is encouraging industry, universities, and nonprofits to participate in the Big Data Initiative and help develop the tools needed to take full advantage of the large amount of data researchers are now able to capture. For more information on the Big Data and Materials Genome Initiatives, visit the OSTP website at www.ostp.gov.