School of Physics, Astronomy, and Computational Science and Department of Statistics, George Mason University
Abstract: On March 29, 2012, the Obama administration announced the Big Data Research and Development Initiative. A number of U.S. federal agencies, including the National Science Foundation, the National Institutes of Health, the Department of Defense, the Department of Energy, and the U.S. Geological Survey, have committed substantial additional funds to Big Data projects. The White House press release described the goals of the Big Data Initiative: “to advance the state-of-the-art core technologies needed to collect, store, preserve, manage, analyze, and share huge quantities of data; harness these technologies to accelerate the pace of discovery in science and engineering; to strengthen our national security, and transform teaching and learning; and to expand the work force needed to develop and use Big Data technologies.”
It should be noted that the scale of what is considered Big Data has been increasing steadily. Kilobytes (10^3), megabytes (10^6), gigabytes (10^9), and terabytes (10^12) by now are familiar to any researcher using modern computer resources. The Earth Observing System of the Jet Propulsion Laboratory introduced serious consideration of petabytes (10^15). Data collection systems looming on the horizon, such as the Large Synoptic Survey Telescope, promise data on the scale of exabytes (10^18). It is conceivable that data collection methods in the future may generate data sets on the scale of zettabytes (10^21) and yottabytes (10^24). The issue with Big Data is that while computing power doubles every 18 months (Moore’s Law) and I/O bandwidth increases about 10% every year, the amount of data doubles every year. It is clear that conventional distributed systems such as those employed by Google, Facebook, and JPL (distributed active archive centers) must be expanded to include new technologies such as Hadoop and new analysis methods. In this lecture, I will focus on aspects of these Big Data issues.
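The widening gap between these growth rates can be made concrete with a short calculation. The sketch below (an illustration, not part of the lecture) compounds the three rates cited in the abstract, assuming compute doubles every 18 months, I/O bandwidth grows about 10% per year, and data volume doubles every year:

```python
def growth_factors(years):
    """Return (compute, io_bandwidth, data) growth multipliers after `years`,
    under the rates cited in the abstract."""
    compute = 2 ** (years / 1.5)   # Moore's Law: doubling every 18 months
    io = 1.1 ** years              # I/O bandwidth: +10% per year
    data = 2 ** years              # data volume: doubling every year
    return compute, io, data

# Project a decade out: data volume outpaces compute roughly tenfold,
# and outpaces I/O bandwidth by nearly three orders of magnitude.
compute, io, data = growth_factors(10)
print(f"After 10 years: compute x{compute:.0f}, I/O x{io:.1f}, data x{data:.0f}")
print(f"Data outgrows compute by roughly {data / compute:.0f}x")
```

Over ten years, compute grows about 100-fold while data grows 1024-fold, which is why storage and analysis architectures, not raw processing power, dominate the Big Data problem.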
Jack A. Kaye
Associate Director for Research of the Earth Science Division within NASA’s Science Mission Directorate (SMD)
Abstract: The development of Earth System Science has led to the creation of one of the most data-intensive scientific disciplines that exists today, with ever-increasing amounts of data being generated by an expanding pool of providers; at the same time, the number and scope of users are growing dramatically. As one of the institutions that facilitated the development of Earth System Science with the global view provided by Earth-orbiting satellites, NASA is responsible for the regular creation of enormous amounts of space-based data as well as computationally-intensive model-generated data sets that incorporate these observations for assimilation and reanalysis and use them for simulation of future evolution of the Earth system. NASA also looks to enhance use of data from diverse sources by its scientific and applications users, so not only does it have to provide archival, distribution, and stewardship of its data, but it also has to develop and share tools that simplify access and utilization of data, as well as address the large computational demands associated with the data utilization. The success of NASA's Earth Observing System Data and Information System in meeting these challenges (along with NASA's "full and open" data policy that serves to release data to the widest possible community as soon as possible after launch) is serving as a pathfinder for the Global Change Information System of the US Global Change Research Program, as well as for the international Group on Earth Observations. In this talk, the overall approach and examples of the ways that NASA is serving the scientific and applications communities by meeting the challenge of data access will be shared.
$300 (early bird) $375 (at door)
One day only
$175 (early bird and at door)
Student with ID - $100
Chapman Faculty and Staff - $50