Datasets & Tools

Datasets

As a Department of Energy National Laboratory, Berkeley Lab hosts many publicly available scientific datasets. This list features a selection of datasets of potential interest for machine learning applications.

AmeriFlux is a network of PI-managed sites measuring ecosystem CO2, water, and energy fluxes in North, Central and South America. It was established to connect research on field sites representing major climate and ecological biomes, including tundra, grasslands, savanna, crops, and conifer, deciduous, and tropical forests. As a grassroots, investigator-driven network, the AmeriFlux community has tailored instrumentation to suit each unique ecosystem. This “coalition of the willing” is diverse in its interests, use of technologies and collaborative approaches. As a result, the AmeriFlux Network continually breaks new ground. Contact: ameriflux-support@lbl.gov

Ambient environmental radiological (gamma-ray and neutron) data, collected alongside data from a suite of contextual sensors (video, lidar, hyperspectral). Contact: bjquiter@lbl.gov

Fruit fly functional genomics data repository. Contacts: BPBowen@lbl.gov, ORuebel@lbl.gov

The U.S. Department of Energy’s (DOE) Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) is a new data archive for Earth and environmental science data. ESS-DIVE is funded by the Data Management program within the Climate and Environmental Science Division under the DOE’s Office of Biological and Environmental Research program (BER), and is maintained by the Lawrence Berkeley National Laboratory. ESS-DIVE will archive and publicly share data obtained from observational, experimental, and modeling research that is funded by the DOE’s Office of Science under its Subsurface Biogeochemical Research (SBR) and Terrestrial Ecosystem Science (TES) programs within the Environmental Systems Science (ESS) activity. Contact: ess-dive-support@lbl.gov

Flow is a traffic control benchmarking framework that provides a suite of traffic control scenarios (benchmarks), tools for designing custom traffic scenarios, and integration with deep reinforcement learning and traffic microsimulation libraries. Flow software and datasets are open source and available for public use under the MIT license. Contact: flow.berkeley@gmail.com

Today, eddy covariance measurements of carbon dioxide and water vapor exchange are made routinely on all continents. The flux measurement sites are linked through a confederation of regional networks in North, Central and South America, Europe, Asia, Africa, and Australia, which together form a global network called FLUXNET. This global network includes more than eight hundred active and historic flux measurement sites, dispersed across most of the world’s climate space and representative biomes. This global FLUXNET dataset was built using data through 2015. Contact: fluxdata-support@lbl.gov

Today, eddy covariance measurements of carbon dioxide and water vapor exchange are made routinely on all continents. The flux measurement sites are linked through a confederation of regional networks in North, Central and South America, Europe, Asia, Africa, and Australia, which together form a global network called FLUXNET. This global network includes more than eight hundred active and historic flux measurement sites, dispersed across most of the world’s climate space and representative biomes. This global FLUXNET dataset was built using data through 2006. Contact: fluxdata-support@lbl.gov

By computing properties of all known materials, the Materials Project aims to remove guesswork from materials design in a variety of applications. Experimental research can be targeted to the most promising compounds identified from computational data sets, and researchers can data-mine scientific trends in materials properties. By providing materials researchers with the information they need to design better materials, the Materials Project aims to accelerate innovation in materials research. Contact: feedback@materialsproject.org

Mass spectrometry imaging (MSI) is widely applied to image complex samples for applications spanning health, microbial ecology, and high throughput screening of high-density arrays. MSI has emerged as a technique suited to resolving metabolism within complex cellular systems, where understanding the spatial variation of metabolism is vital for making a transformative impact on science. OpenMSI provides a web-based gateway for the management and storage of MSI data, visualization of the hyper-dimensional contents of the data, and statistical analysis. Contacts: BPBowen@lbl.gov, ORuebel@lbl.gov

A number of features of our power distribution grid make it particularly vulnerable to cyber attacks. By installing micro phasor measurement units (µPMUs) in key locations in the electric distribution system and evaluating the data from them, we aim to design and implement a measurement network that can detect and report the impact of cyber attacks. The data collected by these units supports a variety of projects that aim to determine whether refined measurement of voltage phase angles can enable advanced diagnostic, monitoring, and control methodologies in distribution systems, and to begin developing algorithms for diagnostic applications based on µPMU data. Contact: sppeisert@lbl.gov

Tools, use cases and guides for getting started with machine learning.