Projects

CAMERA

Center for Advanced Mathematics for Energy Research Applications


CAMERA is an integrated, cross-disciplinary center aimed at inventing, developing, and delivering the fundamental new mathematics required to capitalize on experimental investigations at scientific facilities. Jointly funded by the Office of Advanced Scientific Computing Research (ASCR) and the Office of Basic Energy Sciences (BES) within the US Department of Energy's Office of Science, CAMERA identifies areas in experimental science that can be aided by new mathematical insights, develops the needed algorithmic tools, and delivers them as user-friendly software to the experimental community. Led by James Sethian.

DAPHNE


The Deep and Autonomously Performing High-Speed Networks (DAPHNE) project aims to develop reliable and robust networks with guaranteed high-throughput data transfer and uninterrupted performance for science needs, while exploring smart contracts and blockchains as a means of reliable machine learning communication across distributed nodes. Supported by a DOE Early Career Award, this research couples deep learning methods with software-defined networking (SDN) to predict real-time network behavior and avoid data traffic congestion or degraded network performance. Led by Mariam Kiran.

MetaBio IDS

The overarching objective of this interdisciplinary science project is to leverage new theory and observations in land, atmosphere and space-based research to accurately partition global carbon fluxes between terrestrial ecosystems and the atmosphere at high spatial and temporal resolution. Machine learning methods, in particular simple and deep neural networks and generalized additive models, have proven to be powerful tools for doing so. We employ them both to diagnose biases in global land surface models and to derive new information from time series of carbon fluxes between ecosystems and the atmosphere provided by distributed sensing networks such as AmeriFlux. Doing so both provides novel model diagnostics and amplifies the impact and utility of DOE investments in observational platforms. Led by Trevor Keenan.

Science Search


Next-generation scientific discoveries rely on the insights we can derive from the large amounts of data that are produced through simulations and experimental and observational facilities. Today, however, data is accessed and analyzed primarily by those who generate or produce it, since it is difficult to search for and find relevant data sets. The goal of Science Search is to use machine learning techniques to generate automated metadata that will enable search on a range of scientific datasets. Enabling search on data will accelerate scientific discoveries through virtual experiments and multidisciplinary and multimodal data assimilation. Led by Katie Antypas.

ExaLearn


ExaLearn is an ECP co-design center working towards exascale machine-learning software for use by ECP applications projects, other ECP co-design centers, and U.S. Department of Energy (DOE) experimental facilities and computing facilities. Berkeley Lab is one of eight DOE national laboratories collaborating on the R&D process, which will produce a scalable and sustainable machine learning software framework that allows application scientists and the applied mathematics and computer science communities to engage in co-design for learning. The center will also collaborate with ECP PathForward vendors on the development of exascale machine-learning software.

Data Analytics for Commercial Buildings


State-of-the-art analytics software and modeling tools can provide valuable insights into efficiency opportunities. However, prior research has shown that key barriers include relatively limited data sources (smart meters and weather being most common in commercial tools) and reliance upon user-provided inputs for which default values may be the fallback. There is great opportunity to apply techniques based on multi-stream data fusion and machine learning to overcome these challenges. Led by Jessica Granderson.

IDEAL

Image across Domains, Experiments, Algorithms and Learning

The high data-throughput of scientific instruments has made image recognition one of the most challenging problems in scientific research today. Supported by a U.S. DOE Early Career Award, IDEAL focuses on computer vision and machine learning algorithms and software to enable timely interpretation of experimental data recorded as 2D or multispectral images. Led by Daniela Ushizima.

AR1K Project

Engineering Agriculture through Machine Learning in BioEPIC


Scientists at the Department of Energy’s Lawrence Berkeley National Laboratory, working with the University of Arkansas and Glennoe Farms, are bringing together molecular biology, biogeochemistry, environmental sensing technologies, and machine learning to help revolutionize agriculture and create sustainable farming practices that benefit both the environment and farms. If successful, we envision being able to reduce the need for chemical fertilizers and enhance soil carbon uptake, thus improving the long-term viability of the land, while at the same time increasing crop yields. Funding: Lab Directed Research and Development (LDRD) grant. Led by Ben Brown.

Feedstock to Function


Improving biobased product and fuel development through Adaptive Technoeconomic and Performance Modeling. The purpose of this project is to develop a comprehensive ‘Feedstock to Function’ tool (F2FT) that harnesses the power of machine learning to predict properties of high-potential molecules (fuels, fuel co-products, and other bioproducts) derived from biomass and to evaluate the cost, benefits, and risk of promising biobased molecules or biofuels to enable faster, less expensive bioprocess optimization and scale-up. Led by Vi Rapp.

CIGAR

Cybersecurity via Inverter-Grid Automatic Reconfiguration


This project is performing R&D to enable distribution grids to adapt to resist a cyber-attack by (1) developing adaptive control algorithms for DER, voltage regulation, and protection systems; and (2) analyzing new attack scenarios and developing associated defensive strategies. Funding: DOE CESER's CEDS program. Led by Sean Peisert and Dan Arnold.

Deep Learning for Science


The DL4SCI LDRD is examining three key CS challenges: handling complex datasets, developing interpretable methods, and improving performance and scaling. This work is motivated by realistic problems that span a number of Berkeley Lab divisions and science areas: predicting cosmological constants from 3D simulations (cosmology), obtaining sub-pixel accuracy for electron counts (electron microscopy), and classifying one- vs. two-photon particle showers (nuclear physics). Funding: Lab Directed Research and Development (LDRD) grant. Led by Prabhat.

Route Choice Behavior at Urban Scale


Knowing how individuals move between places is fundamental to advancing our understanding of human mobility at urban scale, planning infrastructure, and developing transportation systems. Current route-choice models used in transportation planning are based on the widely accepted assumption that people follow the minimum cost path. Despite empirical evidence that contradicts this assumption, it has remained the best available solution for lack of better sources of information. Today, individual traces from location-based services collected by smart devices give us an unprecedented opportunity to learn how citizens organize their travel plans into a set of routes, and how similar behavior patterns emerge among distinct individual choices. Led by Marta Gonzalez.
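As an illustration of the minimum cost path assumption mentioned above, here is a minimal sketch using Dijkstra's algorithm. The function name, the toy network, and the travel times are hypothetical and chosen only for illustration; they are not taken from the project.

```python
import heapq


def min_cost_path(graph, source, target):
    """Return (cost, path) for the cheapest route, or (inf, []) if unreachable.

    `graph` maps each node to a dict of {neighbor: non-negative edge cost}.
    """
    best = {source: 0.0}      # cheapest known cost to reach each node
    prev = {}                 # predecessor on the cheapest known route
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break             # the first time the target is popped, its cost is final
        if cost > best.get(node, float("inf")):
            continue          # stale heap entry, a cheaper route was already found
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                prev[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    if target not in best:
        return float("inf"), []
    # Walk predecessors back from the target to rebuild the route.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return best[target], path


# Hypothetical toy network; edge costs are travel times in minutes.
TOY_NETWORK = {
    "home": {"a": 4.0, "b": 2.0},
    "a": {"work": 5.0},
    "b": {"a": 1.0, "work": 8.0},
    "work": {},
}

cost, route = min_cost_path(TOY_NETWORK, "home", "work")
print(cost, route)
```

A model built on this assumption predicts a single route per origin-destination pair, whereas observed traces may show travelers spreading their trips across several routes.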

Machine Learning to Predict Crop Performance Under U.S. Climate Scenarios


The goal of this project is to understand both long-term trends and year-to-year stability in energy crop yields under future climate conditions and, ultimately, how different engineered traits may translate into different responses to abiotic stressors related to climate change. Researchers have evaluated a number of supervised ML algorithms to determine which are most effective at using historical sorghum yield data to predict energy sorghum yield based on different climatological, soil, and other geographical data. They have developed a model that uses CMIP5 climate data (ensemble model result) to predict sorghum yields across the U.S. through 2099. The project is funded as part of JBEI’s Life-Cycle, Economics & Agronomy Division (LEAD) as a collaboration between Berkeley Lab and Argonne National Lab. Research is ongoing. Led by Corinne Scown.

Occupancy-Responsive Model Predictive Control at Room, Building, and District Levels


The project develops and field-tests an open-source computational framework that implements model predictive control (MPC) at three scales: room, building, and district (a group of buildings), to optimize building operation and thus reduce energy use and improve occupant comfort. An accurate prediction of internal heat gains and occupants’ thermal demands is a prerequisite for the development and implementation of MPC. Machine learning techniques are used to: (1) infer occupant count from WiFi data; (2) recognize electricity consumption patterns; and (3) predict occupancy, plug-load and internal heat gains. Led by Tianzhen Hong.

Machine Learning to Extract Features from Massive Distributed Acoustic Sensing Data


Machine learning has transformed the time-consuming task of developing custom analysis tools into a feasible computational task. As the operator of some of the largest high-performance computing facilities, DOE could significantly improve these machine learning tools and accelerate scientific discoveries. One of the key challenges in this process is the interpretability of the results from the automated learning process. For example, deep neural networks are known to be effective in extracting signals, but their results are notoriously hard to understand. In this work, we plan to extend key ideas from statistical mechanics to improve understanding and guide the design of well-known machine learning algorithms. The new approaches will not only produce more interpretable results but also dramatically increase the convergence rate of the associated learning algorithms. The design of these tools will be guided by the requirements of the on-going Distributed Acoustic Sensing project at Berkeley Lab. Funding: Lab Directed Research and Development (LDRD) grant. Led by John Wu.

Joint Social Sequence Analysis to Predict Travel Behavior


The analysis of categorical and longitudinal time series, called sequences, has great value for social science applications that study the span of life trajectories, careers, decision points, and family structure. The goal of this project is to investigate lifelong trajectory dynamics based on demographic characteristics, education and other lifestyle variables. By analyzing entire lifelong sequences, it is possible to discover representative patterns from the overall life trajectory of a given individual’s characteristics and the pathway through which one arrives at a given state, travel decision or mobility behavior. In contrast to traditional big-data approaches that aim to solve the "largeness" of numerical data, we aim to tackle the "largeness" that arises from data types, dimensionality, and heterogeneity in data quality that are common in categorical social sequences. This work has resulted in one IEEE conference publication and one journal article published by the Association for Computing Machinery (ACM). Led by Ling Jin, Anna Spurlock, and Annika Todd.

The Chemical Universe through the Eyes of Generative Adversarial Neural Networks


This project is developing generative machine learning models that can discover new scientific knowledge about molecular interactions and structure-function relationships in chemical sciences. The aim is to create a deep learning network that can predict properties from structural information but can also tackle the “inverse problem,” that is, deducing structural information from properties. To demonstrate the power of the neural network, we focus on bond breaking in mass spectrometry, combining experimental data with HPC computational chemistry data. Funding: Lab Directed Research and Development (LDRD) grant. Led by Wibe Albert de Jong.

Interactive Machine Learning for Tomogram Segmentation and Annotation


Funding: Lab Directed Research and Development (LDRD) grant. Led by Nicholas K. Sauter.

Combining Data-driven and Science-based Generative Models

This project investigates the many connections between data-driven and science-driven generative models. When do scientists use physical models to create synthetic data for science applications? When do we supplement them with data-driven machine learning models? Conversely, can researchers use physical models to improve on the current data-driven generative models in machine learning? Funding: Lab Directed Research and Development (LDRD) grant. Led by Uros Seljak.