Machine Learning for Science




About Machine Learning at Berkeley Lab

At Berkeley Lab, computer scientists, mathematicians, and domain scientists are collaborating to turn burgeoning datasets into scientific insights through machine learning.

Machine learning is a branch of artificial intelligence that works by making inferences from raw data using sophisticated algorithms and powerful computers. For online shoppers, that means better "you might also like..." suggestions, but for scientists, machine learning tools can reveal profound insights hiding in ballooning datasets.

With five DOE national user facilities (for nanotechnology science, high-performance computing, synchrotron x-ray research, networking, and genomics), world-class applied mathematics, computer and computational science, and a pool of scientific talent that has produced 13 Nobel laureates, scientific machine learning has found fertile ground at Berkeley Lab.

From Growing Datasets to Scientific Insights

Thanks to better instruments, including technologies developed at the Lab, we can see things at microscopic and atomic scales. We can measure vibrations imperceptible to the human eye and capture high-resolution images of objects millions of light years away. But those instruments also produce far larger datasets than ever before. The Large Synoptic Survey Telescope (LSST) will produce 20 terabytes of data every night, about 60 petabytes over its lifetime. The Large Hadron Collider will generate even more, with 50 petabytes in 2018 alone and 500 petabytes by 2024 (not including the 900 petabytes from past experiments). Conventional data analysis alone can't keep up.

With machine learning (ML), models are automatically derived from data. These models can be used to identify features, reduce complexity, and control experiments. But scientists need to explain their findings, so Berkeley Lab's research into machine learning builds on its foundational work in mathematics to develop methods that are consistent with physical laws, robust in the presence of noisy or biased data, and capable of being interpreted and explained in scientifically meaningful ways.

A Nexus for Machine Learning

Using ML in more than 100 different projects, Berkeley Lab scientists have tracked atomic particles, searched for better battery materials, analyzed traffic patterns, improved crop yields, pinpointed extreme weather in exascale climate simulations, and pieced together metagenomic puzzles from billions of DNA fragments. And we're just getting started.

As a Department of Energy National Laboratory, we also develop and share the algorithms, software, tools and libraries that are foundational to scientific machine learning. We gather, organize and store huge scientific datasets in areas such as materials, energy, environment, biology, genomics, and astronomy. And we develop tools and advanced networking facilities to make these datasets more searchable and accessible using (what else?) machine learning.