Machine Learning for Science

From Data to Scientific Insights

At Berkeley Lab, computer scientists, mathematicians, and domain scientists from across the Lab are collaborating to turn burgeoning datasets into scientific insights through machine learning.

Machine learning methods make inferences from raw data using sophisticated algorithms and powerful computers. For online shoppers, that means better "you might also like..." suggestions, but for scientists, machine learning tools can reveal profound insights hiding in ballooning datasets.

Thanks to better instruments, including technologies developed at the Lab, we can see things at a microscopic and atomic scale. We can measure vibrations imperceptible to the human eye and capture high-resolution images of objects millions of light years away. But those instruments produce vastly larger datasets than ever. The Large Synoptic Survey Telescope (LSST) will produce 20 terabytes of data every night, about 60 petabytes over its lifetime. The Large Hadron Collider will produce even more, with 50 petabytes in 2018 alone and 500 petabytes by 2024 (not including the 900 petabytes from past experiments). Conventional data analysis alone can't keep up.

With machine learning (ML), models are automatically derived from data. These models can be used to identify features, reduce complexity, and control experiments. But scientists need to explain their findings, so Berkeley Lab's research into machine learning builds on its foundational work in mathematics to develop methods that are consistent with physical laws, robust in the presence of noisy or biased data, and capable of being interpreted and explained in scientifically meaningful ways.

Using ML in over 100 different projects, Berkeley Lab scientists have tracked atomic particles, searched for better battery materials, analyzed traffic patterns, improved crop yields, pinpointed extreme weather in exascale climate simulations, and pieced together metagenomic puzzles from billions of DNA fragments. And we're just getting started.

Berkeley Lab: A Nexus for Machine Learning

With five DOE national user facilities (for nanotechnology science, high-performance computing, synchrotron x-ray research, networking, and genomics), world-class applied mathematics, computer and computational science, and a pool of scientific talent that has produced 13 Nobel laureates, Berkeley Lab offers fertile ground for scientific machine learning.

As a Department of Energy National Laboratory, we also develop and share the algorithms, software, tools and libraries that are foundational to scientific machine learning. We gather, organize and store huge scientific datasets in areas such as materials, energy, environment, biology, genomics, and astronomy. And we develop tools and advanced networking facilities to make these datasets more searchable and accessible using (what else?) machine learning.


Twinkle, Twinkle Little Star, Code Will Tell Us What You Are

Berkeley Lab researcher wins machine-learning competition for upcoming survey telescope data

January 25, 2019

Simulations on the Cheap

ExaLearn project's machine learning will interpolate details from sparse data. First use: Cosmology.

March 10, 2019

ClimateNet: Learning on a Global Scale

Bringing ML tools to climate and weather science

February 25, 2019