Machine Learning for Science
AI-Fueled Software Reveals Accurate Protein Structure Prediction
Berkeley Lab researchers helped validate new algorithm, RoseTTAFold
September 7, 2021
The dream of predicting a protein's shape just from its gene sequence is now a reality, thanks to an artificial intelligence (AI) algorithm recently validated by a research collaboration including Berkeley Lab’s Molecular Biophysics & Integrated Bioimaging Division. Predicting protein shapes is a long-sought breakthrough for structural biologists because it offers a key to understanding protein functions, which could accelerate treatments for diseases like cancer and COVID-19.
About Machine Learning at Berkeley Lab
Machine learning is a promising branch of artificial intelligence that Berkeley Lab scientists develop and employ in hundreds of projects every day. Our researchers track atomic particles, search for better battery materials, analyze traffic patterns, improve crop yields, pinpoint extreme weather in exascale climate simulations, and piece together metagenomic puzzles from billions of DNA fragments. They do so using tools, technology, computing and networking resources, and advanced mathematics, much of it developed by Berkeley Lab scientists.
A Powerful Scientific Tool
Machine learning is a branch of artificial intelligence that makes inferences from raw data using sophisticated algorithms and powerful computers. For online shoppers, that means better "you might also like..." suggestions. But for scientists, machine learning tools can reveal profound insights hiding in ballooning datasets.
Thanks to better instruments, including technologies developed at Berkeley Lab, we can see things at a microscopic and atomic scale, measure vibrations imperceptible to the human eye, and capture high-resolution images of objects millions of light-years away. But those instruments produce far larger datasets than ever before. The Large Synoptic Survey Telescope (LSST) will produce 20 terabytes of data every night, about 60 petabytes over its lifetime. The Large Hadron Collider has already produced 900 petabytes of data (50 petabytes in 2018 alone) and is expected to produce another 500 petabytes by 2024. Conventional data analysis alone can't keep up.
Machine learning techniques, often tightly integrated with high-performance computing (HPC), can automatically derive models from that data. These models can then be used to identify features, reduce complexity, and control experiments.
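As a toy illustration of what deriving a model from data can mean, the sketch below fits a straight line to a handful of measurements by ordinary least squares and then uses the fitted model to predict a new value. This is not a Berkeley Lab method: the function names `fit_line` and `predict` and the sample measurements are hypothetical, chosen only to show the idea at the smallest possible scale.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept to the data by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical, made-up measurements (not real experimental data).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

model = fit_line(xs, ys)
print("slope, intercept:", model)
print("prediction at x = 5:", predict(model, 5.0))
```

Real scientific workflows use far richer models and far larger datasets, but the pattern is the same: the parameters are learned from the measurements rather than written by hand.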
Math, Software, Tools to Spur Innovation
Berkeley Lab's research into machine learning builds on its foundational work in mathematics to develop methods that are consistent with physical laws, robust in the presence of noisy or biased data, and capable of being interpreted and explained in scientifically meaningful ways.
As a Department of Energy National Laboratory, we develop and share the algorithms, software, tools, and libraries that are foundational to scientific machine learning. We gather, organize, and store huge scientific datasets in areas such as materials, energy, environment, biology, genomics, and astronomy. By coupling these datasets with high-performance computing optimized for machine learning and the advanced networking capabilities of Berkeley Lab’s national user facilities, NERSC and ESnet, researchers at the lab and across the DOE complex are cracking tough science problems with artificial intelligence techniques.