Supercomputing Rises to Grand Scientific Challenges

Advancing both scientific knowledge and supercomputing capabilities, the Laboratory’s Computing Grand Challenge Program turns 20 this year.

Supercomputer racks in different rooms.
Through Livermore’s Grand Challenge Program, researchers are awarded time allocations on Livermore’s unclassified systems, which are ranked among the world’s top supercomputers.

From workhorse commodity clusters to the world’s fastest supercomputer, high-performance computing (HPC) investments drive research and programmatic work throughout Lawrence Livermore. The Laboratory delivers 3 exaFLOPS of computing power to a range of mission-driven efforts including stockpile modernization, nuclear counterproliferation, infrastructure resilience, cybersecurity, and more. (An exaFLOP is 1 quintillion, or 10¹⁸, floating-point operations per second.)

Most of Livermore’s HPC resources are dedicated to stockpile stewardship through the Advanced Simulation and Computing Program, though a portion is carved out for broader pursuits under the Multiprogrammatic and Institutional Computing (M&IC) Program. The two programs are complementary, with M&IC managing institutional allocations on unclassified systems.

Demand for these allocations is high, and each research team’s requirements are different. M&IC Director Becky Springmeyer points out, “Some researchers need more memory or scalability, while others need a consistent allocation of computing resources to move forward.” M&IC distributes allocations to Livermore researchers via avenues such as the Laboratory Directed Research and Development Program and the Computing Grand Challenge Program.

For 20 years, the Grand Challenge Program has awarded unclassified computing resources to teams with high-impact projects. (See S&TR, July/August 2011, A “Grand” Way to Visualize Science; and September 2015, Science on a Grand Scale.) This unique program is a peer-reviewed competition in which researchers submit proposals for time allocations, detailing how HPC will help achieve project goals. Many awardees have garnered external visibility, and with nearly 200 publications and invited talks since the program’s inception, all have contributed to scientific literature. Fred Streitz, the Grand Challenge Program’s first director, states, “The program highlights M&IC investments, and we’ve been able to increase the award allocations over time. Proposals have touched every corner of the Laboratory including engineering, hydrodynamics, cosmology, nuclear physics, quantum chemistry, and more.”

Unparalleled Computing

One reason the Grand Challenge has endured for two decades is the program’s unique qualities. Although other Department of Energy (DOE) laboratories provide nationwide HPC access for open science projects, none funnels computing resources into a program such as the Grand Challenge. Judy Hill, who assumed the program’s directorship in 2024, states, “This competition is dedicated to projects led by Livermore researchers, who can investigate problems they might not otherwise have the computing resources to tackle. Similar programs at other HPC facilities are typically not focused in this way. The investment on behalf of our own research community is commendable.”

Additionally, the Grand Challenge offers researchers an education—which is never truly complete—in HPC. Streitz compares computing at regular and extreme scales to driving a car versus flying a jet, noting, “In addition to the program enabling incredible science, it continues to develop user expertise on these state-of-the-art machines. This program is a training ground for teams to use new hardware and think on a grand scale.” Allocations are awarded for only a 12-month period, so teams must be prepared to take immediate advantage of their assigned supercomputer. In 2025, Grand Challenge awardees are using four unclassified systems: Tuolumne, Lassen, Dane, and the recently retired Ruby. Teams can request time on the system or systems that best match their needs.

The program also boosts the Laboratory’s smaller-scale research activities. “Through the Grand Challenge,” notes M&IC Deputy Director Greg Tomaschke, “areas that don’t receive the same level of funding as Strategic Deterrence or the National Ignition Facility can still access the necessary computing resources to make significant breakthroughs in their fields.”

Indeed, the Grand Challenge encourages researchers to think outside the proverbial box. Erik Draeger, who led the program from 2019 to 2024, states, “Many projects can enable discovery in areas outside of their original scope. Accessing substantial computing resources for interesting science campaigns provides an intellectual outlet that makes the Laboratory an exciting and dynamic place to work.”

Tough Competition

Not every team that applies for the Grand Challenge is chosen. The proposal process kicks off in late summer, with stringent reviews throughout the fall. The review committee consists of the program director and 20 to 30 Livermore staff from all scientific and engineering domains. Many committee members serve for several years, and reviewers from academia and other national laboratories are tapped for additional perspectives. “Partnerships are essential to our success in HPC, and the quality of the Grand Challenge is buoyed by having outside reviewers in the mix,” says Springmeyer.

The committee evaluates proposals based on multiple factors: significance and impact of science, significance and impact of computational approach, quality of HPC research plan, quality and extent of external collaborations, alignment with the Laboratory’s strategic science and technology vision, and past Grand Challenge performance. “Successful proposals need an interesting application capable of using HPC resources in a nontrivial way, such as parallel strong scaling or an innovative workflow,” explains Draeger. “The project should enhance the Laboratory’s visibility while making a meaningful contribution to science.” (See the box below.) 

Computer simulation of a human arterial system.
Leveraging Grand Challenge allocations, Livermore’s collaboration with Duke University generated this simulation of full-body arterial geometry. (Image by Liam Krauss.)

Collaboration Is Key

A common thread among Grand Challenge projects is collaboration, especially with academia, where professors and students can participate in the Laboratory’s unclassified work and gain experience with top supercomputers. “Students have seemingly endless energy and time, and their contributions are incredibly valuable,” says awardee Erik Draeger.

Draeger has collaborated with Duke University on a series of Grand Challenge projects related to understanding human physiology and improving patient outcomes. The relationship stems from Draeger’s early mentorship of Amanda Randles, a former Lawrence Fellow who is now associate professor of Biomedical Sciences at Duke. Thanks in part to Grand Challenge resources, Randles’s students have written dissertations and published papers while learning about a national laboratory.

The benefits flow in the other direction, too. Draeger states, “Through this collaboration, we can showcase the Laboratory’s unique resources in a way that easily captures the public imagination. We have a window into how the next generation of researchers sees HPC and engages with new technologies like AI and cloud computing. As Livermore envisions the HPC center of the future, this insight will make sure we’re well-aligned with the newest ideas and capabilities.”

The Livermore–Duke collaboration originally focused on large-scale arterial simulations, which required careful system load balancing and memory management to achieve exceptional resolution. This work evolved into red blood cell modeling to explore cancer metastasis and other disorders, then led to fluid–structure interactions that required sophisticated multiphysics algorithms to minimize time to solution. Now, Draeger’s team is combining wearable data, such as from fitness trackers, with predictive simulations to monitor patients’ circulatory health. The goal is for AI models to build patient-specific hemodynamic simulations that help determine risk and detect disease.

Bar chart of computing resource allocations over a nine-year period.
The amount of computing resources allocated to Grand Challenge projects has generally grown since the program’s inception, with the last 10 years’ totals shown here in teraFLOPS (10¹² floating-point operations per second). The 2025 introduction of the Tuolumne supercomputer for unclassified research dramatically increased the total.

Applicants must describe their plans for two allocation tiers. The committee chooses which proposals are likely to have the highest visibility and impact, and these Tier 1 awardees receive 100,000 to 200,000 node hours. Tier 2 projects are given 25,000 to 50,000 node hours. This year, the committee selected five Tier 1 projects and 17 Tier 2 projects, awarding more than 5 million node hours. (In distributed computing environments, nodes are hardware devices containing multiple kinds of processors, and a node hour measures resource usage. For example, 2 node hours can mean computation on 1 node for 2 hours or 2 nodes for 1 hour.) Grand Challenge teams are invited to present their work to the rest of the Laboratory in quarterly seminars.
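
The node-hour arithmetic is easy to sketch. The short Python example below, with purely hypothetical job sizes and budgets, illustrates how usage accumulates and how a Tier 1 award on the order described above might be divided:

    # Minimal sketch of node-hour accounting; all figures are hypothetical,
    # not actual Grand Challenge numbers.
    def node_hours(nodes: int, hours: float) -> float:
        """One node hour is one node running for one hour: usage = nodes x hours."""
        return nodes * hours

    # Two ways to spend 2 node hours, as described above:
    assert node_hours(nodes=1, hours=2) == node_hours(nodes=2, hours=1) == 2

    # A hypothetical 150,000-node-hour Tier 1 budget could cover, for example,
    # fifty 64-node production jobs of 24 hours each, with the remainder
    # reserved for scaling tests and debugging runs.
    budget = 150_000
    production = 50 * node_hours(nodes=64, hours=24)  # 76,800 node hours
    print(f"Remaining: {budget - production:,.0f} node hours")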

The program’s proposal rules and selection methods have changed over the years. Streitz instituted a formal process and schedule along with the tiers. Draeger refined the latter by requiring proposals to assume Tier 2 allocations while also making the case for Tier 1, which helped level the playing field. The committee has also adjusted the scoring criteria to more carefully consider a project’s likelihood of success and level of impact.

Computing (R)evolution

As HPC technology has evolved, so too has the Grand Challenge. Over the program’s lifetime, the computing power of Livermore’s unclassified HPC resources has increased 10,000-fold. Processing speed that used to be expressed in gigaFLOPS (10⁹) is now clocked in petaFLOPS (10¹⁵).

When Streitz joined the Laboratory 25 years ago, the Blue Gene/L system was the pinnacle of computing at 136 teraFLOPS. “A Grand Challenge allocation on Tuolumne is the equivalent of multiple simulation runs on the full Blue Gene/L machine. Those runs can be performed easily on a fraction of today’s machines,” he says. “We allocate monumental levels of computing resources compared to even 10 years ago.”

In just the past decade, Grand Challenge awardees transitioned from terascale systems like Catalyst to petascale systems like Lassen, which was the world’s 11th fastest computer when it came online in 2018. This year’s awardees are learning how to use Tuolumne, currently the 12th fastest computer in the world and 13 times more powerful than Lassen. As researchers begin using Tuolumne, Hill says, “We expect they will be able to solve problems at higher resolution, with higher fidelity models, and faster in terms of time to solution.” Tomaschke, who manages the allocations, adds, “We can’t expect a Tuolumne-level increase every year, but Livermore’s long-standing support for M&IC and the Grand Challenge Program will ensure that researchers continue to have access to world-class computing resources.”

Scale is not the only moving target. Recent years have seen HPC shift to a heterogeneous paradigm, wherein many computing architectures combine central processing units (CPUs) and graphics processing units (GPUs). The latest generation of Livermore’s supercomputers introduced the accelerated processing unit (APU), which combines CPUs and GPUs for even faster processing. Cloud computing and AI-optimized hardware have also entered mainstream HPC configurations. Now, instead of pitching their projects in terms of CPU and GPU hours, Grand Challenge applicants calculate their requests according to node hours. (See the box below.)

Grand Challenge teams epitomize the Laboratory’s multidisciplinary culture as domain scientists and HPC experts traverse this learning curve together. The dual outcome is the point: Grand Challenge projects advance both scientific knowledge and HPC capabilities.

Measuring Physics Frontiers

Theoretical particle physics is a mainstay of Livermore’s fundamental science research. Pavlos Vranas’s team pursues three lines of inquiry in this field: quantum chromodynamics (QCD), the study of interactions between subnuclear particles, such as quarks and gluons; the Higgs boson, an elementary particle discovered at the Large Hadron Collider in 2012; and composite dark matter theory, which addresses one of the enduring mysteries of the universe. (For related work, see S&TR, April/May 2014, Nuclear Fusion through a Computational Microscope; July 2020, Tuning into Dark Matter; and the article Casting a Light on Dark Matter in this issue.) The only feasible way to examine these theories is to discretize space–time on a mesh and calculate particle interactions with large-scale numerical simulations. For a decade, the team has leaned on Grand Challenge allocations to compute what cannot be done with pencil and paper.

In collaboration with universities, national laboratories, and industry partners, the team’s 2024 project focused on stealth dark matter (SDM), a Livermore-developed theory that dark matter is a composite substance. SDM follows the same pattern as QCD. Just as protons and neutrons are composed of smaller quarks and gluons, dark matter particles may be made up of similar constituents. “This idea is intuitive because the density of dark matter in the universe is very similar to that of visible matter (QCD). This insight may be a hint that visible and dark matter have similar origin and properties,” says Vranas. “If we’d applied for computing time outside the Laboratory, reviewers may have rejected it in favor of more incremental science research. The Grand Challenge is willing to do the big science of the unknown.”

Grid with multicolored lattices with arrows connecting parts of the grid to GPUs and CPUs.
A particle physics team used its Grand Challenge allocation on the petascale Lassen supercomputer for studies of quantum chromodynamics and dark matter. Capturing the necessary physics requires huge lattices with millions of points—shown here with dots indicating locations of matter in space–time and waves indicating forces acting on the matter. Each lattice is divided into sublattices (left) that are then mapped to graphics processing units (GPUs, right) for parallel processing. Lassen’s architecture connects four GPUs to two central processing units (CPUs) within a node.

The team further developed a related theory, hyper stealth dark matter (HSDM), that is consistent with known observations. The Big Bang produced gravitational waves that continue to reveal information about the early universe. HSDM predicts that the thermodynamic transition—from plasma to cold dark matter—could be responsible for some of these waves. The team used lattice QCD codes to simulate this phase transition and compute the range of dark matter masses that make it possible. HSDM has not yet been validated with experiments but could be, thanks to the team’s Grand Challenge results. Observatories can use the data to look for the phase transition at certain frequencies.

With a Tier 1 allocation on Lassen, the team applied lattice gauge theory methods to perform calculations on 4D lattices. Matter particles are defined on a lattice’s vertices, while force particles are defined on the connections between vertices. Capturing all the necessary physics requires huge lattices with millions of points. “The experience whets our appetite for longer simulations and larger lattices. We’ll saturate as much of a supercomputer as we can,” says Vranas.
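
To make the lattice picture concrete, the toy Python sketch below places matter fields on the vertices (sites) of a small 4D lattice and force fields on the links connecting neighboring sites. It is a deliberate oversimplification, not the team’s code: in production lattice QCD, link variables are SU(3) matrices, matter fields are spinors, and lattices are vastly larger.

    import numpy as np

    # Toy 4D lattice layout: matter on sites, forces on links (not the team's code).
    L = 8                        # sites per dimension; production lattices are far larger
    dims = (L, L, L, L)          # discretized space-time

    rng = np.random.default_rng(0)
    matter = rng.normal(size=dims)        # one scalar degree of freedom per site
    links = rng.normal(size=(4, *dims))   # one link variable per direction per site

    def hop(field, mu):
        """Neighboring site in direction mu, with periodic boundaries."""
        return np.roll(field, shift=-1, axis=mu)

    # A nearest-neighbor interaction couples each site to its neighbor in each
    # direction through the link between them.
    action = sum(np.sum(matter * links[mu] * hop(matter, mu)) for mu in range(4))
    print(f"{L**4:,} sites; toy interaction term = {action:.3f}")

Because such lattices decompose naturally into sublattices, as the figure above shows, each piece can be assigned to a different GPU for parallel processing.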

In 2025, Vranas and colleagues have returned to QCD, augmenting studies of composite particle interactions with machine learning (ML). Like other Grand Challenge teams, they had the chance to run simulations on El Capitan before the exascale system was locked down for classified projects. Vranas recalls, “Running the first lattice simulation at more than two exaFLOPS on the fastest supercomputer in the world was exhilarating. The Grand Challenge Program led us to that moment.”

Architecture Adaptations

A substantial challenge—and opportunity—in high-performance computing (HPC) is the portability of software and application codes to new architectures. “Supercomputers, similar to any other machinery, aren’t built to last for years and years. After five to seven years, increasing hardware failures start to occur. Newer equipment can provide significant benefits from available computing power or energy efficiency,” explains Grand Challenge Director Judy Hill. Former Director Fred Streitz adds, “Pushing science forward at the extremes of computing is more important with the advent of GPUs (graphics processing units) and different flavors of network connections. Programming for these computers is more complicated now.”

Just as different companies produce many types of laptops and smartphones, HPC vendors produce their own unique hardware components. Grand Challenge teams whose projects span multiple allocations contend with this incompatibility. For example, a team may have developed their simulation code when Livermore had only central processing unit (CPU)-based supercomputers. They may have successfully ported the code to the Lassen system’s NVIDIA-built GPUs but need to adjust further for Tuolumne’s processors, which were built by AMD.
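
One common way to blunt this incompatibility, offered here as a general illustration rather than a description of any Livermore code, is to write numerical kernels against the NumPy array interface and select the backend at runtime. Libraries such as CuPy expose a largely NumPy-compatible API for GPUs, so the same kernel can run on a CPU or a GPU unchanged:

    import numpy as np

    # Portability sketch: pick the array backend at runtime and write the
    # kernel once. (Illustrative pattern only, not how Livermore codes are built.)
    try:
        import cupy as xp        # GPU backend, if CuPy is installed
        ON_GPU = True
    except ImportError:
        xp = np                  # CPU fallback
        ON_GPU = False

    def kinetic_energy(velocities, masses):
        """Backend-agnostic kernel: accepts NumPy or CuPy arrays alike."""
        return 0.5 * xp.sum(masses * xp.sum(velocities**2, axis=1))

    v = xp.asarray(np.random.default_rng(1).normal(size=(1000, 3)))
    m = xp.ones(1000)
    print(f"GPU backend: {ON_GPU}, kinetic energy = {float(kinetic_energy(v, m)):.2f}")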

Grand Challenge teams must factor in their portability needs and find ways to maximize code and machine efficiency. One 2025 team is doing this work simultaneously, running molecular dynamics simulations on Lassen while testing their machine-learning workflow on Tuolumne. Over the project’s lifetime, the team has used various commercial products, open-source software, and proprietary codes to find the best combination for deployment on evolving HPC hardware.

Given the vast development landscape, codes and computers are not always in sync. For instance, commercially available molecular dynamics codes only began accommodating GPUs after the trend gained traction. Computational biologist Helgi Ingólfsson notes, “Livermore has machines with a lot of GPUs, so the problem became how to fully use a machine with a commercial code that wasn’t ready for it. The work requires a large team to keep codes running consistently and performing efficiently.”

Sizing Up Seismic Waves

Livermore’s seismology research encompasses not only earthquakes but also volcanic eruptions, meteor strikes, mining activity, infrasound (sound waves with frequencies below the lower limit of human audibility), and above- or underground explosions. High-resolution simulations assist with monitoring the effects and evaluating the risks of these hazards, thus directly supporting national security and nonproliferation missions. Arben Pitarka’s Grand Challenge team has been investigating seismic wave generation and propagation from different sources using the Laboratory-developed SW4 (Seismic Waves, 4th Order) code alongside experimental data.

SW4 models seismic activity in 3D at fourth-order spatiotemporal accuracy. (See S&TR, January/February 2018, Modeling Seismoacoustic Waves of an Explosive Nature.) “SW4 has been essential to understanding the physical basis of explosion-generated seismic waves, and through the Grand Challenge Program we demonstrated its efficiency on parallel computing hardware,” states Pitarka. The U.S. Air Force Research Laboratory uses SW4, and the code is a core part of DOE’s EQSIM earthquake simulation platform, developed during the Exascale Computing Project.
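
For intuition about what fourth-order accuracy buys, the minimal Python sketch below advances a 1D scalar wave equation using the standard fourth-order central stencil in space. It is a deliberately simple toy: SW4 itself solves the 3D elastic equations and is fourth order in time as well, whereas this sketch uses second-order leapfrog stepping.

    import numpy as np

    # 1D wave equation u_tt = c^2 u_xx with a fourth-order spatial stencil.
    # Illustrative toy only; SW4's 3D elastic solver is far more sophisticated.
    n, c = 400, 1.0
    dx = 1.0 / n
    dt = 0.5 * dx / c                         # conservative, stable time step
    x = np.linspace(0.0, 1.0, n)
    u_prev = np.exp(-200 * (x - 0.5) ** 2)    # Gaussian pulse initial condition
    u = u_prev.copy()

    def u_xx(u, dx):
        """Fourth-order central stencil: (-1, 16, -30, 16, -1) / (12 dx^2)."""
        lap = np.zeros_like(u)
        lap[2:-2] = (-u[:-4] + 16*u[1:-3] - 30*u[2:-2] + 16*u[3:-1] - u[4:]) / (12 * dx**2)
        return lap

    for _ in range(300):                      # leapfrog time stepping
        u_next = 2*u - u_prev + (c * dt)**2 * u_xx(u, dx)
        u_prev, u = u, u_next

    print(f"peak amplitude after 300 steps: {u.max():.3f}")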

Using Tier 2 allocations on Dane and Lassen, the Grand Challenge team is expanding the frequency range of the code’s simulations—a task requiring substantial computational power. Pitarka notes, “We are running on both CPU- and GPU-based platforms, which improves SW4’s performance. The broader frequency range takes into account the effect of nonlinear soil response to ground motion and seismic events.”

Additional development plans include applying ML techniques to improve the accuracy and performance of the physics-based models, with SW4’s forward wave propagation capabilities helping to validate and test this ML-based approach. Another goal is to attain unprecedented high-frequency modeling (up to 200 hertz, or 200 wave cycles per second) of explosion sources on GPUs. The results can inform DOE-relevant efforts such as nuclear or chemical explosion monitoring and seismic source discrimination as well as more general engineering applications.

Elevation map indicating earthquake locations.
With Grand Challenge allocations and the SW4 (Seismic Waves, 4th Order) code, a Livermore team simulated ground motion (0 to 15 hertz, red lines) at nine locations for the DOE’s Source Physics Experiment (SPE). The SPE analyzed underground chemical explosions at the Nevada National Security Site, providing insight into how different geologic features respond to detonation.

Pitarka and colleagues have achieved seismic simulation milestones in previous Grand Challenge projects and on multiple supercomputers. For example, in 2018, they ran the largest Northern California earthquake simulation on the Laboratory’s Sierra system, resolving a 7.0-magnitude earthquake along the Hayward Fault to 10 hertz and 203 billion grid points—a new record for simulations of this region’s seismic activity. The group also supported DOE’s Source Physics Experiment with simulations of shear waves caused by underground chemical explosions. (See S&TR, January 2021, Seismic Sleuths Set Off the Source Physics Experiment.)

This evolving work to develop an HPC-based seismic wave simulation framework has attracted international attention. The team collaborates with researchers in France, Italy, and Japan in addition to key U.S. institutions, enhancing the Laboratory’s leadership in physics-based seismoacoustic modeling. Pitarka adds, “An important outcome made possible by the Grand Challenge Program is the training of several Livermore summer students who actively participated in these projects.”

Scaling Cancer Research

With a long history of mission-adjacent biomedical research, Livermore has played a prominent role in DOE’s ongoing collaboration with the National Cancer Institute. (See S&TR, May 2021, 60 Years of Cancer Research.) The collaboration combines HPC, experimental and patient data, and AI and ML techniques to identify new cancer therapeutics, predict malignant cell development, and improve cancer diagnostics. Ultimately, computing technologies and cancer research contribute to progress in both.

One team has taken advantage of Tier 1 and 2 allocations for research on RAS and RAF proteins, which are linked to nearly one-third of all human cancers. “RAS and RAF are a part of the main growth-signaling pathway in mammalian cells. When cancerous mutations are present, the pathway is rigged to remain open, so the cancer cells keep growing,” explains Helgi Ingólfsson, who co-leads the ADMIRRAL (AI-Driven Multiscale Investigation of the RAS/RAF Activation Lifecycle) project with Laboratory physicist Felice Lightstone and Dwight Nissley of the Frederick National Laboratory for Cancer Research. The team is modeling RAS–RAF interactions to pinpoint critical steps in this pathway. The results will help researchers design targeted drug interventions.

The key to RAS–RAF modeling is molecular dynamics (MD) simulations, which illuminate the behavior of atoms and molecules at fine timescales. Still, MD simulations alone are slow and insufficient for the scales involved. Ingólfsson points out, “Although this signaling pathway has been studied extensively, the research community lacks detailed mechanistic understanding at a molecular level.” So, the team developed a multiscale, ML-driven framework that resolves coarse and fine time steps within the simulation. (See S&TR, March 2019, Machine Learning on a Mission.)

Known as MuMMI (Multiscale Machine-learned Modeling Infrastructure), the framework determines the best use of computational resources while running optimized MD simulations. Computational biologist Tim Carpenter states, “Macroscale simulations are computationally cheap to run, but microscale simulations are expensive. Even with the Laboratory’s incredible HPC resources, we still must be prudent in how we use them.” MuMMI assists researchers in selecting specific regions for zooming in to finer detail, thus narrowing the scope and number of expensive simulations, which feed back into MuMMI to improve subsequent iterations.
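
The macro-to-micro selection loop can be summarized in a short sketch. In the Python example below, every function is a hypothetical stand-in rather than the actual MuMMI interface: a cheap macroscale screen covers many candidate regions, an ML scorer estimates which merit closer inspection, and only the top few are promoted to expensive microscale simulation:

    import heapq
    import random

    # Hedged sketch of ML-guided macro-to-micro selection; every function is a
    # hypothetical stand-in, not the real MuMMI API.
    def run_macro_sim(patch):
        """Cheap continuum-scale simulation of one membrane patch (stubbed)."""
        return {"patch": patch, "features": [random.random() for _ in range(4)]}

    def ml_interest_score(result):
        """ML surrogate estimating how informative a finer simulation would be (stubbed)."""
        return sum(result["features"])

    def run_micro_sim(result):
        """Expensive fine-grained follow-up simulation (stubbed)."""
        return {"patch": result["patch"], "detail": "fine-grained trajectory"}

    random.seed(0)
    candidates = (run_macro_sim(p) for p in range(10_000))   # cheap to screen
    selected = heapq.nlargest(16, candidates, key=ml_interest_score)
    micro_results = [run_micro_sim(r) for r in selected]
    # In MuMMI, results like these feed back to improve the next selection round.
    print(f"screened 10,000 patches; promoted {len(micro_results)} to microscale")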

MuMMI’s ML algorithms improve both simulation efficiency and supercomputer usage. “We can make decisions to optimize the utility of the machine. MuMMI launches the continuously updating model and keeps the entire HPC allocation fully occupied for high throughput,” says Ingólfsson. This scalability has enabled the team to run ensemble models that incorporate tens to hundreds of thousands of simulations at unprecedented spatiotemporal scales. For example, Carpenter notes, “We’re running simulations where the conditions are all slightly different and observing a continuum of protein and cell behaviors, which brings us closer to what happens experimentally and suggests new experiments. The ADMIRRAL project has identified aspects of the RAS protein that haven’t been studied before.”

The team is breaking new ground in computer science, too. “The Grand Challenge allowed us to conceive of and demonstrate a framework that manages multiscale simulations on a large computational scale and on machines with many GPUs. Scientists can now explore hypotheses that were out of reach before,” explains computer scientist Loïc Pottier. MuMMI relies on the publicly available GROMACS MD simulation code as well as two of the Laboratory’s open-source software tools co-developed for this project’s needs. For example, the Maestro software that manages HPC simulation workflows came out of the ADMIRRAL collaboration, while the award-winning Flux resource scheduling tool enabled MuMMI to scale to Livermore’s latest HPC architectures. (See S&TR, July/August 2022, Optimizing Workflow with Flux.)

Computer simulation of increasingly detailed images.
The MuMMI (Multiscale Machine-learned Modeling Infrastructure) framework helps researchers locate specific coarse-grained features to simulate at finer scales. Powered by machine-learning (ML) algorithms, this macro- to microscale framework speeds up simulation time while ensuring computational efficiency across a supercomputer’s nodes. In this example, areas at macroscale resolution (1,000 nanometers [nm]) are highlighted (left), then resolved to coarse grain at 30-nm resolution (middle). The scale is further refined to even more particles to reveal atomistic details (right). Results are then fed back into previous steps to improve the entire multiscale model upon iteration.

The Next 20 Years

If the only constant is change, then the next two decades of the Grand Challenge are sure to usher in fresh ideas. Springmeyer predicts, “These teams are raising the next generation of scientists and engineers, and together we’ll write a future that we can’t even imagine yet.” Draeger adds, “The Grand Challenge gives researchers critical energy to demonstrate the impact of new workflows and approaches. I think we’ll see a massive amount of innovation and novelty, greater even than what we’ve seen so far.”

The novelty will arrive on the computational front as well as the scientific. In the AI era, hardware advancements are no longer the only way to scale an application or improve its performance. “I think we’ll see a change in the types of Grand Challenge proposals submitted,” says Hill. “Researchers are already using machine learning as an accelerator of traditional simulation and replacing some high-fidelity models with inference approximations.” Complex workflows that incorporate AI interfaces, surrogate models, and simulation ensembles are quickly becoming mainstream at Livermore.
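
As a hedged illustration of trading a high-fidelity model for an inference approximation, the sketch below fits a cheap surrogate to a handful of expensive simulation results, then answers further queries with the surrogate alone. The “simulation” here is a stand-in function, not any Livermore code:

    import numpy as np

    # Surrogate-modeling sketch: fit a cheap approximation to a few expensive
    # runs, then query the approximation instead of rerunning the simulator.
    def expensive_simulation(x):
        """Stand-in for a high-fidelity run that might cost many node hours."""
        return np.sin(3 * x) + 0.1 * x**2

    # Run the costly model at only a dozen training points...
    x_train = np.linspace(0, 2, 12)
    y_train = expensive_simulation(x_train)

    # ...then fit a low-order polynomial surrogate.
    surrogate = np.polynomial.Polynomial.fit(x_train, y_train, deg=6)

    # New queries hit the surrogate, not the simulator.
    x_query = np.linspace(0, 2, 5)
    error = np.max(np.abs(surrogate(x_query) - expensive_simulation(x_query)))
    print(f"max surrogate error at query points: {error:.4f}")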

Amid this progress, the Grand Challenge Program continues to hone researchers’ HPC skills to drive scientific breakthroughs. Pushing both of these envelopes is the ultimate grand challenge. Streitz asserts, “We want research at the edge of what’s possible and exploring beyond it. By effectively leveraging computing at extreme scales, scientists are not just doing more of what they could do before. They’re doing what was previously impossible. They’re moving the needle.”

—Holly Auten

For further information contact Judy Hill (925) 422-5201 (hill134 [at] llnl.gov).