Charging Up and Rolling Out

Lawrence Livermore’s Exascale Computing Facility Modernization (ECFM) project management team ensured that the project stayed on task and on budget. Brandon Hong, Chris Deprater, and Bradley Davis (from left to right) played a critical role in leading the project’s implementation, coordinating the many logistics involved, and overcoming the obstacles and challenges that the project faced in the past several years. (Anna Maria Bailey and other project team members not pictured.)

Lawrence Livermore’s El Capitan supercomputer will come online in early 2023 with a peak performance of more than 2.0 exaflops (2 quintillion floating-point operations per second). The massive computing capabilities of El Capitan and anticipated follow-on computing systems will require correspondingly massive infrastructure, cooling, and power. In response, the Laboratory’s Exascale Computing Facility Modernization (ECFM) project team has planned, designed, and executed a project to nearly double the power available for computing systems.

Providing a dedicated utility-power electrical substation and cooling towers to add 40 megawatts (MW) to the 45 MW already allotted to high-performance computing (HPC) required multiple levels of government approval; coordination with stakeholders, including state and local governments and local constituents; partnerships with local and regional utility providers; and strategies to stay on schedule despite the COVID-19 pandemic and supply chain delays. In a testament to the team’s creative thinking, innovative problem solving, and exceptional collaboration, the ECFM project proceeded ahead of schedule, and the team met its delivery goals.

The ECFM project under construction.

Designing for the Future

Long-term planning was vital for a project of this size and scope; Livermore began preparing the site infrastructure to support its future HPC needs as early as 2004. Anna Maria Bailey, program manager for the ECFM project, says, “We understood immediately that the industrial-scale power supply of 13.8 kilovolts (kV) used at Livermore would be insufficient for future HPC needs and that the project would have to upgrade to utility-scale power of 115 kV.”

To transition to utility-scale, Livermore embarked on an extensive, phased Critical Decision process in alignment with Department of Energy (DOE) requirements. The ECFM project cleared three major rounds of DOE approval—mission need, alternative selection and cost range, and performance baseline—before construction could even begin. A utility facility study, site impact study, and environmental impact review were also required by DOE. While similar studies typically take upwards of two years, coordination with utility providers such as the Western Area Power Administration (WAPA), Pacific Gas and Electric, and Hetch Hetchy Water and Power enabled Livermore to complete the studies in half the usual time.

Bailey describes the ramp up to the approval process as intense. “When we started design, project management resources familiar with similar construction projects, such as the Livermore Computing Center, had moved on to other roles or projects at the Laboratory,” she says. “Fortunately, our senior leadership recognized just how critical this project would be for the Laboratory’s mission, and we were able to draw on experienced people in other areas of the Laboratory, bring people back from retirement to help with project controls and risk assessments, and hire new staff to increase our capability.”

The project included design of transmission lines, core-type transformers, cooling towers, and pumps as well as intricate control features hidden from view. Bailey says, “Since we’re operating at utility-scale power to meet the Exascale Computing Facility’s needs, we’ve moved beyond localized electrical controls and data acquisition systems. The large relays and controls we have in place for this facility must be timed and programmed for the right electrical sequencing.” Checks and balances ensure efficient, safe, and steady energy distribution. For example, WAPA operates the 115 kV tie line feeding the electrical substation, enabling the system to automatically carry and transfer the load between two power sources while maintaining the transformer’s connection to the load. Brandon Hong, systems engineer, says, “Controls are critical for this project. The power will be distributed to a larger sector of the California Independent System Operator grid. Any power swings will be noticed at the utility grid level, not just the Laboratory’s 13.8-kilovolt grid level.”
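
The automatic source selection Hong describes can be pictured with a minimal sketch. This is a simplified illustration only; the class, function names, voltages, and tolerance band below are assumptions, not the substation’s actual relay programming, which runs in dedicated protective hardware with strict timing requirements.

```python
# Hypothetical sketch of automatic source selection between two utility
# feeds. Names, voltages, and tolerances are illustrative assumptions,
# not the ECFM substation's actual relay logic.
from dataclasses import dataclass
from typing import Optional

@dataclass
class PowerSource:
    name: str
    voltage_kv: float  # measured line voltage
    available: bool    # breaker closed and source energized

NOMINAL_KV = 115.0     # utility-scale feed described in the article
TOLERANCE = 0.05       # assumed +/-5 percent acceptance band

def is_healthy(src: PowerSource) -> bool:
    """A source can carry the load if it is energized and near nominal voltage."""
    return src.available and abs(src.voltage_kv - NOMINAL_KV) / NOMINAL_KV <= TOLERANCE

def select_source(primary: PowerSource, alternate: PowerSource) -> Optional[PowerSource]:
    """Prefer the primary feed; transfer to the alternate only when the
    primary degrades, so the transformer never drops its connection to the load."""
    if is_healthy(primary):
        return primary
    if is_healthy(alternate):
        return alternate
    return None  # both feeds unhealthy: trip and alarm

if __name__ == "__main__":
    tie_line = PowerSource("115 kV tie line", voltage_kv=114.2, available=True)
    backup = PowerSource("alternate feed", voltage_kv=115.6, available=True)
    chosen = select_source(tie_line, backup)
    print("Load carried by:", chosen.name if chosen else "none (trip)")
```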

The cooling system provides an additional 18,000 tons of cooling capacity and incorporates controls including multiple fail safes, function checks, and sequencing. Sensors, control relays, and switches integrated into the Laboratory’s existing building management system adjust to a variety of load conditions. Newer sensors in the control systems create the opportunity to move toward condition-based maintenance, saving future labor costs.
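
A minimal sketch of the condition-based maintenance idea: rather than servicing equipment on a fixed calendar, a controller compares recent sensor trends against a commissioning baseline and flags drift. The sensor values, window size, and threshold below are hypothetical, not values from the Laboratory’s building management system.

```python
# Illustrative condition-based maintenance check. Sensor names, units,
# and thresholds are assumptions for demonstration only.
from statistics import mean

def needs_service(readings: list[float], baseline: float, tolerance: float = 0.20) -> bool:
    """Flag a pump or fan when its recent average vibration drifts more than
    `tolerance` (as a fraction) above its commissioning baseline, instead of
    servicing it on a fixed calendar schedule."""
    recent = mean(readings[-24:])  # e.g., the last 24 hourly samples
    return recent > baseline * (1.0 + tolerance)

if __name__ == "__main__":
    vibration = [0.11, 0.12, 0.12, 0.13, 0.15, 0.16]  # mm/s RMS, hypothetical
    print("Schedule service:", needs_service(vibration, baseline=0.10))
```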

The system is designed to operate without chillers. Instead, water supplied at a temperature typically between 24 and 29 °C flows through cooling distribution units rather than directly to machines. Even if water temperatures rise due to environmental conditions, the system will operate, still without chillers, within the American Society of Heating, Refrigerating and Air-Conditioning Engineers’ (ASHRAE) guidelines for the W32/W40 classification (between 32 and 40 °C). By eliminating traditional chillers, the new system will save more than 60,000 megawatt-hours of electricity annually, helping meet sustainability and energy efficiency goals.
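
Two back-of-the-envelope checks put these figures in perspective: 60,000 megawatt-hours per year works out to an average avoided load of roughly 6.8 MW, and typical supply-water temperatures leave ample headroom below the 40 °C ceiling. The short sketch below uses only numbers quoted in this article.

```python
# Back-of-the-envelope checks using only the figures quoted above.
HOURS_PER_YEAR = 8760
annual_savings_mwh = 60_000                      # quoted chiller savings
avg_avoided_mw = annual_savings_mwh / HOURS_PER_YEAR
print(f"Average avoided load: {avg_avoided_mw:.1f} MW")  # about 6.8 MW

def within_w40_ceiling(supply_temp_c: float) -> bool:
    """True while facility water stays at or below the 40 degree C ceiling
    cited for chiller-free operation."""
    return supply_temp_c <= 40.0

# Typical supply water (24-29 degrees C) leaves ample headroom for
# warm-weather excursions before the ceiling is reached.
print(within_w40_ceiling(29.0))  # True
```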

Construction…and COVID

The start of construction signaled that the approval stage was complete and potential planning and design challenges had been overcome. At the same time, the team appreciated the gargantuan task ahead. Bradley Davis, deputy project manager for the ECFM project, says, “We basically combined five large projects to support the facility—the transmission lines, substation, cooling towers, process cooling loop that brings water to and from the building, and the power distribution.” To manage this scope, the team added four construction managers and two project managers to keep the work on task, on time, and on budget.

Construction began in 2020 and was almost immediately impacted by the COVID-19 pandemic. Hong says, “The procurement process alone was a whole different beast with long lead times for absolutely everything we needed due to supply chain delays.” The ECFM project team also encountered equipment delivery challenges. One piece of equipment was so large that California Highway Patrol coordination and support were required to transport it from southern California to the Laboratory. In another case, a cooling tower change sent the team back through the procurement process midstream. Systems engineer Chris Deprater says, “We had to switch to a completely new cooling tower design with different specifications and sequence of operations because the original vendor could not meet the required California structural code standards. On top of that, we needed to continue operating within the same schedule, no delays.”

In March 2020, the site for the substation and cooling towers serving the Exascale Computing Facility had only just been cleared and prepared for construction.

Workers’ health and safety requirements ramped up as the construction managers developed and implemented an effective COVID-19 plan with protocols such as mask wearing, social distancing, and communication by hand signal in close quarters. Bailey says, “The construction management team treated COVID-19 as another potential job hazard that required careful planning. They met the challenge with excellent handling of the additional safety measures. While we saw some cases of COVID-19 among the construction workers, we were able to mitigate rapid spread by quarantining where necessary, bringing additional crews onsite as needed, and staggering the different tasks around discrete areas.”

COVID-related travel restrictions also posed significant challenges for specialty contractors traveling to and working at the project site. Bailey says, “These contractors are one-in-a-million subject matter experts. Most people aren’t putting in a substation every day. We partnered with an expert team in extremely high demand to do the utility-scale work.” When the specialty contractors were inadvertently exposed to COVID-19 during the 2020 holiday season, the ECFM project management team had to revisit its safety controls to find alternative solutions. As a result, people who did not need to be onsite were asked to work remotely, and Livermore provided a COVID coach to work with contractors to ensure updated safety plans were feasible and effective.

A little more than a year into construction, the new cooling towers and pump system had been built, and work to install the electrical substation progressed.

Looking to the Future

Despite the construction challenges, the ECFM project team not only met project demands but stayed ahead of schedule. Deprater credits the team and creative problem solving. “No one on the team took ‘can’t’ for an answer,” he says. Hong adds, “We found another solution or a creative, innovative work-around to every obstacle. We planned far enough in advance to provide float when we needed it. In addition, Anna Maria (Bailey) knew every step of the project, how much time it would take, and how early we had to act to make things happen. She was a real asset.”

Although El Capitan has not yet come online, Livermore Computing and the project management team are looking ahead to future, more demanding power, cooling, and infrastructure needs. Ever-advancing HPC capabilities are at the heart of Lawrence Livermore’s stockpile stewardship mission, and advancing supercomputing capability remains a critical priority. (See S&TR, September 2016, Laying the Groundwork for Extreme-Scale Computing.) The Laboratory has already started planning for the next supercomputer, designated Advanced Technology System 6 (ATS6), and will initiate procurement in a few years. The team’s forward-looking strategic planning, anticipating enduring and growing needs, ensures that ATS6 will be able to take advantage of the same infrastructure put in place through the ECFM project.

By February 2022, construction on the substation, cooling towers, and pump system was largely completed, and the ECFM project shifted focus to ensuring the systems inside the Exascale Computing Facility would be ready for El Capitan’s deployment.

Moving forward, Bailey emphasizes the ongoing demand for infrastructure growth. “Right now, in addition to the new ECFM transformer lineup, we have three main transformers tied in parallel as the base load for the Laboratory,” she says. “When you consider the potential power and cooling demand of future supercomputers, we will need to add capacity to the entire site. New innovations will be needed. For now, we will meet the expected HPC load for another 10 years.”

—Sheridan Hyland

Key Words: Advanced Technology System 6 (ATS6), Critical Decision, El Capitan supercomputer, exaflop, Exascale Computing Facility Modernization (ECFM) project, high-performance computing (HPC), Livermore Computing.

For further information contact Anna Maria Bailey (925) 423-1288 (bailey31 [at] llnl.gov).