High-Performance Computing Takes Aim at Cancer

October/November 2016

(vitanovski-fotolia.com)

Combining extraordinary processing capability with enormous storage capacity and advanced simulation and analytical software, supercomputers have become essential to national security, scientific discovery, engineering, technology, and industry. Some of the world’s most powerful supercomputers are located at Lawrence Livermore, where they support the National Nuclear Security Administration’s Stockpile Stewardship Program and make possible advances in areas such as materials science, chemistry, and energy, among others.

Livermore researchers have recently been calling national attention to applying the power of high-performance computing (HPC) to biology. According to Dave Rakestraw, head of Livermore’s chemical, biological, and explosives security program, the Laboratory is fostering collaborations across academia, industry, and government that promote HPC as a revolutionary approach to improved understanding of human health. The effort focuses on countering biosecurity threats, overcoming infectious disease challenges, and laying foundations for the future of critical care.

Now, a historic partnership between the Department of Energy (DOE) and the National Cancer Institute (NCI) is applying the formidable computing resources at Livermore and other DOE national laboratories to advance cancer research and treatment. Announced in late 2015, the effort will help researchers and physicians better understand the complexity of cancer, choose the best treatment options for every patient, and reveal possible patterns hidden in vast patient and experimental data sets. The DOE–NCI agreement features three pilot programs that bring together nearly 100 cancer and biomedical researchers, computer scientists, and engineers. Livermore researchers are playing important roles in all three programs. Participants also include Argonne, Los Alamos, and Oak Ridge national laboratories; NCI’s Frederick National Laboratory for Cancer Research (FNLCR); and the U.S. Department of Veterans Affairs.

“One of the goals of this partnership is to bring about a huge shift in how biological and medical research will be performed in the future,” says Fred Streitz, director of Livermore’s High Performance Computing Innovation Center. “We are investing in the computational tools needed to move the medical community toward a predictive approach to cancer,” he says. “Such tools may help explain why one cancer treatment is successful with one patient but fails with the next.” In that respect, the DOE–NCI partnership supports President Barack Obama’s Precision Medicine Initiative, which promotes developing treatments for various medical conditions that take into account patients’ individual variability in genes, microbiomes (the collection of microbes in or on the body), environment, health history, lifestyle, and diet.

The partnership is also a key element of the National Cancer Moonshot Initiative, which, under the direction of U.S. Vice President Joe Biden, seeks to double the rate of progress in the understanding, prevention, diagnosis, and treatment of cancer. On June 28, 2016, a summit for Cancer Moonshot was held at Howard University in Washington, D.C., that joined Vice President Biden with more than 350 researchers, oncologists, and care providers.

Jason Paragas, Livermore’s director of innovation, was instrumental in bringing together high-level officials for the DOE–NCI cancer research partnership. He notes that the agreement is aligned with the National Strategic Computing Initiative, which is designed to ensure the United States continues leading the world in HPC over the coming decades. “NCI understands that the complexity of cancer initiation and growth demands the same computational approaches Livermore has spent decades developing for both national security and scientific discovery,” says Paragas. “NCI managers recognize that the newer computational architectures inside the latest machines provide an opportunity to think about biology in a novel way by combining the best of simulation and data science.”

According to Jim Brase, deputy associate director for science and technology in Lawrence Livermore’s Computation Directorate, this partnership underscores how Livermore can work closely with research partners to advance medical breakthroughs. “Our expertise is in computing, not cancer,” he says. “Medical advances in this area require an effective partnership with NCI.”

Vice President Joe Biden holds the first meeting of the Cancer Moonshot Task Force as part of the federal initiative to double the rate of progress in the understanding, prevention, diagnosis, and treatment of cancer. The large-scale effort involves hundreds of researchers, oncologists, and care providers. (Photo courtesy of Pete Souza, White House.)

Data May Reveal Patterns

Advanced data analytics—an approach that uses machine-learning algorithms to search for connections within vast amounts of data—is a key component of the DOE–NCI research. Recently, Livermore-developed deep-learning networks, based loosely on neural pathways in the human brain, have been used to create advanced models based on patterns buried deep within data sets. (See S&TR, June 2016, Deep Neural Networks Bring Patterns into Focus.) Streitz says, “Merging data analytics and simulation could potentially transform how we do scientific research.”

All three DOE–NCI pilot programs will develop advanced data analytics for large sets of patient, drug, experimental, and other cancer-related data to uncover correlations that are too complex for humans to discern. Each pilot will also be applying uncertainty quantification, a statistical process that increases confidence in the conclusions drawn from data analytics. The process, improved over the years by Lawrence Livermore weapons scientists, has been highly effective in stockpile stewardship work for assessing the expected performance of nuclear weapons systems without nuclear testing.
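As a rough illustration of what uncertainty quantification can look like in a data-analytics setting, the sketch below uses a percentile bootstrap, a generic resampling technique, to put an interval around an estimated mean. It is not the method used by the Laboratory or the pilot programs; the function name `bootstrap_interval` and the response values are hypothetical placeholders chosen only for illustration.

```python
import random
import statistics

def bootstrap_interval(values, n_resamples=2000, level=0.95, seed=0):
    """Return (low, high) percentile-bootstrap bounds for the mean of values."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Draw n values with replacement and record the mean of the resample.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(resample))
    means.sort()
    tail = (1.0 - level) / 2.0
    low = means[int(tail * n_resamples)]
    high = means[int((1.0 - tail) * n_resamples) - 1]
    return low, high

# Made-up measurements (illustrative only, not real experimental data).
responses = [0.42, 0.55, 0.47, 0.61, 0.39, 0.58, 0.50, 0.44]
low, high = bootstrap_interval(responses)
print(f"mean = {statistics.fmean(responses):.3f}, interval = ({low:.3f}, {high:.3f})")
```

A narrow interval suggests the estimate is stable under resampling; a wide one signals that a conclusion drawn from the data deserves less confidence.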

Amy Gryshuk, Director of Strategic Engagements and Alliance Management for Livermore’s Physical and Life Sciences Directorate, and Eric Stahlberg, Director of the High Performance Computing Initiative at FNLCR, coordinate and provide project management for the three pilot programs, which are aimed at improving drug therapy for cancer patients, simulating human RAS proteins to facilitate cancer drug development, and analyzing extremely large NCI databases to optimize cancer therapies. “The DOE–NCI partnership is critical to the success of the program and has resulted in a unified and formidable endeavor that includes multiple institutions with diverse cultures, capabilities, and fields of research,” says Gryshuk. The pilots will also identify requirements for future supercomputer architectures and data analytics software.

Livermore scientists are applying advanced machine-learning algorithms to search for connections within vast amounts of data. Shown here is a graph representing the metadata of thousands of archived documents, illustrating the complexity and expansive nature of data analytics. (Image courtesy of Martin Grandjean.)

Learning from Cancer Cell Cultures

The first pilot program is led by Rick Stevens at Argonne National Laboratory and Jim Doroshow at NCI, with bioinformatics scientist Jonathan Allen heading Livermore’s participation. This team aims to outperform current methods for selecting cancer treatments through the development of algorithms that produce powerful new predictive models. The work includes both statistical and mechanistic models (how tumor cells promote unchecked cell growth and how cancer drugs interact with those cells). The models are expected to help researchers speedily and inexpensively predict the effectiveness of potential cancer drugs and more quickly identify and evaluate promising new pharmaceuticals. The pilot program also promises to provide new insights into tumor biology and critical cancer pathways.

For several years, Allen has been working on methods to rapidly detect and characterize pathogenic organisms such as viruses, bacteria, and fungi. Allen’s team previously developed the Livermore Metagenomic Analysis Toolkit (LMAT), a group of software programs that quickly compares metagenomic data (genetic material recovered from environmental samples) with large collections of already sequenced human and microbial genomes. (See S&TR, October/November 2015, Two-Part Microbial Detection Enhances Bioidentification.) LMAT uses unique search algorithms that exploit large-memory computer architectures such as those being implemented for the DOE–NCI research.

The computer models will be based on well-documented data generated by numerous cell lines—populations of cells taken from different human tumors and grown and maintained in a laboratory. Allen says, “We will look for key patterns such as molecular signatures that correlate with certain outcomes to build a model of the drugs’ effectiveness at countering tumor growth.”
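One very simple way to look for a signature that "correlates with certain outcomes" is a Pearson correlation between a molecular measurement and a drug-response measurement across cell lines. The sketch below assumes exactly that and nothing more: the variable names and the five data points are invented for illustration, and the pilot's actual models are far richer than a single correlation coefficient.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up values for five hypothetical cell lines (illustrative only).
marker_level = [1.0, 2.0, 3.0, 4.0, 5.0]        # expression of one marker
relative_growth = [0.9, 0.7, 0.6, 0.35, 0.2]    # growth under drug vs. untreated

r = pearson(marker_level, relative_growth)
print(f"correlation = {r:.3f}")
```

In this toy example a strongly negative coefficient would mean that cell lines with more of the marker grow less under the drug, which is the kind of pattern a predictive model could then build on.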

The researchers will start with the NCI-60 panel of tumor cell lines to determine the tumors’ response to thousands of available drugs. This group of 60 different human tumor cell lines includes leukemia, melanoma, and cancers of the lung, colon, brain, ovary, breast, prostate, and kidney. Studying tumor cell cultures is critical because “it’s difficult to know what’s happening inside a human tumor,” explains Allen. Researchers expect to add data from other cell lines and from patient-derived xenograft (PDX) models, wherein cells from human tumors are transplanted to mice, to better capture details of how tumors grow and respond to different treatments. The long-range goal is to have more than 1,000 PDX models available for screening to study the tumors’ heterogeneity. The resulting model repository will be used to characterize tumor viability and provide a computerized platform for testing new drugs.

Looking Forward to New Generations of Supercomputers

Livermore’s suite of powerful unclassified supercomputers such as Catalyst, Cab, and Vulcan will play an important role in all three Department of Energy (DOE)–National Cancer Institute (NCI) pilot programs. Developed in partnership with Intel and Cray, the Laboratory’s Catalyst machine has a unique architecture designed to collect, manage, and analyze vast quantities of data. “Catalyst serves as a test bed to optimize strategies for data-intensive computing,” says Fred Streitz, director of Livermore’s High Performance Computing Innovation Center (HPCIC).

The computing techniques developed during the pilot programs will be scaled to the next generation of supercomputers being produced as part of the Collaboration of Oak Ridge, Argonne, and Livermore (CORAL) effort. CORAL-class machines will begin operating in late 2017. Livermore’s machine, Sierra, will be capable of at least 150 petaflops (1 petaflop is 10¹⁵ floating-point operations per second), 15 times the power of current supercomputers.

In June, the National Nuclear Security Administration (NNSA) and other government representatives dedicated a new supercomputing facility at Livermore. The $9.8 million facility, which adjoins the Livermore Valley Open Campus (LVOC), provides added flexibility to accommodate future advances in computer technology and meet a rapidly growing demand for unclassified high-performance computing (HPC). Home to HPCIC, the LVOC area facilitates collaborations with industry and academia. Next year, the facility will house a smaller, unclassified companion to Sierra to support academic alliances, the DOE–NCI partnership, and other efforts of national importance.

Researchers participating in the three pilot programs anticipate a formidable class of supercomputers still under design. The exascale machines will be capable of 1 billion billion calculations per second, a significant performance increase over existing systems. DOE’s NNSA and Office of Science have launched the Exascale Computing Project, and the first exascale machines are scheduled to arrive in 2023.
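To put the figures in this sidebar on a common scale, the short calculation below converts "1 billion billion" into a power of ten and compares it with the 150-petaflop figure quoted for Sierra. The variable names are illustrative; the only inputs are the numbers stated in the text.

```python
PETA = 10**15  # 1 petaflop = 10^15 floating-point operations per second
EXA = 10**18   # 1 exaflop = "1 billion billion" = 10^9 * 10^9

sierra_flops = 150 * PETA   # Sierra's stated minimum capability
exascale_flops = 1 * EXA    # one exascale machine

ratio = exascale_flops / sierra_flops
print(f"An exascale machine is about {ratio:.1f} times a 150-petaflop machine")
```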

The deputy associate director for science and technology in Livermore’s Computation Directorate, Jim Brase, says, “CORAL machines will show us what exascale will look like. We already know that we need to scale our codes to exascale machines with new types of CPUs, GPUs, neurosynaptic chips, and other specialty processors inspired by the architecture of the human brain.”

Officials from DOE’s National Nuclear Security Administration (NNSA) and other government representatives dedicate a new supercomputing facility at Lawrence Livermore. From left: Michael Maciel, mayor of Tracy, California; Charles Verdon, Lawrence Livermore’s principal associate director for Weapons and Complex Integration; Kathleen Alexander, NNSA administrator; Pat Falcone, Lawrence Livermore’s deputy director for Science and Technology; Nicole Nelson-Jean, NNSA Livermore Field Office manager; and John Marchand, mayor of Livermore, California. (Photo by Julie Russell.)

Modeling Cancer Initiation Events

Livermore’s Streitz and Dwight Nissley at NCI lead the second pilot program, which promises to deliver the computational advances necessary for understanding cancer initiation in RAS proteins located in cell membranes. Found in all human cells and organs, these proteins are involved in transmitting signals within cells and regulating diverse cell behaviors. When a RAS protein is switched on, it activates other proteins, which then trigger genes involved in cell growth, differentiation, and survival. Under normal function, a RAS protein switches off after the other proteins are switched on. However, RAS gene mutations can leave the proteins permanently activated. These mutations are responsible for up to 30 percent of all human cancers, including some of the most deadly forms, such as pancreatic cancer.

The fundamental mechanism by which RAS proteins initiate uncontrolled cell growth is still a mystery. NCI has large amounts of data on the physical, chemical, and biological characteristics of RAS genes and proteins, obtained through x-ray crystallography, cryoelectron microscopy, and other imaging techniques. The team will couple these experimental data with atomic-resolution molecular dynamics simulations to build a model of RAS protein biology in varying types of cell membranes. The RAS model will permit easy manipulation of particular tissues and simulate the effects of environmental and genetic factors present in human populations or specific to individuals.

According to Streitz, a major advance in this area of research would be a comprehensive approach to explain the mechanisms of the protein and the onset of cancer. “Although RAS is found only in the cell membrane, it starts a cascade of events that involves many processes happening simultaneously,” he says. “These events are not linear, so they cannot be modeled sequentially or simply. However, we can simulate the membrane environment and explore how it operates and interacts with other proteins and with cancer drugs.”

The models will use machine-learning algorithms combined with uncertainty quantification to optimize the simulations of RAS interactions with RAF (a protein activated by RAS). The investigators plan to use the model’s ability to predict the fundamental mechanism of RAS-driven cancer initiation and growth in the various tissue types to identify potential treatments for inhibiting RAS activation in normal cells.

As part of this effort, the team is developing algorithms that will automatically switch between atomistic and coarse-grained molecular dynamics, in essence, optimizing the resolution to maximize fidelity yet minimize run time. In addition, they will explore algorithms capable of autonomously generating hypotheses about signaling mechanisms. The hypotheses will then be validated through simulation, possibly identifying potential drug therapy sites among thousands of possible configurations. “This capability will be nothing short of revolutionary,” says Streitz. “It will change the way we use predictive simulations.”

Found in all human cells and organs, RAS proteins, such as the one shown in this artist’s rendering, are involved in transmitting signals within cells and regulating diverse cell behaviors. Mutations in RAS genes are responsible for up to 30 percent of all human cancers, including some of the most deadly forms. (Rendering by Elaine Meng.)

Going Deep into Patient Records

The third pilot program, led by Gina Tourassi at Oak Ridge and Lynn Penberthy at NCI, takes a population-wide approach to cancer research. The research team is analyzing cancer patients’ medical records to better understand treatment outcomes on a large scale. Livermore computational biologist and team member Todd Wasson notes that patient privacy will be strictly observed. The team has begun studying 500,000 medical records from four states: Washington, Louisiana, Georgia, and Kentucky. The records are provided by NCI’s Surveillance, Epidemiology, and End Results (SEER) program, which has been collecting data on cancer patients since 1973.

This pilot aims to develop processing tools for analyzing many different sets of medical records. Powerful machine-learning tools will search the data for patterns of how genetics, environment, lifestyle, and quality of health affect the progression, recurrence, and survival of cancers. The data include patient characteristics, pathology reports, specific treatment, survival, and cause of death. Since clinical text varies in writing style and expression, algorithmic development will focus on advanced machine-learning and deep-learning techniques to extract relevant features from clinical reports. In particular, investigators will be implementing natural-language processing, which enables computers to derive meaning from reports written in human languages. The machine-learning approaches could also be augmented with genomic data, images, and medical claims.
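The idea of turning free-text clinical reports into features a learning algorithm can use can be sketched with a bag-of-words count, which is a deliberately simple stand-in for the deep-learning and natural-language techniques described above and not the pilot's actual approach. The function names and the two report sentences are invented for illustration; they are not real patient records.

```python
import re
from collections import Counter

def tokenize(report):
    """Lowercase a free-text report and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", report.lower())

def bag_of_words(report, vocabulary):
    """Count how often each vocabulary term appears in one report."""
    counts = Counter(tokenize(report))
    return [counts[term] for term in vocabulary]

# Invented example sentences written in two different styles.
reports = [
    "Tumor located in LEFT lung; biopsy confirms carcinoma.",
    "Biopsy of the left lung: carcinoma present, tumor 2 cm.",
]

# Shared vocabulary across all reports, in a fixed (sorted) order.
vocabulary = sorted({tok for r in reports for tok in tokenize(r)})
vectors = [bag_of_words(r, vocabulary) for r in reports]

for term in ("lung", "carcinoma", "confirms"):
    i = vocabulary.index(term)
    print(term, [v[i] for v in vectors])
```

Even though the two sentences differ in wording, punctuation, and capitalization, both produce a count of 1 for shared terms such as "lung" and "carcinoma", which is the kind of style-independent signal a downstream model needs.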

The results will help scientists improve cancer care at various levels—individuals, an entire population, or subgroups where there may be disparities in outcome. Investigators plan to produce an unprecedented predictive simulation capability. “We want to obtain a deeper understanding of cancer drivers and outcomes in the population,” says Wasson. “We’ll be looking at how different cancers respond to the same treatment and how a single type of cancer responds to different treatments.” He says the long-term goal is to support personalized therapies, as part of the Precision Medicine Initiative. “We want to provide oncologists greater confidence when they recommend a particular treatment based on the type of cancer and the individual. We don’t know what we will discover,” says Wasson. The pilot program is also expected to advance machine-learning algorithms and scalable deep-learning tools for CORAL-class supercomputers and exascale-computing platforms to permit efficient analysis of the millions of records expected annually in the cancer surveillance program.

As depicted in this graphic, the goal of the Precision Medicine Initiative is to help physicians choose the best cancer treatment for patients by taking into account the individual variability in their genes, the microbes in and on their bodies, and their physical environment, health history, lifestyle, and diet. (Image courtesy of the National Cancer Institute.)

Partnerships Are Critical to Success

The expected collaborations between biomedical researchers and clinicians and HPC teams will likely change the culture of medical research, according to Brase. “You need a big team to write codes and validate them,” he observes. This approach points to the philosophy of E. O. Lawrence, who more than 60 years ago invented “team science,” the proven concept of assembling a highly focused team of investigators from different disciplines to achieve a common, often difficult, goal.

Streitz predicts that as the value of HPC to cancer research becomes more evident, collaborations aimed at helping overcome medical challenges will become an increasingly important aspect of Livermore’s research portfolio. He observes that connecting the computational resources of DOE national laboratories to life-sciences projects may also help in developing responses to drug-resistant microbes, the ever-changing threat of bioterrorism, the intractability of other complex diseases in addition to cancer, and the rising cost of new pharmaceuticals. He emphasizes, “But we’ll always need partners such as NCI to make the progress needed in these fields.” With the help of HPC and the dedication of hundreds of scientists, doctors, and researchers, the scourge of cancer may, one day, have a cure.

"Home-Grown" Efforts Thrive at Livermore

While the DOE–NCI programs are getting started, a number of internally funded projects are already underway at Livermore to build HPC capabilities across an increasing number of disciplines. The following three efforts are funded by the Laboratory Directed Research and Development (LDRD) Program, Livermore’s single most important internal funding resource for fostering innovative science and technology.

Jonathan Allen leads an LDRD project in collaboration with Argonne National Laboratory, the University of Chicago, and other research groups. Allen’s team is working to predict the potential for hospital patients in intensive care units (ICUs) to develop antibiotic-resistant infections, a serious problem that has resulted from overuse of antibiotics. Some bacteria survive these treatments or mutate to become resistant, transforming simple diseases into killers. Allen’s group has been studying collections of microbial genomes identified as resistant or susceptible to antibiotics to develop a predictive model of which ICU patients will become susceptible. The group’s methods search massive amounts of genomic data to recognize important biological features, leading to better predictions of pathogen emergence. The team is developing an analytic framework for storing and searching terabytes (1 terabyte is 1 trillion, or 10¹², bytes) of genomic data and metadata.

Principal investigator Todd Wasson is working with the Research Division at Kaiser Permanente in Oakland, California, to predict the onset of sepsis in hospitalized patients. Sepsis is the body’s overwhelming response to an infection, leading to potential tissue damage, organ failure, and death. Sepsis, which occurs in 1–2 percent of all hospitalized patients and 25 percent of ICU patients, is the most common cause of death in hospitalized patients. Early detection (and, ideally, prediction) is vital because the earlier the onset of sepsis is detected, the better the possible outcomes. Wasson is building a predictive model of sepsis occurrence by using patient data—for example, blood pressure, temperature, and medication—to determine whether the patient might enter a septic state while hospitalized. “The data are not massive but extremely variable and complex because people are heterogeneous and complicated,” he observes.

A third LDRD-funded project, led by Sergio Wong, is aimed at enhancing Cardioid, the world’s most detailed model of the electrophysiology of the human heart. (See S&TR, September 2012, Venturing into the Heart of High-Performance Computing Simulations.) Developed in partnership with IBM, the code depicts the activation of each heart muscle cell and the cell-to-cell voltage transfer of up to 3 billion cells. It does so in near-real time and with unprecedented accuracy and resolution. For the first time, scientists are seeing how potentially fatal arrhythmias develop and are influenced by the administration of drugs and medical devices.

Cardioid is the world’s most detailed simulation of the human heart in action and an example of high-performance computing applied to human health. The highly scalable code replicates the heart’s electrical system, depicting the activation of each heart muscle cell in near-real time and with accuracy and resolution previously unavailable.

—Arnie Heller

Key Words: bioinformatics; cancer; Cancer Moonshot Initiative; Cardioid; Collaboration of Oak Ridge, Argonne, and Livermore (CORAL); exascale; high-performance computing (HPC); High Performance Computing Innovation Center (HPCIC); Livermore Metagenomic Analysis Toolkit (LMAT); National Cancer Institute (NCI); Precision Medicine Initiative; RAS protein; Surveillance, Epidemiology, and End Results (SEER).

For further information contact Amy Gryshuk (925) 424-5427 (gryshuk2@llnl.gov) or Fred Streitz (925) 423-3236 (streitz1@llnl.gov).