
AI tool predicts protein-protein interactions with 90% accuracy

Study has implications for understanding disease mechanisms and could aid the diagnosis and treatment of genetic disorders

Researchers in the Eugene McDermott Center for Human Growth and Development used artificial intelligence to predict protein-protein interactions. Study authors include, clockwise from bottom: Jing Zhang, Ph.D., postdoctoral researcher; Qian Cong, Ph.D., Assistant Professor; and Jimin Pei, Ph.D., Computational Biologist.

Using a new artificial intelligence (AI) protocol to decipher evolutionary signals from 21,000 species, UT Southwestern researchers have created a powerful tool to identify proteins that interact with each other.

“Protein-protein interactions (PPIs) are important in nearly all biological functions,” said Jing Zhang, Ph.D., a postdoctoral researcher in the Eugene McDermott Center for Human Growth and Development and lead author of the study, published in Science. “Our predicted interactions should help to explain the mechanisms of genetic disorders.”

In this study, UTSW researchers found that 16,000 disease-causing single-amino-acid mutations can be mapped to PPI interfaces, consistent with the idea that disruption of PPIs could be a common mechanism for genetic disorders. PPIs occur in many forms, such as the short-term interaction of an enzyme with its inhibitor or the long-term formation of stable complexes such as ion channels in the cell membrane. Researchers estimate there may be more than 100,000 human PPIs.

“Identifying new physical interactions between proteins will explain new cellular functions. This deep learning approach predicts thousands of new interactions, so the implications for discovery are remarkable,” said Ralph DeBerardinis, M.D., Ph.D., Director of the McDermott Center and Professor in Children’s Medical Center Research Institute at UT Southwestern (CRI) and of Pediatrics.

One key to identifying important protein interactions lies in evolution, explained Qian Cong, Ph.D., senior author of the study and Assistant Professor in the McDermott Center and of Biophysics.

“If two proteins interact and one mutates, the interaction fails until the other protein also mutates. So, if two proteins seem to affect each other in evolution, that’s a clue that they’re interacting proteins,” Dr. Cong said. “With sufficient computing power and speed, it’s possible to screen proteins and look for pairs that share a high evolutionary signal.”

“This is what AI is good for,” added Dr. Zhang.
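
The coevolution idea can be illustrated with a toy calculation. The sketch below is not RF2-PPI or the method used in the study; it assumes a much simpler stand-in measure, the mutual information between one sequence position in each of two proteins, compared across species. The function name and the eight-species sequences are invented for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    col_a[i] and col_b[i] are the amino acids found at one position of
    protein A and one position of protein B in the same species i.
    """
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        mi += p_ab * log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi


# Hypothetical data: one position per protein across eight species.
site_a = list("KKKKEEEE")
site_b_coupled = list("DDDDRRRR")      # changes whenever site_a changes
site_b_independent = list("DRDRDRDR")  # varies without regard to site_a

coupled_score = mutual_information(site_a, site_b_coupled)
independent_score = mutual_information(site_a, site_b_independent)
print(coupled_score, independent_score)
```

In this toy case the coupled pair scores 1 bit and the independent pair scores 0, which is the kind of contrast a screen for shared evolutionary signal looks for. Real alignments are far noisier and need correction for shared ancestry, which this sketch ignores.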

Another key to their system was to greatly increase the amount and quality of data used to train the AI model, called RF2-PPI. As a starting point, the team used a database of 200 million 3D protein structures predicted by AlphaFold, the Nobel Prize-winning method, from which they identified interactions between segments of proteins called domains. These interacting domains provide a large training dataset for the AI model, increasing the likelihood of discovering new protein-protein matches, the researchers reasoned.

They also adjusted the architecture of their AI model so it would work for weak, transient interactions, which are more common than strong, permanent PPIs. In addition, they included data on proteins known not to interact, so the algorithm wouldn’t be biased toward assuming interactions frequently occur. Their data thus included both interacting proteins and a control set of noninteracting molecules. The database was also 17 times larger than previous sets due to the inclusion of domain-domain interactions.

Integrating RF2-PPI and AlphaFold, they systematically screened for interactions between human proteins, predicting nearly 18,000 PPIs with expected precision of 90%. Of these PPIs, 3,631 had never been picked up by previous screenings.
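
To make the precision figure concrete: precision is the fraction of predicted interactions that are real, that is, true positives divided by all predictions. The short calculation below is a rough sketch that assumes a round count of 18,000 predictions (the article says "nearly 18,000"), so the outputs are approximations rather than figures from the study.

```python
def expected_true_positives(n_predicted, precision):
    """Expected number of correct predictions at a given precision."""
    return round(n_predicted * precision)


n_predicted = 18_000   # assumption: rounded from "nearly 18,000"
precision = 0.90       # expected precision reported in the article
n_novel = 3_631        # predictions not seen in previous screenings

expected_true = expected_true_positives(n_predicted, precision)
expected_false = n_predicted - expected_true
novel_share = n_novel / n_predicted

print(expected_true, expected_false)          # about 16,200 and 1,800
print(f"{novel_share:.0%} of predictions are new")
```

Under these assumptions, roughly one in ten predicted interactions would be expected to be a false positive, and about a fifth of the predictions are new.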

The proteins in the predicted PPIs included:

  • The transmembrane protein KLRG1, which inhibits natural killer cells and T cells, interacting with TLR3, a key player in immune response whose mutation is linked to a rare immunodeficiency syndrome
  • G protein-coupled receptors, which trigger cellular responses to extracellular signals, and several proteins involved in immune response
  • Various proteins involved in building the whiplike tail of sperm
  • Several mitochondrial proteins that may assist in energy metabolism
  • Additional proteins that interact with known protein complexes, including those that maintain telomeres (regions of DNA that protect the ends of chromosomes)

The predicted interactions among transmembrane proteins are significant because these proteins are embedded in a fatty structure that makes them difficult to study experimentally, the researchers reported.

Due to the important role of PPIs, the knowledge gained in this study will help researchers understand disease mechanisms and discover related mutations, contributing to the diagnosis and treatment of genetic disorders, said Jimin Pei, Ph.D., a Computational Biologist in the Cong Lab who is a co-first author of the study.

This is just a small step toward the ambitious goal of identifying all PPIs happening in a cell and modeling how all proteins come together to build cells and organisms, Dr. Cong added.

Going forward, researchers plan to further optimize their methods and integrate computational approaches with experimental data.

In addition to UT Southwestern scientists, researchers from Seoul National University and Yonsei University, both in Korea, and the University of Washington also contributed to this work.

Endowed Titles

Dr. Cong is a Southwestern Medical Foundation Scholar in Biomedical Research.

Dr. DeBerardinis holds the Eugene McDermott Distinguished Chair for the Study of Human Growth and Development and the Philip O’Bryan Montgomery, Jr., M.D., Distinguished Chair in Developmental Biology and is a Sowell Family Scholar in Medical Research.
