A Computational Framework For Predicting Direct Contacts and Substructures Within Protein Complexes.

Funding Source

National Institutes of Health

Grant Number



Department of Physics and Computer Science - Dual Degree Engineering

Document Type


Publication Date



Understanding the physical arrangement of subunits within protein complexes potentially provides valuable clues about how the subunits work together and how the complexes function. The majority of recent research focuses on identifying protein complexes as a whole and seldom studies the inner structures within complexes. In this study, we propose a computational framework to predict direct contacts and substructures within protein complexes. In this framework, we first train a supervised learning model of l2-regularized logistic regression to learn the patterns of direct and indirect interactions within complexes, from where physical subunit interaction networks are predicted. Then, to infer substructures within complexes, we apply a graph clustering method (i.e., maximum modularity clustering (MMC)) and a gene ontology (GO) semantic similarity based functional clustering on partially-and fully-connected networks, respectively. Computational results show that the proposed framework achieves fairly good performance of cross validation and independent test in terms of detecting direct contacts between subunits. Functional analyses further demonstrate the rationality of partitioning the subunits into substructures via the MMC algorithm and functional clustering.


DOI: 10.3390/biom9110656

PubMed ID: 31717703