
Hypergraph-Based Dual-Channel Improved Variational Autoencoder with Cross-Attention for Compound-Protein Interactions Identification.

Abstract

Elucidating compound-protein interactions is crucial for early drug discovery, offering insights into molecular mechanisms and therapeutic potential. While wet-lab methods detect interactions, they suffer from false positives, high costs, and labor intensity. Consequently, there is an urgent need to develop theoretical computational approaches for identifying interactions between compounds and proteins. This study introduces a novel hypergraph-based dual-channel theoretical framework integrating an enhanced variational autoencoder with a multihead cross-attention mechanism to identify potential interactions. First, a hypergraph is constructed in each channel: one with compounds as hyperedges and proteins as nodes, and another with compounds as nodes and proteins as hyperedges. Subsequently, the improved variational autoencoder is applied to both hypergraphs to extract latent feature vectors for compounds and proteins, concurrently taking into account node characteristics and hypergraph topological information. Furthermore, the multihead cross-attention mechanism is utilized to obtain interaction features from compound and protein embeddings, thereby characterizing their interactions. Finally, a deep neural network model is constructed to identify potential compound-protein interactions. Evaluated on a benchmark data set with 5-fold cross-validation, the method achieves 95.71% accuracy, 96.09% sensitivity, 95.34% specificity, 95.37% precision, a Matthews correlation coefficient of 0.9143, an area under the receiver operating characteristic curve of 0.9899 and an area under the precision-recall curve of 0.9449. The predictive capability and component contribution are confirmed through nonredundant experiments and ablation studies, respectively. Comparisons with state-of-the-art deep learning frameworks on data sets (DrugBank, GPCR, KIBA and Human) demonstrate superiority. Over one million potential compound-protein interactions were identified, with some validated via molecular docking simulations. This approach is poised to significantly aid lead compound discovery and drug repurposing.
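
The dual-channel hypergraph construction described in the abstract can be illustrated with a pair of incidence matrices. The sketch below is only an illustration under stated assumptions, not the authors' implementation: it assumes that each hyperedge groups all nodes with a known interaction to it, and the function name `build_dual_incidence`, the integer indexing of compounds and proteins, and the toy interaction list are all hypothetical.

```python
import numpy as np


def build_dual_incidence(pairs, n_compounds, n_proteins):
    """Build incidence matrices for the two hypergraph channels.

    `pairs` is an iterable of (compound_index, protein_index) tuples for
    known interactions. The grouping rule (a hyperedge contains every node
    known to interact with it) is an assumption, not taken from the source.
    """
    # Channel 1: proteins are nodes (rows), compounds are hyperedges (columns).
    h_protein_nodes = np.zeros((n_proteins, n_compounds), dtype=np.int8)
    for compound, protein in pairs:
        h_protein_nodes[protein, compound] = 1

    # Channel 2: compounds are nodes (rows), proteins are hyperedges (columns).
    # Under the assumption above, this is the transpose of channel 1.
    h_compound_nodes = h_protein_nodes.T.copy()
    return h_protein_nodes, h_compound_nodes


# Toy data: 3 compounds, 4 proteins, 4 known interactions (hypothetical).
pairs = [(0, 0), (0, 1), (1, 1), (2, 3)]
h_pn, h_cn = build_dual_incidence(pairs, n_compounds=3, n_proteins=4)
```

In this toy example, the hyperedge for compound 0 in the first channel connects proteins 0 and 1, while the hyperedge for protein 1 in the second channel connects compounds 0 and 1. An encoder for each channel would then take the corresponding incidence matrix together with node features as input.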
