摘要: The remarkable progress of deep learning and the rapid growth of data and computing power have greatly increased the feasibility of optimizing and implementing graph neural networks with various structures, leading to substantial advances in representation learning for graph-structured data. Existing graph neural network methods mainly focus on global information propagation among graph nodes, and their strong representation capability can be proven theoretically. However, when representing graph-structured data whose local topology carries specific semantics, these general-purpose methods lack a flexible mechanism for representing local structures; for example, functional groups, the local structures of molecules in chemical reactions, often determine molecular properties and take part in the reaction process. Further mining the information carried by such local structures is therefore important for all kinds of graph-representation-based tasks. To this end, we propose a graph representation method that infers local topological structures with variational convolutions: it not only performs relational reasoning and message passing over the global graph structure, but also adaptively learns the local topology of graph data through variational inference and encodes the local structures with convolution operations, thereby further improving the expressive power of graph neural networks. Experiments on several graph-structured datasets show that exploiting local structure information effectively improves the performance of graph neural networks on graph-based tasks.

Abstract: The development of deep learning techniques and the growth of large-scale computing power have revolutionized graph representation research by making it practical to train graph neural networks with diverse structures. Existing methods, such as graph attention networks, mainly focus on global information propagation in graph neural networks, and their strong representation capability has been proven theoretically. However, these general methods lack flexible representation mechanisms when facing graph data whose local topology carries specific semantics, such as the functional groups that take part in chemical reactions. Accordingly, it is of great importance to further exploit local structure representations for graph-based tasks. Several existing methods either rely on domain expert knowledge or conduct subgraph isomorphism counting to learn local topology representations of graphs; however, these methods do not easily generalize to new domains without such knowledge or complex substructure preprocessing. In this study, we propose a simple and automatic local topology inference method that uses variational convolutions to improve the local representation ability of graph attention networks. The proposed method not only performs relational reasoning and message passing on the global graph structure but also adaptively learns the graph's local structure representations under the guidance of readily accessible statistical priors. Specifically, variational inference is used to adaptively learn the convolutional template size; the inference is conducted layer by layer under the guidance of the statistical priors, so that the template size adapts to subgraphs with different structures in a self-supervised way. The variational convolution module is easily pluggable and can be concatenated with arbitrary hidden layers of any graph neural network. Moreover, owing to the locality of the convolution operations, the relations between graph nodes can be further sparsified, which alleviates the over-squashing problem in the global information propagation of graph neural networks. As a result, the proposed method significantly improves the overall representation ability of the graph attention network by variationally inferring the convolutional operations used for local topology representation. Experiments are conducted on three large-scale, publicly available datasets, i.e., the OGBG-MolHIV, USPTO, and Buchwald-Hartwig datasets. Experimental results show that exploiting various kinds of local topological information helps improve the performance of the graph attention network.
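The mechanism described above (a layer-wise variational choice of the convolutional template size, relaxed with a Concrete/Gumbel-Softmax distribution [31] and regularized toward a prior over sizes) can be illustrated with a minimal PyTorch sketch. This is not the authors' released implementation: the candidate kernel sizes, the uniform prior, the canonical node ordering assumed for the 1D convolution, the padded batch layout, and the KL weight are all illustrative assumptions.

```python
# Minimal sketch, assuming node features are laid out in a canonical order so that a
# 1D convolution sweeps over local neighborhoods; not the paper's official code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VariationalLocalConv(nn.Module):
    """Mixes 1D convolutions of several candidate kernel ("template") sizes using a
    Gumbel-Softmax posterior over sizes, regularized toward a prior distribution."""

    def __init__(self, dim, kernel_sizes=(3, 5, 7), tau=1.0):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(dim, dim, k, padding=k // 2) for k in kernel_sizes
        )
        self.logits = nn.Parameter(torch.zeros(len(kernel_sizes)))        # posterior params
        self.register_buffer(
            "prior", torch.full((len(kernel_sizes),), 1.0 / len(kernel_sizes))
        )                                                                  # assumed uniform prior
        self.tau = tau

    def forward(self, x):
        # x: (batch, num_nodes, dim) node features from a preceding GNN hidden layer.
        w = F.gumbel_softmax(self.logits, tau=self.tau, hard=False)        # relaxed size choice
        h = x.transpose(1, 2)                                              # -> (batch, dim, nodes)
        mixed = sum(wi * conv(h) for wi, conv in zip(w, self.convs))
        q = F.softmax(self.logits, dim=-1)
        kl = torch.sum(q * (torch.log(q + 1e-8) - torch.log(self.prior)))  # prior-guidance term
        return mixed.transpose(1, 2) + x, kl                               # residual output + KL


# Illustrative usage: interleave with a graph attention hidden layer's output and add
# the KL term to the task loss; the 1e-2 weight is an assumption, not a reported value.
layer = VariationalLocalConv(dim=64)
x = torch.randn(8, 30, 64)          # 8 graphs, padded to 30 nodes, 64-d hidden features
out, kl = layer(x)
# loss = task_loss + 1e-2 * kl
```

Because the module only consumes and produces node feature tensors, it can be concatenated with the hidden layers of an existing graph attention network without changing the surrounding message-passing code, which is the "pluggable" property claimed in the abstract.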
Table 1. Comparison of ROC-AUC values of different methods on OGBG-MolHIV

Method                              ROC-AUC
Transformer                         0.7058
Local encoding                      0.7249
Multiscale local encoding           0.7535
Soft-assignment local encoding      0.7519
Ours                                0.7839

Table 2. Comparison of our method and the state-of-the-art on OGBG-MolHIV

Method      ROC-AUC
EGC-M       0.7818
Ours        0.7839
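The ROC-AUC values in Tables 1 and 2 follow the official evaluation protocol of the Open Graph Benchmark [16]. As a reproducibility aid, a minimal sketch of that protocol is shown below; the label and score arrays are placeholders standing in for a model's test-split outputs, not data from the paper.

```python
# Minimal sketch of the standard OGBG-MolHIV evaluation with the official OGB evaluator.
import numpy as np
from ogb.graphproppred import Evaluator

evaluator = Evaluator(name="ogbg-molhiv")            # official metric: ROC-AUC
y_true = np.random.randint(0, 2, size=(1000, 1))      # placeholder binary labels
y_pred = np.random.rand(1000, 1)                      # placeholder predicted scores
result = evaluator.eval({"y_true": y_true, "y_pred": y_pred})
print(result["rocauc"])
```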
Table 3. Comparison of R2 scores of different methods on the USPTO dataset

Dataset   Data split   Schwaller et al.[32]   Local encoding   Multiscale local encoding   Soft-assignment local encoding   Ours
Subgram   Random       0.195                  0.198            0.196                       0.195                            0.199
Subgram   Time         0.142                  0.146            0.147                       0.145                            0.150
Subgram   Smoothed     0.388                  0.390            0.396                       0.397                            0.435
Gram      Random       0.117                  0.118            0.119                       0.118                            0.121
Gram      Time         0.095                  0.096            0.096                       0.095                            0.098
Gram      Smoothed     0.277                  0.279            0.285                       0.284                            0.311

Table 4. Comparison of average R2 scores with existing methods on the Buchwald-Hartwig dataset
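The R2 values in Tables 3 and 4 are coefficients of determination between predicted and measured reaction yields. A minimal scikit-learn sketch of the metric is shown below; the yield arrays are placeholders for illustration only, not data from the paper.

```python
# Minimal sketch of the R2 (coefficient of determination) metric used for yield prediction.
import numpy as np
from sklearn.metrics import r2_score

measured = np.array([0.82, 0.10, 0.45, 0.67])    # placeholder experimental yields
predicted = np.array([0.78, 0.15, 0.40, 0.70])   # placeholder model predictions
print(r2_score(measured, predicted))
```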
References
[1] Ying R, He R N, Chen K F, et al. Graph convolutional neural networks for web-scale recommender systems // Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. London, 2018: 974
[2] Dai H J, Li C T, Coley C W, et al. Retrosynthesis prediction with conditional graph logic network // Advances in Neural Information Processing Systems. Vancouver, 2019: 8870
[3] Han K, Wang Y, Guo J, et al. Vision GNN: An image is worth graph of nodes // Advances in Neural Information Processing Systems. New Orleans, 2022
[4] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need // Advances in Neural Information Processing Systems. Long Beach, 2017: 5998
[5] Cho K, van Merrienboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation // Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, 2014: 1724
[6] Hamilton W L, Ying R, Leskovec J. Inductive representation learning on large graphs // Advances in Neural Information Processing Systems. Long Beach, 2017: 1024
[7] Veličković P, Cucurull G, Casanova A, et al. Graph attention networks // International Conference on Learning Representations. Toulon, 2018
[8] Wijesinghe A, Wang Q. A new perspective on "How graph neural networks go beyond Weisfeiler-Lehman?" // International Conference on Learning Representations. Online, 2022
[9] Liu J W, Liu J W, Luo X L. Research progress in attention mechanism in deep learning. Chin J Eng, 2021, 43(11): 1499
[10] Ying C X, Cai T L, Luo S J, et al. Do transformers really perform badly for graph representation? // Advances in Neural Information Processing Systems. Online, 2021: 28877
[11] Alon U, Yahav E. On the bottleneck of graph neural networks and its practical implications [J/OL]. arXiv preprint (2020-6-9) [2022-7-24]. https://arxiv.org/abs/2006.05205
[12] Jin W G, Barzilay R, Jaakkola T. Junction tree variational autoencoder for molecular graph generation // International Conference on Machine Learning. Stockholm, 2018: 2323
[13] Chen Z D, Chen L, Villar S, et al. Can graph neural networks count substructures? // Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver, 2020: 10383
[14] Bouritsas G, Frasca F, Zafeiriou S, et al. Improving graph neural network expressivity via subgraph isomorphism counting. IEEE Trans Pattern Anal Mach Intell, 2023, 45(1): 657 doi: 10.1109/TPAMI.2022.3154319
[15] Yu H, Zhao S Y, Shi J Y. STNN-DDI: A substructure-aware tensor neural network to predict drug-drug interactions. Brief Bioinform, 2022, 23(4): bbac209 doi: 10.1093/bib/bbac209
[16] Hu W, Fey M, Zitnik M, et al. Open graph benchmark: Datasets for machine learning on graphs // Advances in Neural Information Processing Systems. Online, 2020
[17] Lowe D. Chemical reactions from US patents (1976-Sep2016) [J/OL]. Figshare (2017-6-14) [2022-7-24]. https://doi.org/10.6084/m9.figshare.5104873.v1
[18] Ahneman D T, Estrada J G, Lin S, et al. Predicting reaction performance in C-N cross-coupling using machine learning. Science, 2018, 360(6385): 186 doi: 10.1126/science.aar5169
[19] Wu F, Fan A, Baevski A, et al. Pay less attention with lightweight and dynamic convolutions // International Conference on Learning Representations. New Orleans, 2019
[20] Wu Z H, Liu Z J, Lin J, et al. Lite transformer with long-short range attention [J/OL]. arXiv preprint (2020-4-24) [2022-7-24]. https://arxiv.org/abs/2004.11886
[21] Gulati A, Qin J, Chiu C C, et al. Conformer: Convolution-augmented transformer for speech recognition // Interspeech Conference. Shanghai, 2020: 5036
[22] Wang Y Q, Xu Z L, Wang X L, et al. End-to-end video instance segmentation with transformers // IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, 2021: 8737
[23] Wu H P, Xiao B, Codella N, et al. CvT: Introducing convolutions to vision transformers // IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, 2022: 22
[24] Si C Y, Yu W H, Zhou P, et al. Inception transformer [J/OL]. arXiv preprint (2022-5-25) [2022-7-24]. https://arxiv.org/abs/2205.12956
[25] Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the inception architecture for computer vision // IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, 2016: 2818
[26] Szegedy C, Ioffe S, Vanhoucke V, et al. Inception-v4, Inception-ResNet and the impact of residual connections on learning // Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence. San Francisco, 2017: 4278
[27] Zhou B L, Andonian A, Oliva A, et al. Temporal relational reasoning in videos // Proceedings of the European Conference on Computer Vision. Munich, 2018: 803
[28] Kim Y. Convolutional neural networks for sentence classification [J/OL]. arXiv preprint (2014-8-25) [2022-7-24]. https://arxiv.org/abs/1408.5882
[29] Kingma D P, Welling M. Auto-encoding variational Bayes // International Conference on Learning Representations. Banff, 2014: 1
[30] Rezende D J, Mohamed S, Wierstra D. Stochastic backpropagation and approximate inference in deep generative models // International Conference on Machine Learning. Beijing, 2014: 1278
[31] Maddison C J, Mnih A, Teh Y W. The concrete distribution: A continuous relaxation of discrete random variables [J/OL]. arXiv preprint (2016-11-2) [2022-7-24]. https://arxiv.org/abs/1611.00712
[32] Schwaller P, Vaucher A C, Laino T, et al. Prediction of chemical reaction yields using deep learning. Mach Learn: Sci Technol, 2021, 2(1): 015016 doi: 10.1088/2632-2153/abc81d
[33] Landrum G. RDKit documentation [J/OL]. RDKit (2012-12-1) [2022-7-24]. http://www.rdkit.org/RDKit_Docs.2012_12_1.pdf
[34] Schwaller P, Laino T, Gaudin T, et al. Molecular transformer: A model for uncertainty-calibrated chemical reaction prediction. ACS Cent Sci, 2019, 5(9): 1572 doi: 10.1021/acscentsci.9b00576
[35] Schwaller P, Probst D, Vaucher A C, et al. Mapping the space of chemical reactions using attention-based neural networks. Nat Mach Intell, 2021, 3(2): 144 doi: 10.1038/s42256-020-00284-w
[36] Tailor S A, Opolka F L, Liò P, et al. Do we need anisotropic graph neural networks? [J/OL]. arXiv preprint (2021-4-3) [2022-7-24]. https://arxiv.org/abs/2104.01481
[37] Zhang M, Li P. Nested graph neural networks // Advances in Neural Information Processing Systems. Online, 2021: 15734
[38] Chuang K V, Keiser M J. Comment on "Predicting reaction performance in C-N cross-coupling using machine learning". Science, 2018, 362(6416): 186
[39] Sandfort F, Strieth-Kalthoff F, Kühnemund M, et al. A structure-based platform for predicting chemical reactivity. Chem, 2020, 6(6): 1379 doi: 10.1016/j.chempr.2020.02.017