
Integrating knowledge and data: a cross-modal feature fusion model for few-shot problems

  • Abstract: In few-shot learning, purely data-driven learning is prone to overfitting and reduced generalization, and insufficient fusion of knowledge with data further limits model performance. This paper therefore proposes a deep cross-modal feature fusion model (KDFM) for few-shot problems that fuses domain knowledge with structured data features to improve downstream-task performance. KDFM adopts a multi-feature interaction framework: first, the semantically expressed knowledge modality of the domain is modeled as a knowledge graph, and the TransE algorithm is used to extract embedding representations of the knowledge nodes; second, the structured data modality is mapped onto graph networks, and a multichannel graph convolutional network captures high-order correlations among features; finally, an attention mechanism is designed to dynamically align the knowledge embeddings with the data features, achieving adaptive cross-modal fusion. The proposed model is validated on two few-shot datasets, a materials regression task and a medical classification task. Compared with other purely data-driven models, it achieves better results on all regression and classification tasks. Ablation results confirm the effectiveness of the knowledge-modeling and cross-modal fusion components, showing that KDFM, through collaborative multi-feature modeling and an efficient fusion strategy, alleviates to some extent both the weak generalization of models under few-shot conditions and the difficulty of fusing the knowledge and data modalities.
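The TransE step mentioned in the abstract can be illustrated in a few lines: each entity and relation is assigned a vector, and a triple (h, r, t) is considered plausible when h + r ≈ t. Below is a minimal numpy sketch with a toy knowledge graph and a margin-based update; the paper's actual embedding dimension, negative-sampling scheme, and training schedule are not specified here, so all hyperparameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy knowledge graph: (head, relation, tail) triples over 4 entities, 2 relations.
triples = [(0, 0, 1), (1, 1, 2), (0, 0, 3)]
n_ent, n_rel, dim = 4, 2, 8

E = rng.normal(scale=0.1, size=(n_ent, dim))  # entity embeddings
R = rng.normal(scale=0.1, size=(n_rel, dim))  # relation embeddings

def score(h, r, t):
    """TransE score: L2 distance ||h + r - t|| (lower = more plausible)."""
    return np.linalg.norm(E[h] + R[r] - E[t])

def transe_step(h, r, t, t_neg, lr=0.01, margin=1.0):
    """One SGD step of the margin ranking loss with a corrupted tail."""
    pos, neg = score(h, r, t), score(h, r, t_neg)
    if pos + margin > neg:  # margin violated: pull the true triple closer
        g_pos = (E[h] + R[r] - E[t]) / (pos + 1e-9)
        g_neg = (E[h] + R[r] - E[t_neg]) / (neg + 1e-9)
        E[h] -= lr * (g_pos - g_neg)
        R[r] -= lr * (g_pos - g_neg)
        E[t] += lr * g_pos
        E[t_neg] -= lr * g_neg

for epoch in range(200):
    for (h, r, t) in triples:
        transe_step(h, r, t, t_neg=int(rng.integers(n_ent)))
```

After training, the rows of `E` would serve as the knowledge-node embeddings that KDFM feeds into the fusion stage.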

     

    Abstract: The few-shot problem is common in machine learning, particularly in experimental science and medical research. Purely data-driven learning depends heavily on the quality and quantity of data: when data are scarce, models are prone to overfitting and their generalization ability degrades. Most fields, however, have accumulated extensive experience and knowledge, and a hybrid approach that combines domain knowledge with data can effectively improve model performance. Yet in few-shot settings, achieving effective cross-modal feature fusion of knowledge and data is challenging. This study proposes a knowledge and data cross-modal fusion model (KDFM) to address the few-shot problem. First, numerical modal features are categorized into different feature types and modeled as graphs; for each feature type, edges are constructed based on K-means clustering. The different types of numerical features are then processed through multichannel graph convolution, which converts numerical modal features into graph-level features and enhances their expressiveness. Subsequently, domain knowledge features from the semantic modality are represented by a knowledge graph: key entities and relationships are extracted from professional books and expert experience, and the knowledge graph consists of (entity, relation, entity) triples, enabling the transformation of unstructured textual features into graph-level features. In this way, textual domain knowledge and experience are organized and encoded for use in the neural network model. A graph convolutional neural network and attention mechanisms are employed for cross-modal feature fusion between knowledge and data. The input of the graph convolutional network includes the graphs constructed from numerical data, the feature vectors obtained from the knowledge graph, and the numerical vectors from the data.
Based on the number of feature types, multichannel graph convolution is applied to achieve deep feature fusion of knowledge and data. The output, a fused multichannel feature vector computed with the attention mechanism, serves as the input feature vector for downstream tasks. The proposed model was validated on two small-sample datasets: a regression task in the materials field and a classification task in the medical field. Experimental results show that, compared with other data-driven models, KDFM performs well across the regression and classification tasks. In the regression task, it achieved the best mean squared error, mean absolute error, and R², with R² exceeding that of the second-best model, a multilayer perceptron, by more than 7%. In the classification task, it was best on five of seven indicators and second best on the remaining two. Multiple ablation experiments were also conducted: removing the knowledge-graph and graph-convolution modules from the full model confirmed the effectiveness of both the knowledge modeling and the cross-modal fusion mechanism. The proposed model thus addresses, to some extent, the weak generalization and the difficulty of integrating the knowledge and data modalities in few-shot problems.
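The abstract states that edges of the numerical-feature graphs are built from K-means clusters. A minimal sketch of one plausible reading, connecting samples that fall into the same cluster, is below; the paper does not specify the exact edge rule (e.g., whether nodes are samples or features, or how k is chosen), so those details are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def kmeans(X, k, iters=50):
    """Plain Lloyd's algorithm (avoids an sklearn dependency)."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def cluster_adjacency(X, k):
    """Connect nodes that fall in the same K-means cluster (assumed edge rule)."""
    labels = kmeans(X, k)
    A = (labels[:, None] == labels[None, :]).astype(float)
    np.fill_diagonal(A, 1.0)
    return A

X = rng.normal(size=(12, 5))   # toy data: 12 samples, 5 numerical features
A = cluster_adjacency(X, k=3)
```

One such adjacency matrix would be built per feature type, giving the multichannel inputs described in the abstract.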
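The fusion pipeline described above, multichannel graph convolution followed by attention over the channel-level readouts, can be sketched as follows. This is a simplified stand-in rather than the paper's implementation: the symmetric normalization, mean-pooling readout, and dot-product attention query below are standard choices assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def gcn_layer(A, H, W):
    """One graph-convolution layer: ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    A_hat = A + np.eye(len(A))
    d = A_hat.sum(1)
    A_norm = A_hat / np.sqrt(d[:, None] * d[None, :])
    return np.maximum(A_norm @ H @ W, 0.0)

def attention_fuse(channels, q):
    """Weight each channel's pooled vector by softmax(q . v) -- a simple
    stand-in for the paper's attention-based cross-modal alignment."""
    V = np.stack([c.mean(axis=0) for c in channels])  # graph-level readout
    logits = V @ q
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ V, w

n, d_in, d_out = 6, 4, 3
A1 = (rng.random((n, n)) > 0.5).astype(float)
A1 = np.maximum(A1, A1.T)       # channel 1: random symmetric toy graph
A2 = np.eye(n)                  # channel 2: edgeless toy graph
H = rng.normal(size=(n, d_in))  # node features (data + knowledge embeddings in the paper)
W = rng.normal(size=(d_in, d_out))
q = rng.normal(size=d_out)      # attention query vector (assumed learnable)

channels = [gcn_layer(A1, H, W), gcn_layer(A2, H, W)]
fused, w = attention_fuse(channels, q)
```

The vector `fused` plays the role of the fused multichannel feature vector passed to the downstream regression or classification head.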

     

