
Train wheelset bearing damage identification method based on convolution and Transformer fusion framework

    Abstract: To address the problems of insensitive image feature extraction, heavy reliance on expert experience, and low recognition accuracy that affect traditional machine-vision methods for train wheelset bearing damage detection, this paper proposes a damage identification method based on a convolution and Transformer fusion framework. First, because train wheelset bearing images are complex and their class imbalance is severe, an image-enhancement category-reorganization preprocessing method is developed to improve the quality of the acquired image dataset and eliminate the effect of imbalanced class sizes. Second, a convolutional neural network (CNN) is efficient to build and train because of its local receptive fields and weight sharing, but it perceives only local neighborhoods and has limited ability to capture global feature information; a Transformer, built on self-attention, offers strong parallel computing ability and can learn long-range dependencies between image pixels over the whole image, giving it stronger global feature extraction, but it mines local image features less thoroughly. A VGG and Transformer parallel fusion network (VTPF-Net) is therefore designed, based on the fusion of convolution and self-attention, to jointly capture the global contour features and local detail features of an image. Third, a multiscale dilation spatial pyramid convolution (MDSPC) module is constructed that fully mines the multiscale semantic features in the feature map through progressive fusion of multiscale dilated convolutions, alleviating the loss of feature information caused by the gridding effect of dilated convolution. Coordinate attention (CA) embedded after the MDSPC module captures long-range dependencies and more precise positional relationships along the two spatial directions, so the network can focus more accurately on specific regions of the feature map. Finally, experiments were conducted on the NEU-DET surface-defect dataset and a self-built train wheelset bearing image dataset. The proposed model achieves recognition accuracies of 99.44% on the six defect classes of NEU-DET and 98% on the four fault classes of the wheelset bearing images, and its feature extraction ability is further verified with model visualization methods. Compared with existing CNN models, the self-attention ViT model, and CNN-Transformer fusion models, the proposed method delivers significantly better evaluation metrics and accurately identifies the different damage types without a notable increase in model complexity.
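
The abstract names the two branches and the fusion idea but not the implementation details. The sketch below is a hypothetical PyTorch rendering of such a dual-branch design, in which a VGG16 feature extractor supplies local detail features, a small patch-embedding Transformer encoder supplies global context, and the two are fused by simple concatenation before the classification head. The layer sizes, fusion rule, and four-class head are illustrative assumptions, not the paper's exact VTPF-Net configuration.

```python
# Hypothetical sketch of a VGG + Transformer parallel fusion classifier.
# Layer sizes, the concatenation-based fusion, and the 4-class head are
# assumptions for illustration; the paper's VTPF-Net details may differ.
import torch
import torch.nn as nn
from torchvision.models import vgg16


class PatchEmbed(nn.Module):
    """Split the image into non-overlapping patches and project them to tokens."""
    def __init__(self, patch_size=16, in_ch=3, dim=256):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        x = self.proj(x)                      # (B, dim, H/ps, W/ps)
        return x.flatten(2).transpose(1, 2)   # (B, N, dim) token sequence


class VTPFNet(nn.Module):
    """Two parallel branches: VGG16 for local detail, Transformer for global context."""
    def __init__(self, num_classes=4, dim=256, depth=4, heads=8):
        super().__init__()
        self.cnn = vgg16(weights=None).features            # local-feature branch
        self.embed = PatchEmbed(dim=dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=dim * 4,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=depth)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Linear(512 + dim, num_classes)       # fuse by concatenation

    def forward(self, x):
        local_feat = self.pool(self.cnn(x)).flatten(1)      # (B, 512) local details
        global_feat = self.transformer(self.embed(x)).mean(1)  # (B, dim) global context
        return self.head(torch.cat([local_feat, global_feat], dim=1))


if __name__ == "__main__":
    model = VTPFNet()
    logits = model(torch.randn(2, 3, 224, 224))
    print(logits.shape)   # torch.Size([2, 4])
```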

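Likewise, the MDSPC module and the coordinate attention that follows it are described only at a high level. The sketch below shows one plausible reading: parallel 3×3 dilated convolutions whose branches are fused progressively (each branch also receives the previous branch's output) to mitigate the gridding effect, concatenated and projected by a 1×1 convolution, then re-weighted by a minimal coordinate-attention block. The dilation rates, channel widths, and exact fusion rule are assumptions.

```python
# Hypothetical sketch of the MDSPC idea with a coordinate attention (CA) block.
# Dilation rates, channel widths, and the progressive fusion rule are assumed.
import torch
import torch.nn as nn


class CoordinateAttention(nn.Module):
    """Minimal CA block: directional pooling along H and W, a shared 1x1
    bottleneck, then per-direction gates that re-weight the feature map."""
    def __init__(self, ch, reduction=16):
        super().__init__()
        mid = max(8, ch // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # (B, C, 1, W)
        self.conv1 = nn.Conv2d(ch, mid, 1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, ch, 1)
        self.conv_w = nn.Conv2d(mid, ch, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        xh = self.pool_h(x)                              # (B, C, H, 1)
        xw = self.pool_w(x).permute(0, 1, 3, 2)          # (B, C, W, 1)
        y = self.act(self.bn(self.conv1(torch.cat([xh, xw], dim=2))))
        yh, yw = torch.split(y, [h, w], dim=2)
        ah = torch.sigmoid(self.conv_h(yh))                          # gate along H
        aw = torch.sigmoid(self.conv_w(yw.permute(0, 1, 3, 2)))      # gate along W
        return x * ah * aw


class MDSPC(nn.Module):
    """Multiscale dilated convolutions fused progressively, projected by 1x1 conv,
    then refined with coordinate attention."""
    def __init__(self, ch, rates=(1, 2, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Conv2d(ch, ch, 3, padding=r, dilation=r),
                          nn.BatchNorm2d(ch), nn.ReLU(inplace=True))
            for r in rates)
        self.project = nn.Sequential(nn.Conv2d(ch * len(rates), ch, 1),
                                     nn.BatchNorm2d(ch), nn.ReLU(inplace=True))
        self.ca = CoordinateAttention(ch)

    def forward(self, x):
        outs, prev = [], x
        for branch in self.branches:
            prev = branch(x + prev) if outs else branch(x)   # progressive fusion
            outs.append(prev)
        return self.ca(self.project(torch.cat(outs, dim=1)))


if __name__ == "__main__":
    feat = torch.randn(2, 64, 28, 28)
    print(MDSPC(64)(feat).shape)   # torch.Size([2, 64, 28, 28])
```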
     

