

U-Net-based VGG19 model for improved facial expression recognition


Abstract: In response to the challenges faced by traditional facial expression recognition techniques, such as insufficient focus on key channel features, a large number of parameters, and low recognition accuracy, this study proposes an improved VGG19 model that incorporates concepts from the U-Net architecture. While preserving the deep feature extraction capability of VGG19, the model employs specially designed convolutional layers and skip connections. Feature cropping and stitching let the model integrate multi-scale features efficiently, so that features from different layers are fused seamlessly and the information yielded by each layer is fully exploited, improving the robustness and effectiveness of facial expression recognition.

In addition, this paper introduces an improved SEAttention module designed specifically for facial expression recognition. Its innovation lies in replacing the original activation function with the Mish activation function, allowing the module to dynamically adjust the weights of different channels so that important features are emphasized and redundant ones are suppressed. This selective focus significantly speeds up network convergence and improves the model's ability to detect subtle changes in facial expressions, which is especially valuable in nuanced emotional contexts.

Furthermore, the fully connected layers are modified by substituting convolutional layers for the first two while retaining the final fully connected layer. This reduces the output sizes of the classifier head from 4096, 4096, and 1000 nodes to just 7, directly addressing the large parameter count of the original VGG19 network and improving resistance to overfitting, which makes the model more robust on new data.

Extensive experiments on the FER2013 and CK+ datasets show that the improved VGG19 model raises recognition accuracy by 1.58% and 4.04%, respectively, over the original version. An evaluation of parameter efficiency further indicates a substantial reduction in the overall parameter count without compromising performance. This balance between model complexity and accuracy makes the method practical for real-world facial expression recognition, including deployment in environments with limited computational resources.

In conclusion, integrating the U-Net architecture and the enhanced SEAttention module into the VGG19 network yields significant advances in facial expression recognition: the improved model strengthens feature extraction and fusion while addressing the pressing problems of parameter size and computational efficiency. Its robustness and efficiency highlight its potential for applications requiring accurate real-time facial expression analysis, such as human-computer interaction, security systems, and emotion-driven computing.
Future work will explore the adaptability of the model to other datasets and additional optimization techniques, aiming to further enhance its performance and expand its applicability across diverse use cases.
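The crop-and-concatenate fusion described in the abstract follows U-Net's skip-connection pattern. As a minimal PyTorch sketch, assuming a transposed-convolution upsampler and illustrative channel sizes (the abstract does not give the exact configuration), fusing a deep VGG19 stage with an earlier one might look like this:

```python
import torch
import torch.nn as nn

class CropConcatFuse(nn.Module):
    """U-Net-style fusion: upsample the deep feature map, center-crop the
    shallow skip feature to match ("crop and stitch"), concatenate along
    channels, then convolve. Channel sizes are illustrative assumptions."""

    def __init__(self, deep_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(deep_ch, deep_ch // 2, kernel_size=2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(deep_ch // 2 + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, deep, skip):
        deep = self.up(deep)              # 2x spatial upsampling
        dh = skip.size(2) - deep.size(2)  # height difference
        dw = skip.size(3) - deep.size(3)  # width difference
        # Center-crop the skip connection to the upsampled size.
        skip = skip[:, :, dh // 2 : dh // 2 + deep.size(2),
                          dw // 2 : dw // 2 + deep.size(3)]
        return self.conv(torch.cat([skip, deep], dim=1))

# Illustrative usage with VGG19-like feature maps (shapes assumed):
fuse = CropConcatFuse(deep_ch=512, skip_ch=256, out_ch=256)
deep = torch.randn(1, 512, 6, 6)    # e.g. output of a deeper VGG19 stage
skip = torch.randn(1, 256, 12, 12)  # e.g. output of an earlier stage
print(fuse(deep, skip).shape)       # torch.Size([1, 256, 12, 12])
```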

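The improved SEAttention module is a squeeze-and-excitation channel attention block with Mish substituted for the original activation. The abstract does not say which of the block's two activations is replaced; the sketch below swaps the ReLU in the excitation MLP and keeps the sigmoid gate, with the SENet default reduction ratio of 16 as an assumption:

```python
import torch
import torch.nn as nn

class SEAttentionMish(nn.Module):
    """Squeeze-and-Excitation channel attention with Mish in the excitation
    MLP. Reduction ratio r=16 is an assumption (the SENet default)."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global context per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.Mish(),                       # replaces the original ReLU
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),                    # per-channel weights in (0, 1)
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                         # emphasize or suppress channels

x = torch.randn(2, 512, 14, 14)
print(SEAttentionMish(512)(x).shape)  # torch.Size([2, 512, 14, 14])
```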

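The modified classifier head keeps only the last fully connected layer and replaces the first two with convolutions, shrinking the outputs from 4096, 4096, and 1000 nodes to 7. Only the FC-to-conv substitution and the 7-way output come from the abstract; the channel widths and kernel sizes below are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ConvClassifierHead(nn.Module):
    """Classifier head with the two 4096-node FC layers replaced by
    convolutions and only the final FC layer kept, now with 7 outputs
    (one per expression class). Widths and kernels are assumptions."""

    def __init__(self, in_ch=512, num_classes=7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 256, kernel_size=3, padding=1),  # replaces FC-4096
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 128, kernel_size=3, padding=1),    # replaces FC-4096
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),                          # collapse spatial dims
        )
        self.fc = nn.Linear(128, num_classes)                 # retained FC: 1000 -> 7

    def forward(self, x):
        x = self.features(x).flatten(1)
        return self.fc(x)

head = ConvClassifierHead()
x = torch.randn(1, 512, 7, 7)  # VGG19 feature map after the last pooling stage
print(head(x).shape)           # torch.Size([1, 7])
```

For scale, the original VGG19 head alone holds roughly 123 million weights (512·7·7·4096 + 4096·4096 + 4096·1000), so replacing it with a head of this shape suggests where much of the reported parameter reduction comes from.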