Continuous speech recognition by convolutional neural networks
Abstract: Convolutional neural networks (CNNs), which have been successful at achieving translation invariance in many image processing tasks, are investigated for continuous speech recognition. Compared with deep neural networks (DNNs), which are now widely used in speech recognition, CNNs can substantially reduce model size while maintaining, and even improving, recognition accuracy. This paper analyzes how different structures of the convolutional and pooling layers affect recognition performance and compares the results with DNN models. Experiments on the standard TIMIT corpus and on a large-vocabulary, speaker-independent conversational telephone speech corpus show that CNNs outperform DNNs in both accuracy and generalization ability while using markedly smaller models.
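
The model-size claim can be illustrated with a rough parameter count. The sketch below compares a fully connected first layer with a convolutional first layer that shares its weights along the frequency axis. All sizes (40 filter-bank bands, 11 context frames, 1024 hidden units, 128 feature maps, kernel width 8) are hypothetical values chosen for illustration and are not taken from the paper. It counts only the first layer, so the saving for a complete network depends on the rest of the architecture.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv_params(n_in_maps, n_out_maps, kernel_size):
    """Weights plus biases of a 1-D convolutional layer.

    The same kernel is reused at every position along the frequency
    axis, so the count does not depend on the number of bands.
    """
    return n_in_maps * n_out_maps * kernel_size + n_out_maps


# Hypothetical input: 40 filter-bank bands x 11 context frames.
n_bands, n_frames = 40, 11

# Fully connected: every input value connects to each of 1024 hidden units.
dense = dense_params(n_bands * n_frames, 1024)

# Convolutional: the 11 frames act as input maps, with 128 output maps
# and a kernel spanning 8 adjacent frequency bands.
conv = conv_params(n_frames, 128, 8)

print(f"dense first layer: {dense} parameters")
print(f"conv first layer:  {conv} parameters")
```

With these assumed sizes the convolutional layer needs far fewer parameters because its weights are shared across frequency positions, which is the mechanism behind the reduction in model size described in the abstract.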