多模態學習方法綜述

陳鵬; 李擎; 張德政; 楊宇航; 蔡錚; 陸子怡

doi:10.13374/j.issn2095-9389.2019.03.21.003

摘要: 大數據是多源異構的。在信息技術飛速發展的今天，多模態數據已成為近來數據資源的主要形式。研究多模態學習方法，賦予計算機理解多源異構海量數據的能力具有重要價值。本文歸納了多模態的定義與多模態學習的基本任務，介紹了多模態學習的認知機理與發展過程。在此基礎上，重點綜述了多模態統計學習方法與深度學習方法。此外，本文系統歸納了近兩年較為新穎的基于對抗學習的跨模態匹配與生成技術。本文總結了多模態學習的主要形式，并對未來可能的研究方向進行思考與展望。

Abstract: “Big data” is always collected from different resources that have different data structures. With the rapid development of information technologies, current precious data resources are characteristic of multimodes. As a result, based on classical machine learning strategies, multi-modal learning has become a valuable research topic, enabling computers to process and understand “big data”. The cognitive processes of humans involve perception through different sense organs. Signals from eyes, ears, the nose, and hands (tactile sense) constitute a person’s understanding of a special scene or the world as a whole. It reasonable to believe that multi-modal methods involving a higher ability to process complex heterogeneous data can further promote the progress of information technologies. The concepts of multimodality stemmed from psychology and pedagogy from hundreds of years ago and have been popular in computer science during the past decade. In contrast to the concept of “media”, a “mode” is a more fine-grained concept that is associated with a typical data source or data form. The effective utilization of multi-modal data can aid a computer understand a specific environment in a more holistic way. In this context, we first introduced the definition and main tasks of multi-modal learning. Based on this information, the mechanism and origin of multi-modal machine learning were then briefly introduced. Subsequently, statistical learning methods and deep learning methods for multi-modal tasks were comprehensively summarized. We also introduced the main styles of data fusion in multi-modal perception tasks, including feature representation, shared mapping, and co-training. Additionally, novel adversarial learning strategies for cross-modal matching or generation were reviewed. The main methods for multi-modal learning were outlined in this paper with a focus on future research issues in this field.

多模態學習方法綜述

A survey of multimodal machine learning