Current status and trends in large language model research

王耀祖; 李擎; 戴張杰; 徐越

doi:10.13374/j.issn2095-9389.2023.10.09.003

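Summary: Over the past 20 years, language modeling (LM) has become a principal approach to language understanding and generation, and it has drawn wide attention as a key technology for downstream tasks in natural language processing (NLP). In recent years, large language models (LLMs) such as ChatGPT have made remarkable progress and have had a profound impact on the transformation and development of artificial intelligence and of other fields. Given the rapid development of LLMs, this paper first reviews comprehensively how LLMs have evolved in technical architecture and model scale, and summarizes model training methods, optimization techniques, and evaluation approaches. It then analyzes the current state of LLM applications in education, healthcare, finance, industry, and other fields, and discusses their strengths and limitations. In addition, it examines the safety and consistency issues that LLMs raise with respect to social ethics, privacy, and security, together with the corresponding technical measures. Finally, it looks ahead to future research trends for LLMs, including model scale and efficiency, multimodal processing, and societal impact. By analyzing both the current state of research and its likely direction, this paper aims to give researchers insight into LLMs and to promote further development of the field.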

Abstract: Over the past two decades, language modeling (LM) has emerged as a primary methodology for language understanding and generation and has become a cornerstone of natural language processing (NLP). At its core, LM trains models to predict the probability of the next word or token, which allows them to generate natural and fluent language. The advent of large language models (LLMs), such as Bidirectional Encoder Representations from Transformers (BERT) and GPT-3, marks a significant milestone in the evolution of LM. These LLMs have had a profound impact on the field of artificial intelligence (AI) while also paving the way for advances in other domains, and their rapid advancement has reshaped the landscape of AI research. This paper provides a comprehensive review of the evolution of LLMs, focusing on technical architecture, model scale, training methods, optimization techniques, and evaluation metrics. Language models have evolved significantly over time, starting from statistical language models, moving on to neural network-based models, and now entering the era of advanced pre-trained language models. As the scale of these models has expanded, so has their performance in language understanding and generation. This has led to notable results across various sectors, including education, healthcare, finance, and industry. However, the application of LLMs also presents challenges related to data quality, model generalization, and computational resources. This paper examines these issues and analyzes the strengths and limitations of LLMs. Furthermore, the rise of LLMs has raised a series of ethical, privacy, and security concerns. For instance, LLMs may generate discriminatory, false, or misleading information, infringe on personal privacy, or be exploited for malicious activities such as cyber-attacks. To address these issues, this paper explores relevant technical measures, such as model interpretability, privacy protection, and security assessment. Finally, the paper outlines potential future research trends for LLMs. With ongoing improvements in model scale and efficiency, LLMs are expected to play an even greater role in multimodal processing and to have a growing societal impact. For example, by integrating information from different modalities, such as images and sound, LLMs can better understand and generate language. They can also be employed for societal impact assessment, providing support for policy formulation and decision-making. By thoroughly analyzing the current state of research and potential future directions, this paper aims to offer researchers valuable insights and inspiration regarding LLMs, thereby fostering further advancement in the field.
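
As a brief illustration of the next-token prediction objective described in the abstract, the following sketch writes it out for a token sequence w_1, ..., w_T. The notation (model parameters θ, loss L) is an assumption introduced here for illustration and is not taken from the paper. An autoregressive language model factorizes the sequence probability by the chain rule, and training commonly minimizes the corresponding negative log-likelihood:

```latex
P_\theta(w_1, \dots, w_T) = \prod_{t=1}^{T} P_\theta(w_t \mid w_1, \dots, w_{t-1}),
\qquad
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log P_\theta(w_t \mid w_1, \dots, w_{t-1})
```

This form applies to left-to-right models such as GPT-3. Masked models such as BERT are trained with a different objective, predicting masked tokens from context on both sides.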
