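  • Source journal of The Engineering Index (EI)
  • Chinese core journal
  • Source journal of China Scientific and Technical Paper Statistics
  • Source journal of the Chinese Science Citation Database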

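Clinical named entity recognition from Chinese electronic medical records using a double-layer annotation model combining a domain dictionary with CRF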

GONG Le-jun, ZHANG Zhi-fei

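Citation: GONG Le-jun, ZHANG Zhi-fei. Clinical named entity recognition from Chinese electronic medical records using a double-layer annotation model combining a domain dictionary with CRF[J]. Chinese Journal of Engineering, 2020, 42(4): 469-475. doi: 10.13374/j.issn2095-9389.2019.09.04.004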

doi: 10.13374/j.issn2095-9389.2019.09.04.004
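Funding: National Natural Science Foundation of China (61502243, 61502247, 61572263); Zhejiang Provincial Engineering Technology Research Center of Intelligent Healthcare (2016E10011); China Postdoctoral Science Foundation (2018M632349); Natural Science Foundation of the Jiangsu Higher Education Institutions (16KJB520003)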
    Corresponding author e-mail: glj98226@163.com

  • Chinese Library Classification (CLC) number: TP391.1

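  • Abstract: Medical entity recognition is a fundamental task in extracting information from electronic medical record (EMR) text. Chinese EMR text contains many compound entities and long entities, frequently omits sentence constituents, and has unclear entity boundaries; in addition, annotated corpora are difficult to obtain. To address these characteristics, this paper proposes a double-layer annotation model based on a domain dictionary and conditional random fields (CRF). The model builds a medical domain dictionary through statistical analysis of external resources and then, combined with a CRF, performs two rounds of annotation at different granularities, integrating the accuracy of dictionary-based recognition with the automation of machine learning. It recognizes four types of medical entities in Chinese EMR text: diseases, symptoms, drugs, and operations. On the test data, the model achieves a macro-precision of 96.7%, a macro-recall of 97.7%, and a macro-F1 of 97.2%. The recognition performance of a deep neural network with an attention mechanism was also compared; limited by the size of the domain dataset, the latter performed poorly on this test set. The experimental results demonstrate the effectiveness of the double-layer annotation model for Chinese medical entity recognition.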
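The abstract does not spell out how the two annotation passes interact. One plausible reading, sketched here purely as an assumption, is that the first pass labels characters by longest dictionary match and the second-pass CRF receives those dictionary labels as features. The function names `fmm_tag` and `char_features`, the BIO tag scheme, the toy lexicon, and the feature template are hypothetical illustrations, not the authors' implementation.

```python
from typing import Dict, List

# Hypothetical first layer: forward maximum matching (FMM) against a domain
# lexicon, emitting one BIO tag per character (B-TYPE, I-TYPE, or O).
def fmm_tag(text: str, lexicon: Dict[str, str], max_len: int = 10) -> List[str]:
    tags = ["O"] * len(text)
    i = 0
    while i < len(text):
        end = None
        # Try the longest window first so a compound term beats any
        # shorter term that starts at the same position.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                end = j
                break
        if end is None:
            i += 1  # no dictionary term starts here
            continue
        etype = lexicon[text[i:end]]
        tags[i] = "B-" + etype
        for k in range(i + 1, end):
            tags[k] = "I-" + etype
        i = end  # resume after the matched term
    return tags


# Hypothetical second-layer input: a CRFsuite-style feature dict per character
# that exposes the dictionary tag alongside local character context.
def char_features(text: str, dict_tags: List[str], i: int) -> Dict[str, str]:
    return {
        "char": text[i],
        "dict_tag": dict_tags[i],
        "prev_char": text[i - 1] if i > 0 else "<BOS>",
        "next_char": text[i + 1] if i + 1 < len(text) else "<EOS>",
        "prev_dict_tag": dict_tags[i - 1] if i > 0 else "<BOS>",
    }


lexicon = {"糖尿病": "Disease", "2型糖尿病": "Disease", "头痛": "Symptom"}
text = "2型糖尿病伴头痛"
tags = fmm_tag(text, lexicon)
features = [char_features(text, tags, i) for i in range(len(text))]
print(tags)
# ['B-Disease', 'I-Disease', 'I-Disease', 'I-Disease', 'I-Disease', 'O', 'B-Symptom', 'I-Symptom']
```

A list of such per-character dicts per sentence is the input shape that sklearn-crfsuite's `CRF.fit` accepts; training the CRF itself is omitted from this sketch.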

     

  • Figure 1. Double-layer annotation model based on a domain dictionary and CRF
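Whatever features the CRF layer in Figure 1 uses, a trained linear-chain CRF labels a sentence by finding the highest-scoring tag sequence with Viterbi decoding. The minimal sketch below assumes the per-position emission scores and the label-transition scores have already been computed; the function name, the three-label set, and all numbers are made up for illustration.

```python
from typing import List, Sequence

def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]],
            labels: Sequence[str]) -> List[str]:
    """Highest-scoring label path for a linear-chain CRF.

    emissions[t][j]: score of label j at position t.
    transitions[i][j]: score of moving from label i to label j.
    Scores are unnormalized log-potentials, so they add along a path.
    """
    n, k = len(emissions), len(labels)
    if n == 0:
        return []
    score = list(emissions[0])  # best path score ending in each label at t = 0
    backpointers: List[List[int]] = []
    for t in range(1, n):
        new_score, ptr = [], []
        for j in range(k):
            # Best previous label i for landing on label j at position t.
            best_i = max(range(k), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptr.append(best_i)
        score = new_score
        backpointers.append(ptr)
    # Trace back from the best final label.
    best = max(range(k), key=lambda j: score[j])
    path = [best]
    for ptr in reversed(backpointers):
        best = ptr[best]
        path.append(best)
    path.reverse()
    return [labels[j] for j in path]


labels = ["O", "B", "I"]
# Toy scores: O -> I is heavily penalized, B -> I is rewarded.
transitions = [[0.0, 0.0, -10.0],
               [0.0, -1.0, 1.0],
               [0.0, -1.0, 0.5]]
emissions = [[0.0, 2.0, 0.0],
             [0.5, 0.0, 1.0],
             [1.0, 0.0, 0.2]]
print(viterbi(emissions, transitions, labels))  # ['B', 'I', 'O']
```

Keeping only the best predecessor per label makes decoding linear in sentence length instead of enumerating every tag sequence.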

    Figure 2. Entity-level precision of DLAM and BiLSTM-Attention-CRF

    Figure 3. Entity-level recall of DLAM and BiLSTM-Attention-CRF
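Figures 2 and 3 compare DLAM with a BiLSTM-Attention-CRF network. The page does not say which attention variant that baseline uses (reference [26] describes one attention-based BiLSTM-CRF design). The sketch below assumes plain scaled dot-product self-attention as in [25], applied to a sequence of hidden vectors with no learned query/key/value projections, to show how each position becomes a weighted mix of all positions. The names `softmax` and `self_attention` and the toy vectors are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]

def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(h: Matrix) -> Matrix:
    """Scaled dot-product self-attention, softmax(h h^T / sqrt(d)) h.

    Assumption for illustration: queries, keys, and values are all the raw
    hidden vectors h; real models apply learned projections first.
    """
    d = len(h[0])
    out = []
    for q in h:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in h]
        weights = softmax(scores)  # one weight per position, summing to 1
        out.append([sum(w * v[c] for w, v in zip(weights, h)) for c in range(d)])
    return out


# Toy "BiLSTM outputs" for a three-character sentence.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(hidden)
print(attended)
```

In a BiLSTM-Attention-CRF, `h` would be the BiLSTM outputs, and the attended vectors are commonly combined with them before the CRF layer scores the tags.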

    Table 1. Distribution of entities in the training set and the test set

    Dataset        Diseases  Symptoms  Drugs  Operations  Total
    Training set        701      2648    546        2138   6033
    Test set            273      1043    208         918   2442

    Table 2. Composition of the domain dictionary

    TypeDiseasesSymptomsOperationsDrugsKeywordsOrgansLocationPrivative
    Amount1212934611777303511612
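The abstract says the domain dictionary in Table 2 was built by statistical analysis of external resources, and the reference list includes a TF-IDF survey [21], but the page does not give the exact procedure. Assuming TF-IDF is used to rank candidate terms, the sketch below scores pre-segmented candidate terms per document (tf = count / document length, idf = log(N / df)), so a term that appears in every document, such as a generic word for "patient", scores zero. The function names and toy documents are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """TF-IDF per document: tf = count / doc length, idf = log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return scores


def top_terms(doc_scores: Dict[str, float], k: int) -> List[str]:
    """Highest-scoring terms first; ties broken alphabetically for determinism."""
    ranked = sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]


# Toy "external resource": three pre-segmented documents.
docs = [["糖尿病", "患者", "糖尿病", "胰岛素"],
        ["头痛", "患者", "发热"],
        ["肺炎", "患者", "发热"]]
scores = tfidf(docs)
print(top_terms(scores[0], 2))  # ['糖尿病', '胰岛素']
```

High-scoring terms would still need typing (disease, symptom, and so on) before entering a dictionary like the one in Table 2; that step is not sketched here.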

    Table 3. Results of the CRF comparison experiment (%)

    Model                         Macro-P  Macro-R  Macro-F1
    Baseline (single-layer CRF)      83.3     68.1      68.1
    DLAM                             96.7     97.7      97.2
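Tables 3 to 5 report macro-averaged scores. A common entity-level definition, assumed here because the page does not state the exact matching rule, treats a prediction as correct only when both its span and its type match a gold entity, computes precision, recall, and F1 per entity type, and averages each over the types. Under this definition macro-F1 is the mean of per-type F1 values, so it need not equal the harmonic mean of macro-P and macro-R. The `Entity` tuple layout and the `macro_prf` name are hypothetical.

```python
from typing import Iterable, List, Tuple

Entity = Tuple[int, int, str]  # (start, end, type); end is exclusive


def macro_prf(gold: Iterable[Entity], pred: Iterable[Entity],
              types: Iterable[str]) -> Tuple[float, float, float]:
    """Entity-level macro P/R/F1: exact span+type match, averaged over types."""
    gold_set, pred_set = set(gold), set(pred)
    type_list: List[str] = list(types)
    if not type_list:
        raise ValueError("need at least one entity type")
    ps, rs, fs = [], [], []
    for t in type_list:
        g = {e for e in gold_set if e[2] == t}
        p = {e for e in pred_set if e[2] == t}
        tp = len(g & p)  # exact matches of this type
        # Convention chosen here: an empty denominator gives a score of 0.0.
        prec = tp / len(p) if p else 0.0
        rec = tp / len(g) if g else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec > 0 else 0.0
        ps.append(prec)
        rs.append(rec)
        fs.append(f1)
    n = len(type_list)
    return sum(ps) / n, sum(rs) / n, sum(fs) / n


gold = [(0, 5, "Disease"), (6, 8, "Symptom"), (10, 12, "Symptom")]
pred = [(0, 5, "Disease"), (6, 8, "Symptom")]
print(macro_prf(gold, pred, ["Disease", "Symptom"]))
```

In this toy example macro-P is 1.0 and macro-R is 0.75, whose harmonic mean is about 0.857, while macro-F1 is 5/6, about 0.833.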

    Table 4. Results of the BiLSTM-Attention-CRF comparison experiment (%)

    Character embedding              Macro-P  Macro-R  Macro-F1
    Randomly initialized embedding     69.52    69.70     69.38
    50-dimensional embedding           53.42    54.31     53.74
    150-dimensional embedding          73.43    77.85     75.54
    300-dimensional embedding          55.36    61.03     57.88
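Table 4 contrasts randomly initialized character embeddings with pretrained ones of 50, 150, and 300 dimensions. The page does not describe how the pretrained vectors were trained or how characters without a pretrained vector were handled. The sketch below assumes the usual approach of copying a pretrained vector when one exists and otherwise drawing a small uniform random vector; the function name, the +/-0.25 range, and the seed are arbitrary illustrative choices.

```python
import random
from typing import Dict, List


def build_embedding_table(vocab: List[str], pretrained: Dict[str, List[float]],
                          dim: int, seed: int = 0) -> List[List[float]]:
    """Row i is the vector for vocab[i]: a copy of the pretrained vector when one
    of the right size exists, otherwise a small uniform random vector."""
    rng = random.Random(seed)  # seeded so the table is reproducible
    table = []
    for ch in vocab:
        vec = pretrained.get(ch)
        if vec is not None and len(vec) == dim:
            table.append(list(vec))
        else:
            # The +/-0.25 range is an arbitrary illustrative choice.
            table.append([rng.uniform(-0.25, 0.25) for _ in range(dim)])
    return table


vocab = ["糖", "尿", "病"]
pretrained = {"糖": [0.1, 0.2, 0.3]}  # hypothetical 3-d vectors
mixed = build_embedding_table(vocab, pretrained, dim=3)
random_only = build_embedding_table(vocab, {}, dim=3)  # "randomly initialized" setting
```

Passing an empty `pretrained` mapping reproduces the fully random setting, so the same table builder covers every row of Table 4.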

    Table 5. Comparison of DLAM with existing models (%)

    Model                    Macro-P  Macro-R  Macro-F1
    CRF_multi-features[27]     92.03    87.09     89.49
    BiLSTM-CRF[27]             91.12    89.74     90.43
    DLAM                       96.70    97.70     97.20
  • [1] Zhang L B. Word Segmentation and Named Entity Mining Based on Semi Supervised Learning for Chinese EMR[Dissertation]. Harbin: Harbin Institute of Technology, 2014
    [2] Huang Z H, Xu W, Yu K. Bidirectional LSTM-CRF Models for Sequence Tagging[J/OL]. arXiv preprint. (2015-08-09) [2019-09-04]. https://arxiv.org/abs/1508.01991
    [3] Wang Y Q, Yu Z H, Chen L, et al. Supervised methods for symptom name recognition in free-text clinical records of traditional Chinese medicine: an empirical study. J Biomed Inf, 2014, 47: 91 doi: 10.1016/j.jbi.2013.09.008
    [4] Xu Y, Wang Y N, Liu T R, et al. Joint segmentation and named entity recognition using dual decomposition in Chinese discharge summaries. J Am Med Inf Assoc, 2014, 21(e1): e84 doi: 10.1136/amiajnl-2013-001806
    [5] Lei J B, Tang B Z, Lu X Q, et al. A comprehensive study of named entity recognition in Chinese clinical text. J Am Med Inf Assoc, 2014, 21(5): 808 doi: 10.1136/amiajnl-2013-002381
    [6] Xu Y, Ge Y Q, Wang Q, et al. Medical name entity recognition and application in Chinese admission record of stroke patients based on CRF and RUTA rule. J Sun Yat-sen Univ Med Sci, 2018, 39(3): 455
    [7] Zhang X W, Li Z. Chinese electronic medical record named entity recognition based on multi-feature fusion. Softw Guide, 2017, 16(2): 128
    [8] Yu L, Jin L Z, Wang M F, et al. Recognition of human hypoxic state based on deep learning. Chin J Eng, 2019, 41(6): 817
    [9] Xia Y B, Zheng J L, Zhao Y F, et al. Deep learning based named entity recognition of electronic medical record. Electron Sci Technol, 2018, 31(11): 31
    [10] Li F, Zhang M S, Tian B, et al. Recognizing irregular entities in biomedical text via deep neural networks. Pattern Recognit Lett, 2018, 105: 105 doi: 10.1016/j.patrec.2017.06.009
    [11] Liu Z J, Yang M, Wang X L, et al. Entity recognition from clinical texts via recurrent neural networks. BMC Med Inf Decis Making, 2017, 17(Suppl 2): 67
    [12] Chowdhury S, Dong X S, Qian L J, et al. A multitask bi-directional RNN model for named entity recognition on Chinese electronic medical records. BMC Bioinf, 2018, 19(Suppl 17): 499
    [13] Shen Z. Named Entity Recognition for Chinese Electronic Record with Neural Network[Dissertation]. Beijing: Beijing University of Posts and Telecommunications, 2018
    [14] Wei Q K, Chen T, Xu R F, et al. Disease named entity recognition by combining conditional random fields and bidirectional recurrent neural networks. Database, 2016, 2016: baw140 doi: 10.1093/database/baw140
    [15] Wu Y H, Yang X, Bian J, et al. Combine factual medical knowledge and distributed word representation to improve clinical named entity recognition. AMIA Annu Symp Proc, 2018, 2018: 1110
    [16] Jagannatha A N, Yu H. Bidirectional RNN for medical event detection in electronic health records // Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. California, 2016: 473
    [17] Rajkomar A, Oren E, Chen K, et al. Scalable and accurate deep learning with electronic health records[J/OL]. arXiv preprint. (2018-05-11) [2019-09-04]. https://arxiv.org/abs/1801.07860
    [18] Wang Y, Wang L, Rastegar-Mojarad M, et al. Clinical information extraction applications: a literature review. J Biomed Inf, 2018, 77: 34 doi: 10.1016/j.jbi.2017.11.011
    [19] Gligic L, Kormilitzin A, Goldberg P, et al. Named entity recognition in electronic health records using transfer learning bootstrapped neural networks[J/OL]. arXiv preprint. (2019-07-29) [2019-09-04]. https://arxiv.org/abs/1901.01592
    [20] Li W, Zhao D Z, Li B, et al. Combining CRF and rule based medical named entity recognition. Appl Res Comput, 2015, 32(4): 1082 doi: 10.3969/j.issn.1001-3695.2015.04.029
    [21] Shi C Y, Xu Z J, Yang X J. Study of TFIDF algorithm. J Comput Appl, 2009, 29(Suppl 1): 167
    [22] Li H. Statistical learning methods. Beijing: Tsinghua University Press, 2012
    [23] Yang J F, Guan Y, He B, et al. Corpus construction for named entities and entity relations on Chinese electronic medical records. J Softw, 2016, 27(11): 2725
    [24] Uzuner O, South B R, Shen S Y, et al. 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. J Am Med Inf Assoc, 2011, 18(5): 552 doi: 10.1136/amiajnl-2011-000203
    [25] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[J/OL]. arXiv preprint. (2017-12-06) [2019-09-04]. https://arxiv.org/abs/1706.03762
    [26] Luo L, Yang Z, Yang P, et al. An attention-based BiLSTM-CRF approach to document level chemical named entity recognition. Bioinformatics, 2018, 34(8): 1381 doi: 10.1093/bioinformatics/btx761
    [27] Zhang Y, Wang X W, Hou Z, et al. Clinical named entity recognition from Chinese electronic health records via machine learning methods. JMIR Med Inf, 2018, 6(4): e50 doi: 10.2196/medinform.9965
Publication history
  • Received: 2019-09-04
  • Published: 2020-04-01
