Abstract: Attention deficit hyperactivity disorder (ADHD) is one of the most common mental disorders of childhood and, in most cases, persists into adulthood. In recent years, ADHD classification based on functional magnetic resonance imaging (fMRI) data has become a research hotspot. Most classification algorithms reported in the literature assume that the classes are balanced; however, ADHD data sets are usually imbalanced. Conventional learning algorithms trained on imbalanced data bias the classifier toward the majority class, which degrades its performance. In this study, we considered an imbalanced neuroimaging classification problem: classification of ADHD using resting-state fMRI. We used the functional connectivity matrix of fMRI as the classification feature and proposed a multi-objective data classification scheme based on a support vector machine (SVM) to aid the diagnosis of ADHD. In this scheme, the imbalanced data classification problem is formulated as an SVM model with three objectives: maximizing the margin, minimizing the sum of positive errors, and minimizing the sum of negative errors. Accordingly, the empirical errors of the positive and negative samples can be handled separately. The model is then solved by a multi-objective optimization method, the normal boundary intersection method, and a set of representative classifiers is computed for selection by decision makers. The proposed scheme was tested and evaluated on five data sets from the ADHD-200 consortium and compared with traditional classification methods. Experimental results show that the proposed three-objective SVM classification scheme performs better than the traditional classification methods it was compared with and can effectively address the data imbalance problem at the algorithm level. This scheme can be used to aid the diagnosis of ADHD as well as other diseases, such as Alzheimer's disease and autism.
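The abstract describes the three-objective model only in words. A minimal sketch of a soft-margin SVM consistent with that description is given below. It assumes training samples $(x_i, y_i)$ with $y_i \in \{+1, -1\}$, index sets $I_+$ and $I_-$ for the positive and negative samples, slack variables $\xi_i$, and the squared 2-norm as the margin term. These symbols and the choice of norm are assumptions for illustration; the full paper may use a different norm or notation.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad
  & \Bigl(\tfrac{1}{2}\lVert w\rVert_2^2,\;
          \sum_{i\in I_+}\xi_i,\;
          \sum_{i\in I_-}\xi_i\Bigr) \\
\text{s.t.}\quad
  & y_i\,(w^\top x_i + b) \ge 1 - \xi_i, \qquad i = 1,\dots,n, \\
  & \xi_i \ge 0, \qquad i = 1,\dots,n.
\end{aligned}
```

A standard soft-margin SVM combines both slack sums into one term weighted by a single penalty parameter. Keeping them as separate objectives yields a set of Pareto-optimal trade-offs rather than one solution, which is what a method such as normal boundary intersection is designed to sample.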
Table 1. Description of ADHD-200 data sets

| Data set | Total number of subjects | Number of ADHD subjects | Number of NC subjects |
| --- | --- | --- | --- |
| KKI | 83 | 22 | 61 |
| NYU | 216 | 118 | 98 |
| Peking-1 | 85 | 24 | 61 |
| Peking-2 | 67 | 35 | 32 |
| Peking-joint | 194 | 78 | 116 |
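To make the degree of imbalance in Table 1 concrete, the following sketch computes, for each data set, the ratio of majority to minority subjects and the accuracy obtained by always predicting the majority class. The subject counts are copied from Table 1; the function name `imbalance_summary` is hypothetical and used only for illustration.

```python
# Subject counts copied from Table 1: (ADHD, NC) per data set.
counts = {
    "KKI": (22, 61),
    "NYU": (118, 98),
    "Peking-1": (24, 61),
    "Peking-2": (35, 32),
    "Peking-joint": (78, 116),
}


def imbalance_summary(n_adhd, n_nc):
    """Return (majority/minority ratio, accuracy of always predicting the majority class)."""
    majority = max(n_adhd, n_nc)
    minority = min(n_adhd, n_nc)
    return majority / minority, majority / (n_adhd + n_nc)


summary = {name: imbalance_summary(*pair) for name, pair in counts.items()}

for name, (ratio, baseline) in summary.items():
    print(f"{name:13s} ratio={ratio:.2f}  majority-baseline accuracy={baseline:.3f}")
```

On KKI and Peking-1 the majority class outnumbers the minority class by more than two to one, so a classifier that never detects ADHD would still score above 0.7 in accuracy. This is the reason accuracy is reported together with g-means in the tables.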
Table 2. Evaluation of the training/cross-validation data set
| Classifier | 1 | 2 | 3 | 4 | 5 | 6 |
| --- | --- | --- | --- | --- | --- | --- |
| Accuracy | 0.6600/0.6842 | 0.6400/0.6842 | 0.6600/0.7368 | 0.6800/0.3842 | 0.6600/0.7368 | 0.6000/0.6316 |
| G-means | 0.6547/0.5311 | 0.6607/0.6202 | 0.6929/0.7161 | 0.7237/0.6794 | 0.7182/0.7596 | 0.6299/0.5883 |

| Classifier | 7 | 8 | 9 | 10 | 11 |
| --- | --- | --- | --- | --- | --- |
| Accuracy | 0.6000/0.6842 | 0.6200/0.6842 | 0.6000/0.6842 | 0.6200/0.6842 | 0.5800/0.5789 |
| G-means | 0.6299/0.6794 | 0.6726/0.6794 | 0.6547/0.7161 | 0.6841/0.7161 | 0.6268/0.5991 |
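The abstract leaves the choice among the representative classifiers to the decision maker. As one possible rule, the sketch below picks the classifier with the highest cross-validation score from the Table 2 values. The rule itself and the helper `pick_by_cv` are assumptions for illustration, not the selection procedure of the paper.

```python
# (training, cross-validation) scores copied from Table 2, keyed by classifier index.
accuracy = {
    1: (0.6600, 0.6842), 2: (0.6400, 0.6842), 3: (0.6600, 0.7368),
    4: (0.6800, 0.3842), 5: (0.6600, 0.7368), 6: (0.6000, 0.6316),
    7: (0.6000, 0.6842), 8: (0.6200, 0.6842), 9: (0.6000, 0.6842),
    10: (0.6200, 0.6842), 11: (0.5800, 0.5789),
}
g_means = {
    1: (0.6547, 0.5311), 2: (0.6607, 0.6202), 3: (0.6929, 0.7161),
    4: (0.7237, 0.6794), 5: (0.7182, 0.7596), 6: (0.6299, 0.5883),
    7: (0.6299, 0.6794), 8: (0.6726, 0.6794), 9: (0.6547, 0.7161),
    10: (0.6841, 0.7161), 11: (0.6268, 0.5991),
}


def pick_by_cv(scores):
    """Return the classifier index with the highest cross-validation score.

    Indices are visited in ascending order, so the lowest index wins ties.
    """
    return max(sorted(scores), key=lambda k: scores[k][1])


print("best by cross-validation g-means:", pick_by_cv(g_means))
print("best by cross-validation accuracy:", pick_by_cv(accuracy))
```

Selecting on g-means rather than accuracy favors classifiers that balance errors on both classes, which matches the motivation for treating positive and negative errors as separate objectives.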
Table 3. Average accuracy/g-means value for different methods
| Data set | L1SVM | L2SVM | B-SVM | RF | ELM | T-SVM |
| --- | --- | --- | --- | --- | --- | --- |
| KKI | 0.635/0.421 | 0.634/0.515 | 0.732/0.527 | 0.725/0.530 | 0.696/0.622 | 0.753/0.606 |
| NYU | 0.545/0.543 | 0.556/0.542 | 0.643/0.624 | 0.608/0.610 | 0.588/0.594 | 0.703/0.698 |
| Peking-1 | 0.725/0.683 | 0.714/0.664 | 0.801/0.677 | 0.770/0.688 | 0.677/0.647 | 0.813/0.711 |
| Peking-2 | 0.636/0.637 | 0.665/0.683 | 0.807/0.776 | 0.635/0.649 | 0.564/0.601 | 0.845/0.851 |
| Peking-joint | 0.630/0.615 | 0.624/0.611 | 0.742/0.764 | 0.665/0.686 | 0.625/0.613 | 0.751/0.743 |
| MNIST | 0.977/0.783 | 0.978/0.797 | 0.979/0.800 | 0.975/0.790 | 0.969/0.00 | 0.984/0.849 |
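The tables report g-means next to accuracy without defining it here. The sketch below assumes the definition that is standard in imbalanced learning, the geometric mean of sensitivity and specificity, and contrasts it with accuracy. The confusion-matrix counts are hypothetical and chosen only to match the KKI class sizes (22 ADHD, 61 NC); they are not results from the paper.

```python
import math


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity (assumed definition of g-means)."""
    sensitivity = tp / (tp + fn)  # fraction of positive samples detected
    specificity = tn / (tn + fp)  # fraction of negative samples detected
    return math.sqrt(sensitivity * specificity)


def accuracy(tp, fn, tn, fp):
    """Fraction of all samples classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


# Hypothetical classifier that labels every subject as NC.
always_nc = dict(tp=0, fn=22, tn=61, fp=0)
# Hypothetical classifier that makes errors on both classes.
balanced = dict(tp=15, fn=7, tn=40, fp=21)

for label, cm in (("always NC", always_nc), ("balanced", balanced)):
    print(f"{label:10s} accuracy={accuracy(**cm):.3f}  g-mean={g_mean(**cm):.3f}")
```

The first classifier has the higher accuracy yet a g-mean of zero, because it detects no positive sample. The ELM entry for MNIST in Table 3 (0.969/0.00) shows the same pattern of high accuracy with a zero g-means value.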