
Active regression learning method for material data

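  • Summary: Differences in production environments and measurement conditions make the material data used for machine learning noisy. Labeling material data requires specialized knowledge and skills, so labeling costs are also relatively high. These two factors pose a major challenge to applying machine learning in the materials field. To address this challenge, an active regression learning method is proposed that consists of an outlier detection module, a greedy sampling module, and a minimum change sampling module. Unlike other active learning methods, it integrates an outlier detection mechanism, so it selects high-quality samples while effectively excluding the influence of noisy data and avoiding sunk costs. Comparative experiments against the latest active regression learning methods were conducted on public and non-public datasets. The results show that, with the same amount of data, the task model trained by the proposed method improves the performance index by 15% on average compared with other models, and that using only 30%~40% of the data as the training set matches or even exceeds the accuracy of the task model trained on all the data.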

     

    Abstract: Artificial intelligence has been applied successfully in many areas of materials science, but these applications require large amounts of high-quality data. In practice, unlabeled data are plentiful while labeled data are scarce, because annotation requires delicate and expensive experiments whose cost in time and money cannot be ignored. Active learning selects a small number of high-quality samples from a large pool of unlabeled data for labeling, so that task model performance is optimized at the lowest possible labeling cost. However, active learning methods suited to material property regression remain poorly studied, and general-purpose active learning methods cannot easily avoid the negative effects of noisy data, which results in sunk labeling costs. We therefore propose a new active regression learning method with three components: (1) an outlier detection module, which uses the labeled dataset, together with the predictions on it from a task model trained to fit that data, to train an auxiliary classification model that classifies outliers, and then excludes the unlabeled samples most likely to be outliers; (2) greedy sampling, which iteratively selects the data point farthest in the geometric space from both the labeled data and the points already selected, so that sample diversity is fully considered; and (3) minimum change sampling, which selects the unlabeled data that change least before and after the task model is trained on the labeled dataset; such data are relatively underrepresented in the feature space of the labeled dataset. We ran experiments on the concrete slump test dataset and the negative coefficient of thermal expansion dataset and compared our method with the latest active regression learning methods.
The results show that, on noisy datasets, other methods do not necessarily improve task model performance after the data are labeled in each active learning cycle, and their final performance does not reach that of the task model trained on all the data. With the same amount of data, the performance index of the task model trained by our method is 15% higher on average than that of other models. Because it includes an outlier detection mechanism, our method effectively avoids sampling outliers while selecting high-quality samples. A task model trained on only 30%–40% of the data matches or even exceeds the accuracy of the task model trained on all the data.
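
The greedy sampling step described in the abstract can be illustrated with a short example. This is a minimal sketch, not the authors' implementation: it assumes Euclidean distance in feature space, and the function name `greedy_sample`, its parameters, and the toy points are hypothetical.

```python
import math


def greedy_sample(labeled, unlabeled, k):
    """Return indices of up to k points from `unlabeled`.

    Each round picks the point whose nearest neighbour among the labeled
    points and the points already picked is farthest away (assumed metric:
    Euclidean distance).
    """
    if not labeled:
        raise ValueError("need at least one labeled point as a reference")
    # Distance from every unlabeled point to its nearest labeled point.
    min_dist = [min(math.dist(u, l) for l in labeled) for u in unlabeled]
    chosen = []
    for _ in range(min(k, len(unlabeled))):
        # Candidate farthest from everything labeled or chosen so far.
        best = max(
            (i for i in range(len(unlabeled)) if i not in chosen),
            key=lambda i: min_dist[i],
        )
        chosen.append(best)
        # The new pick now counts as a reference point for later rounds.
        for i, u in enumerate(unlabeled):
            min_dist[i] = min(min_dist[i], math.dist(u, unlabeled[best]))
    return chosen


# Toy data (hypothetical): one labeled point and four unlabeled candidates.
labeled = [(0.0, 0.0)]
unlabeled = [(1.0, 0.0), (5.0, 0.0), (5.0, 0.5), (0.0, 3.0)]
picked = greedy_sample(labeled, unlabeled, 2)
print(picked)
```

In this toy run the second pick is not the remaining point farthest from the labeled set, because that point lies right next to the first pick. Updating the distances after every selection is what spreads the chosen samples across the feature space.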

     
