Volume 45 Issue 7
Jul.  2023
CHEN Xue-hui, FENG Yan, QIAN Quan. Differential privacy protection random forest algorithm and its application in steel materials[J]. Chinese Journal of Engineering, 2023, 45(7): 1194-1204. doi: 10.13374/j.issn2095-9389.2022.05.29.002

Differential privacy protection random forest algorithm and its application in steel materials

doi: 10.13374/j.issn2095-9389.2022.05.29.002
  • Corresponding author: E-mail: qqian@shu.edu.cn
  • Received Date: 2022-05-29
    Available Online: 2022-07-27
  • Publish Date: 2023-07-25
  • Data-driven material informatics is considered the fourth paradigm of materials research and development (R&D); it can greatly reduce R&D costs and shorten the R&D cycle. However, data-driven methods increase the risk of privacy disclosure when materials data and sensitive information, such as key processes in materials R&D, are shared and used. Privacy-preserving machine learning is therefore a key issue in material informatics. Current mainstream privacy protection methods include differential privacy, secure multi-party computation, and federated learning. The differential privacy model provides strict definitions and metrics for quantitatively evaluating privacy protection, and the noise added by differential privacy is independent of the data scale. Only a small amount of noise is required to achieve a high level of protection, which considerably improves data usability. Because random forest is one of the most widely used models in material informatics, a novel differential privacy-preserving random forest algorithm (DPRF) is proposed. DPRF introduces the Laplace mechanism and the exponential mechanism of differential privacy into the decision tree building process. First, the total privacy budget for the DPRF algorithm is set and then divided equally among the decision trees. During tree building, the splitting features are selected randomly by the exponential mechanism and noise is added to the number of nodes by the Laplace mechanism, which provides effective differential privacy protection for the random forest. Experiments such as steel fatigue prediction verify the efficacy of DPRF under both centralized and distributed data storage.
With different privacy budgets, the R² of the DPRF predictions can exceed 0.8 for each target feature after differential privacy is added, which is close to that of the original random forest algorithm. In the distributed data storage scenario, the R² of each target property prediction gradually increases as the privacy budget increases. A comparison of different tree depths in DPRF shows that the overall R² of the target prediction first increases and then decreases as the maximum tree depth increases. Overall, the best prediction accuracy is achieved when the maximum depth of the tree is set to 5. In summary, DPRF achieves good prediction accuracy while providing differential privacy protection for random forests. In particular, in a distributed and decentralized data environment, DPRF can balance privacy-preserving strength and prediction accuracy through the choice of privacy budget, tree depth, and other settings, which indicates broad application prospects for the algorithm.
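
The three building blocks named in the abstract (equal budget division, the Laplace mechanism, and the exponential mechanism) can be illustrated with a minimal Python sketch. This is not the authors' DPRF implementation: the function names, the sensitivity of 1, and the even split of each tree's budget between the two mechanisms are assumptions made only for illustration.

```python
import math
import random


def split_budget(total_epsilon, n_trees):
    """Divide the total privacy budget equally among the trees."""
    return [total_epsilon / n_trees] * n_trees


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as a scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noisy_count(count, epsilon, rng, sensitivity=1.0):
    """Laplace mechanism: add noise with scale sensitivity / epsilon.

    Assumption: one record changes a count by at most 1 (sensitivity = 1).
    """
    return count + laplace_noise(sensitivity / epsilon, rng)


def exponential_mechanism(scores, epsilon, rng, sensitivity=1.0):
    """Return an index drawn with probability proportional to
    exp(epsilon * score / (2 * sensitivity))."""
    top = max(scores)  # shift by the maximum for numerical stability
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity))
               for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    budgets = split_budget(total_epsilon=1.0, n_trees=10)
    eps_tree = budgets[0]
    # Hypothetical choice: half of the tree's budget for each mechanism.
    feature = exponential_mechanism([0.2, 0.9, 0.5], eps_tree / 2, rng)
    count = noisy_count(40, eps_tree / 2, rng)
    print("chosen feature index:", feature)
    print("noisy count:", count)
```

A smaller budget per tree means a larger Laplace scale and a flatter selection distribution, which is the privacy and accuracy trade-off the abstract describes.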

     



    Figures(3)  / Tables(6)
