  • Ei Compendex (EI) source journal
  • Chinese core journal
  • Source journal for China science and technology paper statistics
  • Source journal of the Chinese Science Citation Database (CSCD)


A survey of quantization methods for deep neural networks

YANG Chun, ZHANG Ruiyao, HUANG Long, TI Shutong, LIN Jinhui, DONG Zhiwei, CHEN Songlu, LIU Yan, YIN Xucheng

Citation: YANG Chun, ZHANG Ruiyao, HUANG Long, TI Shutong, LIN Jinhui, DONG Zhiwei, CHEN Songlu, LIU Yan, YIN Xucheng. A survey of quantization methods for deep neural networks[J]. Chinese Journal of Engineering, 2023, 45(10): 1613-1629. doi: 10.13374/j.issn2095-9389.2022.12.27.004


doi: 10.13374/j.issn2095-9389.2022.12.27.004
Funding: National New Generation Artificial Intelligence (2030) Major Project (2020AAA0109701); National Natural Science Foundation of China (62076024, 62006018); Fundamental Research Funds for the Central Universities (FRF-IDRY-21-018)
Details
    Corresponding author: E-mail: xuchengyin@ustb.edu.cn

  • CLC number: TP183

A survey of quantization methods for deep neural networks

  • Abstract: In recent years, using large pre-trained models to improve the generalization ability and performance of deep neural networks on downstream tasks such as computer vision and natural language processing has become a clear trend in deep-learning-based artificial intelligence. Although these deep neural network models perform excellently, their complex structures, huge parameter counts, and extremely high computational costs still make them hard to deploy on resource-constrained edge and end-side hardware platforms such as household appliances and smartphones, which greatly hinders the application of AI technology. Model compression and acceleration have therefore long been key to the large-scale commercial adoption of deep neural network models. Among the many compression and acceleration schemes, model quantization is one of the principal effective methods: by reducing the bit width of model parameters and of intermediate feature maps, it compresses and accelerates a deep neural network so that the quantized model can be deployed on resource-limited edge devices. However, because quantization discards a large amount of information, quantizing a model while preserving its task accuracy has become a hot research problem. In addition, driven by differences in hardware and application scenarios, model quantization has developed into a multi-branch research topic. This survey comprehensively reviews quantization techniques from these different angles, summarizes the advantages and disadvantages of each class of methods, identifies the problems that remain, and points out directions for future development.
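The bit-width reduction described above can be sketched in a few lines (an illustrative example, not code from the paper; the function names are ours): a float32 tensor is mapped to int8 through a scale (and, in the asymmetric case, a zero-point), and dequantizing it back shows the information loss quantization introduces.

```python
import numpy as np

def quantize(x, num_bits=8, symmetric=True):
    """Map a float tensor to low-bit signed integers via a scale (and zero-point)."""
    qmin, qmax = -2 ** (num_bits - 1), 2 ** (num_bits - 1) - 1   # int8: [-128, 127]
    if symmetric:
        scale = np.abs(x).max() / qmax                 # zero-point fixed at 0
        zero_point = 0
    else:
        scale = (x.max() - x.min()) / (qmax - qmin)    # shifted (asymmetric) range
        zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover an approximation of the original float values."""
    return scale * (q.astype(np.float32) - zero_point)

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)   # stand-in for one weight matrix
q, s, z = quantize(w)                              # 4x smaller than float32 storage
max_err = np.abs(w - dequantize(q, s, z)).max()    # information loss, at most ~s/2
```

With `symmetric=False` the same helper produces the shifted zero-point mapping often used for activations with one-sided ranges.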


  • Figure  1.  Illustration of matrix multiplication (MatMul): (a) full-precision matrix multiplication; (b) quantized low-bit matrix multiplication
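The contrast this caption describes can be sketched as follows (a minimal example under our own naming, not the paper's implementation): the low-precision path multiplies int8 operands, accumulates in int32 as integer hardware does, and applies a single float rescale at the output.

```python
import numpy as np

def sym_quant(x, num_bits=8):
    """Symmetric per-tensor quantization: x ≈ scale * q with q stored as int8."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8), scale

rng = np.random.default_rng(1)
a = rng.normal(size=(32, 48)).astype(np.float32)
b = rng.normal(size=(48, 16)).astype(np.float32)

y_fp = a @ b                                      # (a) full-precision MatMul

qa, sa = sym_quant(a)                             # (b) quantized low-bit MatMul
qb, sb = sym_quant(b)
acc = qa.astype(np.int32) @ qb.astype(np.int32)   # integer-only inner products
y_lp = (sa * sb) * acc                            # one dequantization at the output

rel_err = np.abs(y_fp - y_lp).max() / np.abs(y_fp).max()   # small at 8 bits
```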

    Figure  2.  Illustration of symmetric and asymmetric quantization mapping: (a) symmetric quantization; (b) asymmetric quantization

    Figure  3.  Graphs of uniform and non-uniform quantization functions: (a) uniform quantization; (b) logarithmic quantization; (c) adaptive non-uniform quantization
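For the logarithmic case in panel (b), a hedged sketch (our own helper with an illustrative bit budget, not code from the surveyed methods): each value is snapped to the nearest signed power of two, so multiplication by a quantized weight can be implemented as a bit shift.

```python
import numpy as np

def log2_quant(x, num_bits=4):
    """Snap |x| to the nearest power of two and keep the sign; zero maps to zero."""
    sign = np.sign(x)
    mag = np.abs(x)
    e_max = np.floor(np.log2(mag.max()))       # largest exponent the tensor needs
    e_min = e_max - (2 ** num_bits - 2)        # exponent window; one code kept for 0
    e = np.clip(np.round(np.log2(np.where(mag > 0, mag, 1.0))), e_min, e_max)
    q = sign * 2.0 ** e
    return np.where(mag > 0, q, 0.0)

x = np.array([0.11, -0.24, 0.5, 1.9, 0.0], dtype=np.float32)
xq = log2_quant(x)   # → [0.125, -0.25, 0.5, 1.0, 0.0]
```

Note how the quantization levels are dense near zero and sparse far from it, which is exactly the non-uniform spacing the figure contrasts with the uniform case.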

    Figure  4.  Illustration of quantization-aware training and post-training quantization
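The two regimes in this caption can be sketched as follows (illustrative code, not from the paper): quantization-aware training (QAT) runs a "fake quantize" op in the forward pass and uses the straight-through estimator (STE) to pass gradients through the non-differentiable rounding, while post-training quantization (PTQ) only calibrates scales on the already-trained model.

```python
import numpy as np

def fake_quant(x, scale, num_bits=8):
    """Quantize-then-dequantize in float: the op QAT inserts into the forward pass."""
    qmax = 2 ** (num_bits - 1) - 1
    return scale * np.clip(np.round(x / scale), -qmax, qmax)

# --- QAT-style step (sketch): weights stay float, but the loss sees quantized values
w = np.array([0.31, -0.07, 0.52], dtype=np.float32)
scale = np.abs(w).max() / 127
w_q = fake_quant(w, scale)        # forward pass uses w_q instead of w
grad_wq = np.ones_like(w)         # pretend dL/dw_q arrived from backprop
w -= 0.1 * grad_wq                # STE: round() treated as identity, so dL/dw ≈ dL/dw_q

# --- PTQ-style calibration (sketch): choose the scale from data, no gradient steps
calib = np.random.default_rng(2).normal(size=1000).astype(np.float32)
ptq_scale = np.abs(calib).max() / 127   # a simple max-value calibrator
```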

    Figure  5.  Illustration of generative data-free quantization: (a) distillation-based method; (b) GAN-based method

    Figure  6.  Illustration of mixed-precision quantization
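As a toy illustration of the idea (the layer names, error budget, and greedy rule are all ours, not a method from the surveyed papers): each layer receives its own bit width, here the cheapest one whose reconstruction error stays under a budget, so an outlier-heavy layer naturally ends up with higher precision.

```python
import numpy as np

def quant_error(w, num_bits):
    """Mean squared error of symmetric uniform quantization at a given bit width."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    w_hat = scale * np.clip(np.round(w / scale), -qmax, qmax)
    return float(np.mean((w - w_hat) ** 2))

rng = np.random.default_rng(3)
conv1 = rng.normal(size=(64, 27))    # hypothetical well-behaved layer
fc_out = rng.normal(size=(10, 128))
fc_out[0, 0] = 20.0                  # one outlier stretches the quantization range

budget = 0.05                        # illustrative per-layer MSE budget
plan = {}
for name, w in [("conv1", conv1), ("fc_out", fc_out)]:
    # cheapest bit width meeting the budget; fall back to 16 if none does
    plan[name] = next((b for b in (2, 4, 8) if quant_error(w, b) < budget), 16)
# the outlier-heavy layer ends up with a higher bit width than conv1
```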

Publication history
  • Received:  2022-12-27
  • Published online:  2023-03-24
  • Published in issue:  2023-10-25
