Design of a semantic information processing accelerator based on FPGA
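Authors: Li Junfeng, Tan Beihai, Zheng Yufan, Chen Hanjie, Yu Rong
Affiliations:

1. School of Automation, Guangdong University of Technology, Guangzhou 510006, China; 2. School of Integrated Circuit, Guangdong University of Technology, Guangzhou 510006, China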

Chinese Library Classification (CLC) number:

TN46

Fund Project:

Supported by the Key Program of the National Natural Science Foundation of China (U22A2054)



    Abstract:

    In semantic communication, image semantic information processing relies heavily on computationally intensive convolutional neural networks (CNNs), whose computational demands grow further when handling high-resolution images. This poses a significant challenge for deploying semantic communication in edge scenarios. To address this, this paper proposes an FPGA-based semantic information processing accelerator that integrates a CNN encoder and rANS (range asymmetric numeral systems) entropy coding in a single hardware accelerator. Specifically, the accelerator adopts a systolic array architecture built from multiply-accumulate (MAC) units, a loop tiling strategy, and a double-buffering structure to fully exploit the parallel computing capability and on-chip storage resources of the FPGA, improving both data transfer efficiency and computational performance. Each processing element (PE) integrates multiple MAC units and can perform two INT8 multiplications with local accumulation in every clock cycle. Finally, the output features are compressed further by 8-way parallel rANS encoding. Experimental results show that, on the ZCU104 platform, the design achieves a throughput of 300.5 GOPS and an energy efficiency of 66.77 GOPS/W when processing 1080P images, running approximately 6 times faster than an Intel CPU and 58 times faster than an ARM CPU. Compared with three other FPGA accelerators, the design improves BRAM efficiency by approximately 730%, 40%, and 63%, energy efficiency by approximately 802%, 60%, and 3%, and DSP efficiency by approximately 476%, 70%, and 133%, respectively. The proposed accelerator therefore offers significant performance advantages, processes image semantic information efficiently, and has broad potential for practical application.
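
The abstract names loop tiling and a double-buffer (ping-pong) structure but gives no tile sizes or loop order. The Python sketch below illustrates the general technique on a matrix multiply, assuming the convolutions are lowered to matrix form as is common for systolic arrays; the tile size `T` and the helpers `load_tile`, `tiled_matmul`, and `matmul_ref` are hypothetical and not taken from the paper. While one buffer is being computed on, the next reduction tile is loaded into the other buffer.

```python
# Sketch (assumption): loop tiling with ping-pong (double) buffers on a matrix
# multiply; tile size T and all names are illustrative, not from the paper.

def matmul_ref(A, B):
    """Untiled reference result."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def load_tile(A, B, i0, j0, k0, T):
    """Copy one tile of A and one of B into a buffer (stands in for a DMA into on-chip RAM)."""
    a_tile = [row[k0:k0 + T] for row in A[i0:i0 + T]]
    b_tile = [row[j0:j0 + T] for row in B[k0:k0 + T]]
    return a_tile, b_tile

def tiled_matmul(A, B, T=4):
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    k_starts = list(range(0, k, T))
    for i0 in range(0, n, T):                # output-row tiles
        for j0 in range(0, m, T):            # output-column tiles
            buffers = [load_tile(A, B, i0, j0, k_starts[0], T), None]  # ping-pong pair
            for t in range(len(k_starts)):   # reduction tiles
                if t + 1 < len(k_starts):
                    # Prefetch the next reduction tile into the idle buffer; on an
                    # FPGA this transfer would overlap with the compute below.
                    buffers[(t + 1) % 2] = load_tile(A, B, i0, j0, k_starts[t + 1], T)
                a_tile, b_tile = buffers[t % 2]
                for ii, a_row in enumerate(a_tile):
                    for jj in range(len(b_tile[0])):
                        C[i0 + ii][j0 + jj] += sum(
                            a_row[kk] * b_tile[kk][jj] for kk in range(len(a_row)))
    return C
```

Python executes the prefetch and the compute one after the other; the point of the two buffers is that in hardware the load of tile t+1 and the computation on tile t touch different memories and can therefore run at the same time.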
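
The abstract states that each processing element performs two INT8 multiplications per clock cycle but does not say how. One widely used approach on Xilinx UltraScale+ devices, assumed here rather than confirmed by the paper, packs two INT8 operands that share a common multiplicand into one wide multiplier input, so a single multiplication yields both products in separate bit lanes; the packed operand needs about 26 bits, which fits the 27-bit input of a DSP48E2 multiplier. The sketch below models that arithmetic with Python integers; the lane width `LANE_BITS = 18`, the term limit `MAX_TERMS`, and all function names are illustrative assumptions.

```python
# Sketch (assumption): two signed INT8 products from ONE wide multiplication,
# in the style of DSP packing; lane width is illustrative, not from the paper.
LANE_BITS = 18   # low lane holds a signed sum of INT8*INT8 products
MAX_TERMS = 7    # 7 * 16384 < 2**17, so 7 products fit the signed low lane

def _check_int8(*values: int) -> None:
    for v in values:
        if not -128 <= v <= 127:
            raise ValueError(f"{v} is not a signed INT8 value")

def packed_multiply(a: int, b: int, w: int) -> int:
    """Multiply w by a (high lane) and b (low lane) with a single multiplication."""
    _check_int8(a, b, w)
    packed = (a << LANE_BITS) + b   # a and b share the multiplicand w
    return packed * w               # = a*w * 2**LANE_BITS + b*w

def unpack(product: int) -> tuple[int, int]:
    """Split a packed (possibly accumulated) product into (high, low) signed results."""
    half = 1 << (LANE_BITS - 1)
    low = ((product + half) % (1 << LANE_BITS)) - half  # sign-extend the low lane
    high = (product - low) >> LANE_BITS                 # remove the low lane exactly
    return high, low

def packed_dot(a_vec: list[int], b_vec: list[int], w_vec: list[int]) -> tuple[int, int]:
    """Local accumulation: returns (sum of a*w, sum of b*w) from packed partial products."""
    if not len(a_vec) == len(b_vec) == len(w_vec):
        raise ValueError("vectors must have equal length")
    if len(w_vec) > MAX_TERMS:
        raise ValueError("too many terms for the low lane without guard bits")
    acc = 0
    for a, b, w in zip(a_vec, b_vec, w_vec):
        acc += packed_multiply(a, b, w)
    return unpack(acc)
```

For example, `unpack(packed_multiply(127, -128, 127))` gives `(16129, -16256)`, i.e. `127*127` and `-128*127` from one multiplication. Accumulating more than seven packed products would let the low lane overflow into the high lane, so a hardware design following this approach would need guard bits or periodic unpacking.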
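
The abstract says the output features are encoded with 8-way parallel rANS but gives no details of the frequency model or stream layout. The sketch below shows one straightforward reading, assuming the symbols are split round-robin over eight independent byte-oriented rANS lanes that share one static frequency table; the constants (`PROB_BITS = 12`, a 32-bit state whose normalized interval starts at `RANS_L = 2**23`) follow the common byte-wise rANS formulation and are assumptions, as are all function names. Each lane pushes bytes onto a stack so that the decoder, which pops from the end, recovers the symbols in forward order.

```python
# Sketch (assumption): 8 independent byte-wise rANS lanes sharing one static
# frequency table; constants follow the common 32-bit byte-oriented rANS layout.
PROB_BITS = 12
M = 1 << PROB_BITS   # quantized frequencies sum to M
RANS_L = 1 << 23     # normalized state interval is [RANS_L, 256 * RANS_L)
LANES = 8

def build_freqs(symbols):
    """Quantize the histogram to frequencies summing to M (each present symbol >= 1)."""
    if not symbols:
        raise ValueError("need at least one symbol")
    counts = {}
    for s in symbols:
        counts[s] = counts.get(s, 0) + 1
    if len(counts) > M:
        raise ValueError("alphabet larger than M")
    freqs = {s: max(1, c * M // len(symbols)) for s, c in counts.items()}
    while sum(freqs.values()) > M:   # trim the largest entry that can still shrink
        s = max((t for t in freqs if freqs[t] > 1), key=freqs.get)
        freqs[s] -= 1
    while sum(freqs.values()) < M:   # give leftover mass to the largest entry
        freqs[max(freqs, key=freqs.get)] += 1
    cum, start = {}, 0
    for s in sorted(freqs):
        cum[s] = start
        start += freqs[s]
    return freqs, cum

def encode_lane(symbols, freqs, cum):
    x, out = RANS_L, []              # out is a byte stack; the decoder pops from its end
    for s in reversed(symbols):      # rANS encodes backwards so decoding runs forwards
        f = freqs[s]
        x_max = ((RANS_L >> PROB_BITS) << 8) * f
        while x >= x_max:            # renormalize by shifting out low bytes
            out.append(x & 0xFF)
            x >>= 8
        x = ((x // f) << PROB_BITS) + (x % f) + cum[s]
    for _ in range(4):               # flush the final 32-bit state, low byte first
        out.append(x & 0xFF)
        x >>= 8
    return out

def decode_lane(stream, n, freqs, cum):
    buf, x = list(stream), 0
    for _ in range(4):               # reload the state, high byte first
        x = (x << 8) | buf.pop()
    result = []
    for _ in range(n):
        slot = x & (M - 1)
        s = next(t for t in freqs if cum[t] <= slot < cum[t] + freqs[t])
        result.append(s)
        x = freqs[s] * (x >> PROB_BITS) + slot - cum[s]
        while x < RANS_L:            # pull bytes back in until the state is normalized
            x = (x << 8) | buf.pop()
    return result

def encode_parallel(symbols):
    """Round-robin the symbols over LANES independent encoders."""
    freqs, cum = build_freqs(symbols)
    streams = [encode_lane(symbols[k::LANES], freqs, cum) for k in range(LANES)]
    return freqs, cum, streams

def decode_parallel(streams, n, freqs, cum):
    out = [None] * n
    for k, stream in enumerate(streams):
        out[k::LANES] = decode_lane(stream, len(range(k, n, LANES)), freqs, cum)
    return out
```

Because the lanes share no state, eight encoders could process them concurrently; an interleaved single-stream layout is an equally plausible alternative that the abstract does not rule out.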
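
Since energy efficiency is throughput divided by power, the two headline figures in the abstract imply an average power draw. This is simple arithmetic on the reported numbers, assuming both refer to the same 1080P measurement, and not a value reported by the authors:

$$P \approx \frac{300.5\ \text{GOPS}}{66.77\ \text{GOPS/W}} \approx 4.50\ \text{W}$$

Likewise, read in the usual way, an improvement of about 730% corresponds to a ratio of 1 + 7.30 = 8.3, i.e. roughly 8.3 times the BRAM efficiency of that comparison accelerator.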

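Cite this article

Li Junfeng, Tan Beihai, Zheng Yufan, Chen Hanjie, Yu Rong. Design of a semantic information processing accelerator based on FPGA[J]. Electronic Measurement Technology, 2025, 48(6): 188-195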

History
  • Published online: 2025-05-08