Commodity recognition method combining multi-layer attention mechanism and metric learning
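Affiliations:

1. Hubei Key Laboratory of Solar Energy Efficient Utilization and Energy Storage Operation Control, Hubei University of Technology, Wuhan 430068, China; 2. School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China; 3. Shenzhen Research Institute of Wuhan University, Shenzhen 518057, China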

CLC number:

TN911.73; TN919.81; TP391.41

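Fund project:

Supported by the National Natural Science Foundation of China (42101440, 42301515), the Open Fund of the Sichuan Provincial University Key Laboratory of Intelligent Optoelectronic System Perception and Application (ZNGD2308), the Doctoral Research Start-up Fund of Hubei University of Technology (XJ2021004501), the Shenzhen Science and Technology Program (JCYJ20230807090206013), and the Shenzhen Major Science and Technology Special Project (KJZD20230923114611023)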



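Abstract:

To address the recognition difficulties caused by complex backgrounds and highly similar product packaging in automatic vending cabinet scenarios, a commodity recognition method that combines a multi-scale attention mechanism with metric learning is proposed. First, multi-head self-attention is introduced into the hierarchical ResNet structure to exploit both the multi-scale feature extraction strength of convolutional neural networks (CNNs) and the global modeling capability of Transformers, and a new multi-scale dilated attention is designed so that the model attends to local features of similar packaging, such as trademark shapes and local textures, as well as to global contextual features. Second, a down-sampling multi-scale feature fusion strategy is designed to effectively improve the multi-scale feature representation of the algorithm. Finally, the ArcFace loss function is adopted to strengthen the recognition ability of the model. To verify the effectiveness of the proposed method, a real-scene commodity dataset was constructed from images captured by the top-view camera of a vending cabinet. Experimental results show that the method reaches a MAP@1 accuracy of 87.4% on the Commodity 553 dataset, outperforming current mainstream recognition methods and enabling more accurate commodity recognition.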
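The abstract states only that the ArcFace loss is adopted. As a reminder of what that loss computes, here is a minimal sketch of the standard additive angular margin formulation for a single sample in plain Python: the embedding and every class weight vector are L2-normalized, the target-class logit becomes s·cos(θ_y + m), the other logits stay s·cos(θ_j), and softmax cross-entropy is applied. The scale s = 64 and margin m = 0.5 are assumed defaults taken from the original ArcFace paper; the values used by the authors are not given on this page. The function names are hypothetical, and the sketch omits the fallback that production implementations apply when θ + m exceeds π.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def arcface_logits(embedding, class_weights, label, s=64.0, m=0.5):
    """Return ArcFace logits for one sample (hypothetical helper).

    Non-target classes get s * cos(theta_j); the target class gets
    s * cos(theta_y + m), i.e. an additive angular margin m.
    s=64 and m=0.5 are assumed defaults, not values from this paper.
    """
    x = l2_normalize(embedding)
    logits = []
    for j, w in enumerate(class_weights):
        # Cosine between the normalized embedding and the normalized class weight.
        cos_theta = sum(a * b for a, b in zip(x, l2_normalize(w)))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # keep acos inside its domain
        if j == label:
            logits.append(s * math.cos(math.acos(cos_theta) + m))
        else:
            logits.append(s * cos_theta)
    return logits


def arcface_loss(embedding, class_weights, label, s=64.0, m=0.5):
    """Softmax cross-entropy over the margin-adjusted logits."""
    logits = arcface_logits(embedding, class_weights, label, s, m)
    top = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[label]


# Usage: an embedding lying halfway between two class centers.
weights = [[1.0, 0.0], [0.0, 1.0]]
print(arcface_loss([1.0, 1.0], weights, label=0))         # with the angular margin
print(arcface_loss([1.0, 1.0], weights, label=0, m=0.0))  # plain normalized softmax
```

Setting m = 0 recovers a normalized softmax; a positive margin penalizes the target class unless the embedding lies angularly close to its class center, which is the property that helps separate near-identical packaging.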

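The reported metric is MAP@1. With a cutoff of one, the average precision of a query is 1 if its top-ranked result is relevant and 0 otherwise, so MAP@1 reduces to top-1 retrieval accuracy. The sketch below computes it by ranking a labeled gallery of embeddings by cosine similarity to each query. The query/gallery split, the use of cosine similarity, the function names, and the toy embeddings are all assumptions for illustration; this page does not describe the authors' evaluation protocol.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def map_at_1(query_embs, query_labels, gallery_embs, gallery_labels):
    """MAP@1: fraction of queries whose most similar gallery item has the same label.

    With a cutoff of 1, each query's AP@1 is 1 if the top-ranked gallery item is
    relevant and 0 otherwise, so the mean equals top-1 retrieval accuracy.
    """
    hits = 0
    for q, q_label in zip(query_embs, query_labels):
        # Rank the gallery by similarity to this query and keep only the best match.
        best = max(range(len(gallery_embs)),
                   key=lambda i: cosine_similarity(q, gallery_embs[i]))
        hits += int(gallery_labels[best] == q_label)
    return hits / len(query_embs)


# Usage with hypothetical 2-D embeddings for two products.
gallery = [[1.0, 0.0], [0.0, 1.0]]
gallery_labels = ["cola", "tea"]
queries = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
query_labels = ["cola", "tea", "tea"]  # the third query is closest to "cola"
print(map_at_1(queries, query_labels, gallery, gallery_labels))
```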

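Cite this article:

LI Jie, ZHANG Xinyue, TU Jingmin, CHEN Jiwen, LI Li. Commodity recognition method combining multi-layer attention mechanism and metric learning[J]. Electronic Measurement Technology, 2025, 48(1): 137-144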

Online publication date: 2025-02-24