Abstract: To address key challenges in image inpainting, such as high computational complexity, loss of information during feature extraction, and blurred textures in the inpainted results, this study proposed an image inpainting model that integrates multi-scale hierarchical feature fusion with a synergetic global-local Transformer. First, a multi-scale hierarchical feature fusion block was proposed to effectively fuse deep and shallow features, reducing the loss of key information while expanding the receptive field. Second, synergetic global-local Transformer blocks for global reasoning were proposed, featuring an integrated rectangle-window self-attention mechanism and local feed-forward networks. This design reduced computational complexity while enhancing the model's macroscopic understanding of global context and its microscopic grasp of local detail. The proposed method was validated on the CelebA-HQ and Places2 datasets. Compared with commonly used inpainting methods on masks covering 40% to 50% of the image, it improved PSNR by an average of 0.26 to 6.25 dB, improved SSIM by an average of 1.4% to 19%, and reduced L1 error by an average of 0.2% to 5.66%. The experiments show that images inpainted by the proposed method exhibit a more realistic and natural visual appearance, further validating the method's effectiveness.
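The abstract refers to a rectangle-window self-attention mechanism but gives no implementation details. The PyTorch sketch below only illustrates the general idea of restricting multi-head self-attention to non-overlapping rectangular windows, which keeps the attention cost proportional to the window area rather than quadratic in the full image size. The class name RectangleWindowSelfAttention, the window shape (win_h, win_w), and all hyperparameters are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (assumed implementation, not the authors' code) of
# rectangle-window self-attention: the feature map is partitioned into
# non-overlapping rectangular windows and self-attention is computed
# independently inside each window.
import torch
import torch.nn as nn


class RectangleWindowSelfAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4, win_h: int = 4, win_w: int = 16):
        super().__init__()
        self.win_h, self.win_w = win_h, win_w
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, H, W, C); H and W are assumed divisible by the window sizes.
        B, H, W, C = x.shape
        wh, ww = self.win_h, self.win_w
        # Partition into (B * num_windows, wh * ww, C) token groups.
        x = x.view(B, H // wh, wh, W // ww, ww, C)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, wh * ww, C)
        # Self-attention restricted to each rectangular window.
        x, _ = self.attn(x, x, x)
        # Reverse the window partition back to (B, H, W, C).
        x = x.view(B, H // wh, W // ww, wh, ww, C)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)
        return x


if __name__ == "__main__":
    layer = RectangleWindowSelfAttention(dim=64)
    feats = torch.randn(2, 32, 64, 64)   # (B, H, W, C) toy feature map
    print(layer(feats).shape)            # torch.Size([2, 32, 64, 64])
```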