Abstract: In industrial settings, acquiring and annotating defective workpieces is costly and difficult, which severely hinders defect detection. Generating a large number of defective samples from limited real-world samples effectively mitigates this sample scarcity, but existing defect generation methods often suffer from limited visual realism and poor alignment with defect masks. To address these limitations, this study introduces AnomalyAlign, a controllable diffusion model designed to synthesize highly realistic industrial defect images with precise mask alignment. Leveraging the foundational knowledge of the text-to-image model Stable Diffusion, AnomalyAlign incorporates a semantic-aligned text prompt generator that produces text prompts more closely aligned with the semantics of real images, thereby accelerating model convergence. In addition, the model integrates a defect alignment loss function that enhances the spatial consistency between generated defect images and their corresponding masks. Extensive experiments on the MVTec-AD dataset demonstrate that AnomalyAlign generates defect images with superior realism and diversity while significantly improving the performance of downstream defect detection tasks.
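The abstract does not give the exact form of the defect alignment loss; the following is only a minimal sketch of one plausible realization, assuming a mask-weighted denoising objective that upweights the noise-prediction error inside the defect region. The function name, weighting scheme, and `lambda_defect` parameter are illustrative assumptions, not the paper's definition.

```python
# Hypothetical sketch, not the paper's actual loss: a mask-weighted MSE on the
# diffusion model's noise prediction that emphasizes the defect region.
import torch
import torch.nn.functional as F


def defect_alignment_loss(noise_pred: torch.Tensor,
                          noise_target: torch.Tensor,
                          defect_mask: torch.Tensor,
                          lambda_defect: float = 5.0) -> torch.Tensor:
    """Mask-weighted MSE between predicted and true noise (assumed form).

    noise_pred, noise_target: (B, C, H, W) tensors from the diffusion U-Net.
    defect_mask: (B, 1, H, W) binary mask, 1 inside the defect region.
    lambda_defect: extra weight on errors inside the defect region (assumed).
    """
    per_pixel = F.mse_loss(noise_pred, noise_target, reduction="none")
    # Weight map: 1 on background, 1 + lambda_defect inside the defect mask.
    weights = 1.0 + lambda_defect * defect_mask
    return (weights * per_pixel).mean()


if __name__ == "__main__":
    # Toy usage with random tensors standing in for U-Net outputs.
    pred = torch.randn(2, 4, 32, 32)
    target = torch.randn(2, 4, 32, 32)
    mask = (torch.rand(2, 1, 32, 32) > 0.8).float()
    print(defect_alignment_loss(pred, target, mask).item())
```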