Harnessing Diffusion Models for Image Manipulation With Partial Sketches
Controllable image structure editing has attracted increasing attention. While recent interactive point-based methods are convenient and realistic, they often lack fine-grained control over localized content. Partial sketches provide a simple yet expressive interface for local structure manipulation. However, existing partial-sketch-based manipulation methods relying on generative adversarial networks (GANs) suffer from limited generalization and fidelity. Moreover, although diffusion-based adapters excel at global conditioning (e.g., edge maps), localized editing with partial strokes remains challenging due to two key issues: effectively injecting sparse stroke conditions during denoising and preserving non-edited regions to avoid unintended changes. To address these challenges, we propose DiffStroke, a mask-free framework for localized image manipulation with partial sketches. We introduce trainable Image-Stroke Fusion (ISF) blocks to fuse source images and strokes at the feature level, enabling precise local shape control while maintaining appearance consistency. We further develop a self-supervised mask estimator to protect irrelevant regions without manual input. Specifically, we leverage Tweedie's formula to estimate a clean latent image from noisy latents, blend the denoised result with the source, and train the mask estimator by minimizing the error between the blended latent and the target latent. Experiments on natural and facial images demonstrate that DiffStroke outperforms state-of-the-art methods on both simple and complex stroke-based editing tasks. DiffStroke can also be combined with text prompts to produce diverse and creative results. Code is available at https://github.com/CMACH508/DiffStroke.
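The mask-estimator training objective described above can be sketched in a few lines. This is a minimal NumPy illustration under standard DDPM notation, not the paper's implementation: `alpha_bar_t` is the cumulative noise schedule at timestep t, `eps_pred` the predicted noise, and the function names are hypothetical.

```python
import numpy as np

def tweedie_denoise(x_t, eps_pred, alpha_bar_t):
    # Tweedie's formula under the standard DDPM parameterization:
    # recover an estimate of the clean latent x0 from the noisy latent x_t
    # and the predicted noise at timestep t.
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)

def mask_estimator_loss(x_t, eps_pred, alpha_bar_t, mask, z_src, z_tgt):
    # Blend the Tweedie-denoised latent with the source latent via the
    # predicted soft mask (1 = edited region, 0 = preserved region),
    # then score the blend against the target latent with an MSE loss.
    x0_hat = tweedie_denoise(x_t, eps_pred, alpha_bar_t)
    blended = mask * x0_hat + (1.0 - mask) * z_src
    return np.mean((blended - z_tgt) ** 2)
```

Minimizing this loss pushes the mask toward 1 only where the denoised latent must replace the source, which is how irrelevant regions can be protected without a user-supplied mask.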