MagicQuill V2

Precise and Interactive Image Editing with Layered Visual Cues

1HKUST, 2Ant Group, 3NEU, 4ZJU, 5CUHK

Abstract

We propose MagicQuill V2, a novel system that introduces a layered composition paradigm to generative image editing, bridging the gap between the semantic power of diffusion models and the granular control of traditional graphics software. While diffusion transformers excel at holistic generation, their use of singular, monolithic prompts fails to disentangle distinct user intentions for content, position, and appearance. To overcome this, our method deconstructs creative intent into a stack of controllable visual cues: a content layer for what to create, a spatial layer for where to place it, a structural layer for how it is shaped, and a color layer for its palette. Our technical contributions include a specialized data generation pipeline for context-aware content integration, a unified control module to process all visual cues, and a fine-tuned spatial branch for precise local editing, including object removal. Extensive experiments validate that this layered approach effectively resolves the user intention gap, granting creators direct, intuitive control over the generative process.

Tutorial

System Overview

The Toolbar (A) features a new Local Edit Brush for defining the target editing area, along with tools from MagicQuill V1.

The Visual Cue Manager (B) holds all content layer visual cues (foreground props) that users can drag onto the canvas to define what to generate.

Users can refine these cues in the Image Segmentation Panel (C), opened by clicking the segment icon. The panel extracts objects precisely from dot or bounding-box prompts, powered by SAM.

MagicQuill V2 UI

Image Segmentation

Click the segment icon to enter the segmentation UI. Users can perform four operations:

  1. Add positive dots to mark areas to include.
  2. Add negative dots to mark areas to exclude.
  3. Add a bounding box to enclose the region of interest.
  4. Use the eraser to remove unwanted areas from the result.

After segmentation, click the Save or Save as new prop button to add the foreground prop to the Visual Cue Manager. Alternatively, fill the segmented region with any brush.
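The dot and bounding-box prompts listed above map naturally onto the inputs that a point-prompted segmenter such as SAM accepts. The following is a minimal sketch, assuming the panel records clicks as (x, y) pixel coordinates. `SegmentPrompt` and `to_sam_inputs` are hypothetical names chosen for illustration and are not part of MagicQuill V2. The label convention (1 for foreground, 0 for background) and the (x0, y0, x1, y1) box layout follow the `SamPredictor.predict` interface of the `segment_anything` package.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[int, int]


@dataclass
class SegmentPrompt:
    """Hypothetical container for the clicks made in the segmentation panel."""
    positive: List[Point] = field(default_factory=list)  # areas to include
    negative: List[Point] = field(default_factory=list)  # areas to exclude
    box: Optional[Tuple[int, int, int, int]] = None      # (x0, y0, x1, y1)


def to_sam_inputs(prompt: SegmentPrompt) -> dict:
    """Pack the prompt into point_coords / point_labels / box lists."""
    coords = [list(p) for p in prompt.positive + prompt.negative]
    # 1 marks a foreground (positive) dot, 0 a background (negative) dot.
    labels = [1] * len(prompt.positive) + [0] * len(prompt.negative)
    box = None
    if prompt.box is not None:
        x0, y0, x1, y1 = prompt.box
        # Normalise so the box is valid whichever direction the user dragged.
        box = [min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1)]
    return {
        "point_coords": coords or None,
        "point_labels": labels or None,
        "box": box,
    }
```

In practice these lists would be converted to NumPy arrays before being handed to a predictor; the eraser step is left out because it operates on the returned mask rather than on the prompts.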

Layer Operations

1. Content Layer

Users can click a foreground prop in the Visual Cue Manager to add it to the canvas for puzzle-like editing. Use the local edit brush to specify the edit location. The result will respect the user-provided foreground props.

2. Structural Layer

Users can use the add brush to sketch edges and the subtract brush to mask existing edges. The result will follow the user's edge map.
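The combined effect of the two brushes on an edge map can be sketched as a per-pixel rule. This is an illustrative assumption rather than the system's actual implementation: edge maps are modelled as binary grids, the helper name `apply_edge_brushes` is hypothetical, and the subtract brush is assumed to win wherever both brushes overlap.

```python
from typing import List

Grid = List[List[int]]  # 1 = edge pixel, 0 = no edge


def apply_edge_brushes(edges: Grid, add: Grid, subtract: Grid) -> Grid:
    """Add sketched edges, then mask out every pixel the subtract brush touched."""
    return [
        [int((e or a) and not s) for e, a, s in zip(e_row, a_row, s_row)]
        for e_row, a_row, s_row in zip(edges, add, subtract)
    ]
```

For example, sketching one new edge pixel and masking one existing pixel changes only those two positions and leaves the rest of the edge map intact.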

3. Color Layer

Use the color brush to overlay a semi-transparent color on specific regions and change their color. The result will follow the user's color map.
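A semi-transparent overlay of this kind is conventionally computed by alpha blending, where each channel of the result is a weighted mix of the underlying pixel and the brush color. The sketch below assumes standard alpha blending on 8-bit RGB values. The function name `overlay_color` and the `alpha` opacity parameter are illustrative, and the opacity the actual color brush uses is not stated in the source.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def overlay_color(base: RGB, brush: RGB, alpha: float) -> RGB:
    """Blend a brush color over a base pixel; alpha is the brush opacity in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Per channel: out = (1 - alpha) * base + alpha * brush
    return tuple(round((1 - alpha) * b + alpha * c) for b, c in zip(base, brush))
```

With `alpha = 0` the pixel is unchanged and with `alpha = 1` it is fully replaced by the brush color; intermediate values keep some of the underlying image visible through the stroke.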

4. Spatial Layer (Local Edit)

Use the local edit brush to paint an area for local editing. The edit will occur within the specified region.
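The guarantee that an edit stays inside the painted region can be expressed as a mask composite: pixels inside the mask come from the edited image, and all others are copied from the original. This is only a sketch of the constraint, not necessarily how MagicQuill V2 enforces it (the abstract describes a fine-tuned spatial branch for local editing). The helper `composite_in_mask` and the grid representation are assumptions made for illustration.

```python
from typing import List

Image = List[List[int]]  # one value per pixel, e.g. grayscale intensity
Mask = List[List[int]]   # 1 = painted with the local edit brush, 0 = untouched


def composite_in_mask(original: Image, edited: Image, mask: Mask) -> Image:
    """Take edited pixels inside the mask and original pixels everywhere else."""
    return [
        [e if m else o for o, e, m in zip(o_row, e_row, m_row)]
        for o_row, e_row, m_row in zip(original, edited, mask)
    ]
```

Any pixel outside the painted area is therefore identical to the input, whatever the generative model produced there.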

5. Spatial Layer (Removal)

Use the subtract brush to paint over an object for precise removal.