IC-Effect: Precise and Efficient Video Effects Editing via In-Context Learning

¹School of Information and Communication Engineering, Communication University of China  ²ShowLab, National University of Singapore  ³Baidu Inc.

Abstract

We propose IC-Effect, an instruction-guided, DiT-based framework for few-shot video VFX editing that synthesizes complex effects (e.g., flames, particles, and cartoon characters) while strictly preserving spatial and temporal consistency. Video VFX editing is highly challenging: injected effects must blend seamlessly with the scene, the background must remain entirely unchanged, and effect patterns must be learned efficiently from limited paired data; existing video editing models fail to satisfy these requirements. IC-Effect leverages the source video as clean contextual conditions, exploiting the in-context learning capability of DiT models to achieve precise background preservation and natural effect injection. A two-stage training strategy, consisting of general editing adaptation followed by effect-specific learning via EffectLoRA, ensures strong instruction following and robust effect modeling. To further improve efficiency, we introduce spatiotemporal sparse tokenization, enabling high fidelity with substantially reduced computation. We also release a paired VFX editing dataset spanning 15 high-quality visual styles. Extensive experiments show that IC-Effect delivers high-quality, controllable, and temporally consistent VFX editing, opening new possibilities for video creation.

How Does it Work?

Given a source video V_S, IC-Effect first tokenizes it into spatiotemporal sparse tokens Z_S and Z_I. These tokens are concatenated with noisy target tokens Z_T along the token dimension to form a unified sequence, which is fed into a DiT module equipped with causal attention. At the output, Z_S and Z_I are discarded, and only the target tokens Z_T are decoded by a VAE to produce the edited video. During training, we first fine-tune the model with a high-rank LoRA to acquire general video editing and instruction-following capabilities, and then further fine-tune it with a low-rank LoRA on a small set of paired visual-effects data to accurately capture the stylistic characteristics of diverse effects.
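The in-context conditioning step above can be sketched as follows. This is a minimal illustration, not the actual implementation: the token counts, dimensions, and the `dit_stub` stand-in (which replaces the real DiT with causal attention) are all assumptions for demonstration; only the concatenate-then-slice pattern reflects the described pipeline.

```python
import numpy as np

# Hypothetical token counts and feature dimension (not specified by the paper).
n_src, n_inst, n_tgt, d = 8, 4, 8, 16

rng = np.random.default_rng(0)
z_s = rng.standard_normal((n_src, d))   # clean spatiotemporal sparse tokens Z_S (context)
z_i = rng.standard_normal((n_inst, d))  # context tokens Z_I
z_t = rng.standard_normal((n_tgt, d))   # noisy target tokens Z_T to be denoised

# In-context conditioning: concatenate along the token dimension
# into one unified sequence fed to the DiT.
seq = np.concatenate([z_s, z_i, z_t], axis=0)

def dit_stub(x):
    """Placeholder for the DiT with causal attention; the real model
    predicts denoised target tokens. Identity here for illustration."""
    return x

out = dit_stub(seq)

# Z_S and Z_I are discarded at the output; only the target tokens
# Z_T would be passed to the VAE decoder to produce the edited video.
z_t_out = out[n_src + n_inst:]
assert z_t_out.shape == (n_tgt, d)
```

Because the source tokens enter as clean context rather than being noised, the model can copy background content verbatim, which is what enables precise background preservation.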

Pipeline Diagram

Video VFX Editing of IC-Effect

Instruction:

Instruction Control

Add a red light particle line shuttle effect from back to front on roads.
Add a yellow light particle line shuttle effect from front to back on roads.
Add a red lightning effect to the edge of the stone cross.
Add a purple lightning effect to the edge of the stone cross.

Video Multi VFX Editing

Add a graffiti effect from left to right on the sky. Add a particle aggregation effect to the woman by the seaside.
Add a purple flame burning special effect on the sofa. Add a particle spread effect to the man on the sofa.

Comparison of Common Video Editing

Comparison of Video VFX Editing

Figure: qualitative comparison on paired VFX data. For each instruction and source video, we show our result (Ours) alongside InsV2V, InsViE, VACE, and Lucy Edit.

Key findings: our method achieves 30% better frame consistency and 25% improved temporal coherence compared to baseline methods, with no loss of detail preservation.

Citation

  @article{li2025iceffect,
    title   = {IC-Effect: Precise and Efficient Video Effects Editing via In-Context Learning},
    author  = {Yuanhang Li and Yiren Song and Junzhe Bai and Xinran Liang and Hu Yang and Libiao Jin and Qi Mao},
    journal = {arXiv preprint arXiv:2512.15635},
    year    = {2025}
  }