We propose Masked Temporal Interpolation Diffusion (MTID), a novel model for procedure planning in instructional videos. The paper has been accepted at ICLR 2025.
Jan 23, 2025
MTID is a diffusion-based model for procedure planning in instructional videos that incorporates latent temporal interpolation and task-aware masking to improve temporal reasoning.