Our proposed pipeline. A global audio projection layer and audio cross-attention layers are added to the network architecture. For video-to-video (V2V) lip syncing, we noise the tokens corresponding to the mouth region and task the model with inpainting them spatio-temporally. Purple and orange flame symbols denote LoRA fine-tuning and full training, respectively.
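To make the V2V masking idea concrete, here is a minimal sketch of noising only mouth-region tokens while leaving the rest of the latent video clean as context. It rests on several assumptions not stated above: the latent is a `[T][H][W]` token grid with one scalar per token (real latents carry channels), the mouth region is a fixed token-space box on every frame, and noising uses a linear interpolation toward Gaussian noise (rectified-flow style). The function names `mouth_mask` and `noise_masked` are hypothetical, and this is not the EditYourself implementation.

```python
import random
from typing import List, Tuple

# Illustrative shapes: one scalar per latent token, indexed [frame][row][col].
Grid = List[List[List[float]]]
Mask = List[List[List[bool]]]


def mouth_mask(t: int, h: int, w: int, box: Tuple[int, int, int, int]) -> Mask:
    """Mark tokens inside a (top, left, bottom, right) box on every frame.

    The box is in token coordinates with bottom/right exclusive. A real
    pipeline would likely track the mouth per frame instead of using a
    fixed box (assumption for this sketch).
    """
    top, left, bottom, right = box
    return [[[top <= y < bottom and left <= x < right for x in range(w)]
             for y in range(h)]
            for _ in range(t)]


def noise_masked(latents: Grid, mask: Mask, sigma: float,
                 rng: random.Random) -> Grid:
    """Interpolate masked tokens toward Gaussian noise; keep others clean.

    sigma = 0 returns the input unchanged and sigma = 1 replaces masked
    tokens with pure noise. Unmasked tokens act as spatio-temporal context
    for the model to inpaint against.
    """
    out: Grid = []
    for frame, mframe in zip(latents, mask):
        out.append([[(1.0 - sigma) * v + sigma * rng.gauss(0.0, 1.0) if m else v
                     for v, m in zip(row, mrow)]
                    for row, mrow in zip(frame, mframe)])
    return out
```

Because only the masked tokens are corrupted, the surrounding face, head pose, and background are visible to the model at every step, which is what lets lip syncing be posed as an inpainting problem rather than full generation.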
EditYourself supports transcript-based video editing, including the addition of new footage and seamless removal of unwanted segments.
Insertion of new content at arbitrary temporal locations that seamlessly adheres to the surrounding boundary frames (when present) while lip-syncing to the modified audio.
Deletion of existing content while smoothing the resulting temporal discontinuity to avoid visible jump cuts.
Selective re-rendering of video content over specified spatial and temporal regions, conditioned on the updated audio and a text prompt (e.g., correcting an awkward facial expression or regenerating a hand gesture).
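The source does not say how these editing operations are turned into model inputs. One plausible framing, sketched here purely as an illustration, treats each edit as a frame-level timeline in which kept source frames are conditioning context and empty slots are frames the model must generate. The `Timeline` representation and the functions `insert_segment`, `delete_segment`, and `generation_mask` are hypothetical, as is the `blend` window used to smooth a deletion.

```python
from typing import List, Optional

# Hypothetical frame-level timeline: an int is a source frame kept as
# conditioning context; None is a slot the model must generate.
Timeline = List[Optional[int]]


def insert_segment(num_frames: int, at: int, length: int) -> Timeline:
    """Open `length` empty slots before source frame `at`.

    Kept frames on either side serve as boundary conditioning; when the
    insertion is at the start or end of the clip, only one boundary exists.
    """
    src = list(range(num_frames))
    return src[:at] + [None] * length + src[at:]


def delete_segment(num_frames: int, start: int, end: int, blend: int) -> Timeline:
    """Drop source frames [start, end), then mark `blend` frames on each
    side of the cut for regeneration so the transition is smoothed instead
    of appearing as a jump cut."""
    src = list(range(num_frames))
    kept = src[:start] + src[end:]
    # After removal, the discontinuity sits at index `start` of `kept`.
    lo = max(0, start - blend)
    hi = min(len(kept), start + blend)
    return kept[:lo] + [None] * (hi - lo) + kept[hi:]


def generation_mask(timeline: Timeline) -> List[bool]:
    """True where the model generates a frame, False where it conditions."""
    return [slot is None for slot in timeline]
```

In this framing, selective re-rendering would leave the timeline length unchanged and instead mask only the chosen spatial region on the chosen frames.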
Qualitative comparisons between EditYourself and leading open-source and proprietary solutions.
EditYourself is capable of generating minutes-long videos without noticeable identity drift. This section also provides a comparison with open-source and proprietary solutions: MultiTalk AI Avatar, Aurora, HunyuanAvatar, Kling AI Avatar v2 Pro, OmniHuman 1.5, and StableAvatar.
Pipio is a human-first, AI-powered video editor that keeps creators in control. While others replace, Pipio enhances. Seamlessly adjust your performance without ever having to re-record your footage.
EditYourself represents our latest advancement in audio-driven video generation and editing, bringing state-of-the-art diffusion transformer architectures to practical video production workflows.