suruoxi/HumanAIGC-arxiv-daily

HumanAIGC Research Papers

Updated on 2025.12.09

Table of Contents
  1. Talking Face
  2. Image Animation
  3. Video Generation
  4. TryOn
  5. Visual Edit
  6. Others
  7. Music2Dance and Co-speech
  8. Speech and Interaction
  9. Post Training

Talking Face

Publish Date Title Authors PDF Code
2025-12-04 LiteVGGT: Boosting Vanilla VGGT via Geometry-aware Cached Token Merging Zhijian Shu et.al. 2512.04939 null
2025-12-04 Measuring the Unspoken: A Disentanglement Model and Benchmark for Psychological Analysis in the Wild Yigui Feng et.al. 2512.04728 null
2025-12-02 DF-Mamba: Deformable State Space Modeling for 3D Hand Pose Estimation in Interactions Yifan Zhou et.al. 2512.02727 null
2025-12-01 EvalTalker: Learning to Evaluate Real-Portrait-Driven Multi-Subject Talking Humans Yingjie Zhou et.al. 2512.01340 null
2025-11-30 EmoDiffTalk: Emotion-aware Diffusion for Editable 3D Gaussian Talking Head Chang Liu et.al. 2512.05991 null
2025-11-30 TalkingPose: Efficient Face and Gesture Animation with Feedback-guided Diffusion Model Alireza Javanmardi et.al. 2512.00909 null
2025-11-29 MVAD : A Comprehensive Multimodal Video-Audio Dataset for AIGC Detection Mengxue Hu et.al. 2512.00336 null
2025-11-28 AnyTalker: Scaling Multi-Person Talking Video Generation with Interactivity Refinement Zhizhou Zhong et.al. 2511.23475 null
2025-11-28 CoordSpeaker: Exploiting Gesture Captioning for Coordinated Caption-Empowered Co-Speech Gesture Generation Fengyi Fang et.al. 2511.22863 null
2025-11-27 AI killed the video star. Audio-driven diffusion model for expressive talking head generation Baptiste Chopin et.al. 2511.22488 null
2025-11-27 VSpeechLM: A Visual Speech Language Model for Visual Text-to-Speech Task Yuyue Wang et.al. 2511.22229 null
2025-11-27 IMTalker: Efficient Audio-driven Talking Face Generation with Implicit Motion Transfer Bo Chen et.al. 2511.22167 null
2025-11-27 Lips-Jaw and Tongue-Jaw Articulatory Tradeoff in DYNARTmo Bernd J. Kröger et.al. 2511.22155 null
2025-11-26 Passive Dementia Screening via Facial Temporal Micro-Dynamics Analysis of In-the-Wild Talking-Head Video Filippo Cenacchi et.al. 2511.13802 null
2025-11-24 Blinking Beyond EAR: A Stable Eyelid Angle Metric for Driver Drowsiness Detection and Data Augmentation Mathis Wolter et.al. 2511.19519 null
2025-11-24 Assessing the alignment between infants' visual and linguistic experience using multimodal language models Alvin Wei Ming Tan et.al. 2511.18824 null
2025-11-23 SyncVoice: Towards Video Dubbing with Vision-Augmented Pretrained TTS Model Kaidi Wang et.al. 2512.05126 null
2025-11-23 The Locally Deployable Virtual Doctor: LLM Based Human Interface for Automated Anamnesis and Database Conversion Jan Benedikt Ruhland et.al. 2511.18632 null
2025-11-23 RigAnyFace: Scaling Neural Facial Mesh Auto-Rigging with Unlabeled Data Wenchao Ma et.al. 2511.18601 null
2025-11-22 A superpersuasive autonomous policy debating system Allen Roush et.al. 2511.17854 null
2025-11-21 Investigating self-supervised representations for audio-visual deepfake detection Dragos-Alexandru Boldisor et.al. 2511.17181 null
2025-11-20 Motion Transfer-Enhanced StyleGAN for Generating Diverse Macaque Facial Expressions Takuya Igaue et.al. 2511.16711 null
2025-11-19 StreamingTalker: Audio-driven 3D Facial Animation with Autoregressive Diffusion Model Yifan Yang et.al. 2511.14223 null
2025-11-18 Blur-Robust Detection via Feature Restoration: An End-to-End Framework for Prior-Guided Infrared UAV Target Detection Xiaolin Wang et.al. 2511.14371 null
2025-11-18 Towards Authentic Movie Dubbing with Retrieve-Augmented Director-Actor Interaction Learning Rui Liu et.al. 2511.14249 null
2025-11-17 B2F: End-to-End Body-to-Face Motion Generation with Style Reference Bokyung Jang et.al. 2511.13988 null
2025-11-17 Uni-Hand: Universal Hand Motion Forecasting in Egocentric Views Junyi Ma et.al. 2511.12878 null
2025-11-12 GRACE: Designing Generative Face Video Codec via Agile Hardware-Centric Workflow Rui Wan et.al. 2511.09272 null
2025-11-11 Is It Truly Necessary to Process and Fit Minutes-Long Reference Videos for Personalized Talking Face Generation? Rui-Qing Sun et.al. 2511.07940 null
2025-11-10 LiveNeRF: Efficient Face Replacement Through Neural Radiance Fields Integration Tung Vu et.al. 2511.07552 null
2025-11-10 The Inner Kernel of the Classical Kuiper Belt Amir Siraj et.al. 2511.07512 null
2025-11-10 ConsistTalk: Intensity Controllable Temporally Consistent Talking Head Generation with Diffusion Noise Search Zhenjie Liu et.al. 2511.06833 null
2025-11-08 DiLO: Disentangled Latent Optimization for Learning Shape and Deformation in Grouped Deforming 3D Objects Mostofa Rafid Uddin et.al. 2511.06115 null
2025-11-08 Reperio-rPPG: Relational Temporal Graph Neural Networks for Periodicity Learning in Remote Physiological Measurement Ba-Thinh Nguyen et.al. 2511.05946 null
2025-11-07 Shared Latent Representation for Joint Text-to-Audio-Visual Synthesis Dogucan Yaman et.al. 2511.05432 null
2025-11-07 THEval. Evaluation Framework for Talking Head Video Generation Nabyl Quignon et.al. 2511.04520 null
2025-11-05 Assessing Identity Leakage in Talking Face Generation: Metrics and Evaluation Framework Dogucan Yaman et.al. 2511.08613 null
2025-11-05 Laugh, Relate, Engage: Stylized Comment Generation for Short Videos Xuan Ouyang et.al. 2511.03757 null
2025-11-05 UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions Guozhen Zhang et.al. 2511.03334 null
2025-11-04 Densemarks: Learning Canonical Embeddings for Human Heads Images via Point Tracks Dmitrii Pozdeev et.al. 2511.02830 null
2025-11-01 Beyond the Uncanny Valley: A Mixed-Method Investigation of Anthropomorphism in Protective Responses to Robot Abuse Fan Yang et.al. 2510.26082 null
2025-11-01 Audio Driven Real-Time Facial Animation for Social Telepresence Jiye Lee et.al. 2510.01176 null
2025-10-29 Learning Disentangled Speech- and Expression-Driven Blendshapes for 3D Talking Face Animation Yuxiang Mao et.al. 2510.25234 null
2025-10-28 See the Speaker: Crafting High-Resolution Talking Faces from Speech with Prior Guidance and Region Refinement Jinting Wang et.al. 2510.26819 null
2025-10-28 The Divine Software Engineering Comedy -- Inferno: The Okinawa Files Michele Lanza et.al. 2510.24483 null
2025-10-28 GenTrack: A New Generation of Multi-Object Tracking Toan Van Nguyen et.al. 2510.24399 null
2025-10-28 Variable Projected Augmented Lagrangian Methods for Generalized Lasso Problems Stefano Aleotti et.al. 2510.24140 null
2025-10-27 Lookahead Anchoring: Preserving Character Identity in Audio-Driven Human Animation Junyoung Seo et.al. 2510.23581 null
2025-10-27 Revising Second Order Terms in Deep Animation Video Coding Konstantin Schmidt et.al. 2510.23561 null
2025-10-26 MAGIC-Talk: Motion-aware Audio-Driven Talking Face Generation with Customizable Identity Control Fatemeh Nazarieh et.al. 2510.22810 null
2025-10-26 DeepfakeBench-MM: A Comprehensive Benchmark for Multimodal Deepfake Detection Kangran Zhao et.al. 2510.22622 null
2025-10-24 Unmasking Puppeteers: Leveraging Biometric Leakage to Disarm Impersonation in AI-based Videoconferencing Danial Samadi Vahdati et.al. 2510.03548 null
2025-10-23 LSF-Animation: Label-Free Speech-Driven Facial Animation via Implicit Feature Representation Xin Lu et.al. 2510.21864 null
2025-10-16 PIA: Deepfake Detection Using Phoneme-Temporal and Identity-Dynamic Analysis Soumyya Kanti Datta et.al. 2510.14241 null
2025-10-14 Playmate2: Training-Free Multi-Character Audio-Driven Animation via Diffusion Transformer with Reward Feedback Xingpei Ma et.al. 2510.12089 null
2025-10-12 DEMO: Disentangled Motion Latent Flow Matching for Fine-Grained Controllable Talking Portrait Synthesis Peiyin Chen et.al. 2510.10650 null
2025-10-11 VividAnimator: An End-to-End Audio and Pose-driven Half-Body Human Animation Framework Donglin Huang et.al. 2510.10269 null
2025-10-11 SyncLipMAE: Contrastive Masked Pretraining for Audio-Visual Talking-Face Representation Zeyu Ling et.al. 2510.10069 null
2025-10-09 Paper2Video: Automatic Video Generation from Scientific Papers Zeyu Zhu et.al. 2510.05096 null
2025-10-08 A Bridge from Audio to Video: Phoneme-Viseme Alignment Allows Every Face to Speak Multiple Languages Zibo Su et.al. 2510.06612 null
2025-10-03 EGSTalker: Real-Time Audio-Driven Talking Head Generation with Efficient Gaussian Deformation Tianheng Zhu et.al. 2510.08587 null
2025-10-02 Input-Aware Sparse Attention for Real-Time Co-Speech Video Generation Beijia Lu et.al. 2510.02617 null
2025-09-30 3DiFACE: Synthesizing and Editing Holistic 3D Facial Animation Balamurugan Thambiraja et.al. 2509.26233 null
2025-09-28 Durian: Dual Reference Image-Guided Portrait Animation with Attribute Transfer Hyunsoo Cha et.al. 2509.04434 null
2025-09-26 StableDub: Taming Diffusion Prior for Generalized and Efficient Visual Dubbing Liyang Chen et.al. 2509.21887 null
2025-09-25 Unlocking Financial Insights: An advanced Multimodal Summarization with Multimodal Output Framework for Financial Advisory Videos Sarmistha Das et.al. 2509.20961 null
2025-09-24 KSDiff: Keyframe-Augmented Speech-Aware Dual-Path Diffusion for Facial Animation Tianle Lyu et.al. 2509.20128 null
2025-09-24 Comparative Study of Subjective Video Quality Assessment Test Methods in Crowdsourcing for Varied Use Cases Babak Naderi et.al. 2509.20118 null
2025-09-24 SynchroRaMa : Lip-Synchronized and Emotion-Aware Talking Face Generation via Multi-Modal Emotion Embedding Phyo Thet Yee et.al. 2509.19965 null
2025-09-24 Talking Head Generation via AU-Guided Landmark Prediction Shao-Yu Chang et.al. 2509.19749 null
2025-09-24 EAI-Avatar: Emotion-Aware Interactive Talking Head Generation Haijie Yang et.al. 2508.18337 null
2025-09-23 Audio-Driven Universal Gaussian Head Avatars Kartik Teotia et.al. 2509.18924 null
2025-09-22 "I don't like my avatar": Investigating Human Digital Doubles Siyi Liu et.al. 2509.17748 null
2025-09-22 Stable Video-Driven Portraits Mallikarjun B. R. et.al. 2509.17476 null
2025-09-21 Beat on Gaze: Learning Stylized Generation of Gaze and Head Dynamics Chengwei Shi et.al. 2509.17168 null
2025-09-21 PGSTalker: Real-Time Audio-Driven Talking Head Generation via 3D Gaussian Splatting with Pixel-Aware Density Control Tianheng Zhu et.al. 2509.16922 null
2025-09-20 Follow-Your-Emoji-Faster: Towards Efficient, Fine-Controllable, and Expressive Freestyle Portrait Animation Yue Ma et.al. 2509.16630 null
2025-09-17 Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis Yikang Ding et.al. 2509.09595 null
2025-09-16 A Lightweight Pipeline for Noisy Speech Voice Cloning and Accurate Lip Sync Synthesis Javeria Amir et.al. 2509.12831 null
2025-09-15 AvatarSync: Rethinking Talking-Head Animation through Autoregressive Perspective Yuchen Deng et.al. 2509.12052 null
2025-09-10 Bitrate-Controlled Diffusion for Disentangling Motion and Content in Video Xiao Li et.al. 2509.08376 null
2025-08-28 EmoCAST: Emotional Talking Portrait via Emotive Text Description Yiguo Jiang et.al. 2508.20615 null
2025-08-27 InfinityHuman: Towards Long-Term Audio-Driven Human Xiaodi Li et.al. 2508.20210 null
2025-08-27 Improving Generalization in Deepfake Detection with Face Foundation Models and Metric Learning Stelios Mylonas et.al. 2508.19730 null
2025-08-26 OmniHuman-1.5: Instilling an Active Mind in Avatars via Cognitive Simulation Jianwen Jiang et.al. 2508.19209 null
2025-08-26 Wan-S2V: Audio-Driven Cinematic Video Generation Xin Gao et.al. 2508.18621 null
2025-08-25 Lightning Fast Caching-based Parallel Denoising Prediction for Accelerating Talking Head Generation Jianzhi Long et.al. 2509.00052 null
2025-08-22 Audio2Face-3D: Audio-driven Realistic Facial Animation For Digital Avatars NVIDIA et.al. 2508.16401 null
2025-08-20 D^3-Talker: Dual-Branch Decoupled Deformation Fields for Few-Shot 3D Talking Head Synthesis Yuhang Guo et.al. 2508.14449 null
2025-08-20 Taming Transformer for Emotion-Controllable Talking Face Generation Ziqi Zhang et.al. 2508.14359 null
2025-08-19 TalkVid: A Large-Scale Diversified Dataset for Audio-Driven Talking Head Synthesis Shunian Chen et.al. 2508.13618 null
2025-08-19 EDTalk++: Full Disentanglement for Controllable Talking Head Synthesis Shuai Tan et.al. 2508.13442 null
2025-08-18 Human Feedback Driven Dynamic Speech Emotion Recognition Ilya Fedorov et.al. 2508.14920 null
2025-08-17 CEM-Net: Cross-Emotion Memory Network for Emotional Talking Face Generation Kangyi Wu et.al. 2508.12368 null
2025-08-16 RealTalk: Realistic Emotion-Aware Lifelike Talking-Head Synthesis Wenqing Wang et.al. 2508.12163 null
2025-08-16 SimInterview: Transforming Business Education through Large Language Model-Based Simulated Multilingual Interview Training System Truong Thanh Hung Nguyen et.al. 2508.11873 null
2025-08-15 FantasyTalking2: Timestep-Layer Adaptive Preference Optimization for Audio-Driven Portrait Animation MengChao Wang et.al. 2508.11255 null
2025-08-14 HM-Talker: Hybrid Motion Modeling for High-Fidelity Talking Head Synthesis Shiyu Liu et.al. 2508.10566 null
2025-08-13 LIA-X: Interpretable Latent Portrait Animator Yaohui Wang et.al. 2508.09959 null
2025-08-12 Preview WB-DH: Towards Whole Body Digital Human Bench for the Generation of Whole-body Talking Avatar Videos Chaoyi Wang et.al. 2508.08891 null
2025-08-11 Learning Phonetic Context-Dependent Viseme for Enhancing Speech-Driven 3D Facial Animation Hyung Kyu Kim et.al. 2507.20568 null
2025-08-10 KLASSify to Verify: Audio-Visual Deepfake Detection Using SSL-based Audio and Handcrafted Visual Features Ivan Kukanov et.al. 2508.07337 null
2025-08-08 MotionSwap Om Patil et.al. 2508.06430 null
2025-08-07 Evaluation of a Sign Language Avatar on Comprehensibility, User Experience & Acceptability Fenya Wasserroth et.al. 2508.05358 null
2025-08-07 RAP: Real-time Audio-driven Portrait Animation with Video Diffusion Transformer Fangyu Du et.al. 2508.05115 null
2025-08-07 UniTalker: Conversational Speech-Visual Synthesis Yifan Hu et.al. 2508.04585 null
2025-08-07 AudioGen-Omni: A Unified Multimodal Diffusion Transformer for Video-Synchronized Audio, Speech, and Song Generation Le Wang et.al. 2508.00733 null
2025-08-06 MienCap: Realtime Performance-Based Facial Animation with Live Mood Dynamics Ye Pan et.al. 2508.04687 null
2025-08-06 READ: Real-time and Efficient Asynchronous Diffusion for Audio-driven Talking Head Generation Haotian Wang et.al. 2508.03457 null
2025-08-05 Multi-human Interactive Talking Dataset Zeyu Zhu et.al. 2508.03050 null
2025-08-04 X-Actor: Emotional and Expressive Long-Range Portrait Acting from Audio Chenxu Zhang et.al. 2508.02944 null
2025-08-04 Text2Lip: Progressive Lip-Synced Talking Face Generation from Text via Viseme-Guided Rendering Xu Wang et.al. 2508.02362 null
2025-08-04 Is It Really You? Exploring Biometric Verification Scenarios in Photorealistic Talking-Head Avatar Videos Laura Pedrouzo-Rodriguez et.al. 2508.00748 null
2025-07-31 Who is a Better Talker: Subjective and Objective Quality Assessment for AI-Generated Talking Heads Yingjie Zhou et.al. 2507.23343 null
2025-07-30 X-NeMo: Expressive Neural Motion Reenactment via Disentangled Latent Attention Xiaochen Zhao et.al. 2507.23143 null
2025-07-30 Robust Deepfake Detection for Electronic Know Your Customer Systems Using Registered Images Takuma Amada et.al. 2507.22601 null
2025-07-29 DiTalker: A Unified DiT-based Framework for High-Quality and Speaking Styles Controllable Portrait Animation He Feng et.al. 2508.06511 null
2025-07-29 JWB-DH-V1: Benchmark for Joint Whole-Body Talking Avatar and Speech Generation Version 1 Xinhan Di et.al. 2507.20987 null
2025-07-29 Versatile Multimodal Controls for Expressive Talking Human Animation Zheng Qin et.al. 2503.08714 null
2025-07-28 Mask-Free Audio-driven Talking Face Generation for Enhanced Visual Quality and Identity Preservation Dogucan Yaman et.al. 2507.20953 null
2025-07-28 MemoryTalker: Personalized Speech-Driven 3D Facial Animation via Audio-Guided Stylization Hyung Kyu Kim et.al. 2507.20562 null
2025-07-28 JOLT3D: Joint Learning of Talking Heads and 3DMM Parameters with Application to Lip-Sync Sungjoon Park et.al. 2507.20452 null
2025-07-25 Face2VoiceSync: Lightweight Face-Voice Consistency for Text-Driven Talking Face Generation Fang Kang et.al. 2507.19225 null
2025-07-24 Tiny is not small enough: High-quality, low-resource facial animation models through hybrid knowledge distillation Zhen Han et.al. 2507.18352 null
2025-07-24 Celeb-DF++: A Large-scale Challenging Video DeepFake Benchmark for Generalizable Forensics Yuezun Li et.al. 2507.18015 null
2025-07-24 MEDTalk: Multimodal Controlled 3D Facial Animation with Dynamic Emotions by Disentangled Embedding Chang Liu et.al. 2507.06071 null
2025-07-23 MoDA: Multi-modal Diffusion Architecture for Talking Head Generation Xinyang Li et.al. 2507.03256 null
2025-07-22 Livatar-1: Real-Time Talking Heads Generation with Tailored Flow Matching Haiyang Liu et.al. 2507.18649 null
2025-07-22 Navigating Large-Pose Challenge for High-Fidelity Face Reenactment with Video Diffusion Model Mingtao Guo et.al. 2507.16341 null
2025-07-21 VisualSpeaker: Visually-Guided 3D Avatar Lip Synthesis Alexandre Symeonidis-Herzig et.al. 2507.06060 null
2025-07-18 FantasyPortrait: Enhancing Multi-Character Portrait Animation with Expression-Augmented Diffusion Transformers Qiang Wang et.al. 2507.12956 null
2025-07-17 ATL-Diff: Audio-Driven Talking Head Generation with Early Landmarks-Guide Noise Diffusion Hoang-Son Vo et.al. 2507.12804 null
2025-07-17 Think-Before-Draw: Decomposing Emotion Semantics & Fine-Grained Controllable Expressive Talking Head Generation Hanlei Shi et.al. 2507.12761 null
2025-07-17 Cross-Modal Watermarking for Authentic Audio Recovery and Tamper Localization in Synthesized Audiovisual Forgeries Minyoung Kim et.al. 2507.12723 null
2025-07-16 AU-Blendshape for Fine-grained Stylized 3D Facial Expression Manipulation Hao Li et.al. 2507.12001 null
2025-07-14 M2DAO-Talker: Harmonizing Multi-granular Motion Decoupling and Alternating Optimization for Talking-head Generation Kui Jiang et.al. 2507.08307 null
2025-07-11 Detecting Deepfake Talking Heads from Facial Biometric Anomalies Justin D. Norman et.al. 2507.08917 null
2025-07-10 GGTalker: Talking Head Systhesis with Generalizable Gaussian Priors and Identity-Specific Adaptation Wentao Hu et.al. 2506.21513 null
2025-07-07 MoDiT: Learning Highly Consistent 3D Motion Coefficients with Diffusion Transformer for Talking Head Generation Yucheng Wang et.al. 2507.05092 null
2025-07-05 EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation Rang Meng et.al. 2507.03905 null
2025-07-03 CanonSwap: High-Fidelity and Consistent Video Face Swapping via Canonical Space Modulation Xiangyang Luo et.al. 2507.02691 null
2025-07-02 FixTalk: Taming Identity Leakage for High-Quality Talking Head Generation in Extreme Cases Shuai Tan et.al. 2507.01390 null
2025-07-01 ICME 2025 Grand Challenge on Video Super-Resolution for Video Conferencing Babak Naderi et.al. 2506.12269 link
2025-06-30 JAM-Flow: Joint Audio-Motion Synthesis with Flow Matching Mingi Kwon et.al. 2506.23552 null
2025-06-27 MirrorMe: Towards Realtime and High Fidelity Audio-Driven Halfbody Animation Dechao Meng et.al. 2506.22065 null
2025-06-27 Few-Shot Identity Adaptation for 3D Talking Heads via Global Gaussian Field Hong Nie et.al. 2506.22044 null
2025-06-27 RiverEcho: Real-Time Interactive Digital System for Ancient Yellow River Culture Haofeng Wang et.al. 2506.21865 null
2025-06-24 Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Router Yubo Huang et.al. 2506.19833 null
2025-06-23 Advancing Talking Head Generation: A Comprehensive Survey of Multi-Modal Methodologies, Datasets, Evaluation Metrics, and Loss Functions Vineet Kumar Rakesh et.al. 2507.02900 null
2025-06-23 OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation Qijun Gan et.al. 2506.18866 null
2025-06-17 SyncTalk++: High-Fidelity and Efficient Synchronized Talking Heads Synthesis Using Gaussian Splatting Ziqiao Peng et.al. 2506.14742 null
2025-06-17 Compressed Video Super-Resolution based on Hierarchical Encoding Yuxuan Jiang et.al. 2506.14381 null
2025-06-16 Audio-Visual Driven Compression for Low-Bitrate Talking Head Videos Riku Takahashi et.al. 2506.13419 null
2025-06-15 iDiT-HOI: Inpainting-based Hand Object Interaction Reenactment via Video Diffusion Transformer Zhelun Shen et.al. 2506.12847 null
2025-06-10 HunyuanVideo-HOMA: Generic Human-Object Interaction in Multimodal Driven Human Animation Ziyao Huang et.al. 2506.08797 null
2025-06-03 NTIRE 2025 XGC Quality Assessment Challenge: Methods and Results Xiaohong Liu et.al. 2506.02875 null
2025-06-02 Cocktail-Party Audio-Visual Speech Recognition Thai-Binh Nguyen et.al. 2506.02178 null
2025-06-02 Low-Rank Head Avatar Personalization with Registers Sai Tanmay Reddy Chakkera et.al. 2506.01935 null
2025-06-02 Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based Talking-Head Generation Yuan Gan et.al. 2506.01591 link
2025-06-01 SkyReels-Audio: Omni Audio-Conditioned Talking Portraits in Video Diffusion Transformers Zhengcong Fei et.al. 2506.00830 null
2025-05-30 TalkingHeadBench: A Multi-Modal Benchmark & Analysis of Talking-Head DeepFake Detection Xinqi Xiong et.al. 2505.24866 null
2025-05-29 Hallo4: High-Fidelity Dynamic Portrait Animation via Direct Preference Optimization and Temporal Motion Modulation Jiahao Cui et.al. 2505.23525 link
2025-05-29 Video Editing for Audio-Visual Dubbing Binyamin Manela et.al. 2505.23406 link
2025-05-29 Wav2Sem: Plug-and-Play Audio Semantic Decoupling for 3D Speech-Driven Facial Animation Hao Li et.al. 2505.23290 link
2025-05-29 MMGT: Motion Mask Guided Two-Stage Network for Co-Speech Gesture Video Generation Siyuan Wang et.al. 2505.23120 link
2025-05-28 Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation Zhe Kong et.al. 2505.22647 link
2025-05-28 Tell me Habibi, is it Real or Fake? Kartik Kuckreja et.al. 2505.22581 null
2025-05-28 Neural Face Skinning for Mesh-agnostic Facial Expression Cloning Sihun Cha et.al. 2505.22416 null
2025-05-28 FaceEditTalker: Interactive Talking Head Generation with Facial Attribute Editing Guanwen Feng et.al. 2505.22141 null
2025-05-28 RESOUND: Speech Reconstruction from Silent Videos via Acoustic-Semantic Decomposed Modeling Long-Khanh Pham et.al. 2505.22024 null
2025-05-27 OmniSync: Towards Universal Lip Synchronization via Diffusion Transformers Ziqiao Peng et.al. 2505.21448 null
2025-05-26 Total-Editing: Head Avatar with Editable Appearance, Motion, and Lighting Yizhou Zhao et.al. 2505.20582 null
2025-05-26 DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations Ziqiao Peng et.al. 2505.18096 null
2025-05-22 Supervising 3D Talking Head Avatars with Analysis-by-Audio-Synthesis Radek Daněček et.al. 2504.13386 null
2025-05-14 Test-Time Augmentation for Pose-invariant Face Recognition Jaemin Jung et.al. 2505.09256 null
2025-05-10 VTutor: An Animated Pedagogical Agent SDK that Provide Real Time Multi-Model Feedback Eason Chen et.al. 2505.06676 null
2025-05-10 OT-Talk: Animating 3D Talking Head with Optimal Transportation Xinmu Wang et.al. 2505.01932 null
2025-05-10 MagicPortrait: Temporally Consistent Face Reenactment with 3D Geometric Guidance Mengting Wei et.al. 2504.21497 link
2025-05-08 OXSeg: Multidimensional attention UNet-based lip segmentation using semi-supervised lip contours Hanie Moghaddasi et.al. 2505.05531 null
2025-05-03 GenSync: A Generalized Talking Head Framework for Audio-driven Multi-Subject Lip-Sync using 3D Gaussian Splatting Anushka Agarwal et.al. 2505.01928 null
2025-05-02 Model See Model Do: Speech-Driven Facial Animation with Style Control Yifang Pan et.al. 2505.01319 null
2025-05-02 FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing Gaoxiang Cong et.al. 2505.01263 null
2025-05-01 KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution Antoni Bigata et.al. 2505.00497 null
2025-04-29 IM-Portrait: Learning 3D-aware Video Diffusion for Photorealistic Talking Heads from Monocular Videos Yuan Li et.al. 2504.19165 null
2025-04-27 Generative AI for Character Animation: A Comprehensive Survey of Techniques, Applications, and Future Directions Mohammad Mahdi Abootorabi et.al. 2504.19056 link
2025-04-26 Audio-Driven Talking Face Video Generation with Joint Uncertainty Learning Yifan Xie et.al. 2504.18810 null
2025-04-25 Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation Weipeng Tan et.al. 2504.18087 null
2025-04-14 SpinMeRound: Consistent Multi-View Identity Generation Using Diffusion Models Stathis Galanakis et.al. 2504.10716 null
2025-04-10 ChildlikeSHAPES: Semantic Hierarchical Region Parsing for Animating Figure Drawings Astitva Srivastava et.al. 2504.08022 null
2025-04-08 VideoSPatS: Video SPatiotemporal Splines for Disentangled Occlusion, Appearance and Motion Modeling and Editing Juan Luis Gonzalez Bello et.al. 2504.07146 null
2025-04-08 SE4Lip: Speech-Lip Encoder for Talking Head Synthesis to Solve Phoneme-Viseme Alignment Ambiguity Yihuan Huang et.al. 2504.05803 null
2025-04-08 Exploiting Temporal Audio-Visual Correlation Embedding for Audio-Driven One-Shot Talking Head Animation Zhihua Xu et.al. 2504.05746 null
2025-04-08 Contrastive Decoupled Representation Learning and Regularization for Speech-Preserving Facial Expression Manipulation Tianshui Chen et.al. 2504.05672 null
2025-04-07 Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation Fa-Ting Hong et.al. 2504.02542 link
2025-04-06 FluentLip: A Phonemes-Based Two-stage Approach for Audio-Driven Lip Synthesis with Optical Flow Consistency Shiyan Liu et.al. 2504.04427 null
2025-04-04 A Human Digital Twin Architecture for Knowledge-based Interactions and Context-Aware Conversations Abdul Mannan Mohammed et.al. 2504.03147 null
2025-04-03 OmniTalker: Real-Time Text-Driven Talking Head Generation with In-Context Audio-Visual Style Replication Zhongjian Wang et.al. 2504.02433 null
2025-04-03 VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models Kim Sung-Bin et.al. 2504.02386 null
2025-04-02 Detecting Lip-Syncing Deepfakes: Vision Temporal Transformer for Analyzing Mouth Inconsistencies Soumyya Kanti Datta et.al. 2504.01470 link
2025-04-02 EmoHead: Emotional Talking Head via Manipulating Semantic Expression Parameters Xuli Shen et.al. 2503.19416 null
2025-04-01 Monocular and Generalizable Gaussian Talking Head Animation Shengjie Gong et.al. 2504.00665 null
2025-04-01 Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics Lee Chae-Yeon et.al. 2503.20308 null
2025-03-30 MoCha: Towards Movie-Grade Talking Character Synthesis Cong Wei et.al. 2503.23307 null
2025-03-29 STSA: Spatial-Temporal Semantic Alignment for Visual Dubbing Zijun Ding et.al. 2503.23039 link
2025-03-28 Audio-Plane: Audio Factorization Plane Gaussian Splatting for Real-Time Talking Head Synthesis Shuai Shen et.al. 2503.22605 null
2025-03-28 Follow Your Motion: A Generic Temporal Consistency Portrait Editing Framework with Trajectory Guidance Haijie Yang et.al. 2503.22225 null
2025-03-27 ChatAnyone: Stylized Real-time Portrait Video Generation with Hierarchical Motion Diffusion Model Jinwei Qi et.al. 2503.21144 null
2025-03-26 Dual Audio-Centric Modality Coupling for Talking Head Generation Ao Fu et.al. 2503.22728 null
2025-03-25 AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers Jiazhi Guan et.al. 2503.19824 null
2025-03-25 MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation Yukang Lin et.al. 2503.19383 null
2025-03-25 HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation Zunnan Xu et.al. 2503.18860 null
2025-03-25 Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model Yingying Fan et.al. 2503.16942 null
2025-03-24 DisentTalk: Cross-lingual Talking Face Generation via Semantic Disentangled Diffusion Model Kangwei Liu et.al. 2503.19001 null
2025-03-24 Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation Dingcheng Zhen et.al. 2503.18429 null
2025-03-23 DiffusionTalker: Efficient and Compact Speech-Driven 3D Talking Head via Personalizer-Guided Distillation Peng Chen et.al. 2503.18159 link
2025-03-21 TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting Jianchuan Chen et.al. 2503.17032 null
2025-03-21 From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech Ji-Hoon Kim et.al. 2503.16956 null
2025-03-20 UniSync: A Unified Framework for Audio-Visual Synchronization Tao Feng et.al. 2503.16357 null
2025-03-20 PC-Talk: Precise Facial Animation Control for Audio-Driven Talking Face Generation Baiqin Wang et.al. 2503.14295 null
2025-03-19 DiffPortrait360: Consistent Portrait Diffusion for 360 View Synthesis Yuming Gu et.al. 2503.15667 link
2025-03-19 KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation Antoni Bigata et.al. 2503.01715 null
2025-03-17 SyncDiff: Diffusion-based Talking Head Synthesis with Bottlenecked Temporal Visual Prior for Improved Synchronization Xulin Fan et.al. 2503.13371 null
2025-03-17 Unlock Pose Diversity: Accurate and Efficient Implicit Keypoint-based Spatiotemporal Diffusion for Audio-driven Talking Portrait Chaolong Yang et.al. 2503.12963 link
2025-03-14 Cafe-Talk: Generating 3D Talking Face Animation with Multimodal Coarse- and Fine-grained Control Hejia Chen et.al. 2503.14517 null
2025-03-14 EmoDiffusion: Enhancing Emotional 3D Facial Animation with Latent Diffusion Models Yixuan Zhang et.al. 2503.11028 null
2025-03-12 StyleSpeaker: Audio-Enhanced Fine-Grained Style Modeling for Speech-Driven 3D Facial Animation An Yang et.al. 2503.09852 null
2025-03-12 Bidirectional Learned Facial Animation Codec for Low Bitrate Talking Head Videos Riku Takahashi et.al. 2503.09787 null
2025-03-09 Removing Averaging: Personalized Lip-Sync Driven Characters Based on Identity Adapter Yanyu Zhu et.al. 2503.06397 null
2025-03-07 MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice Hongwei Yi et.al. 2503.05978 null
2025-03-06 FREAK: Frequency-modulated High-fidelity and Real-time Audio-driven Talking Portrait Synthesis Ziqi Ni et.al. 2503.04067 null
2025-03-02 FaceShot: Bring Any Character into Life Junyao Gao et.al. 2503.00740 null
2025-03-01 Towards High-fidelity 3D Talking Avatar with Personalized Dynamic Texture Xuanchen Li et.al. 2503.00495 null
2025-02-28 Two-Stream Spatial-Temporal Transformer Framework for Person Identification via Natural Conversational Keypoints Masoumeh Chapariniya et.al. 2502.20803 null
2025-02-28 ARTalk: Speech-Driven 3D Head Animation via Autoregressive Model Xuangeng Chu et.al. 2502.20323 null
2025-02-27 InsTaG: Learning Personalized 3D Talking Head from Few-Second Video Jiahe Li et.al. 2502.20387 link
2025-02-27 High-Fidelity Relightable Monocular Portrait Animation with Lighting-Controllable Video Diffusion Model Mingtao Guo et.al. 2502.19894 link
2025-02-26 FLAP: Fully-controllable Audio-driven Portrait Video Generation through 3D head conditioned diffusion mode Lingzhou Mu et.al. 2502.19455 null
2025-02-24 Dimitra: Audio-driven Diffusion model for Expressive Talking Head Generation Baptiste Chopin et.al. 2502.17198 null
2025-02-20 NeRF-3DTalker: Neural Radiance Field with 3D Prior Aided Audio Disentanglement for Talking Head Synthesis Xiaoxing Liu et.al. 2502.14178 null
2025-02-18 AV-Flow: Transforming Text to Audio-Visual Human-like Interactions Aggelina Chatziagapi et.al. 2502.13133 null
2025-02-17 SayAnything: Audio-Driven Lip Synchronization with Conditional Video Diffusion Junxian Ma et.al. 2502.11515 null
2025-02-15 SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers Di Qiu et.al. 2502.10841 link
2025-02-13 Long-Term TalkingFace Generation via Motion-Prior Conditional Diffusion Model Fei Shen et.al. 2502.09533 null
2025-02-13 VTutor: An Open-Source SDK for Generative AI-Powered Animated Pedagogical Agents with Multi-Media Output Eason Chen et.al. 2502.04103 null
2025-02-11 Playmate: Flexible Control of Portrait Animation via 3D-Implicit Space Guided Diffusion Xingpei Ma et.al. 2502.07203 null
2025-02-07 Towards Multimodal Empathetic Response Generation: A Rich Text-Speech-Vision Avatar-based Benchmark Han Zhang et.al. 2502.04976 null
2025-02-02 EmoTalkingGaussian: Continuous Emotion-conditioned Talking Head Synthesis Junuk Cha et.al. 2502.00654 null
2025-01-24 SyncAnimation: A Real-Time End-to-End Framework for Audio-Driven Human Pose and Talking Head Animation Yujian Liu et.al. 2501.14646 null
2025-01-21 A Lightweight and Interpretable Deepfakes Detection Framework Muhammad Umar Farooq et.al. 2501.11927 null
2025-01-18 EMO2: End-Effector Guided Audio-Driven Avatar Video Generation Linrui Tian et.al. 2501.10687 null
2025-01-17 TalkingEyes: Pluralistic Speech-Driven 3D Eye Gaze Animation Yixiang Zhuang et.al. 2501.09921 null
2025-01-15 Joint Learning of Depth and Appearance for Portrait Image Animation Xinya Ji et.al. 2501.08649 null
2025-01-15 Make-A-Character 2: Animatable 3D Character Generation From a Single Image Lin Liu et.al. 2501.07870 null
2025-01-09 Towards Dynamic Neural Communication and Speech Neuroprosthesis Based on Viseme Decoding Ji-Ha Park et.al. 2501.14790 null
2025-01-09 Identity-Preserving Video Dubbing Using Motion Warping Runzhen Liu et.al. 2501.04586 null
2025-01-09 MoEE: Mixture of Emotion Experts for Audio-Driven Portrait Animation Huaize Liu et.al. 2501.01808 null
2025-01-07 Generating and Detecting Various Types of Fake Image and Audio Content: A Review of Modern Deep Learning Technologies and Tools Arash Dehghani et.al. 2501.06227 null
2025-01-07 VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control Yuanpeng Tu et.al. 2501.01427 null
2025-01-06 RDD4D: 4D Attention-Guided Road Damage Detection And Classification Asma Alkalbani et.al. 2501.02822 link
2025-01-06 Takeaways from Applying LLM Capabilities to Multiple Conversational Avatars in a VR Pilot Study Mykola Maslych et.al. 2501.00168 null
2025-01-03 JoyGen: Audio-Driven 3D Depth-Aware Talking-Face Video Editing Qili Wang et.al. 2501.01798 link
2024-12-28 DEGSTalk: Decomposed Per-Embedding Gaussian Fields for Hair-Preserving Talking Face Synthesis Kaijun Deng et.al. 2412.20148 link
2024-12-26 UniAvatar: Taming Lifelike Audio-Driven Talking Head Generation with Comprehensive Motion and Lighting Control Wenzhang Sun et.al. 2412.19860 null
2024-12-26 Generating Editable Head Avatars with 3D Gaussian GANs Guohao Li et.al. 2412.19149 link
2024-12-23 FaceLift: Single Image to 3D Head with View Generation and GS-LRM Weijie Lyu et.al. 2412.17812 null
2024-12-22 FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation Tianyun Zhong et.al. 2412.16915 null
2024-12-18 Joint Co-Speech Gesture and Expressive Talking Face Generation using Diffusion with Adapters Steven Hogue et.al. 2412.14333 link
2024-12-18 GLCF: A Global-Local Multimodal Coherence Analysis Framework for Talking Face Generation Detection Xiaocan Chen et.al. 2412.13656 null
2024-12-18 Learning to Control an Android Robot Head for Facial Animation Marcel Heisler et.al. 2412.13641 null
2024-12-18 Real-time One-Step Diffusion-based Expressive Portrait Videos Generation Hanzhong Guo et.al. 2412.13479 link
2024-12-18 VQTalker: Towards Multilingual Talking Avatars through Facial Motion Tokenization Tao Liu et.al. 2412.09892 null
2024-12-16 Towards a Universal Synthetic Video Detector: From Face or Background Manipulations to Fully AI-Generated Content Rohit Kundu et.al. 2412.12278 null
2024-12-13 GoHD: Gaze-oriented and Highly Disentangled Portrait Animation with Rhythmic Poses and Realistic Expression Ziqi Zhou et.al. 2412.09296 link
2024-12-12 LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync Chunyu Li et.al. 2412.09262 link
2024-12-12 EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing Gaoxiang Cong et.al. 2412.08988 null
2024-12-12 PointTalk: Audio-Driven Dynamic Lip Point Cloud for 3D Gaussian-based Talking Head Synthesis Yifan Xie et.al. 2412.08504 null
2024-12-10 PortraitTalk: Towards Customizable One-Shot Audio-to-Talking Face Generation Fatemeh Nazarieh et.al. 2412.07754 null
2024-12-10 IF-MDM: Implicit Face Motion Diffusion Model for High-Fidelity Realtime Talking Head Generation Sejong Yang et.al. 2412.04000 null
2024-12-05 MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation Longtao Zheng et.al. 2412.04448 null
2024-12-05 Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Diffusion Transformer Networks Jiahao Cui et.al. 2412.00733 link
2024-12-04 SINGER: Vivid Audio-driven Singing Video Generation with Multi-scale Spectral Diffusion Model Yan Li et.al. 2412.03430 null
2024-12-02 One Shot, One Talk: Whole-body Talking Avatar from a Single Image Jun Xiang et.al. 2412.01106 null
2024-12-01 Synergizing Motion and Appearance: Multi-Scale Compensatory Codebooks for Talking Head Video Generation Shuling Zhao et.al. 2412.00719 null
2024-11-29 LokiTalk: Learning Fine-Grained and Generalizable Correspondences to Enhance NeRF-based Talking Head Synthesis Tianqi Li et.al. 2411.19525 null
2024-11-29 Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis Tianqi Li et.al. 2411.19509 link
2024-11-29 V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow Jeongsoo Choi et.al. 2411.19486 link
2024-11-26 Passive Deepfake Detection Across Multi-modalities: A Comprehensive Survey Hong-Hanh Nguyen-Le et.al. 2411.17911 null
2024-11-25 Sonic: Shifting Focus to Global Audio Perception in Portrait Animation Xiaozhong Ji et.al. 2411.16331 null
2024-11-25 ESARM: 3D Emotional Speech-to-Animation via Reward Model from Automatically-Ranked Demonstrations Xulong Zhang et.al. 2411.13089 null
2024-11-24 LetsTalk: Latent Diffusion Transformer for Talking Video Synthesis Haojie Zhang et.al. 2411.16748 null
2024-11-23 EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion Haotian Wang et.al. 2411.16726 null
2024-11-23 ConsistentAvatar: Learning to Diffuse Fully Consistent Talking Head Avatar with Temporal Guidance Haijie Yang et.al. 2411.15436 null
2024-11-20 Comparative Analysis of Audio Feature Extraction for Real-Time Talking Portrait Synthesis Pegah Salehi et.al. 2411.13209 link
2024-11-20 JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation Xuyang Cao et.al. 2411.09209 link
2024-11-14 LES-Talker: Fine-Grained Emotion Editing for Talking Head Generation in Linear Emotion Space Guanwen Feng et.al. 2411.09268 null
2024-11-06 Large Generative Model-assisted Talking-face Semantic Communication System Feibo Jiang et.al. 2411.03876 null
2024-11-05 SPEAK: Speech-Driven Pose and Emotion-Adjustable Talking Head Generation Changpeng Cai et.al. 2405.07257 null
2024-10-31 Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts Xiang Deng et.al. 2410.23836 null
2024-10-29 Multimodal Semantic Communication for Generative Audio-Driven Video Conferencing Haonan Tong et.al. 2410.22112 null
2024-10-24 Real-time 3D-aware Portrait Video Relighting Ziqi Cai et.al. 2410.18355 link
2024-10-21 Joker: Conditional 3D Head Synthesis with Extreme Facial Expressions Malte Prinzler et.al. 2410.16395 null
2024-10-18 Takin-ADA: Emotion Controllable Audio-Driven Animation with Canonical and Landmark Loss Optimization Bin Lin et.al. 2410.14283 null
2024-10-18 DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation Hanbo Cheng et.al. 2410.13726 link
2024-10-16 MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting Yue Zhang et.al. 2410.10122 link
2024-10-15 Titanic Calling: Low Bandwidth Video Conference from the Titanic Wreck Fevziye Irem Eyiokur et.al. 2410.11434 null
2024-10-15 MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes Zhenhui Ye et.al. 2410.06734 null
2024-10-14 Character-aware audio-visual subtitling in context Jaesung Huh et.al. 2410.11068 null
2024-10-14 Beyond Fixed Topologies: Unregistered Training and Comprehensive Evaluation Metrics for 3D Talking Heads Federico Nocentini et.al. 2410.11041 null
2024-10-14 TALK-Act: Enhance Textural-Awareness for 2D Speaking Avatar Reenactment with Diffusion Model Jiazhi Guan et.al. 2410.10696 null
2024-10-14 Generative Human Video Compression with Multi-granularity Temporal Trajectory Factorization Shanzhi Yin et.al. 2410.10171 null
2024-10-10 MMHead: Towards Fine-grained Multi-modal 3D Facial Animation Sijing Wu et.al. 2410.07757 null
2024-10-09 FreeAvatar: Robust 3D Facial Animation Transfer by Learning an Expression Foundation Model Feng Qiu et.al. 2409.13180 null
2024-10-01 LaDTalk: Latent Denoising for Synthesizing Talking Head Videos with High Frequency Details Jian Yang et.al. 2410.00990 null
2024-09-29 Learning Frame-Wise Emotion Intensity for Audio-Driven Talking-Head Generation Jingyi Xu et.al. 2409.19501 null
2024-09-27 Diverse Code Query Learning for Speech-Driven Facial Animation Chunzhi Gu et.al. 2409.19143 null
2024-09-26 Stable Video Portraits Mirela Ostrek et.al. 2409.18083 null
2024-09-25 ProbTalk3D: Non-Deterministic Emotion Controllable Speech-Driven 3D Facial Animation Synthesis Using VQ-VAE Sichun Wu et.al. 2409.07966 link
2024-09-24 FastTalker: Jointly Generating Speech and Conversational Gestures from Text Zixin Guo et.al. 2409.16404 null
2024-09-23 FaceVid-1K: A Large-Scale High-Quality Multiracial Human Face Video Dataset Donglin Di et.al. 2410.07151 null
2024-09-23 MIMAFace: Face Animation via Motion-Identity Modulated Appearance Feature Learning Yue Han et.al. 2409.15179 null
2024-09-18 JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation Sai Tanmay Reddy Chakkera et.al. 2409.12156 null
2024-09-18 GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations Kartik Teotia et.al. 2409.11951 null
2024-09-17 3DFacePolicy: Speech-Driven 3D Facial Animation with Diffusion Policy Xuanmeng Sha et.al. 2409.10848 null
2024-09-16 DreamHead: Learning Spatial-Temporal Correspondence via Hierarchical Diffusion for Audio-driven Talking Head Synthesis Fa-Ting Hong et.al. 2409.10281 null
2024-09-14 StyleTalk++: A Unified Framework for Controlling the Speaking Styles of Talking Heads Suzhen Wang et.al. 2409.09292 null
2024-09-11 DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures Steven Hogue et.al. 2409.07649 null
2024-09-11 EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion Jian Zhang et.al. 2409.07255 link
2024-09-09 PersonaTalk: Bring Attention to Your Persona in Visual Dubbing Longhao Zhang et.al. 2409.05379 null
2024-09-09 KAN-Based Fusion of Dual-Domain for Audio-Driven Facial Landmarks Generation Hoang-Son Vo-Thanh et.al. 2409.05330 link
2024-09-05 SegTalker: Segmentation-based Talking Face Generation with Mask-guided Local Editing Lingyu Xiong et.al. 2409.03605 null
2024-09-05 SVP: Style-Enhanced Vivid Portrait Talking Head Diffusion Model Weipeng Tan et.al. 2409.03270 null
2024-09-04 PoseTalk: Text-and-Audio-based Pose Control and Motion Refinement for One-Shot Talking Head Generation Jun Ling et.al. 2409.02657 null
2024-09-02 KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding Zhihao Xu et.al. 2409.01113 link
2024-08-28 Micro and macro facial expressions by driven animations in realistic Virtual Humans Rubens Halbig Montanha et.al. 2408.16110 null
2024-08-27 MegActor-Σ: Unlocking Flexible Mixed-Modal Control in Portrait Animation with Diffusion Transformer Shurong Yang et.al. 2408.14975 null
2024-08-25 TalkLoRA: Low-Rank Adaptation for Speech-Driven Animation Jack Saunders et.al. 2408.13714 null
2024-08-23 G3FA: Geometry-guided GAN for Face Animation Alireza Javanmardi et.al. 2408.13049 null
2024-08-21 AutoDirector: Online Auto-scheduling Agents for Multi-sensory Composition Minheng Ni et.al. 2408.11564 null
2024-08-21 EmoFace: Emotion-Content Disentangled Speech-Driven 3D Talking Face with Mesh Attention Yihong Lin et.al. 2408.11518 null
2024-08-20 DEGAS: Detailed Expressions on Full-Body Gaussian Avatars Zhijing Shao et.al. 2408.10588 link
2024-08-18 FD2Talk: Towards Generalized Talking Head Generation with Facial Decoupled Diffusion Model Ziyu Yao et.al. 2408.09384 null
2024-08-18 Meta-Learning Empowered Meta-Face: Personalized Speaking Style Adaptation for Audio-Driven 3D Talking Face Animation Xukun Zhou et.al. 2408.09357 null
2024-08-18 S^3D-NeRF: Single-Shot Speech-Driven Neural Radiance Field for High Fidelity Talking Head Synthesis Dongze Li et.al. 2408.09347 null
2024-08-16 GLDiTalker: Speech-Driven 3D Facial Animation with Graph Latent Diffusion Transformer Yihong Lin et.al. 2408.01826 null
2024-08-14 Content and Style Aware Audio-Driven Facial Animation Qingju Liu et.al. 2408.07005 null
2024-08-12 DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation Jisoo Kim et.al. 2408.06010 null
2024-08-10 High-fidelity and Lip-synced Talking Face Synthesis via Landmark-based Diffusion Model Weizhi Zhong et.al. 2408.05416 null
2024-08-10 Style-Preserving Lip Sync via Audio-Aware Style Reference Weizhi Zhong et.al. 2408.05412 null
2024-08-09 DeepSpeak Dataset v1.0 Sarah Barrington et.al. 2408.05366 null
2024-08-06 ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer Jiazhi Guan et.al. 2408.03284 null
2024-08-03 Landmark-guided Diffusion Model for High-fidelity and Temporally Coherent Talking Head Generation Jintao Tan et.al. 2408.01732 null
2024-08-03 JambaTalk: Speech-Driven 3D Talking Head Generation Based on Hybrid Transformer-Mamba Model Farzaneh Jafari et.al. 2408.01627 null
2024-08-01 UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model Xiangyu Fan et.al. 2408.00762 null
2024-08-01 Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion Manuel Kansy et.al. 2408.00458 null
2024-08-01 EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head Qianyun He et.al. 2408.00297 null
2024-07-31 Deformable 3D Shape Diffusion Model Dengsheng Chen et.al. 2407.21428 null
2024-07-26 LinguaLinker: Audio-Driven Portraits Animation with Implicit Facial Control Enhancement Rui Zhang et.al. 2407.18595 null
2024-07-24 A Comprehensive Review and Taxonomy of Audio-Visual Synchronization Techniques for Realistic Speech Animation Jose Geraldo Fernandes et.al. 2407.17430 null
2024-07-24 The impact of differences in facial features between real speakers and 3D face models on synthesized lip motions Rabab Algadhy et.al. 2407.17253 null
2024-07-22 PAV: Personalized Head Avatar from Unstructured Video Collection Akin Caliskan et.al. 2407.21047 null
2024-07-21 Anchored Diffusion for Video Face Reenactment Idan Kligvasser et.al. 2407.15153 null
2024-07-20 Text-based Talking Video Editing with Cascaded Conditional Diffusion Bo Han et.al. 2407.14841 null
2024-07-17 Universal Facial Encoding of Codec Avatars from VR Headsets Shaojie Bai et.al. 2407.13038 null
2024-07-17 EmoFace: Audio-driven Emotional 3D Face Animation Chang Liu et.al. 2407.12501 link
2024-07-13 Learning Online Scale Transformation for Talking Head Video Generation Fa-Ting Hong et.al. 2407.09965 null
2024-07-12 Real Face Video Animation Platform Xiaokai Chen et.al. 2407.18955 null
2024-07-12 One-Shot Pose-Driving Face Animation Platform He Feng et.al. 2407.08949 null
2024-07-12 EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions Zhiyuan Chen et.al. 2407.08136 link
2024-07-08 MobilePortrait: Real-Time One-Shot Neural Head Avatars on Mobile Devices Jianwen Jiang et.al. 2407.05712 null
2024-07-08 Audio-driven High-resolution Seamless Talking Head Video Editing via StyleGAN Jiacheng Su et.al. 2407.05577 null
2024-07-04 Compressed Skinning for Facial Blendshapes Ladislav Kavan et.al. 2406.11597 null
2024-07-03 LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control Jianzhu Guo et.al. 2407.03168 link
2024-07-02 Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert Han EunGi et.al. 2407.01034 null
2024-06-26 RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network Xiaozhong Ji et.al. 2406.18284 null
2024-06-24 The Effects of Embodiment and Personality Expression on Learning in LLM-based Educational Agents Sinan Sonlu et.al. 2407.10993 null
2024-06-21 EmpathyEar: An Open-source Avatar Multimodal Empathetic Chatbot Hao Fei et.al. 2406.15177 link
2024-06-20 MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset Kim Sung-Bin et.al. 2406.14272 null
2024-06-19 DF40: Toward Next-Generation Deepfake Detection Zhiyuan Yan et.al. 2406.13495 link
2024-06-19 AniFaceDiff: High-Fidelity Face Reenactment via Facial Parametric Conditioned Diffusion Models Ken Chen et.al. 2406.13272 null
2024-06-18 RITA: A Real-time Interactive Talking Avatars Framework Wuxinlin Cheng et.al. 2406.13093 null
2024-06-18 A Comprehensive Taxonomy and Analysis of Talking Head Synthesis: Techniques for Portrait Generation, Driving Mechanisms, and Editing Ming Meng et.al. 2406.10553 null
2024-06-17 NLDF: Neural Light Dynamic Fields for Efficient 3D Talking Head Generation Niu Guanchen et.al. 2406.11259 null
2024-06-17 Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement Runyi Yu et.al. 2406.08096 null
2024-06-16 Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation Mingwang Xu et.al. 2406.08801 null
2024-06-14 DNPM: A Neural Parametric Model for the Synthesis of Facial Geometric Details Haitao Cao et.al. 2405.19688 null
2024-06-13 Talking Heads: Understanding Inter-layer Communication in Transformer Language Models Jack Merullo et.al. 2406.09519 null
2024-06-13 DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing Neha Sahipjohn et.al. 2406.08802 null
2024-06-12 Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation Jiadong Liang et.al. 2406.07895 null
2024-06-07 Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation Yue Ma et.al. 2406.01900 null
2024-06-05 Controllable Talking Face Generation by Implicit Facial Keypoints Editing Dong Zhao et.al. 2406.02880 link
2024-05-31 MunchSonic: Tracking Fine-grained Dietary Actions through Active Acoustic Sensing on Eyeglasses Saif Mahmud et.al. 2405.21004 null
2024-05-31 MegActor: Harness the Power of Raw Video for Vivid Portrait Animation Shurong Yang et.al. 2405.20851 link
2024-05-30 Audio2Rig: Artist-oriented deep learning tool for facial animation Bastien Arcelin et.al. 2405.20412 null
2024-05-28 OpFlowTalker: Realistic and Natural Talking Face Generation via Optical Flow Guidance Shuheng Ge et.al. 2405.14709 null
2024-05-24 InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation Yuchi Wang et.al. 2405.15758 link
2024-05-22 Metabook: An Automatically Generated Augmented Reality Storybook Interaction System to Improve Children's Engagement in Storytelling Yibo Wang et.al. 2405.13701 null
2024-05-21 Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control Yue Han et.al. 2405.12970 null
2024-05-16 Faces that Speak: Jointly Synthesising Talking Face and Speech from Text Youngjoon Jang et.al. 2405.10272 null
2024-05-14 PolyGlotFake: A Novel Multilingual and Multimodal DeepFake Dataset Yang Hou et.al. 2405.08838 link
2024-05-10 NeRFFaceSpeech: One-shot Audio-driven 3D Talking Head Synthesis via Generative Prior Gihoon Kim et.al. 2405.05749 null
2024-05-09 SwapTalk: Audio-Driven Talking Face Generation with One-Shot Customization in Latent Space Zeren Zhang et.al. 2405.05636 null
2024-05-08 Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention Ruijie Tao et.al. 2404.18501 link
2024-05-07 Audio-Visual Speech Representation Expert for Enhanced Talking Face Video Generation and Evaluation Dogucan Yaman et.al. 2405.04327 null
2024-05-07 AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding Tao Liu et.al. 2405.03121 null
2024-04-29 EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars Nikita Drobyshev et.al. 2404.19110 null
2024-04-29 GSTalker: Real-time Audio-Driven Talking Face Generation via Deformable Gaussian Splatting Bo Chen et.al. 2404.19040 null
2024-04-29 Embedded Representation Learning Network for Animating Styled Video Portrait Tianyong Wang et.al. 2404.19038 null
2024-04-29 CSTalk: Correlation Supervised Speech-driven 3D Emotional Facial Animation Generation Xiangyu Liang et.al. 2404.18604 null
2024-04-28 GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting Hongyun Yu et.al. 2404.14037 null
2024-04-25 GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting Kyusun Cho et.al. 2404.16012 link
2024-04-23 TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting Jiahe Li et.al. 2404.15264 link
2024-04-19 Learn2Talk: 3D Talking Face Learns from 2D Talking Face Yixiang Zhuang et.al. 2404.12888 null
2024-04-16 VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time Sicheng Xu et.al. 2404.10667 null
2024-04-15 FSRT: Facial Scene Representation Transformer for Face Reenactment from Factorized Appearance, Head-pose, and Facial Expression Features Andre Rochow et.al. 2404.09736 null
2024-04-13 THQA: A Perceptual Quality Assessment Database for Talking Heads Yingjie Zhou et.al. 2404.09003 link
2024-04-11 EFHQ: Multi-purpose ExtremePose-Face-HQ dataset Trung Tuan Dao et.al. 2312.17205 null
2024-04-09 Deepfake Generation and Detection: A Benchmark and Survey Gan Pei et.al. 2403.17881 link
2024-04-08 SphereHead: Stable 3D Full-head Synthesis with Spherical Tri-plane Representation Heyuan Li et.al. 2404.05680 null
2024-04-07 GvT: A Graph-based Vision Transformer with Talking-Heads Utilizing Sparsity, Trained from Scratch on Small Datasets Dongjing Shan et.al. 2404.04924 null
2024-04-07 Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation Renshuai Liu et.al. 2401.01207 null
2024-04-03 MI-NeRF: Learning a Single Face NeRF from Multiple Identities Aggelina Chatziagapi et.al. 2403.19920 null
2024-04-02 EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis Shuai Tan et.al. 2404.01647 null
2024-04-02 Learning to Generate Conditional Tri-plane for 3D-aware Expression Controllable Portrait Animation Taekyung Ki et.al. 2404.00636 null
2024-04-02 Exploring Phonetic Context-Aware Lip-Sync For Talking Face Generation Se Jin Park et.al. 2305.19556 null
2024-04-01 FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio Chao Xu et.al. 2403.01901 link
2024-03-29 Talk3D: High-Fidelity Talking Portrait Synthesis via Personalized 3D Generative Prior Jaehoon Ko et.al. 2403.20153 link
2024-03-28 MoDiTalker: Motion-Disentangled Diffusion Model for High-Fidelity Talking Head Generation Seyeon Kim et.al. 2403.19144 link
2024-03-28 GOTCHA: Real-Time Video Deepfake Detection via Challenge-Response Govind Mittal et.al. 2210.06186 link
2024-03-27 X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention You Xie et.al. 2403.15931 null
2024-03-26 Superior and Pragmatic Talking Face Generation with Teacher-Student Framework Chao Liang et.al. 2403.17883 null
2024-03-26 AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation Huawei Wei et.al. 2403.17694 link
2024-03-26 Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis Zhenhui Ye et.al. 2401.08503 null
2024-03-25 DiffusionAct: Controllable Diffusion Autoencoder for One-shot Face Reenactment Stella Bounareli et.al. 2403.17217 null
2024-03-25 AnimateMe: 4D Facial Expressions via Diffusion Models Dimitrios Gerogiannis et.al. 2403.17213 null
2024-03-25 Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework Ziyao Huang et.al. 2403.16510 link
2024-03-23 Adaptive Super Resolution For One-Shot Talking-Head Generation Luchuan Song et.al. 2403.15944 link
2024-03-22 LeGO: Leveraging a Surface Deformation Network for Animatable Stylized Face Generation with One Example Soyeon Yoon et.al. 2403.15227 link
2024-03-22 Virbo: Multimodal Multilingual Avatar Video Generation in Digital Marketing Juan Zhang et.al. 2403.11700 null
2024-03-19 EmoVOCA: Speech-Driven Emotional 3D Talking Heads Federico Nocentini et.al. 2403.12886 link
2024-03-19 ScanTalk: 3D Talking Heads from Unregistered Scans Federico Nocentini et.al. 2403.10942 link
2024-03-15 StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation Dongchan Min et.al. 2208.10922 null
2024-03-14 GAIA: Zero-shot Talking Avatar Generation Tianyu He et.al. 2311.15230 null
2024-03-13 Say Anything with Any Style Shuai Tan et.al. 2403.06363 null
2024-03-12 FlowVQTalker: High-Quality Emotional Talking Face Generation through Normalizing Flow and Quantization Shuai Tan et.al. 2403.06375 null
2024-03-12 Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style Shuai Tan et.al. 2403.06365 null
2024-03-11 A Comparative Study of Perceptual Quality Metrics for Audio-driven Talking Head Videos Weixia Zhang et.al. 2403.06421 link
2024-03-05 Memories are One-to-Many Mapping Alleviators in Talking Face Generation Anni Tang et.al. 2212.05005 null
2024-03-02 G4G: A Generic Framework for High Fidelity Talking Face Generation with Fine-grained Intra-modal Alignment Juan Zhang et.al. 2402.18122 null
2024-03-01 DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder Chenpeng Du et.al. 2303.17550 null
2024-02-29 Learning a Generalized Physical Face Model From Data Lingchen Yang et.al. 2402.19477 null
2024-02-28 Context-aware Talking Face Video Generation Meidai Xuanyuan et.al. 2402.18092 null
2024-02-27 EMO: Emote Portrait Alive -- Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions Linrui Tian et.al. 2402.17485 null
2024-02-27 Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis Zicheng Zhang et.al. 2402.17364 link
2024-02-26 Resolution-Agnostic Neural Compression for High-Fidelity Portrait Video Conferencing via Implicit Radiance Fields Yifei Li et.al. 2402.16599 null
2024-02-25 AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation Yasheng Sun et.al. 2402.16124 null
2024-02-21 Bring Your Own Character: A Holistic Solution for Automatic Facial Animation Generation of Customized Characters Zechen Bai et.al. 2402.13724 link
2024-02-21 StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing Gaoxiang Cong et.al. 2402.12636 link
2024-02-12 StyleLipSync: Style-based Personalized Lip-sync Video Generation Taekyung Ki et.al. 2305.00521 null
2024-02-08 DiffSpeaker: Speech-Driven 3D Facial Animation with Diffusion Transformer Zhiyuan Ma et.al. 2402.05712 link
2024-02-05 One-shot Neural Face Reenactment via Finding Directions in GAN's Latent Space Stella Bounareli et.al. 2402.03553 null
2024-02-02 EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation Guanwen Feng et.al. 2402.01422 null
2024-01-31 MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis Wenhao Guan et.al. 2312.10687 null
2024-01-30 Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance Qingcheng Zhao et.al. 2401.15687 null
2024-01-28 Lips Are Lying: Spotting the Temporal Inconsistency between Audio and Visual in Lip-Syncing DeepFakes Weifeng Liu et.al. 2401.15668 link
2024-01-27 An Implicit Physical Face Model Driven by Expression and Style Lingchen Yang et.al. 2401.15414 null
2024-01-26 Implicit Neural Representation for Physics-driven Actuated Soft Bodies Lingchen Yang et.al. 2401.14861 null
2024-01-25 SAiD: Speech-driven Blendshape Facial Animation with Diffusion Inkyu Park et.al. 2401.08655 link
2024-01-23 NeRF-AD: Neural Radiance Field with Attention-based Disentanglement for Talking Face Synthesis Chongke Bi et.al. 2401.12568 null
2024-01-19 Fast Registration of Photorealistic Avatars for VR Facial Animation Chaitanya Patel et.al. 2401.11002 null
2024-01-18 Exposing Lip-syncing Deepfakes from Mouth Inconsistencies Soumyya Kanti Datta et.al. 2401.10113 link
2024-01-18 Text-driven Talking Face Synthesis by Reprogramming Audio-driven Models Jeongsoo Choi et.al. 2306.16003 null
2024-01-16 EmoTalker: Emotionally Editable Talking Face Generation via Diffusion Model Bingyuan Zhang et.al. 2401.08049 null
2024-01-12 DiffDub: Person-generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-encoder Tao Liu et.al. 2311.01811 link
2024-01-11 Dubbing for Everyone: Data-Efficient Visual Dubbing using Neural Rendering Priors Jack Saunders et.al. 2401.06126 null
2024-01-11 Jump Cut Smoothing for Talking Heads Xiaojuan Wang et.al. 2401.04718 null
2024-01-08 AdaMesh: Personalized Facial Expressions and Head Poses for Adaptive Speech-Driven 3D Facial Animation Liyang Chen et.al. 2310.07236 null
2024-01-07 Freetalker: Controllable Speech and Text-Driven Gesture Generation Based on Diffusion Models for Enhanced Speaker Naturalness Sicheng Yang et.al. 2401.03476 null
2024-01-04 Expressive Speech-driven Facial Animation with controllable emotions Yutong Chen et.al. 2301.02008 link
2023-12-23 TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation Xize Cheng et.al. 2312.15197 null
2023-12-22 DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation Chenxu Zhang et.al. 2312.13578 null
2023-12-20 FAAC: Facial Animation Generation with Anchor Frame and Conditional Control for Superior Fidelity and Editability Linze Li et.al. 2312.03775 null
2023-12-19 Learning Dense Correspondence for NeRF-Based Face Reenactment Songlin Yang et.al. 2312.10422 null
2023-12-19 Gaussian3Diff: 3D Gaussian Diffusion for 3D Full Head Synthesis and Editing Yushi Lan et.al. 2312.03763 null
2023-12-18 VectorTalker: SVG Talking Face Generation with Progressive Vectorisation Hao Hu et.al. 2312.11568 null
2023-12-18 AE-NeRF: Audio Enhanced Neural Radiance Field for Few Shot Talking Head Synthesis Dongze Li et.al. 2312.10921 null
2023-12-18 Mimic: Speaking Style Disentanglement for Speech-Driven 3D Facial Animation Hui Fu et.al. 2312.10877 null
2023-12-15 DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models Yifeng Ma et.al. 2312.09767 link
2023-12-15 Attention-Based VR Facial Animation with Visual Mouth Camera Guidance for Immersive Telepresence Avatars Andre Rochow et.al. 2312.09750 null
2023-12-13 uTalk: Bridging the Gap Between Humans and AI Hussam Azzuni et.al. 2310.02739 null
2023-12-13 MMFace4D: A Large-Scale Multi-Modal 4D Face Dataset for Audio-Driven 3D Face Animation Haozhe Wu et.al. 2303.09797 null
2023-12-12 GMTalker: Gaussian Mixture based Emotional talking video Portraits Yibo Xia et.al. 2312.07669 null
2023-12-12 GSmoothFace: Generalized Smooth Talking Face Generation via Fine Grained 3D Face Guidance Haiming Zhang et.al. 2312.07385 null
2023-12-11 Neural Text to Articulate Talk: Deep Text to Audiovisual Speech Synthesis achieving both Auditory and Photo-realism Georgios Milis et.al. 2312.06613 link
2023-12-11 Study of Non-Verbal Behavior in Conversational Agents Camila Vicari Maccari et.al. 2312.06530 null
2023-12-11 DiT-Head: High-Resolution Talking Head Synthesis using Diffusion Transformers Aaron Mir et.al. 2312.06400 null
2023-12-11 Audio-driven Talking Face Generation by Overcoming Unintended Information Flow Dogucan Yaman et.al. 2307.09368 null
2023-12-10 DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head Video Generation Fa-Ting Hong et.al. 2305.06225 link
2023-12-09 R2-Talker: Realistic Real-Time Talking Head Synthesis with Hash Grid Landmarks Encoding and Progressive Multilayer Conditioning Zhiling Ye et.al. 2312.05572 null
2023-12-09 FT2TF: First-Person Statement Text-To-Talking Face Generation Xingjian Diao et.al. 2312.05430 null
2023-12-08 SingingHead: A Large-scale 4D Dataset for Singing Head Animation Sijing Wu et.al. 2312.04369 null
2023-12-07 VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior Xusen Sun et.al. 2312.01841 null
2023-12-05 PMMTalk: Speech-Driven 3D Facial Animation from Complementary Pseudo Multi-modal Features Tianshun Han et.al. 2312.02781 null
2023-12-05 MyPortrait: Morphable Prior-Guided Personalized Portrait Generation Bo Ding et.al. 2312.02703 null
2023-12-02 DiffusionTalker: Personalization and Acceleration for Speech-Driven 3D Face Diffuser Peng Chen et.al. 2311.16565 null
2023-12-01 3DiFACE: Diffusion-based Speech-driven 3D Facial Animation and Editing Balamurugan Thambiraja et.al. 2312.00870 null
2023-11-30 Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data Yu Deng et.al. 2311.18729 null
2023-11-30 Talking Head(?) Anime from a Single Image 4: Improved Model and Its Distillation Pramook Khungurn et.al. 2311.17409 null
2023-11-29 SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis Ziqiao Peng et.al. 2311.17590 link
2023-11-28 THInImg: Cross-modal Steganography for Presenting Talking Heads in Images Lin Zhao et.al. 2311.17177 null
2023-11-28 BakedAvatar: Baking Neural Fields for Real-Time Head Avatar Synthesis Hao-Bin Duan et.al. 2311.05521 link
2023-11-28 Continuously Controllable Facial Expression Editing in Talking Face Videos Zhiyao Sun et.al. 2209.08289 null
2023-11-20 MemoryCompanion: A Smart Healthcare Solution to Empower Efficient Alzheimer's Care Via Unleashing Generative AI Lifei Zheng et.al. 2311.14730 null
2023-11-15 CP-EB: Talking Face Generation with Controllable Pose and Eye Blinking Embedding Jianzong Wang et.al. 2311.08673 null
2023-11-13 DualTalker: A Cross-Modal Dual Learning Approach for Speech-Driven 3D Facial Animation Guinan Su et.al. 2311.04766 null
2023-11-12 ChatAnything: Facetime Chat with LLM-Enhanced Personas Yilin Zhao et.al. 2311.06772 null
2023-11-08 Synthetic Speaking Children -- Why We Need Them and How to Make Them Muhammad Ali Farooq et.al. 2311.06307 null
2023-11-06 RADIO: Reference-Agnostic Dubbing Video Synthesis Dongyeun Lee et.al. 2309.01950 null
2023-11-05 3D-Aware Talking-Head Video Motion Transfer Haomiao Ni et.al. 2311.02549 null
2023-11-03 Learning Separable Hidden Unit Contributions for Speaker-Adaptive Lip-Reading Songtao Luo et.al. 2310.05058 link
2023-11-02 LaughTalk: Expressive 3D Talking Head Generation with Laughter Kim Sung-Bin et.al. 2311.00994 null
2023-11-02 High-Fidelity and Freely Controllable Talking Head Video Generation Yue Gao et.al. 2304.10168 null
2023-10-31 Breathing Life into Faces: Speech-driven 3D Facial Animation with Natural Head Pose and Detailed Shape Wei Zhao et.al. 2310.20240 null
2023-10-29 On the Vulnerability of DeepFake Detectors to Attacks Generated by Denoising Diffusion Models Marija Ivanovska et.al. 2307.05397 null
2023-10-25 Personalized Speech-driven Expressive 3D Facial Animation Synthesis with Style Control Elif Bozkurt et.al. 2310.17011 null
2023-10-23 The Self 2.0: How AI-Enhanced Self-Clones Transform Self-Perception and Improve Presentation Skills Qingxiao Zheng et.al. 2310.15112 null
2023-10-19 Gemino: Practical and Robust Neural Compression for Video Conferencing Vibhaalakshmi Sivaraman et.al. 2209.10507 null
2023-10-17 CorrTalk: Correlation Between Hierarchical Speech and Facial Activity Variances for 3D Animation Zhaojie Chu et.al. 2310.11295 null
2023-10-15 HyperLips: Hyper Control Lips with High Resolution Decoder for Talking Face Generation Yaosen Chen et.al. 2310.05720 link
2023-10-12 CleftGAN: Adapting A Style-Based Generative Adversarial Network To Create Images Depicting Cleft Lip Deformity Abdullah Hayajneh et.al. 2310.07969 link
2023-10-12 Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation Yuan Gan et.al. 2309.04946 link
2023-10-08 GestSync: Determining who is speaking without a talking head Sindhu B Hegde et.al. 2310.05304 link
2023-09-30 DiffPoseTalk: Speech-Driven Stylistic 3D Facial Animation and Head Pose Generation via Diffusion Models Zhiyao Sun et.al. 2310.00434 null
2023-09-28 OSM-Net: One-to-Many One-shot Talking Head Generation with Spontaneous Head Motions Jin Liu et.al. 2309.16148 null
2023-09-26 Emotional Speech-Driven Animation with Content-Emotion Disentanglement Radek Daněček et.al. 2306.08990 null
2023-09-20 FaceDiffuser: Speech-Driven 3D Facial Animation Synthesis Using Diffusion Stefan Stan et.al. 2309.11306 link
2023-09-20 Context-Aware Talking-Head Video Editing Songlin Yang et.al. 2308.00462 null
2023-09-18 That's What I Said: Fully-Controllable Talking Face Generation Youngjoon Jang et.al. 2304.03275 null
2023-09-15 Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-talker Speech Junjie Li et.al. 2309.08408 link
2023-09-14 DT-NeRF: Decomposed Triplane-Hash Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis Yaoyu Su et.al. 2309.07752 null
2023-09-14 DiffTalker: Co-driven audio-image diffusion for talking faces via intermediate landmarks Zipeng Qi et.al. 2309.07509 null
2023-09-14 HDTR-Net: A Real-Time High-Definition Teeth Restoration Network for Arbitrary Talking Face Generation Methods Yongyuan Li et.al. 2309.07495 link
2023-09-13 PIAVE: A Pose-Invariant Audio-Visual Speaker Extraction Network Qinghua Liu et.al. 2309.06723 null
2023-09-12 DF-TransFusion: Multimodal Deepfake Detection via Lip-Audio Cross-Attention and Facial Self-Attention Aaditya Kharel et.al. 2309.06511 null
2023-09-12 Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos Ekta Prashnani et.al. 2305.03713 null
2023-09-11 ExpCLIP: Bridging Text and Facial Expressions via Semantic Alignment Yicheng Zhong et.al. 2308.14448 null
2023-09-10 MaskRenderer: 3D-Infused Multi-Mask Realistic Face Reenactment Tina Behrouzi et.al. 2309.05095 null
2023-09-09 Speech2Lip: High-fidelity Speech to Lip Generation by Learning from a Short Video Xiuzhe Wu et.al. 2309.04814 link
2023-09-01 Unsupervised Learning of Style-Aware Facial Animation from Real Acting Performances Wolfgang Paier et.al. 2306.10006 null
2023-08-30 From Pixels to Portraits: A Comprehensive Survey of Talking Head Generation Techniques and Applications Shreyank N Gowda et.al. 2308.16041 null
2023-08-30 SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces Ziqiao Peng et.al. 2306.10799 link
2023-08-30 Laughing Matters: Introducing Laughing-Face Generation using Diffusion Models Antoni Bigata Casademunt et.al. 2305.08854 link
2023-08-29 Papeos: Augmenting Research Papers with Talk Videos Tae Soo Kim et.al. 2308.15224 null
2023-08-25 EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation Ziqiao Peng et.al. 2303.11089 link
2023-08-24 ToonTalker: Cross-Domain Face Reenactment Yuan Gong et.al. 2308.12866 null
2023-08-24 Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis Jiahe Li et.al. 2307.09323 link
2023-08-23 DF-3DFace: One-to-Many Speech Synchronized 3D Face Animation with Diffusion Se Jin Park et.al. 2310.05934 null
2023-08-21 Deep Person Generation: A Survey from the Perspective of Face, Pose and Cloth Synthesis Tong Sha et.al. 2109.02081 null
2023-08-18 Diff2Lip: Audio Conditioned Diffusion Models for Lip-Synchronization Soumik Mukhopadhyay et.al. 2308.09716 link
2023-08-18 Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head video Generation Fa-Ting Hong et.al. 2307.09906 link
2023-08-17 A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation Li Liu et.al. 2308.08849 link
2023-08-16 Instruct-NeuralTalker: Editing Audio-Driven Talking Radiance Fields with Instructions Yuqi Sun et.al. 2306.10813 null
2023-08-12 Text-to-Video: a Two-stage Framework for Zero-shot Identity-agnostic Talking-head Generation Zhichao Wang et.al. 2308.06457 link
2023-08-12 DialogueNeRF: Towards Realistic Avatar Face-to-Face Conversation Video Generation Yichao Yan et.al. 2203.07931 null
2023-08-11 Versatile Face Animator: Driving Arbitrary 3D Facial Avatar in RGBD Space Haoyu Wang et.al. 2308.06076 link
2023-08-11 VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer Liyang Chen et.al. 2308.04830 null
2023-08-10 Near-realtime Facial Animation by Deep 3D Simulation Super-Resolution Hyojoon Park et.al. 2305.03216 null
2023-08-02 Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis Zhenhui Ye et.al. 2306.03504 null
2023-07-29 Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation Michał Stypułkowski et.al. 2301.03396 null
2023-07-26 Learning Landmarks Motion from Speech for Speaker-Agnostic 3D Talking Heads Generation Federico Nocentini et.al. 2306.01415 link
2023-07-20 HyperReenact: One-Shot Reenactment via Jointly Learning to Refine and Retarget Faces Stella Bounareli et.al. 2307.10797 link
2023-07-20 MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions Yunfei Liu et.al. 2307.10008 null
2023-07-19 Hierarchical Semantic Perceptual Listener Head Video Generation: A High-performance Pipeline Zhigang Chang et.al. 2307.09821 null
2023-07-19 OPHAvatars: One-shot Photo-realistic Head Avatars Shaoxu Li et.al. 2307.09153 link
2023-07-18 FACTS: Facial Animation Creation using the Transfer of Styles Jack Saunders et.al. 2307.09480 null
2023-07-09 Predictive Coding For Animation-Based Video Compression Goluck Konuko et.al. 2307.04187 null
2023-07-08 FTFDNet: Learning to Detect Talking Face Video Manipulation with Tri-Modality Interaction Ganglai Wang et.al. 2307.03990 null
2023-07-05 Interactive Conversational Head Generation Mohan Zhou et.al. 2307.02090 null
2023-07-04 A Comprehensive Multi-scale Approach for Speech and Dynamics Synchrony in Talking Head Generation Louis Airale et.al. 2307.03270 link
2023-07-04 Generating Animatable 3D Cartoon Faces from Single Portraits Chuanyu Pan et.al. 2307.01468 null
2023-07-03 RobustL2S: Speaker-Specific Lip-to-Speech Synthesis exploiting Self-Supervised Representations Neha Sahipjohn et.al. 2307.01233 null
2023-06-20 Audio-Driven 3D Facial Animation from In-the-Wild Videos Liying Lu et.al. 2306.11541 null
2023-06-13 Parametric Implicit Face Representation for Audio-Driven Facial Reenactment Ricong Huang et.al. 2306.07579 null
2023-06-13 AniFaceDrawing: Anime Portrait Exploration during Your Sketching Zhengyu Huang et.al. 2306.07476 null
2023-06-12 NPVForensics: Jointing Non-critical Phonemes and Visemes for Deepfake Detection Yu Chen et.al. 2306.06885 null
2023-06-10 StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles Yifeng Ma et.al. 2301.01081 link
2023-06-08 ReliableSwap: Boosting General Face Swapping Via Reliable Supervision Ge Yuan et.al. 2306.05356 link
2023-06-06 Emotional Talking Head Generation based on Memory-Sharing and Attention-Augmented Networks Jianrong Wang et.al. 2306.03594 null
2023-06-05 Instruct-Video2Avatar: Video-to-Avatar Generation with Instructions Shaoxu Li et.al. 2306.02903 link
2023-05-31 High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning Chao Xu et.al. 2305.02572 null
2023-05-23 CPNet: Exploiting CLIP-based Attention Condenser and Probability Map Guidance for High-fidelity Talking Face Generation Jingning Xu et.al. 2305.13962 null
2023-05-22 RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars Dongwei Pan et.al. 2305.13353 link
2023-05-19 UniFLG: Unified Facial Landmark Generator from Text or Speech Kentaro Mitsui et.al. 2302.14337 null
2023-05-18 An Android Robot Head as Embodied Conversational Agent Marcel Heisler et.al. 2305.10945 null
2023-05-18 Audio-Visual Person-of-Interest DeepFake Detection Davide Cozzolino et.al. 2204.03083 link
2023-05-17 INCLG: Inpainting for Non-Cleft Lip Generation with a Multi-Task Image Processing Network Shuang Chen et.al. 2305.10589 null
2023-05-17 LPMM: Intuitive Pose Control for Neural Talking-Head Model via Landmark-Parameter Morphable Model Kwangho Lee et.al. 2305.10456 null
2023-05-15 Identity-Preserving Talking Face Generation with Landmark and Appearance Priors Weizhi Zhong et.al. 2305.08293 link
2023-05-09 Zero-shot personalized lip-to-speech synthesis with face image based voice control Zheng-Yan Sheng et.al. 2305.14359 null
2023-05-09 StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator Jiazhi Guan et.al. 2305.05445 null
2023-05-09 Multimodal-driven Talking Face Generation via a Unified Diffusion-based Generator Chao Xu et.al. 2305.02594 null
2023-05-01 StyleAvatar: Real-time Photo-realistic Portrait Avatar from a Single Video Lizhen Wang et.al. 2305.00942 link
2023-05-01 GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation Zhenhui Ye et.al. 2305.00787 null
2023-04-28 A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation Bo-Kyeong Kim et.al. 2304.00471 null
2023-04-27 Controllable One-Shot Face Video Synthesis With Semantic Aware Prior Kangning Liu et.al. 2304.14471 null
2023-04-25 AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head Rongjie Huang et.al. 2304.12995 link
2023-04-24 VR Facial Animation for Immersive Telepresence Avatars Andre Rochow et.al. 2304.12051 null
2023-04-21 Implicit Neural Head Synthesis via Controllable Local Deformation Fields Chuhan Chen et.al. 2304.11113 null
2023-04-20 DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation Shuai Shen et.al. 2301.03786 link
2023-04-18 Audio-Driven Talking Face Generation with Diverse yet Realistic Facial Animations Rongliang Wu et.al. 2304.08945 null
2023-04-17 Autoregressive GAN for Semantic Unconditional Head Motion Generation Louis Airale et.al. 2211.00987 link
2023-04-11 One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field Weichuang Li et.al. 2304.05097 null
2023-04-06 Face Animation with an Attribute-Guided Diffusion Model Bohan Zeng et.al. 2304.03199 link
2023-04-06 4D Agnostic Real-Time Facial Animation Pipeline for Desktop Scenarios Wei Chen et.al. 2304.02814 null
2023-04-03 CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior Jinbo Xing et.al. 2301.02379 link
2023-04-01 DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance Longwen Zhang et.al. 2304.03117 null
2023-04-01 TalkCLIP: Talking Head Generation with Text-Guided Expressive Speaking Styles Yifeng Ma et.al. 2304.00334 null
2023-03-31 FONT: Flow-guided One-shot Talking Head Generation with Natural Head Motions Jin Liu et.al. 2303.17789 null
2023-03-31 Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert Jiadong Wang et.al. 2303.17480 null
2023-03-27 OmniAvatar: Geometry-Guided Controllable 3D Head Synthesis Hongyi Xu et.al. 2303.15539 null
2023-03-27 Accurate and Interpretable Solution of the Inverse Rig for Realistic Blendshape Models with Quadratic Corrective Terms Stevo Racković et.al. 2302.04843 null
2023-03-27 MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation Bowen Zhang et.al. 2212.08062 link
2023-03-27 A Majorization-Minimization Based Method for Nonconvex Inverse Rig Problems in Facial Animation: Algorithm Derivation Stevo Racković et.al. 2205.04289 null
2023-03-26 OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering Zhiyuan Ma et.al. 2303.14662 link
2023-03-26 Emotionally Enhanced Talking Face Generation Sahil Goyal et.al. 2303.11548 link
2023-03-26 Distributed Solution of the Inverse Rig Problem in Blendshape Facial Animation Stevo Racković et.al. 2303.06370 null
2023-03-24 Synthesizing Photorealistic Virtual Humans Through Cross-modal Disentanglement Siddarth Ravichandran et.al. 2209.01320 null
2023-03-23 PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360° Sizhe An et.al. 2303.13071 null
2023-03-22 Style Transfer for 2D Talking Head Animation Trong-Thang Pham et.al. 2303.09799 link
2023-03-22 MARLIN: Masked Autoencoder for facial video Representation LearnINg Zhixi Cai et.al. 2211.06627 link
2023-03-14 DisCoHead: Audio-and-Video-Driven Talking Head Generation by Disentangled Control of Head Pose and Facial Expressions Geumbyeol Hwang et.al. 2303.07697 link
2023-03-13 SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation Wenxuan Zhang et.al. 2211.12194 link
2023-03-09 FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis Using Self-Supervised Speech Representation Learning Kazi Injamamul Haque et.al. 2303.05416 link
2023-03-09 Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation Qi Chen et.al. 2303.05322 link
2023-03-07 DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video Zhimeng Zhang et.al. 2303.03988 link
2023-03-05 Cyber Vaccine for Deepfake Immunity Ching-Chun Chang et.al. 2303.02659 null
2023-03-04 High-fidelity Facial Avatar Reconstruction from Monocular Video with Generative Priors Yunpeng Bai et.al. 2211.15064 null
2023-03-01 DPE: Disentanglement of Pose and Expression for General Video Portrait Editing Youxin Pang et.al. 2301.06281 link
2023-02-27 Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video Minsu Kim et.al. 2303.08670 null
2023-02-27 Memory-augmented Contrastive Learning for Talking Head Generation Jianrong Wang et.al. 2302.13469 link
2023-02-24 Pose-Controllable 3D Facial Animation Synthesis using Hierarchical Audio-Vertex Attention Bin Liu et.al. 2302.12532 null
2023-02-16 OPT: One-shot Pose-Controllable Talking Head Generation Jin Liu et.al. 2302.08197 null
2023-02-14 Expressive Talking Head Video Encoding in StyleGAN2 Latent-Space Trevine Oorloff et.al. 2203.14512 link
2023-01-31 GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis Zhenhui Ye et.al. 2301.13430 null
2023-01-23 Data standardization for robust lip sync Chun Wang et.al. 2202.06198 null
2023-01-20 Neural Volumetric Blendshapes: Computationally Efficient Physics-Based Facial Blendshapes Nicolas Wagner et.al. 2212.14784 null
2023-01-15 Learning Audio-Driven Viseme Dynamics for 3D Face Animation Linchao Bao et.al. 2301.06059 null
2022-12-30 Imitator: Personalized Speech-driven 3D Facial Animation Balamurugan Thambiraja et.al. 2301.00023 null
2022-12-28 All's well that FID's well? Result quality and metric scores in GAN models for lip-synchronization tasks Carina Geldhauser et.al. 2212.13810 null
2022-12-23 Dubbing in Practice: A Large Scale Study of Human Localization With Insights for Automatic Dubbing William Brannon et.al. 2212.12137 null
2022-12-09 Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers Yasheng Sun et.al. 2212.04970 null
2022-12-07 Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors Zhentao Yu et.al. 2212.04248 null
2022-12-07 SPACE: Speech-driven Portrait Animation with Controllable Expression Siddharth Gururani et.al. 2211.09809 null
2022-11-30 Extracting Semantic Knowledge from GANs with Unsupervised Learning Jianjin Xu et.al. 2211.16710 null
2022-11-29 VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild Kun Cheng et.al. 2211.14758 null
2022-11-26 Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis Duomin Wang et.al. 2211.14506 link
2022-11-22 Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition Jiaxiang Tang et.al. 2211.12368 null
2022-11-10 On the role of Lip Articulation in Visual Speech Perception Zakaria Aldeneh et.al. 2203.10117 null
2022-11-04 SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory Se Jin Park et.al. 2211.00924 null
2022-10-21 Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection Alexandros Haliassos et.al. 2201.07131 link
2022-10-14 Pre-Avatar: An Automatic Presentation Generation Framework Leveraging Talking Avatar Aolan Sun et.al. 2210.06877 null
2022-10-13 Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors Vladimir Iashin et.al. 2210.07055 link
2022-10-07 Compressing Video Calls using Synthetic Talking Heads Madhav Agarwal et.al. 2210.03692 null
2022-10-07 A Keypoint Based Enhancement Method for Audio Driven Free View Talking Head Synthesis Yichen Han et.al. 2210.03335 null
2022-10-06 Audio-Visual Face Reenactment Madhav Agarwal et.al. 2210.02755 link
2022-10-06 Finding Directions in GAN's Latent Space for Neural Face Reenactment Stella Bounareli et.al. 2202.00046 link
2022-10-04 Towards MOOCs for Lipreading: Using Synthetic Talking Heads to Train Humans in Lipreading at Scale Aditya Agarwal et.al. 2208.09796 null
2022-09-29 Facial Landmark Predictions with Applications to Metaverse Qiao Han et.al. 2209.14698 link
2022-09-27 StyleMask: Disentangling the Style Space of StyleGAN2 for Neural Face Reenactment Stella Bounareli et.al. 2209.13375 link
2022-09-23 EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model Xinya Ji et.al. 2205.15278 null
2022-09-21 FNeVR: Neural Volume Rendering for Face Animation Bohan Zeng et.al. 2209.10340 link
2022-09-19 AutoLV: Automatic Lecture Video Generator Wenbin Wang et.al. 2209.08795 null
2022-09-09 Talking Head from Speech Audio using a Pre-trained Image Generator Mohammed M. Alghamdi et.al. 2209.04252 null
2022-09-07 Restructurable Activation Networks Kartikeya Bhardwaj et.al. 2208.08562 link
2022-08-29 StableFace: Analyzing and Improving Motion Stability for Talking Face Generation Jun Ling et.al. 2208.13717 null
2022-08-17 Extreme-scale Talking-Face Video Upsampling with Audio-Visual Priors Sindhu B Hegde et.al. 2208.08118 link
2022-08-03 Free-HeadGAN: Neural Talking Head Synthesis with Explicit Gaze Control Michail Christos Doukas et.al. 2208.02210 null
2022-08-02 Perceptual Conversational Head Generation with Regularized Driver and Enhanced Renderer Ailin Huang et.al. 2206.12837 link
2022-08-01 A Feasibility Study on Image Inpainting for Non-cleft Lip Generation from Patients with Cleft Lip Shuang Chen et.al. 2208.01149 link
2022-07-27 A Hybrid Deep Animation Codec for Low-bitrate Video Conferencing Goluck Konuko et.al. 2207.13530 null
2022-07-24 Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis Shuai Shen et.al. 2207.11770 link
2022-07-22 Visual Speech-Aware Perceptual 3D Facial Expression Reconstruction from Videos Panagiotis P. Filntisis et.al. 2207.11094 link
2022-07-20 NARRATE: A Normal Assisted Free-View Portrait Stylizer Youjia Wang et.al. 2207.00974 null
2022-07-20 VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection Joanna Hong et.al. 2206.07458 null
2022-07-20 Responsive Listening Head Generation: A Benchmark Dataset and Baseline Mohan Zhou et.al. 2112.13548 null
2022-07-13 FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis Yongqi Wang et.al. 2207.03800 link
2022-06-29 Cut Inner Layers: A Structured Pruning Strategy for Efficient U-Net GANs Bo-Kyeong Kim et.al. 2206.14658 null
2022-06-09 Face-Dubbing++: Lip-Synchronous, Voice Preserving Translation of Videos Alexander Waibel et.al. 2206.04523 null
2022-05-31 Text/Speech-Driven Full-Body Animation Wenlin Zhuang et.al. 2205.15573 null
2022-05-27 Unsupervised Voice-Face Representation Learning by Cross-Modal Prototype Contrast Boqing Zhu et.al. 2204.14057 link
2022-05-26 One-Shot Face Reenactment on Megapixels Wonjun Kang et.al. 2205.13368 null
2022-05-24 Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel's Weekly Video Podcasts Debjoy Saha et.al. 2205.12194 link
2022-05-20 MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement Alexander Richard et.al. 2104.08223 link
2022-05-13 Talking Face Generation with Multilingual TTS Hyoung-Kyu Song et.al. 2205.06421 null
2022-05-02 Emotion-Controllable Generalized Talking Face Generation Sanjana Sinha et.al. 2205.01155 null
2022-05-02 A Novel Speech-Driven Lip-Sync Model with CNN and LSTM Xiaohong Li et.al. 2205.00916 null
2022-04-27 Talking Head Generation Driven by Speech-Related Facial Action Units and Audio- Based on Multimodal Representation Fusion Sen Chen et.al. 2204.12756 null
2022-04-25 Fast Facial Landmark Detection and Applications: A Survey Kostiantyn Khabarlak et.al. 2101.10808 null
2022-04-13 Dynamic Neural Textures: Generating Talking-Face Videos with Continuously Controllable Expressions Zipeng Ye et.al. 2204.06180 null
2022-04-12 Attention-Based Lip Audio-Visual Synthesis for Talking Face Generation in the Wild Ganglai Wang et.al. 2203.03984 null
2022-04-06 Transformer-S2A: Robust and Efficient Speech-to-Animation Liyang Chen et.al. 2111.09771 null
2022-04-03 Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text Pulkit Tandon et.al. 2106.14014 link
2022-03-30 End to End Lip Synchronization with a Temporal AutoEncoder Yoav Shalev et.al. 2203.16224 link
2022-03-29 Thin-Plate Spline Motion Model for Image Animation Jian Zhao et.al. 2203.14367 link
2022-03-17 StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN Fei Yin et.al. 2203.04036 link
2022-03-17 FaceFormer: Speech-Driven 3D Facial Animation with Transformers Yingruo Fan et.al. 2112.05329 link
2022-03-16 Efficient conditioned face animation using frontally-viewed embedding Maxime Oquab et.al. 2203.08765 null
2022-03-15 Depth-Aware Generative Adversarial Network for Talking Head Video Generation Fa-Ting Hong et.al. 2203.06605 link
2022-03-10 An Audio-Visual Attention Based Multimodal Network for Fake Talking Face Videos Detection Ganglai Wang et.al. 2203.05178 null
2022-03-04 Multi-modality Deep Restoration of Extremely Compressed Face Videos Xi Zhang et.al. 2107.05548 null
2022-03-01 FakeAVCeleb: A Novel Audio-Video Multimodal Deepfake Dataset Hasam Khalid et.al. 2108.05080 link
2022-02-25 FSGANv2: Improved Subject Agnostic Face Swapping and Reenactment Yuval Nirkin et.al. 2202.12972 null
2022-02-22 Thinking the Fusion Strategy of Multi-reference Face Reenactment Takuya Yashima et.al. 2202.10758 null
2022-01-24 Selective Listening by Synchronizing Speech with Lips Zexu Pan et.al. 2106.07150 link
2022-01-22 Text2Video: Text-driven Talking-head Video Synthesis with Personalized Phoneme-Pose Dictionary Sibo Zhang et.al. 2104.14631 null
2022-01-21 Stitch it in Time: GAN-Based Facial Editing of Real Videos Rotem Tzaban et.al. 2201.08361 link
2022-01-17 Towards Realistic Visual Dubbing with Heterogeneous Sources Tianyi Xie et.al. 2201.06260 null
2022-01-16 Audio-Driven Talking Face Video Generation with Dynamic Convolution Kernels Zipeng Ye et.al. 2201.05986 null
2022-01-03 DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering Shunyu Yao et.al. 2201.00791 null
2021-12-20 Parallel and High-Fidelity Text-to-Lip Generation Jinglin Liu et.al. 2107.06831 link
2021-12-19 Initiative Defense against Facial Manipulation Qidong Huang et.al. 2112.10098 link
2021-12-07 Joint Audio-Text Model for Expressive Speech-Driven 3D Facial Animation Yingruo Fan et.al. 2112.02214 null
2021-12-06 One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning Suzhen Wang et.al. 2112.02749 null
2021-11-29 Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates Shenhan Qian et.al. 2108.08020 link
2021-11-04 FEAFA+: An Extended Well-Annotated Dataset for Facial Expression Analysis and 3D Facial Animation Wei Gan et.al. 2111.02751 null
2021-11-02 BiosecurID: a multimodal biometric database Julian Fierrez et.al. 2111.03472 null
2021-10-30 Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis Haozhe Wu et.al. 2111.00203 link
2021-10-26 Emotion recognition in talking-face videos using persistent entropy and neural networks Eduardo Paluzo-Hidalgo et.al. 2110.13571 link
2021-10-26 ViDA-MAN: Visual Dialog with Digital Humans Tong Shen et.al. 2110.13384 null
2021-10-22 Invertible Frowns: Video-to-Video Facial Emotion Translation Ian Magnusson et.al. 2109.08061 null
2021-10-19 Talking Head Generation with Audio and Speech Related Facial Action Units Sen Chen et.al. 2110.09951 null
2021-10-16 Intelligent Video Editing: Incorporating Modern Talking Face Generation Algorithms in a Video Editor Anchit Gupta et.al. 2110.08580 null
2021-10-12 Fine-grained Identity Preserving Landmark Synthesis for Face Reenactment Haichao Zhang et.al. 2110.04708 null
2021-10-07 Streaming Transformer Transducer Based Speech Recognition Using Non-Causal Convolution Yangyang Shi et.al. 2110.05241 null
2021-09-24 Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation Yuanxun Lu et.al. 2109.10595 null
2021-09-20 Accurate, Interpretable, and Fast Animation: An Iterative, Sparse, and Nonconvex Approach Stevo Rackovic et.al. 2109.08356 null
2021-09-17 Detection of GAN-synthesized street videos Omran Alamayreh et.al. 2109.04991 null
2021-08-30 Audiovisual Speech Synthesis using Tacotron2 Ahmed Hussen Abdelaziz et.al. 2008.00620 null
2021-08-23 KoDF: A Large-scale Korean DeepFake Detection Dataset Patrick Kwon et.al. 2103.10094 null
2021-08-23 HeadGAN: One-shot Neural Head Synthesis and Editing Michail Christos Doukas et.al. 2012.08261 null
2021-08-19 AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis Yudong Guo et.al. 2103.11078 link
2021-08-18 DeepFake MNIST+: A DeepFake Facial Animation Dataset Jiajun Huang et.al. 2108.07949 link
2021-08-18 FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning Chenxu Zhang et.al. 2108.07938 link
2021-08-12 UniFaceGAN: A Unified Framework for Temporally Consistent Facial Video Editing Meng Cao et.al. 2108.05650 null
2021-08-11 AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person Xinsheng Wang et.al. 2108.04325 null
2021-08-06 SofGAN: A Portrait Image Generator with Dynamic Styling Anpei Chen et.al. 2007.03780 link
2021-07-27 Beyond Voice Identity Conversion: Manipulating Voice Attributes by Adversarial Learning of Structured Disentangled Representations Laurent Benaroya et.al. 2107.12346 null
2021-07-21 Speech Driven Talking Face Generation from a Single Image and an Emotion Condition Sefik Emre Eskimez et.al. 2008.03592 link
2021-07-20 Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion Suzhen Wang et.al. 2107.09293 link
2021-07-10 Speech2Video: Cross-Modal Distillation for Speech to Video Generation Shijing Si et.al. 2107.04806 null
2021-07-07 Egocentric Videoconferencing Mohamed Elgharib et.al. 2107.03109 null
2021-06-09 LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization Avisek Lahiri et.al. 2106.04185 null
2021-05-20 Audio-Driven Emotional Video Portraits Xinya Ji et.al. 2104.07452 null
2021-05-07 Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation Lincheng Li et.al. 2104.07995 link
2021-05-05 A Neural Lip-Sync Framework for Synthesizing Photorealistic Virtual News Anchors Ruobing Zheng et.al. 2002.08700 null
2021-04-29 Learned Spatial Representations for Few-shot Talking-Head Synthesis Moustafa Meshry et.al. 2104.14557 null
2021-04-26 One-shot Face Reenactment Using Appearance Adaptive Normalization Guangming Yao et.al. 2102.03984 null
2021-04-25 3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head Qianyun Wang et.al. 2104.12051 null
2021-04-23 Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation Hang Zhou et.al. 2104.11116 null
2021-04-07 Single Source One Shot Reenactment using Weighted motion From Paired Feature Points Soumya Tripathy et.al. 2104.03117 null
2021-04-07 Everything's Talkin': Pareidolia Face Reenactment Linsen Song et.al. 2104.03061 link
2021-04-07 LI-Net: Large-Pose Identity-Preserving Face Reenactment Network Jin Liu et.al. 2104.02850 null
2021-04-02 One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing Ting-Chun Wang et.al. 2011.15126 null
2021-03-20 Not made for each other- Audio-Visual Dissonance-based Deepfake Detection and Localization Komal Chugh et.al. 2005.14405 link
2021-03-19 End-to-End Lip Synchronisation Based on Pattern Classification You Jin Kim et.al. 2005.08606 null
2021-03-05 Real-time RGBD-based Extended Body Pose Estimation Renat Bashirov et.al. 2103.03663 link
2021-03-03 Estimating Uniqueness of I-Vector Representation of Human Voice Erkam Sinan Tandogan et.al. 2008.11985 null
2021-02-25 MakeItTalk: Speaker-Aware Talking-Head Animation Yang Zhou et.al. 2004.12992 null
2021-02-19 One Shot Audio to Animated Video Generation Neeraj Kumar et.al. 2102.09737 null
2021-02-18 AudioVisual Speech Synthesis: A brief literature review Efthymios Georgiou et.al. 2103.03927 null
2020-12-14 Robust One Shot Audio to Video Generation Neeraj Kumar et.al. 2012.07842 null
2020-12-14 Multi Modal Adaptive Normalization for Audio to Video Generation Neeraj Kumar et.al. 2012.07304 null
2020-11-30 Adaptive Compact Attention For Few-shot Video-to-video Translation Risheng Huang et.al. 2011.14695 null
2020-11-21 Stochastic Talking Face Generation Using Latent Distribution Matching Ravindra Yadav et.al. 2011.10727 link
2020-11-21 Iterative Text-based Editing of Talking-heads Using Neural Retargeting Xinwei Yao et.al. 2011.10688 null
2020-11-09 FACEGAN: Facial Attribute Controllable rEenactment GAN Soumya Tripathy et.al. 2011.04439 null
2020-11-06 Large-scale multilingual audio visual dubbing Yi Yang et.al. 2011.03530 null
2020-11-02 Facial Keypoint Sequence Generation from Audio Prateek Manocha et.al. 2011.01114 null
2020-10-25 APB2FaceV2: Real-Time Audio-Guided Multi-Face Reenactment Jiangning Zhang et.al. 2010.13017 link
2020-10-12 Intuitive Facial Animation Editing Based On A Generative RNN Framework Eloïse Berson et.al. 2010.05655 null
2020-10-05 SMILE: Semantically-guided Multi-attribute Image and Layout Editing Andrés Romero et.al. 2010.02315 link
2020-10-05 Dynamic Facial Asset and Rig Generation from a Single Scan Jiaman Li et.al. 2010.00560 null
2020-09-20 An Improved Approach of Intention Discovery with Machine Learning for POMDP-based Dialogue Management Ruturaj Raval et.al. 2009.09354 null
2020-09-18 Mesh Guided One-shot Face Reenactment using Graph Convolutional Networks Guangming Yao et.al. 2008.07783 null
2020-09-12 DualLip: A System for Joint Lip Reading and Generation Weicong Chen et.al. 2009.05784 null
2020-09-02 Seeing wake words: Audio-visual Keyword Spotting Liliane Momeni et.al. 2009.01225 null
2020-08-29 "It took me almost 30 minutes to practice this". Performance and Production Practices in Dance Challenge Videos on TikTok Daniel Klug et.al. 2008.13040 null
2020-08-25 A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild K R Prajwal et.al. 2008.10010 null
2020-08-11 Audio- and Gaze-driven Facial Animation of Codec Avatars Alexander Richard et.al. 2008.05023 null
2020-08-04 Speaker dependent acoustic-to-articulatory inversion using real-time MRI of the vocal tract Tamás Gábor Csapó et.al. 2008.02098 link
2020-08-04 Real-Time Cleaning and Refinement of Facial Animation Signals Eloïse Berson et.al. 2008.01332 null
2020-08-02 Deep Multi-modality Soft-decoding of Very Low Bit-rate Face Videos Yanhui Guo et.al. 2008.01652 null
2020-07-29 Neural Voice Puppetry: Audio-driven Facial Reenactment Justus Thies et.al. 1912.05566 link
2020-07-20 Deformable Style Transfer Sunnie S. Y. Kim et.al. 2003.11038 link
2020-07-18 A Robust Interactive Facial Animation Editing System Eloïse Berson et.al. 2007.09367 null
2020-07-16 Talking-head Generation with Rhythmic Head Motion Lele Chen et.al. 2007.08547 link
2020-07-08 Learning Speech Representations from Raw Audio by Joint Audiovisual Self-Supervision Abhinav Shukla et.al. 2007.04134 null
2020-06-20 Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams Huirong Huang et.al. 2006.11610 null
2020-05-27 Modality Dropout for Improved Performance-driven Talking Faces Ahmed Hussen Abdelaziz et.al. 2005.13616 null
2020-05-25 Identity-Preserving Realistic Talking Face Generation Sanjana Sinha et.al. 2005.12318 null
2020-05-22 Head2Head: Video-based Neural Head Synthesis Mohammad Rami Koujan et.al. 2005.10954 null
2020-05-16 FReeNet: Multi-Identity Face Reenactment Jiangning Zhang et.al. 1905.11805 null
2020-05-13 FaR-GAN for One-Shot Face Reenactment Hanxiang Hao et.al. 2005.06402 null
2020-05-13 Arbitrary Talking Face Generation via Attentional Audio-Visual Coherence Learning Hao Zhu et.al. 1812.06589 null
2020-05-11 Dancing to the Partisan Beat: A First Analysis of Political Communication on TikTok Juan Carlos Medina Serrano et.al. 2004.05478 link
2020-05-07 What comprises a good talking-head video generation?: A Survey and Benchmark Lele Chen et.al. 2005.03201 link
2020-05-04 Disentangled Speech Embeddings using Cross-modal Self-supervision Arsha Nagrani et.al. 2002.08742 null
2020-04-30 APB2Face: Audio-guided face reenactment with auxiliary pose and blink signals Jiangning Zhang et.al. 2004.14569 null
2020-03-30 ActGAN: Flexible and Efficient One-shot Face Reenactment Ivan Kosarevych et.al. 2003.13840 null
2020-03-29 Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose Xianfang Zeng et.al. 2003.12957 null
2020-03-26 High-Accuracy Facial Depth Models derived from 3D Synthetic Data Faisal Khan et.al. 2003.06211 null
2020-03-06 Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose Ran Yi et.al. 2002.10137 null
2020-03-05 Talking-Heads Attention Noam Shazeer et.al. 2003.02436 link
2020-03-01 Towards Automatic Face-to-Face Translation Prajwal K R et.al. 2003.00418 link
2020-02-19 Speech-driven facial animation using polynomial fusion of features Triantafyllos Kefalas et.al. 1912.05833 null
2020-01-17 ICface: Interpretable and Controllable Face Reenactment Using GANs Soumya Tripathy et.al. 1904.01909 null
2019-12-20 Disentangling Style and Content in Anime Illustrations Sitao Xiang et.al. 1905.10742 null
2019-11-21 FLNet: Landmark Driven Fetching and Learning Network for Faithful Talking Facial Animation Synthesis Kuangxiao Gu et.al. 1911.09224 null
2019-11-19 MarioNETte: Few-shot Face Reenactment Preserving Identity of Unseen Targets Sungjoo Ha et.al. 1911.08139 null
2019-10-28 Few-shot Video-to-Video Synthesis Ting-Chun Wang et.al. 1910.12713 null
2019-10-19 Real-Time Lip Sync for Live 2D Animation Deepali Aneja et.al. 1910.08685 link
2019-10-16 Designing Style Matching Conversational Agents Deepali Aneja et.al. 1910.07514 null
2019-10-15 A High-Fidelity Open Embodied Avatar with Lip Syncing and Expression Capabilities Deepali Aneja et.al. 1909.08766 link
2019-10-09 EmoCo: Visual Analysis of Emotion Coherence in Presentation Videos Haipeng Zeng et.al. 1907.12918 null
2019-10-02 Animating Face using Disentangled Audio Representations Gaurav Mittal et.al. 1910.00726 null
2019-09-25 Few-Shot Adversarial Learning of Realistic Neural Talking Head Models Egor Zakharov et.al. 1905.08233 null
2019-09-06 Neural Style-Preserving Visual Dubbing Hyeongwoo Kim et.al. 1909.02518 null
2019-08-29 3D Face Pose and Animation Tracking via Eigen-Decomposition based Bayesian Approach Ngoc-Trung Tran et.al. 1908.11039 null
2019-08-20 Prosodic Phrase Alignment for Machine Dubbing Alp Öktem et.al. 1908.07226 link
2019-08-16 FSGAN: Subject Agnostic Face Swapping and Reenactment Yuval Nirkin et.al. 1908.05932 link
2019-08-11 Emotion Dependent Facial Animation from Affective Speech Rizwan Sadiq et.al. 1908.03904 null
2019-08-05 One-shot Face Reenactment Yunxuan Zhang et.al. 1908.03251 link
2019-07-25 Talking Face Generation by Conditional Recurrent Adversarial Network Yang Song et.al. 1804.04786 link
2019-07-24 Data-Driven Physical Face Inversion Yeara Kozlov et.al. 1907.10402 null
2019-07-23 A system for efficient 3D printed stop-motion face animation Rinat Abdrashitov et.al. 1907.10163 null
2019-06-14 Realistic Speech-Driven Facial Animation with GANs Konstantinos Vougioukas et.al. 1906.06337 null
2019-06-04 Text-based Editing of Talking-head Video Ohad Fried et.al. 1906.01524 null
2019-05-27 Audio2Face: Generating Speech/Face Animation from Single Audio with Attention-Based Bidirectional LSTM Networks Guanzhong Tian et.al. 1905.11142 null
2019-05-09 Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss Lele Chen et.al. 1905.03820 link
2019-05-08 Capture, Learning, and Synthesis of 3D Speaking Styles Daniel Cudeiro et.al. 1905.03079 link
2019-04-23 Talking Face Generation by Adversarially Disentangled Audio-Visual Representation Hang Zhou et.al. 1807.07860 null
2019-04-02 FEAFA: A Well-Annotated Dataset for Facial Expression Analysis and 3D Facial Animation Yanfu Yan et.al. 1904.01509 null
2019-03-13 Animating an Autonomous 3D Talking Avatar Dominik Borer et.al. 1903.05448 null
2018-12-22 Deep Audio-Visual Speech Recognition Triantafyllos Afouras et.al. 1809.02108 null
2018-12-20 DeepFakes: a New Threat to Face Recognition? Assessment and Detection Pavel Korshunov et.al. 1812.08685 null
2018-11-22 Towards Highly Accurate and Stable Face Alignment for High-Resolution Videos Ying Tai et.al. 1811.00342 link
2018-11-16 Influence of visual cues on head and eye movements during listening tasks in multi-talker audiovisual environments with animated characters Maartje M. E. Hendrikse et.al. 1812.02088 null
2018-08-28 GANimation: Anatomically-aware Facial Animation from a Single Image Albert Pumarola et.al. 1807.09251 link
2018-08-19 Dynamic Temporal Alignment of Speech to Lips Tavi Halperin et.al. 1808.06250 link
2018-07-29 ReenactGAN: Learning to Reenact Faces via Boundary Transfer Wayne Wu et.al. 1807.11079 link
2018-07-26 Learnable PINs: Cross-Modal Embeddings for Person Identity Arsha Nagrani et.al. 1805.00833 null
2018-07-19 End-to-End Speech-Driven Facial Animation with Temporal GANs Konstantinos Vougioukas et.al. 1805.09313 null
2018-05-29 Deep Video Portraits Hyeongwoo Kim et.al. 1805.11714 null
2018-05-24 VisemeNet: Audio-Driven Animator-Centric Speech Animation Yang Zhou et.al. 1805.09488 null
2018-05-21 Anime Style Space Exploration Using Metric Learning and Generative Adversarial Networks Sitao Xiang et.al. 1805.07997 null
2018-04-23 Generating Talking Face Landmarks from Speech Sefik Emre Eskimez et.al. 1803.09803 null
2018-03-28 Generative Adversarial Talking Head: Bringing Portraits to Life with a Weakly Supervised Neural Network Hai X. Pham et.al. 1803.07716 null
2018-03-20 Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks Seyed Ali Jalalifar et.al. 1803.07461 null
2017-12-07 End-to-end Learning for 3D Facial Animation from Raw Waveforms of Speech Hai X. Pham et.al. 1710.00920 null
2017-12-06 ObamaNet: Photo-realistic lip-sync from text Rithesh Kumar et.al. 1801.01442 null
2017-07-30 Kernel Projection of Latent Structures Regression for Facial Animation Retargeting Christos Ouzounis et.al. 1707.09629 null
2017-07-26 Fast Deep Matting for Portrait Animation on Mobile Phone Bingke Zhu et.al. 1707.08289 null
2017-07-21 Multichannel Attention Network for Analyzing Visual Behavior in Public Speaking Rahul Sharma et.al. 1707.06830 null
2017-07-18 You said that? Joon Son Chung et.al. 1705.02966 null
2017-01-30 Lip Reading Sentences in the Wild Joon Son Chung et.al. 1611.05358 link
2016-10-28 Galaxy gas as obscurer: II. Separating the galaxy-scale and nuclear obscurers of Active Galactic Nuclei Johannes Buchner et.al. 1610.09380 link
2016-07-11 Large-Scale MIMO is Capable of Eliminating Power-Thirsty Channel Coding for Wireless Transmission of HEVC/H.265 Video Shaoshi Yang et.al. 1601.06684 null
2016-05-22 Improving Facial Analysis and Performance Driven Animation through Disentangling Identity and Expression David Rim et.al. 1512.08212 null
2016-02-08 Automatic Face Reenactment Pablo Garrido et.al. 1602.02651 null
2015-11-20 ExpressionBot: An Emotive Lifelike Robotic Face for Face-to-Face Communication Ali Mollahosseini et.al. 1511.06502 null
2014-09-03 Visual Speech Recognition Ahmad B. A. Hassanat et.al. 1409.1411 null
2012-09-22 Using multimodal speech production data to evaluate articulatory animation for audiovisual speech synthesis Ingmar Steiner et.al. 1209.4982 null
2012-03-30 Face Expression Recognition and Analysis: The State of the Art Vinay Bettadapura et.al. 1203.6722 null
2012-01-19 Progress in animation of an EMA-controlled tongue model for acoustic-visual speech synthesis Ingmar Steiner et.al. 1201.4080 null
2010-03-01 Re-verification of a Lip Synchronization Protocol using Robust Reachability Piotr Kordy et.al. 1003.0431 null

(back to top)

Image Animation

Publish Date Title Authors PDF Code
2025-12-05 SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations Wenhao Yan et.al. 2512.05905 null
2025-12-05 Learning High-Fidelity Cloth Animation via Skinning-Free Image Transfer Rong Wang et.al. 2512.05593 null
2025-12-04 ShadowDraw: From Any Object to Shadow-Drawing Compositional Art Rundong Luo et.al. 2512.05110 null
2025-12-04 Efficient Spatially-Variant Convolution via Differentiable Sparse Kernel Complex Zhizhen Wu et.al. 2512.04556 null
2025-12-03 Artificial Microsaccade Compensation: Stable Vision for an Ornithopter Levi Burner et.al. 2512.03995 null
2025-12-02 PPTArena: A Benchmark for Agentic PowerPoint Editing Michael Ofengenden et.al. 2512.03042 null
2025-12-01 Know Thyself by Knowing Others: Learning Neuron Identity from Population Context Vinam Arora et.al. 2512.01199 null
2025-12-01 One-to-All Animation: Alignment-Free Character Animation and Image Pose Transfer Shijun Shi et.al. 2511.22940 null
2025-11-30 TalkingPose: Efficient Face and Gesture Animation with Feedback-guided Diffusion Model Alireza Javanmardi et.al. 2512.00909 null
2025-11-29 Astro-Animation -- How Artists and Scientists Envision the Universe Laurence Arcadias et.al. 2512.00535 null
2025-11-28 MultiBanana: A Challenging Benchmark for Multi-Reference Text-to-Image Generation Yuta Oshima et.al. 2511.22989 null
2025-11-28 OmniAID: Decoupling Semantic and Artifacts for Universal AI-Generated Image Detection in the Wild Yuncheng Guo et.al. 2511.08423 null
2025-11-27 A Progressive Evaluation Framework for Multicultural Analysis of Story Visualization Janak Kapuriya et.al. 2511.22576 null
2025-11-27 INSIGHT: An Interpretable Neural Vision-Language Framework for Reasoning of Generative Artifacts Anshul Bagaria et.al. 2511.22351 null
2025-11-25 MotionV2V: Editing Motion in a Video Ryan Burgert et.al. 2511.20640 null
2025-11-25 New York Smells: A Large Multimodal Dataset for Olfaction Ege Ozguroglu et.al. 2511.20544 null
2025-11-24 SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation Jiaming Zhang et.al. 2511.19320 null
2025-11-22 AnimAgents: Coordinating Multi-Stage Animation Pre-Production with Human-Multi-Agent Collaboration Wen-Fan Wang et.al. 2511.17906 null
2025-11-20 Motion Transfer-Enhanced StyleGAN for Generating Diverse Macaque Facial Expressions Takuya Igaue et.al. 2511.16711 null
2025-11-20 Integrating Deep Learning and Spatial Statistics in Marine Ecosystem Monitoring Gian Mario Sangiovanni et.al. 2511.16447 null
2025-11-20 How Robot Dogs See the Unseeable Oliver Bimber et.al. 2511.16262 null
2025-11-18 PFAvatar: Pose-Fusion 3D Personalized Avatar Reconstruction from Real-World Outfit-of-the-Day Photos Dianbing Xi et.al. 2511.12935 null
2025-11-16 Sketch2PoseNet: Efficient and Generalized Sketch to 3D Human Pose Prediction Li Wang et.al. 2510.26196 null
2025-11-14 EmoVid: A Multimodal Emotion Video Dataset for Emotion-Centric Video Understanding and Generation Zongyang Qiu et.al. 2511.11002 null
2025-11-11 oboro: Text-to-Image Synthesis on Limited Data using Flow-based Diffusion Transformer with MMH Attention Ryusuke Mizutani et.al. 2511.08168 null
2025-11-11 Beyond the Pixels: VLM-based Evaluation of Identity Preservation in Reference-Guided Synthesis Aditi Singhania et.al. 2511.08087 null
2025-11-09 Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising Assaf Singer et.al. 2511.08633 null
2025-11-04 Video Text Preservation with Synthetic Text-Rich Videos Ziyang Liu et.al. 2511.05573 null
2025-11-03 FreeArt3D: Training-Free Articulated Object Generation using 3D Diffusion Chuhao Chen et.al. 2510.25765 null
2025-11-02 A Hybrid YOLOv5-SSD IoT-Based Animal Detection System for Durian Plantation Protection Anis Suttan Shahrir et.al. 2511.00777 null
2025-10-31 DANCER: Dance ANimation via Condition Enhancement and Rendering with diffusion model Yucheng Xing et.al. 2510.27169 null
2025-10-29 4-Doodle: Text to 3D Sketches that Move! Hao Chen et.al. 2510.25319 null
2025-10-28 DogMo: A Large-Scale Multi-View RGB-D Dataset for 4D Canine Motion Recovery Zan Wang et.al. 2510.24117 null
2025-10-27 Lookahead Anchoring: Preserving Character Identity in Audio-Driven Human Animation Junyoung Seo et.al. 2510.23581 null
2025-10-27 Revising Second Order Terms in Deep Animation Video Coding Konstantin Schmidt et.al. 2510.23561 null
2025-10-26 Cross-Species Transfer Learning in Agricultural AI: Evaluating ZebraPose Adaptation for Dairy Cattle Pose Estimation Mackenzie Tapp et.al. 2510.22618 null
2025-10-26 DynaPose4D: High-Quality 4D Dynamic Content Generation via Pose Alignment Loss Jing Yang et.al. 2510.22473 null
2025-10-20 From Volume Rendering to 3D Gaussian Splatting: Theory and Applications Vitor Pereira Matias et.al. 2510.18101 null
2025-10-16 Ponimator: Unfolding Interactive Pose for Versatile Human-human Interaction Animation Shaowei Liu et.al. 2510.14976 null
2025-10-16 Zero-Shot Wildlife Sorting Using Vision Transformers: Evaluating Clustering and Continuous Similarity Ordering Hugo Markoff et.al. 2510.14596 null
2025-10-16 Hierarchical Re-Classification: Combining Animal Classification Models with Vision Transformers Hugo Markoff et.al. 2510.14594 null
2025-10-16 Evaluating plastic scintillator performance as a substitute of LYSO in SiPM based animal PET scanners: A GEANT4 simulation analysis Davinder Siwal et.al. 2510.14437 null
2025-10-16 Multi-identity Human Image Animation with Structural Video Diffusion Zhenzhi Wang et.al. 2504.04126 null
2025-09-19 TT-DF: A Large-Scale Diffusion-Based Dataset and Benchmark for Human Body Forgery Detection Wenkui Yang et.al. 2505.08437 null
2025-09-09 LINR Bridge: Vector Graphic Animation via Neural Implicits and Video Diffusion Priors Wenshuo Gao et.al. 2509.07484 null
2025-08-23 AnimateAnywhere: Rouse the Background in Human Image Animation Xiaoyu Liu et.al. 2504.19834 null
2025-08-13 Animate-X++: Universal Character Image Animation with Dynamic Backgrounds Shuai Tan et.al. 2508.09454 null
2025-08-10 Consistent and Controllable Image Animation with Motion Linear Diffusion Transformers Xin Ma et.al. 2508.07246 null
2025-07-20 StableAnimator++: Overcoming Pose Misalignment and Face Distortion for Human Image Animation Shuyuan Tu et.al. 2507.15064 null
2025-07-11 X-Dancer: Expressive Music to Human Dance Video Generation Zeyuan Chen et.al. 2502.17414 null
2025-07-01 DAM-VSR: Disentanglement of Appearance and Motion for Video Super-Resolution Zhe Kong et.al. 2507.01012 null
2025-07-01 Recomposed realities: animating still images via patch clustering and randomness Markus Juvonen et.al. 2506.22556 null
2025-05-30 MTVCrafter: 4D Motion Tokenization for Open-World Human Image Animation Yanbo Ding et.al. 2505.10238 null
2025-05-29 HyperMotion: DiT-Based Pose-Guided Human Image Animation of Complex Motions Shuolin Xu et.al. 2505.22977 null
2025-05-24 EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation Qiang Qu et.al. 2503.18552 null
2025-05-18 DynamiCtrl: Rethinking the Basic Structure and the Role of Text for High-quality Human Image Animation Haoyu Zhao et.al. 2503.21246 null
2025-04-20 DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance Yuxuan Luo et.al. 2504.01724 null
2025-04-15 UniAnimate-DiT: Human Image Animation with Large-Scale Video Diffusion Transformer Xiang Wang et.al. 2504.11289 null
2025-04-15 Taming Consistency Distillation for Accelerated Human Image Animation Xiang Wang et.al. 2504.11143 null
2025-04-04 Optimizing 4D Gaussians for Dynamic Scene Video from Single Landscape Images In-Hwan Jin et.al. 2504.05458 null
2025-04-01 VFX Creator: Animated Visual Effect Generation with Controllable Diffusion Transformer Xinyu Liu et.al. 2502.05979 null
2025-03-23 MotiF: Making Text Count in Image Animation with Motion Focal Loss Shijie Wang et.al. 2412.16153 null
2025-03-13 Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer Jiahao Cui et.al. 2412.00733 link
2025-03-10 Perception-as-Control: Fine-grained Controllable Image Animation with 3D-aware Motion Representation Yingjie Chen et.al. 2501.05020 null
2025-02-25 DisPose: Disentangling Pose Guidance for Controllable Human Image Animation Hongxiang Li et.al. 2412.09349 link
2025-02-15 SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers Di Qiu et.al. 2502.10841 null
2025-02-10 Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance Li Hu et.al. 2502.06145 null
2025-02-06 MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation Jinbo Xing et.al. 2502.04299 null
2025-02-03 Every Image Listens, Every Image Dances: Music-Driven Image Animation Zhikang Dong et.al. 2501.18801 null
2025-01-20 X-Dyna: Expressive Dynamic Human Image Animation Di Chang et.al. 2501.10021 null
2025-01-15 Joint Learning of Depth and Appearance for Portrait Image Animation Xinya Ji et.al. 2501.08649 null
2024-12-12 Animate-X: Universal Character Image Animation with Enhanced Motion Representation Shuai Tan et.al. 2410.10306 null
2024-12-04 FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait Taekyung Ki et.al. 2412.01064 null
2024-11-30 DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses Yatian Pang et.al. 2412.00397 null
2024-11-28 JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation Xuyang Cao et.al. 2411.09209 link
2024-11-27 StableAnimator: High-Quality Identity-Preserving Human Image Animation Shuyuan Tu et.al. 2411.17697 link
2024-11-24 LetsTalk: Latent Diffusion Transformer for Talking Video Synthesis Haojie Zhang et.al. 2411.16748 null
2024-11-22 HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation Zhenzhi Wang et.al. 2407.17438 null
2024-10-31 TPC: Test-time Procrustes Calibration for Diffusion-based Human Image Animation Sunjae Yoon et.al. 2410.24037 null
2024-10-20 FrameBridge: Improving Image-to-Video Generation with Bridge Models Yuji Wang et.al. 2410.15371 null
2024-10-14 Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation Jiahao Cui et.al. 2410.07718 link
2024-09-30 Illustrious: an Open Advanced Illustration Model Sang Hyun Park et.al. 2409.19946 null
2024-09-29 High Quality Human Image Animation using Regional Supervision and Motion Blur Condition Zhongcong Xu et.al. 2409.19580 null
2024-09-22 Dormant: Defending against Pose-driven Human Image Animation Jiachen Zhou et.al. 2409.14424 link
2024-07-23 Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models Xin Ma et.al. 2407.15642 link
2024-07-12 TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models Jeongho Kim et.al. 2407.09012 null
2024-07-12 EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions Zhiyuan Chen et.al. 2407.08136 link
2024-07-11 MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model Muyao Niu et.al. 2405.20222 link
2024-06-16 Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation Mingwang Xu et.al. 2406.08801 null
2024-06-14 Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation Li Hu et.al. 2311.17117 null
2024-06-13 Follow-Your-Pose v2: Multiple-Condition Guided Character Image Animation for Stable Pose Control Jingyun Xue et.al. 2406.03035 null
2024-06-03 UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation Xiang Wang et.al. 2406.01188 null
2024-06-01 Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance Shenhao Zhu et.al. 2403.14781 link
2024-05-29 Evaluating the effectiveness of sonification in science education using Edukoi Lucrezia Guiotto Nai Fovino et.al. 2405.18908 null
2024-05-28 VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation Qilin Wang et.al. 2405.18156 null
2024-05-28 Controllable Longer Image Animation with Diffusion Models Qiang Wang et.al. 2405.17306 null
2024-03-26 PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models Yiming Zhang et.al. 2312.13964 null
2024-03-13 Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts Yue Ma et.al. 2403.08268 link
2024-03-08 Audio-Synchronized Visual Animation Lin Zhang et.al. 2403.05659 link
2024-03-05 Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation Weijie Li et.al. 2403.02827 null
2024-01-17 Continuous Piecewise-Affine Based Motion Model for Image Animation Hexiang Wang et.al. 2401.09146 link
2024-01-03 Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions David Junhao Zhang et.al. 2401.01827 link
2023-12-08 AnimateZero: Video Diffusion Models are Zero-Shot Image Animators Jiwen Yu et.al. 2312.03793 null
2023-12-06 AnimateAnything: Fine-Grained Open Domain Image Animation with Motion Guidance Zuozhuo Dai et.al. 2311.12886 null
2023-12-05 LivePhoto: Real Image Animation with Text-guided Motion Control Xi Chen et.al. 2312.02928 null
2023-11-30 Motion-Conditioned Image Animation for Video Editing Wilson Yan et.al. 2311.18827 null
2023-11-27 MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model Zhongcong Xu et.al. 2311.16498 null
2023-11-27 DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors Jinbo Xing et.al. 2310.12190 link
2023-11-19 Differential Motion Evolution for Fine-Grained Motion Deformation in Unsupervised Image Animation Peirong Liu et.al. 2110.04658 null
2023-10-16 LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation Ruiqi Wu et.al. 2310.10769 link
2023-10-11 LEO: Generative Latent Image Animator for Human Video Synthesis Yaohui Wang et.al. 2305.03989 link
2023-09-26 Text-Guided Synthesis of Eulerian Cinemagraphs Aniruddha Mahapatra et.al. 2307.03190 link
2023-09-25 Automatic Animation of Hair Blowing in Still Portrait Photos Wenpeng Xiao et.al. 2309.14207 null
2023-07-10 AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning Yuwei Guo et.al. 2307.04725 link
2023-07-09 Predictive Coding For Animation-Based Video Compression Goluck Konuko et.al. 2307.04187 null
2023-04-12 VidStyleODE: Disentangled Video Editing via StyleGAN and NeuralODEs Moayed Haji Ali et.al. 2304.06020 null
2023-03-10 3D Cinemagraphy from a Single Image Xingyi Li et.al. 2303.05724 null
2023-02-02 Dreamix: Video Diffusion Models are General Video Editors Eyal Molad et.al. 2302.01329 null
2023-01-27 Animating Still Images Kushagr Batra et.al. 2209.10497 null
2023-01-14 Continuous odor profile monitoring to study olfactory navigation in small animals Kevin S. Chen et.al. 2301.05905 null
2022-11-30 NeRFInvertor: High Fidelity NeRF-GAN Inversion for Single-shot Real Image Animation Yu Yin et.al. 2211.17235 null
2022-10-05 Implicit Warping for Animation with Image Sets Arun Mallya et.al. 2210.01794 null
2022-09-28 Motion Transformer for Unsupervised Image Animation Jiale Tao et.al. 2209.14024 link
2022-07-19 Single Stage Virtual Try-on via Deformable Attention Flows Shuai Bai et.al. 2207.09161 link
2022-07-08 Jointly Harnessing Prior Structures and Temporal Consistency for Sign Language Video Generation Yucheng Suo et.al. 2207.03714 null
2022-06-11 Bayesian Statistics Guided Label Refurbishment Mechanism: Mitigating Label Noise in Medical Image Classification Mengdi Gao et.al. 2106.12284 link
2022-04-05 Neural Fields in Visual Computing and Beyond Yiheng Xie et.al. 2111.11426 null
2022-03-30 Image Animation with Perturbed Masks Yoav Shalev et.al. 2011.06922 null
2022-03-29 Thin-Plate Spline Motion Model for Image Animation Jian Zhao et.al. 2203.14367 link
2022-03-25 3D GAN Inversion for Controllable Portrait Image Animation Connor Z. Lin et.al. 2203.13441 null
2022-03-18 Latent Image Animator: Learning to Animate Images via Latent Space Navigation Yaohui Wang et.al. 2203.09043 null
2021-12-21 Image Animation with Keypoint Mask Or Toledano et.al. 2112.10457 link
2021-12-19 Move As You Like: Image Animation in E-Commerce Scenario Borun Xu et.al. 2112.13647 null
2021-12-17 AI-Empowered Persuasive Video Generation: A Survey Chang Liu et.al. 2112.09401 null
2021-12-01 Deep Spatial Transformation for Pose-Guided Person Image Generation and Animation Yurui Ren et.al. 2008.12606 null
2021-10-28 Application of Time Separation Technique to Enhance C-arm CT Dynamic Liver Perfusion Imaging Hana Haseljić et.al. 2110.14318 null
2021-10-26 Incremental Learning for Animal Pose Estimation using RBF k-DPP Gaurav Kumar Nayak et.al. 2110.13598 null
2021-10-07 Enhancement of Anime Imaging Enlargement using Modified Super-Resolution CNN Tanakit Intaniyom et.al. 2110.02321 null
2021-09-06 Sparse to Dense Motion Transfer for Face Image Animation Ruiqi Zhao et.al. 2109.00471 null
2021-08-18 DeepFake MNIST+: A DeepFake Facial Animation Dataset Jiajun Huang et.al. 2108.07949 link
2021-06-23 Analisis Kualitas Layanan Website E-Commerce Bukalapak Terhadap Kepuasan Pengguna Mahasiswa Universitas Bina Darma Menggunakan Metode Webqual 4.0 Adellia et.al. 2106.15342 null
2021-04-07 Single Source One Shot Reenactment using Weighted motion From Paired Feature Points Soumya Tripathy et.al. 2104.03117 null
2021-03-23 PriorityCut: Occlusion-guided Regularization for Warp-based Image Animation Wai Ting Cheung et.al. 2103.11600 null
2020-12-01 Ultra-low bitrate video conferencing using deep image animation Goluck Konuko et.al. 2012.00346 null
2020-10-01 First Order Motion Model for Image Animation Aliaksandr Siarohin et.al. 2003.00196 link
2019-08-30 Animating Arbitrary Objects via Deep Motion Transfer Aliaksandr Siarohin et.al. 1812.08861 link
2019-07-01 Style Generator Inversion for Image Enhancement and Animation Aviv Gabbay et.al. 1906.11880 null
2018-10-09 3D model silhouette-based tracking in depth images for puppet suit dynamic video-mapping Guillaume Caron et.al. 1810.03956 null
2018-06-24 A Design of FPGA Based Small Animal PET Real Time Digital Signal Processing and Correction Logic Jiaming Lu et.al. 1806.09117 null
2018-01-31 RAPTOR I: Time-dependent radiative transfer in arbitrary spacetimes Thomas Bronzwaer et.al. 1801.10452 null
2017-10-23 Quasi-random Agents for Image Transition and Animation Aneta Neumann et.al. 1710.07421 null
2016-06-23 Gender and Interest Targeting for Sponsored Post Advertising at Tumblr Mihajlo Grbovic et.al. 1606.07189 null
2015-03-16 Use of Effective Audio in E-learning Courseware Kisor Ray et.al. 1503.04837 null
2015-02-04 Multimedia-Video for Learning Kah Hean Chua et.al. 1502.01090 null
2013-01-25 Measurements of Martian Dust Devil Winds with HiRISE David S. Choi et.al. 1301.6130 null
2010-01-04 Tutoring System for Dance Learning Rajkumar Kannan et.al. 1001.0440 null

(back to top)

Video Generation

Publish Date Title Authors PDF Code
2025-12-08 UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation Jiehui Huang et.al. 2512.07831 null
2025-12-08 WorldReel: 4D Video Generation with Consistent Geometry and Motion Modeling Shaoheng Fang et.al. 2512.07821 null
2025-12-08 OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory Zhaochong An et.al. 2512.07802 null
2025-12-08 ViSA: 3D-Aware Video Shading for Real-Time Upper-Body Avatar Creation Fan Yang et.al. 2512.07720 null
2025-12-08 Communication-Efficient Serving for Video Diffusion Models with Latent Parallelism Zhiyuan Wu et.al. 2512.07350 null
2025-12-08 ContextAnyone: Context-Aware Diffusion for Character-Consistent Text-to-Video Generation Ziyang Mai et.al. 2512.07328 null
2025-12-08 Unified Camera Positional Encoding for Controlled Video Generation Cheng Zhang et.al. 2512.07237 null
2025-12-07 VideoVLA: Video Generators Can Be Generalizable Robot Manipulators Yichao Shen et.al. 2512.06963 null
2025-12-07 Scaling Zero-Shot Reference-to-Video Generation Zijian Zhou et.al. 2512.06905 null
2025-12-07 RunawayEvil: Jailbreaking the Image-to-Video Generative Models Songping Wang et.al. 2512.06674 null
2025-12-07 MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment Ruicheng Zhang et.al. 2512.06628 null
2025-12-06 Are AI-Generated Driving Videos Ready for Autonomous Driving? A Diagnostic Evaluation Framework Xinhao Xiang et.al. 2512.06376 null
2025-12-05 Tracking-Guided 4D Generation: Foundation-Tracker Motion Priors for 3D Model Animation Su Sun et.al. 2512.06158 null
2025-12-05 AQUA-Net: Adaptive Frequency Fusion and Illumination Aware Network for Underwater Image Enhancement Munsif Ali et.al. 2512.05960 null
2025-12-05 World Models That Know When They Don't Know: Controllable Video Generation with Calibrated Uncertainty Zhiting Mei et.al. 2512.05927 null
2025-12-05 Bring Your Dreams to Life: Continual Text-to-Video Customization Jiahua Dong et.al. 2512.05802 null
2025-12-05 USV: Unified Sparsification for Accelerating Video Diffusion Models Xinjian Wu et.al. 2512.05754 null
2025-12-05 ARGUS: Defending Against Multimodal Indirect Prompt Injection via Steering Instruction-Following Behavior Weikai Lu et.al. 2512.05745 null
2025-12-05 InverseCrafter: Efficient Video ReCapture as a Latent Domain Inverse Problem Yeobin Hong et.al. 2512.05672 null
2025-12-05 ProPhy: Progressive Physical Alignment for Dynamic World Simulation Zijun Wang et.al. 2512.05564 null
2025-12-05 VOST-SGG: VLM-Aided One-Stage Spatio-Temporal Scene Graph Generation Chinthani Sugandhika et.al. 2512.05524 null
2025-12-05 User Negotiations of Authenticity, Ownership, and Governance on AI-Generated Video Platforms: Evidence from Sora Bohui Shen et.al. 2512.05519 null
2025-12-05 WaterWave: Bridging Underwater Image Enhancement into Video Streams via Wavelet-based Temporal Consistency Field Qi Zhu et.al. 2512.05492 null
2025-12-05 Delving into Latent Spectral Biasing of Video VAEs for Superior Diffusability Shizhan Liu et.al. 2512.05394 null
2025-12-04 IE2Video: Adapting Pretrained Diffusion Models for Event-Based Video Reconstruction Dmitrii Torbunov et.al. 2512.05240 null
2025-12-04 Invariance Co-training for Robot Visual Generalization Jonathan Yang et.al. 2512.05230 null
2025-12-04 Light-X: Generative 4D Video Rendering with Camera and Illumination Control Tianqi Liu et.al. 2512.05115 null
2025-12-04 NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation Yu Zeng et.al. 2512.05106 null
2025-12-04 TV2TV: A Unified Framework for Interleaved Language and Video Generation Xiaochuang Han et.al. 2512.05103 null
2025-12-04 From Generated Human Videos to Physically Plausible Robot Trajectories James Ni et.al. 2512.05094 null
2025-12-04 Deep Forcing: Training-Free Long Video Generation with Deep Sink and Participative Compression Jung Yi et.al. 2512.05081 null
2025-12-04 Object Reconstruction under Occlusion with Generative Priors and Contact-induced Constraints Minghan Zhu et.al. 2512.05079 null
2025-12-04 BulletTime: Decoupled Control of Time and Camera Pose for Video Generation Yiming Wang et.al. 2512.05076 null
2025-12-04 Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image Yanran Zhang et.al. 2512.05044 null
2025-12-04 Generative Neural Video Compression via Video Diffusion Prior Qi Mao et.al. 2512.05016 null
2025-12-04 Exploring YouTube's Political Communication Networks during the 2024 French Elections Caroline Violot et.al. 2512.04971 null
2025-12-04 Contact-Aware Refinement of Human Pose Pseudo-Ground Truth via Bioimpedance Sensing Maria-Paola Forte et.al. 2512.04862 null
2025-12-04 Multi Task Denoiser Training for Solving Linear Inverse Problems Clément Bled et.al. 2512.04709 null
2025-12-04 Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation Yunhong Lu et.al. 2512.04678 null
2025-12-04 Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length Yubo Huang et.al. 2512.04677 null
2025-12-04 SEASON: Mitigating Temporal Hallucination in Video Large Language Models via Self-Diagnostic Contrastive Decoding Chang-Hsun Wu et.al. 2512.04643 null
2025-12-04 VideoMem: Enhancing Ultra-Long Video Understanding via Adaptive Memory Management Hongbo Jin et.al. 2512.04540 null
2025-12-04 X-Humanoid: Robotize Human Videos to Generate Humanoid Videos at Scale Pei Yang et.al. 2512.04537 null
2025-12-04 PhyVLLM: Physics-Guided Video Language Model with Motion-Appearance Disentanglement Yu-Wei Zhan et.al. 2512.04532 null
2025-12-04 VideoSSM: Autoregressive Long Video Generation with Hybrid State-Space Memory Yifei Yu et.al. 2512.04519 null
2025-12-04 EgoLCD: Egocentric Video Generation with Long Context Diffusion Liuzhou Zhang et.al. 2512.04515 null
2025-12-03 Stable Signer: Hierarchical Sign Language Generative Model Sen Fang et.al. 2512.04048 null
2025-12-03 RELIC: Interactive Video World Model with Long-Horizon Memory Yicong Hong et.al. 2512.04040 null
2025-12-03 PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation Xiaolong Li et.al. 2512.04025 null
2025-12-03 TempR1: Improving Temporal Understanding of MLLMs via Temporal-Aware Multi-Task Reinforcement Learning Tao Wu et.al. 2512.03963 null
2025-12-03 UniMo: Unifying 2D Video and 3D Human Motion with an Autoregressive Framework Youxin Pang et.al. 2512.03918 null
2025-12-03 Zero-Shot Video Translation and Editing with Frame Spatial-Temporal Correspondence Shuai Yang et.al. 2512.03905 null
2025-12-03 ToG-Bench: Task-Oriented Spatio-Temporal Grounding in Egocentric Videos Qi'ao Xu et.al. 2512.03666 null
2025-12-03 The promising potential of vision language models for the generation of textual weather forecasts Edward C. C. Steele et.al. 2512.03623 null
2025-12-03 ReCamDriving: LiDAR-Free Camera-Controlled Novel Trajectory Video Generation Yaokun Li et.al. 2512.03621 null
2025-12-03 LAMP: Language-Assisted Motion Planning for Controllable Video Generation Muhammed Burak Kizil et.al. 2512.03619 null
2025-12-03 Motion4D: Learning 3D-Consistent Motion and Semantics for 4D Scene Understanding Haoran Zhou et.al. 2512.03601 null
2025-12-03 Beyond Boundary Frames: Audio-Visual Semantic Guidance for Context-Aware Video Interpolation Yuchen Deng et.al. 2512.03590 null
2025-12-03 Dynamic Optical Test for Bot Identification (DOT-BI): A simple check to identify bots in surveys and online processes Malte Bleeker et.al. 2512.03580 null
2025-12-03 Dynamic Content Moderation in Livestreams: Combining Supervised Classification with MLLM-Boosted Similarity Matching Wei Chee Yew et.al. 2512.03553 null
2025-12-03 Rethinking Prompt Design for Inference-time Scaling in Text-to-Visual Generation Subin Kim et.al. 2512.03534 null
2025-12-03 FloodDiffusion: Tailored Diffusion Forcing for Streaming Motion Generation Yiyi Cai et.al. 2512.03520 null
2025-12-03 Towards Object-centric Understanding for Instructional Videos Wenliang Guo et.al. 2512.03479 null
2025-12-03 GeoVideo: Introducing Geometric Regularization into Video Generation Model Yunpeng Bai et.al. 2512.03453 null
2025-12-03 GalaxyDiT: Efficient Video Generation with Guidance Alignment and Adaptive Proxy in Diffusion Transformers Zhiye Song et.al. 2512.03451 null
2025-12-03 FireSentry: A Multi-Modal Spatio-temporal Benchmark Dataset for Fine-Grained Wildfire Spread Forecasting Nan Zhou et.al. 2512.03369 null
2025-12-02 Video2Act: A Dual-System Video Diffusion Policy with Robotic Spatio-Motional Modeling Yueru Jia et.al. 2512.03044 null
2025-12-02 OneThinker: All-in-one Reasoning Model for Image and Video Kaituo Feng et.al. 2512.03043 null
2025-12-02 MultiShotMaster: A Controllable Multi-Shot Video Generation Framework Qinghe Wang et.al. 2512.03041 null
2025-12-02 Video4Spatial: Towards Visuospatial Intelligence with Context-Guided Video Generation Zeqi Xiao et.al. 2512.03040 null
2025-12-02 ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation Mengchen Zhang et.al. 2512.03036 null
2025-12-02 MAViD: A Multimodal Framework for Audio-Visual Dialogue Understanding and Generation Youxin Pang et.al. 2512.03034 null
2025-12-02 SMP: Reusable Score-Matching Motion Priors for Physics-Based Character Control Yuxuan Mu et.al. 2512.03028 null
2025-12-02 Instant Video Models: Universal Adapters for Stabilizing Image-Based Networks Matthew Dutson et.al. 2512.03014 null
2025-12-02 In-Context Sync-LoRA for Portrait Video Editing Sagi Polaczek et.al. 2512.03013 null
2025-12-02 Benchmarking Scientific Understanding and Reasoning for Video Generation using VideoScience-Bench Lanxiang Hu et.al. 2512.02942 null
2025-12-02 LoVoRA: Text-guided and Mask-free Video Object Removal and Addition with Learnable Object-aware Localization Zhihan Xiao et.al. 2512.02933 null
2025-12-02 Taming Camera-Controlled Video Generation with Verifiable Geometry Reward Zhaoqing Wang et.al. 2512.02870 null
2025-12-02 Action Anticipation at a Glimpse: To What Extent Can Multimodal Cues Replace Video? Manuel Benavent-Lledo et.al. 2512.02846 null
2025-12-02 ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning Yifan Li et.al. 2512.02835 null
2025-12-02 From Navigation to Refinement: Revealing the Two-Stage Nature of Flow-based Diffusion Models through Oracle Velocity Haoming Liu et.al. 2512.02826 null
2025-12-02 FiMMIA: scaling semantic perturbation-based membership inference across modalities Anton Emelyanov et.al. 2512.02786 null
2025-12-02 Rethinking Surgical Smoke: A Smoke-Type-Aware Laparoscopic Video Desmoking Method and Dataset Qifan Liang et.al. 2512.02780 null
2025-12-02 Reasoning-Aware Multimodal Fusion for Hateful Video Detection Shuonan Yang et.al. 2512.02743 null
2025-12-02 Hear What Matters! Text-conditioned Selective Video-to-Audio Generation Junwon Lee et.al. 2512.02650 null
2025-12-02 RULER-Bench: Probing Rule-based Reasoning Abilities of Next-level Video Generation Models for Vision Foundation Intelligence Xuming He et.al. 2512.02622 null
2025-12-01 Objects in Generated Videos Are Slower Than They Appear: Models Suffer Sub-Earth Gravity and Don't Know Galileo's Principle...for now Varun Varma Thozhiyoor et.al. 2512.02016 null
2025-12-01 Generative Video Motion Editing with 3D Point Tracks Yao-Chih Lee et.al. 2512.02015 null
2025-12-01 TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models Zhiheng Liu et.al. 2512.02014 null
2025-12-01 Learning Dexterous Manipulation Skills from Imperfect Simulations Elvis Hsieh et.al. 2512.02011 null
2025-12-01 Learning Visual Affordance from Audio Lidong Lu et.al. 2512.02005 null
2025-12-01 PAI-Bench: A Comprehensive Benchmark For Physical AI Fengzhe Zhou et.al. 2512.01989 null
2025-12-01 SpriteHand: Real-Time Versatile Hand-Object Interaction with Autoregressive Video Generation Zisu Li et.al. 2512.01960 null
2025-12-01 GrndCtrl: Grounding World Models via Self-Supervised Reward Alignment Haoyang He et.al. 2512.01952 null
2025-12-01 Script: Graph-Structured and Query-Conditioned Semantic Token Pruning for Multimodal Large Language Models Zhongyu Yang et.al. 2512.01949 null
2025-12-01 COACH: Collaborative Agents for Contextual Highlighting - A Multi-Agent Framework for Sports Video Analysis Tsz-To Wong et.al. 2512.01853 null
2025-12-01 JPEGs Just Got Snipped: Croppable Signatures Against Deepfake Images Pericle Perazzo et.al. 2512.01845 null
2025-12-01 PhyDetEx: Detecting and Explaining the Physical Plausibility of T2V Models Zeqing Wang et.al. 2512.01843 null
2025-12-01 Seeing through Imagination: Learning Scene Geometry via Implicit Spatial World Modeling Meng Cao et.al. 2512.01821 null
2025-12-01 Generative Action Tell-Tales: Assessing Human Motion in Synthesized Videos Xavier Thomas et.al. 2512.01803 null
2025-12-01 Evaluating SAM2 for Video Semantic Segmentation Syed Hesham Syed Ariff et.al. 2512.01774 null
2025-12-01 VideoScoop: A Non-Traditional Domain-Independent Framework For Video Analysis Hafsa Billah et.al. 2512.01769 null
2025-12-01 StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos Daeun Lee et.al. 2512.01707 null
2025-12-01 DreamingComics: A Story Visualization Pipeline via Subject and Layout Customized Generation using Video Models Patrick Kwon et.al. 2512.01686 null
2025-12-01 Open-world Hand-Object Interaction Video Generation Based on Structure and Contact-aware Representation Haodong Yan et.al. 2512.01677 null
2025-12-01 Exploring Scavenging Strategies and Cognitive Problem-Solving in Indian Free-Ranging Dogs Tuhin Subhra Pal et.al. 2512.01637 null
2025-11-30 CycliST: A Video Language Model Benchmark for Reasoning on Cyclical State Transitions Simon Kohaut et.al. 2512.01095 null
2025-11-30 Med-CRAFT: Automated Construction of Interpretable and Multi-Hop Video Workloads via Knowledge Graph Traversal Shenxi Liu et.al. 2512.01045 null
2025-11-30 VLASH: Real-Time VLAs via Future-State-Aware Asynchronous Inference Jiaming Tang et.al. 2512.01031 null
2025-11-30 Goal-Driven Reward by Video Diffusion Models for Reinforcement Learning Qi Wang et.al. 2512.00961 null
2025-11-30 Efficient and Scalable Monocular Human-Object Interaction Motion Reconstruction Boran Wen et.al. 2512.00960 null
2025-11-30 TalkingPose: Efficient Face and Gesture Animation with Feedback-guided Diffusion Model Alireza Javanmardi et.al. 2512.00909 null
2025-11-30 PanFlow: Decoupled Motion Control for Panoramic Video Generation Cheng Zhang et.al. 2512.00832 null
2025-11-30 Seeing the Wind from a Falling Leaf Zhiyuan Gao et.al. 2512.00762 null
2025-11-29 Image Generation as a Visual Planner for Robotic Manipulation Ye Pang et.al. 2512.00532 null
2025-11-29 Structured Context Learning for Generic Event Boundary Detection Xin Gu et.al. 2512.00475 null
2025-11-29 What about gravity in video generation? Post-Training Newton's Laws with Verifiable Rewards Minh-Quan Le et.al. 2512.00425 null
2025-11-29 SplatFont3D: Structure-Aware Text-to-3D Artistic Font Generation with Part-Level Style Control Ji Gan et.al. 2512.00413 null
2025-11-29 Low-Bitrate Video Compression through Semantic-Conditioned Diffusion Lingdong Wang et.al. 2512.00408 null
2025-11-29 MVAD: A Comprehensive Multimodal Video-Audio Dataset for AIGC Detection Mengxue Hu et.al. 2512.00336 null
2025-11-29 Comparative Evaluation of Generative AI Models for Chest Radiograph Report Generation in the Emergency Department Woo Hyeon Lim et.al. 2512.00271 null
2025-11-29 "Why the face?": Exploring Robot Error Detection Using Instrumented Bystander Reactions Maria Teresa Parreira et.al. 2512.00262 null
2025-11-29 Relightable Holoported Characters: Capturing and Relighting Dynamic Human Performance from Sparse Views Kunwar Maheep Singh et.al. 2512.00255 null
2025-11-28 Chunking Strategies for Multimodal AI Systems Shashanka B R et.al. 2512.00185 null
2025-11-28 Video-R2: Reinforcing Consistent and Grounded Reasoning in Multimodal Language Models Muhammad Maaz et.al. 2511.23478 null
2025-11-28 AnyTalker: Scaling Multi-Person Talking Video Generation with Interactivity Refinement Zhizhou Zhong et.al. 2511.23475 null
2025-11-28 Hunyuan-GameCraft-2: Instruction-following Interactive Game World Model Junshu Tang et.al. 2511.23429 null
2025-11-28 DisMo: Disentangled Motion Representations for Open-World Motion Transfer Thomas Ressler-Antal et.al. 2511.23428 null
2025-11-28 Toward Automatic Safe Driving Instruction: A Large-Scale Vision Language Model Approach Haruki Sakajo et.al. 2511.23311 null
2025-11-28 Vision Bridge Transformer at Scale Zhenxiong Tan et.al. 2511.23199 null
2025-11-28 GeoWorld: Unlocking the Potential of Geometry Models to Facilitate High-Fidelity 3D Scene Generation Yuhao Wan et.al. 2511.23191 null
2025-11-28 Fast Multi-view Consistent 3D Editing with Video Priors Liyi Chen et.al. 2511.23172 null
2025-11-28 InstanceV: Instance-Level Video Generation Yuheng Chen et.al. 2511.23146 null
2025-11-28 DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation Hongfei Zhang et.al. 2511.23127 null
2025-11-28 LatBot: Distilling Universal Latent Actions for Vision-Language-Action Models Zuolei Li et.al. 2511.23034 null
2025-11-28 McSc: Motion-Corrective Preference Alignment for Video Generation with Self-Critic Hierarchical Reasoning Qiushi Yang et.al. 2511.22974 null
2025-11-28 BlockVid: Block Diffusion for High-Quality and Consistent Minute-Long Video Generation Zeyu Zhang et.al. 2511.22973 null
2025-11-28 RobotSeg: A Model and Dataset for Segmenting Robots in Image and Video Haiyang Mei et.al. 2511.22950 null
2025-11-28 One-to-All Animation: Alignment-Free Character Animation and Image Pose Transfer Shijun Shi et.al. 2511.22940 null
2025-11-28 TARFVAE: Efficient One-Step Generative Time Series Forecasting via TARFLOW based VAE Jiawen Wei et.al. 2511.22853 null
2025-11-28 Captain Safari: A World Engine Yu-Cheng Chou et.al. 2511.22815 null
2025-11-27 ReAG: Reasoning-Augmented Generation for Knowledge-based Visual Question Answering Alberto Compagnoni et.al. 2511.22715 null
2025-11-27 Fast3Dcache: Training-free 3D Geometry Synthesis Acceleration Mengyu Yang et.al. 2511.22533 null
2025-11-27 AI killed the video star. Audio-driven diffusion model for expressive talking head generation Baptiste Chopin et.al. 2511.22488 null
2025-11-26 TraceGen: World Modeling in 3D Trace Space Enables Learning from Cross-Embodiment Videos Seungjae Lee et.al. 2511.21690 null
2025-11-26 MoGAN: Improving Motion Quality in Video Diffusion via Few-Step Motion Adversarial Post-Training Haotian Xue et.al. 2511.21592 null
2025-11-26 Harmony: Harmonizing Audio and Video Generation through Cross-Task Synergy Teng Hu et.al. 2511.21579 null
2025-11-26 Video Generation Models Are Good Latent Reward Models Xiaoyue Mi et.al. 2511.21541 null
2025-11-26 MobileI2V: Fast and High-Resolution Image-to-Video on Mobile Devices Shuai Zhang et.al. 2511.21475 null
2025-11-26 Thinking With Bounding Boxes: Enhancing Spatio-Temporal Video Grounding via Reinforcement Fine-Tuning Xin Gu et.al. 2511.21375 null
2025-11-26 AVFakeBench: A Comprehensive Audio-Video Forgery Detection Benchmark for AV-LMMs Shuhan Xia et.al. 2511.21251 null
2025-11-26 AV-Edit: Multimodal Generative Sound Effect Editing via Audio-Visual Semantic Joint Control Xinyue Guo et.al. 2511.21146 null
2025-11-26 TEAR: Temporal-aware Automated Red-teaming for Text-to-Video Models Jiaming He et.al. 2511.21145 null
2025-11-26 Referring Video Object Segmentation with Cross-Modality Proxy Queries Baoli Sun et.al. 2511.21139 null
2025-11-26 Efficient Training for Human Video Generation with Entropy-Guided Prioritized Progressive Learning Changlin Li et.al. 2511.21136 null
2025-11-26 SocialNav: Training Human-Inspired Foundation Model for Socially-Aware Embodied Navigation Ziyi Chen et.al. 2511.21135 null
2025-11-26 CtrlVDiff: Controllable Video Generation via Unified Multimodal Video Diffusion Dianbing Xi et.al. 2511.21129 null
2025-11-26 CartoonSing: Unifying Human and Nonhuman Timbres in Singing Generation Jionghao Han et.al. 2511.21045 null
2025-11-26 TrafficLens: Multi-Camera Traffic Video Analysis Using LLMs Md Adnan Arefeen et.al. 2511.20965 null
2025-11-25 V$^{2}$-SAM: Marrying SAM2 with Multi-Prompt Experts for Cross-View Object Correspondence Jiancheng Pan et.al. 2511.20886 null
2025-11-25 Unsupervised Memorability Modeling from Tip-of-the-Tongue Retrieval Queries Sree Bhattacharyya et.al. 2511.20854 null
2025-11-25 MODEST: Multi-Optics Depth-of-Field Stereo Dataset Nisarg K. Trivedi et.al. 2511.20853 null
2025-11-25 Layer-Aware Video Composition via Split-then-Merge Ozgur Kara et.al. 2511.20809 null
2025-11-25 Infinity-RoPE: Action-Controllable Infinite Video Generation Emerges From Autoregressive Self-Rollout Hidir Yesiltepe et.al. 2511.20649 null
2025-11-25 Diverse Video Generation with Determinantal Point Process-Guided Policy Optimization Tahira Kazimi et.al. 2511.20647 null
2025-11-25 MotionV2V: Editing Motion in a Video Ryan Burgert et.al. 2511.20640 null
2025-11-25 iMontage: Unified, Versatile, Highly Dynamic Many-to-many Image Generation Zhoujie Fu et.al. 2511.20635 null
2025-11-25 MapReduce LoRA: Advancing the Pareto Front in Multi-Preference Optimization for Generative Models Chieh-Yun Chen et.al. 2511.20629 null
2025-11-25 ShapeGen: Towards High-Quality 3D Shape Synthesis Yangguang Li et.al. 2511.20624 null
2025-11-25 Wanderland: Geometrically Grounded Simulation for Open-World Embodied AI Xinhao Liu et.al. 2511.20620 null
2025-11-25 E2E-GRec: An End-to-End Joint Training Framework for Graph Neural Networks and Recommender Systems Rui Xue et.al. 2511.20564 null
2025-11-25 A Reason-then-Describe Instruction Interpreter for Controllable Video Generation Shengqiong Wu et.al. 2511.20563 null
2025-11-25 PhysChoreo: Physics-Controllable Video Generation with Part-Aware Semantic Grounding Haoze Zhang et.al. 2511.20562 null
2025-11-25 STARFlow-V: End-to-End Video Generative Modeling with Normalizing Flow Jiatao Gu et.al. 2511.20462 null
2025-11-25 Block Cascading: Training Free Acceleration of Block-Causal Video Models Hmrishav Bandyopadhyay et.al. 2511.20426 null
2025-11-25 TReFT: Taming Rectified Flow Models For One-Step Image Translation Shengqian Li et.al. 2511.20307 null
2025-11-25 Back to the Feature: Explaining Video Classifiers with Video Counterfactual Explanations Chao Wang et.al. 2511.20295 null
2025-11-25 Bootstrapping Physics-Grounded Video Generation through VLM-Guided Iterative Self-Refinement Yang Liu et.al. 2511.20280 null
2025-11-25 Uplifting Table Tennis: A Robust, Real-World Application for 3D Trajectory and Spin Estimation Daniel Kienzle et.al. 2511.20250 null
2025-11-25 SFA: Scan, Focus, and Amplify toward Guidance-aware Answering for Video TextVQA Haibin He et.al. 2511.20190 null
2025-11-25 Exo2EgoSyn: Unlocking Foundation Video Generation Models for Exocentric-to-Egocentric Video Synthesis Mohammad Mahdi et.al. 2511.20186 null
2025-11-25 UltraViCo: Breaking Extrapolation Limits in Video Diffusion Transformers Min Zhao et.al. 2511.20123 null
2025-11-25 Image Diffusion Models Exhibit Emergent Temporal Propagation in Videos Youngseo Kim et.al. 2511.19936 null
2025-11-24 VDC-Agent: When Video Detailed Captioners Evolve Themselves via Agentic Self-Reflection Qiang Wang et.al. 2511.19436 null
2025-11-24 Are Image-to-Video Models Good Zero-Shot Image Editors? Zechuan Zhang et.al. 2511.19435 null
2025-11-24 In-Video Instructions: Visual Signals as Generative Control Gongfan Fang et.al. 2511.19401 null
2025-11-24 Growing with the Generator: Self-paced GRPO for Video Generation Rui Li et.al. 2511.19356 null
2025-11-24 SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation Jiaming Zhang et.al. 2511.19320 null
2025-11-24 SyncMV4D: Synchronized Multi-view Joint Diffusion of Appearance and Motion for Hand-Object Interaction Synthesis Lingwei Dang et.al. 2511.19319 null
2025-11-24 LAST: LeArning to Think in Space and Time for Generalist Vision-Language Models Shuai Wang et.al. 2511.19261 null
2025-11-24 IDSplat: Instance-Decomposed 3D Gaussian Splatting for Driving Scenes Carl Lindström et.al. 2511.19235 null
2025-11-24 Learning Plug-and-play Memory for Guiding Video Diffusion Models Selena Song et.al. 2511.19229 null
2025-11-24 AvatarBrush: Monocular Reconstruction of Gaussian Avatars with Intuitive Local Editing Mengtian Li et.al. 2511.19189 null
2025-11-24 RAVEN++: Pinpointing Fine-Grained Violations in Advertisement Videos with Active Reinforcement Reasoning Deyi Ji et.al. 2511.19168 null
2025-11-24 HABIT: Human Action Benchmark for Interactive Traffic in CARLA Mohan Ramesh et.al. 2511.19109 null
2025-11-24 Beyond Reward Margin: Rethinking and Resolving Likelihood Displacement in Diffusion Models via Video Generation Ruojun Xu et.al. 2511.19049 null
2025-11-24 View-Consistent Diffusion Representations for 3D-Consistent Video Generation Duolikun Danier et.al. 2511.18991 null
2025-11-24 Eevee: Towards Close-up High-resolution Video-based Virtual Try-on Jianhao Zeng et.al. 2511.18957 null
2025-11-24 One4D: Unified 4D Generation and Reconstruction via Decoupled LoRA Control Zhenxing Mi et.al. 2511.18922 null
2025-11-24 EventSTU: Event-Guided Efficient Spatio-Temporal Understanding for Video Large Language Models Wenhao Xu et.al. 2511.18920 null
2025-11-24 Learning What to Trust: Bayesian Prior-Guided Optimization for Visual Generation Ruiying Liu et.al. 2511.18919 null
2025-11-24 MagicWorld: Interactive Geometry-driven Video World Exploration Guangyuan Li et.al. 2511.18886 null
2025-11-24 HunyuanVideo 1.5 Technical Report Bing Wu et.al. 2511.18870 null
2025-11-23 ViMix-14M: A Curated Multi-Source Video-Text Dataset with Long-Form, High-Quality Captions and Crawl-Free Access Timing Yang et.al. 2511.18382 null
2025-11-23 MASS: Motion-Aware Spatial-Temporal Grounding for Physics Reasoning and Comprehension in Vision-Language Models Xiyang Wu et.al. 2511.18373 null
2025-11-23 Alias-free 4D Gaussian Splatting Zilong Chen et.al. 2511.18367 null
2025-11-23 TRANSPORTER: Transferring Visual Semantics from VLM Manifolds Alexandros Stergiou et.al. 2511.18359 null
2025-11-23 MagicWand: A Universal Agent for Generation and Evaluation Aligned with User Preference Zitong Xu et.al. 2511.18352 null
2025-11-23 FlowPortal: Residual-Corrected Flow for Training-Free Video Relighting and Background Replacement Wenshuo Gao et.al. 2511.18346 null
2025-11-23 AnyExperts: On-Demand Expert Allocation for Multimodal Language Models with Mixture of Expert Yuting Gao et.al. 2511.18314 null
2025-11-23 Point-to-Point: Sparse Motion Guidance for Controllable Video Editing Yeji Song et.al. 2511.18277 null
2025-11-23 SatSAM2: Motion-Constrained Video Object Tracking in Satellite Imagery using Promptable SAM2 and Kalman Priors Ruijie Fan et.al. 2511.18264 null
2025-11-23 EgoVITA: Learning to Plan and Verify for Egocentric Video Reasoning Yogesh Kulkarni et.al. 2511.18242 null
2025-11-22 MotionDuet: Dual-Conditioned 3D Human Motion Generation with Video-Regularized Text Learning Yi-Yang Zhang et.al. 2511.18209 null
2025-11-22 InfiniBench: Infinite Benchmarking for Visual Spatial Reasoning with Customizable Scene Complexity Haoming Wang et.al. 2511.18200 null
2025-11-22 EgoControl: Controllable Egocentric Video Generation via 3D Full-Body Poses Enrico Pallotta et.al. 2511.18173 null
2025-11-22 Video4Edit: Viewing Image Editing as a Degenerate Temporal Process Xiaofan Li et.al. 2511.18131 null
2025-11-22 Consolidating Diffusion-Generated Video Detection with Unified Multimodal Forgery Learning Xiaohong Liu et.al. 2511.18104 null
2025-11-22 Spotlight: Identifying and Localizing Video Generation Errors Using VLMs Aditya Chinchure et.al. 2511.18102 null
2025-11-22 Hybrid Event Frame Sensors: Modeling, Calibration, and Simulation Yunfan Lu et.al. 2511.18037 null
2025-11-22 Diverse Instance Generation via Diffusion Models for Enhanced Few-Shot Object Detection in Remote Sensing Images Yanxing Liu et.al. 2511.18031 null
2025-11-22 Plan-X: Instruct Video Generation via Semantic Planning Lun Huang et.al. 2511.17986 null
2025-11-22 VITAL: Vision-Encoder-centered Pre-training for LMMs in Visual Quality Assessment Ziheng Jia et.al. 2511.17962 null
2025-11-21 EvDiff: High Quality Video with an Event Camera Weilun Li et.al. 2511.17492 null
2025-11-21 Video-R4: Reinforcing Text-Rich Video Reasoning with Visual Rumination Yolo Yunlong Tang et.al. 2511.17490 null
2025-11-21 Counterfactual World Models via Digital Twin-conditioned Video Diffusion Yiqing Shen et.al. 2511.17481 null
2025-11-21 Planning with Sketch-Guided Verification for Physics-Aware Video Generation Yidong Huang et.al. 2511.17450 null
2025-11-21 Learning Latent Transmission and Glare Maps for Lens Veiling Glare Removal Xiaolong Qian et.al. 2511.17353 null
2025-11-21 Loomis Painter: Reconstructing the Painting Process Markus Pobitzer et.al. 2511.17344 null
2025-11-21 Robot Confirmation Generation and Action Planning Using Long-context Q-Former Integrated with Multimodal LLM Chiori Hori et.al. 2511.17335 null
2025-11-21 FORWARD: Dataset of a forwarder operating in rough terrain Mikael Lundbäck et.al. 2511.17318 null
2025-11-21 PostCam: Camera-Controllable Novel-View Video Generation with Query-Shared Cross-Attention Yipeng Chen et.al. 2511.17185 null
2025-11-21 Investigating self-supervised representations for audio-visual deepfake detection Dragos-Alexandru Boldisor et.al. 2511.17181 null
2025-11-21 Sparse Reasoning is Enough: Biological-Inspired Framework for Video Anomaly Detection with Large Pre-trained Models He Huang et.al. 2511.17094 null
2025-11-21 H-GAR: A Hierarchical Interaction Framework via Goal-Driven Observation-Action Refinement for Robotic Manipulation Yijie Zhu et.al. 2511.17079 null
2025-11-21 MatPedia: A Universal Generative Foundation for High-Fidelity Material Synthesis Di Luo et.al. 2511.16957 null
2025-11-21 Neighbor GRPO: Contrastive ODE Policy Optimization Aligns Flow Models Dailan He et.al. 2511.16955 null
2025-11-21 Point-Supervised Facial Expression Spotting with Gaussian-Based Instance-Adaptive Intensity Modeling Yicheng Deng et.al. 2511.16952 null
2025-11-21 FingerCap: Fine-grained Finger-level Hand Motion Captioning Xin Shen et.al. 2511.16951 null
2025-11-21 R-AVST: Empowering Video-LLMs with Fine-Grained Spatio-Temporal Reasoning in Complex Audio-Visual Scenarios Lu Zhu et.al. 2511.16901 null
2025-11-21 Avoiding Quality Saturation in UGC Compression Using Denoised References Xin Xiong et.al. 2511.16876 null
2025-11-20 Vorion: A RISC-V GPU with Hardware-Accelerated 3D Gaussian Rendering and Training Yipeng Wang et.al. 2511.16831 null
2025-11-20 Generative Augmented Reality: Paradigms, Technologies, and Future Applications Chen Liang et.al. 2511.16783 null
2025-11-20 Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO Junhao Cheng et.al. 2511.16669 null
2025-11-20 V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models Yang Luo et.al. 2511.16668 null
2025-11-20 SAM2S: Segment Anything in Surgical Videos via Semantic Long-term Tracking Haofeng Liu et.al. 2511.16618 null
2025-11-20 YOWO: You Only Walk Once to Jointly Map An Indoor Scene and Register Ceiling-mounted Cameras Fan Yang et.al. 2511.16521 null
2025-11-20 An analytical and experimental study of the energy transition discourse on YouTube Aleix Bassolas et.al. 2511.16497 null
2025-11-20 Flow and Depth Assisted Video Prediction with Latent Transformer Eliyas Suleyman et.al. 2511.16484 null
2025-11-20 PIPHEN: Physical Interaction Prediction with Hamiltonian Energy Networks Kewei Chen et.al. 2511.16200 null
2025-11-20 FOOTPASS: A Multi-Modal Multi-Agent Tactical Context Dataset for Play-by-Play Action Spotting in Soccer Broadcast Videos Jeremie Ochin et.al. 2511.16183 null
2025-11-20 Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight Yi Yang et.al. 2511.16175 null
2025-11-20 Video2Layout: Recall and Reconstruct Metric-Grounded Cognitive Map for Spatial Reasoning Yibin Huang et.al. 2511.16160 null
2025-11-20 MagBotSim: Physics-Based Simulation and Reinforcement Learning Environments for Magnetic Robotics Lara Bergmann et.al. 2511.16158 null
2025-11-20 Degradation-Aware Hierarchical Termination for Blind Quality Enhancement of Compressed Video Li Yu et.al. 2511.16137 null
2025-11-20 VTinker: Guided Flow Upsampling and Texture Mapping for High-Resolution Video Frame Interpolation Chenyang Wu et.al. 2511.16124 null
2025-11-20 Decoupling Complexity from Scale in Latent Diffusion Model Tianxiong Zhong et.al. 2511.16117 null
2025-11-20 VideoSeg-R1:Reasoning Video Object Segmentation via Reinforcement Learning Zishan Xu et.al. 2511.16077 null
2025-11-20 Panel-by-Panel Souls: A Performative Workflow for Expressive Faces in AI-Assisted Manga Creation Qing Zhang et.al. 2511.16038 null
2025-11-20 Physically Realistic Sequence-Level Adversarial Clothing for Robust Human-Detection Evasion Dingkun Zhou et.al. 2511.16020 null
2025-11-20 Click2Graph: Interactive Panoptic Video Scene Graphs from a Single Click Raphael Ruschel et.al. 2511.15948 null
2025-11-20 Automated Interpretable 2D Video Extraction from 3D Echocardiography Milos Vukadinovic et.al. 2511.15946 null
2025-11-19 RB-FT: Rationale-Bootstrapped Fine-Tuning for Video Classification Meilong Xu et.al. 2511.15923 null
2025-11-19 First Frame Is the Place to Go for Video Content Customization Jingxi Chen et.al. 2511.15700 null
2025-11-19 Joint Semantic-Channel Coding and Modulation for Token Communications Jingkai Ying et.al. 2511.15699 null
2025-11-19 The SA-FARI Dataset: Segment Anything in Footage of Animals for Recognition and Identification Dante Francisco Wasmuht et.al. 2511.15622 null
2025-11-19 Multimodal Evaluation of Russian-language Architectures Artem Chervyakov et.al. 2511.15552 null
2025-11-19 Deep Learning for Accurate Vision-based Catch Composition in Tropical Tuna Purse Seiners Xabier Lekunberri et.al. 2511.15468 null
2025-11-19 ShelfOcc: Native 3D Supervision beyond LiDAR for Vision-Based Occupancy Estimation Simon Boeder et.al. 2511.15396 null
2025-11-19 PresentCoach: Dual-Agent Presentation Coaching through Exemplars and Interactive Feedback Sirui Chen et.al. 2511.15253 null
2025-11-19 Generating Natural-Language Surgical Feedback: From Structured Representation to Domain-Grounded Evaluation Firdavs Nasriddinov et.al. 2511.15159 null
2025-11-19 Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks Cheng Yang et.al. 2511.15065 null
2025-11-19 Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation Vladimir Arkhipkin et.al. 2511.14993 null
2025-11-18 SVBRD-LLM: Self-Verifying Behavioral Rule Discovery for Autonomous Vehicle Identification Xiangyu Li et.al. 2511.14977 null
2025-11-18 RocSync: Millisecond-Accurate Temporal Synchronization for Heterogeneous Camera Systems Jaro Meyer et.al. 2511.14948 null
2025-11-18 CPSL: Representing Volumetric Video via Content-Promoted Scene Layers Kaiyuan Hu et.al. 2511.14927 null
2025-11-18 GeoSceneGraph: Geometric Scene Graph Diffusion Model for Text-guided 3D Indoor Scene Synthesis Antonio Ruiz et.al. 2511.14884 null
2025-11-18 Zero-shot Synthetic Video Realism Enhancement via Structure-aware Denoising Yifan Wang et.al. 2511.14719 null
2025-11-18 FreeSwim: Revisiting Sliding-Window Attention Mechanisms for Training-Free Ultra-High-Resolution Video Generation Yunfeng Wu et.al. 2511.14712 null
2025-11-18 ForensicFlow: A Tri-Modal Adaptive Network for Robust Deepfake Detection Mohammad Romani et.al. 2511.14554 null
2025-11-18 DeCo-VAE: Learning Compact Latents for Video Reconstruction via Decoupled Representation Xiangchen Yin et.al. 2511.14530 null
2025-11-18 FlowRoI A Fast Optical Flow Driven Region of Interest Extraction Framework for High-Throughput Image Compression in Immune Cell Migration Analysis Xiaowei Xu et.al. 2511.14419 null
2025-11-18 ARC-Chapter: Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries Junfu Pu et.al. 2511.14349 null
2025-11-18 Dental3R: Geometry-Aware Pairing for Intraoral 3D Reconstruction from Sparse-View Photographs Yiyi Miao et.al. 2511.14315 null
2025-11-18 Towards Authentic Movie Dubbing with Retrieve-Augmented Director-Actor Interaction Learning Rui Liu et.al. 2511.14249 null
2025-11-18 Towards Deploying VLA without Fine-Tuning: Plug-and-Play Inference-Time VLA Policy Steering via Embodied Evolutionary Diffusion Zhuo Li et.al. 2511.14178 null
2025-11-18 Multi-view Phase-aware Pedestrian-Vehicle Incident Reasoning Framework with Vision-Language Models Hao Zhen et.al. 2511.14120 null
2025-11-18 Real-Time Mobile Video Analytics for Pre-arrival Emergency Medical Services Liuyi Jin et.al. 2511.14119 null
2025-11-18 A Patient-Independent Neonatal Seizure Prediction Model Using Reduced Montage EEG and ECG Sithmini Ranasingha et.al. 2511.14110 null
2025-11-18 Text-Driven Reasoning Video Editing via Reinforcement Learning on Digital Twin Representations Yiqing Shen et.al. 2511.14100 null
2025-11-18 Privis: Towards Content-Aware Secure Volumetric Video Delivery Kaiyuan Hu et.al. 2511.14005 null
2025-11-17 Learning Skill-Attributes for Transferable Assessment in Video Kumar Ashutosh et.al. 2511.13993 null
2025-11-17 PoCGM: Poisson-Conditioned Generative Model for Sparse-View CT Reconstruction Changsheng Fang et.al. 2511.13967 null
2025-11-17 SAE-MCVT: A Real-Time and Scalable Multi-Camera Vehicle Tracking Framework Powered by Edge Computing Yuqiang Lin et.al. 2511.13904 null
2025-11-17 Temporal Realism Evaluation of Generated Videos Using Compressed-Domain Motion Vectors Mert Onur Cakiroglu et.al. 2511.13897 null
2025-11-17 Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark Xinxin Liu et.al. 2511.13853 null
2025-11-17 Segment Anything Across Shots: A Method and Benchmark Hengrui Hu et.al. 2511.13715 null
2025-11-17 UnSAMv2: Self-Supervised Learning Enables Segment Anything at Any Granularity Junwei Yu et.al. 2511.13714 null
2025-11-17 TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models Harold Haodong Chen et.al. 2511.13704 null
2025-11-17 Training-Free Multi-View Extension of IC-Light for Textual Position-Aware Scene Relighting Jiangnan Ye et.al. 2511.13684 null
2025-11-17 CacheFlow: Compressive Streaming Memory for Efficient Long-Form Video Understanding Shrenik Patel et.al. 2511.13644 null
2025-11-17 Computer Vision based group activity detection and action spotting Narthana Sivalingam et.al. 2511.13315 null
2025-11-17 CorrectAD: A Self-Correcting Agentic System to Improve End-to-end Planning in Autonomous Driving Enhui Ma et.al. 2511.13297 null
2025-11-17 FoleyBench: A Benchmark For Video-to-Audio Models Satvik Dixit et.al. 2511.13219 null
2025-11-17 Skeletons Speak Louder than Text: A Motion-Aware Pretraining Paradigm for Video-Based Person Re-Identification Rifen Lin et.al. 2511.13150 null
2025-11-17 VEIL: Jailbreaking Text-to-Video Models via Visual Exploitation from Implicit Language Zonghao Ying et.al. 2511.13127 null
2025-11-17 CloseUpShot: Close-up Novel View Synthesis from Sparse-views via Point-conditioned Diffusion Model Yuqi Zhang et.al. 2511.13121 null
2025-11-17 Semantics and Content Matter: Towards Multi-Prior Hierarchical Mamba for Image Deraining Zhaocheng Yu et.al. 2511.13113 null
2025-11-17 Recurrent Autoregressive Diffusion: Global Memory Meets Local Attention Taiye Chen et.al. 2511.12940 null
2025-11-17 Yanyun-3: Enabling Cross-Platform Strategy Game Operation with Vision-Language Models Guoyan Wang et.al. 2511.12937 null
2025-11-17 PFAvatar: Pose-Fusion 3D Personalized Avatar Reconstruction from Real-World Outfit-of-the-Day Photos Dianbing Xi et.al. 2511.12935 null
2025-11-17 Generative Photographic Control for Scene-Consistent Video Cinematic Editing Huiqiang Sun et.al. 2511.12921 null
2025-11-17 Uni-Hand: Universal Hand Motion Forecasting in Egocentric Views Junyi Ma et.al. 2511.12878 null
2025-11-17 Video Finetuning Improves Reasoning Between Frames Ruiqi Yang et.al. 2511.12868 null
2025-11-16 SAGA: Source Attribution of Generative AI Videos Rohit Kundu et.al. 2511.12834 null
2025-11-16 Toward Real-world Text Image Forgery Localization: Structured and Interpretable Data Synthesis Zeqin Yu et.al. 2511.12658 null
2025-11-16 Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data Yunxin Li et.al. 2511.12609 null
2025-11-16 TempoMaster: Efficient Long Video Generation via Next-Frame-Rate Prediction Yukuo Ma et.al. 2511.12578 null
2025-11-16 ReaSon: Reinforced Causal Search with Information Bottleneck for Video Understanding Yuan Zhou et.al. 2511.12530 null
2025-11-16 DualGR: Generative Retrieval with Long and Short-Term Interests Modeling Zhongchao Yi et.al. 2511.12518 null
2025-11-16 DINO-Detect: A Simple yet Effective Framework for Blur-Robust AI-Generated Image Detection Jialiang Shen et.al. 2511.12511 null
2025-11-16 VLA-R: Vision-Language Action Retrieval toward Open-World End-to-End Autonomous Driving Hyunki Seong et.al. 2511.12405 null
2025-11-16 SynthGuard: An Open Platform for Detecting AI-Generated Multimedia with Multimodal LLMs Shail Desai et.al. 2511.12404 null
2025-11-15 Fast Reasoning Segmentation for Images and Videos Yiqing Shen et.al. 2511.12368 null
2025-11-15 Constructing and Interpreting Digital Twin Representations for Visual Reasoning via Reinforcement Learning Yiqing Shen et.al. 2511.12365 null
2025-11-15 AURA: Development and Validation of an Augmented Unplanned Removal Alert System using Synthetic ICU Videos Junhyuk Seo et.al. 2511.12241 null
2025-11-15 Cross-View Cross-Modal Unsupervised Domain Adaptation for Driver Monitoring System Aditi Bhalla et.al. 2511.12196 null
2025-11-15 Towards Obstacle-Avoiding Control of Planar Snake Robots Exploring Neuro-Evolution of Augmenting Topologies Advik Sinha et.al. 2511.12148 null
2025-11-15 Adaptive Begin-of-Video Tokens for Autoregressive Video Diffusion Models Tianle Cheng et.al. 2511.12099 null
2025-11-15 Learning to Hear by Seeing: It's Time for Vision Language Models to Understand Artistic Emotion from Sight and Sound Dengming Zhang et.al. 2511.12077 null
2025-11-15 ProAV-DiT: A Projected Latent Diffusion Transformer for Efficient Synchronized Audio-Video Generation Jiahui Sun et.al. 2511.12072 null
2025-11-15 PipeDiT: Accelerating Diffusion Transformers in Video Generation with Task Pipelining and Model Decoupling Sijie Wang et.al. 2511.12056 null
2025-11-15 TIMERIPPLE: Accelerating vDiTs by Understanding the Spatio-Temporal Correlations in Latent Space Wenxuan Miao et.al. 2511.12035 null
2025-11-14 Seeing the Forest and the Trees: Query-Aware Tokenizer for Long-Video Multimodal Language Models Siyou Li et.al. 2511.11910 null
2025-11-14 KVSwap: Disk-aware KV Cache Offloading for Long-Context On-device Inference Huawei Zhang et.al. 2511.11907 null
2025-11-14 Scalable Policy Evaluation with Video World Models Wei-Cheng Tseng et.al. 2511.11520 null
2025-11-14 Disentangling Emotional Bases and Transient Fluctuations: A Low-Rank Sparse Decomposition Approach for Video Affective Analysis Feng-Qi Cui et.al. 2511.11406 null
2025-11-14 YCB-Ev SD: Synthetic event-vision dataset for 6DoF object pose estimation Pavel Rojtberg et.al. 2511.11344 null
2025-11-14 RealisticDreamer: Guidance Score Distillation for Few-shot Gaussian Splatting Ruocheng Wu et.al. 2511.11213 null
2025-11-14 VIDEOP2R: Video Understanding from Perception to Reasoning Yifan Jiang et.al. 2511.11113 null
2025-11-14 LiteAttention: A Temporal Sparse Attention for Diffusion Transformers Dor Shmilovich et.al. 2511.11062 null
2025-11-14 EmoVid: A Multimodal Emotion Video Dataset for Emotion-Centric Video Understanding and Generation Zongyang Qiu et.al. 2511.11002 null
2025-11-14 Dexterous Manipulation Transfer via Progressive Kinematic-Dynamic Alignment Wenbin Bai et.al. 2511.10987 null
2025-11-14 Text-guided Weakly Supervised Framework for Dynamic Facial Expression Recognition Gunho Jung et.al. 2511.10958 null
2025-11-14 Language-Guided Graph Representation Learning for Video Summarization Wenrui Li et.al. 2511.10953 null
2025-11-14 Short-Window Sliding Learning for Real-Time Violence Detection via LLM-based Auto-Labeling Seoik Jung et.al. 2511.10866 null
2025-11-13 Expert Consensus-based Video-Based Assessment Tool for Workflow Analysis in Minimally Invasive Colorectal Surgery: Development and Validation of ColoWorkflow Pooja P Jain et.al. 2511.10766 null
2025-11-13 Towards Blind and Low-Vision Accessibility of Lightweight VLMs and Custom LLM-Evals Shruti Singh Baghel et.al. 2511.10615 null
2025-11-13 TubeRMC: Tube-conditioned Reconstruction with Mutual Constraints for Weakly-supervised Spatio-Temporal Video Grounding Jinxuan Li et.al. 2511.10241 null
2025-11-13 Next-Frame Feature Prediction for Multimodal Deepfake Detection and Temporal Localization Ashutosh Anshul et.al. 2511.10212 null
2025-11-13 SUGAR: Learning Skeleton Representation with Visual-Motion Knowledge for Action Recognition Qilang Ye et.al. 2511.10091 null
2025-11-13 When Eyes and Ears Disagree: Can MLLMs Discern Audio-Visual Confusion? Qilang Ye et.al. 2511.10059 null
2025-11-13 Reinforcing Trustworthiness in Multimodal Emotional Support Systems Huy M. Le et.al. 2511.10011 null
2025-11-13 AHA! Animating Human Avatars in Diverse Scenes with Gaussian Splatting Aymen Mir et.al. 2511.09827 null
2025-11-12 Density Estimation and Crowd Counting Balachandra Devarangadi Sunil et.al. 2511.09723 null
2025-11-12 PriVi: Towards A General-Purpose Video Model For Primate Behavior In The Wild Felix B. Mueller et.al. 2511.09675 null
2025-11-12 TempRetinex: Retinex-based Unsupervised Enhancement for Low-light Video Under Diverse Lighting Conditions Yini Li et.al. 2511.09609 null
2025-11-12 Bridging the Data Gap: Spatially Conditioned Diffusion Model for Anomaly Generation in Photovoltaic Electroluminescence Images Shiva Hanifi et.al. 2511.09604 null
2025-11-12 Diffusion-Based Quality Control of Medical Image Segmentations across Organs Vincenzo Marcianò et.al. 2511.09588 null
2025-11-12 Video Echoed in Music: Semantic, Temporal, and Rhythmic Alignment for Video-to-Music Generation Xinyi Tong et.al. 2511.09585 null
2025-11-12 SPIDER: Scalable Physics-Informed Dexterous Retargeting Chaoyi Pan et.al. 2511.09484 null
2025-11-12 MCAD: Multimodal Context-Aware Audio Description Generation For Soccer Lipisha Chaudhary et.al. 2511.09448 null
2025-11-12 A cross-modal pre-training framework with video data for improving performance and generalization of distributed acoustic sensing Junyi Duan et.al. 2511.09342 null
2025-11-12 GRACE: Designing Generative Face Video Codec via Agile Hardware-Centric Workflow Rui Wan et.al. 2511.09272 null
2025-11-12 Unveiling the Impact of Data and Model Scaling on High-Level Control for Humanoid Robots Yuxi Wei et.al. 2511.09241 null
2025-11-12 AILINKPREVIEWER: Enhancing Code Reviews with LLM-Powered Link Previews Panya Trakoolgerntong et.al. 2511.09223 null
2025-11-12 DBINDS -- Can Initial Noise from Diffusion Model Inversion Help Reveal AI-Generated Videos? Yanlin Wu et.al. 2511.09184 null
2025-11-10 Robot Learning from a Physical World Model Jiageng Mao et.al. 2511.07416 null
2025-11-10 StreamDiffusionV2: A Streaming System for Dynamic and Interactive Video Generation Tianrui Feng et.al. 2511.07399 null
2025-11-10 Reg-DPO: SFT-Regularized Direct Preference Optimization with GT-Pair for Improving Video Generation Jie Du et.al. 2511.01450 null
2025-11-09 GenAI vs. Human Creators: Procurement Mechanism Design in Two-/Three-Layer Markets Rui Ai et.al. 2511.06559 null
2025-11-09 RelightMaster: Precise Video Relighting with Multi-plane Light Images Weikang Bian et.al. 2511.06271 null
2025-11-08 Neodragon: Mobile Video Generation using Diffusion Transformer Animesh Karnewar et.al. 2511.06055 null
2025-11-07 THEval. Evaluation Framework for Talking Head Video Generation Nabyl Quignon et.al. 2511.04520 null
2025-11-06 InfinityStar: Unified Spacetime AutoRegressive Modeling for Visual Generation Jinlai Liu et.al. 2511.04675 null
2025-11-06 Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Jingqi Tong et.al. 2511.04570 null
2025-11-06 RISE-T2V: Rephrasing and Injecting Semantics with LLM for Expansive Text-to-Video Generation Xiangjun Zhang et.al. 2511.04317 null
2025-11-06 PhysCorr: Dual-Reward DPO for Physics-Constrained Text-to-Video Generation with Automated Preference Selection Peiyao Wang et.al. 2511.03997 null
2025-11-05 UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions Guozhen Zhang et.al. 2511.03334 null
2025-11-05 Unified Long Video Inpainting and Outpainting via Overlapping High-Order Co-Denoising Shuangquan Lyu et.al. 2511.03272 null
2025-11-04 Video Text Preservation with Synthetic Text-Rich Videos Ziyang Liu et.al. 2511.05573 null
2025-11-04 ID-Composer: Multi-Subject Video Synthesis with Hierarchical Identity Preservation Panwang Pan et.al. 2511.00511 null
2025-11-03 How Far Are Surgeons from Surgical World Models? A Pilot Study on Zero-shot Surgical Video Generation with Expert Assessment Zhen Chen et.al. 2511.01775 null
2025-11-03 Driving scenario generation and evaluation using a structured layer representation and foundational models Arthur Hubert et.al. 2511.01541 null
2025-11-03 Towards One-step Causal Video Generation via Adversarial Self-Distillation Yongqi Yang et.al. 2511.01419 null
2025-11-03 MotionStream: Real-Time Video Generation with Interactive Motion Controls Joonghyuk Shin et.al. 2511.01266 null
2025-11-01 Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models Panwang Pan et.al. 2511.00503 null
2025-10-31 Phased DMD: Few-step Distribution Matching Distillation via Score Matching within Subintervals Xiangyu Fan et.al. 2510.27684 null
2025-10-31 Fine-Tuning Open Video Generators for Cinematic Scene Synthesis: A Small-Data Pipeline with LoRA and Wan2.1 I2V Meftun Akarsu et.al. 2510.27364 null
2025-10-31 DANCER: Dance ANimation via Condition Enhancement and Rendering with diffusion model Yucheng Xing et.al. 2510.27169 null
2025-10-31 Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark Ziyu Guo et.al. 2510.26802 null
2025-10-30 AI Powered High Quality Text to Video Generation with Enhanced Temporal Consistency Piyushkumar Patel et.al. 2511.00107 null
2025-10-30 LeMiCa: Lexicographic Minimax Path Caching for Efficient Diffusion-Based Video Generation Huanlin Gao et.al. 2511.00090 null
2025-10-30 SEE4D: Pose-Free 4D Generation via Auto-Regressive Video Inpainting Dongyue Lu et.al. 2510.26796 null
2025-10-30 The Quest for Generalizable Motion Generation: Data, Model, and Evaluation Jing Lin et.al. 2510.26794 null
2025-10-30 Co-Evolving Latent Action World Models Yucen Wang et.al. 2510.26433 null
2025-10-30 LoCoT2V-Bench: A Benchmark for Long-Form and Complex Text-to-Video Generation Xiangqing Zheng et.al. 2510.26412 null
2025-10-29 VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context Learning Baolu Li et.al. 2510.25772 null
2025-10-29 VC4VG: Optimizing Video Captions for Text-to-Video Generation Yang Du et.al. 2510.24134 null
2025-10-28 World Simulation with Video Foundation Models for Physical AI NVIDIA et.al. 2511.00062 null
2025-10-28 VividCam: Learning Unconventional Camera Motions from Virtual Synthetic Videos Qiucheng Wu et.al. 2510.24904 null
2025-10-28 Generative View Stitching Chonghyuk Song et.al. 2510.24718 null
2025-10-28 Uniform Discrete Diffusion with Metric Path for Video Generation Haoge Deng et.al. 2510.24717 null
2025-10-28 MC-SJD : Maximal Coupling Speculative Jacobi Decoding for Autoregressive Visual Generation Acceleration Junhyuk So et.al. 2510.24211 null
2025-10-28 LongCat-Video Technical Report Meituan LongCat Team et.al. 2510.22200 null
2025-10-27 CoMo: Compositional Motion Customization for Text-to-Video Generation Youcan Xu et.al. 2510.23007 null
2025-10-27 Scaling Up Occupancy-centric Driving Scene Generation: Dataset and Method Bohan Li et.al. 2510.22973 null
2025-10-26 MAGIC-Talk: Motion-aware Audio-Driven Talking Face Generation with Customizable Identity Control Fatemeh Nazarieh et.al. 2510.22810 null
2025-10-25 Hollywood Town: Long-Video Generation via Cross-Modal Multi-Agent Orchestration Zheng Wei et.al. 2510.22431 null
2025-10-24 Two-Steps Diffusion Policy for Robotic Manipulation via Genetic Denoising Mateo Clemente et.al. 2510.21991 null
2025-10-24 BachVid: Training-Free Video Generation with Consistent Background and Character Han Yan et.al. 2510.21696 null
2025-10-24 Epipolar Geometry Improves Video Generation Models Orest Kupyn et.al. 2510.21615 null
2025-10-24 OmniNWM: Omniscient Driving Navigation World Models Bohan Li et.al. 2510.18313 null
2025-10-23 Generative AI in Depth: A Survey of Recent Advances, Model Variants, and Real-World Applications Shamim Yazdani et.al. 2510.21887 null
2025-10-23 Video-As-Prompt: Unified Semantic Control for Video Generation Yuxuan Bian et.al. 2510.20888 null
2025-10-23 Video Prediction of Dynamic Physical Simulations With Pixel-Space Spatiotemporal Transformers Dean L Slack et.al. 2510.20807 null
2025-10-23 RAPO++: Cross-Stage Prompt Optimization for Text-to-Video Generation via Data Alignment and Test-Time Scaling Bingjie Gao et.al. 2510.20206 null
2025-10-23 Evaluating Video Models as Simulators of Multi-Person Pedestrian Trajectories Aaron Appelle et.al. 2510.20182 null
2025-10-23 Video Consistency Distance: Enhancing Temporal Consistency for Image-to-Video Generation via Reward-Based Fine-Tuning Takehiro Aoshima et.al. 2510.19193 null
2025-10-23 A Renaissance of Explicit Motion Information Mining from Transformers for Action Recognition Peiqin Zhuang et.al. 2510.18705 null
2025-10-22 Improving the Physics of Video Generation with VJEPA-2 Reward Signal Jianhao Yuan et.al. 2510.21840 null
2025-10-22 A new wave of vehicle insurance fraud fueled by generative AI Amir Hever et.al. 2510.19957 null
2025-10-22 PoseCrafter: Extreme Pose Estimation with Hybrid Video Synthesis Qing Mao et.al. 2510.19527 null
2025-10-22 GigaBrain-0: A World Model-Powered Vision-Language-Action Model GigaBrain Team et.al. 2510.19430 null
2025-10-22 FeatureFool: Zero-Query Fooling of Video Models via Feature Map Duoxun Tang et.al. 2510.18362 null
2025-10-22 MUG-V 10B: High-efficiency Training Pipeline for Large Video Generation Models Yongshun Zhang et.al. 2510.17519 null
2025-10-22 ImagerySearch: Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints Meiqi Wu et.al. 2510.14847 null
2025-10-21 MoAlign: Motion-Centric Representation Alignment for Video Diffusion Models Aritra Bhowmik et.al. 2510.19022 null
2025-10-21 UltraGen: High-Resolution Video Generation with Hierarchical Attention Teng Hu et.al. 2510.18775 null
2025-10-21 MoGA: Mixture-of-Groups Attention for End-to-End Long Video Generation Weinan Jia et.al. 2510.18692 null
2025-10-21 Kaleido: Open-Sourced Multi-Subject Reference Video Generation Model Zhenxing Zhang et.al. 2510.18573 null
2025-10-20 World-in-World: World Models in a Closed-Loop World Jiahan Zhang et.al. 2510.18135 null
2025-10-20 Demystifying Transition Matching: When and Why It Can Beat Flow Matching Jaihoon Kim et.al. 2510.17991 null
2025-10-20 From Preferences to Prejudice: The Role of Alignment Tuning in Shaping Social Bias in Video Diffusion Models Zefan Cai et.al. 2510.17247 null
2025-10-20 DriveGen3D: Boosting Feed-Forward Driving Scene Generation with Efficient Video Diffusion Weijie Wang et.al. 2510.15264 null
2025-10-20 Identity-Preserving Image-to-Video Generation via Reward-Guided Optimization Liao Shen et.al. 2510.14255 null
2025-10-19 An empirical study of the effect of video encoders on Temporal Video Grounding Ignacio M. De la Jara et.al. 2510.17007 null
2025-10-19 From Mannequin to Human: A Pose-Aware and Identity-Preserving Video Generation Framework for Lifelike Clothing Display Xiangyu Mu et.al. 2510.16833 null
2025-10-19 STANCE: Motion Coherent Video Generation Via Sparse-to-Dense Anchored Encoding Zhifei Chen et.al. 2510.14588 null
2025-10-17 VISTA: A Test-Time Self-Improving Video Generation Agent Do Xuan Long et.al. 2510.15831 null
2025-10-17 Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset Qingyan Bai et.al. 2510.15742 null
2025-10-17 Identity-GRPO: Optimizing Multi-Human Identity-preserving Video Generation via Reinforcement Learning Xiangyu Meng et.al. 2510.14256 null
2025-10-17 Ctrl-VI: Controllable Video Synthesis via Variational Inference Haoyi Duan et.al. 2510.07670 null
2025-10-16 TGT: Text-Grounded Trajectories for Locally Controlled Video Generation Guofeng Zhang et.al. 2510.15104 null
2025-10-16 RealDPO: Real or Not Real, that is the Preference Guo Cheng et.al. 2510.14955 null
2025-10-16 DialectGen: Benchmarking and Improving Dialect Robustness in Multimodal Generation Yu Zhou et.al. 2510.14949 null
2025-10-16 3D Scene Prompting for Scene-Consistent Camera-Controllable Video Generation JoungBin Lee et.al. 2510.14945 null
2025-10-16 In-Context Learning with Unpaired Clips for Instruction-based Video Editing Xinyao Liao et.al. 2510.14648 null
2025-10-16 Virtually Being: Customizing Camera-Controllable Video Diffusion Models with Multi-View Performance Captures Yuancheng Xu et.al. 2510.14179 null
2025-10-15 PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning Sihui Ji et.al. 2510.13809 null
2025-10-15 CanvasMAR: Improving Masked Autoregressive Video Generation With Canvas Zian Li et.al. 2510.13669 null
2025-10-15 VIST3A: Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator Hyojun Go et.al. 2510.13454 null
2025-10-15 Counting Hallucinations in Diffusion Models Shuai Fu et.al. 2510.13080 null
2025-10-14 SeqBench: Benchmarking Sequential Narrative Generation in Text-to-Video Models Zhengxu Tang et.al. 2510.13042 null
2025-10-14 MVP4D: Multi-View Portrait Video Diffusion for Animatable 4D Avatars Felix Taubner et.al. 2510.12785 null
2025-10-14 Time-Correlated Video Bridge Matching Viacheslav Vasilev et.al. 2510.12453 null
2025-10-14 BIGFix: Bidirectional Image Generation with Token Fixing Victor Besnier et.al. 2510.12231 null
2025-10-14 Playmate2: Training-Free Multi-Character Audio-Driven Animation via Diffusion Transformer with Reward Feedback Xingpei Ma et.al. 2510.12089 null
2025-10-13 Point Prompting: Counterfactual Tracking with Video Diffusion Models Ayush Shrivastava et.al. 2510.11715 null
2025-10-13 MoMaps: Semantics-Aware Scene Motion Generation with Motion Maps Jiahui Lei et.al. 2510.11107 null
2025-10-13 Q-Router: Agentic Video Quality Assessment with Expert Model Routing and Artifact Localization Shuo Xing et.al. 2510.08789 null
2025-10-12 AdaViewPlanner: Adapting Video Diffusion Models for Viewpoint Planning in 4D Scenes Yu Li et.al. 2510.10670 null
2025-10-12 DEMO: Disentangled Motion Latent Flow Matching for Fine-Grained Controllable Talking Portrait Synthesis Peiyin Chen et.al. 2510.10650 null
2025-10-11 EditCast3D: Single-Frame-Guided 3D Editing with Video Propagation and View Selection Huaizhi Qu et.al. 2510.13652 null
2025-10-11 MultiCOIN: Multi-Modal COntrollable Video INbetweening Maham Tanveer et.al. 2510.08561 null
2025-10-10 Stable Video Infinity: Infinite-Length Video Generation with Error Recycling Wuyang Li et.al. 2510.09212 null
2025-10-10 MAViS: A Multi-Agent Framework for Long-Sequence Video Storytelling Qian Wang et.al. 2508.08487 null
2025-10-09 SkipSR: Faster Super Resolution with Token Skipping Rohan Choudhury et.al. 2510.08799 null
2025-10-09 NovaFlow: Zero-Shot Manipulation via Actionable Flow from Generated Videos Hongyu Li et.al. 2510.08568 null
2025-10-09 VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning Minghong Cai et.al. 2510.08555 null
2025-10-09 X2Video: Adapting Diffusion Models for Multimodal Controllable Neural Video Rendering Zhitong Huang et.al. 2510.08530 null
2025-10-09 FlexTraj: Image-to-Video Generation with Flexible Point Trajectory Control Zhiyuan Zhang et.al. 2510.08527 null
2025-10-09 UniVideo: Unified Understanding, Generation, and Editing for Videos Cong Wei et.al. 2510.08377 null
2025-10-09 LinVideo: A Post-Training Framework towards O(n) Attention in Efficient Video Generation Yushi Huang et.al. 2510.08318 null
2025-10-09 UniMMVSR: A Unified Multi-Modal Framework for Cascaded Video Super-Resolution Shian Du et.al. 2510.08143 null
2025-10-09 Real-Time Motion-Controllable Autoregressive Video Diffusion Kesen Zhao et.al. 2510.08131 null
2025-10-09 CVD-STORM: Cross-View Video Diffusion with Spatial-Temporal Reconstruction Model for Autonomous Driving Tianrui Zhang et.al. 2510.07944 null
2025-10-09 TTOM: Test-Time Optimization and Memorization for Compositional Video Generation Leigang Qu et.al. 2510.07940 null
2025-10-09 Once Is Enough: Lightweight DiT-Based Video Virtual Try-On via One-Time Garment Appearance Injection Yanjie Pan et.al. 2510.07654 null
2025-10-09 Paper2Video: Automatic Video Generation from Scientific Papers Zeyu Zhu et.al. 2510.05096 null
2025-10-08 TRAVL: A Recipe for Making Video-Language Models Better Judges of Physics Implausibility Saman Motamed et.al. 2510.07550 null
2025-10-08 DynamicEval: Rethinking Evaluation for Dynamic Text-to-Video Synthesis Nithin C. Babu et.al. 2510.07441 null
2025-10-08 WristWorld: Generating Wrist-Views via 4D World Models for Robotic Manipulation Zezhong Qian et.al. 2510.07313 null
2025-10-08 MATRIX: Mask Track Alignment for Interaction-aware Video Generation Siyoon Jin et.al. 2510.07310 null
2025-10-08 TalkCuts: A Large-Scale Dataset for Multi-Shot Human Speech Video Generation Jiaben Chen et.al. 2510.07249 null
2025-10-08 MV-Performer: Taming Video Diffusion Model for Faithful and Synchronized Multi-view Performer Synthesis Yihao Zhi et.al. 2510.07190 null
2025-10-08 Generative World Modelling for Humanoids: 1X World Model Challenge Technical Report Riccardo Mereu et.al. 2510.07092 null
2025-10-08 Addressing the ID-Matching Challenge in Long Video Captioning Zhantao Yang et.al. 2510.06973 null
2025-10-07 Drive&Gen: Co-Evaluating End-to-End Driving and Video Generation Models Jiahao Wang et.al. 2510.06209 null
2025-10-07 When and How to Cut Classical Concerts? A Multimodal Automated Video Editing Approach Daniel Gonzálbez-Biosca et.al. 2510.05661 null
2025-10-06 LightCache: Memory-Efficient, Training-Free Acceleration for Video Generation Yang Xiao et.al. 2510.05367 null
2025-10-06 VChain: Chain-of-Visual-Thought for Reasoning in Video Generation Ziqi Huang et.al. 2510.05094 null
2025-10-06 Character Mixing for Video Generation Tingting Liao et.al. 2510.05093 null
2025-10-06 Bridging Text and Video Generation: A Survey Nilay Kumar et.al. 2510.04999 null
2025-10-06 What Drives Compositional Generalization in Visual Generative Models? Karim Farid et.al. 2510.03075 null
2025-10-05 ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation Jay Zhangjie Wu et.al. 2510.04290 null
2025-10-05 Let Features Decide Their Own Solvers: Hybrid Feature Caching for Diffusion Transformers Shikang Zheng et.al. 2510.04188 null
2025-10-04 Generating Human Motion Videos using a Cascaded Text-to-Video Framework Hyelin Nam et.al. 2510.03909 null
2025-10-03 Mask2IV: Interaction-Centric Video Generation via Mask Trajectories Gen Li et.al. 2510.03135 null
2025-10-03 Taming Text-to-Sounding Video Generation via Advanced Modality Condition and Interaction Kaisi Guan et.al. 2510.03117 null
2025-10-03 When and Where do Events Switch in Multi-Event Video Generation? Ruotong Liao et.al. 2510.03049 null
2025-10-03 Pack and Force Your Memory: Long-form and Consistent Video Generation Xiaofei Wu et.al. 2510.01784 null
2025-10-02 Input-Aware Sparse Attention for Real-Time Co-Speech Video Generation Beijia Lu et.al. 2510.02617 null
2025-10-02 How Confident are Video Models? Empowering Video Models to Express their Uncertainty Zhiting Mei et.al. 2510.02571 null
2025-10-02 Inferring Dynamic Physical Properties from Video Foundation Models Guanqi Zhan et.al. 2510.02311 null
2025-10-02 MultiModal Action Conditioned Video Generation Yichen Li et.al. 2510.02287 null
2025-10-02 Learning to Generate Object Interactions with Physics-Guided Video Diffusion David Romero et.al. 2510.02284 null
2025-10-02 Self-Forcing++: Towards Minute-Scale High-Quality Video Generation Justin Cui et.al. 2510.02283 null
2025-10-02 TempoControl: Temporal Attention Guidance for Text-to-Video Models Shira Schiber et.al. 2510.02226 null
2025-10-02 Multi-marginal temporal Schrödinger Bridge Matching for video generation from unpaired data Thomas Gravier et.al. 2510.01894 null
2025-10-01 IMAGEdit: Let Any Subject Transform Fei Shen et.al. 2510.01186 null
2025-10-01 EvoWorld: Evolving Panoramic World Generation with Explicit 3D Memory Jiahao Wang et.al. 2510.01183 null
2025-10-01 Code2Video: A Code-centric Paradigm for Educational Video Generation Yanzhe Chen et.al. 2510.01174 null
2025-10-01 From Seeing to Predicting: A Vision-Language Framework for Trajectory Forecasting and Controlled Video Generation Fan Yang et.al. 2510.00806 null
2025-10-01 Arbitrary Generative Video Interpolation Guozhen Zhang et.al. 2510.00578 null
2025-10-01 BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration Zhaoyang Li et.al. 2510.00438 null
2025-09-30 Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation Chetwin Low et.al. 2510.01284 null
2025-09-30 Stable Cinemetrics: Structured Taxonomy and Evaluation for Professional Video Generation Agneet Chatterjee et.al. 2509.26555 null
2025-09-30 MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation Chenhui Zhu et.al. 2509.26391 null
2025-09-30 PatchVSR: Breaking Video Diffusion Resolution Limits with Patch-wise Video Super-Resolution Shian Du et.al. 2509.26025 null
2025-09-30 Wan-Alpha: High-Quality Text-to-Video Generation with Alpha Channel Haotian Dong et.al. 2509.24979 null
2025-09-30 QuantSparse: Comprehensively Compressing Video Diffusion Transformer with Model Quantization and Attention Sparsification Weilun Feng et.al. 2509.23681 null
2025-09-29 FlashI2V: Fourier-Guided Latent Shifting Prevents Conditional Image Leakage in Image-to-Video Generation Yunyang Ge et.al. 2509.25187 null
2025-09-29 DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder Junyu Chen et.al. 2509.25182 null
2025-09-29 Rolling Forcing: Autoregressive Long Video Diffusion in Real Time Kunhao Liu et.al. 2509.25161 null
2025-09-29 PanoWorld-X: Generating Explorable Panoramic Worlds via Sphere-Aware Video Diffusion Yuyang Yin et.al. 2509.24997 null
2025-09-29 SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation Shuang Liang et.al. 2509.24980 null
2025-09-29 Attention Surgery: An Efficient Recipe to Linearize Your Video Diffusion Transformer Mohsen Ghafoorian et.al. 2509.24899 null
2025-09-29 Enhancing Physical Plausibility in Video Generation by Reasoning the Implausibility Yutong Hao et.al. 2509.24702 null
2025-09-29 SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer Junsong Chen et.al. 2509.24695 null
2025-09-29 Learning Object-Centric Representations Based on Slots in Real World Scenarios Adil Kaan Akan et.al. 2509.24652 null
2025-09-29 UI2V-Bench: An Understanding-based Image-to-video Generation Benchmark Ailing Zhang et.al. 2509.24427 null
2025-09-29 CLQ: Cross-Layer Guided Orthogonal-based Quantization for Diffusion Transformers Kai Liu et.al. 2509.24416 null
2025-09-29 NeRV-Diffusion: Diffuse Implicit Neural Representations for Video Synthesis Yixuan Ren et.al. 2509.24353 null
2025-09-29 FreeAction: Training-Free Techniques for Enhanced Fidelity of Trajectory-to-Video Generation Seungwook Kim et.al. 2509.24241 null
2025-09-28 Autoregressive Video Generation beyond Next Frames Prediction Sucheng Ren et.al. 2509.24081 null
2025-09-28 SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention Jintao Zhang et.al. 2509.24006 null
2025-09-28 VividFace: High-Quality and Efficient One-Step Diffusion For Video Face Enhancement Shulian Zhang et.al. 2509.23584 null
2025-09-27 Vid-Freeze: Protecting Images from Malicious Image-to-Video Generation via Temporal Freezing Rohit Chowdhury et.al. 2509.23279 null
2025-09-27 Sparse2Dense: A Keypoint-driven Generative Framework for Human Video Compression and Vertex Prediction Bolin Chen et.al. 2509.23169 null
2025-09-26 Physically Plausible Multi-System Trajectory Generation and Symmetry Discovery Jiayin Liu et.al. 2509.23003 null
2025-09-26 VideoScore2: Think before You Score in Generative Video Evaluation Xuan He et.al. 2509.22799 null
2025-09-26 Learning Human-Perceived Fakeness in AI-Generated Videos via Multimodal LLMs Xingyu Fu et.al. 2509.22646 null
2025-09-26 LongLive: Real-time Interactive Long Video Generation Shuai Yang et.al. 2509.22622 null
2025-09-26 EgoDemoGen: Novel Egocentric Demonstration Generation Enables Viewpoint-Robust Manipulation Yuan Xu et.al. 2509.22578 null
2025-09-26 EMMA: Generalizing Real-World Robot Manipulation via Generative Visual Transfer Zhehao Dong et.al. 2509.22407 null
2025-09-26 Syncphony: Synchronized Audio-to-Video Generation with Diffusion Transformers Jibin Song et.al. 2509.21893 null
2025-09-26 DiTraj: training-free trajectory control for video diffusion transformer Cheng Lei et.al. 2509.21839 null
2025-09-26 MoWM: Mixture-of-World-Models for Embodied Planning via Latent-to-Pixel Feature Modulation Yu Shang et.al. 2509.21797 null
2025-09-26 LongScape: Advancing Long-Horizon Embodied World Models with Context-Aware MoE Yu Shang et.al. 2509.21790 null
2025-09-26 UniVid: Unifying Vision Tasks with Pre-trained Video Generation Models Lan Chen et.al. 2509.21760 null
2025-09-25 FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction Yixiang Dai et.al. 2509.21657 null
2025-09-25 What Happens Next? Anticipating Future Motion by Generating Point Trajectories Gabrijel Boduljak et.al. 2509.21592 null
2025-09-25 ControlHair: Physically-based Video Diffusion for Controllable Dynamic Hair Rendering Weikai Lin et.al. 2509.21541 null
2025-09-25 NewtonGen: Physics-Consistent and Controllable Text-to-Video Generation via Neural Newtonian Dynamics Yu Yuan et.al. 2509.21309 null
2025-09-25 MotionFlow: Learning Implicit Motion Flow for Complex Camera Trajectory Control in Video Generation Guojun Lei et.al. 2509.21119 null
2025-09-25 EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning Xuan Ju et.al. 2509.20360 null
2025-09-24 PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation Chen Wang et.al. 2509.20358 null
2025-09-24 4D Driving Scene Generation With Stereo Forcing Hao Lu et.al. 2509.20251 null
2025-09-24 CamPVG: Camera-Controlled Panoramic Video Generation with Epipolar-Aware Diffusion Chenhao Ji et.al. 2509.19979 null
2025-09-24 OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling Yang Zhou et.al. 2509.12201 null
2025-09-23 Text Slider: Efficient and Plug-and-Play Continuous Concept Control for Image/Video Synthesis via LoRA Adapters Pin-Yen Chiu et.al. 2509.18831 null
2025-09-22 VideoFrom3D: 3D Scene Video Generation via Complementary Image and Video Diffusion Models Geonung Kim et.al. 2509.17985 null
2025-09-22 I2VWM: Robust Watermarking for Image to Video Generation Guanjie Wang et.al. 2509.17773 null
2025-09-21 Echo-Path: Pathology-Conditioned Echo Video Generation Kabir Hamzah Muhammad et.al. 2509.17190 null
2025-09-21 VidCLearn: A Continual Learning Approach for Text-to-Video Generation Luca Zanchetta et.al. 2509.16956 null
2025-09-21 $\mathtt{M^3VIR}$: A Large-Scale Multi-Modality Multi-View Synthesized Benchmark Dataset for Image Restoration and Content Creation Yuanzhi Li et.al. 2509.16873 null
2025-09-20 RLGF: Reinforcement Learning with Geometric Feedback for Autonomous Driving Video Generation Tianyi Yan et.al. 2509.16500 null
2025-09-19 Lynx: Towards High-Fidelity Personalized Video Generation Shen Sang et.al. 2509.15496 null
2025-09-19 AToken: A Unified Tokenizer for Vision Jiasen Lu et.al. 2509.14476 null
2025-09-18 OpenViGA: Video Generation for Automotive Driving Scenes by Streamlining and Fine-Tuning Open Source Models with Public Data Björn Möller et.al. 2509.15479 null
2025-09-18 RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation Yuming Jiang et.al. 2509.15212 null
2025-09-18 WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance Chenxi Song et.al. 2509.15130 null
2025-09-18 DACoN: DINO for Anime Paint Bucket Colorization with Any Number of Reference Images Kazuma Nagata et.al. 2509.14685 null
2025-09-18 BWCache: Accelerating Video Diffusion Transformers through Block-Wise Caching Hanshuai Cui et.al. 2509.13789 null
2025-09-17 PhysicalAgent: Towards General Cognitive Robotics with Foundation World Models Artem Lykov et.al. 2509.13903 null
2025-09-17 TeraSim-World: Worldwide Safety-Critical Data Synthesis for End-to-End Autonomous Driving Jiawei Wang et.al. 2509.13164 null
2025-09-17 Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis Yikang Ding et.al. 2509.09595 null
2025-09-16 Gen2Real: Towards Demo-Free Dexterous Manipulation by Harnessing Generated Video Kai Ye et.al. 2509.14178 null
2025-09-16 BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models Yuming Li et.al. 2509.06040 null
2025-09-15 AvatarSync: Rethinking Talking-Head Animation through Autoregressive Perspective Yuchen Deng et.al. 2509.12052 null
2025-09-15 SpeCa: Accelerating Diffusion Transformers with Speculative Feature Caching Jiacheng Liu et.al. 2509.11628 null
2025-09-15 MVQA-68K: A Multi-dimensional and Causally-annotated Dataset with Quality Interpretability for Video Assessment Yanyun Pu et.al. 2509.11589 null
2025-09-14 VideoAgent: Personalized Synthesis of Scientific Videos Xiao Liang et.al. 2509.11253 null
2025-09-14 PanoLora: Bridging Perspective and Panoramic Video Generation with LoRA Adaptation Zeyu Dong et.al. 2509.11092 null
2025-09-12 Stable Part Diffusion 4D: Multi-View RGB and Kinematic Parts Video Generation Hao Zhang et.al. 2509.10687 null
2025-09-12 T2Bs: Text-to-Character Blendshapes via Video Generation Jiahao Luo et.al. 2509.10678 null
2025-09-12 Compute Only 16 Tokens in One Timestep: Accelerating Diffusion Transformers with Cluster-Driven Feature Caching Zhixin Zheng et.al. 2509.10312 null
2025-09-11 Improving Video Diffusion Transformer Training by Multi-Feature Fusion and Alignment from Self-Supervised Vision Encoders Dohun Lee et.al. 2509.09547 null
2025-09-11 Zero-shot 3D-Aware Trajectory-Guided image-to-video generation via Test-Time Training Ruicheng Zhang et.al. 2509.06723 null
2025-09-10 RewardDance: Reward Scaling in Visual Generation Jie Wu et.al. 2509.08826 null
2025-09-10 GeneVA: A Dataset of Human Annotations for Generative Text to Video Artifacts Jenna Kang et.al. 2509.08818 null
2025-09-10 HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning Liyang Chen et.al. 2509.08519 null
2025-09-09 ANYPORTAL: Zero-Shot Consistent Video Background Replacement Wenshuo Gao et.al. 2509.07472 null
2025-09-09 Coefficients-Preserving Sampling for Reinforcement Learning with Flow Matching Feng Wang et.al. 2509.05952 null
2025-09-09 Attention of a Kiss: Exploring Attention Maps in Video Diffusion for XAIxArts Adam Cole et.al. 2509.05323 null
2025-09-07 UniVerse-1: Unified Audio-Video Generation via Stitching of Experts Duomin Wang et.al. 2509.06155 null
2025-09-04 Virtual Fitting Room: Generating Arbitrarily Long Videos of Virtual Try-On from a Single Image -- Technical Preview Jun-Kun Chen et.al. 2509.04450 null
2025-09-04 Human Motion Video Generation: A Survey Haiwei Xue et.al. 2509.03883 null
2025-09-03 CompSlider: Compositional Slider for Disentangled Multiple-Attribute Image Generation Zixin Zhu et.al. 2509.01028 null
2025-09-01 Identity-Preserving Text-to-Video Generation via Training-Free Prompt, Image, and Guidance Enhancement Jiayi Gao et.al. 2509.01362 null
2025-09-01 Communicative Agents for Slideshow Storytelling Video Generation based on LLMs Jingxing Fan et.al. 2509.01277 null
2025-09-01 FantasyHSI: Video-Generation-Centric 4D Human Synthesis In Any Scene through A Graph-based Multi-Agent Framework Lingzhou Mu et.al. 2509.01232 null
2025-08-30 DevilSight: Augmenting Monocular Human Avatar Reconstruction through a Virtual Perspective Yushuo Chen et.al. 2509.00403 null
2025-08-28 Mixture of Contexts for Long Video Generation Shengqu Cai et.al. 2508.21058 null
2025-08-28 POSE: Phased One-Step Adversarial Equilibrium for Video Diffusion Models Jiaxiang Cheng et.al. 2508.21019 null
2025-08-28 Learning Primitive Embodied World Models: Towards Scalable Robotic Learning Qiao Sun et.al. 2508.20840 null
2025-08-28 Realistic and Controllable 3D Gaussian-Guided Object Editing for Driving Video Generation Jiusi Li et.al. 2508.20471 null
2025-08-28 Ego-centric Predictive Model Conditioned on Hand Trajectories Binjie Zhang et.al. 2508.19852 null
2025-08-28 MIDAS: Multimodal Interactive Digital-humAn Synthesis via Real-time Autoregressive Video Generation Ming Chen et.al. 2508.19320 null
2025-08-27 ERTACache: Error Rectification and Timesteps Adjustment for Efficient Diffusion Xurui Peng et.al. 2508.21091 null
2025-08-26 ROSE: Remove Objects with Side Effects in Videos Chenxuan Miao et.al. 2508.18633 null
2025-08-26 Wan-S2V: Audio-Driven Cinematic Video Generation Xin Gao et.al. 2508.18621 null
2025-08-26 Waver: Wave Your Way to Lifelike Video Generation Yifu Zhang et.al. 2508.15761 null
2025-08-25 SuperGen: An Efficient Ultra-high-resolution Video Generation System with Sketching and Tiling Fanjiang Ye et.al. 2508.17756 null
2025-08-25 OmniCache: A Trajectory-Oriented Global Perspective on Training-Free Cache Reuse for Diffusion Transformer Models Huanpeng Chu et.al. 2508.16212 null
2025-08-24 A Synthetic Dataset for Manometry Recognition in Robotic Applications Pedro Antonio Rabelo Saraiva et.al. 2508.17468 null
2025-08-24 MoCo: Motion-Consistent Human Video Generation via Structure-Appearance Decoupling Haoyu Wang et.al. 2508.17404 null
2025-08-24 DiCache: Let Diffusion Model Determine Its Own Cache Jiazi Bu et.al. 2508.17356 null
2025-08-23 SSG-Dit: A Spatial Signal Guided Framework for Controllable Video Generation Peng Hu et.al. 2508.17062 null
2025-08-23 HiCache: Training-free Acceleration of Diffusion Models via Hermite Polynomial-based Feature Caching Liang Feng et.al. 2508.16984 null
2025-08-23 HunyuanVideo-Foley: Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation Sizhe Shan et.al. 2508.16930 null
2025-08-22 Seeing Clearly, Forgetting Deeply: Revisiting Fine-Tuned Video Generators for Driving Simulation Chun-Peng Chang et.al. 2508.16512 null
2025-08-22 Forecast then Calibrate: Feature Caching as ODE for Efficient Diffusion Transformers Shikang Zheng et.al. 2508.16211 null
2025-08-21 Spatial Policy: Guiding Visuomotor Robotic Manipulation with Spatial-Aware Modeling and Reasoning Yijun Liu et.al. 2508.15874 null
2025-08-21 CineScale: Free Lunch in High-Resolution Cinematic Visual Generation Haonan Qiu et.al. 2508.15774 null
2025-08-21 Scaling Group Inference for Diverse and High-Quality Generation Gaurav Parmar et.al. 2508.15773 null
2025-08-21 WorldWeaver: Generating Long-Horizon Video Worlds via Rich Perception Zhiheng Liu et.al. 2508.15720 null
2025-08-21 TiP4GEN: Text to Immersive Panorama 4D Scene Generation Ke Xing et.al. 2508.12415 null
2025-08-20 DreamSwapV: Mask-guided Subject Swapping for Any Customized Video Editing Weitao Wang et.al. 2508.14465 null
2025-08-20 MoVieDrive: Multi-Modal Multi-View Urban Scene Video Generation Guile Wu et.al. 2508.14327 null
2025-08-19 xDiff: Online Diffusion Model for Collaborative Inter-Cell Interference Management in 5G O-RAN Peihao Yan et.al. 2508.15843 null
2025-08-19 InfiniteTalk: Audio-driven Video Generation for Sparse-Frame Video Dubbing Shaoshu Yang et.al. 2508.14033 null
2025-08-19 Physics-Based 3D Simulation for Synthetic Data Generation and Failure Analysis in Packaging Stability Assessment Samuel Seligardi et.al. 2508.13989 null
2025-08-18 4DNeX: Feed-Forward 4D Generative Modeling Made Easy Zhaoxi Chen et.al. 2508.13154 null
2025-08-18 Precise Action-to-Video Generation Through Visual Action Prompts Yuang Wang et.al. 2508.13104 null
2025-08-18 EgoTwin: Dreaming Body and View in First Person Jingqiao Xiu et.al. 2508.13013 null
2025-08-18 Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model Xianglong He et.al. 2508.13009 null
2025-08-18 Compact Attention: Exploiting Structured Spatio-Temporal Sparsity for Fast Video Generation Qirui Li et.al. 2508.12969 null
2025-08-18 Lumen: Consistent Video Relighting and Harmonious Background Replacement with Video Generative Models Jianshu Zeng et.al. 2508.12945 null
2025-08-18 S^2-Guidance: Stochastic Self Guidance for Training-Free Enhancement of Diffusion Models Chubin Chen et.al. 2508.12880 null
2025-08-18 E3RG: Building Explicit Emotion-driven Empathetic Response Generation System with Multimodal Large Language Model Ronghao Lin et.al. 2508.12854 null
2025-08-18 MixCache: Mixture-of-Cache for Video Diffusion Transformer Acceleration Yuanxin Wei et.al. 2508.12691 null
2025-08-15 CineTrans: Learning to Generate Videos with Cinematic Transitions via Masked Diffusion Models Xiaoxue Wu et.al. 2508.11484 null
2025-08-15 Preacher: Paper-to-Video Agentic System Jingwei Liu et.al. 2508.09632 null
2025-08-14 GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning Kelin Yu et.al. 2508.11049 null
2025-08-14 EVCtrl: Efficient Control Adapter for Visual Generation Zixiang Yang et.al. 2508.10963 null
2025-08-14 Hierarchical Fine-grained Preference Optimization for Physically Plausible Video Generation Harold Haodong Chen et.al. 2508.10858 null
2025-08-14 Video-BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation Youping Gu et.al. 2508.10774 null
2025-08-14 AEGIS: Authenticity Evaluation Benchmark for AI-Generated Video Sequences Jieyu Li et.al. 2508.10771 null
2025-08-14 HM-Talker: Hybrid Motion Modeling for High-Fidelity Talking Head Synthesis Shiyu Liu et.al. 2508.10566 null
2025-08-14 From Large Angles to Consistent Faces: Identity-Preserving Video Generation via Mixture of Facial Experts Yuji Wang et.al. 2508.09476 null
2025-08-14 Yan: Foundational Interactive Video Generation Deheng Ye et.al. 2508.08601 null
2025-08-13 Physical Autoregressive Model for Robotic Manipulation without Action Pretraining Zijian Song et.al. 2508.09822 null
2025-08-12 X-UniMotion: Animating Human Images with Expressive, Unified and Identity-Agnostic Motion Latents Guoxian Song et.al. 2508.09383 null
2025-08-12 Turbo-VAED: Fast and Stable Transfer of Video-VAEs to Mobile Devices Ya Zou et.al. 2508.09136 null
2025-08-12 TaoCache: Structure-Maintained Video Generation Acceleration Zhentao Fan et.al. 2508.08978 null
2025-08-12 Subjective and Objective Quality Assessment of Banding Artifacts on Compressed Videos Qi Zheng et.al. 2508.08700 null
2025-08-12 RealisMotion: Decomposed Human Motion Control and Video Generation in the World Space Jingyun Liang et.al. 2508.08588 null
2025-08-12 S^2VG: 3D Stereoscopic and Spatial Video Generation via Denoising Frame Matrix Peng Dai et.al. 2508.08048 null
2025-08-12 Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation Fangyuan Mao et.al. 2508.07981 null
2025-08-12 Stand-In: A Lightweight and Plug-and-Play Identity Control for Video Generation Bowen Xue et.al. 2508.07901 null
2025-08-11 VSF: Simple, Efficient, and Effective Negative Guidance in Few-Step Image Generation Models By Value Sign Flip Wenqi Guo et.al. 2508.10931 null
2025-08-11 StableAvatar: Infinite-Length Audio-Driven Avatar Video Generation Shuyuan Tu et.al. 2508.08248 null
2025-08-11 Matrix-3D: Omnidirectional Explorable 3D World Generation Zhongqi Yang et.al. 2508.08086 null
2025-08-11 Dream4D: Lifting Camera-Controlled I2V towards Spatiotemporally Consistent 4D Generation Xiaoyan Liu et.al. 2508.07769 null
2025-08-11 ShoulderShot: Generating Over-the-Shoulder Dialogue Videos Yuang Zhang et.al. 2508.07597 null
2025-08-08 Restage4D: Reanimating Deformable 3D Reconstruction from a Single Video Jixuan He et.al. 2508.06715 null
2025-08-08 SwiftVideo: A Unified Framework for Few-Step Video Generation through Trajectory-Distribution Alignment Yanxiao Sun et.al. 2508.06082 null
2025-08-08 DreamVE: Unified Instruction-based Image and Video Editing Bin Xia et.al. 2508.06080 null
2025-08-07 Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation Yue Liao et.al. 2508.05635 null
2025-08-07 B4DL: A Benchmark for 4D LiDAR LLM in Spatio-Temporal Understanding Changho Choi et.al. 2508.05269 null
2025-08-07 PoseGen: In-Context LoRA Finetuning for Pose-Controllable Long Human Video Generation Jingxuan He et.al. 2508.05091 null
2025-08-07 S$^2$Q-VDiT: Accurate Quantized Video Diffusion Transformer with Salient Data and Sparse Token Distillation Weilun Feng et.al. 2508.04016 null
2025-08-06 MSC: A Marine Wildlife Video Dataset with Grounded Segmentation and Clip-Level Captioning Quang-Trung Truong et.al. 2508.04549 null
2025-08-06 LayerT2V: Interactive Multi-Object Trajectory Layering for Video Generation Kangrui Cen et.al. 2508.04228 null
2025-08-06 Motion is the Choreographer: Learning Latent Pose Dynamics for Seamless Sign Language Generation Jiayi He et.al. 2508.04049 null
2025-08-06 Macro-from-Micro Planning for High-Quality and Parallelized Autoregressive Long Video Generation Xunzhi Xiang et.al. 2508.03334 null
2025-08-05 Scaling Up Audio-Synchronized Visual Animation: An Efficient Training Paradigm Lin Zhang et.al. 2508.03955 null
2025-08-05 LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation Jianxiong Gao et.al. 2508.03694 null
2025-08-05 RAAG: Ratio Aware Adaptive Guidance Shangwen Zhu et.al. 2508.03442 null
2025-08-05 V.I.P.: Iterative Online Preference Distillation for Efficient Video Diffusion Models Jisoo Kim et.al. 2508.03254 null
2025-08-05 Multi-human Interactive Talking Dataset Zeyu Zhu et.al. 2508.03050 null
2025-08-05 MoCA: Identity-Preserving Text-to-Video Generation via Mixture of Cross Attention Qi Xie et.al. 2508.03034 null
2025-08-05 D3: Training-Free AI-Generated Video Detection Using Second-Order Features Chende Zheng et.al. 2508.00701 null
2025-08-04 X-Actor: Emotional and Expressive Long-Range Portrait Acting from Audio Chenxu Zhang et.al. 2508.02944 null
2025-08-04 DreamVVT: Mastering Realistic Video Virtual Try-On in the Wild via a Stage-Wise Diffusion Transformer Framework Tongchun Zuo et.al. 2508.02807 null
2025-08-04 QuaDreamer: Controllable Panoramic Video Generation for Quadruped Robots Sheng Wu et.al. 2508.02512 null
2025-08-04 PoseGuard: Pose-Guided Generation with Safety Guardrails Kongxin Wang et.al. 2508.02476 null
2025-08-04 Talking Surveys: How Photorealistic Embodied Conversational Agents Shape Response Quality, Engagement, and Satisfaction Matus Krajcovic et.al. 2508.02376 null
2025-08-03 Versatile Transition Generation with Image-to-Video Diffusion Zuhao Yang et.al. 2508.01698 null
2025-08-01 Video Generators are Robot Policies Junbang Liang et.al. 2508.00795 null
2025-08-01 SpA2V: Harnessing Spatial Auditory Cues for Audio-driven Spatially-aware Video Generation Kien T. Pham et.al. 2508.00782 null
2025-08-01 Video Forgery Detection with Optical Flow Residuals and Spatial-Temporal Consistency Xi Xue et.al. 2508.00397 null
2025-08-01 GV-VAD: Exploring Video Generation for Weakly-Supervised Video Anomaly Detection Suhang Cai et.al. 2508.00312 null
2025-08-01 Controllable Pedestrian Video Editing for Multi-View Driving Scenarios via Motion Sequence Danzhen Fu et.al. 2508.00299 null
2025-08-01 HumanSAM: Classifying Human-centric Forgery Videos in Human Spatial, Appearance, and Motion Anomaly Chang Liu et.al. 2507.19924 null
2025-07-31 World Consistency Score: A Unified Metric for Video Generation Quality Akshat Rakheja et.al. 2508.00144 null
2025-07-30 GVD: Guiding Video Diffusion Model for Scalable Video Distillation Kunyang Li et.al. 2507.22360 null
2025-07-29 JWB-DH-V1: Benchmark for Joint Whole-Body Talking Avatar and Speech Generation Version 1 Xinhan Di et.al. 2507.20987 null
2025-07-28 Compositional Video Synthesis by Temporal Object-Centric Learning Adil Kaan Akan et.al. 2507.20855 null
2025-07-27 MagicAnime: A Hierarchically Annotated, Multimodal and Multitasking Dataset with Benchmarks for Cartoon Animation Generation Shuolin Xu et.al. 2507.20368 null
2025-07-26 ChoreoMuse: Robust Music-to-Dance Video Generation with Style Transfer and Beat-Adherent Motion Xuanchen Wang et.al. 2507.19836 null
2025-07-25 ScenePainter: Semantically Consistent Perpetual 3D Scene Generation with Concept Relation Alignment Chong Xia et.al. 2507.19058 null
2025-07-24 Captain Cinema: Towards Short Movie Generation Junfei Xiao et.al. 2507.18634 null
2025-07-24 Adversarial Distribution Matching for Diffusion Distillation Towards Efficient Image and Video Synthesis Yanzuo Lu et.al. 2507.18569 null
2025-07-24 Iwin Transformer: Hierarchical Vision Transformer using Interleaved Windows Simin Huo et.al. 2507.18405 null
2025-07-24 T2VWorldBench: A Benchmark for Evaluating World Knowledge in Text-to-Video Generation Yubin Chen et.al. 2507.18107 null
2025-07-24 Enhancing Scene Transition Awareness in Video Generation via Post-Training Hanwen Shen et.al. 2507.18046 null
2025-07-24 Celeb-DF++: A Large-scale Challenging Video DeepFake Benchmark for Generalizable Forensics Yuezun Li et.al. 2507.18015 null
2025-07-24 Controllable Video Generation: A Survey Yue Ma et.al. 2507.16869 null
2025-07-23 Zero-Shot Dynamic Concept Personalization with Grid-Based LoRA Rameen Abdal et.al. 2507.17963 null
2025-07-23 Bob's Confetti: Phonetic Memorization Attacks in Music and Video Generation Jaechul Roh et.al. 2507.17937 null
2025-07-23 Yume: An Interactive World Generation Model Xiaofeng Mao et.al. 2507.17744 null
2025-07-23 EndoGen: Conditional Autoregressive Endoscopic Video Generation Xinyu Liu et.al. 2507.17388 null
2025-07-22 Livatar-1: Real-Time Talking Heads Generation with Tailored Flow Matching Haiyang Liu et.al. 2507.18649 null
2025-07-22 MotionShot: Adaptive Motion Transfer across Arbitrary Objects for Text-to-Video Generation Yanchen Liu et.al. 2507.16310 null
2025-07-22 PUSA V1.0: Surpassing Wan-I2V with $500 Training Cost by Vectorized Timestep Adaptation Yaofang Liu et.al. 2507.16116 null
2025-07-21 Can Your Model Separate Yolks with a Water Bottle? Benchmarking Physical Commonsense Understanding in Video Generation Models Enes Sanli et.al. 2507.15824 null
2025-07-21 TokensGen: Harnessing Condensed Tokens for Long Video Generation Wenqi Ouyang et.al. 2507.15728 null
2025-07-21 Conditional Video Generation for High-Efficiency Video Compression Fangqiu Yi et.al. 2507.15269 null
2025-07-19 BusterX++: Towards Unified Cross-Modal AI-Generated Content Detection and Explanation with MLLM Haiquan Wen et.al. 2507.14632 null
2025-07-19 Advances in Feed-Forward 3D Reconstruction and View Synthesis: A Survey Jiahui Zhang et.al. 2507.14501 null
2025-07-18 Encapsulated Composition of Text-to-Image and Text-to-Video Models for High-Quality Video Synthesis Tongtong Su et.al. 2507.13753 null
2025-07-17 $\nabla$ NABLA: Neighborhood Adaptive Block-Level Attention Dmitrii Mikhailov et.al. 2507.13546 null
2025-07-17 "PhyWorldBench": A Comprehensive Evaluation of Physical Realism in Text-to-Video Models Jing Gu et.al. 2507.13428 null
2025-07-17 Taming Diffusion Transformer for Real-Time Mobile Video Generation Yushu Wu et.al. 2507.13343 null
2025-07-17 Leveraging Pre-Trained Visual Models for AI-Generated Video Detection Keerthi Veeramachaneni et.al. 2507.13224 null
2025-07-17 LoViC: Efficient Long Video Generation with Context Compression Jiaxiu Jiang et.al. 2507.12952 null
2025-07-17 World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving Yanchen Guan et.al. 2507.12762 null
2025-07-16 EC-Diff: Fast and High-Quality Edge-Cloud Collaborative Inference for Diffusion Models Jiajian Xie et.al. 2507.11980 null
2025-07-15 NarrLV: Towards a Comprehensive Narrative-Centric Evaluation for Long Video Generation Models X. Feng et.al. 2507.11245 null
2025-07-14 Flows and Diffusions on the Neural Manifold Daniel Saragih et.al. 2507.10623 null
2025-07-14 M2DAO-Talker: Harmonizing Multi-granular Motion Decoupling and Alternating Optimization for Talking-head Generation Kui Jiang et.al. 2507.08307 null
2025-07-14 Democratizing High-Fidelity Co-Speech Gesture Video Generation Xu Yang et.al. 2507.06812 null
2025-07-12 $I^{2}$-World: Intra-Inter Tokenization for Efficient Dynamic 4D Scene Forecasting Zhimin Liao et.al. 2507.09144 null
2025-07-11 Taming generative video models for zero-shot optical flow extraction Seungwoo Kim et.al. 2507.09082 null
2025-07-11 Detecting Deepfake Talking Heads from Facial Biometric Anomalies Justin D. Norman et.al. 2507.08917 null
2025-07-11 Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective Hangjie Yuan et.al. 2507.08801 null
2025-07-11 Upsample What Matters: Region-Adaptive Latent Sampling for Accelerated Diffusion Transformers Wongi Jeong et.al. 2507.08422 null
2025-07-11 T-GVC: Trajectory-Guided Generative Video Coding at Ultra-Low Bitrates Zhitao Wang et.al. 2507.07633 null
2025-07-10 Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling Haoyu Wu et.al. 2507.07982 null
2025-07-10 Martian World Models: Controllable Video Synthesis with Physically Accurate 3D Reconstructions Longfei Li et.al. 2507.07978 null
2025-07-10 Scaling RL to Long Videos Yukang Chen et.al. 2507.07966 null
2025-07-09 A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality Mohamed Elmoghany et.al. 2507.07202 null
2025-07-09 Physics-Grounded Motion Forecasting via Equation Discovery for Trajectory-Guided Image-to-Video Generation Tao Feng et.al. 2507.06830 null
2025-07-09 PromptTea: Let Prompts Tell TeaCache the Optimal Threshold Zishen Huang et.al. 2507.06739 null
2025-07-09 Spatial-Temporal Graph Mamba for Music-Guided Dance Video Synthesis Hao Tang et.al. 2507.06689 null
2025-07-09 FIFA: Unified Faithfulness Evaluation Framework for Text-to-Video and Video-to-Text Generation Liqiang Jing et.al. 2507.06523 null
2025-07-09 Omni-Video: Democratizing Unified Video Understanding and Generation Zhiyu Tan et.al. 2507.06119 null
2025-07-09 Tora2: Motion and Appearance Customized Diffusion Transformer for Multi-Entity Video Generation Zhenghao Zhang et.al. 2507.05963 null
2025-07-09 LongAnimation: Long Animation Generation with Dynamic Global-Local Memory Nan Chen et.al. 2507.01945 null
2025-07-08 Bridging Sequential Deep Operator Network and Video Diffusion: Residual Refinement of Spatio-Temporal PDE Solutions Jaewan Park et.al. 2507.06133 null
2025-07-08 MedGen: Unlocking Medical Video Generation by Scaling Granularly-annotated Medical Videos Rongsheng Wang et.al. 2507.05675 null
2025-07-08 StreamDiT: Real-Time Streaming Text-to-Video Generation Akio Kodaira et.al. 2507.03745 null
2025-07-07 HV-MMBench: Benchmarking MLLMs for Human-Centric Video Understanding Yuxuan Cai et.al. 2507.04909 null
2025-07-07 Music2Palette: Emotion-aligned Color Palette Generation via Cross-Modal Representation Learning Jiayun Hu et.al. 2507.04758 null
2025-07-07 Identity-Preserving Text-to-Video Generation Guided by Simple yet Effective Spatial-Temporal Decoupled Representations Yuji Wang et.al. 2507.04705 null
2025-07-06 MambaVideo for Discrete Video Tokenization with Channel-Split Quantization Dawit Mureja Argaw et.al. 2507.04559 null
2025-07-06 CLIP-RL: Surgical Scene Segmentation Using Contrastive Language-Vision Pretraining & Reinforcement Learning Fatmaelzahraa Ali Ahmed et.al. 2507.04317 null
2025-07-05 PresentAgent: Multimodal Agent for Presentation Video Generation Jingwei Shi et.al. 2507.04036 null
2025-07-05 EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation Rang Meng et.al. 2507.03905 null
2025-07-03 RefTok: Reference-Based Tokenization for Video Generation Xiang Fan et.al. 2507.02862 null
2025-07-03 Less is Enough: Training-Free Video Diffusion Acceleration via Runtime-Adaptive Caching Xin Zhou et.al. 2507.02860 null
2025-07-03 AnyI2V: Animating Any Conditional Image with Motion Control Ziye Li et.al. 2507.02857 null
2025-07-03 Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation François Rozet et.al. 2507.02608 null
2025-07-03 RGC-VQA: An Exploration Database for Robotic-Generated Video Quality Assessment Jianing Jin et.al. 2506.23852 null
2025-07-02 SD-Acc: Accelerating Stable Diffusion through Phase-aware Sampling and Hardware Co-Optimizations Zhican Wang et.al. 2507.01309 null
2025-07-02 LLM-based Realistic Safety-Critical Driving Video Generation Yongjie Fu et.al. 2507.01264 null
2025-07-02 AIGVE-MACS: Unified Multi-Aspect Commenting and Scoring Model for AI-Generated Video Evaluation Xiao Liu et.al. 2507.01255 null
2025-07-01 Geometry-aware 4D Video Generation for Robot Manipulation Zeyi Liu et.al. 2507.01099 null
2025-07-01 Populate-A-Scene: Affordance-Aware Human Video Generation Mengyi Shan et.al. 2507.00334 null
2025-07-01 Listener-Rewarded Thinking in VLMs for Image Preferences Alexander Gambashidze et.al. 2506.22832 null
2025-06-30 FreeLong++: Training-Free Long Video Generation via Multi-band SpectralFusion Yu Lu et.al. 2507.00162 null
2025-06-30 Epona: Autoregressive Diffusion World Model for Autonomous Driving Kaiwen Zhang et.al. 2506.24113 null
2025-06-30 VMoBA: Mixture-of-Block Attention for Video Diffusion Models Jianzong Wu et.al. 2506.23858 null
2025-06-30 SynMotion: Semantic-Visual Adaptation for Motion Customized Video Generation Shuai Tan et.al. 2506.23690 null
2025-06-30 ViewPoint: Panoramic Video Generation with Pretrained Diffusion Models Zixun Fang et.al. 2506.23513 null
2025-06-29 Causal-Entity Reflected Egocentric Traffic Accident Video Synthesis Lei-lei Li et.al. 2506.23263 null
2025-06-29 RoboScape: Physics-informed Embodied World Model Yu Shang et.al. 2506.23135 null
2025-06-27 Shape-for-Motion: Precise and Consistent Video Editing with 3D Proxy Yuhao Liu et.al. 2506.22432 null
2025-06-27 RoboEnvision: A Long-Horizon Video Generation Model for Multi-Task Robot Manipulation Liudi Yang et.al. 2506.22007 null
2025-06-27 ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models Hongbo Liu et.al. 2506.21356 null
2025-06-27 DFVEdit: Conditional Delta Flow Vector for Zero-shot Video Editing Lingling Cai et.al. 2506.20967 null
2025-06-26 SmoothSinger: A Conditional Diffusion Model for Singing Voice Synthesis with Multi-Resolution Architecture Kehan Sui et.al. 2506.21478 null
2025-06-26 HieraSurg: Hierarchy-Aware Diffusion Model for Surgical Video Generation Diego Biagini et.al. 2506.21287 null
2025-06-26 Video Virtual Try-on with Conditional Diffusion Transformer Inpainter Cheng Zou et.al. 2506.21270 null
2025-06-26 Consistent Zero-shot 3D Texture Synthesis Using Geometry-aware Diffusion and Temporal Video Models Donggoo Kang et.al. 2506.20946 null
2025-06-25 Video Perception Models for 3D Scene Synthesis Rui Huang et.al. 2506.20601 null
2025-06-25 BrokenVideos: A Benchmark Dataset for Fine-Grained Artifact Localization in AI-Generated Videos Jiahao Lin et.al. 2506.20103 null
2025-06-24 Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation Xingyang Li et.al. 2506.19852 null
2025-06-24 GenHSI: Controllable Generation of Human-Scene Interaction Videos Zekun Li et.al. 2506.19840 null
2025-06-24 SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution Liangbin Xie et.al. 2506.19838 null
2025-06-24 Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Router Yubo Huang et.al. 2506.19833 null
2025-06-24 Training-Free Motion Customization for Distilled Video Generators with Adaptive Test-Time Distillation Jintao Rong et.al. 2506.19348 null
2025-06-23 VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory Runjia Li et.al. 2506.18903 null
2025-06-23 From Virtual Games to Real-World Play Wenqiang Sun et.al. 2506.18901 null
2025-06-23 FilMaster: Bridging Cinematic Principles and Generative AI for Automated Film Generation Kaiyi Huang et.al. 2506.18899 null
2025-06-23 MinD: Unified Visual Imagination and Control via Hierarchical World Models Xiaowei Chi et.al. 2506.18897 null
2025-06-23 OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation Qijun Gan et.al. 2506.18866 null
2025-06-23 Phantom-Data: Towards a General Subject-Consistent Video Generation Dataset Zhuowei Chen et.al. 2506.18851 null
2025-06-23 Matrix-Game: Interactive World Foundation Model Yifan Zhang et.al. 2506.18701 null
2025-06-23 RDPO: Real Data Preference Optimization for Physics Consistency Video Generation Wenxu Qian et.al. 2506.18655 null
2025-06-23 BulletGen: Improving 4D Reconstruction with Bullet-Time Generation Denys Rozumnyi et.al. 2506.18601 null
2025-06-23 VQ-Insight: Teaching VLMs for AI-Generated Video Quality Understanding via Progressive Visual Reinforcement Learning Xuanyu Zhang et.al. 2506.18564 null
2025-06-23 Emergent Temporal Correspondences from Video Diffusion Transformers Jisu Nam et.al. 2506.17220 link
2025-06-21 STAGE: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation Jiamin Wang et.al. 2506.13138 null
2025-06-20 Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition Jiaqi Li et.al. 2506.17201 null
2025-06-20 Seeing What Matters: Generalizable AI-generated Video Detection with Forensic-Oriented Augmentation Riccardo Corvi et.al. 2506.16802 null
2025-06-20 Sekai: A Video Dataset towards World Exploration Zhen Li et.al. 2506.15675 null
2025-06-20 Show-o2: Improved Native Unified Multimodal Models Jinheng Xie et.al. 2506.15564 link
2025-06-19 VideoGAN-based Trajectory Proposal for Automated Vehicles Annajoyce Mariani et.al. 2506.16209 link
2025-06-19 FastInit: Fast Noise Initialization for Temporally Consistent Video Generation Chengyu Bai et.al. 2506.16119 null
2025-06-19 PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models Tianchen Zhao et.al. 2506.16054 null
2025-06-19 Advanced Sign Language Video Generation with Compressed and Quantized Multi-Condition Tokenization Cong Wang et.al. 2506.15980 link
2025-06-18 VideoMAR: Autoregressive Video Generation with Continuous Tokens Hu Yu et.al. 2506.14168 null
2025-06-18 Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models Xuanchi Ren et.al. 2506.09042 link
2025-06-17 Causally Steered Diffusion for Automated Video Counterfactual Generation Nikos Spyrou et.al. 2506.14404 link
2025-06-17 CausalDiffTab: Mixed-Type Causal-Aware Diffusion for Tabular Data Generation Jia-Chen Zhang et.al. 2506.14206 null
2025-06-16 EchoShot: Multi-Shot Portrait Video Generation Jiahao Wang et.al. 2506.15838 null
2025-06-16 UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions Zhucun Xue et.al. 2506.13691 null
2025-06-15 iDiT-HOI: Inpainting-based Hand Object Interaction Reenactment via Video Diffusion Transformer Zhelun Shen et.al. 2506.12847 null
2025-06-13 SignAligner: Harmonizing Complementary Pose Modalities for Coherent Sign Language Generation Xu Wang et.al. 2506.11621 null
2025-06-13 Multimodal Cinematic Video Synthesis Using Text-to-Image and Audio Generation Models Sridhar S et.al. 2506.10005 null
2025-06-12 GenWorld: Towards Detecting AI-generated Real-world Simulation Videos Weiliang Chen et.al. 2506.10975 null
2025-06-12 M4V: Multi-Modal Mamba for Text-to-Video Generation Jiancheng Huang et.al. 2506.10915 null
2025-06-12 GigaVideo-1: Advancing Video Generation via Automatic Feedback with 4 GPU-Hours Fine-Tuning Xiaoyi Bao et.al. 2506.10639 null
2025-06-12 DreamActor-H1: High-Fidelity Human-Product Demonstration Video Generation via Motion-designed Diffusion Transformers Lizhen Wang et.al. 2506.10568 null
2025-06-12 AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation Haoyuan Shi et.al. 2506.10540 null
2025-06-11 AlignHuman: Improving Motion and Fidelity via Timestep-Segment Preference Optimization for Audio-Driven Human Animation Chao Liang et.al. 2506.11144 null
2025-06-11 PlayerOne: Egocentric World Simulator Yuanpeng Tu et.al. 2506.09995 null
2025-06-11 InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions Zhenzhi Wang et.al. 2506.09984 null
2025-06-11 ReSim: Reliable World Simulation for Autonomous Driving Jiazhi Yang et.al. 2506.09981 null
2025-06-11 DGAE: Diffusion-Guided Autoencoder for Efficient Latent Representation Learning Dongxu Liu et.al. 2506.09644 null
2025-06-11 Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation Shanchuan Lin et.al. 2506.09350 null
2025-06-10 Seedance 1.0: Exploring the Boundaries of Video Generation Models Yu Gao et.al. 2506.09113 null
2025-06-10 FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation Zheqi He et.al. 2506.09081 link
2025-06-10 VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks Xinlong Chen et.al. 2506.09079 null
2025-06-10 MagCache: Fast Video Generation with Magnitude-Aware Cache Zehong Ma et.al. 2506.09045 link
2025-06-10 Product of Experts for Visual Generation Yunzhi Zhang et.al. 2506.08894 null
2025-06-10 HunyuanVideo-HOMA: Generic Human-Object Interaction in Multimodal Driven Human Animation Ziyao Huang et.al. 2506.08797 null
2025-06-10 RoboSwap: A GAN-driven Video Diffusion Framework For Unsupervised Robot Arm Swapping Yang Bai et.al. 2506.08632 null
2025-06-10 How Much To Guide: Revisiting Adaptive Guidance in Classifier-Free Guidance Text-to-Vision Diffusion Models Huixuan Zhang et.al. 2506.08351 null
2025-06-10 From Generation to Generalization: Emergent Few-Shot Learning in Video Diffusion Models Pablo Acuaviva et.al. 2506.07280 null
2025-06-09 Seeing Voices: Generating A-Roll Video from Audio with Mirage Aditi Sundararaman et.al. 2506.08279 null
2025-06-09 Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion Xun Huang et.al. 2506.08009 null
2025-06-09 Dreamland: Controllable World Creation with Simulator and Generative Models Sicheng Mo et.al. 2506.08006 null
2025-06-09 Audio-Sync Video Generation with Multi-Stream Temporal Control Shuchen Weng et.al. 2506.08003 null
2025-06-09 Generative Modeling of Weights: Generalization or Memorization? Boya Zeng et.al. 2506.07998 link
2025-06-09 Video Unlearning via Low-Rank Refusal Vector Simone Facchiano et.al. 2506.07891 null
2025-06-09 EgoM2P: Egocentric Multimodal Multitask Pretraining Gen Li et.al. 2506.07886 null
2025-06-09 PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement Teng Hu et.al. 2506.07848 null
2025-06-09 Consistent Video Editing as Flow-Driven Image-to-Video Generation Ge Wang et.al. 2506.07713 null
2025-06-09 Evaluating Robustness in Latent Diffusion Models via Embedding Level Augmentation Boris Martirosyan et.al. 2506.07706 null
2025-06-09 Astraea: A GPU-Oriented Token-wise Acceleration Framework for Video Diffusion Transformers Haosong Liu et.al. 2506.05096 null
2025-06-08 TV-LiVE: Training-Free, Text-Guided Video Editing via Layer Informed Vitality Exploitation Min-Jung Kim et.al. 2506.07205 null
2025-06-08 Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models Sangwon Jang et.al. 2506.07177 null
2025-06-08 Hi-VAE: Efficient Video Autoencoding with Global and Detailed Motion Huaize Liu et.al. 2506.07136 null
2025-06-07 Self-Adapting Improvement Loops for Robotic Learning Calvin Luo et.al. 2506.06658 null
2025-06-06 Restereo: Diffusion stereo video generation and restoration Xingchang Huang et.al. 2506.06023 null
2025-06-06 LLIA -- Enabling Low-Latency Interactive Avatars: Real-Time Audio-Driven Portrait Video Generation with Diffusion Models Haojie Yu et.al. 2506.05806 null
2025-06-06 FPSAttention: Training-Aware FP8 and Sparsity Co-Design for Fast Video Diffusion Akide Liu et.al. 2506.04648 null
2025-06-05 EX-4D: EXtreme Viewpoint 4D Video Synthesis via Depth Watertight Mesh Tao Hu et.al. 2506.05554 null
2025-06-05 ContentV: Efficient Training of Video Generation Models with Limited Compute Wenfeng Lin et.al. 2506.05343 null
2025-06-05 FEAT: Full-Dimensional Efficient Attention Transformer for Medical Video Generation Huihan Wang et.al. 2506.04956 link
2025-06-05 DualX-VSR: Dual Axial Spatial $\times$ Temporal Transformer for Real-World Video Super-Resolution without Motion Compensation Shuo Cao et.al. 2506.04830 null
2025-06-05 Follow-Your-Creation: Empowering 4D Creation through Video Inpainting Yue Ma et.al. 2506.04590 null
2025-06-05 FullDiT2: Efficient In-Context Conditioning for Video Diffusion Transformers Xuanhua He et.al. 2506.04213 null
2025-06-05 SViMo: Synchronized Diffusion for Video and Motion Generation in Hand-object Interaction Scenarios Lingwei Dang et.al. 2506.02444 link
2025-06-04 LayerFlow: A Unified Model for Layer-aware Video Generation Sihui Ji et.al. 2506.04228 null
2025-06-04 UNIC: Unified In-Context Video Editing Zixuan Ye et.al. 2506.04216 null
2025-06-04 DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models Ziyi Wu et.al. 2506.03517 null
2025-06-03 Chipmunk: Training-Free Acceleration of Diffusion Transformers with Dynamic Column-Sparse Deltas Austin Silveria et.al. 2506.03275 null
2025-06-03 IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation Yuanze Lin et.al. 2506.03150 null
2025-06-03 Context as Memory: Scene-Consistent Interactive Long Video Generation with Memory Retrieval Jiwen Yu et.al. 2506.03141 null
2025-06-03 CamCloneMaster: Enabling Reference-based Camera Control for Video Generation Yawen Luo et.al. 2506.03140 null
2025-06-03 AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation Lu Qiu et.al. 2506.03126 null
2025-06-03 DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation Zhengyao Lv et.al. 2506.03123 null
2025-06-03 TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models Chetwin Low et.al. 2506.03099 null
2025-06-03 SG2VID: Scene Graphs Enable Fine-Grained Control for Video Synthesis Ssharvien Kumar Sivakumar et.al. 2506.03082 null
2025-06-03 ORV: 4D Occupancy-centric Robot Video Generation Xiuyu Yang et.al. 2506.03079 link
2025-06-03 Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers Pengtao Chen et.al. 2506.03065 null
2025-06-03 LinkTo-Anime: A 2D Animation Optical Flow Dataset from 3D Model Rendering Xiaoyi Feng et.al. 2506.02733 null
2025-06-03 LumosFlow: Motion-Guided Long Video Generation Jiahao Chen et.al. 2506.02497 null
2025-06-02 Motion aware video generative model Bowen Xue et.al. 2506.02244 null
2025-06-02 Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control Xiao Fu et.al. 2506.01943 null
2025-06-02 OmniV2V: Versatile Video Generation and Editing via Dynamic Content Manipulation Sen Liang et.al. 2506.01801 null
2025-06-02 Many-for-Many: Unify the Training of Multiple Video and Image Generation and Manipulation Tasks Tao Yang et.al. 2506.01758 null
2025-06-02 Respond Beyond Language: A Benchmark for Video Generation in Response to Realistic User Intents Shuting Wang et.al. 2506.01689 null
2025-06-02 LongDWM: Cross-Granularity Distillation for Building a Long-Term Driving World Model Xiaodong Wang et.al. 2506.01546 null
2025-06-02 Towards Scalable Video Anomaly Retrieval: A Synthetic Video-Text Benchmark Shuyu Yang et.al. 2506.01466 null
2025-06-02 DiffuseSlide: Training-Free High Frame Rate Video Generation Diffusion Geunmin Hwang et.al. 2506.01454 null
2025-05-30 MiniMax-Remover: Taming Bad Noise Helps Video Object Removal Bojia Zi et.al. 2505.24873 null
2025-05-30 DreamDance: Animating Character Art via Inpainting Stable Gaussian Worlds Jiaxu Zhang et.al. 2505.24733 null
2025-05-30 UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation Yang-Tian Sun et.al. 2505.24521 null
2025-05-30 Interactive Video Generation via Domain Adaptation Ishaan Rawal et.al. 2505.24253 null
2025-05-30 STORK: Improving the Fidelity of Mid-NFE Sampling for Diffusion and Flow Matching Models Zheng Tan et.al. 2505.24210 link
2025-05-29 MAGREF: Masked Guidance for Any-Reference Video Generation Yufan Deng et.al. 2505.23742 link
2025-05-29 VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos Tingyu Song et.al. 2505.23693 link
2025-05-29 VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models Xiangdong Zhang et.al. 2505.23656 link
2025-05-29 VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation Shi-Xue Zhang et.al. 2505.23484 link
2025-05-29 Dimension-Reduction Attack! Video Generative Models are Experts on Controllable Image Synthesis Hengyuan Cao et.al. 2505.23325 null
2025-05-29 RoboTransfer: Geometry-Consistent Video Diffusion for Robotic Visual Policy Transfer Liu Liu et.al. 2505.23171 null
2025-05-29 Zero-to-Hero: Zero-Shot Initialization Empowering Reference-Based Video Appearance Editing Tongtong Su et.al. 2505.23134 link
2025-05-29 MMGT: Motion Mask Guided Two-Stage Network for Co-Speech Gesture Video Generation Siyuan Wang et.al. 2505.23120 link
2025-05-29 GeoMan: Temporally Consistent Human Geometry Estimation using Image-to-Video Diffusion Gwanghyun Kim et.al. 2505.23085 null
2025-05-29 MOVi: Training-free Text-conditioned Multi-Object Video Generation Aimon Rahman et.al. 2505.22980 null
2025-05-29 HyperMotion: DiT-Based Pose-Guided Human Image Animation of Complex Motions Shuolin Xu et.al. 2505.22977 link
2025-05-29 Minute-Long Videos with Dual Parallelisms Zeqing Wang et.al. 2505.21070 link
2025-05-28 ATI: Any Trajectory Instruction for Controllable Video Generation Angtian Wang et.al. 2505.22944 null
2025-05-28 Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation Zhe Kong et.al. 2505.22647 link
2025-05-28 Q-VDiT: Towards Accurate Quantization and Distillation of Video-Generation Diffusion Transformers Weilun Feng et.al. 2505.22167 null
2025-05-28 FaceEditTalker: Interactive Talking Head Generation with Facial Attribute Editing Guanwen Feng et.al. 2505.22141 null
2025-05-28 LatentMove: Towards Complex Human Movement Video Generation Ashkan Taghipour et.al. 2505.22046 null
2025-05-28 PanoWan: Lifting Diffusion Video Generation Models to 360° with Latitude/Longitude-aware Mechanisms Yifei Xia et.al. 2505.22016 null
2025-05-28 Learning World Models for Interactive Video Generation Taiye Chen et.al. 2505.21996 null
2025-05-28 SageAttention2++: A More Efficient Implementation of SageAttention2 Jintao Zhang et.al. 2505.21136 link
2025-05-28 OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation Shenghai Yuan et.al. 2505.20292 link
2025-05-27 HDRSDR-VQA: A Subjective Video Quality Dataset for HDR and SDR Comparative Evaluation Bowen Chen et.al. 2505.21831 null
2025-05-27 Think Before You Diffuse: LLMs-Guided Physics-Aware Video Generation Ke Zhang et.al. 2505.21653 null
2025-05-27 VideoMarkBench: Benchmarking Robustness of Video Watermarking Zhengyuan Jiang et.al. 2505.21620 link
2025-05-27 Frame In-N-Out: Unbounded Controllable Image-to-Video Generation Boyang Wang et.al. 2505.21491 null
2025-05-27 Dynamic Vision from EEG Brain Recordings: How much does EEG know? Prajwal Singh et.al. 2505.21385 null
2025-05-27 RainFusion: Adaptive Video Generation Acceleration via Multi-Dimensional Visual Redundancy Aiyue Chen et.al. 2505.21036 null
2025-05-27 Frame-Level Captions for Long Video Generation with Complex Multi Scenes Guangcong Zheng et.al. 2505.20827 null
2025-05-27 Learning Generalizable Robot Policy with Human Demonstration Video as a Prompt Xiang Zhu et.al. 2505.20795 null
2025-05-27 Photography Perspective Composition: Towards Aesthetic Perspective Recommendation Lujian Yao et.al. 2505.20655 null
2025-05-27 Incorporating Flexible Image Conditioning into Text-to-Video Diffusion Models without Training Bolin Lai et.al. 2505.20629 null
2025-05-27 Dynamic-I2V: Exploring Image-to-Video Generation Models via Multimodal LLM Peng Liu et.al. 2505.19901 null
2025-05-26 MotionPro: A Precise Motion Controller for Image-to-Video Generation Zhongwei Zhang et.al. 2505.20287 null
2025-05-26 DriveCamSim: Generalizable Camera Simulation via Explicit Camera Modeling for Autonomous Driving Wenchao Sun et.al. 2505.19692 link
2025-05-26 TDVE-Assessor: Benchmarking and Evaluating the Quality of Text-Driven Video Editing with LMMs Juntong Wang et.al. 2505.19535 null
2025-05-26 The Role of Video Generation in Enhancing Data-Limited Action Understanding Wei Li et.al. 2505.19495 null
2025-05-26 Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals Nate Gillman et.al. 2505.19386 null
2025-05-26 DanceTogether! Identity-Preserving Multi-Person Interactive Video Generation Junhao Chen et.al. 2505.18078 null
2025-05-25 From Single Images to Motion Policies via Video-Generation Environment Representations Weiming Zhi et.al. 2505.19306 null
2025-05-25 SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation Shenggan Cheng et.al. 2505.19151 null
2025-05-25 WorldEval: World Model as Real-World Robot Policies Evaluator Yaxuan Li et.al. 2505.19017 null
2025-05-25 Geometry-guided Online 3D Video Synthesis with Multi-View Temporal Consistency Hyunho Ha et.al. 2505.18932 null
2025-05-25 Interspatial Attention for Efficient 4D Human Video Generation Ruizhi Shao et.al. 2505.15800 null
2025-05-24 Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation Shuo Yang et.al. 2505.18875 null
2025-05-24 VORTA: Efficient Video Diffusion via Routing Sparse Attention Wenhao Sun et.al. 2505.18809 link
2025-05-24 DVD-Quant: Data-free Video Diffusion Transformers Quantization Zhiteng Li et.al. 2505.18663 link
2025-05-24 ProphetDWM: A Driving World Model for Rolling Out Future Actions and Videos Xiaodong Wang et.al. 2505.18650 null
2025-05-23 WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions Zizhang Li et.al. 2505.18151 null
2025-05-23 SafeMVDrive: Multi-view Safety-Critical Driving Video Synthesis in the Real World Domain Jiawei Zhou et.al. 2505.17727 null
2025-05-23 Scaling Image and Video Generation via Test-Time Evolutionary Search Haoran He et.al. 2505.17618 null
2025-05-23 InfLVG: Reinforce Inference-Time Consistent Long Video Generation with GRPO Xueji Fang et.al. 2505.17574 link
2025-05-23 Challenger: Affordable Adversarial Driving Video Generation Zhiyuan Xu et.al. 2505.15880 null
2025-05-22 Temporal Differential Fields for 4D Motion Modeling via Image-to-Video Synthesis Xin You et.al. 2505.17333 null
2025-05-22 Training-Free Efficient Video Generation via Dynamic Token Carving Yuechen Zhang et.al. 2505.16864 link
2025-05-22 Action2Dialogue: Generating Character-Centric Narratives from Scene-Level Prompts Taewon Kang et.al. 2505.16819 null
2025-05-22 MAGIC: Motion-Aware Generative Inference via Confidence-Guided LLM Siwei Meng et.al. 2505.16456 null
2025-05-21 Generative AI for Autonomous Driving: A Review Katharina Winter et.al. 2505.15863 null
2025-05-21 AvatarShield: Visual Reinforcement Learning for Human-Centric Video Forgery Detection Zhipei Xu et.al. 2505.15173 null
2025-05-21 CineTechBench: A Benchmark for Cinematographic Technique Understanding and Generation Xinran Wang et.al. 2505.15145 link
2025-05-21 BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation Haiquan Wen et.al. 2505.12620 link
2025-05-21 Video-GPT via Next Clip Diffusion Shaobin Zhuang et.al. 2505.12489 null
2025-05-20 Programmatic Video Prediction Using Large Language Models Hao Tang et.al. 2505.14948 link
2025-05-20 Grouping First, Attending Smartly: Training-Free Acceleration for Diffusion Transformers Sucheng Ren et.al. 2505.14687 link
2025-05-20 LMP: Leveraging Motion Prior in Zero-Shot Video Generation with Diffusion Transformer Changgu Chen et.al. 2505.14167 null
2025-05-20 Hunyuan-Game: Industrial-grade Intelligent Game Creation Model Ruihuang Li et.al. 2505.14135 null
2025-05-20 MTVCrafter: 4D Motion Tokenization for Open-World Human Image Animation Yanbo Ding et.al. 2505.10238 link
2025-05-19 FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance Dian Shao et.al. 2505.13437 null
2025-05-19 MAGI-1: Autoregressive Video Generation at Scale Sand.ai et.al. 2505.13211 link
2025-05-19 DreamGen: Unlocking Generalization in Robot Learning through Neural Trajectories Joel Jang et.al. 2505.12705 link
2025-05-19 Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking Zihan Su et.al. 2505.12667 null
2025-05-18 EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models Hu Yue et.al. 2505.09694 link
2025-05-17 FastCar: Cache Attentive Replay for Fast Auto-Regressive Video Generation on the Edge Xuan Shen et.al. 2505.14709 link
2025-05-17 DraftAttention: Fast Video Diffusion via Low-Resolution Attention Guidance Xuan Shen et.al. 2505.14708 link
2025-05-17 LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation Jiarui Wang et.al. 2505.12098 link
2025-05-17 VFRTok: Variable Frame Rates Video Tokenizer with Duration-Proportional Information Assumption Tianxiong Zhong et.al. 2505.12053 null
2025-05-17 STORYANCHORS: Generating Consistent Multi-Scene Story Frames for Long-Form Narratives Bo Wang et.al. 2505.08350 null
2025-05-16 QVGen: Pushing the Limit of Quantized Video Generative Models Yushi Huang et.al. 2505.11497 null
2025-05-16 Face Consistency Benchmark for GenAI Video Michal Podstawski et.al. 2505.11425 null
2025-05-16 Ophora: A Large-Scale Data-Driven Text-Guided Ophthalmic Surgical Video Generation Model Wei Li et.al. 2505.07449 link
2025-05-15 ToonifyGB: StyleGAN-based Gaussian Blendshapes for 3D Stylized Head Avatars Rui-Yang Ju et.al. 2505.10072 null
2025-05-15 Generating time-consistent dynamics with discriminator-guided image diffusion models Philipp Hess et.al. 2505.09089 null
2025-05-15 Generative Pre-trained Autoregressive Diffusion Transformer Yuan Zhang et.al. 2505.07344 null
2025-05-14 Aquarius: A Family of Industry-Level Video Generation Models for Marketing Scenarios Huafeng Shi et.al. 2505.10584 null
2025-05-13 Generative AI for Autonomous Driving: Frontiers and Opportunities Yuping Wang et.al. 2505.08854 link
2025-05-13 Symbolically-Guided Visual Plan Inference from Uncurated Video Data Wenyan Yang et.al. 2505.08444 null
2025-05-12 DanceGRPO: Unleashing GRPO on Visual Generation Zeyue Xue et.al. 2505.07818 null
2025-05-12 ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models Ozgur Kara et.al. 2505.07652 null
2025-05-11 DAPE: Dual-Stage Parameter-Efficient Fine-Tuning for Consistent Video Editing with Diffusion Models Junhao Xia et.al. 2505.07057 null
2025-05-11 BridgeIV: Bridging Customized Image and Video Generation through Test-Time Autoregressive Identity Propagation Panwen Hu et.al. 2505.06985 null
2025-05-10 Jailbreaking the Text-to-Video Generative Models Jiayang Liu et.al. 2505.06679 null
2025-05-10 ProFashion: Prototype-guided Fashion Video Generation with Multiple Reference Images Xianghao Kong et.al. 2505.06537 null
2025-05-08 3D Scene Generation: A Survey Beichen Wen et.al. 2505.05474 link
2025-05-08 T2VTextBench: A Human Evaluation Benchmark for Textual Control in Video Generation Models Xuyang Guo et.al. 2505.04946 null
2025-05-08 HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation Teng Hu et.al. 2505.04512 null
2025-05-06 Real-Time Person Image Synthesis Using a Flow Matching Model Jiwoo Jeong et.al. 2505.03562 link
2025-05-06 Transformers for Learning on Noisy and Task-Level Manifolds: Approximation and Generalization Insights Zhaiming Shen et.al. 2505.03205 null
2025-05-04 DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization Wenchuan Wang et.al. 2505.02192 null
2025-05-03 GenSync: A Generalized Talking Head Framework for Audio-driven Multi-Subject Lip-Sync using 3D Gaussian Splatting Anushka Agarwal et.al. 2505.01928 null
2025-05-03 PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth Bu Jin et.al. 2505.01729 null
2025-05-02 VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations for Synthetic Videos Zongxia Li et.al. 2505.01481 link
2025-05-02 FreePCA: Integrating Consistency Information across Long-short Frames in Training-free Long Video Generation via Principal Component Analysis Jiangtong Tan et.al. 2505.01172 link
2025-05-01 Controllable Weather Synthesis and Removal with Video Diffusion Models Chih-Hao Lin et.al. 2505.00704 null
2025-05-01 T2VPhysBench: A First-Principles Benchmark for Physical Consistency in Text-to-Video Generation Xuyang Guo et.al. 2505.00337 null
2025-04-30 Direct Motion Models for Assessing Generated Videos Kelsey Allen et.al. 2505.00209 null
2025-04-30 Eye2Eye: A Simple Approach for Monocular-to-Stereo Video Synthesis Michal Geyer et.al. 2505.00135 null
2025-04-30 ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction Qihao Liu et.al. 2504.21855 null
2025-04-30 HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation Haiyang Zhou et.al. 2504.21650 link
2025-04-30 Simple Visual Artifact Detection in Sora-Generated Videos Misora Sugiyama et.al. 2504.21334 null
2025-04-30 Capturing Conditional Dependence via Auto-regressive Diffusion Models Xunpeng Huang et.al. 2504.21314 null
2025-04-29 TesserAct: Learning 4D Embodied World Models Haoyu Zhen et.al. 2504.20995 null
2025-04-29 DDPS: Discrete Diffusion Posterior Sampling for Paths in Layered Graphs Hao Luan et.al. 2504.20754 null
2025-04-29 Advance Fake Video Detection via Vision Transformers Joy Battocchio et.al. 2504.20669 null
2025-04-28 CineVerse: Consistent Keyframe Synthesis for Cinematic Scene Composition Quynh Phung et.al. 2504.19894 null
2025-04-28 DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer Junpeng Jiang et.al. 2504.19614 null
2025-04-26 Audio-Driven Talking Face Video Generation with Joint Uncertainty Learning Yifan Xie et.al. 2504.18810 null
2025-04-26 Stealing Creator's Workflow: A Creator-Inspired Agentic Framework with Iterative Feedback Loop for Improved Scientific Short-form Generation Jong Inn Park et.al. 2504.18805 null
2025-04-25 NoiseController: Towards Consistent Multi-view Video Generation via Noise Decomposition and Collaboration Haotian Dong et.al. 2504.18448 null
2025-04-25 We'll Fix it in Post: Improving Text-to-Video Generation with Neuro-Symbolic Feedback Minkyu Choi et.al. 2504.17180 null
2025-04-24 Dynamic Camera Poses and Where to Find Them Chris Rockwell et.al. 2504.17788 null
2025-04-24 MV-Crafter: An Intelligent System for Music-guided Video Generation Chuer Chen et.al. 2504.17267 null
2025-04-24 DIVE: Inverting Conditional Diffusion Models for Discriminative Tasks Yinqi Li et.al. 2504.17253 link
2025-04-23 Subject-driven Video Generation via Disentangled Identity and Motion Daneul Kim et.al. 2504.17816 null
2025-04-23 BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation Ruotong Wang et.al. 2504.16907 null
2025-04-23 ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance Ying Li et.al. 2504.16464 null
2025-04-23 VideoMark: A Distortion-Free Robust Watermarking Framework for Video Diffusion Models Xuming Hu et.al. 2504.16359 null
2025-04-22 DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment Xiaofan Li et.al. 2504.18576 link
2025-04-22 Survey of Video Diffusion Models: Foundations, Implementations, and Applications Yimu Wang et.al. 2504.16081 link
2025-04-22 Efficient Temporal Consistency in Diffusion-Based Video Editing with Adaptor Modules: A Theoretical Framework Xinyuan Song et.al. 2504.16016 null
2025-04-22 Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning Wang Lin et.al. 2504.15932 null
2025-04-22 Satellite to GroundScape -- Large-scale Consistent Ground View Generation from Satellite Views Ningli Xu et.al. 2504.15786 null
2025-04-22 DiTPainter: Efficient Video Inpainting with Diffusion Transformers Xian Wu et.al. 2504.15661 null
2025-04-21 Solving New Tasks by Adapting Internet Video Knowledge Calvin Luo et.al. 2504.15369 null
2025-04-21 Tiger200K: Manually Curated High Visual Quality Video Dataset from UGC Platform Xianpan Zhou et.al. 2504.15182 null
2025-04-21 DyST-XL: Dynamic Layout Planning and Content Control for Compositional Text-to-Video Generation Weijie He et.al. 2504.15032 null
2025-04-21 Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation Chenjie Cao et.al. 2504.14899 link
2025-04-21 SkyReels-V2: Infinite-length Film Generative Model Guibin Chen et.al. 2504.13074 link
2025-04-21 Packing Input Frame Context in Next-Frame Prediction Models for Video Generation Lvmin Zhang et.al. 2504.12626 link
2025-04-20 Turbo2K: Towards Ultra-Efficient and High-Quality 2K Video Synthesis Jingjing Ren et.al. 2504.14470 null
2025-04-19 SphereDiff: Tuning-free Omnidirectional Panoramic Image and Video Generation via Spherical Latent Representation Minho Park et.al. 2504.14396 link
2025-04-18 Vivid4D: Improving 4D Reconstruction from Monocular Video by Video Inpainting Jiaxin Huang et.al. 2504.11092 null
2025-04-17 Understanding Attention Mechanism in Video Diffusion Models Bingyan Liu et.al. 2504.12027 null
2025-04-17 VideoPanda: Video Panoramic Diffusion with Multi-view Attention Kevin Xie et.al. 2504.11389 null
2025-04-17 StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text Roberto Henschel et.al. 2403.14773 null
2025-04-16 VGDFR: Diffusion-based Video Generation with Dynamic Latent Frame Rate Zhihang Yuan et.al. 2504.12259 link
2025-04-16 Modular-Cam: Modular Dynamic Camera-view Video Generation with LLM Zirui Pan et.al. 2504.12048 null
2025-04-16 The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation Bingjie Gao et.al. 2504.11739 null
2025-04-16 ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation Zongyi Li et.al. 2410.20502 null
2025-04-15 InterAnimate: Taming Region-aware Diffusion Model for Realistic Human Interaction Animation Yukang Lin et.al. 2504.10905 null
2025-04-15 OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding Dianbing Xi et.al. 2504.10825 null
2025-04-14 H-MoRe: Learning Human-centric Motion Representation for Action Analysis Zhanbo Huang et.al. 2504.10676 link
2025-04-14 H3AE: High Compression, High Speed, and High Quality AutoEncoder for Video Diffusion Models Yushu Wu et.al. 2504.10567 null
2025-04-14 FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos Rui Chen et.al. 2504.10358 null
2025-04-14 Aligning Anime Video Generation with Human Feedback Bingwen Zhu et.al. 2504.10044 null
2025-04-14 EquiVDM: Equivariant Video Diffusion Models with Temporally Consistent Noise Chao Liu et.al. 2504.09789 null
2025-04-13 CamMimic: Zero-Shot Image To Camera Motion Personalized Video Generation Using Diffusion Models Pooja Guhan et.al. 2504.09472 null
2025-04-11 Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model Team Seawead et.al. 2504.08685 null
2025-04-11 Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization Jialu Li et.al. 2504.08641 null
2025-04-11 Diffusion Models for Robotic Manipulation: A Survey Rosa Wolf et.al. 2504.08438 null
2025-04-11 EasyGenNet: An Efficient Framework for Audio-Driven Gesture Video Generation Based on Diffusion Model Renda Li et.al. 2504.08344 null
2025-04-11 RealCam-Vid: High-resolution Video Dataset with Dynamic Scenes and Metric-scale Camera Movements Guangcong Zheng et.al. 2504.08212 link
2025-04-11 TokenMotion: Decoupled Motion Control via Token Disentanglement for Human-centric Video Generation Ruineng Li et.al. 2504.08181 null
2025-04-10 Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction Zeren Jiang et.al. 2504.07961 link
2025-04-10 Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos Rundong Luo et.al. 2504.07940 null
2025-04-10 Diffusion Transformers for Tabular Data Time Series Generation Fabrizio Garuti et.al. 2504.07566 link
2025-04-09 EIDT-V: Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free Text-to-Video Generation Diljeet Jagpal et.al. 2504.06861 null
2025-04-09 DyDiT++: Dynamic Diffusion Transformers for Efficient Visual Generation Wangbo Zhao et.al. 2504.06803 link
2025-04-09 RAGME: Retrieval Augmented Video Generation for Enhanced Motion Realism Elia Peruzzo et.al. 2504.06672 null
2025-04-09 Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception Ruotian Peng et.al. 2504.06666 null
2025-04-08 CamContextI2V: Context-aware Controllable Video Generation Luis Denninger et.al. 2504.06022 link
2025-04-08 Physics-aware generative models for turbulent fluid flows through energy-consistent stochastic interpolants Nikolaj T. Mücke et.al. 2504.05852 link
2025-04-07 One-Minute Video Generation with Test-Time Training Karan Dalal et.al. 2504.05298 null
2025-04-07 Video-Bench: Human-Aligned Video Generation Benchmark Hui Han et.al. 2504.04907 null
2025-04-07 Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation Fa-Ting Hong et.al. 2504.02542 link
2025-04-05 Video4DGen: Enhancing Video and 4D Generation through Mutual Optimization Yikai Wang et.al. 2504.04153 link
2025-04-05 Multi-identity Human Image Animation with Structural Video Diffusion Zhenzhi Wang et.al. 2504.04126 null
2025-04-05 Can You Count to Nine? A Human Evaluation Benchmark for Counting Limits in Modern Text-to-Video Models Xuyang Guo et.al. 2504.04051 null
2025-04-05 DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion Maksim Siniukov et.al. 2504.04010 null
2025-04-04 Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models Xuran Ma et.al. 2504.03140 link
2025-04-04 MG-Gen: Single Image to Motion Graphics Generation with Layer Decomposition Takahiro Shirakawa et.al. 2504.02361 null
2025-04-03 How I Warped Your Noise: a Temporally-Correlated Noise Prior for Diffusion Models Pascal Chang et.al. 2504.03072 null
2025-04-03 Morpheus: Benchmarking Physical Reasoning of Video Generative Models with Real Physical Experiments Chenyu Zhang et.al. 2504.02918 null
2025-04-03 Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets Chuning Zhu et.al. 2504.02792 null
2025-04-03 Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model Shengjun Zhang et.al. 2504.02764 null
2025-04-03 ConMo: Controllable Motion Disentanglement and Recomposition for Zero-Shot Motion Transfer Jiayi Gao et.al. 2504.02451 link
2025-04-03 SkyReels-A2: Compose Anything in Video Diffusion Transformers Zhengcong Fei et.al. 2504.02436 link
2025-04-03 OmniCam: Unified Multimodal Video Generation via Camera Control Xiaoda Yang et.al. 2504.02312 null
2025-04-03 VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step Hanyang Wang et.al. 2504.01956 null
2025-04-03 Loong: Generating Minute-level Long Videos with Autoregressive Language Models Yuqing Wang et.al. 2410.02757 null
2025-04-02 Proof of Humanity: A Multi-Layer Network Framework for Certifying Human-Originated Content in an AI-Dominated Internet Sebastian Barros et.al. 2504.03752 null
2025-04-02 WorldPrompter: Traversable Text-to-Scene Generation Zhaoyang Zhang et.al. 2504.02045 null
2025-04-02 Towards Physically Plausible Video Generation via VLM Planning Xindi Yang et.al. 2503.23368 null
2025-04-01 AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction Junhao Cheng et.al. 2504.01014 link
2025-04-01 WorldScore: A Unified Evaluation Benchmark for World Generation Haoyi Duan et.al. 2504.00983 null
2025-04-01 DecoFuse: Decomposing and Fusing the "What", "Where", and "How" for Brain-Inspired fMRI-to-Video Decoding Chong Li et.al. 2504.00432 null
2025-04-01 HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation Boyuan Wang et.al. 2503.24026 null
2025-04-01 On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices Bosung Kim et.al. 2503.23796 link
2025-03-31 GazeLLM: Multimodal LLMs incorporating Human Visual Attention Jun Rekimoto et.al. 2504.00221 null
2025-03-31 Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation Shengqiong Wu et.al. 2503.24379 null
2025-03-31 JointTuner: Appearance-Motion Adaptive Joint Training for Customized Video Generation Fangda Chen et.al. 2503.23951 null
2025-03-31 HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation Kun Liu et.al. 2503.23715 null
2025-03-30 VideoGen-Eval: Agent-based System for Video Generation Evaluation Yuhang Yang et.al. 2503.23452 link
2025-03-30 JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization Kai Liu et.al. 2503.23377 null
2025-03-30 MoCha: Towards Movie-Grade Talking Character Synthesis Cong Wei et.al. 2503.23307 null
2025-03-30 SketchVideo: Sketch-based Video Generation and Editing Feng-Lin Liu et.al. 2503.23284 null
2025-03-29 Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models Prin Phunyaphibarn et.al. 2503.20240 null
2025-03-28 Zero4D: Training-Free 4D Video Generation From Single Video Using Off-the-Shelf Video Diffusion Model Jangho Park et.al. 2503.22622 null
2025-03-28 EchoFlow: A Foundation Model for Cardiac Ultrasound Image and Video Generation Hadrien Reynaud et.al. 2503.22357 null
2025-03-28 CoGen: 3D Consistent Video Generation via Adaptive Conditioning for Autonomous Driving Yishen Ji et.al. 2503.22231 null
2025-03-27 VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models Chi-Pin Huang et.al. 2503.21781 null
2025-03-27 Exploring the Evolution of Physics Cognition in Video Generation: A Survey Minghui Lin et.al. 2503.21765 link
2025-03-27 VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness Dian Zheng et.al. 2503.21755 link
2025-03-27 Audio-driven Gesture Generation via Deviation Feature in the Latent Space Jiahui Chen et.al. 2503.21616 null
2025-03-27 ChatAnyone: Stylized Real-time Portrait Video Generation with Hierarchical Motion Diffusion Model Jinwei Qi et.al. 2503.21144 null
2025-03-26 Protecting Your Video Content: Disrupting Automated Video-based LLM Annotations Haitong Liu et.al. 2503.21824 link
2025-03-26 Synthetic Video Enhances Physical Fidelity in Video Synthesis Qi Zhao et.al. 2503.20822 null
2025-03-26 RecTable: Fast Modeling Tabular Data with Rectified Flow Masane Fuchi et.al. 2503.20731 link
2025-03-26 AccidentSim: Generating Physically Realistic Vehicle Collision Videos from Real-World Accident Reports Xiangwen Zhang et.al. 2503.20654 null
2025-03-26 GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving Lloyd Russell et.al. 2503.20523 null
2025-03-26 VPO: Aligning Text-to-Video Generation Models with Prompt Optimization Jiale Cheng et.al. 2503.20491 link
2025-03-26 Wan: Open and Advanced Large-Scale Video Generative Models WanTeam et.al. 2503.20314 link
2025-03-26 Video Motion Graphs Haiyang Liu et.al. 2503.20218 null
2025-03-26 Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing Jaihoon Kim et.al. 2503.19385 null
2025-03-26 EfficientMT: Efficient Temporal Adaptation for Motion Transfer in Text-to-Video Diffusion Models Yufei Cai et.al. 2503.19369 link
2025-03-25 Zero-Shot Human-Object Interaction Synthesis with Multimodal Priors Yuke Lou et.al. 2503.20118 null
2025-03-25 Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals Stefan Stojanov et.al. 2503.19953 null
2025-03-25 FuXi-RTM: A Physics-Guided Prediction Framework with Radiative Transfer Modeling Qiusheng Huang et.al. 2503.19940 null
2025-03-25 FullDiT: Multi-Task Video Generative Foundation Model with Full Attention Xuan Ju et.al. 2503.19907 null
2025-03-25 Mask $^2$ DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation Tianhao Qi et.al. 2503.19881 null
2025-03-25 AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers Jiazhi Guan et.al. 2503.19824 null
2025-03-25 AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset Haiyu Zhang et.al. 2503.19462 null
2025-03-25 MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation Yukang Lin et.al. 2503.19383 null
2025-03-25 Long-Context Autoregressive Video Modeling with Next-Frame Prediction Yuchao Gu et.al. 2503.19325 link
2025-03-25 Aether: Geometric-Aware Unified World Modeling Aether Team et.al. 2503.18945 null
2025-03-25 AMD-Hummingbird: Towards an Efficient Text-to-Video Model Takashi Isobe et.al. 2503.18559 link
2025-03-25 Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model Yingying Fan et.al. 2503.16942 null
2025-03-24 Video-T1: Test-Time Scaling for Video Generation Fangfu Liu et.al. 2503.18942 null
2025-03-24 Training-free Diffusion Acceleration with Bottleneck Sampling Ye Tian et.al. 2503.18940 null
2025-03-24 EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation Qiang Qu et.al. 2503.18552 null
2025-03-24 Can Text-to-Video Generation help Video-Language Alignment? Luca Zanella et.al. 2503.18507 null
2025-03-24 Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation Dingcheng Zhen et.al. 2503.18429 null
2025-03-24 Resource-Efficient Motion Control for Video Generation via Dynamic Mask Guidance Sicong Feng et.al. 2503.18386 null
2025-03-23 LongDiff: Training-Free Long Video Generation in One Go Zhuoling Li et.al. 2503.18150 null
2025-03-23 TransAnimate: Taming Layer Diffusion to Generate RGBA Video Xuewei Chen et.al. 2503.17934 null
2025-03-22 RDTF: Resource-efficient Dual-mask Training Framework for Multi-frame Animated Sticker Generation Zhiqiang Yuan et.al. 2503.17735 null
2025-03-21 Generating, Fast and Slow: Scalable Parallel Video Generation with Video Interface Networks Bhishma Dedhia et.al. 2503.17539 null
2025-03-21 Position: Interactive Generative Video as Next-Generation Game Engine Jiwen Yu et.al. 2503.17359 null
2025-03-21 AnimatePainter: A Self-Supervised Rendering Framework for Reconstructing Painting Process Junjie Hu et.al. 2503.17029 null
2025-03-21 Enabling Versatile Controls for Video Diffusion Models Xu Zhang et.al. 2503.16983 link
2025-03-21 SV4D 2.0: Enhancing Spatio-Temporal Consistency in Multi-View Video Diffusion for High-Quality 4D Generation Chun-Han Yao et.al. 2503.16396 null
2025-03-20 A Recipe for Generating 3D Worlds From a Single Image Katja Schwarz et.al. 2503.16611 null
2025-03-20 XAttention: Block Sparse Attention with Antidiagonal Scoring Ruyi Xu et.al. 2503.16428 link
2025-03-20 MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance Quanhao Li et.al. 2503.16421 null
2025-03-20 ScalingNoise: Scaling Inference-Time Search for Generating Infinite Videos Haolin Yang et.al. 2503.16400 null
2025-03-20 PoseTraj: Pose-Aware Trajectory Control in Video Diffusion Longbin Ji et.al. 2503.16068 null
2025-03-20 Animating the Uncaptured: Humanoid Mesh Animation with Video Diffusion Models Marc Benedí San Millán et.al. 2503.15996 null
2025-03-20 MiLA: Multi-view Intensive-fidelity Long-term Video Generation World Model for Autonomous Driving Haiguang Wang et.al. 2503.15875 link
2025-03-20 VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling Hyojun Go et.al. 2503.15855 null
2025-03-20 VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention Mingzhe Zheng et.al. 2503.15138 null
2025-03-19 Temporal Regularization Makes Your Video Generator Stronger Harold Haodong Chen et.al. 2503.15417 null
2025-03-19 Ultrasound Image-to-Video Synthesis via Latent Dynamic Diffusion Models Tingxiu Chen et.al. 2503.14966 link
2025-03-18 MusicInfuser: Making Video Diffusion Listen and Dance Susung Hong et.al. 2503.14505 null
2025-03-18 MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation Hongyu Zhang et.al. 2503.14428 null
2025-03-18 Impossible Videos Zechen Bai et.al. 2503.14378 null
2025-03-18 LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models Yu Cheng et.al. 2503.14325 link
2025-03-18 Concat-ID: Towards Universal Identity-Preserving Video Synthesis Yong Zhong et.al. 2503.14151 null
2025-03-18 Fast Autoregressive Video Generation with Diagonal Decoding Yang Ye et.al. 2503.14070 null
2025-03-18 AIGVE-Tool: AI-Generated Video Evaluation Toolkit with Multifaceted Benchmark Xinhao Xiang et.al. 2503.14064 link
2025-03-17 MagicDistillation: Weak-to-Strong Video Distillation for Large-Scale Portrait Few-Step Synthesis Shitong Shao et.al. 2503.13319 null
2025-03-17 Language-guided Open-world Video Anomaly Detection Zihao Liu et.al. 2503.13160 null
2025-03-17 Frame-wise Conditioning Adaptation for Fine-Tuning Diffusion Models in Text-to-Video Prediction Zheyuan Liu et.al. 2503.12953 null
2025-03-17 AUTV: Creating Underwater Video Datasets with Pixel-wise Annotations Quang Trung Truong et.al. 2503.12828 null
2025-03-17 Long-Video Audio Synthesis with Multi-Agent Collaboration Yehang Zhang et.al. 2503.10719 null
2025-03-16 SPC-GS: Gaussian Splatting with Semantic-Prompt Consistency for Indoor Open-World Free-view Synthesis from Sparse Inputs Guibiao Liao et.al. 2503.12535 null
2025-03-16 VMBench: A Benchmark for Perception-Aligned Video Motion Generation Xinran Ling et.al. 2503.10076 link
2025-03-15 ReBot: Scaling Robot Learning with Real-to-Sim-to-Real Robotic Video Synthesis Yu Fang et.al. 2503.14526 null
2025-03-15 A Speech-to-Video Synthesis Approach Using Spatio-Temporal Diffusion for Vocal Tract MRI Paula Andrea Pérez-Toro et.al. 2503.12102 null
2025-03-15 SteerX: Creating Any Camera-Free 3D and 4D Scenes with Geometric Steering Byeongjun Park et.al. 2503.12024 link
2025-03-14 ReCamMaster: Camera-Controlled Generative Rendering from A Single Video Jianhong Bai et.al. 2503.11647 null
2025-03-14 HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models Ziqin Zhou et.al. 2503.11513 null
2025-03-14 TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation Hongxiang Zhao et.al. 2503.11423 null
2025-03-14 Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model Haoyang Huang et.al. 2503.11251 link
2025-03-14 Cross-Modal Learning for Music-to-Music-Video Description Generation Zhuoyuan Mao et.al. 2503.11190 null
2025-03-14 Long Context Tuning for Video Generation Yuwei Guo et.al. 2503.10589 null
2025-03-14 On the Limitations of Vision-Language Models in Understanding Image Transforms Ahmad Mustafa Anis et.al. 2503.09837 null
2025-03-13 CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models Hao He et.al. 2503.10592 null
2025-03-13 CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance Yufan Deng et.al. 2503.10391 null
2025-03-13 Semantic Latent Motion for Portrait Video Generation Qiyuan Zhang et.al. 2503.10096 null
2025-03-13 UVE: Are MLLMs Unified Evaluators for AI-Generated Videos? Yuanxin Liu et.al. 2503.09949 link
2025-03-13 Cosh-DiT: Co-Speech Gesture Video Synthesis via Hybrid Audio-Visual Diffusion Transformers Yasheng Sun et.al. 2503.09942 null
2025-03-13 VideoMerge: Towards Training-free Long Video Generation Siyang Zhang et.al. 2503.09926 null
2025-03-13 WonderVerse: Extendable 3D Scene Generation with Video Generative Models Hao Feng et.al. 2503.09160 null
2025-03-12 Error Analyses of Auto-Regressive Video Diffusion Models: A Unified Framework Jing Wang et.al. 2503.10704 null
2025-03-12 LuciBot: Automated Robot Policy Learning from Generated Videos Xiaowen Qiu et.al. 2503.09871 null
2025-03-12 I2V3D: Controllable image-to-video generation with 3D guidance Zhiyuan Zhang et.al. 2503.09733 null
2025-03-12 Accelerating Diffusion Sampling via Exploiting Local Transition Coherence Shangwen Zhu et.al. 2503.09675 null
2025-03-12 Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k Xiangyu Peng et.al. 2503.09642 link
2025-03-12 PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop Chenyu Li et.al. 2503.09595 link
2025-03-12 Unified Dense Prediction of Video Diffusion Lehan Yang et.al. 2503.09344 null
2025-03-12 Other Vehicle Trajectories Are Also Needed: A Driving World Model Unifies Ego-Other Vehicle Trajectories in Video Latent Space Jian Zhu et.al. 2503.09215 null
2025-03-12 SwapAnyone: Consistent and Realistic Video Synthesis for Swapping Any Person into Any Video Chengshu Zhao et.al. 2503.09154 link
2025-03-12 Reangle-A-Video: 4D Video Generation as Video-to-Video Translation Hyeonho Jeong et.al. 2503.09151 null
2025-03-12 $^R$ FLAV: Rolling Flow matching for infinite Audio Video generation Alex Ergasti et.al. 2503.08307 link
2025-03-12 Object-Centric World Model for Language-Guided Manipulation Youngjoon Jeong et.al. 2503.06170 null
2025-03-11 V2M4: 4D Mesh Animation Reconstruction from a Single Monocular Video Jianqi Chen et.al. 2503.09631 null
2025-03-11 REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder Yitian Zhang et.al. 2503.08665 null
2025-03-11 Tuning-Free Multi-Event Long Video Generation via Synchronized Coupled Sampling Subin Kim et.al. 2503.08605 null
2025-03-11 WISA: World Simulator Assistant for Physics-Aware Text-to-Video Generation Jing Wang et.al. 2503.08153 null
2025-03-11 ObjectMover: Generative Object Movement with Video Prior Xin Yu et.al. 2503.08037 null
2025-03-11 How Can Video Generative AI Transform K-12 Education? Examining Teachers' Perspectives through TPACK and TAM Unggi Lee et.al. 2503.08003 null
2025-03-11 VACE: All-in-One Video Creation and Editing Zeyinzi Jiang et.al. 2503.07598 null
2025-03-11 LightMotion: A Light and Tuning-free Method for Simulating Camera Motion in Video Generation Quanjian Song et.al. 2503.06508 link
2025-03-10 DreamRelation: Relation-Centric Video Customization Yujie Wei et.al. 2503.07602 null
2025-03-10 AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion Mingzhen Sun et.al. 2503.07418 null
2025-03-10 Automated Movie Generation via Multi-Agent CoT Planning Weijia Wu et.al. 2503.07314 link
2025-03-10 From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers Jiacheng Liu et.al. 2503.06923 link
2025-03-09 VideoPhy-2: A Challenging Action-Centric Physical Commonsense Evaluation in Video Generation Hritik Bansal et.al. 2503.06800 null
2025-03-09 TR-DQ: Time-Rotation Diffusion Quantization Yihua Shao et.al. 2503.06564 null
2025-03-09 QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation Junyi Wu et.al. 2503.06545 link
2025-03-09 Generative Video Bi-flow Chen Liu et.al. 2503.06364 null
2025-03-08 Text2Story: Advancing Video Storytelling with Text Guidance Taewon Kang et.al. 2503.06310 null
2025-03-08 ROCM: RLHF on consistency models Shivanshu Shekhar et.al. 2503.06171 null
2025-03-08 VACT: A Video Automatic Causal Testing System and a Benchmark Haotong Yang et.al. 2503.06163 null
2025-03-08 GSV3D: Gaussian Splatting-based Geometric Distillation with Stable Video Diffusion for Single-Image 3D Object Generation Ye Tao et.al. 2503.06136 null
2025-03-08 DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation Runze Zhang et.al. 2503.06053 null
2025-03-08 The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation Aoxiong Yin et.al. 2503.04606 link
2025-03-08 Rethinking Video Tokenization: A Conditioned Diffusion-based Approach Nianzu Yang et.al. 2503.03708 link
2025-03-07 MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice Hongwei Yi et.al. 2503.05978 null
2025-03-07 MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio Xuenan Xu et.al. 2503.05242 link
2025-03-07 Unified Reward Model for Multimodal Understanding and Generation Yibin Wang et.al. 2503.05236 null
2025-03-07 Raccoon: Multi-stage Diffusion Training with Coarse-to-Fine Curating Videos Zhiyu Tan et.al. 2502.21314 null
2025-03-06 Toward Lightweight and Fast Decoders for Diffusion Models in Image and Video Generation Alexey Buzovkin et.al. 2503.04871 link
2025-03-06 FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video Yue Gao et.al. 2503.04720 null
2025-03-06 What Are You Doing? A Closer Look at Controllable Human Video Generation Emanuele Bugliarello et.al. 2503.04666 null
2025-03-05 ProReflow: Progressive Reflow with Decomposed Velocity Lei Ke et.al. 2503.04824 null
2025-03-05 GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control Xuanchi Ren et.al. 2503.03751 link
2025-03-05 DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance Zhao Yang et.al. 2503.03689 link
2025-03-05 High-Quality Virtual Single-Viewpoint Surgical Video: Geometric Autocalibration of Multiple Cameras in Surgical Lights Yuna Kato et.al. 2503.03558 link
2025-03-05 Video Super-Resolution: All You Need is a Video Diffusion Model Zhihao Zhan et.al. 2503.03355 null
2025-03-04 GRADEO: Towards Human-Like Evaluation for Text-to-Video Generation via Multi-Step Reasoning Zhun Mou et.al. 2503.02341 null
2025-03-04 Unified Video Action Model Shuang Li et.al. 2503.00200 null
2025-03-03 VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation Wenhao Wang et.al. 2503.01739 link
2025-03-03 VideoHandles: Editing 3D Object Compositions in Videos Using Video Generative Priors Juil Koo et.al. 2503.01107 null
2025-03-03 TransVDM: Motion-Constrained Video Diffusion Model for Transparent Video Synthesis Menghao Li et.al. 2502.19454 null
2025-03-02 Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think Jie Tian et.al. 2503.00948 link
2025-03-01 Learning to Animate Images from A Few Videos to Portray Delicate Human Actions Haoxin Li et.al. 2503.00276 null
2025-02-28 Training-free and Adaptive Sparse Attention for Efficient Long Video Generation Yifei Xia et.al. 2502.21079 null
2025-02-28 HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models Xiao Wang et.al. 2502.20811 null
2025-02-28 WorldModelBench: Judging Video Generation Models As World Models Dacheng Li et.al. 2502.20694 null
2025-02-28 RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers Ke Cao et.al. 2502.14377 null
2025-02-27 Mobius: Text to Seamless Looping Video Generation via Latent Shift Xiuli Bi et.al. 2502.20307 link
2025-02-27 FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute Sotiris Anagnostidis et.al. 2502.20126 null
2025-02-27 C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation Yuhao Li et.al. 2502.19868 link
2025-02-26 Online Pseudo-average Shifting Attention (PASA) for Robust Low-precision LLM Inference: Algorithms and Numerical Analysis Long Cheng et.al. 2503.01873 null
2025-02-26 Glad: A Streaming Scene Generator for Autonomous Driving Bin Xie et.al. 2503.00045 null
2025-02-26 FLAP: Fully-controllable Audio-driven Portrait Video Generation through 3D head conditioned diffusion model Lingzhou Mu et.al. 2502.19455 null
2025-02-25 SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference Jintao Zhang et.al. 2502.18137 link
2025-02-25 A Survey: Spatiotemporal Consistency in Video Generation Zhiyu Yin et.al. 2502.17863 null
2025-02-24 X-Dancer: Expressive Music to Human Dance Video Generation Zeyuan Chen et.al. 2502.17414 null
2025-02-24 VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing Xiangpeng Yang et.al. 2502.17258 null
2025-02-24 Diffusion Models for Tabular Data: Challenges, Current Progress, and Future Directions Zhong Li et.al. 2502.17119 link
2025-02-21 RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers Min Zhao et.al. 2502.15894 null
2025-02-21 VaViM and VaVAM: Autonomous Driving through Video Generative Modeling Florent Bartoccioni et.al. 2502.15672 link
2025-02-21 LaM-SLidE: Latent Space Modeling of Spatial Dynamical Systems via Linked Entities Florian Sestak et.al. 2502.12128 link
2025-02-20 Hardware-Friendly Static Quantization Method for Video Diffusion Transformers Sanghyun Yi et.al. 2502.15077 null
2025-02-20 LAVID: An Agentic LVLM Framework for Diffusion-Generated Video Detection Qingyuan Liu et.al. 2502.14994 null
2025-02-20 Improving the Diffusability of Autoencoders Ivan Skorokhodov et.al. 2502.14831 null
2025-02-20 Designing Parameter and Compute Efficient Diffusion Transformers using Distillation Vignesh Sundaresha et.al. 2502.14226 null
2025-02-19 FantasyID: Face Knowledge Enhanced ID-Preserving Video Generation Yunpeng Zhang et.al. 2502.13995 link
2025-02-19 LLMPopcorn: An Empirical Study of LLMs as Assistants for Popular Micro-video Generation Junchen Fu et.al. 2502.12945 null
2025-02-18 VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation Xinlong Chen et.al. 2502.12782 link
2025-02-18 MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation Sihyun Yu et.al. 2502.12632 null
2025-02-17 DLFR-VAE: Dynamic Latent Frame Rate VAE for Video Generation Zhihang Yuan et.al. 2502.11897 link
2025-02-17 Object-Centric Image to Video Generation with Language Guidance Angel Villar-Corrales et.al. 2502.11655 null
2025-02-17 Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model Guoqing Ma et.al. 2502.10248 link
2025-02-17 Magic 1-For-1: Generating One Minute Video Clips within One Minute Hongwei Yi et.al. 2502.07701 link
2025-02-16 MaskFlow: Discrete Flows For Flexible and Efficient Long Video Generation Michael Fuest et.al. 2502.11234 null
2025-02-16 Phantom: Subject-consistent video generation via cross-modal alignment Lijie Liu et.al. 2502.11079 null
2025-02-15 SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers Di Qiu et.al. 2502.10841 link
2025-02-14 RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control Teng Li et.al. 2502.10059 null
2025-02-14 GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation Hongyin Zhang et.al. 2502.09268 null
2025-02-13 Enhance-A-Video: Better Generated Video for Free Yang Luo et.al. 2502.07508 link
2025-02-12 CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation Qinghe Wang et.al. 2502.08639 null
2025-02-12 FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis Wonjoon Jin et.al. 2502.08244 null
2025-02-12 Learning Human Skill Generators at Key-Step Levels Yilu Wu et.al. 2502.08234 null
2025-02-12 AnyCharV: Bootstrap Controllable Character Video Generation with Fine-to-Coarse Guidance Zhao Wang et.al. 2502.08189 null
2025-02-12 Next Block Prediction: Video Generation via Semi-Autoregressive Modeling Shuhuai Ren et.al. 2502.07737 null
2025-02-12 VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation Sixiao Zheng et.al. 2502.07531 null
2024-05-07 LLM-grounded Video Diffusion Models Long Lian et.al. 2309.17444 null
2023-10-12 Echocardiography video synthesis from end diastolic semantic map via diffusion model Phi Nguyen Van et.al. 2310.07131 null
2023-05-30 Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising Fu-Yun Wang et.al. 2305.18264 null
2023-03-21 Latent Video Diffusion Models for High-Fidelity Long Video Generation Yingqing He et.al. 2211.13221 null
2022-07-12 Fast-Vid2Vid: Spatial-Temporal Compression for Video-to-Video Synthesis Long Zhuo et.al. 2207.05049 null
2021-12-02 Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image Andrew Liu et.al. 2012.09855 null
2020-11-10 Audeo: Audio Generation for a Silent Performance Video Kun Su et.al. 2006.14348 null
2019-10-15 TruNet: Short Videos Generation from Long Videos via Story-Preserving Truncation Fan Yang et.al. 1910.05899 null

(back to top)

TryOn

Publish Date Title Authors PDF Code
2025-12-08 Comparing quantum channels using Hermitian-preserving trace-preserving linear maps: A physically meaningful approach Arindam Mitra et.al. 2512.07822 null
2025-12-08 Training-free Clothing Region of Interest Self-correction for Virtual Try-On Shengjie Lu et.al. 2512.07126 null
2025-12-08 VRSA: Jailbreaking Multimodal Large Language Models through Visual Reasoning Sequential Attack Shiji Zhao et.al. 2512.05853 null
2025-12-05 Where to Fly, What to Send: Communication-Aware Aerial Support for Ground Robots Harshil Suthar et.al. 2512.06207 null
2025-12-05 Learning High-Fidelity Cloth Animation via Skinning-Free Image Transfer Rong Wang et.al. 2512.05593 null
2025-12-04 Not All Birds Look The Same: Identity-Preserving Generation For Birds Aaron Sun et.al. 2512.04485 null
2025-12-03 Tuning for TraceTarnish: Techniques, Trends, and Testing Tangible Traits Robert Dilworth et.al. 2512.03465 null
2025-12-02 Methods in complete intersections in corank one Satya Mandal et.al. 2512.02373 null
2025-11-30 Asymptotic and nonlinear geometries of Banach spaces and their interactions Florent P. Baudier et.al. 2512.00817 null
2025-11-29 Password-Activated Shutdown Protocols for Misaligned Frontier Agents Kai Williams et.al. 2512.03089 null
2025-11-29 Kicking for Goal or Touch? An Expected Points Framework for Penalty Decisions in Rugby Union Kenny Watts et.al. 2512.00312 null
2025-11-26 On a form of intrinsic optimism in Set Theory M. Muñoz Pérez et.al. 2512.02045 null
2025-11-24 Systematic assessment of the Hubble tension via Bayesian jackknife testing Thomas Hughes et.al. 2511.19341 null
2025-11-24 Solar-GECO: Perovskite Solar Cell Property Prediction with Geometric-Aware Co-Attention Lucas Li et.al. 2511.19263 null
2025-11-24 Can we detect treatment effect waning from time-to-event data? Eni Musta et.al. 2511.19096 null
2025-11-24 Eevee: Towards Close-up High-resolution Video-based Virtual Try-on Jianhao Zeng et.al. 2511.18957 null
2025-11-24 Rethinking Garment Conditioning in Diffusion-based Virtual Try-On Kihyun Na et.al. 2511.18775 null
2025-11-23 Projective deduction of the non-trivial first integral to the Euler problem: an explicit computation Gabriella Pinzari et.al. 2511.18569 null
2025-11-22 Towards a General Framework for HTN Modeling with LLMs Israel Puerta-Merino et.al. 2511.18165 null
2025-11-22 Active Learning with Selective Time-Step Acquisition for PDEs Yegon Kim et.al. 2511.18107 null
2025-11-21 Pre-cache: A Microarchitectural Solution to prevent Meltdown and Spectre Subhash Sethumurugan et.al. 2511.17726 null
2025-11-20 Data-Driven Stellar Spectral Modelling with GSPICE Douglas P. Finkbeiner et.al. 2511.16754 null
2025-11-19 UniFit: Towards Universal Virtual Try-on with MLLM-Guided Semantic Alignment Wei Zhang et.al. 2511.15831 null
2025-11-19 Orion: A Unified Visual Agent for Multimodal Perception, Advanced Visual Reasoning and Execution N Dinesh Reddy et.al. 2511.14210 null
2025-11-18 A System Dynamics Approach to Evaluating Sludge Management Strategies in Vinasse Treatment: Cost-Benefit Analysis and Scenario Assessment Agustin Olivares et.al. 2511.14607 null
2025-11-18 PFAvatar: Pose-Fusion 3D Personalized Avatar Reconstruction from Real-World Outfit-of-the-Day Photos Dianbing Xi et.al. 2511.12935 null
2025-11-17 Multi-Objective Statistical Model Checking using Lightweight Strategy Sampling (extended version) Pedro R. D'Argenio et.al. 2511.13460 null
2025-11-16 Nonlocal action in Everettian Quantum Mechanics Mordecai Waegell et.al. 2511.12403 null
2025-11-16 RefVTON: person-to-person Try on with Additional Unpaired Visual Reference Liuzhuozheng Li et.al. 2511.00956 null
2025-11-14 Learning Fair Representations with Kolmogorov-Arnold Networks Amisha Priyadarshini et.al. 2511.11767 null
2025-11-14 Discovering Meaningful Units with Visually Grounded Semantics from Image Captions Melika Behjati et.al. 2511.11262 null
2025-11-14 Power Ensemble Aggregation for Improved Extreme Event AI Prediction Julien Collard et.al. 2511.11170 null
2025-11-13 Optimal Welfare in Noncooperative Network Formation under Attack Natan Doubez et.al. 2511.10845 null
2025-11-13 Generating optimal Gravitational-Wave template banks with metric-preserving autoencoders Giovanni Cabass et.al. 2511.10466 null
2025-11-12 Efficiently Transforming Neural Networks into Decision Trees: A Path to Ground Truth Explanations with RENTT Helena Monke et.al. 2511.09299 null
2025-11-12 Food as Soft Power: Taiwanese Gastrodiplomacy on Social Media and Algorithmic Suppression Andrew Yen Chang et.al. 2511.05729 null
2025-11-10 Detecting Suicidal Ideation in Text with Interpretable Deep Learning: A CNN-BiGRU with Attention Mechanism Mohaiminul Islam Bhuiyan et.al. 2511.08636 null
2025-11-10 On maximizing private neighbors in graphs Stephen T. Hedetniemi et.al. 2511.07248 null
2025-11-06 Benchmark Designers Should "Train on the Test Set" to Expose Exploitable Non-Visual Shortcuts Ellis Brown et.al. 2511.04655 null
2025-11-06 IntelliProof: An Argumentation Network-based Conversational Helper for Organized Reflection Kaveh Eskandari Miandoab et.al. 2511.04528 null
2025-11-06 The truth is no diaper: Human and AI-generated associations to emotional words Špela Vintar et.al. 2511.04077 null
2025-11-04 Effective Test-Time Scaling of Discrete Diffusion through Iterative Refinement Sanghyun Lee et.al. 2511.05562 null
2025-11-04 FLAME: Flexible and Lightweight Biometric Authentication Scheme in Malicious Environments Fuyi Wang et.al. 2511.02176 null
2025-11-03 Confounding Factors in Relating Model Performance to Morphology Wessel Poelman et.al. 2511.01380 null
2025-11-02 AGRAG: Advanced Graph-based Retrieval-Augmented Generation for LLMs Yubo Wang et.al. 2511.05549 null
2025-11-01 Sparse and nonparametric estimation of equations governing dynamical systems with applications to biology G. Pillonetto et.al. 2511.00579 null
2025-10-31 Quantum-dot single photon source performance with off-resonant pulse preparation schemes Gavin Crowder et.al. 2511.00243 null
2025-10-31 EL-MIA: Quantifying Membership Inference Risks of Sensitive Entities in LLMs Ali Satvaty et.al. 2511.00192 null
2025-10-31 Consistency Training Helps Stop Sycophancy and Jailbreaks Alex Irpan et.al. 2510.27062 null
2025-10-30 Ring-polymer instanton theory for tunneling between asymmetric wells Marit R. Fiechter et.al. 2510.26592 null
2025-10-29 Heuristic Quantum Advantage with Peaked Circuits Hrant Gharibyan et.al. 2510.25838 null
2025-10-29 Tackling the Algorithmic Control Crisis -- the Technical, Legal, and Ethical Challenges of Research into Algorithmic Agents B. Bodo et.al. 2510.25337 null
2025-10-16 ART-VITON: Measurement-Guided Latent Diffusion for Artifact-Free Virtual Try-On Junseo Park et.al. 2509.25749 null
2025-10-09 Once Is Enough: Lightweight DiT-Based Video Virtual Try-On via One-Time Garment Appearance Injection Yanjie Pan et.al. 2510.07654 null
2025-10-06 AvatarVTON: 4D Virtual Try-On for Animatable Avatars Zicheng Jiang et.al. 2510.04822 null
2025-10-03 DiT-VTON: Diffusion Transformer Framework for Unified Multi-Category Virtual Try-On and Virtual Try-All with Integrated Image Editing Qi Li et.al. 2510.04797 null
2025-10-01 Virtual Fashion Photo-Shoots: Building a Large-Scale Garment-Lookbook Dataset Yannick Hauri et.al. 2510.00633 null
2025-09-29 UP2You: Fast Reconstruction of Yourself from Unconstrained Photo Collections Zeyu Cai et.al. 2509.24817 null
2025-09-29 ControlHair: Physically-based Video Diffusion for Controllable Dynamic Hair Rendering Weikai Lin et.al. 2509.21541 null
2025-09-24 InstructVTON: Optimal Auto-Masking and Natural-Language-Guided Interactive Style Control for Inpainting-Based Virtual Try-On Julien Han et.al. 2509.20524 null
2025-09-24 Efficient Encoder-Free Pose Conditioning and Pose Control for Virtual Try-On Qi Li et.al. 2509.20343 null
2025-09-23 Clothing agnostic Pre-inpainting Virtual Try-ON Sehyun Kim et.al. 2509.17654 null
2025-09-21 SemanticGarment: Semantic-Controlled Generation and Editing of 3D Gaussian Garments Ruiyan Wang et.al. 2509.16960 null
2025-09-16 DEFT-VTON: Efficient Virtual Try-On with Consistent Generalised H-Transform Xingzi Xu et.al. 2509.13506 null
2025-09-05 LUIVITON: Learned Universal Interoperable VIrtual Try-ON Cong Cao et.al. 2509.05030 null
2025-09-04 Virtual Fitting Room: Generating Arbitrarily Long Videos of Virtual Try-On from a Single Image -- Technical Preview Jun-Kun Chen et.al. 2509.04450 null
2025-09-04 Towards High-Fidelity, Identity-Preserving Real-Time Makeup Transfer: Decoupling Style Generation Lydia Kin Ching Chau et.al. 2509.02445 null
2025-08-30 IC-Custom: Diverse Image Customization via In-Context Learning Yaowei Li et.al. 2507.01926 null
2025-08-28 Dress&Dance: Dress up and Dance as You Like It - Technical Preview Jun-Kun Chen et.al. 2508.21070 null
2025-08-28 FastFit: Accelerating Multi-Reference Virtual Try-On via Cacheable Diffusion Models Zheng Chong et.al. 2508.20586 null
2025-08-25 JCo-MVTON: Jointly Controllable Multi-Modal Diffusion Transformer for Mask-Free Virtual Try-on Aowen Wang et.al. 2508.17614 null
2025-08-19 OmniTry: Virtual Try-On Anything without Masks Yutong Feng et.al. 2508.13632 null
2025-08-16 DualFit: A Two-Stage Virtual Try-On via Warping and Synthesis Minh Tran et.al. 2508.12131 null
2025-08-12 StyleTailor: Towards Personalized Fashion Styling via Hierarchical Negative Feedback Hongbo Ma et.al. 2508.06555 null
2025-08-11 MuGa-VTON: Multi-Garment Virtual Try-On via Diffusion Transformers with Prompt Customization Ankan Deria et.al. 2508.08488 null
2025-08-11 Undress to Redress: A Training-Free Framework for Virtual Try-On Zhiying Li et.al. 2508.07680 null
2025-08-07 One Model For All: Partial Diffusion for Unified Try-On and Try-Off in Any Pose Jinxi Liu et.al. 2508.04559 null
2025-08-06 Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off Seungyong Lee et.al. 2508.04825 null
2025-08-06 Two-Way Garment Transfer: Unified Diffusion Framework for Dressing and Undressing Synthesis Angang Zhang et.al. 2508.04551 null
2025-08-06 FFHQ-Makeup: Paired Synthetic Makeup Dataset with Facial Consistency Across Multiple Styles Xingchao Yang et.al. 2508.03241 null
2025-08-04 DreamVVT: Mastering Realistic Video Virtual Try-On in the Wild via a Stage-Wise Diffusion Transformer Framework Tongchun Zuo et.al. 2508.02807 null
2025-07-29 From Gallery to Wrist: Realistic 3D Bracelet Insertion in Videos Chenjian Gao et.al. 2507.20331 null
2025-07-29 Dynamic Try-On: Taming Video Virtual Try-on with Dynamic Attention Mechanism Jun Zheng et.al. 2412.09822 null
2025-07-21 FW-VTON: Flattening-and-Warping for Person-to-Person Virtual Try-on Zheng Wang et.al. 2507.16010 null
2025-07-20 OmniVTON: Training-Free Universal Virtual Try-On Zhaotong Yang et.al. 2507.15037 null
2025-07-11 Scalable and Realistic Virtual Try-on Application for Foundation Makeup with Kubelka-Munk Theory Hui Pang et.al. 2507.07333 null
2025-07-08 TalkFashion: Intelligent Virtual Try-On Assistant Based on Multimodal Large Language Model Yujie Hu et.al. 2507.05790 null
2025-07-02 FreeLoRA: Enabling Training-Free LoRA Fusion for Autoregressive Multi-Subject Personalization Peng Zheng et.al. 2507.01792 null
2025-06-30 KiseKloset: Comprehensive System For Outfit Retrieval, Recommendation, And Try-On Thanh-Tung Phan-Nguyen et.al. 2506.23471 null
2025-06-29 DiffFit: Disentangled Garment Warping and Texture Refinement for Virtual Try-On Xiang Xu et.al. 2506.23295 null
2025-06-26 Video Virtual Try-on with Conditional Diffusion Transformer Inpainter Cheng Zou et.al. 2506.21270 null
2025-06-23 InstructAttribute: Fine-grained Object Attributes editing with Instruction Xingxi Yin et.al. 2505.00751 null
2025-06-14 Real-Time Per-Garment Virtual Try-On with Temporal Consistency for Loose-Fitting Garments Zaiqiang Wu et.al. 2506.12348 null
2025-06-13 HF-VTON: High-Fidelity Virtual Try-On via Consistent Geometric and Semantic Alignment Ming Meng et.al. 2505.19638 null
2025-06-12 Low-Barrier Dataset Collection with Real Human Body for Interactive Per-Garment Virtual Try-On Zaiqiang Wu et.al. 2506.10468 null
2025-06-06 ChronoTailor: Harnessing Attention Guidance for Fine-Grained Video Virtual Try-On Jinjuan Wang et.al. 2506.05858 null
2025-06-02 OmniV2V: Versatile Video Generation and Editing via Dynamic Content Manipulation Sen Liang et.al. 2506.01801 null
2025-06-01 DS-VTON: High-Quality Virtual Try-on via Disentangled Dual-Scale Generation Xianbing Sun et.al. 2506.00908 null
2025-05-29 VITON-DRR: Details Retention Virtual Try-on via Non-rigid Registration Ben Li et.al. 2505.23439 null
2025-05-28 MagicTryOn: Harnessing Diffusion Transformer for Garment-Preserving Video Virtual Try-on Guangyuan Li et.al. 2505.21325 null
2025-05-27 Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals Davide Lobba et.al. 2505.21062 null
2025-05-26 VTBench: Comprehensive Benchmark Suite Towards Real-World Virtual Try-on Models Hu Xiaobin et.al. 2505.19571 null
2025-05-22 Pursuing Temporal-Consistent Video Virtual Try-On via Dynamic Pose Interaction Dong Li et.al. 2505.16980 null
2025-05-22 Incorporating Visual Correspondence into Diffusion Model for Virtual Try-On Siqi Wan et.al. 2505.16977 link
2025-05-15 Single View Garment Reconstruction Using Diffusion Mapping Via Pattern Coordinates Ren Li et.al. 2504.08353 link
2025-04-29 Creating Your Editable 3D Photorealistic Avatar with Tetrahedron-constrained Gaussian Splatting Hanxi Liu et.al. 2504.20403 null
2025-04-24 FashionM3: Multimodal, Multitask, and Multiround Fashion Assistant based on Unified Vision-Language Model Kaicheng Pang et.al. 2504.17826 null
2025-04-24 3DV-TON: Textured 3D-Guided Consistent Video Try-on via Diffusion Models Min Wei et.al. 2504.17414 null
2025-04-21 Shape-Guided Clothing Warping for Virtual Try-On Xiaoyu Han et.al. 2504.15232 link
2025-04-21 Insert Anything: Image Insertion via In-Context Editing in DiT Wensong Song et.al. 2504.15009 null
2025-04-19 Flux Already Knows -- Activating Subject-Driven Image Generation without Training Hao Kang et.al. 2504.11478 link
2025-04-19 Concat-ID: Towards Universal Identity-Preserving Video Synthesis Yong Zhong et.al. 2503.14151 null
2025-04-18 Fashion-RAG: Multimodal Fashion Image Editing via Retrieval-Augmented Generation Fulvio Sanguigni et.al. 2504.14011 null
2025-04-17 Enhancing Person-to-Person Virtual Try-On with Multi-Garment Virtual Try-Off Riza Velioglu et.al. 2504.13078 link
2025-04-15 ReZero: Enhancing LLM search ability by trying one-more-time Alan Dao et.al. 2504.11001 null
2025-04-11 VTON 360: High-Fidelity Virtual Try-On from Any Viewing Direction Zijian He et.al. 2503.12165 null
2025-04-04 From Keypoints to Realism: A Realistic and Accurate Virtual Try-on Network from 2D Images Maliheh Toozandehjani et.al. 2504.03807 null
2025-04-03 MAD: Makeup All-in-One with Cross-Domain Diffusion Model Bo-Kai Ruan et.al. 2504.02545 null
2025-04-01 Diffusion Model-Based Size Variable Virtual Try-On Technology and Evaluation Method Shufang Zhang et.al. 2504.00562 null
2025-03-26 ITA-MDT: Image-Timestep-Adaptive Masked Diffusion Transformer Framework for Image-Based Virtual Try-On Ji Woo Hong et.al. 2503.20418 null
2025-03-26 Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks Hailong Guo et.al. 2501.15891 null
2025-03-25 Exploring Disentangled and Controllable Human Image Synthesis: From End-to-End to Stage-by-Stage Zhengwentai Sun et.al. 2503.19486 null
2025-03-20 Shining Yourself: High-Fidelity Ornaments Virtual Try-on with Diffusion Model Yingmao Miao et.al. 2503.16065 null
2025-03-18 Limb-Aware Virtual Try-On Network with Progressive Clothing Warping Shengping Zhang et.al. 2503.14074 link
2025-03-16 Progressive Limb-Aware Virtual Try-On Xiaoyu Han et.al. 2503.12588 link
2025-03-15 ITVTON: Virtual Try-On Diffusion Transformer Based on Integrated Image and Text Haifeng Ni et.al. 2501.16757 null
2025-03-11 MF-VITON: High-Fidelity Mask-Free Virtual Try-On with Minimal Input Zhenchen Wan et.al. 2503.08650 null
2025-03-11 RealVVT: Towards Photorealistic Video Virtual Try-on via Spatio-Temporal Consistency Siqi Li et.al. 2501.08682 null
2025-02-20 CrossVTON: Mimicking the Logic Reasoning on Cross-category Virtual Try-on guided by Tri-zone Priors Donghao Luo et.al. 2502.14373 null
2025-02-05 Dress-1-to-3: Single Image to Simulation-Ready 3D Outfit with Diffusion Prior and Differentiable Physics Xuan Li et.al. 2502.03449 null
2025-02-03 MFP-VTON: Enhancing Mask-Free Person-to-Person Virtual Try-On via Diffusion Transformer Le Shen et.al. 2502.01626 null
2025-01-26 IPVTON: Image-based 3D Virtual Try-on with Image Prompt Adapter Xiaojing Zhong et.al. 2501.15616 null
2025-01-26 Cross-Cultural Fashion Design via Interactive Large Language Models and Diffusion Models Spencer Ramsey et.al. 2501.15571 null
2025-01-20 EfficientVITON: An Efficient Virtual Try-On Model using Optimized Diffusion Process Mostafa Atef et.al. 2501.11776 null
2025-01-20 CatV2TON: Taming Diffusion Transformers for Vision-Based Virtual Try-On with Temporal Concatenation Zheng Chong et.al. 2501.11325 link
2025-01-17 Disharmony: Forensics using Reverse Lighting Harmonization Philip Wootaek Shin et.al. 2501.10212 null
2025-01-12 ODPG: Outfitting Diffusion with Pose Guided Condition Seohyun Lee et.al. 2501.06769 null
2025-01-10 MC-VTON: Minimal Control Virtual Try-On Diffusion Transformer Junsheng Luan et.al. 2501.03630 null
2025-01-09 1-2-1: Renaissance of Single-Network Paradigm for Virtual Try-On Shuliang Ning et.al. 2501.05369 null
2025-01-08 Enhancing Virtual Try-On with Synthetic Pairs and Error-Aware Noise Scheduling Nannan Li et.al. 2501.04666 null
2025-01-07 HYB-VITON: A Hybrid Approach to Virtual Try-On Combining Explicit and Implicit Warping Kosuke Takemoto et.al. 2501.03910 link
2025-01-07 VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control Yuanpeng Tu et.al. 2501.01427 null
2024-12-25 DRDM: A Disentangled Representations Diffusion Model for Synthesizing Realistic Person Images Enbo Huang et.al. 2412.18797 null
2024-12-22 PromptDresser: Improving the Quality and Controllability of Virtual Try-On via Generative Textual Prompt and Prompt-aware Mask Jeongho Kim et.al. 2412.16978 link
2024-12-19 DiffusionTrend: A Minimalist Approach to Virtual Fashion Try-On Wengyi Zhan et.al. 2412.14465 null
2024-12-19 FashionComposer: Compositional Fashion Image Generation Sihui Ji et.al. 2412.14168 null
2024-11-18 Try-On-Adapter: A Simple and Flexible Try-On Paradigm Hanzhong Guo et.al. 2411.10187 null
2024-07-18 Time-Efficient and Identity-Consistent Virtual Try-On Using A Variant of Altered Diffusion Models Phuong Dam et.al. 2403.07371 null
2024-07-18 Street TryOn: Learning In-the-Wild Virtual Try-On from Unpaired Person Images Aiyu Cui et.al. 2311.16094 null
2024-06-05 GraVITON: Graph based garment warping with attention guided inversion for Virtual-tryon Sanhita Pathak et.al. 2406.02184 null
2024-05-28 Single Stage Warped Cloth Learning and Semantic-Contextual Attention Feature Fusion for Virtual TryOn Sanhita Pathak et.al. 2310.05024 null
2024-05-08 VTON-IT: Virtual Try-On using Image Translation Santosh Adhikari et.al. 2310.04558 null
2024-04-29 Tunnel Try-on: Excavating Spatial-temporal Tunnels for High-quality Virtual Try-on in Videos Zhengze Xu et.al. 2404.17571 null
2024-04-02 TryOn-Adapter: Efficient Fine-Grained Clothing Identity Adaptation for High-Fidelity Virtual Try-On Jiazheng Xing et.al. 2404.00878 null
2023-04-03 Learning Garment DensePose for Robust Warping in Virtual Try-On Aiyu Cui et.al. 2303.17688 null
2021-09-13 Per Garment Capture and Synthesis for Real-time Virtual Try-on Toby Chong et.al. 2109.04654 null
2021-08-25 ARShoe: Real-Time Augmented Reality Shoe Try-on System on Smartphones Shan An et.al. 2108.10515 null
2021-06-01 An Efficient Style Virtual Try on Network for Clothing Business Industry Shanchen Pang et.al. 2105.13183 null
2021-01-14 ShineOn: Illuminating Design Choices for Practical Video-based Virtual Clothing Try-on Gaurav Kuppa et.al. 2012.10495 null
2016-02-22 Issues in the Multiple Try Metropolis mixing L. Martino et.al. 1508.04253 null
2015-02-27 Trying to understand dark matter B. Hoeneisen et.al. 1502.07375 null
2014-05-20 On the flexibility of the design of Multiple Try Metropolis schemes Luca Martino et.al. 1201.0646 null

(back to top)

Visual Edit

Publish Date Title Authors PDF Code
2025-12-08 OpenVE-3M: A Large-Scale High-Quality Dataset for Instruction-Guided Video Editing Haoyang He et.al. 2512.07826 null
2025-12-08 LongCat-Image Technical Report Meituan LongCat Team et.al. 2512.07584 null
2025-12-08 Unified Video Editing with Temporal Reasoner Xiangpeng Yang et.al. 2512.07469 null
2025-12-08 MICo-150K: A Comprehensive Dataset Advancing Multi-Image Composition Xinyu Wei et.al. 2512.07348 null
2025-12-08 AdLift: Lifting Adversarial Perturbations to Safeguard 3D Gaussian Splatting Assets Against Instruction-Driven Editing Ziming Hong et.al. 2512.07247 null
2025-12-08 Coherent Audio-Visual Editing via Conditional Audio Generation Following Video Edits Masato Ishii et.al. 2512.07209 null
2025-12-05 EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing Runjia Li et.al. 2512.06065 null
2025-12-05 EditThinker: Unlocking Iterative Reasoning for Any Image Editor Hongyu Li et.al. 2512.05965 null
2025-12-05 World Models That Know When They Don't Know: Controllable Video Generation with Calibrated Uncertainty Zhiting Mei et.al. 2512.05927 null
2025-12-05 Edit-aware RAW Reconstruction Abhijith Punnappurath et.al. 2512.05859 null
2025-12-05 InverseCrafter: Efficient Video ReCapture as a Latent Domain Inverse Problem Yeobin Hong et.al. 2512.05672 null
2025-12-05 2K-Characters-10K-Stories: A Quality-Gated Stylized Narrative Dataset with Disentangled Control and Sequence Consistency Xingxi Yin et.al. 2512.05557 null
2025-12-05 SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling Elisabetta Fedele et.al. 2512.05343 null
2025-12-05 EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture Xin He et.al. 2512.04810 null
2025-12-04 DEAR: Dataset for Evaluating the Aesthetics of Rendering Vsevolod Plohotnuk et.al. 2512.05209 null
2025-12-04 Your Latent Mask is Wrong: Pixel-Equivalent Latent Compositing for Diffusion Models Rowan Bradbury et.al. 2512.05198 null
2025-12-04 Object Reconstruction under Occlusion with Generative Priors and Contact-induced Constraints Minghan Zhu et.al. 2512.05079 null
2025-12-04 I2I-Bench: A Comprehensive Benchmark Suite for Image-to-Image Editing Models Juntong Wang et.al. 2512.04660 null
2025-12-04 X-Humanoid: Robotize Human Videos to Generate Humanoid Videos at Scale Pei Yang et.al. 2512.04537 null
2025-12-04 Refaçade: Editing Object with Given Reference Texture Youze Huang et.al. 2512.04534 null
2025-12-04 SlideGen: Collaborative Multimodal Agents for Scientific Slide Generation Xin Liang et.al. 2512.04529 null
2025-12-04 Self-Paced and Self-Corrective Masked Prediction for Movie Trailer Generation Sidan Zhu et.al. 2512.04426 null
2025-12-04 ViDiC: Video Difference Captioning Jiangtao Wu et.al. 2512.03405 null
2025-12-03 PosterCopilot: Toward Layout Reasoning and Controllable Editing for Professional Graphic Design Jiazhe Wei et.al. 2512.04082 null
2025-12-03 DirectDrag: High-Fidelity, Mask-Free, Prompt-Free Drag-based Image Editing via Readout-Guided Feature Alignment Sheng-Hao Liao et.al. 2512.03981 null
2025-12-03 Zero-Shot Video Translation and Editing with Frame Spatial-Temporal Correspondence Shuai Yang et.al. 2512.03905 null
2025-12-03 GaussianBlender: Instant Stylization of 3D Gaussians with Disentangled Latent Spaces Melis Ocal et.al. 2512.03683 null
2025-12-03 Global-Local Aware Scene Text Editing Fuxiang Yang et.al. 2512.03574 null
2025-12-03 Text-Printed Image: Bridging the Image-Text Modality Gap for Text-centric Training of Large Vision-Language Models Shojiro Yamabe et.al. 2512.03463 null
2025-12-03 SeeU: Seeing the Unseen World via 4D Dynamics-aware Generation Yu Yuan et.al. 2512.03350 null
2025-12-03 LoVoRA: Text-guided and Mask-free Video Object Removal and Addition with Learnable Object-aware Localization Zhihan Xiao et.al. 2512.02933 null
2025-12-03 Flowchart2Mermaid: A Vision-Language Model Powered System for Converting Flowcharts into Editable Diagram Code Pritam Deka et.al. 2512.02170 null
2025-12-02 PixPerfect: Seamless Latent Diffusion Local Editing with Discriminative Pixel-Space Refinement Haitian Zheng et.al. 2512.03247 null
2025-12-02 MagicQuillV2: Precise and Interactive Image Editing with Layered Visual Cues Zichen Liu et.al. 2512.03046 null
2025-12-02 PPTArena: A Benchmark for Agentic PowerPoint Editing Michael Ofengenden et.al. 2512.03042 null
2025-12-02 In-Context Sync-LoRA for Portrait Video Editing Sagi Polaczek et.al. 2512.03013 null
2025-12-02 Are Detectors Fair to Indian IP-AIGC? A Cross-Generator Study Vishal Dubey et.al. 2512.02850 null
2025-12-02 Hear What Matters! Text-conditioned Selective Video-to-Audio Generation Junwon Lee et.al. 2512.02650 null
2025-12-02 PPTBench: Towards Holistic Evaluation of Large Language Models for PowerPoint Layout and Design Understanding Zheng Huang et.al. 2512.02624 null
2025-12-02 PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing Junyi Hou et.al. 2512.02589 null
2025-12-01 UnicEdit-10M: A Dataset and Benchmark Breaking the Scale-Quality Barrier via Unified Verification for Reasoning-Enriched Edits Keming Ye et.al. 2512.02790 null
2025-12-01 DepthScape: Authoring 2.5D Designs via Depth Estimation, Semantic Understanding, and Geometry Extraction Xia Su et.al. 2512.02263 null
2025-12-01 CoatFusion: Controllable Material Coating in Images Sagie Levy et.al. 2512.02143 null
2025-12-01 Generative Video Motion Editing with 3D Point Tracks Yao-Chih Lee et.al. 2512.02015 null
2025-12-01 TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models Zhiheng Liu et.al. 2512.02014 null
2025-12-01 FreqEdit: Preserving High-Frequency Features for Robust Multi-Turn Image Editing Yucheng Liao et.al. 2512.01755 null
2025-12-01 Generative Editing in the Joint Vision-Language Space for Zero-Shot Composed Image Retrieval Xin Wang et.al. 2512.01636 null
2025-12-01 Reversible Inversion for Training-Free Exemplar-guided Image Editing Yuke Li et.al. 2512.01382 null
2025-12-01 Fast Multi-view Consistent 3D Editing with Video Priors Liyi Chen et.al. 2511.23172 null
2025-11-30 Graph Queries from Natural Language using Constrained Language Models and Visual Editing Benedikt Kantz et.al. 2512.00948 null
2025-11-30 Hybrid-DMKG: A Hybrid Reasoning Framework over Dynamic Multimodal Knowledge Graphs for Multimodal Multihop QA with Knowledge Editing Li Yuan et.al. 2512.00881 null
2025-11-30 PanFlow: Decoupled Motion Control for Panoramic Video Generation Cheng Zhang et.al. 2512.00832 null
2025-11-30 Seeing the Wind from a Falling Leaf Zhiyuan Gao et.al. 2512.00762 null
2025-11-30 Charts Are Not Images: On the Challenges of Scientific Chart Editing Shawn Li et.al. 2512.00752 null
2025-11-30 Dynamic-eDiTor: Training-Free Text-Driven 4D Scene Editing with Multimodal Diffusion Transformer Dong In Lee et.al. 2512.00677 null
2025-11-29 NeuroVolve: Evolving Visual Stimuli toward Programmable Neural Objectives Haomiao Chen et.al. 2512.00557 null
2025-11-29 Recognizing Pneumonia in Real-World Chest X-rays with a Classifier Trained with Images Synthetically Generated by Nano Banana Jiachuan Peng et.al. 2512.00428 null
2025-11-29 WiseEdit: Benchmarking Cognition- and Creativity-Informed Image Editing Kaihang Pan et.al. 2512.00387 null
2025-11-29 POLARIS: Projection-Orthogonal Least Squares for Robust and Adaptive Inversion in Diffusion Models Wenshuo Chen et.al. 2512.00369 null
2025-11-29 USB: Unified Synthetic Brain Framework for Bidirectional Pathology-Healthy Generation and Editing Jun Wang et.al. 2512.00269 null
2025-11-28 DEAL-300K: Diffusion-based Editing Area Localization with a 300K-Scale Dataset and Frequency-Prompted Baseline Rui Zhang et.al. 2511.23377 null
2025-11-28 Vision Bridge Transformer at Scale Zhenxiong Tan et.al. 2511.23199 null
2025-11-28 NumeriKontrol: Adding Numeric Control to Diffusion Transformers for Instruction-based Image Editing Zhenyu Xu et.al. 2511.23105 null
2025-11-28 Evaluating the Clinical Impact of Generative Inpainting on Bone Age Estimation Felipe Akio Matsuoka et.al. 2511.23066 null
2025-11-28 JarvisEvo: Towards a Self-Evolving Photo Editing Agent with Synergistic Editor-Evaluator Optimization Yunlong Lin et.al. 2511.23002 null
2025-11-28 MultiBanana: A Challenging Benchmark for Multi-Reference Text-to-Image Generation Yuta Oshima et.al. 2511.22989 null
2025-11-27 Improving Robotic Manipulation Robustness via NICE Scene Surgery Sajjad Pakdamansavoji et.al. 2511.22777 null
2025-11-27 Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer Z-Image Team et.al. 2511.22699 null
2025-11-27 Test-time scaling of diffusions with flow maps Amirmojtaba Sabour et.al. 2511.22688 null
2025-11-27 REASONEDIT: Towards Reasoning-Enhanced Image Editing Models Fukun Yin et.al. 2511.22625 null
2025-11-27 Creating Blank Canvas Against AI-enabled Image Forgery Qi Song et.al. 2511.22237 null
2025-11-27 3D-Consistent Multi-View Editing by Diffusion Guidance Josef Bengtson et.al. 2511.22228 null
2025-11-27 G$^2$VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning Wenbo Hu et.al. 2511.21688 null
2025-11-26 PAT3D: Physics-Augmented Text-to-3D Scene Generation Guying Lin et.al. 2511.21978 null
2025-11-26 Progress by Pieces: Test-Time Scaling for Autoregressive Image Generation Joonhyung Park et.al. 2511.21185 null
2025-11-26 AV-Edit: Multimodal Generative Sound Effect Editing via Audio-Visual Semantic Joint Control Xinyue Guo et.al. 2511.21146 null
2025-11-26 CtrlVDiff: Controllable Video Generation via Unified Multimodal Video Diffusion Dianbing Xi et.al. 2511.21129 null
2025-11-26 FaithFusion: Harmonizing Reconstruction and Generation via Pixel-wise Information Gain YuAn Wang et.al. 2511.21113 null
2025-11-26 MIRA: Multimodal Iterative Reasoning Agent for Image Editing Ziyun Zeng et.al. 2511.21087 null
2025-11-26 MUSE: Manipulating Unified Framework for Synthesizing Emotions in Images via Test-Time Optimization Yingjie Xia et.al. 2511.21051 null
2025-11-26 CameraMaster: Unified Camera Semantic-Parameter Control for Photography Retouching Qirui Yang et.al. 2511.21024 null
2025-11-26 From Inpainting to Layer Decomposition: Repurposing Generative Inpainting Models for Image Layer Decomposition Jingxi Chen et.al. 2511.20996 null
2025-11-26 Inversion-Free Style Transfer with Dual Rectified Flows Yingying Deng et.al. 2511.20986 null
2025-11-26 Beyond Realism: Learning the Art of Expressive Composition with StickerNet Haoming Lu et.al. 2511.20957 null
2025-11-25 GaINeR: Geometry-Aware Implicit Network Representation Weronika Jakubowska et.al. 2511.20924 null
2025-11-25 DinoLizer: Learning from the Best for Generative Inpainting Localization Minh Thong Doi et.al. 2511.20722 null
2025-11-25 MotionV2V: Editing Motion in a Video Ryan Burgert et.al. 2511.20640 null
2025-11-25 iMontage: Unified, Versatile, Highly Dynamic Many-to-many Image Generation Zhoujie Fu et.al. 2511.20635 null
2025-11-25 The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment Ziheng Ouyang et.al. 2511.20614 null
2025-11-25 PhysChoreo: Physics-Controllable Video Generation with Part-Aware Semantic Grounding Haoze Zhang et.al. 2511.20562 null
2025-11-25 OmniAlpha: A Sequence-to-Sequence Framework for Unified Multi-Task RGBA Generation Hao Yu et.al. 2511.20211 null
2025-11-25 UltraViCo: Breaking Extrapolation Limits in Video Diffusion Transformers Min Zhao et.al. 2511.20123 null
2025-11-25 Clair Obscur: an Illumination-Aware Method for Real-World Image Vectorization Xingyue Lin et.al. 2511.20034 null
2025-11-25 OmniRefiner: Reinforcement-Guided Local Diffusion Refinement Yaoli Liu et.al. 2511.19990 null
2025-11-25 Low-Resolution Editing is All You Need for High-Resolution Editing Junsung Lee et.al. 2511.19945 null
2025-11-25 Are Image-to-Video Models Good Zero-Shot Image Editors? Zechuan Zhang et.al. 2511.19435 null
2025-11-24 Agint: Agentic Graph Compilation for Software Engineering Agents Abhi Chivukula et.al. 2511.19635 null
2025-11-24 Vidi2: Large Multimodal Models for Video Understanding and Creation Vidi Team et.al. 2511.19529 null
2025-11-24 Ref-SAM3D: Bridging SAM3D with Text for Reference 3D Reconstruction Yun Zhou et.al. 2511.19426 null
2025-11-24 AvatarBrush: Monocular Reconstruction of Gaussian Avatars with Intuitive Local Editing Mengtian Li et.al. 2511.19189 null
2025-11-24 DiffSeg30k: A Multi-Turn Diffusion Editing Benchmark for Localized AIGC Detection Hai Ci et.al. 2511.19111 null
2025-11-24 Pre-Filtering Code Suggestions using Developer Behavioral Telemetry to Optimize LLM-Assisted Programming Mohammad Nour Al Awad et.al. 2511.18849 null
2025-11-24 NI-Tex: Non-isometric Image-based Garment Texture Generation Hui Shan et.al. 2511.18765 null
2025-11-24 DriveFlow: Rectified Flow Adaptation for Robust 3D Object Detection in Autonomous Driving Hongbin Lin et.al. 2511.18713 null
2025-11-24 ObjectAlign: Neuro-Symbolic Object Consistency Verification and Correction Mustafa Munir et.al. 2511.18701 null
2025-11-24 Now You See It, Now You Don't - Instant Concept Erasure for Safe Text-to-Image and Video Generation Shristi Das Biswas et.al. 2511.18684 null
2025-11-24 Edit2Perceive: Image Editing Diffusion Models Are Strong Dense Perceivers Yiqing Shi et.al. 2511.18673 null
2025-11-23 FlowPortal: Residual-Corrected Flow for Training-Free Video Relighting and Background Replacement Wenshuo Gao et.al. 2511.18346 null
2025-11-23 Point-to-Point: Sparse Motion Guidance for Controllable Video Editing Yeji Song et.al. 2511.18277 null
2025-11-23 MammothModa2: A Unified AR-Diffusion Framework for Multimodal Understanding and Generation Tao Shen et.al. 2511.18262 null
2025-11-22 Video4Edit: Viewing Image Editing as a Degenerate Temporal Process Xiaofan Li et.al. 2511.18131 null
2025-11-22 IE-Critic-R1: Advancing the Explanatory Measurement of Text-Driven Image Editing for Human Perception Alignment Bowen Qu et.al. 2511.18055 null
2025-11-22 Paper2SysArch: Structure-Constrained System Architecture Generation from Scientific Papers Ziyi Guo et.al. 2511.18036 null
2025-11-21 Show Me: Unifying Instructional Image and Video Generation with Diffusion Models Yujiang Pu et.al. 2511.17839 null
2025-11-21 Native 3D Editing with Full Attention Weiwei Cai et.al. 2511.17501 null
2025-11-21 Illustrator's Depth: Monocular Layer Index Prediction for Image Decomposition Nissim Maruani et.al. 2511.17454 null
2025-11-21 Range-Edit: Semantic Mask Guided Outdoor LiDAR Scene Editing Suchetan G. Uppur et.al. 2511.17269 null
2025-11-21 PostCam: Camera-Controllable Novel-View Video Generation with Query-Shared Cross-Attention Yipeng Chen et.al. 2511.17185 null
2025-11-21 Spanning Tree Autoregressive Visual Generation Sangkyu Lee et.al. 2511.17089 null
2025-11-21 RoomPlanner: Explicit Layout Planner for Easier LLM-Driven 3D Room Generation Wenzhuo Sun et.al. 2511.17048 null
2025-11-21 DeltaDeno: Zero-Shot Anomaly Generation via Delta-Denoising Attribution Chaoran Xu et.al. 2511.16920 null
2025-11-21 FinCriticalED: A Visual Benchmark for Financial Fact-Level OCR Evaluation Yueru He et.al. 2511.14998 null
2025-11-20 WorldGen: From Text to Traversable and Interactive 3D Worlds Dilin Wang et.al. 2511.16825 null
2025-11-20 SVG360: Multi-View SVG Generation with Geometric and Color Consistency from a Single SVG Mengnan Jiang et.al. 2511.16766 null
2025-11-20 Motion Transfer-Enhanced StyleGAN for Generating Diverse Macaque Facial Expressions Takuya Igaue et.al. 2511.16711 null
2025-11-20 Controllable Layer Decomposition for Reversible Multi-Layer Image Generation Zihao Liu et.al. 2511.16249 null
2025-11-19 One algebra for all: Geometric Algebra methods for neurosymbolic XR scene authoring, animation and neural rendering Manos Kamarianakis et.al. 2511.15398 null
2025-11-19 ChartEditor: A Reinforcement Learning Framework for Robust Chart Editing Liangyu Chen et.al. 2511.15266 null
2025-11-18 InstructMix2Mix: Consistent Sparse-View Editing Through Multi-View Model Personalization Daniel Gilo et.al. 2511.14899 null
2025-11-18 UniGen-1.5: Enhancing Image Generation and Editing through Reward Unification in Reinforcement Learning Rui Tian et.al. 2511.14760 null
2025-11-18 Task Addition and Weight Disentanglement in Closed-Vocabulary Models Adam Hazimeh et.al. 2511.14569 null
2025-11-18 ManipShield: A Unified Framework for Image Manipulation Detection, Localization and Explanation Zitong Xu et.al. 2511.14259 null
2025-11-18 InstantViR: Real-Time Video Inverse Problem Solver with Distilled Diffusion Prior Weimin Bai et.al. 2511.14208 null
2025-11-18 UniSER: A Foundation Model for Unified Soft Effects Removal Jingdong Zhang et.al. 2511.14183 null
2025-11-18 Text-Driven Reasoning Video Editing via Reinforcement Learning on Digital Twin Representations Yiqing Shen et.al. 2511.14100 null
2025-11-18 Error-Driven Scene Editing for 3D Grounding in Large Language Models Yue Zhang et.al. 2511.14086 null
2025-11-18 Semantic Context Matters: Improving Conditioning for Autoregressive Models Dongyang Jin et.al. 2511.14063 null
2025-11-18 Unlocking the Forgery Detection Potential of Vanilla MLLMs: A Novel Training-Free Pipeline Rui Zuo et.al. 2511.13442 null
2025-11-18 MedGEN-Bench: Contextually entangled benchmark for open-ended multimodal medical generation Junjie Yang et.al. 2511.13135 null
2025-11-17 Free-Form Scene Editor: Enabling Multi-Round Object Manipulation like in a 3D Engine Xincheng Shuai et.al. 2511.13713 null
2025-11-17 Training-Free Multi-View Extension of IC-Light for Textual Position-Aware Scene Relighting Jiangnan Ye et.al. 2511.13684 null
2025-11-17 Language-Guided Invariance Probing of Vision-Language Models Jae Joong Lee et.al. 2511.13494 null
2025-11-17 Semantic Document Derendering: SVG Reconstruction via Vision-Language Modeling Adam Hazimeh et.al. 2511.13478 null
2025-11-17 TripleFDS: Triple Feature Disentanglement and Synthesis for Scene Text Editing Yuchen Bao et.al. 2511.13399 null
2025-11-17 SkyReels-Text: Fine-grained Font-Controllable Text Editing for Poster Design Yunjie Yu et.al. 2511.13285 null
2025-11-17 Uncovering and Mitigating Transient Blindness in Multimodal Model Editing Xiaoqi Han et.al. 2511.13243 null
2025-11-17 InteractiveGNNExplainer: A Visual Analytics Framework for Multi-Faceted Understanding and Probing of Graph Neural Network Predictions TC Singh et.al. 2511.13160 null
2025-11-17 Semantic Prioritization in Visual Counterfactual Explanations with Weighted Segmentation and Auto-Adaptive Region Selection Lintong Zhang et.al. 2511.12992 null
2025-11-17 Text2Traffic: A Text-to-Image Generation and Editing Method for Traffic Scenes Feng Lv et.al. 2511.12932 null
2025-11-17 Generative Photographic Control for Scene-Consistent Video Cinematic Editing Huiqiang Sun et.al. 2511.12921 null
2025-11-16 Catastrophic Forgetting in Kolmogorov-Arnold Networks Mohammad Marufur Rahman et.al. 2511.12828 null
2025-11-16 Toward Real-world Text Image Forgery Localization: Structured and Interpretable Data Synthesis Zeqin Yu et.al. 2511.12658 null
2025-11-16 Designed to Spread: Generative Approaches to Enhance Information Diffusion Ziqing Qian et.al. 2511.12516 null
2025-11-15 ZoomEarth: Active Perception for Ultra-High-Resolution Geospatial Vision-Language Tasks Ruixun Liu et.al. 2511.12267 null
2025-11-15 Mixture of States: Routing Token-Level Dynamics for Multimodal Generation Haozhe Liu et.al. 2511.12207 null
2025-11-15 FIA-Edit: Frequency-Interactive Attention for Efficient and High-Fidelity Inversion-Free Text-Guided Image Editing Kaixiang Yang et.al. 2511.12151 null
2025-11-15 Image-POSER: Reflective RL for Multi-Expert Image Generation and Editing Hossein Mohebbi et.al. 2511.11780 null
2025-11-14 PEtab-GUI: A graphical user interface to create, edit and inspect PEtab parameter estimation problems Paul Jonas Jost et.al. 2511.11515 null
2025-11-14 ImAgent: A Unified Multimodal Agent Framework for Test-Time Scalable Image Generation Kaishen Wang et.al. 2511.11483 null
2025-11-14 WEAVE: Unleashing and Benchmarking the In-context Interleaved Comprehension and Generation Wei Chow et.al. 2511.11434 null
2025-11-14 SimuFreeMark: A Noise-Simulation-Free Robust Watermarking Against Image Editing Yichao Tang et.al. 2511.11295 null
2025-11-14 Parameter-Efficient MoE LoRA for Few-Shot Multi-Style Editing Cong Cao et.al. 2511.11236 null
2025-11-14 On the Information-Theoretic Fragility of Robust Watermarking under Diffusion Editing Yunyi Ni et.al. 2511.10933 null
2025-11-14 STELLAR: Scene Text Editor for Low-Resource Languages and Real-World Data Yongdeuk Seo et.al. 2511.09977 null
2025-11-14 UI2Code^N: A Visual Language Model for Test-Time Scalable Interactive UI-to-Code Generation Zhen Yang et.al. 2511.08195 null
2025-11-13 IPCD: Intrinsic Point-Cloud Decomposition Shogo Sato et.al. 2511.09866 null
2025-11-13 AHA! Animating Human Avatars in Diverse Scenes with Gaussian Splatting Aymen Mir et.al. 2511.09827 null
2025-11-12 SliderEdit: Continuous Image Editing with Fine-Grained Instruction Control Arman Zarei et.al. 2511.09715 null
2025-11-11 RePose-NeRF: Robust Radiance Fields for Mesh Reconstruction under Noisy Camera Poses Sriram Srinivasan et.al. 2511.08545 null
2025-11-11 3D4D: An Interactive, Editable, 4D World Model via 3D Video Generation Yunhong He et.al. 2511.08536 null
2025-11-11 UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist Zhengyang Liang et.al. 2511.08521 null
2025-11-11 HardFlow: Hard-Constrained Sampling for Flow-Matching Models via Trajectory Optimization Zeyang Li et.al. 2511.08425 null
2025-11-11 LayerEdit: Disentangled Multi-Object Editing via Conflict-Aware Multi-Layer Learning Fengyi Fu et.al. 2511.08251 null
2025-11-11 VectorSynth: Fine-Grained Satellite Image Synthesis with Structured Semantics Daniel Cher et.al. 2511.07744 null
2025-11-09 Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising Assaf Singer et.al. 2511.08633 null
2025-11-09 AesTest: Measuring Aesthetic Intelligence from Perception to Production Guolong Wang et.al. 2511.06360 null
2025-11-09 RelightMaster: Precise Video Relighting with Multi-plane Light Images Weikang Bian et.al. 2511.06271 null
2025-11-07 On the Brittleness of CLIP Text Encoders Allie Tran et.al. 2511.04247 null
2025-11-07 Med-Banana-50K: A Cross-modality Large-Scale Dataset for Text-guided Medical Image Editing Zhihui Chen et.al. 2511.00801 null
2025-11-06 Personalized Image Editing in Text-to-Image Diffusion Models via Collaborative Direct Preference Optimization Connor Dunlop et.al. 2511.05616 null
2025-11-06 MusRec: Zero-Shot Text-to-Music Editing via Rectified Flow and Diffusion Transformers Ali Boudaghi et.al. 2511.04376 null
2025-11-06 Improving Multi-View Reconstruction via Texture-Guided Gaussian-Mesh Joint Optimization Zhejia Cai et.al. 2511.03950 null
2025-11-05 Diffusion-Based Image Editing: An Unforeseen Adversary to Robust Invisible Watermarks Wenkai Fu et.al. 2511.05598 null
2025-11-05 Disentangled Concepts Speak Louder Than Words: Explainable Video Action Recognition Jongseo Lee et.al. 2511.03725 null
2025-11-05 Unified Long Video Inpainting and Outpainting via Overlapping High-Order Co-Denoising Shuangquan Lyu et.al. 2511.03272 null
2025-11-05 ESA: Energy-Based Shot Assembly Optimization for Automatic Video Editing Yaosen Chen et.al. 2511.02505 null
2025-11-03 UniREditBench: A Unified Reasoning-based Image Editing Benchmark Feng Han et.al. 2511.01295 null
2025-10-31 BlurGuard: A Simple Approach for Robustifying Image Protection Against AI-Powered Editing Jinsu Kim et.al. 2511.00143 null
2025-10-31 Understanding the Implicit User Intention via Reasoning with Large Language Model for Image Editing Yijia Wang et.al. 2510.27335 null
2025-10-30 Security Risk of Misalignment between Text and Image in Multi-modal Model Xiaosen Wang et.al. 2510.26105 null
2025-10-29 LGCC: Enhancing Flow Matching Based Text-Guided Image Editing with Local Gaussian Coupling and Context Consistency Fangbing Liu et.al. 2511.01894 null
2025-10-29 SplitFlow: Flow Decomposition for Inversion-Free Text-to-Image Editing Sung-Hoon Yoon et.al. 2510.25970 null
2025-10-29 RegionE: Adaptive Region-Aware Generation for Efficient Image Editing Pengtao Chen et.al. 2510.25590 null
2025-10-29 LightBagel: A Light-weighted, Double Fusion Framework for Unified Multimodal Understanding and Generation Zeyu Wang et.al. 2510.22946 null
2025-10-28 Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation Inclusion AI et.al. 2510.24821 null
2025-10-28 Group Relative Attention Guidance for Image Editing Xuanpu Zhang et.al. 2510.24657 null
2025-10-28 Diffusion Adaptive Text Embedding for Text-to-Image Diffusion Models Byeonghu Na et.al. 2510.23974 null
2025-10-27 Autoregressive Styled Text Image Generation, but Make it Reliable Carmine Zaccagnino et.al. 2510.23240 null
2025-10-27 UniAIDet: A Unified and Universal Benchmark for AI-Generated Image Content Detection and Localization Huixuan Zhang et.al. 2510.23023 null
2025-10-27 VALA: Learning Latent Anchors for Training-Free and Temporally Consistent Zhangkai Wu et.al. 2510.22970 null
2025-10-27 FAME: Fairness-aware Attention-modulated Video Editing Zhangkai Wu et.al. 2510.22960 null
2025-10-27 LayerComposer: Interactive Personalized T2I via Spatially-Aware Layered Canvas Guocheng Gordon Qian et.al. 2510.20820 null
2025-10-25 GeoDiffusion: A Training-Free Framework for Accurate 3D Geometric Conditioning in Image Generation Phillip Mueller et.al. 2510.22337 null
2025-10-24 FlowOpt: Fast Optimization Through Whole Flow Processes for Training-Free Editing Or Ronai et.al. 2510.22010 null
2025-10-24 SafetyPairs: Isolating Safety Critical Image Features with Counterfactual Image Generation Alec Helbling et.al. 2510.21120 null
2025-10-24 EditInfinity: Image Editing with Binary-Quantized Generative Models Jiahuan Wang et.al. 2510.20217 null
2025-10-24 Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks Kai Zeng et.al. 2510.19195 null
2025-10-23 Positional Encoding Field Yunpeng Bai et.al. 2510.20385 null
2025-10-23 FlowCycle: Pursuing Cycle-Consistent Flows for Text-based Editing Yanghao Wang et.al. 2510.20212 null
2025-10-22 Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing Yusu Qian et.al. 2510.19808 null
2025-10-21 PICABench: How Far Are We from Physically Realistic Image Editing? Yuandong Pu et.al. 2510.17681 null
2025-10-21 Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback Zongjian Li et.al. 2510.16888 null
2025-10-20 ConsistEdit: Highly Consistent and Precise Training-free Visual Editing Zixin Yin et.al. 2510.17803 null
2025-10-19 Region in Context: Text-condition Image editing with Human-like semantic reasoning Thuy Phuong Vu et.al. 2510.16772 null
2025-10-17 BLIP3o-NEXT: Next Frontier of Native Image Generation Jiuhai Chen et.al. 2510.15857 null
2025-10-17 Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset Qingyan Bai et.al. 2510.15742 null
2025-10-16 Coupled Diffusion Sampling for Training-Free Multi-View Image Editing Hadi Alzayer et.al. 2510.14981 null
2025-10-16 Learning an Image Editing Model without Image Editing Pairs Nupur Kumari et.al. 2510.14978 null
2025-10-16 In-Context Learning with Unpaired Clips for Instruction-based Video Editing Xinyao Liao et.al. 2510.14648 null
2025-10-15 Edit-Your-Interest: Efficient Video Editing via Feature Most-Similar Propagation Yi Zuo et.al. 2510.13084 null
2025-10-14 UniFusion: Vision-Language Model as Unified Encoder in Image Generation Kevin Li et.al. 2510.12789 null
2025-10-14 Vectorized Video Representation with Easy Editing via Hierarchical Spatio-Temporally Consistent Proxy Embedding Ye Chen et.al. 2510.12256 null
2025-10-14 VIDMP3: Video Editing by Representing Motion with Pose and Position Priors Sandeep Mishra et.al. 2510.12069 null
2025-10-13 IVEBench: Modern Benchmark Suite for Instruction-Guided Video Editing Assessment Yinan Chen et.al. 2510.11647 null
2025-10-13 Zero-shot Face Editing via ID-Attribute Decoupled Inversion Yang Hou et.al. 2510.11050 null
2025-10-13 GeoVLMath: Enhancing Geometry Reasoning in Vision-Language Models via Cross-Modal Reward for Auxiliary Line Creation Shasha Guo et.al. 2510.11020 null
2025-10-13 DreamMakeup: Face Makeup Customization using Latent Diffusion Models Geon Yeong Park et.al. 2510.10918 null
2025-10-11 EditCast3D: Single-Frame-Guided 3D Editing with Video Propagation and View Selection Huaizhi Qu et.al. 2510.13652 null
2025-10-11 ReMix: Towards a Unified View of Consistent Character Generation and Editing Benjia Zhou et.al. 2510.10156 null
2025-10-11 MultiCOIN: Multi-Modal COntrollable Video INbetweening Maham Tanveer et.al. 2510.08561 null
2025-10-10 Mono4DEditor: Text-Driven 4D Scene Editing from Monocular Video via Point-Level Localization of Language-Embedded Gaussians Jin-Chuan Shi et.al. 2510.09438 null
2025-10-10 TBStar-Edit: From Image Editing Pattern Shifting to Consistency Enhancement Hao Fang et.al. 2510.04483 null
2025-10-09 FreqCa: Accelerating Diffusion Models via Frequency-Aware Caching Jiacheng Liu et.al. 2510.08669 null
2025-10-09 Kontinuous Kontext: Continuous Strength Control for Instruction-based Image Editing Rishubh Parihar et.al. 2510.08532 null
2025-10-09 InstructX: Towards Unified Visual Editing with MLLM Guidance Chong Mou et.al. 2510.08485 null
2025-10-09 UniVideo: Unified Understanding, Generation, and Editing for Videos Cong Wei et.al. 2510.08377 null
2025-10-09 InstructUDrag: Joint Text Instructions and Object Dragging for Interactive Image Editing Haoran Yu et.al. 2510.08181 null
2025-10-09 Beyond Textual CoT: Interleaved Text-Image Chains with Deep Confidence Reasoning for Image Editing Zhentao Zou et.al. 2510.08157 null
2025-10-08 DreamOmni2: Multimodal Instruction-based Editing and Generation Bin Xia et.al. 2510.06679 null
2025-10-07 Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding Yi Xin et.al. 2510.06308 null
2025-10-07 Efficient High-Resolution Image Editing with Hallucination-Aware Loss and Adaptive Tiling Young D. Kwon et.al. 2510.06295 null
2025-10-07 Diffusion-Based Image Editing for Breaking Robust Watermarks Yunyi Ni et.al. 2510.05978 null
2025-10-07 When and How to Cut Classical Concerts? A Multimodal Automated Video Editing Approach Daniel Gonzálbez-Biosca et.al. 2510.05661 null
2025-10-06 SAEdit: Token-level control for continuous image editing via Sparse AutoEncoder Ronen Kamenetsky et.al. 2510.05081 null
2025-10-05 ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation Jay Zhangjie Wu et.al. 2510.04290 null
2025-10-05 Let Features Decide Their Own Solvers: Hybrid Feature Caching for Diffusion Transformers Shikang Zheng et.al. 2510.04188 null
2025-10-05 Prompt-to-Prompt: Text-Based Image Editing Via Cross-Attention Mechanisms -- The Research of Hyperparameters and Novel Mechanisms to Enhance Existing Frameworks Linn Bieske et.al. 2510.04034 null
2025-10-04 From Filters to VLMs: Benchmarking Defogging Methods through Object Detection and Segmentation Performance Ardalan Aryashad et.al. 2510.03906 null
2025-10-04 Rare Text Semantics Were Always There in Your Diffusion Transformer Seil Kang et.al. 2510.03886 null
2025-10-03 DiT-VTON: Diffusion Transformer Framework for Unified Multi-Category Virtual Try-On and Virtual Try-All with Integrated Image Editing Qi Li et.al. 2510.04797 null
2025-10-03 OTR: Synthesizing Overlay Text Dataset for Text Removal Jan Zdenek et.al. 2510.02787 null
2025-10-02 DragFlow: Unleashing DiT Priors with Region Based Supervision for Drag Editing Zihan Zhou et.al. 2510.02253 null
2025-10-02 Towards Better Optimization For Listwise Preference in Diffusion Models Jiamu Bai et.al. 2510.01540 null
2025-10-02 VRWKV-Editor: Reducing quadratic complexity in transformer-based video editing Abdelilah Aitrouga et.al. 2509.25998 null
2025-10-01 IMAGEdit: Let Any Subject Transform Fei Shen et.al. 2510.01186 null
2025-10-01 EditTrack: Detecting and Attributing AI-assisted Image Editing Zhengyuan Jiang et.al. 2510.01173 null
2025-10-01 DIA: The Adversarial Exposure of Deterministic Inversion in Diffusion Models Seunghoo Hong et.al. 2510.00778 null
2025-10-01 CAMILA: Context-Aware Masking for Image Editing with Language Alignment Hyunseung Kim et.al. 2509.19731 null
2025-09-30 EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing Keming Wu et.al. 2509.26346 null
2025-09-30 Training-Free Reward-Guided Image Editing via Trajectory Optimal Control Jinho Chang et.al. 2509.25845 null
2025-09-30 Editable Noise Map Inversion: Encoding Target-image into Noise For High-Fidelity Image Manipulation Mingyu Kang et.al. 2509.25776 null
2025-09-30 Dragging with Geometry: From Pixels to Geometry-Guided Image Editing Xinyu Pu et.al. 2509.25740 null
2025-09-30 EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling Xin Luo et.al. 2509.23909 null
2025-09-30 FlashEdit: Decoupling Speed, Structure, and Semantics for Precise Image Editing Junyi Wu et.al. 2509.22244 null
2025-09-29 Training-Free Multimodal Guidance for Video to Audio Generation Eleonora Grassucci et.al. 2509.24550 null
2025-09-29 Instruction Guided Multi Object Image Editing with Quantity and Layout Consistency Jiaqi Tan et.al. 2509.24514 null
2025-09-29 Latent Visual Reasoning Bangzheng Li et.al. 2509.24251 null
2025-09-28 Visual CoT Makes VLMs Smarter but More Fragile Chunxue Xu et.al. 2509.23789 null
2025-09-28 Seedream 4.0: Toward Next-generation Multimodal Image Generation Team Seedream et.al. 2509.20427 null
2025-09-27 Object-AVEdit: An Object-level Audio-Visual Editing Model Youquan Fu et.al. 2510.00050 null
2025-09-26 EMMA: Generalizing Real-World Robot Manipulation via Generative Visual Transfer Zhehao Dong et.al. 2509.22407 null
2025-09-26 SAGE: Scene Graph-Aware Guidance and Execution for Long-Horizon Manipulation Tasks Jialiang Li et.al. 2509.21928 null
2025-09-26 Taming Flow-based I2V Models for Creative Video Editing Xianghao Kong et.al. 2509.21917 null
2025-09-26 TDEdit: A Unified Diffusion Framework for Text-Drag Guided Image Manipulation Qihang Wang et.al. 2509.21905 null
2025-09-25 FreeInsert: Personalized Object Insertion with Geometric and Style Control Yuhong Zhang et.al. 2509.20756 null
2025-09-25 ArtUV: Artist-style UV Unwrapping Yuguang Chen et.al. 2509.20710 null
2025-09-25 EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning Xuan Ju et.al. 2509.20360 null
2025-09-25 Understanding-in-Generation: Reinforcing Generative Capability of Unified Model via Infusing Understanding into Generation Yuanhuiyi Lyu et.al. 2509.18639 null
2025-09-24 Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation Shufan Li et.al. 2509.19244 null
2025-09-23 Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation Yanzuo Lu et.al. 2509.18824 null
2025-09-23 GeoRemover: Removing Objects and Their Causal Visual Artifacts Zixin Zhu et.al. 2509.18538 null
2025-09-22 Multi-Agent Amodal Completion: Direct Synthesis with Fine-Grained Semantic Guidance Hongxing Fan et.al. 2509.17757 null
2025-09-20 Prompt-Driven Agentic Video Editing System: Autonomous Comprehension of Long-Form, Story-Driven Media Zihan Ding et.al. 2509.16811 null
2025-09-20 V-CECE: Visual Counterfactual Explanations via Conceptual Edits Nikolaos Spanos et.al. 2509.16567 null
2025-09-19 Neural Atlas Graphs for Dynamic Scene Decomposition and Editing Jan Philipp Schneider et.al. 2509.16336 null
2025-09-19 Enriched Feature Representation and Motion Prediction Module for MOSEv2 Track of 7th LSVOS Challenge: 3rd Place Solution Chang Soo Lim et.al. 2509.15781 null
2025-09-18 AutoEdit: Automatic Hyperparameter Tuning for Image Editing Chau Pham et.al. 2509.15031 null
2025-09-18 MultiEdit: Advancing Instruction-based Image Editing on Diverse and Challenging Tasks Mingsong Li et.al. 2509.14638 null
2025-09-18 End4: End-to-end Denoising Diffusion for Diffusion-Based Inpainting Detection Fei Wang et.al. 2509.13214 null
2025-09-17 Controllable-Continuous Color Editing in Diffusion Model via Color Mapping Yuqi Yang et.al. 2509.13756 null
2025-09-17 LLM-I: LLMs are Naturally Interleaved Multimodal Creators Zirun Guo et.al. 2509.13642 null
2025-09-16 EdiVal-Agent: An Object-Centric Framework for Automated, Scalable, Fine-Grained Evaluation of Multi-Turn Editing Tianyu Chen et.al. 2509.13399 null
2025-09-16 Lego-Edit: A General Image Editing Framework with Model-Level Bricks and MLLM Builder Qifei Jia et.al. 2509.12883 null
2025-09-16 Beyond Artificial Misalignment: Detecting and Grounding Semantic-Coordinated Multimodal Manipulations Jinjie Shen et.al. 2509.12653 null
2025-09-15 LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence Zixin Yin et.al. 2509.12203 null
2025-09-13 EditDuet: A Multi-Agent System for Video Non-Linear Editing Marcelo Sandoval-Castaneda et.al. 2509.10761 null
2025-09-12 Immunizing Images from Text to Image Editing via Adversarial Cross-Attention Matteo Trippodo et.al. 2509.10359 null
2025-09-10 RoentMod: A Synthetic Chest X-Ray Modification Model to Identify and Correct Image Interpretation Model Shortcuts Lauren H. Cooke et.al. 2509.08640 null
2025-09-09 Delta Velocity Rectified Flow for Text-to-Image Editing Gaspard Beaudouin et.al. 2509.05342 null
2025-09-04 Improved 3D Scene Stylization via Text-Guided Generative Image Editing with Region-Based Control Haruo Fujiwara et.al. 2509.05285 null
2025-09-04 Inpaint4Drag: Repurposing Inpainting Models for Drag-Based Image Editing via Bidirectional Warping Jingyi Lu et.al. 2509.04582 null
2025-09-04 From Editor to Dense Geometry Estimator JiYuan Wang et.al. 2509.04338 null
2025-09-03 Discrete Noise Inversion for Next-scale Autoregressive Text-based Image Editing Quan Dao et.al. 2509.01984 null
2025-09-02 Fidelity-preserving enhancement of ptychography with foundational text-to-image models Ming Du et.al. 2509.04513 null
2025-09-02 Draw-In-Mind: Learning Precise Image Editing via Chain-of-Thought Imagination Ziyun Zeng et.al. 2509.01986 null
2025-09-01 O-DisCo-Edit: Object Distortion Control for Unified Realistic Video Editing Yuqing Chen et.al. 2509.01596 null
2025-09-01 Neural Scene Designer: Self-Styled Semantic Image Manipulation Jianman Lin et.al. 2509.01405 null
2025-08-30 LatentEdit: Adaptive Latent Control for Consistent Semantic Editing Siyi Liu et.al. 2509.00541 null
2025-08-28 Webly-Supervised Image Manipulation Localization via Category-Aware Auto-Annotation Chenfan Qu et.al. 2508.20987 null
2025-08-28 Describe, Don't Dictate: Semantic Image Editing with Natural Language Intent En Ci et.al. 2508.20505 null
2025-08-28 Audio-Guided Visual Editing with Complex Multi-Modal Prompts Hyeonyu Kim et.al. 2508.20379 null
2025-08-27 Not Every Gift Comes in Gold Paper or with a Red Ribbon: Exploring Color Perception in Text-to-Image Models Shay Shomer Chai et.al. 2508.19791 null
2025-08-25 ObjFiller-3D: Consistent Multi-view 3D Inpainting via Video Diffusion Models Haitang Feng et.al. 2508.18271 null
2025-08-25 SpotEdit: Evaluating Visually-Guided Image Editing Methods Sara Ghazanfari et.al. 2508.18159 null
2025-08-24 An LLM-LVLM Driven Agent for Iterative and Fine-Grained Image Editing Zihan Liang et.al. 2508.17435 null
2025-08-24 Defending Deepfake via Texture Feature Perturbation Xiao Zhang et.al. 2508.17315 null
2025-08-24 PosBridge: Multi-View Positional Embedding Transplant for Identity-Aware Image Editing Peilin Xiong et.al. 2508.17302 null
2025-08-21 Visual Autoregressive Modeling for Instruction-Guided Image Editing Qingyang Mao et.al. 2508.15772 null
2025-08-20 AnchorSync: Global Consistency Optimization for Long Video Editing Zichi Liu et.al. 2508.14609 null
2025-08-20 DreamSwapV: Mask-guided Subject Swapping for Any Customized Video Editing Weitao Wang et.al. 2508.14465 null
2025-08-19 Sketch3DVE: Sketch-based 3D-Aware Scene Video Editing Feng-Lin Liu et.al. 2508.13797 null
2025-08-18 Single-Reference Text-to-Image Manipulation with Dual Contrastive Denoising Score Syed Muhmmad Israr et.al. 2508.12718 null
2025-08-18 TimeMachine: Fine-Grained Facial Age Editing with Identity Preservation Yilin Mi et.al. 2508.11284 null
2025-08-18 NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale NextStep Team et.al. 2508.10711 null
2025-08-16 PEdger++: Practical Edge Detection via Assembling Cross Information Yuanbin Fu et.al. 2508.11961 null
2025-08-14 LD-LAudio-V1: Video-to-Long-Form-Audio Generation Extension with Dual Lightweight Adapters Haomin Zhang et.al. 2508.11074 null
2025-08-14 A Segmentation-driven Editing Method for Bolt Defect Augmentation and Detection Yangjie Xiao et.al. 2508.10509 null
2025-08-14 TweezeEdit: Consistent and Efficient Image Editing with Path Regularization Jianda Mao et.al. 2508.10498 null
2025-08-13 LIA-X: Interpretable Latent Portrait Animator Yaohui Wang et.al. 2508.09959 null
2025-08-12 Follow-Your-Shape: Shape-Aware Image Editing via Trajectory-Guided Region Control Zeqian Long et.al. 2508.08134 null
2025-08-12 Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation Fangyuan Mao et.al. 2508.07981 null
2025-08-11 X2Edit: Revisiting Arbitrary-Instruction Image Editing through Self-Constructed Data and Task-Aware Representation Learning Jian Ma et.al. 2508.07607 null
2025-08-11 Exploring Multimodal Diffusion Transformers for Enhanced Prompt-based Image Editing Joonghyuk Shin et.al. 2508.07519 null
2025-08-10 CLUE: Leveraging Low-Rank Adaptation to Capture Latent Uncovered Evidence for Image Forgery Localization Youqi Wang et.al. 2508.07413 null
2025-08-10 Consistent and Controllable Image Animation with Motion Linear Diffusion Transformers Xin Ma et.al. 2508.07246 null
2025-08-09 CannyEdit: Selective Canny Control and Dual-Prompt Guidance for Training-Free Image Editing Weiyan Xie et.al. 2508.06937 null
2025-08-09 Talk2Image: A Multi-Agent System for Multi-Turn Image Generation and Editing Shichao Ma et.al. 2508.06916 null
2025-08-08 UGD-IML: A Unified Generative Diffusion-based Framework for Constrained and Unconstrained Image Manipulation Localization Yachun Mi et.al. 2508.06101 null
2025-08-08 DreamVE: Unified Instruction-based Image and Video Editing Bin Xia et.al. 2508.06080 null
2025-08-08 NEP: Autoregressive Image Editing via Next Editing Token Prediction Huimin Wu et.al. 2508.06044 null
2025-08-08 InstantEdit: Text-Guided Few-Step Image Editing with Piecewise Rectified Flow Yiming Gong et.al. 2508.06033 null
2025-08-05 Skywork UniPic: Unified Autoregressive Modeling for Visual Understanding and Generation Peiyu Wang et.al. 2508.03320 null
2025-08-05 Zero Shot Domain Adaptive Semantic Segmentation by Synthetic Data Generation and Progressive Adaptation Jun Luo et.al. 2508.03300 null
2025-08-05 LORE: Latent Optimization for Precise Semantic Control in Rectified Flow-based Image Editing Liangyang Ouyang et.al. 2508.03144 null
2025-08-05 UniEdit-I: Training-free Image Editing for Unified VLM via Iterative Understanding, Editing and Verifying Chengyu Bai et.al. 2508.03142 null
2025-08-05 The Promise of RL for Autoregressive Image Editing Saba Ahmadi et.al. 2508.01119 null
2025-08-04 Transport-Guided Rectified Flow Inversion: Improved Image Editing Using Optimal Transport Theory Marian Lupascu et.al. 2508.02363 null
2025-08-04 Qwen-Image Technical Report Chenfei Wu et.al. 2508.02324 null
2025-08-01 Controllable Pedestrian Video Editing for Multi-View Driving Scenarios via Motion Sequence Danzhen Fu et.al. 2508.00299 null
2025-08-01 Towards Robust Semantic Correspondence: A Benchmark and Insights Wenyue Chong et.al. 2508.00272 null
2025-08-01 Training-free Geometric Image Editing on Diffusion Models Hanshen Zhu et.al. 2507.23300 null
2025-07-31 UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing Hao Tang et.al. 2507.23278 null
2025-07-29 Low-Cost Test-Time Adaptation for Robust Video Editing Jianhui Wang et.al. 2507.21858 null
2025-07-29 From Gallery to Wrist: Realistic 3D Bracelet Insertion in Videos Chenjian Gao et.al. 2507.20331 null
2025-07-28 GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset Yuhan Wang et.al. 2507.21033 null
2025-07-28 ADIEE: Automatic Dataset Creation and Scorer for Instruction-Guided Image Editing Evaluation Sherry X. Chen et.al. 2507.07317 null
2025-07-25 HQ-SMem: Video Segmentation and Tracking Using Memory Efficient Object Embedding With Selective Update and Self-Supervised Distillation Feedback Elham Soltani Kazemi et.al. 2507.18921 null
2025-07-23 Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling Yi Xin et.al. 2507.17801 null
2025-07-22 ADCD-Net: Robust Document Image Forgery Localization via Adaptive DCT Feature and Hierarchical Content Disentanglement Kahim Wong et.al. 2507.16397 null
2025-07-22 Scale Your Instructions: Enhance the Instruction-Following Fidelity of Unified Image Generation Model by Self-Adaptive Attention Scaling Chao Zhou et.al. 2507.16240 null
2025-07-22 LMM4Edit: Benchmarking and Evaluating Multimodal Image Editing with LMMs Zitong Xu et.al. 2507.16193 null
2025-07-20 Light Future: Multimodal Action Frame Prediction via InstructPix2Pix Zesen Zhong et.al. 2507.14809 null
2025-07-18 NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining Maksim Kuprashevich et.al. 2507.14119 null
2025-07-18 Moodifier: MLLM-Enhanced Emotion-Driven Image Editing Jiarong Ye et.al. 2507.14024 null
2025-07-16 MADI: Masking-Augmented Diffusion with Inference-Time Scaling for Visual Editing Shreya Kadambi et.al. 2507.13401 null
2025-07-15 EditGen: Harnessing Cross-Attention Control for Instruction-Based Auto-Regressive Audio Editing Vassilis Sioros et.al. 2507.11096 null
2025-07-14 Sparse Fine-Tuning of Transformers for Generative Tasks Wei Chen et.al. 2507.10855 null
2025-07-14 LayLens: Improving Deepfake Understanding through Simplified Explanations Abhijeet Narang et.al. 2507.10066 null
2025-07-11 FlowDrag: 3D-aware Drag-based Image Editing with Mesh-guided Deformation Vector Flow Fields Gwanhyeong Koo et.al. 2507.08285 null
2025-07-08 2D Instance Editing in 3D Space Yuhuan Xie et.al. 2507.05819 null
2025-07-07 Neural-Driven Image Editing Pengfei Zhou et.al. 2507.05397 null
2025-07-07 Beyond Simple Edits: X-Planner for Complex Instruction-Based Image Editing Chun-Hsiao Yeh et.al. 2507.05259 null
2025-07-07 S$^2$Edit: Text-Guided Image Editing with Precise Semantic and Spatial Control Xudong Liu et.al. 2507.04584 null
2025-07-04 Pose-Star: Anatomy-Aware Editing for Open-World Fashion Images Yuran Dong et.al. 2507.03402 null
2025-07-04 LACONIC: A 3D Layout Adapter for Controllable Image Creation Léopold Maillard et.al. 2507.03257 null
2025-07-03 From Long Videos to Engaging Clips: A Human-Inspired Video Editing Framework with Multimodal Narrative Understanding Xiangfeng Wang et.al. 2507.02790 null
2025-07-02 Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning Qingdong He et.al. 2507.01908 null
2025-07-02 ReFlex: Text-Guided Editing of Real Images in Rectified Flow via Mid-Step Feature Extraction and Attention Adaptation Jimyeong Kim et.al. 2507.01496 null
2025-07-02 QC-OT: Optimal Transport with Quasiconformal Mapping Yuping Lv et.al. 2507.01456 null
2025-07-01 Ovis-U1 Technical Report Guo-Hua Wang et.al. 2506.23044 null
2025-06-30 A Unified Framework for Stealthy Adversarial Generation via Latent Optimization and Transferability Enhancement Gaozheng Pei et.al. 2506.23676 null
2025-06-30 TAG-WM: Tamper-Aware Generative Image Watermarking via Diffusion Inversion Sensitivity Yuzhuo Chen et.al. 2506.23484 null
2025-06-29 OmniVCus: Feedforward Subject-driven Video Customization with Multimodal Control Conditions Yuanhao Cai et.al. 2506.23361 null
2025-06-29 Causal-Entity Reflected Egocentric Traffic Accident Video Synthesis Lei-lei Li et.al. 2506.23263 null
2025-06-28 Towards Explainable Bilingual Multimodal Misinformation Detection and Localization Yiwei He et.al. 2506.22930 null
2025-06-28 STR-Match: Matching SpatioTemporal Relevance Score for Training-Free Video Editing Junsung Lee et.al. 2506.22868 null
2025-06-27 Shape-for-Motion: Precise and Consistent Video Editing with 3D Proxy Yuhao Liu et.al. 2506.22432 null
2025-06-27 GenEscape: Hierarchical Multi-Agent Generation of Escape Room Puzzles Mengyi Shan et.al. 2506.21839 null
2025-06-27 DFVEdit: Conditional Delta Flow Vector for Zero-shot Video Editing Lingling Cai et.al. 2506.20967 null
2025-06-26 Controllable 3D Placement of Objects with Scene-Aware Diffusion Models Mohamed Omran et.al. 2506.21446 null
2025-06-26 Improving Diffusion-Based Image Editing Faithfulness via Guidance and Scheduling Hansam Cho et.al. 2506.21045 null
2025-06-26 M2SFormer: Multi-Spectral and Multi-Scale Attention with Edge-Aware Difficulty Guidance for Image Forgery Localization Ju-Hyeon Nam et.al. 2506.20922 null
2025-06-26 FaSTA$^*$: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing Advait Gupta et.al. 2506.20911 null
2025-06-26 BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing Jiacheng Chen et.al. 2506.17450 null
2025-06-25 EditP23: 3D Editing via Propagation of Image Prompts to Multi-View Roi Bar-On et.al. 2506.20652 null
2025-06-25 Towards Efficient Exemplar Based Image Editing with Multimodal VLMs Avadhoot Jadhav et.al. 2506.20155 null
2025-06-25 OmniGen2: Exploration to Advanced Multimodal Generation Chenyuan Wu et.al. 2506.18871 null
2025-06-24 SceneCrafter: Controllable Multi-View Driving Scene Editing Zehao Zhu et.al. 2506.19488 null
2025-06-24 LoRA-Edit: Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning Chenjian Gao et.al. 2506.10082 null
2025-06-23 Inverse-and-Edit: Effective and Fast Image Editing by Cycle Consistency Models Ilia Beletskii et.al. 2506.19103 null
2025-06-23 Let Your Video Listen to Your Music! Xinyu Zhang et.al. 2506.18881 null
2025-06-23 CPAM: Context-Preserving Adaptive Manipulation for Zero-Shot Real Image Editing Dinh-Khoi Vo et.al. 2506.18438 null
2025-06-23 Instability in Diffusion ODEs: An Explanation for Inaccurate Image Reconstruction Han Zhang et.al. 2506.18290 null
2025-06-20 FOCUS: Unified Vision-Language Modeling for Interactive Editing Driven by Referential Segmentation Fan Yang et.al. 2506.16806 null
2025-06-19 Arch-Router: Aligning LLM Routing with Human Preferences Co Tran et.al. 2506.16655 null
2025-06-18 VectorEdits: A Dataset and Benchmark for Instruction-Based Editing of Vector Graphics Josef Kuchař et.al. 2506.15903 null
2025-06-17 Causally Steered Diffusion for Automated Video Counterfactual Generation Nikos Spyrou et.al. 2506.14404 link
2025-06-16 AttentionDrag: Exploiting Latent Correlation Knowledge in Pre-trained Diffusion Models for Image Editing Biao Yang et.al. 2506.13301 null
2025-06-15 Balancing Preservation and Modification: A Region and Semantic Aware Metric for Instruction-Based Image Editing Zhuoying Li et.al. 2506.13827 null
2025-06-15 ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies Chenglin Wang et.al. 2506.12830 null
2025-06-14 Good Noise Makes Good Edits: A Training-Free Diffusion-Based Video Editing with Image and Text Prompts Saemee Choi et.al. 2506.12520 null
2025-06-13 SphereDrag: Spherical Geometry-Aware Panoramic Image Editing Zhiao Feng et.al. 2506.11863 null
2025-06-13 **Consistent Video Editing a

About

🎓 Update HumanAIGC related papers from ArXiv daily
