Papers
-
Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion
-
OneRanker: Unified Generation and Ranking with One Model in Industrial Advertising Recommendation
-
NOVA: Sparse Control, Dense Synthesis for Pair-Free Video Editing
-
RubricBench: Aligning Model-Generated Rubrics with Human Standards
-
WorldStereo: Bridging Camera-Guided Video Generation and Scene Reconstruction via 3D Geometric Memories
-
AngelSlim: A more accessible, comprehensive, and efficient toolkit for large model compression
-
The Art of Efficient Reasoning: Data, Reward, and Optimization
-
Haitao Lin
-
HyTRec: A Hybrid Temporal-Aware Attention Architecture for Long Behavior Sequential Recommendation
-
GUI-GENESIS: Automated Synthesis of Efficient Environments with Verifiable Rewards for GUI Agent Post-Training
-
OmniVideo-R1: Reinforcing Audio-visual Reasoning with Query Intention and Modality Attention
-
Gradients Must Earn Their Influence: Unifying SFT with Generalized Entropic Objectives
-
MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE
-
RISE-Video: Can Video Generators Decode Implicit World Rules?
-
BlossomRec: Block-level Fused Sparse Attention Mechanism for Sequential Recommendations
-
ReMiT: RL-Guided Mid-Training for Iterative LLM Evolution
-
HY3D-Bench: Generation of 3D Assets
-
HunyuanImage 3.0 Technical Report
-
MAIN-VLA: Modeling Abstraction of Intention and eNvironment for Vision-Language-Action Models
-
AlignGemini: Generalizable AI-Generated Image Detection Through Task-Model Alignment
-
PI-Light: Physics-Inspired Diffusion for Full-Image Relighting
-
Youtu-Parsing: Perception, Structuring and Recognition via High-Parallelism Decoding
-
Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision
-
RISER: Orchestrating Latent Reasoning Skills for Adaptive Activation Steering
-
FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments
-
UniFinEval: Towards Unified Evaluation of Financial Multimodal Models across Text, Images and Videos
-
Rotate Your Character: Revisiting Video Diffusion Models for High-Quality 3D Character Generation
-
One Language-Free Foundation Model Is Enough for Universal Vision Anomaly Detection
-
DocDancer: Towards Agentic Document-Grounded Information Seeking
-
Re-Align: Structured Reasoning-guided Alignment for In-Context Image Generation and Editing
-
FOREVER: Forgetting Curve-Inspired Memory Replay for Language Model Continual Learning
-
SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents
-
A Versatile Multimodal Agent for Multimedia Content Generation
-
Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models
-
YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection
-
HY-MT1.5 Technical Report
-
D2Pruner: Debiased Importance and Structural Diversity for MLLM Token Pruning
-
Streaming Video Instruction Tuning
-
RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing
-
N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models
-
AutoRefiner: Improving Autoregressive Video Diffusion Models via Reflective Refinement Over the Stochastic Sampling Path
-
Soul: Breathe Life into Digital Human for High-fidelity Long-term Multimodal Animation
-
Transform Trained Transformer: Accelerating Naive 4K Video Generation Over 10×
-
GTR-Turbo: Merged Checkpoint is Secretly a Free Teacher for Agentic VLM Training
-
Distribution Matching Variational AutoEncoder
-
HunyuanVideo 1.5 Technical Report
-
Training-Free Group Relative Policy Optimization
-
Scaling Diffusion Language Models via Adaptation from Autoregressive Models
-
HunyuanVideo: A Systematic Framework For Large Video Generative Models
-
ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis
