Papers

Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model

ByteDance

Published on: 2025-12-15 1 author
Schrodinger Audio-Visual Editor: Object-Level Audiovisual Removal

Sony Group Corporation (AIBO) / Massachusetts Institute of Technology

Published on: 2025-12-14 1 author
Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling

Kuaishou Technology / Peking University

Published on: 2025-12-14 1 author
Diffusion Language Model Inference with Monte Carlo Tree Search

Amazon / Dartmouth College

Published on: 2025-12-13 1 author
SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder

Kuaishou Technology / Tsinghua University

Published on: 2025-12-12 1 author
SpaceDrive: Infusing Spatial Awareness into VLM-based Autonomous Driving

Published on: 2025-12-11 9 authors
Confucius Code Agent: Scalable Agent Scaffolding for Real-World Codebases

Meta Platforms / Harvard University

Published on: 2025-12-11 11 authors
CXL-SpecKV: A Disaggregated FPGA Speculative KV-Cache for Datacenter LLM Serving

AMD / Columbia University, Yale University

Published on: 2025-12-11 2 authors
BabyVLM-V2: Toward Developmentally Grounded Pretraining and Benchmarking of Vision Foundation Models

Sony Group Corporation (AIBO) / Boston University

Published on: 2025-12-11 1 author
Omni-Attribute: Open-vocabulary Attribute Encoder for Visual Concept Personalization

Snap / University of California

Published on: 2025-12-11 1 author
Glance: Accelerating Diffusion Models with 1 Sample

Microsoft / Wissenschaftliche Hochschule für Unternehmensführung

Published on: 2025-12-11 1 author
Sharp Monocular View Synthesis in Less Than a Second

Apple

Published on: 2025-12-11 1 author
On Learning-Curve Monotonicity for Maximum Likelihood Estimators

OpenAI

Published on: 2025-12-11 1 author
Matrix-game 2.0: An open-source real-time and streaming interactive world model

Skywork AI

Published on: 2025-12-10 1 author
UniUGP: Unifying Understanding, Generation, and Planing For End-to-end Autonomous Driving

ByteDance

Published on: 2025-12-10 1 author
Efficiently Reconstructing Dynamic Scenes One D4RT at a Time

Google / University College London

Published on: 2025-12-10 1 author
PAVAS: Physics-Aware Video-to-Audio Synthesis

Sony Group Corporation (AIBO) / Korea Advanced Institute of Science & Technology

Published on: 2025-12-09 1 author
Chain-of-Image Generation: Toward Monitorable and Controllable Image Generation

Apple / Duke University

Published on: 2025-12-09 1 author
HybridToken-VLM: Hybrid Token Compression for Vision-Language Models

Snap / Sun Yat-sen University

Published on: 2025-12-09 1 author
MM-CoT: A Benchmark for Probing Visual Chain-of-Thought Reasoning in Multimodal Models

Snap / Sun Yat-sen University

Published on: 2025-12-09 1 author
Process Reward Models That Think

LG Electronics / University of Michigan

Published on: 2025-12-08 1 author
Distribution Matching Variational AutoEncoder

Tencent / Peking University

Published on: 2025-12-08 1 author
Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch

Skywork AI

Published on: 2025-12-08 1 author
UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation

Kuaishou Technology / Tsinghua University

Published on: 2025-12-08 1 author
Kimi-Dev: Agentless Training as Skill Prior for SWE-Agents

Moonshot AI / Peking University

Published on: 2025-12-08 1 author
Unsupervised decoding of encoded reasoning using language model interpretability

Anthropic

Published on: 2025-12-06 1 author
EditThinker: Unlocking Iterative Reasoning for Any Image Editor

Meituan / Beihang University

Published on: 2025-12-05 1 author
Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning

Kuaishou Technology

Published on: 2025-12-05 1 author
EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing

Snap / Rice University

Published on: 2025-12-05 1 author
Delving into Latent Spectral Biasing of Video VAEs for Superior Diffusability

Z.ai

Published on: 2025-12-05 1 author
Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion

Microsoft / Beijing Jiaotong University

Published on: 2025-12-05 1 author
Learning to Orchestrate Agents in Natural Language with the Conductor

Published on: 2025-12-04 6 authors
TRINITY: An Evolved LLM Coordinator

Published on: 2025-12-04 6 authors
SIMA 2: A Generalist Embodied Agent for Virtual Worlds

Google / Google DeepMind

Published on: 2025-12-04 1 author
SMP: Reusable Score-Matching Motion Priors for Physics-Based Character Control

Snap / Simon Fraser University

Published on: 2025-12-03 1 author
Training LLMs for Honesty via Confessions

OpenAI

Published on: 2025-12-03
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

DeepSeek

Published on: 2025-12-02 1 author
VIGS-SLAM: Visual Inertial Gaussian Splatting SLAM

Microsoft / ETH Zurich

Published on: 2025-12-02 1 author
The Art of Scaling Test-Time Compute for Large Language Models

Microsoft / Indian Institute of Technology Delhi

Published on: 2025-12-01 1 author
TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models

Meta Platforms / King Abdullah University of Science and Technology (KAUST), The University of Hong Kong, University of Waterloo

Published on: 2025-12-01 1 author
The Adoption and Usage of AI Agents: Early Evidence from Perplexity

Perplexity / Harvard University

Published on: 2025-12-01 1 author
ThetaEvolve: Test-time Learning on Open Problems

Microsoft / Carnegie Mellon University, University of California, University of Washington, University of Wisconsin-Madison

Published on: 2025-11-28 1 author
LatBot: Distilling Universal Latent Actions for Vision-Language-Action Models

Microsoft / University of Chinese Academy of Sciences

Published on: 2025-11-28 1 author
Canvas-to-Image: Compositional Image Generation with Multimodal Controls

Snap / University of California

Published on: 2025-11-26 1 author
LayerComposer: Multi-Human Personalized Generation via Layered Canvas

Snap / University of Toronto

Published on: 2025-11-25 1 author
UI-CUBE: Enterprise-Grade Computer Use Agent Benchmarking Beyond Task Accuracy to Operational Reliability

UiPath

Published on: 2025-11-21 6 authors
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Apple

Published on: 2025-11-20 1 author
Early Science Acceleration Experiments with GPT-5

OpenAI / Collège de France, Columbia University, Harvard University, Lawrence Livermore National Laboratory, The Jackson Laboratory, University of California, University of Cambridge, University of Oxford, Vanderbilt University

Published on: 2025-11-20 1 author
Anthropic Economic Index report: Uneven geographic and enterprise AI adoption

Anthropic

Published on: 2025-11-19 1 author
Weight-Sparse Transformers Have Interpretable Circuits

OpenAI

Published on: 2025-11-17 1 author
