Papers
-
ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis
Tencent / Hong Kong University of Science and Technology, Monash University, Peking University, The Chinese University of Hong Kong
-
Learning-based Multi-View Stereo: A Survey
-
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
-
Arctic-TILT. Business Document Understanding at Sub-Billion Scale
-
General-Purpose User Modeling with Behavioral Logs
-
LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
-
Qwen2-Audio Technical Report
-
Qwen2 Technical Report
-
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models
-
Helix Extractor 1.0: A Fine-Tuned Large Language Model for Document Information Extraction
-
Harvesting Efficient On-Demand Order Pooling from Skilled Couriers: Enhancing Graph Representation Learning for Refining Real-time Many-to-One Assignments
-
Claude 3.5 Sonnet Model Card Addendum
-
Abliteration
-
Multi-Agent Software Development through Cross-Team Collaboration
-
Efficient Large Language Model Inference with Limited Memory
-
AgentBoard: An Evaluation Platform for LLM-Based Autonomous Agents
-
Creative Text-to-Audio Generation via Synthesizer Programming
-
Retrieval Augmented Generation for Domain-specific Question Answering
-
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
-
Multimodal Chain-of-Thought Reasoning in Language Models
-
Magic-Me: Identity-Specific Video Customized Diffusion
-
Generative Image Dynamics
-
AI at Work Is Here. Now Comes the Hard Part
-
Private Vector Mean Estimation in the Shuffle Model: Optimal Rates Require Many Messages
-
OmniSearchSage: Multi-Task Multi-Entity Embeddings for Pinterest Search
-
Distinguishing homolytic versus heterolytic bond dissociation of phenyl sulfonium cations with localized active space methods
-
More, better or different? Trade-offs between group size and competence development in jury theorems
Institute for Futures Studies, Umeå University
-
Mixtral 8x22B (Cheaper, Better, Faster, Stronger)
-
Gemma: Open Models Based on Gemini Research and Technology
-
ChipNeMo: Domain-Adapted LLMs for Chip Design
-
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
-
Word Importance Explains How Prompts Affect Language Model Outputs
-
SPUQ: Perturbation-Based Uncertainty Quantification for Large Language Models
-
GPT-4 Technical Report
-
The Claude 3 Model Family: Opus, Sonnet, and Haiku
-
Unifying Linear-Time Attention via Latent Probabilistic Modelling
-
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
-
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
-
SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency
-
DINOv2: Learning Robust Visual Features without Supervision
-
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Anthropic / Alignment Research Center, Apart Research, Coefficient Giving, Mila–Quebec AI Institute, Redwood Research, University of Oxford
-
Surgical-DINO: Adapter Learning of Foundation Models for Depth Estimation in Endoscopic Surgery
-
Mixtral of Experts
-
Autonomous Procedural Operations (ProcOps)
-
Speech Translation with Large Language Models: An Industrial Practice
-
VideoPoet: A Large Language Model for Zero-Shot Video Generation
-
Intel GPU Inference Optimization
-
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
-
Knowledge Diffusion for Distillation
-
Interactive Multi-fidelity Learning for Cost-effective Adaptation of Language Model with Sparse Human Supervision
