Videos
taaft.com/videos
There are 0 free AI tools for Videos.
Models (109)
- Qwen3.5-122B-A10B (Alibaba · Multimodal · released 1 mo ago): a larger Mixture-of-Experts Qwen3.5 model with 122B total parameters and 10B active per token, targeting higher peak capability while staying compute-efficient at inference (see the back-of-the-envelope sketch after this list).
- Qwen3.5-Flash (Alibaba · Multimodal · released 1 mo ago): the speed- and cost-optimized Qwen3.5 variant, designed for high-throughput chat and multimodal prompting with a very long context window.
- Seedance 2.0 (ByteDance · Multimodal · released 1 mo ago): ByteDance's multimodal AI video model that turns text plus image, video, and audio references into high-resolution, sound-synced clips, giving creators director-level control over camera, motion, style, and multi-shot storytelling.
- Ming-flash-omni 2.0 (InclusionAI · Multimodal · released 1 mo ago): an open sparse-MoE omni-modal model that unifies text, image, video, and audio understanding and generation, using a Ling-2.0 Mixture-of-Experts backbone with 100B total parameters and about 6B active per token.
- GameGen-X (Video · released 1 mo ago): a diffusion transformer built specifically for open-world game video, generating and interactively controlling characters, environments, and actions in long gameplay clips.
- Kling Video 3.0 (Multimodal · released 1 mo ago): Kuaishou's newest AI video model that unifies text, image, audio, and reference video in one engine, generating up to 15-second photorealistic clips with native multi-language audio and strong consistency across shots.
- By OpenMOSS (Video · released 1 mo ago): an open-source foundation model that jointly generates video and audio in one pass, achieving tightly synchronized lip movements and environment-aware sound effects.
- Vidu Q3 (Multimodal · released 1 mo ago): ShengShu Technology's long-form AI video model that generates a single 16-second clip with native, synchronized audio and 1080p video in one generation.
- By Skywork AI (Multimodal · released 2 mo ago): a long-form video extension engine that analyzes scene semantics and motion to extend clips with coherent shots, maintaining strong temporal consistency and cinematic storytelling.
- Grok Imagine (xAI · Video · released 2 mo ago): xAI's video-audio generative model, exposed through the Imagine API, that turns text or images into cinematic videos and supports text-to-image, text-to-video, image-to-video, and rich video editing with strong instruction following and competitive latency and cost.
- PixVerse V5.6 (PixVerse · Video · released 2 mo ago): PixVerse's latest video model, upgrading V5.5 with cinema-level visuals, more natural multilingual voices, smoother physics-aware motion, and less warping, while keeping generation speed and cost roughly the same as earlier V5 models.
- MedGemma 1.5 4B (Google · Text · released 2 mo ago): Google's updated 4B-parameter medical vision-language model that improves CT, MRI, and histopathology understanding while remaining compute-efficient for offline and cloud healthcare text and imaging workflows.
- LTX-2 (LTX Studio · Video · released 2 mo ago): Lightricks' 19B diffusion-based audio-video foundation model that generates synchronized 4K video and stereo audio from text or images, with distilled and LoRA variants for faster local generation.
- T5Gemma 2 (Google · Text · released 3 mo ago): Google's next-generation, Gemma 3-based encoder-decoder family, a lightweight multilingual and multimodal LLM that reads text and images, outputs text, and offers 128K context with tied embeddings and merged attention for efficient on-device deployment.
- Wan 2.6 (Alibaba · Multimodal · released 3 mo ago): Alibaba's latest Wan AI multimodal video model, turning text, images, audio, and short reference clips into 1080p videos of up to 15 seconds with native audio sync, multi-shot storytelling, and strong character and style consistency.
- LongCat-Video-Avatar (MeiGen AI · Video · released 3 mo ago): Meituan's audio-driven avatar model built on LongCat-Video, generating super-realistic, lip-synced long videos from audio plus optional text and images, with stable identity, natural motion, and support for multi-person scenes.
- Runway GWM-1 (Runway AI · Video · released 3 mo ago): a family of general world models built on Gen-4.5 that generate action-conditioned video in real time, powering explorable simulated environments, conversational avatars, and robotics simulators for training and interactive applications.
- Kling Video 2.6 (Video · released 3 mo ago): Kling AI's latest video model that natively generates video plus dialogue, music, and sound effects in one step, turning text or images into 5-10 second 1080p clips with tightly synced audio-visual storytelling for creators and advertisers.
- Ministral 3, largest variant (Mistral AI · Text · released 3 mo ago): offers frontier text and vision capabilities comparable to larger 24B models. Edge-optimized for single-GPU deployment (24GB VRAM in FP8), it delivers state-of-the-art performance for chat, document analysis, and complex reasoning tasks, with multilingual support across 40+ languages.
- By Mistral AI (Text · released 3 mo ago): a best-in-class text and vision model for edge deployment, optimized for single-GPU operation with a minimal footprint. Features interleaved sliding-window attention for efficient inference. Ideal for constrained environments, chat interfaces, image and document understanding, and balanced local deployment scenarios.
- Ministral 3, smallest variant (Mistral AI · Text · released 3 mo ago): the smallest yet robust Ministral model, edge-optimized for ultra-low-resource environments. Despite its compact size (~3GB), it provides strong language and vision capabilities, outperforming older 7B models. Runs entirely in the browser via WebGPU. Ideal for IoT devices, mobile apps, and offline assistants.
- PixVerse V5.5 (PixVerse · Video · released 3 mo ago): PixVerse's audio-visual text- and image-to-video model that generates 5-10 second 1080p multi-shot clips with native speech, music, and SFX, improved motion stability, and multi-shot camera control for story-driven, lip-synced short videos.
- Mistral 3 (Mistral AI · Text · released 3 mo ago): Mistral AI's next-gen open multimodal, multilingual family, combining small dense Ministral 3 edge models with the frontier Mistral Large 3 MoE to deliver image-aware, long-context language intelligence.
- Runway Gen-4.5 (Runway AI · Video · released 3 mo ago): Runway's latest text-to-video and image-to-video model, ranked #1 on independent benchmarks for motion, realism, and prompt adherence, delivering cinematic, physics-aware clips with fine camera, style, and timing control for creators and studios.
- Kling Video O1 (Video · released 3 mo ago): Kling AI's unified multimodal video model that fuses text-to-video generation with image- and video-based editing, using advanced reasoning and motion control to create short, high-quality cinematic clips in a single workflow.
- Vidi2 (ByteDance · Video · released 4 mo ago): ByteDance's second-generation large multimodal video model for understanding and creation, adding fine-grained spatio-temporal grounding, long-video retrieval, and video question answering, so it can find both the right time ranges and object boxes from natural-language queries.
- Pika 2.5 (Mellis · Video · released 4 mo ago): an upgraded text- and image-to-video model that delivers sharper detail, smoother motion, better lip-sync and physics, plus more precise control over camera, pacing, and aspect ratios for social and production clips.
- HunyuanVideo-1.5 (Tencent · Video · released 4 mo ago): Tencent's 8.3B-parameter open-source video diffusion model for text-to-video and image-to-video generation, delivering high-quality, stable-motion clips while running efficiently on consumer-grade GPUs.
- SAM 3 (Meta · Image · released 4 mo ago): Meta's third-generation Segment Anything foundation model that performs promptable segmentation and tracking in images and videos, finding all instances of open-vocabulary concepts from text or visual prompts.
- Dia2 (Nari Labs · Audio · released 4 mo ago): an open-source streaming dialogue TTS model that generates speech in real time from partial text, supports audio conditioning for natural back-and-forth conversations, and ships 1B and 2B checkpoints under Apache 2.0.
- Hermes 4.3 (Nous Research · Text · released 4 mo ago): Nous Research's 36B hybrid reasoning model, based on Seed-OSS-36B, offering long context (up to 512k) and very high helpfulness on RefusalBench while staying locally deployable.
- By Baidu (Text · released 4 mo ago): a multimodal MoE model that "looks, reads, and reasons" across images, video, and text. It adds tool use and a Thinking with Images mode, supports long context, and activates about 3B parameters per token for flagship-level VLM quality at practical latency.
- LongCat-Video (Meituan · Video · released 5 mo ago): a long-horizon text- and image-to-video model that keeps identity, style, and motion consistent over extended clips. It supports in-place edits, camera and pacing control, and fast previews through to delivery-quality renders.
- Vidu Q2, speed-tuned variant (Video · released 5 mo ago): tuned for rapid drafts and cost-efficient iteration.
- Vidu Q2, premium profile (Video · released 5 mo ago): maximum temporal stability and micro-detail.
- Vidu Q2 (Video · released 5 mo ago): an upgraded text- and image-to-video model. It delivers sharper detail, steadier motion, stronger identity and style locking, and precise control over camera and pacing. It supports text-to-video, image-to-video, and in-place edits for fast iteration to production-ready clips.
- Nova Reel (Amazon · Video · released 5 mo ago): Amazon's video model for text- or image-to-video and in-place edits. It delivers stable motion, identity and style locking, and precise control of camera and pacing for clips that drop into real production timelines.
- Veo 3.1 Fast (Google · Video · released 5 mo ago): the speed-tuned variant of Veo 3.1. It trades a little peak fidelity for much lower latency and cost, keeping the same controls for camera, motion, and style so teams can iterate rapidly.
- Veo 3.1 (Google · Video · released 5 mo ago): a high-fidelity text- and image-to-video model. It delivers sharper detail, steadier motion, stronger identity and style locking, and precise control of camera and pacing for production-ready clips.
- Ling-flash-2.0 (Multimodal · released 5 mo ago): a high-speed multilingual instruction model built for very low latency and high throughput. It supports long context, tool and function calling, and clean JSON outputs, making it ideal for live chat, voice assistants, and real-time automation.
- By Sora AI (Video · released 5 mo ago): flagship text-to-video profile for complex scenes, realistic physics, and long, stable takes.
- Sora 2 (OpenAI · Video · released 5 mo ago): OpenAI's next-generation text/image-to-video model. It produces sharper, longer, more stable clips with better physics, identity and style locking, and precise control over camera and motion, built for fast iteration and production delivery.
- Hailuo 2.3 (MiniMax · released 6 mo ago): a high-quality text-to-video model for longer, steadier shots and stronger physics.
- Wan 2.5 (Alibaba · Video · released 6 mo ago): Alibaba's next-gen text-to-video system. It delivers sharper detail, longer and more stable shots, stronger physics, and tighter identity/style locking, with precise control over camera and motion. It supports text-to-video, image-to-video, and in-place edits for fast iteration to production-ready clips.
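
For readers skimming the parameter claims above, here is a minimal back-of-the-envelope sketch in Python. It checks the sparse-MoE figures quoted for Qwen3.5-122B-A10B and Ming-flash-omni 2.0 and the single-GPU FP8 budget quoted in the largest Ministral 3 entry. The helper names are illustrative, not any vendor's API, and the memory figure assumes 1 byte per FP8 weight, counting weights only.

# Sanity math for the sparse-MoE and FP8 figures quoted in the list above.
# Assumptions: 1 byte per weight at FP8; weights only (no activations or
# KV cache), so the memory number is a floor, not a sizing recommendation.

def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of parameters a sparse MoE actually uses per token."""
    return active_params_b / total_params_b

def fp8_weight_memory_gb(params_b: float) -> float:
    """Approximate weight memory in GB at FP8 (1 byte per parameter)."""
    return params_b * 1e9 / 1e9  # 1 byte/param, so GB roughly equals billions of params

# Qwen3.5-122B-A10B: 122B total, 10B active per token -> ~8% of weights per token.
print(f"Qwen3.5-122B-A10B: {active_fraction(122, 10):.1%} active per token")

# Ming-flash-omni 2.0: 100B total, about 6B active per token.
print(f"Ming-flash-omni 2.0: {active_fraction(100, 6):.1%} active per token")

# A 24B-parameter dense model at FP8 needs ~24 GB for weights alone, which
# matches the "single GPU (24GB VRAM in FP8)" budget in the Ministral 3 entry.
print(f"24B dense @ FP8: ~{fp8_weight_memory_gb(24):.0f} GB of weights")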
