Videos
taaft.com/videos
There are 0 free AI tools for Videos.
Models (109)
- Qwen3.5-122B-A10B (Alibaba · Multimodal · released 1 mo ago): a larger Mixture-of-Experts Qwen3.5 model with 122B total parameters and 10B active per token, targeting higher peak capability while staying compute-efficient at inference (see the back-of-the-envelope sketch after this list).
- Qwen3.5-Flash (Alibaba · Multimodal · released 1 mo ago): the speed- and cost-optimized Qwen3.5 variant, designed for high-throughput chat and multimodal prompting with a very long context window.
- Seedance 2.0 (ByteDance · Multimodal · released 1 mo ago): ByteDance's multimodal AI video model that turns text plus image, video, and audio references into high-resolution, sound-synced clips, giving creators director-level control over camera, motion, style, and multi-shot storytelling.
- Ming-flash-omni 2.0 (InclusionAI · Multimodal · released 1 mo ago): an open sparse-MoE omni-modal model that unifies text, image, video, and audio understanding and generation, using a Ling-2.0 Mixture-of-Experts backbone with 100B total parameters and about 6B active per token.
- GameGen-X (Video · released 1 mo ago): a diffusion transformer built specifically for open-world game video, generating and interactively controlling characters, environments, and actions in long gameplay clips.
- Kling Video 3.0 (Multimodal · released 1 mo ago): Kuaishou's newest AI video model that unifies text, image, audio, and reference video in one engine, generating up to 15-second photorealistic clips with native multi-language audio and strong consistency across shots.
- By OpenMOSS (Video · released 1 mo ago): an open-source foundation model that jointly generates video and audio in one pass, achieving tightly synchronized lip movements and environment-aware sound effects.
- Vidu Q3 (Multimodal · released 1 mo ago): ShengShu Technology's long-form AI video model that generates a single 16-second clip with native, synchronized audio and 1080p video in one generation.
- By Skywork AI (Multimodal · released 2 mo ago): a long-form video extension engine that analyzes scene semantics and motion to extend clips with coherent shots, maintaining strong temporal consistency and cinematic storytelling.
- Grok Imagine (xAI · Video · released 2 mo ago): xAI's video-audio generative model, exposed through the Imagine API, that turns text or images into cinematic videos and supports text-to-image, text-to-video, image-to-video, and rich video editing with strong instruction following and competitive latency and cost.
- PixVerse V5.6 (PixVerse · Video · released 2 mo ago): PixVerse's latest video model, upgrading V5.5 with cinema-level visuals, more natural multilingual voices, smoother physics-aware motion, and less warping, while keeping generation speed and cost roughly the same as earlier V5 models.
- MedGemma 1.5 4B (Google · Text · released 2 mo ago): Google's updated 4B-parameter medical vision-language model that improves CT, MRI, and histopathology understanding while remaining compute-efficient for offline and cloud healthcare text and imaging workflows.
- LTX-2 (LTX Studio · Video · released 2 mo ago): Lightricks' 19B diffusion-based audio-video foundation model that generates synchronized 4K video and stereo audio from text or images, with distilled and LoRA variants for faster local generation.
- T5Gemma 2 (Google · Text · released 3 mo ago): Google's next-generation, Gemma 3-based encoder-decoder family, a lightweight multilingual and multimodal LLM that reads text and images, outputs text, and offers 128K context with tied embeddings and merged attention for efficient on-device deployment.
- Wan 2.6 (Alibaba · Multimodal · released 3 mo ago): Alibaba's latest Wan AI multimodal video model, turning text, images, audio, and short reference clips into 1080p videos of up to 15 seconds with native audio sync, multi-shot storytelling, and strong character and style consistency.
- LongCat-Video-Avatar (MeiGen AI · Video · released 3 mo ago): Meituan's audio-driven avatar model built on LongCat-Video, generating super-realistic, lip-synced long videos from audio plus optional text and images, with stable identity, natural motion, and support for multi-person scenes.
- Runway GWM-1 (Runway AI · Video · released 3 mo ago): a family of general world models built on Gen-4.5 that generate action-conditioned video in real time, powering explorable simulated environments, conversational avatars, and robotics simulators for training and interactive applications.
- Kling Video 2.6 (Video · released 3 mo ago): Kling AI's latest video model that natively generates video plus dialogue, music, and sound effects in one step, turning text or images into 5-10 second 1080p clips with tightly synced audio-visual storytelling for creators and advertisers.
- Ministral 3, largest variant (Mistral AI · Text · released 3 mo ago): offers frontier text and vision capabilities comparable to larger 24B models. Edge-optimized for single-GPU deployment (24GB VRAM in FP8), it delivers state-of-the-art performance for chat, document analysis, and complex reasoning tasks, with multilingual support across 40+ languages.
- By Mistral AI (Text · released 3 mo ago): a best-in-class text and vision model for edge deployment, optimized for single-GPU operation with a minimal footprint. Features interleaved sliding-window attention for efficient inference. Ideal for constrained environments, chat interfaces, image and document understanding, and balanced local deployment scenarios.
- Ministral 3, smallest variant (Mistral AI · Text · released 3 mo ago): the smallest yet robust Ministral model, edge-optimized for ultra-low-resource environments. Despite its compact size (~3GB), it provides strong language and vision capabilities, outperforming older 7B models. Runs entirely in the browser via WebGPU. Ideal for IoT devices, mobile apps, and offline assistants.
- PixVerse V5.5 (PixVerse · Video · released 3 mo ago): PixVerse's audio-visual text- and image-to-video model that generates 5-10 second 1080p multi-shot clips with native speech, music, and SFX, improved motion stability, and multi-shot camera control for story-driven, lip-synced short videos.
- Mistral 3 (Mistral AI · Text · released 3 mo ago): Mistral AI's next-gen open multimodal, multilingual family, combining small dense Ministral 3 edge models with the frontier Mistral Large 3 MoE to deliver image-aware, long-context language intelligence.
- Runway Gen-4.5 (Runway AI · Video · released 3 mo ago): Runway's latest text-to-video and image-to-video model, ranked #1 on independent benchmarks for motion, realism, and prompt adherence, delivering cinematic, physics-aware clips with fine camera, style, and timing control for creators and studios.
- Kling Video O1 (Video · released 3 mo ago): Kling AI's unified multimodal video model that fuses text-to-video generation with image- and video-based editing, using advanced reasoning and motion control to create short, high-quality cinematic clips in a single workflow.
- Vidi2 (ByteDance · Video · released 4 mo ago): ByteDance's second-generation large multimodal video model for understanding and creation, adding fine-grained spatio-temporal grounding, long-video retrieval, and video question answering, so it can find both the right time ranges and object boxes from natural-language queries.
- Pika 2.5 (Mellis · Video · released 4 mo ago): an upgraded text- and image-to-video model that delivers sharper detail, smoother motion, better lip-sync and physics, plus more precise control over camera, pacing, and aspect ratios for social and production clips.
- HunyuanVideo-1.5 (Tencent · Video · released 4 mo ago): Tencent's 8.3B-parameter open-source video diffusion model for text-to-video and image-to-video generation, delivering high-quality, stable-motion clips while running efficiently on consumer-grade GPUs.
- SAM 3 (Meta · Image · released 4 mo ago): Meta's third-generation Segment Anything foundation model that performs promptable segmentation and tracking in images and videos, finding all instances of open-vocabulary concepts from text or visual prompts.
- Dia2 (Nari Labs · Audio · released 4 mo ago): an open-source streaming dialogue TTS model that generates speech in real time from partial text, supports audio conditioning for natural back-and-forth conversations, and ships 1B and 2B checkpoints under Apache 2.0.
- Hermes 4.3 (Nous Research · Text · released 4 mo ago): Nous Research's 36B hybrid reasoning model, based on Seed-OSS-36B, offering long context (up to 512k) and very high helpfulness on RefusalBench while staying locally deployable.
- By Baidu (Text · released 4 mo ago): a multimodal MoE model that "looks, reads, and reasons" across images, video, and text. It adds tool use and a Thinking with Images mode, supports long context, and activates about 3B parameters per token for flagship-level VLM quality at practical latency.
- LongCat-Video (Meituan · Video · released 5 mo ago): a long-horizon text- and image-to-video model that keeps identity, style, and motion consistent over extended clips. It supports in-place edits, camera and pacing control, and fast previews through to delivery-quality renders.
- Vidu Q2, speed-tuned variant (Video · released 5 mo ago): tuned for rapid drafts and cost-efficient iteration.
- Vidu Q2, premium profile (Video · released 5 mo ago): maximum temporal stability and micro-detail.
- Vidu Q2 (Video · released 5 mo ago): an upgraded text- and image-to-video model. It delivers sharper detail, steadier motion, stronger identity and style locking, and precise control over camera and pacing. It supports text-to-video, image-to-video, and in-place edits for fast iteration to production-ready clips.
- Nova Reel (Amazon · Video · released 5 mo ago): Amazon's video model for text- or image-to-video and in-place edits. It delivers stable motion, identity and style locking, and precise control of camera and pacing for clips that drop into real production timelines.
- Veo 3.1 Fast (Google · Video · released 5 mo ago): the speed-tuned variant of Veo 3.1. It trades a little peak fidelity for much lower latency and cost, keeping the same controls for camera, motion, and style so teams can iterate rapidly.
- Veo 3.1 (Google · Video · released 5 mo ago): a high-fidelity text- and image-to-video model. It delivers sharper detail, steadier motion, stronger identity and style locking, and precise control of camera and pacing for production-ready clips.
- Ling-flash-2.0 (Multimodal · released 5 mo ago): a high-speed multilingual instruction model built for very low latency and high throughput. It supports long context, tool and function calling, and clean JSON outputs, making it ideal for live chat, voice assistants, and real-time automation.
- By Sora AI (Video · released 5 mo ago): flagship text-to-video profile for complex scenes, realistic physics, and long, stable takes.
- Sora 2 (OpenAI · Video · released 5 mo ago): OpenAI's next-generation text/image-to-video model. It produces sharper, longer, more stable clips with better physics, identity and style locking, and precise control over camera and motion, built for fast iteration and production delivery.
- Hailuo 2.3 (MiniMax · released 6 mo ago): a high-quality text-to-video model for longer, steadier shots and stronger physics.
- Wan 2.5 (Alibaba · Video · released 6 mo ago): Alibaba's next-gen text-to-video system. It delivers sharper detail, longer and more stable shots, stronger physics, and tighter identity/style locking, with precise control over camera and motion. It supports text-to-video, image-to-video, and in-place edits for fast iteration to production-ready clips.
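
For readers skimming the parameter claims above, here is a minimal back-of-the-envelope sketch in Python. It checks the sparse-MoE figures quoted for Qwen3.5-122B-A10B and Ming-flash-omni 2.0 and the single-GPU FP8 budget quoted in the largest Ministral 3 entry. The helper names are illustrative, not any vendor's API, and the memory figure assumes 1 byte per FP8 weight, counting weights only.

# Sanity math for the sparse-MoE and FP8 figures quoted in the list above.
# Assumptions: 1 byte per weight at FP8; weights only (no activations or
# KV cache), so the memory number is a floor, not a sizing recommendation.

def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of parameters a sparse MoE actually uses per token."""
    return active_params_b / total_params_b

def fp8_weight_memory_gb(params_b: float) -> float:
    """Approximate weight memory in GB at FP8 (1 byte per parameter)."""
    return params_b * 1e9 / 1e9  # 1 byte/param, so GB roughly equals billions of params

# Qwen3.5-122B-A10B: 122B total, 10B active per token -> ~8% of weights per token.
print(f"Qwen3.5-122B-A10B: {active_fraction(122, 10):.1%} active per token")

# Ming-flash-omni 2.0: 100B total, about 6B active per token.
print(f"Ming-flash-omni 2.0: {active_fraction(100, 6):.1%} active per token")

# A 24B-parameter dense model at FP8 needs ~24 GB for weights alone, which
# matches the "single GPU (24GB VRAM in FP8)" budget in the Ministral 3 entry.
print(f"24B dense @ FP8: ~{fp8_weight_memory_gb(24):.0f} GB of weights")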
