nanowhale

nanowhale is a small language-model project that implements a miniature version of the DeepSeek-V4 architecture. According to the repository, it includes Multi-Head Latent Attention, Mixture-of-Experts routing, Hyper-Connections, and Multi-Token Prediction, all in an 8-layer model with about 110M parameters, the 129,280-token DeepSeek-V4 tokenizer, and a 2,048-token context window. It ships with the nanowhale-100m-base pretrained model and the nanowhale-100m SFT chat model, and is positioned as an open research and learning project: code, configs, tokenizer files, and training scripts are all included.
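
To give a feel for the figures quoted above, here is a minimal sketch that collects them in one place and estimates how much of the roughly 110M-parameter budget a 129,280-token embedding table could take up. The class name NanowhaleSpec, the helper functions, and above all the hidden size of 512 are assumptions made for illustration; the description does not state the hidden size, and this is not the repository's actual config format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NanowhaleSpec:
    """Illustrative container for the figures quoted in the description."""

    # Values stated in the description.
    num_layers: int = 8
    vocab_size: int = 129_280
    context_length: int = 2_048
    approx_total_params: int = 110_000_000
    # ASSUMPTION: the hidden size is not given in the description.
    # 512 is a placeholder chosen only to make the arithmetic concrete.
    hidden_size: int = 512


def embedding_params(spec: NanowhaleSpec) -> int:
    """Parameters in one token-embedding table (vocab_size x hidden_size)."""
    return spec.vocab_size * spec.hidden_size


def embedding_share(spec: NanowhaleSpec) -> float:
    """Fraction of the approximate total budget taken by that table."""
    return embedding_params(spec) / spec.approx_total_params


if __name__ == "__main__":
    spec = NanowhaleSpec()
    print(f"embedding parameters: {embedding_params(spec):,}")
    print(f"share of ~110M total: {embedding_share(spec):.0%}")
```

Under the assumed hidden size, the embedding table alone would account for more than half of the parameter count, which is one reason a large tokenizer weighs heavily on a model this small. A different hidden size changes the split proportionally.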
Released: May 4, 2026

Overview

nanowhale is Hugging Face’s miniature DeepSeek-V4-style language model project, built to reproduce the core DeepSeek-V4 architecture at about 110M parameters. It is designed as an educational and research-scale model, with both a pretrained base version and an SFT chat version, rather than as a production frontier assistant.

About Hugging Face

Hugging Face is an artificial intelligence company that specializes in machine learning, natural language processing, and AI chatbots.

Industry: Software Development
Company Size: 635
Location: San Francisco, California, US

Tools using nanowhale

No tools found for this model yet.

Last updated: May 5, 2026