nanowhale


nanowhale is a small language-model project that implements a miniature version of the full DeepSeek-V4 architecture. According to the repository, it includes Multi-Head Latent Attention, Mixture-of-Experts routing, Hyper-Connections, and Multi-Token Prediction, all compressed into an 8-layer model with about 110M parameters, a 129,280-token DeepSeek-V4 tokenizer, and a 2,048-token context window. The project ships two models, the pretrained nanowhale-100m-base and the SFT chat model nanowhale-100m, and is presented as an open research and learning project that includes code, configs, tokenizer files, and training scripts.
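Mixture-of-Experts routing sends each token to a few experts rather than through all of them. The sketch below shows a generic softmax top-k gate in plain Python to illustrate the idea. It is not nanowhale's router, and DeepSeek-style models may score and load-balance experts differently. The names `top_k_route` and `moe_combine`, and the toy experts, are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def top_k_route(logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Return (expert_index, weight) pairs for the k best-scoring experts.

    Generic softmax top-k gating (an illustrative assumption, not
    nanowhale's own router); selected weights are renormalised to sum to 1.
    """
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Softmax over all experts; subtracting the max keeps exp() stable.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the k largest probabilities, highest first.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalise over the selected experts only.
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]


def moe_combine(token: Vector, logits: List[float],
                experts: List[Callable[[Vector], Vector]], k: int) -> Vector:
    """Run only the routed experts and sum their outputs, weighted by the gate."""
    out = [0.0] * len(token)
    for idx, weight in top_k_route(logits, k):
        expert_out = experts[idx](token)
        out = [o + weight * y for o, y in zip(out, expert_out)]
    return out


# Toy usage: four hypothetical "experts" acting on a 2-dimensional token.
experts = [
    lambda v: [2 * a for a in v],
    lambda v: [-a for a in v],
    lambda v: [a + 1 for a in v],
    lambda v: [0.0 for _ in v],
]
routes = top_k_route([2.0, 0.5, 1.0, -1.0], k=2)
print(routes)  # experts 0 and 2 are selected
print(moe_combine([1.0, 2.0], [2.0, 0.5, 1.0, -1.0], experts, k=2))
```

Only the selected experts run for a given token, which is what lets an MoE model hold more parameters than it uses per token.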

Overview

nanowhale is a miniature DeepSeek-V4-style language model project, built to reproduce the core DeepSeek-V4 architecture at about 110M parameters. It is designed as an educational and research-scale model, with both a pretrained base version and an SFT chat version, rather than as a production frontier assistant.
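At roughly 110M parameters, the large DeepSeek-V4 vocabulary is a significant cost. The sketch below collects the figures stated in the project description (8 layers, a 129,280-token vocabulary, a 2,048-token context, about 110M parameters) into an illustrative config. The class and field names are hypothetical, and the hidden size `d_model = 512` is an assumption chosen only for the arithmetic. It is not taken from the nanowhale repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NanowhaleConfig:
    """Illustrative config; field names are hypothetical, not the repo's."""

    # Figures stated in the project description.
    n_layers: int = 8
    vocab_size: int = 129_280
    max_seq_len: int = 2_048
    approx_total_params: int = 110_000_000
    # Hypothetical hidden size, chosen only for this illustration.
    d_model: int = 512

    def embedding_params(self) -> int:
        """Weights in one token-embedding matrix: vocab_size x d_model."""
        return self.vocab_size * self.d_model

    def embedding_share(self) -> float:
        """Fraction of the ~110M budget taken by that single matrix."""
        return self.embedding_params() / self.approx_total_params


cfg = NanowhaleConfig()
print(cfg.embedding_params())           # 66191360
print(round(cfg.embedding_share(), 3))  # 0.602
```

Under this assumed hidden size, one embedding matrix would account for about 60% of the stated parameter budget, which shows how much a large tokenizer vocabulary weighs on a model this small. The actual hidden size, and whether input and output embeddings are tied, are not stated here.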

📚 Large Language Models 🧠 Model training 🦾 AI research

About Hugging Face

Hugging Face is an artificial intelligence company that specializes in machine learning, natural language processing, and AI chatbots.

Industry: Software Development

Company Size: 635

Location: San Francisco, California, US

Website: huggingface.co


Last updated: July 17, 2026
