ZIP-0267

Foundation Language Model Architecture (Zen Base)

Final

Architecture specification for the Zen Base foundation model family, from 600M to 480B parameters, using a dense transformer with grouped-query attention.

Type: Standards Track
Category: AI
Author: Zoo Labs Foundation
Created: 2024-01-15
Tags: foundation-model, zen-base, dense-transformer, gqa, language-model

ZIP-0413: Foundation Language Model Architecture (Zen Base)

Abstract

This proposal specifies the architecture of the Zen Base foundation language model family, the core LLM powering all Hanzo and Zoo AI systems. Zen Base models range from 600M to 480B parameters and use a dense transformer architecture with grouped-query attention (GQA), RoPE positional encoding, SwiGLU activations, and RMSNorm normalization. The family provides the foundation on which all specialized Zen models (Code, VL, Live, Guard) are built through continued pre-training and domain-specific fine-tuning.

Motivation

The conservation AI (ZIP-0405) and multimodal systems (ZIP-0408) required a strong language backbone. Rather than relying on third-party models with licensing restrictions, usage limits, and no control over training data or architecture, Zoo and Hanzo co-developed the Zen model family as an open-weight foundation that can be:

  1. Fine-tuned for any domain (conservation, code, medical, legal)
  2. Deployed without per-token costs on self-hosted infrastructure
  3. Trained with conservation-specific data that commercial providers reject
  4. Extended with multimodal capabilities via the Jin architecture (ZIP-0408)

Specification

Architecture

| Component | Design |
|---|---|
| Architecture | Dense transformer (decoder-only) |
| Attention | Grouped-Query Attention (GQA) with 8 KV heads |
| Positional Encoding | RoPE (Rotary Position Embedding) |
| Activation | SwiGLU (SiLU-gated linear unit) |
| Normalization | RMSNorm (pre-norm) |
| Vocabulary | 152K tokens (byte-level BPE, 100+ languages) |
| Context | 32K base, extensible to 1M (ZIP-0426) |
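The components in the table can be illustrated with a minimal pure-Python sketch. It is not the Zen implementation: the RoPE base of 10000, the interleaved pairing of dimensions, the `eps` value, and all function names are assumptions made for illustration, and real implementations operate on batched tensors rather than lists.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide x by its root-mean-square, then apply a learned gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def matvec(m, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward block: down(silu(gate(x)) * up(x))."""
    gate = [silu(v) for v in matvec(w_gate, x)]
    up = matvec(w_up, x)
    return matvec(w_down, [g * u for g, u in zip(gate, up)])

def rope(x, position, base=10000.0):
    """Rotate each pair (x[2i], x[2i+1]) by the angle position * base**(-2i/d)."""
    d = len(x)
    out = list(x)
    for i in range(d // 2):
        theta = position * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out

def kv_head_for(query_head, n_heads, n_kv_heads=8):
    """GQA: each group of n_heads // n_kv_heads query heads shares one KV head."""
    return query_head // (n_heads // n_kv_heads)
```

With 32 query heads and 8 KV heads, `kv_head_for` maps query heads 0 to 3 onto KV head 0, heads 4 to 7 onto KV head 1, and so on, which is what lets GQA shrink the KV cache relative to full multi-head attention.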

Model Scale

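| Variant | Parameters | Hidden Dim | Layers | Heads | Context |
|---|---|---|---|---|---|
| Zen-Nano | 600M | 1024 | 24 | 16 | 32K |
| Zen-Mini | 1.5B | 1536 | 28 | 24 | 32K |
| Zen-Base | 7B | 4096 | 32 | 32 | 128K |
| Zen-Pro | 72B | 8192 | 80 | 64 | 128K |
| Zen-Max | 235B | 12288 | 96 | 96 | 256K |
| Zen-Ultra | 480B | 16384 | 120 | 128 | 1M |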
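To make the table concrete, the sketch below derives the per-head dimension and an approximate KV-cache footprint for each variant. Several things here are assumptions rather than part of the specification: that heads evenly partition the hidden dimension, that "K" and "M" in the Context column mean multiples of 1024, that K and V are cached in a 2-byte format such as fp16 or bf16, and the `ZenVariant` class itself, which is hypothetical.

```python
from dataclasses import dataclass

N_KV_HEADS = 8  # from the Architecture table: GQA with 8 KV heads
K = 1024        # assumption: "32K" means 32 * 1024 tokens

@dataclass(frozen=True)
class ZenVariant:
    name: str
    hidden_dim: int
    layers: int
    heads: int
    context: int  # maximum context length in tokens

    @property
    def head_dim(self):
        # Assumption: query heads evenly partition the hidden dimension.
        return self.hidden_dim // self.heads

    def kv_cache_bytes(self, tokens=None, bytes_per_value=2):
        """Bytes of cached K and V for one sequence of `tokens` tokens."""
        tokens = self.context if tokens is None else tokens
        # Factor 2 accounts for storing both keys and values in every layer.
        return 2 * self.layers * N_KV_HEADS * self.head_dim * tokens * bytes_per_value

VARIANTS = [
    ZenVariant("Zen-Nano", 1024, 24, 16, 32 * K),
    ZenVariant("Zen-Mini", 1536, 28, 24, 32 * K),
    ZenVariant("Zen-Base", 4096, 32, 32, 128 * K),
    ZenVariant("Zen-Pro", 8192, 80, 64, 128 * K),
    ZenVariant("Zen-Max", 12288, 96, 96, 256 * K),
    ZenVariant("Zen-Ultra", 16384, 120, 128, 1024 * K),
]

for v in VARIANTS:
    gib = v.kv_cache_bytes() / 2**30
    print(f"{v.name}: head_dim={v.head_dim}, KV cache at full context = {gib:.1f} GiB")
```

Under these assumptions, Zen-Base caches 128 KiB per token (2 × 32 layers × 8 KV heads × 128 dims × 2 bytes), or 16 GiB for a single full 128K-token sequence. With 32 query heads sharing 8 KV heads, that is a quarter of what full multi-head attention would need.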

Training

  1. Pre-training: 15T tokens of multilingual web data, books, code, scientific papers
  2. Annealing: Learning rate decay with high-quality data mixture
  3. SFT: Supervised fine-tuning on 2M instruction-following examples
  4. RLHF/GRPO: Preference optimization using GRPO (ZIP-0421)
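The final stage can be illustrated by the step that gives GRPO its name: instead of learning a separate value model, the rewards of several completions sampled for the same prompt are normalized against each other. The sketch below shows only that normalization. It follows the commonly published formulation of GRPO, not ZIP-0421, whose details are not given here. The function name, the use of the population standard deviation, and the `eps` value are assumptions.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards of completions sampled for ONE prompt.

    Each advantage is (reward - group mean) / (group std + eps), so a
    completion is only judged relative to its siblings in the group.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population std
    return [(r - mean) / (std + eps) for r in rewards]

# Four hypothetical completions for one prompt, scored 1.0 (good) or 0.0 (bad).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions scoring above the group mean receive a positive advantage and are reinforced, and those below are discouraged. If every completion in a group gets the same reward, all advantages are zero and the group contributes no gradient signal.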

Key Innovations

  • Zen MoDE: Mixture of Diverse Experts architecture for efficient scaling (ZIP-0414)
  • YaRN-extended context: Native 128K context with YaRN extension to 1M (ZIP-0426)
  • Multilingual from pre-training: 100+ languages supported natively, not through fine-tuning
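The YaRN extension mentioned above can be sketched as a rescaling of the RoPE frequencies. The code follows the "NTK-by-parts" interpolation and attention scaling described in the YaRN paper. It is not taken from ZIP-0426: the RoPE base, the `alpha` and `beta` thresholds (the values reported for LLaMA-family models), and the function names are assumptions.

```python
import math

def yarn_inv_freq(head_dim, scale, original_context,
                  base=10000.0, alpha=1.0, beta=32.0):
    """Per-pair RoPE frequencies after YaRN-style interpolation.

    Pairs that complete many rotations inside the original context window
    are left unchanged. Pairs that complete fewer than `alpha` rotations
    are slowed down by `scale`. Pairs in between are blended linearly.
    """
    freqs = []
    for i in range(head_dim // 2):
        freq = base ** (-2.0 * i / head_dim)
        wavelength = 2.0 * math.pi / freq
        rotations = original_context / wavelength
        ramp = min(1.0, max(0.0, (rotations - alpha) / (beta - alpha)))
        # ramp == 1 keeps the frequency; ramp == 0 divides it by `scale`.
        freqs.append((1.0 - ramp) * freq / scale + ramp * freq)
    return freqs

def yarn_attention_scale(scale):
    """Factor applied to the rotated queries and keys: 0.1 * ln(scale) + 1."""
    return 0.1 * math.log(scale) + 1.0

# Stretching a 128K window to 1M corresponds to scale = 8.
freqs = yarn_inv_freq(head_dim=128, scale=8.0, original_context=128 * 1024)
factor = yarn_attention_scale(8.0)
```

The design intent is that high-frequency pairs, which encode local token order, keep their resolution, while only the low-frequency pairs are interpolated to cover the longer window.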

Research Papers

Implementation

  • hanzo/llm: LLM Gateway serving all Zen Base variants
  • hanzo/candle: Rust inference engine for Zen models
  • hanzo/chat: Chat interface with 14 Zen model variants

Timeline

  • Originated: January 2024 (Zen Base architecture design)
  • Research: zen-base_whitepaper published Q1 2024, zen4_whitepaper published 2025
  • Implementation: Zen Base family deployed via the Hanzo LLM Gateway in 2024