ZIP-0283

Spatial Web Active Inference

Final

Framework for AI agents that operate in spatial environments (AR, VR, physical world) using active inference for navigation, interaction, and decision-making

Type: Standards Track
Category: AI
Author: Zoo Labs Foundation
Created: 2025-08-01
spatial-web, active-inference, ar, vr, embodied-ai, world-model

ZIP-0433: Spatial Web Active Inference

Abstract

This proposal specifies a framework for AI agents that operate in spatial environments (augmented reality, virtual reality, and the physical world) and use active inference for navigation, interaction, and decision-making. The whitepaper (Sections 09 and 12) envisioned two experiences: an AR application in which Zoo animal companions appear in the user's physical environment, and metaverse companions that inhabit virtual worlds. This ZIP formalizes the spatial reasoning and active inference mechanisms that make both possible.

Motivation

The whitepaper described two spatial experiences:

  1. AR App (Section 09): Point your phone at a park and see your Zoo animal companion exploring it, identifying real species, and teaching you about the local ecosystem
  2. Metaverse Companion (Section 12): Your Zoo animal lives in a virtual habitat, interacts with other animals, and can be visited in VR

Both require agents that understand 3D space, navigate environments, interact with objects and other agents, and make spatial decisions. Active inference provides the theoretical framework: an agent maintains a generative model of the world, predicts the consequences of its actions, and chooses the actions that minimize free energy, a tractable upper bound on surprise.

Specification

Active Inference Framework

Agent State = (Beliefs, Preferences, Policies)

Loop:
  1. Observe: receive sensory input (camera, depth, audio, GPS)
  2. Update: update beliefs about world state (Bayesian inference)
  3. Predict: generate predictions about future states for each policy
  4. Evaluate: score each policy by its expected free energy (expected uncertainty about outcomes plus mismatch with preferred outcomes)
  5. Act: execute the policy with lowest expected free energy
  6. Learn: update generative model based on prediction errors
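
The loop above can be sketched for a tiny discrete world. This is a minimal illustration, not the reference implementation: the two-state "tree" scenario, all function names, and all probabilities are assumptions made for the example. Expected free energy is computed in its common risk-plus-ambiguity form for one-step policies, and step 6 (Learn) is omitted for brevity.

```python
import math

def normalize(p):
    """Scale non-negative weights so they sum to 1."""
    total = sum(p)
    return [x / total for x in p]

def update_beliefs(prior, likelihood, obs):
    """Step 2 (Update): Bayes' rule, with likelihood[o][s] = P(o | s)."""
    return normalize([likelihood[obs][s] * prior[s] for s in range(len(prior))])

def predict_states(beliefs, transition):
    """Step 3 (Predict): transition[t][s] = P(next state t | state s) under one policy."""
    n = len(beliefs)
    return [sum(transition[t][s] * beliefs[s] for s in range(n)) for t in range(n)]

def expected_free_energy(pred_states, likelihood, preferred_obs):
    """Step 4 (Evaluate): risk plus ambiguity for a one-step policy."""
    n_obs, n_states = len(likelihood), len(pred_states)
    pred_obs = [sum(likelihood[o][s] * pred_states[s] for s in range(n_states))
                for o in range(n_obs)]
    # Risk: KL divergence from predicted observations to preferred observations.
    risk = sum(q * math.log(q / p) for q, p in zip(pred_obs, preferred_obs) if q > 0)
    # Ambiguity: expected entropy of P(o | s) under the predicted states.
    ambiguity = 0.0
    for s in range(n_states):
        entropy = -sum(likelihood[o][s] * math.log(likelihood[o][s])
                       for o in range(n_obs) if likelihood[o][s] > 0)
        ambiguity += pred_states[s] * entropy
    return risk + ambiguity

def select_policy(beliefs, policies, likelihood, preferred_obs):
    """Step 5 (Act): pick the policy with the lowest expected free energy."""
    scores = {name: expected_free_energy(predict_states(beliefs, T), likelihood, preferred_obs)
              for name, T in policies.items()}
    return min(scores, key=scores.get), scores

# Toy setup (all values hypothetical):
# states = (near_tree, far_from_tree), observations = (tree_seen, no_tree).
likelihood = [[0.9, 0.1], [0.1, 0.9]]
policies = {"stay": [[1.0, 0.0], [0.0, 1.0]], "move": [[0.0, 1.0], [1.0, 0.0]]}
preferred_obs = [0.9, 0.1]  # the agent prefers to see the tree

beliefs = update_beliefs([0.5, 0.5], likelihood, obs=1)  # Step 1: observed "no_tree"
best, scores = select_policy(beliefs, policies, likelihood, preferred_obs)
print(best, scores)
```

In this toy case the agent has just failed to see a tree, so it believes it is probably far from one and selects "move", the policy whose predicted observations best match its preferences.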

World Model

The agent maintains a generative model of its environment:

| Component | Representation | Updates |
|-----------|----------------|---------|
| Geometry | Neural radiance field (NeRF) or 3D Gaussian splatting | Continuous from visual input |
| Semantics | 3D semantic map (what is each point in space?) | From Zen-VL inference |
| Dynamics | Physics engine + learned dynamics model | From observation of motion |
| Agents | Other agents' positions, states, predicted behaviors | From social inference |
| Objects | Graspable/interactable objects with affordances | From vision + common sense |
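
One way to picture the five components is as a single container type. The sketch below is only an assumption about how such a model might be organized: every class, field, and method name is hypothetical, the geometry and dynamics components are left as opaque handles, and the 0.5 m voxel size is an arbitrary choice for the example.

```python
import math
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Set, Tuple

Vec3 = Tuple[float, float, float]
Voxel = Tuple[int, int, int]

@dataclass
class TrackedAgent:
    """Agents component: another agent's position, state, and predicted position."""
    position: Vec3
    state: str = "idle"
    predicted_position: Optional[Vec3] = None

@dataclass
class WorldObject:
    """Objects component: an interactable object and its affordances."""
    position: Vec3
    affordances: Set[str] = field(default_factory=set)

@dataclass
class WorldModel:
    geometry: Any = None  # opaque handle to a NeRF or Gaussian-splat scene
    dynamics: Any = None  # opaque handle to a physics / learned dynamics model
    semantics: Dict[Voxel, str] = field(default_factory=dict)  # voxel -> label
    agents: Dict[str, TrackedAgent] = field(default_factory=dict)
    objects: Dict[str, WorldObject] = field(default_factory=dict)
    voxel_size: float = 0.5  # metres per voxel edge (assumed value)

    def voxel_of(self, point: Vec3) -> Voxel:
        """Map a 3D point to the integer voxel that contains it."""
        return tuple(math.floor(c / self.voxel_size) for c in point)

    def label_at(self, point: Vec3) -> Optional[str]:
        """Semantic query: what is at this point in space, if known?"""
        return self.semantics.get(self.voxel_of(point))

    def objects_affording(self, affordance: str) -> List[str]:
        """Names of objects that support the given interaction, sorted."""
        return sorted(name for name, obj in self.objects.items()
                      if affordance in obj.affordances)

# Hypothetical park scene.
world = WorldModel()
world.semantics[world.voxel_of((1.2, 0.1, 1.7))] = "oak_tree"
world.objects["acorn"] = WorldObject((1.0, 0.0, 1.5), {"pick_up", "examine"})
world.objects["bench"] = WorldObject((3.0, 0.0, 0.5), {"sit"})
world.agents["fox_42"] = TrackedAgent((0.0, 0.0, 0.0), state="exploring")

print(world.label_at((1.3, 0.2, 1.6)))
print(world.objects_affording("examine"))
```

Keeping semantics in a sparse voxel dictionary is a simplification chosen for readability; a production system would more likely attach labels directly to the geometry representation.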

Spatial Interactions

| Interaction | Description | Technology |
|-------------|-------------|------------|
| Navigation | Move through 3D space toward goals | Pathfinding + active inference |
| Object interaction | Pick up, examine, use objects | Affordance detection + physics |
| Social interaction | Communicate with other agents | NLP + gesture + spatial proximity |
| Teaching | Guide user to interesting species/features | POI detection + pedagogical planning |
| Exploration | Autonomously explore and map environment | Curiosity-driven active inference |
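
As an illustration of the Exploration row, the sketch below picks the next viewpoint on a 2D occupancy grid by trading expected information against travel distance. It uses a simple entropy heuristic as a stand-in for the epistemic term of expected free energy; it is not a full active inference planner. The grid, the function names, the sensor radius, and the travel weight are all assumptions for the example.

```python
import math

def cell_entropy(p):
    """Entropy in bits of a cell whose occupancy probability is p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # fully known cell carries no uncertainty
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def information_gain(grid, center, radius):
    """Total uncertainty of the cells a sensor at `center` would cover."""
    r0, c0 = center
    total = 0.0
    for r in range(max(0, r0 - radius), min(len(grid), r0 + radius + 1)):
        for c in range(max(0, c0 - radius), min(len(grid[0]), c0 + radius + 1)):
            total += cell_entropy(grid[r][c])
    return total

def choose_viewpoint(grid, position, candidates, radius=1, travel_weight=0.1):
    """Pick the candidate with the best uncertainty-minus-travel-cost score."""
    def score(cand):
        distance = abs(cand[0] - position[0]) + abs(cand[1] - position[1])
        return information_gain(grid, cand, radius) - travel_weight * distance
    return max(candidates, key=score)

# Hypothetical map: left half already observed (0.0 free, 1.0 occupied),
# right half unknown (0.5).
grid = [
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 1.0, 0.5, 0.5],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.5, 0.5],
]
target = choose_viewpoint(grid, position=(0, 0), candidates=[(1, 0), (1, 3)])
print(target)
```

Here the agent heads for (1, 3), on the unexplored side of the map, because the uncertainty it would resolve there outweighs the extra travel.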

Research Papers

Implementation

  • hanzo/jin: Jin multimodal framework with 3D/spatial input
  • zoo/app: AR application with spatial agent rendering
  • zoo/core: Metaverse companion system with virtual habitats

Timeline

  • Originated: August 2025 (spatial web active inference specification)
  • Research: zoo-spatial-web-agents published 2024, zen-3d and zen-world published 2025
  • Implementation: AR companion app with spatial agents deployed 2025