ZIP-407: Distributed Training Incentive

Abstract

This proposal defines a token incentive mechanism for contributing compute resources to distributed AI model training on the Zoo network. Contributors register GPU/TPU nodes, receive training task assignments, submit gradient updates with proof-of-computation, and earn ZOO token rewards proportional to their verified contribution. The mechanism uses a commit-reveal scheme for gradient submission to prevent free-riding, and integrates with ZIP-406 (Model Attestation) to ensure training outputs are properly attested. Reward distribution is based on verified computation shares rather than self-reported hardware specs.

Motivation

Training large AI models requires significant compute. Centralized cloud providers charge premium prices and create dependency on a few corporations. The Zoo network can aggregate idle GPU capacity from the community, but contributors need economic incentives:

Cost reduction: Distributed training across community GPUs can reduce training costs by 60-80% compared to centralized cloud providers.
Decentralization: No single entity controls the training infrastructure. Model training cannot be censored or shut down.
Accessibility: Researchers with limited budgets (especially in conservation science) can access training compute through the network.
Fair compensation: GPU owners earn returns on idle hardware. The incentive mechanism ensures honest contributors are rewarded and free-riders are penalized.

Specification

1. Node Registration

Compute contributors register their hardware:

struct ComputeNode {
    address operator;
    bytes32 nodeId;
    string hardwareSpec;          // Standardized hardware descriptor
    uint256 benchmarkScore;       // Verified benchmark result
    uint256 stakeAmount;          // ZOO tokens staked
    bool active;
    uint64 registeredAt;
}

contract ComputeRegistry {
    uint256 public constant MIN_STAKE = 100e18;  // 100 ZOO

    function registerNode(
        bytes32 nodeId,
        string calldata hardwareSpec,
        uint256 benchmarkScore,
        bytes calldata benchmarkProof
    ) external payable {
        require(msg.value >= MIN_STAKE, "Insufficient stake");
        require(
            verifyBenchmark(nodeId, benchmarkScore, benchmarkProof),
            "Invalid benchmark"
        );
        nodes[nodeId] = ComputeNode({
            operator: msg.sender,
            nodeId: nodeId,
            hardwareSpec: hardwareSpec,
            benchmarkScore: benchmarkScore,
            stakeAmount: msg.value,
            active: true,
            registeredAt: uint64(block.timestamp)
        });
    }
}

2. Task Assignment

Training jobs are decomposed into tasks and assigned to registered nodes:

interface TrainingTask {
  taskId: string;
  jobId: string;                   // Parent training job
  modelId: string;                 // ZIP-406 model being trained
  epoch: number;
  batchRange: [number, number];    // Data batch indices
  hyperparameters: Record<string, number>;
  dataShardUri: string;            // Encrypted data shard location
  deadline: number;                // Seconds to complete
  rewardPool: number;              // ZOO allocated to this task
}

The task scheduler assigns work based on node benchmark scores and historical reliability. Nodes with higher scores receive larger batch ranges and proportionally higher rewards.

3. Gradient Submission (Commit-Reveal)

To prevent free-riding (copying another node's gradients), submissions use commit-reveal:

Phase 1: COMPUTE (task deadline)
  Node computes gradients for assigned batch range.
  Node commits: hash(gradients || salt || nodeId)

Phase 2: REVEAL (30 minutes after deadline)
  Node reveals: gradients + salt
  Contract verifies: hash matches commitment

Phase 3: AGGREGATE
  Coordinator aggregates verified gradients.
  Model weights updated.

Nodes that commit but fail to reveal are slashed (5% of stake). Nodes that do not commit forfeit their task reward.

4. Proof of Computation

Nodes prove they performed actual computation through gradient validation:

interface ComputationProof {
  nodeId: string;
  taskId: string;
  gradientHash: string;           // Hash of gradient tensor
  intermediateHashes: string[];   // Hashes at checkpoint steps
  computeTimeMs: number;
  gpuUtilization: number;         // 0.0 - 1.0
  teeAttestation?: string;       // TEE proof if available
}

The coordinator validates proofs by:

Verifying intermediate hashes are consistent with the final gradient
Checking compute time is plausible for the hardware spec and batch size
Cross-referencing a random subset of tasks by re-executing on trusted verifier nodes

5. Reward Distribution

contract TrainingRewards {
    function distributeRewards(
        bytes32 jobId,
        bytes32[] calldata nodeIds,
        uint256[] calldata contributions  // Verified compute shares
    ) external onlyCoordinator {
        uint256 totalContribution = 0;
        for (uint i = 0; i < contributions.length; i++) {
            totalContribution += contributions[i];
        }

        uint256 rewardPool = jobs[jobId].totalReward;
        for (uint i = 0; i < nodeIds.length; i++) {
            uint256 reward = (rewardPool * contributions[i])
                / totalContribution;
            pendingRewards[nodeIds[i]] += reward;
        }

        emit RewardsDistributed(jobId, nodeIds.length, rewardPool);
    }
}

Rewards are proportional to verified computation shares. A node that processes 10% of the total batch range receives 10% of the reward pool, adjusted by a quality multiplier based on gradient validation scores.

6. Slashing Conditions

Violation	Slash Amount	Additional Penalty
Failed to reveal after commit	5% of stake	Task reassigned
Invalid gradients (random noise)	20% of stake	7-day suspension
Copied another node's gradients	50% of stake	30-day ban
Repeated invalid submissions (3+)	100% of stake	Permanent ban

Rationale

Staking requirement: Stake aligns incentives. Nodes with skin in the game are less likely to submit garbage gradients or go offline mid-task.
Commit-reveal for gradients: Without commit-reveal, a lazy node could wait for another node to submit gradients and copy them. The commitment prevents this because gradients must be committed before any are revealed.
Benchmark-based assignment: Self-reported hardware specs are unreliable. Verified benchmarks ensure task assignments match actual capability, preventing nodes from claiming high-end hardware while running on consumer GPUs.
Proportional rewards: Nodes that contribute more compute earn more. This naturally attracts higher-capacity hardware to the network while allowing smaller contributors to participate at a smaller scale.

Security Considerations

Gradient poisoning: A malicious node could submit adversarial gradients to degrade model quality. Mitigation: Byzantine-tolerant aggregation (coordinate-wise median) rejects outlier gradients. Nodes whose gradients are consistently rejected face slashing.
Data leakage: Training data shards contain potentially sensitive conservation data. Mitigation: data shards are encrypted with per-task keys; nodes receive decryption keys only after committing their benchmark proof and stake.
Coordinator compromise: The task coordinator is a privileged role. Mitigation: coordinator is a multisig contract requiring 3-of-5 signatures from the Zoo AI Committee; coordinator actions are timelocked and auditable.
Stake grinding: An attacker could register many low-stake nodes. Mitigation: minimum stake of 100 ZOO per node; total network contribution is weighted by verified benchmark scores, not node count.
Timing attacks: Observing commit timestamps could leak information about which nodes are collaborating. Mitigation: commitments are batched and published together after the compute phase ends.

References

ZIP-0: Zoo Ecosystem Architecture
ZIP-1: Hamiltonian LLMs for Zoo
ZIP-400: Decentralized Semantic Optimization
ZIP-402: Proof of AI Consensus
ZIP-406: Model Attestation Protocol
Dean, J. et al. "Large Scale Distributed Deep Networks." NIPS 2012.
Kairouz, P. et al. "Advances and Open Problems in Federated Learning." FnTML 2021.

Distributed Training Incentive