Kaggle for the eyes of AI agents. Every AI agent struggles to click the right thing. DDX is a permanent open bounty that pays researchers to keep beating the state of the art, with every winning model published openly. A standing prize pool, open models, and deterministic scoring.

1. Introduction: The Vision for Open GUI Grounding

DDX is a Bittensor subnet designed to produce a single commodity: production-grade, edge-deployable vision models that ground natural language instructions to pixel coordinates on real-world graphical interfaces. Our vision is to make GUI grounding a public good rather than a proprietary capability fragmented across frontier labs and individual agent startups.

Every autonomous agent that operates a browser, desktop, or mobile interface depends on one fundamental capability: given a screenshot and an instruction like "click the submit button" or "select the third row in the data table," output the correct pixel coordinates. This single capability gates the reliability of every computer-use agent shipping today, including Claude Opus 4.7 computer use (released April 2026), ChatGPT agent mode (formerly Operator, integrated into ChatGPT in July 2025), Google's Project Mariner, and the dozens of agent startups that have launched recently.

The state of the art has advanced quickly but remains far from solved. On ScreenSpot-Pro, the established professional-software grounding benchmark, frontier models including GPT-5.2 now reach 86.3 percent accuracy, but on harder benchmarks the picture is different. On UI-Vision, the desktop-centric benchmark covering 83 software applications, end-to-end SOTA sits around 36 percent (Phi-Ground, Microsoft Research, July 2025). On OSWorld-G, the fine-grained grounding benchmark from XLang, current open-source SOTA is approximately 68 percent (Qwen3-VL-235B). These harder benchmarks reveal that grounding remains brittle on professional desktop software, on fine-grained UI manipulation, and on long-tail interface patterns. The frontier is moving, not stationary.

DDX inverts the current model. Today every agent startup either pays frontier-lab API costs at scale or trains its own proprietary grounding model from scratch. Both paths duplicate compute and produce no shared progress. DDX miners train and submit grounding models. Validators score them on rotating, held-out subsets of screenshots drawn from public benchmarks. The highest-scoring model is published openly and is downloadable by any agent developer. The subnet does not produce a product. It produces a continuously improving open model that the entire agent ecosystem can build on, with TAO emissions providing the R&D budget that no single open-source project has been able to sustain.
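The rotation and selection steps can be sketched as follows. This assumes validators share agreed seed material for each epoch and a common pool of candidate item IDs; `epoch_sample_ids`, `pick_winner`, the seed bytes, and the tie-break rule are hypothetical choices for illustration, not the subnet's specified procedure.

```python
import hashlib
import random


def epoch_sample_ids(pool_ids: list[str], epoch: int, seed_material: bytes, k: int) -> list[str]:
    """Deterministically choose k evaluation item IDs for one epoch.

    Validators holding the same pool, seed material, and epoch number derive
    the same subset; changing the epoch rotates the subset.
    """
    digest = hashlib.sha256(seed_material + epoch.to_bytes(8, "big")).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    # Sort first so the result does not depend on the order the pool was loaded in.
    # Note: random.Random is reproducible for a fixed seed on a given Python
    # version; a production validator would pin its sampler explicitly.
    return rng.sample(sorted(pool_ids), k)


def pick_winner(scores: dict[str, float]) -> str:
    """Highest score wins; exact ties go to the lexicographically smallest ID."""
    return min(scores, key=lambda model_id: (-scores[model_id], model_id))


pool = [f"item-{i:04d}" for i in range(1000)]
subset = epoch_sample_ids(pool, epoch=42, seed_material=b"agreed-seed", k=5)
same = epoch_sample_ids(list(reversed(pool)), epoch=42, seed_material=b"agreed-seed", k=5)
assert subset == same  # input order of the pool does not change the draw
print(pick_winner({"miner-a": 0.61, "miner-b": 0.64, "miner-c": 0.64}))  # miner-b
```

Sorting before sampling removes any dependence on how each validator loaded the pool, and the explicit tie-break keeps winner selection deterministic when two models score identically.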

This proposal details the DDX subnet architecture, including incentive mechanism, miner and validator roles, anti-gaming protections, and the market rationale that positions DDX as foundational infrastructure for the autonomous agent era.

2. Incentive and Mechanism Design

Emission and Reward Logic

DDX operates under Bittensor's current Taoflow emission model (active since November 2025), under which subnets earn TAO emissions based on net TAO inflows from staking activity rather than the previous price-based AMM allocation. Within the subnet, alpha token emissions follow the standard split: 41 percent to miners, 41 percent to validators and stakers, and 18 percent to the subnet owner. Network-wide TAO issuance is 3,600 TAO per day following the December 2025 halving.
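The split arithmetic, as a small sketch. The 41/41/18 shares and the 3,600 TAO daily figure come from the text above; the 100-alpha amount is a hypothetical input, and the per-block figure assumes Bittensor's roughly 12-second block time. `split_alpha` and `tao_per_block` are illustrative helpers, not subnet code.

```python
from decimal import Decimal

# In-subnet alpha split stated in the proposal: 41% / 41% / 18%.
ALPHA_SPLIT = {
    "miners": Decimal("0.41"),
    "validators_and_stakers": Decimal("0.41"),
    "subnet_owner": Decimal("0.18"),
}
assert sum(ALPHA_SPLIT.values()) == Decimal("1"), "shares must sum to 100%"


def split_alpha(alpha_emitted: Decimal) -> dict[str, Decimal]:
    """Divide an amount of emitted alpha among the three recipient groups."""
    return {role: alpha_emitted * share for role, share in ALPHA_SPLIT.items()}


def tao_per_block(daily_tao: Decimal = Decimal("3600"), block_seconds: int = 12) -> Decimal:
    """Network-wide TAO issued per block, assuming a fixed block time."""
    blocks_per_day = 24 * 60 * 60 // block_seconds  # 7,200 blocks at 12 s
    return daily_tao / blocks_per_day


print(split_alpha(Decimal("100")))  # hypothetical 100 alpha emitted
print(tao_per_block())              # 0.5
```

Decimal arithmetic keeps the shares exact, so the three allocations always sum back to the emitted amount.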

DDX uses a steep top-n reward curve rather than strict winner-takes-all. Strict winner-takes-all creates volatile emissions and discourages exploration of architectural alternatives. A steep top-n curve concentrates rewards on the frontier while preserving enough emission flow to second and third place that miners pursuing different approaches (small specialist models, large general models, hybrid architectures) all have an economic reason to keep iterating.
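To make the trade-off concrete, one simple way to build a steep top-n curve is to raise each miner's score to a power before normalizing, so a larger exponent pushes the allocation toward winner-takes-all without quite reaching it. The scores and exponents below are hypothetical, and `power_shares` and `winner_takes_all` are illustrative helpers rather than DDX validator code.

```python
def power_shares(scores: list[float], k: float) -> list[float]:
    """Fraction of the miner pool per miner: score**k / sum(score**k).

    Scores are assumed to be non-negative (for example, accuracies in [0, 1]).
    """
    if k <= 0:
        raise ValueError("exponent k must be positive")
    weights = [s ** k for s in scores]
    total = sum(weights)
    if total == 0:
        return [0.0] * len(scores)  # nobody scored; nothing to distribute
    return [w / total for w in weights]


def winner_takes_all(scores: list[float]) -> list[float]:
    """Entire pool to the single best score (first index wins ties)."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(scores))]


scores = [0.64, 0.62, 0.58, 0.40]  # hypothetical benchmark accuracies
for k in (1, 8, 32):
    print(f"k={k:>2}", [round(share, 3) for share in power_shares(scores, k)])
print("wta ", winner_takes_all(scores))
```

With these inputs, raising the exponent steadily increases the leader's share while the second-place miner still receives a meaningful slice at moderate exponents; strict winner-takes-all gives everyone but the leader nothing.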

The reward function within the miner pool is:

R_i = E_miners × (W_i^k / Σ W_j^k)