Home Politics Web Agents Should Use Typed Actions Instead of...
Politics

Web Agents Should Use Typed Actions Instead of Click-Based Browsing

Key Points

arXiv:2602.17245v2 Announce Type: replace Abstract: This position paper argues that building a reliable agentic Web requires shifting from low-level interaction primitives to typed actions supported by a semantic layer. Today's web agents primarily operate through clicks, keystrokes, and DOM manipulation, which leads to brittle long-horizon behavior, high execution cost, and limited auditability. We propose web verbs as a concrete design for this layer.

arXiv:2602.17245v2 Announce Type: replace Abstract: This position paper argues that building a reliable agentic Web requires shifting from low-level interaction primitives to typed actions supported by a semantic layer. Today's web agents primarily operate through clicks, keystrokes, and DOM manipulation, which leads to brittle long-horizon behavior, high execution cost, and limited auditability. We propose web verbs as a concrete design for this layer. A verb exposes a web operation as a typed function with structured inputs, structured outputs, and documented behavior, whether it is backed by a server-side Web API or a maintained client-side workflow. Verb calls can carry preconditions, postconditions, policy tags, and logging hooks, allowing agents to synthesize concise programs with explicit control flow and data flow and to produce checkable execution traces. Using representative case studies, we illustrate how verb-level composition can produce correct, reproducible outcomes, while browser agents using low-level interaction primitives may produce brittle behavior or incorrect reasoning. We conclude with a call to action on standardization, developer tooling, and community processes needed to make this semantic layer deployable and trustworthy at web scale.
DOM (ORG) API (ORG)
Originally published by arXiv CS Read original →