AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
AndroidWorld is an environment for building and benchmarking autonomous computer control agents. It runs on a live Android emulator and provides a highly reproducible benchmark of 116 hand-crafted tasks across 20 apps. Each task is dynamically instantiated with randomly generated parameters, yielding millions of unique task variations. In addition to the built-in tasks, AndroidWorld also supports the popular web benchmark MiniWoB++.
[project page], [paper]
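
To illustrate what dynamic instantiation with randomly generated parameters means, here is a minimal sketch of a parameterized task template. It is not AndroidWorld's API: `TaskInstance`, `instantiate_send_message`, `variation_count`, and the parameter pools are all hypothetical names chosen for this example. The sketch assumes that reproducibility comes from seeding the random generator, so the same seed always yields the same task.

```python
import random
from dataclasses import dataclass

# Hypothetical parameter pools for illustration; not taken from AndroidWorld.
NAMES = ["Alice", "Bob", "Chen", "Dara"]
MESSAGES = ["Running late", "See you at 5", "Call me back"]


@dataclass
class TaskInstance:
    goal: str     # natural-language instruction given to the agent
    params: dict  # concrete parameter values behind the goal


def instantiate_send_message(seed: int) -> TaskInstance:
    """Build one variation of a hypothetical 'send a message' task template."""
    rng = random.Random(seed)  # local RNG: the same seed gives the same task
    params = {"name": rng.choice(NAMES), "message": rng.choice(MESSAGES)}
    goal = f"Send the text '{params['message']}' to {params['name']}."
    return TaskInstance(goal=goal, params=params)


def variation_count() -> int:
    """Number of distinct variations this single template can produce."""
    return len(NAMES) * len(MESSAGES)
```

Even this toy template with two small pools yields 12 variations. Templates with larger pools and more parameters multiply quickly, which is how a fixed set of tasks can expand into a very large number of unique variations.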