BrowserAgent — Learning Human‑Inspired Web Browsing for Effective Information Search

Tao Yu1,2 Zhengbo Zhang1,2 Zhiheng Lyu2 Junhao Gong3 Hongzhu Yi1 Xinming Wang1 Yuxuan Zhou4 Jiabing Yang1 Ping Nie3 Yan Huang1 Wenhu Chen2
1 Chinese Academy of Sciences · 2 University of Waterloo · 3 Peking University · 4 Tsinghua University · 5 Independent Researcher

yutao2025@ia.ac.cn · wenhuchen@uwaterloo.ca

Tao Yu and Zhengbo Zhang: work done during an internship at the University of Waterloo.

Web Agents · Information Retrieval · Human‑Inspired Interaction

Overview

The paper introduces BrowserAgent, a browser‑capable agent that aims to emulate human browsing behaviors to make information search more effective and reliable. It focuses on practical, end‑to‑end navigation: reading pages, following links, issuing queries, and synthesizing answers from web content.

Key Ideas

  • Model web browsing as a sequence of perception → planning → action steps.
  • Ground decisions in the rendered page content and UI structure.
  • Favor human‑like strategies: targeted queries, skim‑then‑drill‑down, source triangulation.
  • Prioritize answerability and attribution using on‑page evidence.
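The perception → planning → action loop above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's implementation: the in-memory `PAGES` dictionary stands in for a real browser, and `perceive`, `plan`, and `browse` are hypothetical names. The planner uses a simple keyword check and backtracks when a page has no unvisited links.

```python
# Hypothetical in-memory "web": page id -> (visible text, outgoing links).
PAGES = {
    "home": ("Welcome. See the blog or the docs.", ["blog", "docs"]),
    "blog": ("Release notes and news.", ["home"]),
    "docs": ("Install guide. The default port is 8080.", ["home"]),
}


def perceive(page):
    """Perception: return the text and links of the current page."""
    text, links = PAGES[page]
    return {"text": text, "links": links}


def plan(observation, visited, keyword):
    """Planning: answer if the keyword is on the page, otherwise follow
    the first unvisited link, otherwise backtrack."""
    if keyword in observation["text"]:
        return ("answer", observation["text"])
    for link in observation["links"]:
        if link not in visited:
            return ("follow", link)
    return ("backtrack", None)


def browse(start, keyword, max_steps=10):
    """Action loop: repeat perceive -> plan -> act until answered or exhausted."""
    stack = [start]        # current navigation path; top is the current page
    visited = {start}
    trace = [start]        # every page shown, including revisits after backtracking
    for _ in range(max_steps):
        observation = perceive(stack[-1])
        kind, arg = plan(observation, visited, keyword)
        if kind == "answer":
            return {"answer": arg, "source": stack[-1], "trace": trace}
        if kind == "follow":
            stack.append(arg)
            visited.add(arg)
            trace.append(arg)
        else:  # backtrack to the previous page, if any
            stack.pop()
            if not stack:
                break
            trace.append(stack[-1])
    return {"answer": None, "source": None, "trace": trace}


result = browse("home", "port")
print(result["source"], result["trace"])
```

In this toy run the agent first tries the blog, finds nothing relevant, backtracks to the home page, and then finds the answer in the docs, returning the page it came from as the source.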

Contributions

  • A browser‑native agent design oriented around human‑inspired search tactics.
  • End‑to‑end workflow for reading, navigating, and aggregating web information.
  • Qualitative guidance for reliable, source‑aware answers with minimal hallucination.

Method (High‑Level)

  • Perception: extract salient snippets, links, and page structure from the DOM.
  • Planning: form next steps (refine query, follow link, scroll, backtrack) to reduce uncertainty.
  • Action: execute safe browser actions; repeat until answer quality is sufficient.
  • Attribution: cite and cross‑check sources before finalizing an answer.
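As a rough illustration of the perception step, the sketch below pulls visible text snippets and links out of an HTML string using only Python's standard `html.parser` module. The class name `PageReader`, the function `perceive_html`, and the sample markup are assumptions for this example; the paper's actual observation format may differ.

```python
from html.parser import HTMLParser


class PageReader(HTMLParser):
    """Collect visible text snippets and (href, anchor text) links."""

    def __init__(self):
        super().__init__()
        self.snippets = []   # visible text outside links
        self.links = []      # (href, anchor text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._anchor = []    # text pieces inside the open <a>
        self._skip = 0       # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self._anchor = []

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag == "a" and self._href is not None:
            self.links.append((self._href, " ".join(self._anchor)))
            self._href = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip:
            return  # ignore whitespace and script/style contents
        if self._href is not None:
            self._anchor.append(text)
        else:
            self.snippets.append(text)


def perceive_html(html):
    """Return the snippets and links a planner could reason over."""
    reader = PageReader()
    reader.feed(html)
    reader.close()
    return {"snippets": reader.snippets, "links": reader.links}


SAMPLE = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Docs</h1><p>Default port is 8080.</p>"
    "<a href='/install'>Install guide</a>"
    "<script>var x = 1;</script></body></html>"
)
observation = perceive_html(SAMPLE)
print(observation)
```

A planner would then choose among the returned links (follow), the snippets (answer or refine the query), or a backtrack action.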

Differences vs. Prior Agents

The figure highlights how BrowserAgent differs from common web agents. It emphasizes grounding decisions in rendered page content, human‑inspired planning routines (skim‑then‑drill‑down, targeted querying, backtracking), and source‑aware answer synthesis to reduce hallucination and improve reliability.

Figure: Key differences between BrowserAgent and prior web agents (conceptual comparison).
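One way to picture source-aware answer synthesis is a simple triangulation rule: accept a candidate answer only when it is supported by pages from at least two distinct domains. The sketch below is an assumption for illustration, not the paper's method; `triangulate`, the `min_sources` threshold, and the example URLs are all hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def triangulate(evidence, min_sources=2):
    """evidence: list of (url, candidate answer) pairs.

    Returns (answer, supporting urls) for the answer backed by the most
    distinct domains, or (None, []) if no answer reaches min_sources.
    """
    support = defaultdict(dict)  # normalized answer -> {domain: first url seen}
    for url, answer in evidence:
        domain = urlparse(url).netloc.lower()
        key = answer.strip().lower()
        support[key].setdefault(domain, url)
    best = max(support.items(), key=lambda item: len(item[1]), default=None)
    if best is None or len(best[1]) < min_sources:
        return None, []
    return best[0], sorted(best[1].values())


evidence = [
    ("https://a.example/x", "8080"),
    ("https://a.example/y", "8080"),    # same domain, counted once
    ("https://b.example/z", " 8080 "),
    ("https://c.example/q", "9090"),
]
print(triangulate(evidence))
```

Counting domains rather than pages keeps two pages from the same site from being treated as independent confirmation.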

Training and Inference Pipeline

The pipeline illustrates the end‑to‑end flow: perceiving the current page, planning the next action, and executing browser operations during inference, as well as the training/feedback signals that shape these behaviors. The design encourages robust, human‑like browsing strategies that generalize across tasks and sites.

Figure: High‑level training and inference pipeline of BrowserAgent.

Experimental Results

We summarize results across representative web‑based information‑seeking tasks. BrowserAgent demonstrates strong end‑to‑end performance and improved answerability with attribution, reflecting gains from human‑inspired planning and evidence grounding.

Figure: Summary of experimental results comparing BrowserAgent to baselines across benchmarks.

Notes

For full details, metrics, and ablations, please refer to the PDF below. This summary provides a high‑level orientation without claiming specifics beyond the paper.

Citation

@misc{yu2025browseragent,
  title={BrowserAgent — Learning Human-Inspired Web Browsing for Effective Information Search},
  author={Tao Yu and Zhengbo Zhang and Zhiheng Lyu and Junhao Gong and Hongzhu Yi and Xinming Wang and Yuxuan Zhou and Jiabing Yang and Ping Nie and Yan Huang and Wenhu Chen},
  year={2025},
  eprint={2510.10666},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2510.10666}
}

PDF: arXiv:2510.10666