A complete browser automation system combining LangChain DeepAgents with agent-browser, featuring Ralph Mode for iterative refinement and a Claude-inspired UI.
Browser Use is a full-stack browser automation agent that can:
- Plan and execute complex multi-step browser tasks
- Self-correct using Ralph Mode's iterative refinement
- Control browsers via the `agent-browser` CLI
- Stream live browser viewport via WebSocket
- Show thinking in real-time like Claude
- Request approval for sensitive actions
- Isolate sessions per conversation thread
- Planning & Decomposition: Built-in `write_todos` tool with parallel vs sequential task identification
- File System Tools: Manage large context with filesystem
- Parallel Subagents: Spawn multiple subagents concurrently for independent tasks
- File-Based Results: Subagents write results to files and return paths to avoid context bloat
- Long-term Memory: Persistent state across conversations
- Document Skills: PDF, PPTX, DOCX manipulation
- Browser Skills: Automated browser interactions
- Skill Creator: Guide for building custom skills
- Settings Integration: View and manage skills in UI
- AGENTS.md: Store learned patterns with enforced structure
- USER_PREFERENCES.md: Store user preferences with standardized sections
- Diary: Record task completions and learnings
- Skills: Create reusable workflows
- Guidance Requests: Agent can ask for help when stuck
- Credential Requests: Secure credential input form in UI
- Confirmation Dialogs: Approve/reject risky actions
- Subagent Support: Interrupts from subagents surface to UI
- Script Execution: Run Python/Node scripts
- Package Installation: pip/npm install commands
- Security Tiers: Auto-approve safe, require approval for others, block dangerous
- Unified Root: All paths resolve relative to `.browser-agent/`
- Iterative Refinement: Agent retries with improvements
- Self-Reflection: Reviews mistakes and adapts approach
- Persistent Memory: Uses filesystem between iterations
- Configurable Iterations: Set max attempts per task
- Full Browser Control: Navigate, click, fill, type, screenshot
- Element Refs: Clean `@e1` syntax for interactions
- Session Isolation: Each thread gets its own browser
- Live Streaming: WebSocket viewport streaming
- Browserbase Support: Cloud browser infrastructure for anti-detection and serverless deployments
- Present Files to User: Agent can create and present files (PDFs, images, documents)
- File Preview Panel: Side panel for viewing presented files with inline preview
- Multi-Format Preview: Native preview support for PDF, Markdown, images, text, JSON, CSV, HTML, DOCX, and XLSX files
- Download Support: One-click download for all artifact types
- Type Detection: Automatic icon and preview mode based on file type
- Real-time Status: See active subagents spawned by the main agent
- Polling-based Updates: Status refreshes via backend polling
- Status Cards: Visual indicator showing subagent progress
- Waterfall Thought Process: Hierarchical, nested display of reasoning
- 3-Panel Layout: Resizable threads, chat, and browser panels
- Persistent Browser Preview: Right-side panel with live streaming
- File Preview Panel: Side panel for viewing file artifacts
- Clean Design: Anthropic-inspired minimal color palette
- Smooth Animations: 200ms transitions
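The Ralph Mode items in the list above (iterative refinement, self-reflection, configurable iterations) can be sketched as a bounded retry loop. This is a minimal illustration, not the project's implementation: `run_ralph_loop`, the `Attempt` signature, and the in-memory `notes` list (standing in for the filesystem memory used between iterations) are all assumptions.

```python
from typing import Callable, List, Tuple

# Hypothetical signature: an attempt receives the task plus notes from earlier
# iterations and returns (result, succeeded).
Attempt = Callable[[str, List[str]], Tuple[str, bool]]


def run_ralph_loop(task: str, attempt: Attempt, max_iterations: int = 5) -> Tuple[str, int]:
    """Retry a task until it succeeds or the iteration budget is used up."""
    notes: List[str] = []  # stand-in for memory persisted between iterations
    result = ""
    for iteration in range(1, max_iterations + 1):
        result, succeeded = attempt(task, notes)
        if succeeded:
            return result, iteration
        # Record what went wrong so the next attempt can adapt.
        notes.append(f"iteration {iteration} failed: {result}")
    return result, max_iterations
```

The loop returns both the last result and the number of iterations used, so a caller can tell an early success from an exhausted budget.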
All browser tools run in an isolated browser sandbox, so no approval is required for browser actions. The agent can freely navigate, click, fill forms, and execute JavaScript.
Auto-approved browser tools:
- Navigation: `browser_navigate`, `browser_back`, `browser_forward`, `browser_reload`
- Interaction: `browser_click`, `browser_fill`, `browser_type`, `browser_press_key`
- Observation: `browser_snapshot`, `browser_screenshot`, `browser_get_info`, `browser_console`
- State: `browser_is_visible`, `browser_is_enabled`, `browser_is_checked`, `browser_wait`
- Advanced: `browser_eval` (JavaScript execution), `browser_close`
Bash commands use a tiered approval system (auto-approve safe commands, block dangerous ones).
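As a rough illustration of the tiered approval idea, the sketch below classifies a command by its first token. The command lists, the `Tier` enum, and `classify_command` are hypothetical; the project's actual rules live in `bash_tool.py` and may differ.

```python
import shlex
from enum import Enum


class Tier(Enum):
    AUTO_APPROVE = "auto_approve"
    REQUIRE_APPROVAL = "require_approval"
    BLOCK = "block"


# Hypothetical example lists, chosen only for illustration.
SAFE_COMMANDS = {"ls", "cat", "pwd", "echo", "head", "tail"}
BLOCKED_COMMANDS = {"sudo", "shutdown", "reboot", "mkfs", "dd"}
SHELL_OPERATORS = ("|", ";", "&", ">", "<", "`", "$(")


def classify_command(command: str) -> Tier:
    """Map a shell command to an approval tier based on its first token."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes: do not guess, ask a human instead.
        return Tier.REQUIRE_APPROVAL
    if not tokens:
        return Tier.REQUIRE_APPROVAL
    program = tokens[0]
    if program in BLOCKED_COMMANDS:
        return Tier.BLOCK
    # Shell operators can chain arbitrary commands, so never auto-approve them.
    if any(op in command for op in SHELL_OPERATORS):
        return Tier.REQUIRE_APPROVAL
    if program in SAFE_COMMANDS:
        return Tier.AUTO_APPROVE
    return Tier.REQUIRE_APPROVAL
```

Defaulting unknown commands to `REQUIRE_APPROVAL` keeps the auto-approve tier an explicit allowlist rather than a fallback.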
- Python 3.11+ with `uv` or `pip`
- Node.js 18+ with `yarn` or `npm`
- OpenAI API or Azure OpenAI access
- agent-browser: `npm install -g agent-browser`
git clone <repository-url>
cd Browser-Use
cd browser-use-agent
# Create virtual environment
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
uv pip install -e .
# Configure environment
cp .env.example .env
# Edit .env with your API credentials
.env configuration (OpenAI):
USE_AZURE=false
OPENAI_API_KEY=your-openai-api-key
OPENAI_MODEL=gpt-5
TEMPERATURE=1.0
.env configuration (Azure OpenAI):
USE_AZURE=true
AZURE_OPENAI_ENDPOINT=https://your-endpoint.openai.azure.com
AZURE_OPENAI_API_KEY=your-api-key-here
OPENAI_API_VERSION=2025-01-01-preview
DEPLOYMENT_NAME=your-gpt-deployment-name
TEMPERATURE=1.0
cd ../deep-agents-ui
# Configure environment variables (recommended)
cp .env.local.example .env.local
# Edit .env.local with your settings
# Install dependencies
yarn install
# Start development server
yarn dev
In a separate terminal:
cd browser-use-agent
source .venv/bin/activate
langgraph dev --port 2024
Navigate to http://localhost:3000 and start chatting!
Environment Variables (browser-use-agent/.env):
| Variable | Description | Default |
|---|---|---|
| `USE_AZURE` | Use Azure OpenAI | true |
| `OPENAI_API_KEY` | OpenAI API key (when USE_AZURE=false) | - |
| `OPENAI_MODEL` | OpenAI model name | gpt-5 |
| `AZURE_OPENAI_ENDPOINT` | Azure OpenAI endpoint | - |
| `AZURE_OPENAI_API_KEY` | Azure OpenAI API key | - |
| `DEPLOYMENT_NAME` | Azure deployment name | gsds-gpt-5 |
| `TEMPERATURE` | Model temperature | 1.0 |
| `REASONING_ENABLED` | Enable reasoning API | true |
| `REASONING_EFFORT` | Reasoning effort level | medium |
| `AGENT_BROWSER_STREAM_PORT` | Base WebSocket port | 9223 |
| `USE_CDP` | Connect to existing Chrome | false |
| `CDP_PORT` | Chrome DevTools port | 9222 |
| `BROWSERBASE_API_KEY` | Browserbase API key (for cloud browser) | - |
| `BROWSERBASE_PROJECT_ID` | Browserbase project ID | - |
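For orientation, here is a minimal sketch of how a backend could read a few of these variables with the defaults from the table. The `AgentSettings` dataclass and `load_settings` function are assumptions made for illustration; the project's own loader is `configuration.py` and is not shown in this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class AgentSettings:  # hypothetical container, not the project's class
    use_azure: bool
    openai_model: str
    temperature: float
    reasoning_effort: str
    stream_port: int
    cdp_port: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> AgentSettings:
    """Read settings from a mapping (os.environ by default) using the table's defaults."""
    env = os.environ if env is None else env
    return AgentSettings(
        use_azure=_as_bool(env.get("USE_AZURE", "true")),
        openai_model=env.get("OPENAI_MODEL", "gpt-5"),
        temperature=float(env.get("TEMPERATURE", "1.0")),
        reasoning_effort=env.get("REASONING_EFFORT", "medium"),
        stream_port=int(env.get("AGENT_BROWSER_STREAM_PORT", "9223")),
        cdp_port=int(env.get("CDP_PORT", "9222")),
    )
```

Passing an explicit mapping instead of relying on `os.environ` makes the defaults easy to check in isolation.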
Environment Variables (.env.local):
NEXT_PUBLIC_DEPLOYMENT_URL=http://127.0.0.1:2024
NEXT_PUBLIC_ASSISTANT_ID=browser-agent
NEXT_PUBLIC_RALPH_MODE_ENABLED=false
NEXT_PUBLIC_RALPH_MAX_ITERATIONS=5
NEXT_PUBLIC_BROWSER_STREAM_PORT=9223
Simple Navigation:
Navigate to example.com and tell me the main heading
Form Interaction:
Go to https://httpbin.org/forms/post, fill in the customer name
as "John Doe", fill in the telephone as "555-1234", and submit the form
Research Task (Ralph Mode):
Research the latest features in Next.js 15 and create a summary
with the top 3 most important improvements
# Standard mode
python agent.py --task "Navigate to google.com and search for 'LangChain'"
# Ralph Mode (iterative)
python agent.py --ralph --task "Research browser automation tools" --iterations 5

- FRONTEND (Next.js)
  - Thread Sidebar (15%): Today, Yesterday, Older
  - Chat Interface (50%): Messages, Thought Process, Tool Calls
  - Browser Panel (35%)
    - Live Viewport Stream over a WebSocket Connection (ws://localhost:9223)
    - Auto-expand on session start
    - Auto-collapse on session end
- FRONTEND → BACKEND: HTTP/SSE Stream (LangGraph SDK)
- BACKEND (LangGraph + Python)
  - LangGraph Server (:2024)
    - State Manager: messages, todos, files, browser
    - Checkpoint DB (SQLite): Persistent across restarts
    - Thread Isolation: thread_id → browser_session, thread_id → memory_context, thread_id → checkpoint
  - DeepAgents Graph: Plan (Todos) → Execute (Tools) → Reflect (Memory) → Ralph Iteration (if enabled)
    - Plan: write_todos
    - Execute: Browser Tools + Bash Tools
    - Reflect: AGENTS.md, USER_PREFS.md
    - Ralph Iteration: Max iterations, then return
- BACKEND → BROWSER LAYER: subprocess (agent-browser CLI)
- BROWSER LAYER (agent-browser)
  - Chromium Instance (Headless)
    - Page Control: navigate(url), click(@ref), fill(@ref), screenshot()
    - Element Refs: @e1, @e2, @e3... from snapshot -i; valid per page
    - Screencast Stream: JPEG frames → WebSocket :9223, 30fps streaming

- User Input → Next.js UI
- Next.js UI → LangGraph Server: HTTP POST
- LangGraph Server → agent-browser CLI: subprocess
- LangGraph Server → Next.js UI: SSE Stream (messages, todos, tools, thought, browser_session)
- agent-browser CLI → BrowserPanel (Live Preview): WebSocket Stream (ws://localhost:9223)
  - JPEG frames (base64)
  - viewport metadata
- BrowserPanel → Browser Viewport

page.tsx (Main Layout)
│
├── ChatProvider (Context)
│   └── useChat hook
│       ├── LangGraph SDK client
│       ├── Thread state management
│       ├── Message streaming
│       ├── Browser session detection
│       └── Error handling
│
├── ThreadList
│   ├── SWR infinite loading
│   ├── Time-based grouping
│   └── Interrupt count badge
│
├── ChatInterface
│   ├── ChatMessage[]
│   │   ├── ThoughtProcess (waterfall display)
│   │   ├── ToolCallBox (collapsible)
│   │   └── SubAgentIndicator
│   ├── TodoList (grouped by status)
│   ├── FileExplorer
│   └── InputArea
│
└── ChatWithBrowserPanel
    ├── ResizablePanel (chat)
    └── ResizablePanel (browser)
        └── BrowserPanelContent
            └── WebSocket → img[src=base64]

browser_agent.py (Graph Definition)
│
├── create_browser_agent()
│   └── create_deep_agent()
│       ├── Planning node (write_todos)
│       ├── Execution node (tools)
│       ├── Reflection node (memory)
│       └── Subagent spawning (task tool)
│
├── Tools
│   ├── BROWSER_TOOLS (tools.py)
│   │   ├── browser_navigate, browser_back, browser_forward, browser_reload
│   │   ├── browser_click, browser_fill, browser_type, browser_press_key
│   │   ├── browser_snapshot, browser_screenshot, browser_console
│   │   ├── browser_is_visible, browser_is_enabled, browser_is_checked
│   │   └── browser_wait, browser_eval, browser_close, browser_get_info
│   │
│   ├── BASH_TOOLS (bash_tool.py)
│   │   └── bash_execute (with security tiers)
│   │
│   ├── HUMAN_TOOLS (human_loop.py)
│   │   ├── request_human_guidance
│   │   ├── request_credentials
│   │   └── request_confirmation
│   │
│   ├── REFLECTION_TOOLS (reflection.py)
│   │   └── reflect_on_session
│   │
│   └── present_file (present_file.py)
│       └── Present generated files to user
│
└── State (state.py)
    ├── messages: BaseMessage[]
    ├── todos: Todo[]
    ├── files: dict
    ├── browser_session: BrowserSession
    ├── current_thought: ThoughtProcess
    ├── presented_files: List[PresentedFile]
    ├── active_subagents: Dict[str, SubagentStatus]
    └── pending_subagent_interrupts: List[SubagentInterrupt]

The .browser-agent/ directory serves as the unified root for all agent operations:
.browser-agent/                  # Agent's "home directory"
│
├── artifacts/                   # Generated outputs
│   ├── file_outputs/            # User-requested files (PDFs, CSVs, etc.)
│   ├── screenshots/             # Browser screenshots
│   └── tool_outputs/            # Large tool results
│
├── memory/                      # Persistent memory
│   ├── AGENTS.md                # Learned patterns (website, task, error recovery)
│   ├── USER_PREFERENCES.md      # User preferences and settings
│   └── diary/                   # Session completion logs
│
├── skills/                      # Reusable skill definitions
│   ├── agent-browser/           # Browser automation skill
│   ├── pdf.md                   # PDF manipulation
│   ├── pptx.md                  # PowerPoint creation
│   └── docx.md                  # Word document handling
│
├── checkpoints/                 # LangGraph state persistence
│   └── browser_agent.db         # SQLite checkpoint database
│
└── traces/                      # Debug traces (optional)

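Treating this directory as a unified root means every agent-supplied path has to be resolved against it. A minimal sketch of that idea follows; `resolve_agent_path` is a hypothetical helper, and the escape check is an added assumption rather than documented behavior of the FilesystemBackend or `bash_execute`.

```python
from pathlib import Path

AGENT_ROOT = Path(".browser-agent")


def resolve_agent_path(path: str, root: Path = AGENT_ROOT) -> Path:
    """Resolve an agent-supplied path against the agent root directory."""
    root = root.resolve()
    # A leading "/" means "the agent root", not the host filesystem root.
    candidate = (root / path.lstrip("/")).resolve()
    # Reject paths such as "../secrets.txt" that would leave the root.
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"path escapes agent root: {path}")
    return candidate
```

Under these assumptions both `"/artifacts/report.pdf"` and `"artifacts/report.pdf"` resolve to the same file inside `.browser-agent/`.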
Path Resolution:
Both the DeepAgents FilesystemBackend and bash_execute tool use .browser-agent/ as root:
# DeepAgents FilesystemBackend
write_file("/artifacts/report.pdf", content) # → .browser-agent/artifacts/report.pdf
# bash_execute (cwd defaults to .browser-agent/)
bash_execute("python artifacts/script.py") # Runs from .browser-agent/
Each conversation thread maintains isolated state:
thread_id = "abc-123"
# Isolated per thread:
- Browser session (sessionId, streamUrl, isActive)
- LangGraph checkpoint (messages, todos, files)
- WebSocket port (9223 + hash(thread_id) % 100)
# Shared across threads:
- Memory files (AGENTS.md, USER_PREFERENCES.md)
- Skills definitions
- Configuration
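The per-thread port rule listed above (`9223 + hash(thread_id) % 100`) can be sketched as follows. Python's built-in `hash()` is salted per process for strings, so this sketch swaps in SHA-256 to get a value that stays stable across restarts; that substitution, and the name `stream_port_for_thread`, are assumptions rather than the project's code.

```python
import hashlib

BASE_STREAM_PORT = 9223  # AGENT_BROWSER_STREAM_PORT default
PORT_RANGE = 100         # number of ports available above the base


def stream_port_for_thread(thread_id: str, base_port: int = BASE_STREAM_PORT) -> int:
    """Derive a deterministic WebSocket port for a conversation thread."""
    digest = hashlib.sha256(thread_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then fold into the range.
    offset = int.from_bytes(digest[:8], "big") % PORT_RANGE
    return base_port + offset
```

With only 100 slots, two threads can map to the same port, so a real deployment would also need to detect and handle collisions.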

LangGraph State
- messages: BaseMessage[]
  - HumanMessage (user input)
  - AIMessage (agent response + tool_calls)
  - ToolMessage (tool results)
- todos: Todo[]
  - content: string
  - status: "pending" | "in_progress" | "completed"
- browser_session: BrowserSession | null
  - sessionId: string (thread_id)
  - streamUrl: string (ws://localhost:9223)
  - isActive: boolean
- current_thought: ThoughtProcess | null
  - content: string (streaming)
  - isComplete: boolean

LangGraph State → SQLite DB (persistent): Checkpoint on each node

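The state shape described above can be written down as typed dictionaries. This is a sketch for orientation only: the real definitions live in `state.py`, messages are LangChain `BaseMessage` objects there (plain dicts stand in for them here), and `initial_state` is a hypothetical helper.

```python
from typing import Any, Dict, List, Literal, Optional, TypedDict


class Todo(TypedDict):
    content: str
    status: Literal["pending", "in_progress", "completed"]


class BrowserSession(TypedDict):
    sessionId: str   # equal to the thread_id
    streamUrl: str   # e.g. ws://localhost:9223
    isActive: bool


class ThoughtProcess(TypedDict):
    content: str     # filled incrementally while streaming
    isComplete: bool


class AgentState(TypedDict):
    messages: List[Dict[str, Any]]  # stand-in for BaseMessage objects
    todos: List[Todo]
    browser_session: Optional[BrowserSession]
    current_thought: Optional[ThoughtProcess]


def initial_state(user_input: str) -> AgentState:
    """Build the state for a fresh thread with a single human message."""
    return {
        "messages": [{"type": "human", "content": user_input}],
        "todos": [],
        "browser_session": None,
        "current_thought": None,
    }
```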

agent-browser (backend) → Frontend BrowserPanel
- agent-browser starts the screencast on browser_navigate
- Screencast Server :9223 → WebSocket Client: JPEG frames (base64) over WebSocket
- Frame message: { type: "frame", data: "base64...", metadata: { deviceWidth, deviceHeight, ... } }
- The WebSocket Client renders each frame as <img src=data:image/jpeg;base64>
- On browser_close:
  - Stop screencast
  - Close WebSocket
  - Frontend auto-collapses panel

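To make the frame format concrete, the sketch below turns one frame message into the data URI an `<img>` element would display. The frontend does this in the browser; the Python version and the name `frame_to_data_uri` are illustrative assumptions, and only the `type` and `data` fields shown in the frame message above are relied on.

```python
import base64
import json
from typing import Optional


def frame_to_data_uri(raw_message: str) -> Optional[str]:
    """Convert a screencast frame message into a JPEG data URI."""
    message = json.loads(raw_message)
    if message.get("type") != "frame":
        return None  # ignore anything that is not a frame
    payload = message["data"]
    # Raises binascii.Error (a ValueError) if the payload is not valid base64.
    base64.b64decode(payload, validate=True)
    return f"data:image/jpeg;base64,{payload}"
```

Validating the payload before building the URI keeps a malformed frame from silently replacing the last good image.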
Agent encounters need for human input
↓
langgraph.types.interrupt({ type: "guidance" | "credentials" | "confirmation", question: "...", context: "..." })
↓ Stream interrupted state
Frontend detects interrupt (stream.interrupt !== null)
↓
Render appropriate UI:
- HumanLoopInterrupt (guidance)
- CredentialsForm (credentials)
- ConfirmationDialog (confirm)
↓ User responds
resumeInterrupt(response) → stream.submit(response)
↓ Graph resumes
Agent continues execution

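The interrupt payload and the type-to-component routing in this flow can be sketched in a few lines. In the backend the payload would be handed to `langgraph.types.interrupt`, which this sketch deliberately does not call so that it stays self-contained; `build_interrupt_payload` and `component_for` are hypothetical helpers, and the component names are the ones listed in the flow.

```python
from typing import Dict

# Interrupt type -> UI component that renders it (names from the flow above).
UI_COMPONENTS: Dict[str, str] = {
    "guidance": "HumanLoopInterrupt",
    "credentials": "CredentialsForm",
    "confirmation": "ConfirmationDialog",
}


def build_interrupt_payload(kind: str, question: str, context: str = "") -> Dict[str, str]:
    """Build the payload an agent tool would pass to the interrupt call."""
    if kind not in UI_COMPONENTS:
        raise ValueError(f"unknown interrupt type: {kind}")
    return {"type": kind, "question": question, "context": context}


def component_for(payload: Dict[str, str]) -> str:
    """Pick the UI component for an interrupt payload."""
    return UI_COMPONENTS[payload["type"]]
```

Keeping the set of interrupt types in one table means the agent side and the rendering side cannot drift apart without a lookup failing.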
Browser-Use/
├── browser-use-agent/                 # Python Backend (DeepAgents)
│   ├── browser_use_agent/             # Core agent package
│   │   ├── browser_agent.py           # Main agent + Ralph Mode
│   │   ├── configuration.py           # Azure OpenAI config
│   │   ├── tools.py                   # Browser automation tools
│   │   ├── bash_tool.py               # Bash execution with security tiers
│   │   ├── human_loop.py              # Human-in-the-loop tools
│   │   ├── subagent_interrupt.py      # Subagent interrupt forwarding
│   │   ├── state.py                   # State definitions
│   │   ├── prompts.py                 # System prompts + memory management
│   │   ├── reflection.py              # Memory read/write tools
│   │   ├── storage/                   # Checkpoint and config
│   │   ├── skills/                    # Skill loader
│   │   └── utils.py                   # StreamManager
│   ├── agent.py                       # CLI entry point
│   └── langgraph.json                 # LangGraph config
│
├── deep-agents-ui/                    # Next.js Frontend
│   ├── src/
│   │   ├── app/
│   │   │   ├── api/skills/            # Skills API route
│   │   │   ├── components/            # UI components
│   │   │   │   ├── ChatInterface.tsx
│   │   │   │   ├── ChatMessage.tsx
│   │   │   │   ├── BrowserPanel.tsx
│   │   │   │   ├── ThoughtProcess.tsx
│   │   │   │   ├── ToolCallBox.tsx
│   │   │   │   ├── ThreadList.tsx
│   │   │   │   └── ...
│   │   │   ├── hooks/
│   │   │   │   ├── useChat.ts         # Main chat hook
│   │   │   │   └── useThreads.ts      # Thread list hook
│   │   │   ├── providers/
│   │   │   │   ├── ChatProvider.tsx
│   │   │   │   └── ClientProvider.tsx
│   │   │   └── types/
│   │   └── components/ui/             # shadcn/ui components
│   └── .env.local.example
│
├── .browser-agent/                    # Agent memory and artifacts
│   ├── artifacts/                     # Generated files
│   ├── memory/                        # Persistent memory
│   ├── skills/                        # Skill definitions
│   └── checkpoints/                   # State persistence
│
├── agent.md                           # Technical reference
├── CLAUDE.md                          # AI assistant instructions
└── README.md                          # This file

- Backend README - Python agent details
- agent.md - Technical reference & implementation
- Skills: `.browser-agent/skills/` - PDF, PPTX, DOCX, browser automation
Built with DeepAgents and agent-browser