QuartzUnit/watchdeck

watchdeck

Korean documentation · llms.txt

Web monitoring pipeline — track page changes, capture visual diffs, and guard against monitoring pitfalls. Built from QuartzUnit libraries.

flowchart LR
    A["🔗 diffgrab\nchange detection"] --> B["📄 markgrab\ncontent extraction"]
    A --> C["📸 snapgrab\nvisual capture"]
    B --> D["🛡️ llm-degen-guard\noutput quality"]
    B --> E["🔄 agent-loop-guard\nloop detection"]
    B --> F["📋 agent-action-policy\naction safety"]

Quick Start

pip install watchdeck

# Add URLs to monitor
watchdeck add https://example.com
watchdeck add https://news.ycombinator.com --interval 12

# Check for changes
watchdeck check

# View history
watchdeck history https://example.com

# See diff between snapshots
watchdeck diff https://example.com

What It Does

  1. Detect — Tracks page changes via diffgrab (content hashing + structured diffs)
  2. Extract — Pulls full content via markgrab for quality validation
  3. Screenshot — Captures visual snapshots via snapgrab on change (optional)
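  4. Guard — Three safety layers: URL policy (agent-action-policy), loop detection (agent-loop-guard), and content quality (llm-degen-guard)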

No cloud services, no API keys. Everything runs locally.
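The Detect step can be pictured with a small standard-library sketch. This is not diffgrab's implementation. It only illustrates the general idea of "content hashing + structured diffs": compare hashes first, and compute a diff only when they differ. The function names `content_hash` and `detect_change` are hypothetical.

```python
import difflib
import hashlib


def content_hash(text: str) -> str:
    """Return a SHA-256 hex digest of the page text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def detect_change(old: str, new: str) -> list[str]:
    """Return unified-diff lines, or an empty list when the hashes match."""
    if content_hash(old) == content_hash(new):
        return []  # cheap path: identical content, no diff needed
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="before",
            tofile="after",
            lineterm="",
        )
    )


if __name__ == "__main__":
    for line in detect_change("title\nold body", "title\nnew body"):
        print(line)
```

Hashing first keeps the common "nothing changed" case cheap, since the line-by-line diff only runs when the digests disagree.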

Install

pip install watchdeck

Requirements: Python 3.11+. Screenshots additionally require Playwright with a Chromium build (run: playwright install chromium).

CLI Reference

watchdeck add <URL>

Add a URL to monitor. Blocked URLs (localhost, private IPs, file://) are automatically rejected.

watchdeck add https://example.com                  # default: check every 24h
watchdeck add https://news.ycombinator.com -i 12   # check every 12h

Option       Short   Default   Description
--interval   -i      24        Check interval in hours

watchdeck check

Check all monitored URLs for changes.

watchdeck check                              # check all
watchdeck check -u https://example.com       # specific URL
watchdeck check --screenshots                # capture screenshots on change

Output:

    Monitor Check (3 URLs, 1240ms)
┌──────────────────────────┬───────────┬─────────┬──────────┐
│ URL                      │ Status    │ Changes │ Warnings │
├──────────────────────────┼───────────┼─────────┼──────────┤
│ https://example.com      │ CHANGED   │ +5/-2   │          │
│ https://news.ycombinator │ unchanged │         │          │
│ https://old-page.com     │ unchanged │         │ stale    │
└──────────────────────────┴───────────┴─────────┴──────────┘

1 changes detected
1 stale URLs (consider reducing frequency)
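The Changes column reads like a count of added and removed lines. Assuming that interpretation (the source does not define the format), a summary in the same "+N/-M" shape could be derived with the standard library as follows. `change_summary` is a hypothetical name, not part of watchdeck or diffgrab.

```python
import difflib


def change_summary(old: str, new: str) -> str:
    """Summarize a change as '+added/-removed' line counts."""
    added = removed = 0
    for line in difflib.ndiff(old.splitlines(), new.splitlines()):
        if line.startswith("+ "):
            added += 1
        elif line.startswith("- "):
            removed += 1
        # lines starting with "  " (unchanged) or "? " (hints) are ignored
    return f"+{added}/-{removed}"


if __name__ == "__main__":
    print(change_summary("a\nb\nc", "a\nc\nd\ne"))
```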

watchdeck remove <URL>

Stop monitoring a URL.

watchdeck history <URL>

Show snapshot history.

watchdeck history https://example.com -n 10

watchdeck diff <URL>

Show diff between snapshots.

watchdeck diff https://example.com
watchdeck diff https://example.com --before 1 --after 3

Python API

import asyncio
from watchdeck import WatchDeck

async def main():
    deck = WatchDeck()

    # Add URLs (safety policy auto-applied)
    await deck.add("https://example.com", interval_hours=12)
    await deck.add("http://localhost:8080")  # → blocked by policy

    # Check for changes
    report = await deck.check()
    for result in report.results:
        if result.changed:
            print(f"{result.url}: {result.summary}")
        if result.stale_warning:
            print(f"  ⚠ {result.stale_warning}")
        if result.content_warning:
            print(f"  ⚠ {result.content_warning}")

    # History and diffs
    snapshots = await deck.history("https://example.com")
    diff = await deck.diff("https://example.com")

    await deck.close()

asyncio.run(main())
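When a report feeds a notification or a log, it can help to group URLs by what needs attention. The sketch below relies only on the attributes used in the example above (`results`, `url`, `changed`, `stale_warning`, `content_warning`) and assumes the warning fields are falsy when absent. The helper `summarize` and the stand-in report are hypothetical; a real report comes from `await deck.check()`.

```python
from types import SimpleNamespace


def summarize(report) -> dict[str, list[str]]:
    """Group result URLs into 'changed' and 'warnings' buckets."""
    out: dict[str, list[str]] = {"changed": [], "warnings": []}
    for r in report.results:
        if r.changed:
            out["changed"].append(r.url)
        if r.stale_warning or r.content_warning:
            out["warnings"].append(r.url)
    return out


# Stand-in report for illustration only.
fake = SimpleNamespace(results=[
    SimpleNamespace(url="https://example.com", changed=True,
                    stale_warning=None, content_warning=None),
    SimpleNamespace(url="https://old-page.com", changed=False,
                    stale_warning="stale", content_warning=None),
])

if __name__ == "__main__":
    print(summarize(fake))
```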

Safety Guards

watchdeck integrates three QuartzUnit guard libraries to prevent common monitoring pitfalls:

URL Policy (agent-action-policy)

Automatically blocks monitoring of:

  • localhost, 127.0.0.1
  • Private networks (192.168.*, 10.*, 172.16-31.*)
  • file:// URLs

deck = WatchDeck()
success, msg = await deck.add("http://192.168.1.1/admin")
# success=False, msg="Cannot monitor internal/private network URLs"
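For intuition, the blocking rules listed above can be approximated with the standard library. This is a minimal sketch, not the agent-action-policy implementation, and `is_blocked` is a hypothetical name. It does not resolve DNS names, so a public hostname that points at a private address would pass this check.

```python
import ipaddress
from urllib.parse import urlsplit


def is_blocked(url: str) -> bool:
    """Return True for file:// URLs, localhost, and loopback/private IPs."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return True  # covers file:// and any other non-web scheme
    host = parts.hostname
    if not host or host == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name; this sketch does not resolve it
    # is_private also covers ranges beyond the three listed above
    # (for example link-local addresses).
    return ip.is_loopback or ip.is_private


if __name__ == "__main__":
    for u in ("http://192.168.1.1/admin", "file:///etc/passwd", "https://example.com"):
        print(u, is_blocked(u))
```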

Loop Detection (agent-loop-guard)

Detects when a URL hasn't changed for N consecutive checks and suggests reducing frequency:

⚠ URL unchanged for 5 consecutive checks — consider reducing frequency

Content Quality (llm-degen-guard)

Flags pages that return garbage content (CAPTCHA pages, bot detection, repetitive filler):

⚠ Content appears degenerate (score=0.78) — possible CAPTCHA or anti-bot page
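How llm-degen-guard computes its score is not described here. As a rough illustration of one signal such a check might use, the hypothetical function `repetition_score` below measures how much of a text consists of repeated word trigrams. Repetitive filler scores close to 1.0, while ordinary prose scores near 0.0.

```python
from collections import Counter


def repetition_score(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that repeat an earlier n-gram (0.0 to 1.0)."""
    words = text.split()
    if len(words) < n:
        return 0.0  # too short to judge
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(grams)
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(grams)


if __name__ == "__main__":
    filler = "please verify you are human " * 20
    print(round(repetition_score(filler), 2))
    print(repetition_score("the quick brown fox jumps over the lazy dog"))
```

A single ratio like this is easy to fool, so a real guard would likely combine several signals.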

Configuration

Data is stored in ~/.watchdeck/ by default:

~/.watchdeck/
├── tracker.db     # diffgrab snapshots + change history

Custom location:

deck = WatchDeck(db_dir="/path/to/data")

How It Works

flowchart TD
    A["watchdeck add URL"] --> B["Initial snapshot\n(diffgrab + markgrab + snapgrab)"]
    B --> C["watchdeck check"]
    C --> D{"Content\nchanged?"}
    D -->|"yes"| E["Compute diff\n+ screenshot\n+ guard checks"]
    D -->|"no"| F["Check stale threshold"]
    E --> G["📊 Report"]
    F --> G

QuartzUnit Libraries Used

Library               Role in watchdeck                          PyPI
diffgrab              Page change detection + structured diffs   pip install diffgrab
markgrab              Content extraction for quality checks      pip install markgrab
snapgrab              Visual screenshot capture on change        pip install snapgrab
agent-action-policy   URL safety policy (block internal IPs)     pip install agent-action-policy
agent-loop-guard      Stale monitoring detection                 pip install agent-loop-guard
llm-degen-guard       Garbage content detection                  pip install llm-degen-guard

See also: newswatch — news monitoring pipeline (feedkit + markgrab + embgrep + diffgrab)

License

MIT


Part of the QuartzUnit ecosystem — composable Python libraries for data collection, extraction, search, and AI agent safety.
