17 changes: 17 additions & 0 deletions skills/hotdata-cli/SKILL.md
@@ -29,6 +29,20 @@ API URL defaults to `https://api.hotdata.dev/v1` or overridden via `HOTDATA_API_

The `--workspace-id` flag is optional on all commands that accept it. If omitted, the active workspace is used. Use `hotdata workspaces set` to switch the active workspace interactively, or pass a workspace ID directly: `hotdata workspaces set <workspace_id>`. The active workspace is marked with a `*` in `hotdata workspaces list`. **Omit `--workspace-id` unless you need to target a specific workspace.**

## Multi-step workflows (Model, Library, History, Chain, Indexes)

These are **patterns** built from the commands below—not separate CLI subcommands:

- **Model** — Markdown semantic map of your workspace (entities, keys, joins). Refresh using `connections`, `connections refresh`, `tables list`, and `datasets list`. For a **deep** modeling pass (connector enrichment, indexes, per-table detail), see [references/MODEL_BUILD.md](references/MODEL_BUILD.md).
- **Library** — Curated **`hotdata queries`** entries for repeatable SQL (`queries create`, `queries run`, …).
- **History** — Find prior **`hotdata results`** and saved queries (`results list`, `results <id>`, `queries list`).
- **Chain** — Follow-up questions: create a dataset with **`datasets create`**, then `query` it as `datasets.main.<table>`.
- **Indexes** — Review SQL and schema, compare to existing indexes, create **sorted**, **bm25**, or **vector** indexes when it clearly helps; see [references/WORKFLOWS.md](references/WORKFLOWS.md#indexes).

Full step-by-step procedures: [references/WORKFLOWS.md](references/WORKFLOWS.md).
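As a sketch, the **Chain** pattern above boils down to two documented commands; the table name `orders_enriched` is illustrative, and any `datasets create` flags are omitted here because they are covered in the command reference below:

```shell
# Sketch of a Chain follow-up; the derived table name is illustrative.
hotdata datasets list                 # confirm the derived table exists
hotdata query "SELECT status, count(*) AS n
               FROM datasets.main.orders_enriched
               GROUP BY status"
```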

**Project-owned files:** Put `DATA_MODEL.md` or `data_model.md` (e.g. under `docs/`) in the **directory where you run `hotdata`**—your repo or project—not under `~/.claude/skills/` or other agent skill paths. Copy the template from [references/DATA_MODEL.template.md](references/DATA_MODEL.template.md) to start; use [references/MODEL_BUILD.md](references/MODEL_BUILD.md) when you need the full procedure.

## Available Commands

### List Workspaces
@@ -259,8 +273,11 @@ hotdata jobs <job_id> [--workspace-id <workspace_id>] [--format table|json|yaml]
```
hotdata auth # Browser-based login
hotdata auth status # Check current auth status
hotdata auth logout # Remove saved auth for the default profile
```

Other commands (not covered in detail above): `hotdata connections new` (interactive connection wizard), `hotdata skills install|status`, `hotdata completions <bash|zsh|fish>`.
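For example, shell completions can be generated once and sourced from your shell config; the output path below is an assumption, so adjust it for your setup:

```shell
# Generate zsh completions (target path is illustrative).
hotdata completions zsh > "${ZDOTDIR:-$HOME}/.hotdata-completions.zsh"
# Then add to ~/.zshrc:
#   source "${ZDOTDIR:-$HOME}/.hotdata-completions.zsh"
```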

## Workflow: Running a Query

1. List connections:
89 changes: 89 additions & 0 deletions skills/hotdata-cli/references/DATA_MODEL.template.md
@@ -0,0 +1,89 @@
# Data model — `<project name>`

> Copy this file to your **project** directory (e.g. `./DATA_MODEL.md`, `./data_model.md`, or `./docs/DATA_MODEL.md`).
> Do not commit workspace-specific content into agent skill folders.
> For a **full** build (per-table detail, connector enrichment, index summary), follow [MODEL_BUILD.md](MODEL_BUILD.md) from the installed skill’s `references/` (or this repo’s `skills/hotdata-cli/references/`). Relative links to `MODEL_BUILD.md` below work only while this file lives next to those references; in your project, open that path separately if the link 404s.

**Workspace (Hotdata):** `<workspace name or id>`
**Last catalog refresh:** `<YYYY-MM-DD>`

## Overview

What data exists, which business domains it covers, and who owns this document.
_(Large workspaces: add a **table of contents** here—per connection, table counts.)_

## Purpose

Short description of what this workspace is for and how the model should be used for queries.

## Connections & sources

| Connection ID | Name | Type | Role / domain |
|---------------|------|------|---------------|
| | | | |

### Per-table detail (optional — use for deep models)

_Use for important tables only, or expand all via [MODEL_BUILD.md](MODEL_BUILD.md). **Duplicate** this whole block (from the heading through the horizontal rule) for each table._

#### `<connection>.<schema>.<table>`

**Grain:** one row = one `…`
**Description:**

| Column | Type | Nullable | PK/FK | Notes |
|--------|------|----------|-------|-------|

**Relationships:** (PK, FKs, parent–child)
**Queryability:** (filters, joins, caveats)

---

## Entities and grain (summary view)

For each business entity:

- **Entity:**
- **Grain:** one row per …
- **Primary tables:** `connection.schema.table`
- **Key columns:**

## Cross-connection joins

Document safe join paths and caveats (fan-out, timing, different refresh cadence, type mismatches).

## Search & index summary (optional)

| Table | Column | Kind (vector / text / …) | Index status | Notes |
|-------|--------|--------------------------|--------------|-------|
| | | | | |

_Use `hotdata indexes list -c <connection_id> --schema <schema> --table <table>` per table as needed._

## Datasets (uploaded)

Catalog from `hotdata datasets list` / `hotdata datasets <id>`:

| Label | Table name (`datasets.main.…`) | Grain | Notes |
|-------|-------------------------------|-------|-------|
| | | | |

## Derived tables (Chain)

Stable `datasets.main.*` tables built for **Chain** workflows (not necessarily uploaded file datasets):

| Table name | Built from | Purpose | Owner / TTL |
|------------|------------|---------|-------------|
| | | | |

## Saved query index (Library)

Link business questions to saved queries (ids/names from `hotdata queries list`):

| Question / report | Saved query name | ID (optional) |
|-------------------|------------------|---------------|
| | | |

## Notes

Assumptions, known gaps, and refresh checklist.
125 changes: 125 additions & 0 deletions skills/hotdata-cli/references/MODEL_BUILD.md
@@ -0,0 +1,125 @@
# Building a workspace data model (advanced)

Optional **deep pass** for a single authoritative markdown model. For a short checklist only, use the **Model** section in [WORKFLOWS.md](WORKFLOWS.md) and [DATA_MODEL.template.md](DATA_MODEL.template.md).

**Output:** Save as `DATA_MODEL.md`, `data_model.md`, or `docs/DATA_MODEL.md` in the **project directory** where you run `hotdata` (not inside agent skill folders).

---

## 1. Discover connections

```bash
hotdata connections list
```

For each connection, record `id`, `name`, and `source_type`.
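If `connections list` accepts `--format json` like the other commands in this skill (an assumption worth verifying), the three fields can be captured in one pass; the JSON field names below are guesses, not documented output:

```shell
# Assumes --format json is supported and the payload exposes
# id/name/source_type fields (verify against real output first).
hotdata connections list --format json \
  | jq -r '.[] | [.id, .name, .source_type] | @tsv'
```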

---

## 2. Enumerate tables, columns, and datasets

If the catalog may be **stale** (recent DDL, new tables missing), run **`hotdata connections refresh <connection_id>`** for affected connections **before** relying on `tables list`.

**Per connection:**

```bash
hotdata tables list --connection-id <connection_id>
```

**Uploaded datasets:**

```bash
hotdata datasets list
hotdata datasets <dataset_id>
```

Capture schema for each dataset (columns, types) from the detail view.

You can also refresh after enumeration if you discover drift:

```bash
hotdata connections refresh <connection_id>
```
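The refresh-then-enumerate steps above can be combined into a per-connection loop; the connection IDs are illustrative:

```shell
# Refresh the catalog, then enumerate tables, for each connection of interest.
for cid in conn_a conn_b; do
  hotdata connections refresh "$cid"
  hotdata tables list --connection-id "$cid"
done
```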

---

## 3. Enrich beyond column names (optional but valuable)

Use **connector and tooling docs** when `source_type` (or table shapes) match:

- **Vendor / ELT docs** — Your loader or integration vendor’s published schemas for canonical tables, PKs/FKs, and field semantics (link what you use so a human can verify).
- **dlt** — [verified sources](https://dlthub.com/docs/dlt-ecosystem/verified-sources) for normalized layouts.
- **dlt-loaded data** — If you see `_dlt_id`, `_dlt_load_id`, `_dlt_parent_id`: treat as pipeline metadata; `_dlt_parent_id` often links flattened child rows to parents when no explicit FK exists. Exclude these from **grain** statements unless the question is specifically about loads.
- **Vectors** — Columns typed as lists of floats (e.g. embedding columns) are candidates for vector search; note them.
- **Well-known SaaS shapes** — Apply general patterns (e.g. Stripe charges/customers, HubSpot contacts/deals) only when naming and structure fit; **link** the doc you used so a human can verify.

Do **not** invent facts: if context is missing, say so and suggest a small sample query:

```bash
hotdata query "SELECT * FROM <connection>.<schema>.<table> LIMIT 5"
```

---

## 4. Infer relationships

For each table, capture where reasonable:

1. **Grain** — One row = one `…` (required per table; if unknown, say unknown).
2. **Primary keys** — `id`, `<entity>_id`, or composite patterns from names + types.
3. **Foreign keys** — `_id` / `_fk` / name matches to other tables; confirm with connector docs when possible.
4. **Parent–child** — Flattened API/JSON tables (often nested names) and dlt parent keys.
5. **Cross-connection** — Same logical entity in two connections (keys, type mismatches, caveats).

For **small** schemas (e.g. ≤5 tables in a domain), a short **ASCII diagram** helps. For larger ones, group by domain in prose (e.g. billing, identity, product).
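For instance, a small billing domain (table names hypothetical) might be diagrammed as:

```
customers ──< subscriptions ──< invoices ──< invoice_items
    │
    └──< payment_methods
```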

---

## 5. Search and index awareness

For tables you care about:

```bash
hotdata indexes list -c <connection_id> --schema <schema> --table <table> [-w <workspace_id>]
```

Note:

- **Vector**-friendly columns (embeddings) vs **BM25**-friendly text (`title`, `body`, `description`, …).
- **Time** columns — event grain vs slowly changing dimensions.
- **Facts vs dimensions** — for analytics-oriented workspaces.

When suggesting a new index, use the same connection/schema/table/column names as in `tables list` and the main skill’s `indexes create` examples.
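A sketch for checking index coverage across several tables at once, using only the `indexes list` flags shown above (connection ID and table names are illustrative):

```shell
# Split schema.table pairs and list existing indexes for each.
for t in public.orders public.customers; do
  schema="${t%%.*}"   # part before the first dot
  table="${t##*.}"    # part after the last dot
  hotdata indexes list -c conn_a --schema "$schema" --table "$table"
done
```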

---

## 6. Document structure

Start from [DATA_MODEL.template.md](DATA_MODEL.template.md) and extend as needed:

- **Overview** — Domains and what the workspace is for.
- **Per connection** — Optional subsection per source; for **deep** models, **repeat** one block per `connection.schema.table` (grain, column table with name/type/nullable/PK-FK/notes, relationships, queryability, caveats)—the template’s single `####` heading is a pattern to copy for each table.
- **Datasets** — Same treatment as connection tables where relevant.
- **Cross-connection joins** — Keys, semantics, type caveats.
- **Search / index summary** — Table, column, index status, intended use.

If the workspace has **many** tables (e.g. 50+), add a **table of contents** after the overview (connection → table counts).

---

## Error handling

- If a CLI command fails, record the error in the doc and **continue** when possible.
- Unreachable connections or empty table lists: note in the connections table (e.g. unreachable / no tables).
- Do not abort the whole model for one bad connection.

---

## Rules (keep quality high)

- Every table gets an explicit **grain** (or “unknown”).
- Prefer **documented** connector semantics over guesswork; **link** external docs when you use them.
- Flag **test/dev** tables (`test`, `tmp`, `dev`, `staging` in names) as non-production when applicable.
- Note **Utf8-stored numbers** and cast requirements where relevant.
- Do not leave column **Notes** empty when domain knowledge or docs apply; “—” is weak unless the column is opaque/internal.
- Align table names with **`hotdata tables list`** output (`connection.schema.table`).
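For the Utf8-stored-numbers rule, a hedged example (the table and column names are made up):

```shell
# amount is stored as Utf8; cast before aggregating so the sum is numeric.
hotdata query "SELECT SUM(CAST(amount AS DOUBLE)) AS total
               FROM conn.public.payments"
```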