Changes from all commits
23 commits
d714e8d
Add workshop documentation series with troubleshooting guide
MitchellShiell Apr 5, 2026
3a9e3f8
minor updates
MitchellShiell Apr 6, 2026
0e6a7d5
Migrate workshop docs from guides to dedicated /workshop section
MitchellShiell Apr 10, 2026
fc749c8
powershell command references
MitchellShiell Apr 10, 2026
49e21be
Migrate Conductor to Docker wrapper and update image versions
MitchellShiell Apr 13, 2026
73d217f
Migrate Conductor to Docker wrapper and update image versions
MitchellShiell Apr 13, 2026
5de4cf9
removed vestigial images
MitchellShiell Apr 14, 2026
e0ccd02
Polish workshop intro page for IBC session
MitchellShiell Apr 14, 2026
006131a
Refine workshop intro page and compress images for web
MitchellShiell Apr 14, 2026
272d6ac
updated git clone command to be branch specific
MitchellShiell Apr 14, 2026
843425e
minor page naming update
MitchellShiell Apr 14, 2026
0de6906
Corrected branch name ISB >> IBC
MitchellShiell Apr 14, 2026
8721135
minor update
MitchellShiell Apr 14, 2026
8dd0829
Release full workshop content
MitchellShiell Apr 14, 2026
da0a7b1
image cleanup
MitchellShiell Apr 15, 2026
47fd241
Merge remote-tracking branch 'origin/main' into IBCworkshop-release
MitchellShiell Apr 15, 2026
71a5de0
Canadian spelling
MitchellShiell Apr 15, 2026
7bba927
image cleanup
MitchellShiell Apr 15, 2026
d872ff2
updated clone command to include branch name
MitchellShiell Apr 15, 2026
7838913
minor updates
MitchellShiell Apr 15, 2026
64774f2
updated config generation screenshots
MitchellShiell Apr 15, 2026
ae0a8e9
post workshop survey added
MitchellShiell Apr 15, 2026
d40fc3c
updated extension task
MitchellShiell Apr 17, 2026
Binary file removed website/static/img/CDD.png
Binary file not shown.
Binary file removed website/static/img/arch.png
Binary file removed website/static/img/basicPortal.png
Binary file removed website/static/img/build.png
Binary file removed website/static/img/demo-portal-cross-table.png
Binary file removed website/static/img/demo-portal-exploration-page.png
Binary file removed website/static/img/demo-portal-homepage.png
Binary file removed website/static/img/demo-portal-homepage.webp
Binary file removed website/static/img/demo-search-and-aggregation.gif
Binary file removed website/static/img/documentation.png
Binary file removed website/static/img/homepage.png
Binary file removed website/static/img/overture-platform-overview.png
Binary file removed website/static/img/workshop-portal-preview.png
Binary file removed website/static/img/workshop-portal-preview.webp
8 changes: 4 additions & 4 deletions website/workshop/00-Intro.md
Original file line number Diff line number Diff line change
@@ -5,15 +5,15 @@ sidebar_position: 0
description: Build a searchable, FAIR-compliant data discovery portal from tabular CSV data using Elasticsearch, Arranger, and Stage.
---

:::caution Please complete the prerequisites below before arriving
Most importantly downloading the Docker images; the conference venue's Wi-Fi may be slow and unreliable. Thank you and looking forward to meeting you - **Mitchell Shiell, Ontario Institute for Cancer Research, [mshiell@oicr.on.ca](mailto:mshiell@oicr.on.ca)**
:::caution Please complete the prerequisites before arriving
Most importantly, download the Docker images in advance; the conference venue's Wi-Fi may be slow and unreliable. Looking forward to meeting you! **Mitchell Shiell, Ontario Institute for Cancer Research, [mshiell@oicr.on.ca](mailto:mshiell@oicr.on.ca)**
:::

# IBC Workshop Prerequisites

This workshop was developed as part of the 19th Annual International Biocuration Conference; it will guide you through building a foundational data discovery portal for tabular CSV data using Elasticsearch, Arranger, and Stage.

![Demo search and aggregation](/img/workshop-portal-preview.webp)
![Demo search and aggregation](./images/workshop-portal-preview.webp)

:::info 👋 Say hello
If you're attending, feel free to [**drop a quick introduction**](https://github.com/overture-stack/docs/discussions/new?category=new-deployments&title=%5BIBC+Workshop%5D+Hello+from+%5BName%2C+Institution%5D&body=%2A%2AName+%26+affiliation%3A%2A%2A+%0A%0A%2A%2AType+of+data+I+work+with%3A%2A%2A+%0A%0A%2A%2AWhat+I%27m+hoping+to+get+out+of+the+session%3A%2A%2A+%0A%0A%2A%2AData+management+challenges+%28optional%29%3A%2A%2A+) before the day; this helps tailor the session to the room. Entirely optional.
@@ -103,7 +103,7 @@ git clone -b IBCworkshop https://github.com/overture-stack/prelude.git
These are not required but will make the workshop easier to follow:

<details>
<summary><strong>6. (Optional) Elasticvue:</strong>browser-based Elasticsearch GUI</summary>
<summary><strong>6. (Optional) Elasticvue:</strong> browser-based Elasticsearch GUI</summary>

[Elasticvue](https://elasticvue.com/installation) is a browser-based Elasticsearch GUI useful for inspecting indices, browsing documents, and troubleshooting. It is not required but helpful for understanding what's happening inside Elasticsearch during the workshop.

27 changes: 16 additions & 11 deletions website/workshop/01-Running-the-Demo.md
@@ -3,21 +3,23 @@ id: running-the-demo
title: Running the Demo
sidebar_position: 1
description: Deploy the pre-configured demo portal to see the finished result before building from scratch.
draft: true
---

import demoVideo from './images/demo-search-and-aggregation.webm';

# Running the Demo

Before building anything from scratch, let's deploy the pre-configured demo portal and see what the end result looks like. This gives you a mental model of what each component does before we dive into configuration details.

<video autoPlay loop muted playsInline style={{width: '100%', height: 'auto'}}>
<source src="/img/demo-search-and-aggregation.webm" type="video/webm" />

<source src={demoVideo} type="video/webm" />
</video>

If you have not done so yet clone the following repository.
If you have not done so yet, clone the following repository.

```
git clone https://github.com/overture-stack/prelude.git
git clone -b IBCworkshop https://github.com/overture-stack/prelude.git
cd prelude
```

@@ -30,12 +32,13 @@
<details>
<summary><strong>Running on Windows?</strong></summary>

| Platform | Command |
|---|---|
| Platform | Command |
| ------------------ | ----------------------------------- |
| WSL2 (recommended) | `make demo` (in an Ubuntu terminal) |
| Native PowerShell | `.\run.ps1 demo` |
| Native PowerShell | `.\run.ps1 demo` |

**One-time setup for native PowerShell:** allow local scripts to run by executing this once:

```powershell
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
```
@@ -70,13 +73,13 @@ Once the portal loads, take a few minutes to explore:

The landing page provides an overview and navigation to available data tables. Note the navigation bar, branding, and layout, all of which are configurable.

![Portal home page](/img/homepage.webp)
![Portal home page](./images/homepage.webp)

#### Data Exploration Page

Navigate to the data exploration page from the top navigation. This is where Arranger's components are at work:

![exploration page](/img/basicPortal.webp)
![exploration page](./images/basicPortal.webp)

- **Facet Panel (left sidebar):** Filter data by clicking on field values. Each facet corresponds to a field in the Elasticsearch index. The fields shown, their order, and their display names are all controlled by Arranger configuration files.

@@ -90,7 +93,7 @@ Navigate to the data exploration page from the top navigation. This is where Arr

The portal includes built-in documentation pages rendered from markdown files in the `docs/` directory. The content you are reading right now may be served through this same mechanism.

![documentation page](/img/documentation.webp)
![documentation page](./images/documentation.webp)

### What's Running

@@ -176,16 +179,18 @@ Before moving on, confirm:

### Stopping the Demo

We'll keep the demo running as a reference while we walk through the architecture. When your ready to remove it run:
We'll keep the demo running as a reference while we walk through the architecture. When you're ready to remove it, run:

```bash
make reset
```

:::tip Windows (PowerShell)

```powershell
.\run.ps1 reset
```

:::

**Next:** Now that you've seen the working portal, let's understand how the pieces fit together.
7 changes: 3 additions & 4 deletions website/workshop/02-Architecture.md
@@ -3,14 +3,13 @@ id: architecture
title: Architecture
sidebar_position: 2
description: How data flows from a CSV file through PostgreSQL and Elasticsearch to the browser-based search portal.
draft: true
---

# Architecture

Now that you've seen the running portal, let's walk through how data flows from a CSV file to the search interface.

![Architecture Diagram](/img/workshop-architecture-diagram.webp "Architecture Diagram")
![Architecture Diagram](./images/workshop-architecture-diagram.webp "Architecture Diagram")

| Component | Type | Description |
| ---------------------------------------------------------------------------------------------------------- | --------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------- |
@@ -88,7 +87,7 @@ Services start in dependency order: PostgreSQL and Elasticsearch must be healthy
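In Compose terms, that ordering is typically expressed with healthchecks and `depends_on` conditions. A minimal sketch of the pattern — service names and the healthcheck command here are illustrative, not copied from the workshop's `docker-compose.yml`:

```yaml
services:
  elasticsearch:
    healthcheck:
      # Mark the service healthy once the cluster answers HTTP requests.
      test: ["CMD-SHELL", "curl -fs http://localhost:9200/_cluster/health || exit 1"]
      interval: 10s
      retries: 12
  stage:
    depends_on:
      elasticsearch:
        # Wait for the healthcheck to pass, not just for the container to start.
        condition: service_healthy
```

With `condition: service_healthy`, Compose delays starting the dependent service until the upstream healthcheck succeeds, which is what makes `make demo` come up in the right order.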

The components used in this workshop are part of the broader [Overture](https://overture.bio) open-source platform for research data management. The search and exploration stack we're using can be extended with additional services:

![Platform Integration](/img/overture-platform-overview.webp)
![Platform Integration](./images/overture-platform-overview.webp)

- **Lectern:** Data dictionary management (define and enforce data schemas)
- **Lyric:** Tabular data submission with validation
@@ -104,7 +103,7 @@ Structuring data through a search API like Arranger makes it **machine-accessibl

The platform connects to Arranger via the [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) and is designed around four core principles: data minimisation by default, no action without explicit researcher consent, sandboxed code execution, and fully reproducible sessions. Because research data is often sensitive, the platform runs on sovereign infrastructure rather than routing queries through commercial AI providers.

![CDD Conceptual Mock](/img/conversational-data-discovery-mockup.webp)
![CDD Conceptual Mock](./images/conversational-data-discovery-mockup.webp)

:::info
**The interface shown above is a conceptual mock-up**. CDD is under active development and is not covered in this workshop.
5 changes: 2 additions & 3 deletions website/workshop/03-Data-Preparation.md
@@ -3,7 +3,6 @@ id: data-preparation
title: Data Preparation
sidebar_position: 3
description: How to structure and format your CSV data to meet the requirements for loading into the portal.
draft: true
---

# Data Preparation
@@ -33,7 +32,7 @@ Your CSV column headers become field names in PostgreSQL, Elasticsearch, and Gra
| **Format** | CSV (comma-separated); other delimiters supported via `--delimiter` but for simplicity we recommend using comma-separated files. |
| **Header row** | Required as the first line |
| **Prohibited characters** | `: > < . [space] , / \ ? # [ ] { } " * \| + @ & ( ) ! ^` |
| **Max length** | A maximum 63 characters per header name, PostgreSQL silently truncates longer identifiers, which can cause mismatches between your schema and index |
| **Max length** | A maximum of 63 characters per header name, PostgreSQL silently truncates longer identifiers, which can cause mismatches between your schema and index |
| **Reserved words** | These are internal field names used by Elasticsearch and GraphQL. Using them will conflict with system internals and cause indexing or query errors: `_type` `_id` `_source` `_all` `_parent` `_field_names` `_routing` `_index` `_size` `_timestamp` `_ttl` `_meta` `_doc` `__typename` `__schema` `__type` |
| **Best practices** | Use `snake_case` or `camelCase`, lowercase, descriptive but concise, no special characters or spaces |
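The rules above are mechanical, so they are easy to check before loading anything. A local sanity-check sketch — the function name, and the rule lists transcribed from the table above, are this author's own, not part of the workshop tooling:

```python
import csv
import io

# Transcribed from the header-requirements table above.
PROHIBITED = set(':><.,/\\?#[]{}"*|+@&()!^ ')
RESERVED = {
    "_type", "_id", "_source", "_all", "_parent", "_field_names",
    "_routing", "_index", "_size", "_timestamp", "_ttl", "_meta",
    "_doc", "__typename", "__schema", "__type",
}

def check_headers(csv_text: str):
    """Return (header, problem) pairs; an empty list means the headers pass."""
    headers = next(csv.reader(io.StringIO(csv_text)))
    problems = []
    for h in headers:
        if len(h) > 63:
            problems.append((h, "over 63 characters; PostgreSQL will silently truncate"))
        if h in RESERVED:
            problems.append((h, "reserved word"))
        bad = sorted(set(h) & PROHIBITED)
        if bad:
            problems.append((h, "prohibited characters: " + "".join(bad)))
    return problems
```

For example, `check_headers("donor id,_id\nx,y")` flags `donor id` (contains a space) and `_id` (reserved by Elasticsearch).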

@@ -103,7 +102,7 @@ If you're working with data that has any access restrictions, use anonymized or

#### Recommended Data Size

There are no strict size limits beyond Docker and Elasticsearch resource constraints. In fact we've scaled this resource to hundreds millions of records. However, for development and testing, a representative sample of approximately **500 records** works well. You can start small and load larger datasets once your configuration is working.
There are no strict size limits beyond Docker and Elasticsearch resource constraints. In fact, we've scaled this resource to hundreds of millions of records. However, for development and testing, a representative sample of approximately **500 records** works well. You can start small and load larger datasets once your configuration is working.
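One quick way to cut such a sample from a larger file — the filenames here are hypothetical placeholders:

```bash
# First line is the header; the next 500 lines are data rows.
head -n 501 full_dataset.csv > sample_500.csv
```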

### Checkpoint

11 changes: 5 additions & 6 deletions website/workshop/04-Generating-Configurations.md
@@ -3,7 +3,6 @@ id: generating-configurations
title: Generating Configurations
sidebar_position: 4
description: Auto-generate PostgreSQL schemas, Elasticsearch mappings, and Arranger configuration files from your CSV data.
draft: true
---

# Generating Configurations
@@ -12,13 +11,13 @@ Instead of writing PostgreSQL schemas, Elasticsearch mappings, and Arranger conf

Navigate to **Config Generator** in the Stage portal navigation bar (visible once Stage is running).

<!-- IMAGE: screenshot of the Config Generator page in the Stage nav -->
![Config Generator page in the Stage portal navigation](./images/config-generator-page.webp)

### Step 1: Provide CSV Data

Upload a `.csv` file using the **Upload .csv file** button, or paste CSV content directly into the text area. Once loaded, a preview of the first five rows is shown so you can confirm the correct file was used.

<!-- IMAGE: screenshot of the CSV upload area and preview table -->
![CSV upload area and preview table](./images/csv-upload-area.webp)

### Step 2: Configure Options

@@ -27,7 +26,7 @@
| **Index name** | The name of the Elasticsearch index. Auto-populated from the CSV filename; edit if needed (e.g. `datatable1`). |
| **Table name** | The name of the PostgreSQL table. Defaults to the same value as the index name. |

<!-- IMAGE: screenshot of the Configure Options fields -->
<!-- IMAGE: screenshot of the Configure Options fields (no image provided) -->

### Step 3: Generate and Copy

@@ -45,7 +44,7 @@ Click **Generate Configs**. Once complete, the output panel shows a tabbed view

Use the **Copy** button on each tab to copy the content, then paste it into the corresponding file in your project.

<!-- IMAGE: screenshot of the generated output panel with tabs and Copy button -->
![Generated output panel with tabs and Copy button](./images/generated-output-panel.webp)

### Reviewing the Output

@@ -264,7 +263,7 @@ Display names are auto-generated by converting `snake_case` to Title Case. Revie
</details>

:::tip
For a full reference on `extended.json`, see the [Arranger extended configuration docs](https://docs.overture.bio/docs/core-software/Arranger/usage/arranger-components).
For a full reference on `extended.json`, see the [Arranger extended configuration docs](https://docs.overture.bio/docs/core-software/arranger/usage/arranger-components).
:::

#### arranger/table.json
5 changes: 2 additions & 3 deletions website/workshop/05-Docker-Configuration.md
@@ -3,7 +3,6 @@ id: docker-configuration
title: Docker Configuration
sidebar_position: 5
description: Walk through the docker-compose.yml service definitions and environment variables to wire configuration files into each container.
draft: true
---

# Docker Configuration
@@ -190,7 +189,7 @@ make platform
:::

:::info
For future configuration changes (once your own data is loaded), `make restart` is sufficient, it reloads configs without wiping data. If you wish to wipe the data as-well run `make reset`
For future configuration changes (once your own data is loaded), `make restart` is sufficient; it reloads configs without wiping data. If you wish to wipe the data as well, run `make reset`.
:::

#### Troubleshooting
@@ -214,7 +213,7 @@ curl -u elastic:myelasticpassword http://localhost:9200/_cluster/health?pretty
make reset
```

:::tip Windows (PowerShell) full reset
:::tip Windows (PowerShell) - full reset

```powershell
.\run.ps1 reset
5 changes: 2 additions & 3 deletions website/workshop/06-Loading-Data.md
@@ -3,14 +3,13 @@ id: loading-data
title: Loading Data
sidebar_position: 6
description: Use the Conductor CLI to load CSV data into PostgreSQL and index it into Elasticsearch for search.
draft: true
---

# Loading Data

With the infrastructure configured, it's time to load data into the portal. Conductor is a CLI tool that reads CSV files, loads each row into PostgreSQL (persistent storage), then indexes them into Elasticsearch as structured documents for search.

Conductor runs as a Docker container no Node.js installation required. A wrapper script at the root of the repository handles the Docker details for you.
Conductor runs as a Docker container; no Node.js installation is required. A wrapper script at the root of the repository handles the Docker details for you.

:::info
Run all `./conductor` commands from the **root of the `prelude` repository** (i.e. where `docker-compose.yml` lives).
@@ -31,7 +30,7 @@ Add that line to your `~/.zshrc` (Zsh) or `~/.bashrc` (Bash) and reload:
source ~/.zshrc # or source ~/.bashrc
```

You can then run `conductor upload ...` from any directory. The script resolves the `data/` folder relative to its own location in the repo, so paths like `./data/datatable1.csv` still refer to the repo's `data/` directory run the command from the repo root or use an absolute path to a file outside it.
You can then run `conductor upload ...` from any directory. The script resolves the `data/` folder relative to its own location in the repo, so paths like `./data/datatable1.csv` still refer to the repo's `data/` directory; run the command from the repo root or use an absolute path to a file outside it.

</details>

3 changes: 1 addition & 2 deletions website/workshop/07-Troubleshooting.md
@@ -3,7 +3,6 @@ id: troubleshooting
title: Troubleshooting
sidebar_position: 7
description: A layered approach to diagnosing issues in the portal stack, from Docker and databases through to the browser.
draft: true
---

# Troubleshooting
@@ -115,7 +114,7 @@ If Arranger responds to the GraphQL query above but the data table or facet pane
| `extended.json` | Dot notation | `data.field_name` |
| `base.json` | Alias name (`esIndex`) | `datatable1_centric` |

Another common `table.json` issue is the `query` field, which must use the correct GraphQL traversal path (`hits`, `edges`, `nodes`) to reach the field value. An incorrect path here will cause columns to render empty even when data is present. See the [Arranger table configuration docs](https://docs.overture.bio/docs/core-software/Arranger/usage/arranger-components#table-configuration-tablejson) for the expected structure.
Another common `table.json` issue is the `query` field, which must use the correct GraphQL traversal path (`hits`, `edges`, `nodes`) to reach the field value. An incorrect path here will cause columns to render empty even when data is present. See the [Arranger table configuration docs](https://docs.overture.bio/docs/core-software/arranger/usage/arranger-components#table-configuration-tablejson) for the expected structure.
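As a sketch of the shape involved, a single column entry for a nested field might look like this — the field names are illustrative and the exact schema should be taken from the linked Arranger docs, not from this fragment:

```json
{
  "table": {
    "columns": [
      {
        "show": true,
        "fieldName": "data.sample_type",
        "jsonPath": "$.data.hits.edges[*].node.sample_type",
        "query": "data { hits { edges { node { sample_type } } } }"
      }
    ]
  }
}
```

If the `query` string omits one of the traversal levels (`hits`, `edges`, `node`), the GraphQL response will not contain the value at the path the table expects, and the column renders empty.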

### Step 5: Check Stage and the Browser
