ChunkLab

A developer sandbox for experimenting with text chunking.

🚀 Pitch

ChunkLab is a browser-based sandbox for developers to test, visualize, and validate text chunking pipeline configurations. Use it to tune your RAG (Retrieval-Augmented Generation) ingestion process with real-time feedback and detailed metrics.

⚡ Quickstart

Prerequisites

Python 3.12+
Node.js 18+

1. Clone & Setup Backend

```bash
git clone https://github.com/ziffan/ChunkLab.git
cd ChunkLab/backend

python -m venv .venv

# Activate the virtual environment (run the line for your OS)
# Windows:
.venv\Scripts\activate
# Linux/macOS:
source .venv/bin/activate

pip install -r requirements.txt
cp .env.example .env
python main.py
```

2. Setup Frontend

```bash
cd ../frontend
npm install
npm run dev
```

3. Access

Open http://localhost:5173 in your browser.

🏗️ Architecture

Backend: FastAPI (Python 3.12) - Handles regex processing, tokenization (Tiktoken), and chunking logic.
Frontend: React + TailwindCSS + Vite - Interactive UI for configuration and visualization.
Processing Engine: Custom chunking logic with support for various strategies (Recursive, Semantic, Fixed-size).
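
To illustrate the fixed-size strategy mentioned above, here is a minimal sketch of fixed-size chunking with overlap. It is not ChunkLab's actual implementation: the function name `chunk_fixed` and its parameters are hypothetical, and whitespace-separated words stand in for real tokenizer output (the backend uses Tiktoken).

```python
def chunk_fixed(tokens, size, overlap=0):
    """Split a token list into windows of `size` tokens sharing `overlap` tokens.

    Hypothetical helper for illustration only; not part of ChunkLab's API.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        # Stop once a window reaches the end, so no redundant tail chunk is emitted.
        if start + size >= len(tokens):
            break
    return chunks


# Assumption: whitespace splitting stands in for a real tokenizer.
tokens = "one two three four five six seven".split()
for chunk in chunk_fixed(tokens, size=3, overlap=1):
    print(chunk)
```

With `size=3` and `overlap=1`, each chunk repeats the last token of the previous one, which is the usual way to keep some context across chunk boundaries.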

✨ Feature List

Live Visualization: Real-time preview of how text is split into chunks.
Multiple Tokenizers: Support for GPT-4, Llama, and custom token counters.
Regex Playground: Test and debug custom split patterns.
Metadata Extraction: Automatically extract titles, headers, and keywords from chunks.
Provider Mocks: Integrated mocks for major LLM providers (OpenAI, Gemini, Anthropic).
Exportable Configs: Export your validated pipeline to JSON/YAML for production use.
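
As a rough illustration of the regex and export features listed above, the sketch below splits text on a custom pattern and serializes the configuration to JSON. The config keys (`strategy`, `pattern`, `min_chars`) and the helper `split_by_pattern` are assumptions made for this example; they do not reflect ChunkLab's real export schema.

```python
import json
import re

# Hypothetical pipeline config; key names are illustrative, not ChunkLab's schema.
config = {
    "strategy": "regex",
    "pattern": r"\n{2,}",  # split wherever there are one or more blank lines
    "min_chars": 1,        # drop pieces shorter than this after trimming
}


def split_by_pattern(text, pattern, min_chars=1):
    """Split `text` on a regex and discard pieces that are too short."""
    pieces = (piece.strip() for piece in re.split(pattern, text))
    return [piece for piece in pieces if len(piece) >= min_chars]


sample = "Title\n\nFirst paragraph.\n\n\nSecond paragraph."
chunks = split_by_pattern(sample, config["pattern"], config["min_chars"])

# Round-trip the config through JSON, as an exported file would be.
exported = json.dumps(config, indent=2)
restored = json.loads(exported)

print(chunks)
print(exported)
```

Keeping the split pattern inside the config means the same JSON file can be loaded elsewhere and reproduce the chunks seen in the sandbox.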

📚 Documentation

Getting Started
API Reference (Coming Soon)
Deployment Guide (Coming Soon)

📄 License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.
