ChunkLab

A developer sandbox for text chunking experimentation.


🚀 Pitch

ChunkLab is a powerful browser-based sandbox designed for developers to test, visualize, and validate text chunking pipeline configurations. Optimize your RAG (Retrieval-Augmented Generation) ingestion process with real-time feedback and detailed metrics.


📸 Screenshot

ChunkLab Dashboard

⚡ Quickstart

Prerequisites

  • Python 3.12+
  • Node.js 18+

1. Clone & Setup Backend

git clone https://github.com/ziffan/ChunkLab.git
cd ChunkLab/backend

python -m venv .venv

Windows:

.venv\Scripts\activate

Linux/macOS:

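source .venv/bin/activate

Then, on either platform, install the dependencies, create the environment file, and start the backend (in Windows Command Prompt, use copy instead of cp):

pip install -r requirements.txt
cp .env.example .env
python main.py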


2. Setup Frontend

```bash
cd ../frontend
npm install
npm run dev
```

3. Access

Open http://localhost:5173 in your browser.


🏗️ Architecture

  • Backend: FastAPI (Python 3.12) - Handles regex processing, tokenization (Tiktoken), and chunking logic.
  • Frontend: React + TailwindCSS + Vite - Interactive UI for configuration and visualization.
  • Processing Engine: Custom chunking logic with support for various strategies (Recursive, Semantic, Fixed-size).
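
To make the fixed-size strategy listed above more concrete, here is a minimal sketch. It is not ChunkLab's actual implementation: it assumes whitespace-separated words stand in for tokens (the backend uses Tiktoken), and the function name `fixed_size_chunks` and its `chunk_size` and `overlap` parameters are hypothetical.

```python
def fixed_size_chunks(text: str, chunk_size: int = 5, overlap: int = 1) -> list[str]:
    """Split text into windows of at most `chunk_size` words that share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()  # assumption: whitespace words approximate tokens
    step = chunk_size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


sample = "one two three four five six seven eight nine"
chunks = fixed_size_chunks(sample, chunk_size=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

The overlap repeats the last word of each window at the start of the next one, which is a common way to keep some context across chunk boundaries. A real pipeline would count tokens with the configured tokenizer rather than words.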

✨ Feature List

  • Live Visualization: Real-time preview of how text is split into chunks.
  • Multiple Tokenizers: Support for GPT-4, Llama, and custom token counters.
  • Regex Playground: Test and debug custom split patterns.
  • Metadata Extraction: Automatically extract titles, headers, and keywords from chunks.
  • Provider Mocks: Integrated mocks for major LLM providers (OpenAI, Gemini, Anthropic).
  • Exportable Configs: Export your validated pipeline to JSON/YAML for production use.
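
As a rough illustration of how a custom split pattern, header extraction, and config export could fit together, the sketch below uses only the Python standard library. The config keys, the `split_with_pattern` and `extract_header` helpers, and the sample document are all hypothetical and do not reflect ChunkLab's actual schema or API.

```python
import json
import re
from typing import Optional

# Hypothetical pipeline configuration; the keys are illustrative only.
config = {
    "strategy": "regex",
    "split_pattern": r"\n(?=#{1,6} )",  # split at a newline that precedes a Markdown header
}


def split_with_pattern(text: str, pattern: str) -> list[str]:
    """Split text on a regex and drop empty pieces."""
    return [piece.strip() for piece in re.split(pattern, text) if piece.strip()]


def extract_header(chunk: str) -> Optional[str]:
    """Return the text of the Markdown header that opens the chunk, if any."""
    match = re.match(r"#{1,6} +(.+)", chunk)
    return match.group(1).strip() if match else None


document = "# Intro\nChunking basics.\n## Details\nOverlap and size."
chunks = split_with_pattern(document, config["split_pattern"])
headers = [extract_header(chunk) for chunk in chunks]

# Serialize the validated configuration so it can be reused elsewhere.
exported = json.dumps(config, indent=2)
print(headers)
print(exported)
```

The lookahead in the pattern keeps each header attached to the chunk it introduces instead of consuming it during the split.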

📚 Documentation


📄 License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

Copyright © 2026 Ziffan (Ziffany Firdinal).
