ChunkLab is a browser-based sandbox for developers to test, visualize, and validate text chunking pipeline configurations. Use it to tune your RAG (Retrieval-Augmented Generation) ingestion process with real-time feedback and detailed metrics.
### Prerequisites
- Python 3.12+
- Node.js 18+

### Installation

#### 1. Setup Backend
```bash
git clone https://github.com/ziffan/ChunkLab.git
cd ChunkLab/backend
python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS/Linux
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
python main.py
```
#### 2. Setup Frontend
```bash
cd ../frontend
npm install
npm run dev
```
Open http://localhost:5173 in your browser.
### Tech Stack
- Backend: FastAPI (Python 3.12), which handles regex processing, tokenization (Tiktoken), and chunking logic.
- Frontend: React + TailwindCSS + Vite, providing an interactive UI for configuration and visualization.
- Processing Engine: custom chunking logic that supports several strategies (Recursive, Semantic, Fixed-size).
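To make the Fixed-size strategy concrete, here is a minimal sketch of token-window chunking with overlap. It is not ChunkLab's implementation: it assumes a plain whitespace split as the token counter (the backend uses Tiktoken for real token counts), and the function name `fixed_size_chunks` and its `chunk_size`/`overlap` parameters are hypothetical.

```python
def fixed_size_chunks(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into windows of at most `chunk_size` tokens.

    Consecutive windows share `overlap` tokens. Tokens here are
    whitespace-separated words, a stand-in (assumption) for a real
    tokenizer such as Tiktoken.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    tokens = text.split()
    step = chunk_size - overlap  # how far each new window advances
    chunks: list[str] = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append(" ".join(window))
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = "one two three four five six seven eight nine ten"
    for i, chunk in enumerate(fixed_size_chunks(sample, chunk_size=4, overlap=1)):
        print(i, chunk)
```

Overlap repeats the boundary tokens in neighbouring chunks, so a sentence cut at a window edge still appears whole in at least one chunk when the overlap is large enough; the trade-off is more stored tokens per document.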
### Features
- Live Visualization: Real-time preview of how text is split into chunks.
- Multiple Tokenizers: Support for GPT-4, Llama, and custom token counters.
- Regex Playground: Test and debug custom split patterns.
- Metadata Extraction: Automatically extract titles, headers, and keywords from chunks.
- Provider Mocks: Integrated mocks for major LLM providers (OpenAI, Gemini, Anthropic).
- Exportable Configs: Export your validated pipeline to JSON/YAML for production use.
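The Recursive strategy and the regex split patterns listed above can be illustrated with a small sketch: try the coarsest regex separator first, re-split any piece that is still too long with the next, finer separator, and merge small neighbours back together. Everything here is an assumption for illustration: the `ChunkConfig` dataclass, its field names, the default separator list, and `recursive_split` are hypothetical and do not describe ChunkLab's actual export schema. Only JSON export is shown, because YAML would need a third-party library.

```python
import json
import re
from dataclasses import asdict, dataclass, field


@dataclass
class ChunkConfig:
    """Hypothetical pipeline config; field names are illustrative only."""
    strategy: str = "recursive"
    max_chars: int = 200
    # Regex separators tried in order: paragraphs, lines, sentence ends, spaces.
    separators: list[str] = field(
        default_factory=lambda: [r"\n\n", r"\n", r"(?<=[.!?])\s+", r"\s+"]
    )


def recursive_split(text: str, max_chars: int, separators: list[str]) -> list[str]:
    """Split text so every chunk is at most `max_chars` characters.

    Pieces that are still too long are split again with the next separator.
    Adjacent small pieces are merged (joined by a single space) while they fit,
    so original separators such as newlines are not preserved.
    """
    text = text.strip()
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators:
        # No separator left: fall back to a hard cut every max_chars characters.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    pattern, finer = separators[0], separators[1:]
    pieces = [p.strip() for p in re.split(pattern, text) if p.strip()]

    chunks: list[str] = []
    current = ""
    for piece in pieces:
        if len(piece) > max_chars:
            # Flush the pending chunk, then recurse with finer separators.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, max_chars, finer))
        elif not current:
            current = piece
        elif len(current) + 1 + len(piece) <= max_chars:
            current = f"{current} {piece}"
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    cfg = ChunkConfig(max_chars=20)
    text = "First sentence here. Second one.\n\nA new paragraph that is long."
    for chunk in recursive_split(text, cfg.max_chars, cfg.separators):
        print(repr(chunk))
    # Serialize the config, e.g. to hand a validated setup to a production job.
    print(json.dumps(asdict(cfg), indent=2))
```

The separator order encodes a preference for keeping paragraphs, then lines, then sentences intact; the character-level hard cut is only a last resort for text with no usable boundaries.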
### Documentation
- Getting Started
- API Reference (Coming Soon)
- Deployment Guide (Coming Soon)
### License
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
Copyright © 2026 Ziffan (Ziffany Firdinal).

