
Feat/langchain vectorstore#73

Open
alessandrostone wants to merge 15 commits into `main` from `feat/langchain-vectorstore`

Conversation

@alessandrostone
Contributor

InputLayerVectorStore — LangChain VectorStore interface

Summary

Adds InputLayerVectorStore, a LangChain VectorStore implementation backed by InputLayer. This makes InputLayer a drop-in replacement for Chroma, Pinecone, Weaviate, FAISS, etc. in any existing LangChain RAG tutorial or chain — change the import, keep the code.

Why this matters

LangChain's VectorStore is the most common abstraction in the LangChain ecosystem. Hundreds of tutorials, courses, and example projects assume you have a VectorStore. Until now, those users flowed past InputLayer because we only offered a custom Retriever. With this PR, every VectorStore tutorial works with InputLayer by changing one import.

What's new

| Component | File | What it does |
| --- | --- | --- |
| `InputLayerVectorStore` | `integrations/langchain/vectorstore.py` | Full VectorStore implementation with sync + async paths |

Implemented methods

| Method | Sync | Async |
| --- | --- | --- |
| `from_texts` (classmethod, required) | yes | `afrom_texts` |
| `add_texts` (with UUIDs, metadata, explicit ids) | yes | `aadd_texts` |
| `add_documents` | yes | `aadd_documents` |
| `similarity_search` (required) | yes | `asimilarity_search` |
| `similarity_search_by_vector` | yes | `asimilarity_search_by_vector` |
| `similarity_search_with_score` | yes | `asimilarity_search_with_score` |
| `get_by_ids` | yes | `aget_by_ids` |
| `delete` (by ids) | yes | `adelete` |

`as_retriever()` is inherited from the base class and works automatically.

All sync methods go through the run_sync bridge, so they work safely in Jupyter, FastAPI, LangGraph, and any running event loop.

Usage

Install the LangChain extra:

```shell
pip install "inputlayer-client-dev[langchain]"
```

```python
from langchain_openai import OpenAIEmbeddings
from inputlayer.integrations.langchain import InputLayerVectorStore

embeddings = OpenAIEmbeddings()

# Bulk-load — same as Chroma/Pinecone/etc.
store = await InputLayerVectorStore.afrom_texts(
    texts=["Python is a language", "Rust is fast"],
    embedding=embeddings,
    metadatas=[{"category": "lang"}, {"category": "lang"}],
    kg=kg,
    collection_name="my_docs",
)

# Search
docs = await store.asimilarity_search("programming", k=3)

# As a retriever in an LCEL chain — drop-in compatible
retriever = store.as_retriever(search_kwargs={"k": 5})
chain = retriever | prompt | llm | StrOutputParser()
```

How it stores data

A single relation per instance:

```
+<collection>(id: string, content: string, metadata: string, embedding: vector)
```

  • id — UUID by default; user-provided ids supported
  • content — the document text
  • metadata — JSON-encoded to allow arbitrary structure
  • embedding — the dense vector from the embeddings model
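The mapping from a document to one row of that relation can be sketched as follows. The helper name `to_row` and its exact shape are illustrative, not the PR's actual code; only the column layout comes from the schema above.

```python
import json
import uuid

def to_row(text, metadata, embedding, doc_id=None):
    """Illustrative encoding of one document into the
    (id, content, metadata, embedding) tuple described above."""
    return (
        doc_id or str(uuid.uuid4()),   # UUID unless the caller supplies an id
        text,                          # raw document text
        json.dumps(metadata or {}),    # arbitrary metadata as a JSON string
        embedding,                     # dense vector from the embeddings model
    )

row = to_row("Python is a language", {"category": "lang"}, [0.1, 0.2, 0.3])
```

Storing metadata as a JSON string keeps the schema to flat primitive columns while still round-tripping arbitrary nested metadata, which matches the "Metadata as JSON" design note below.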

Distance is computed via InputLayer's cosine/euclidean/dot/manhattan functions in a Datalog query.
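For reference, cosine distance (the default metric in most VectorStore integrations) is the quantity `1 - cos(a, b)`. A plain-Python equivalent of what such a distance function computes, written here only to pin down the math rather than to mirror InputLayer's implementation:

```python
import math

def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

cosine_distance([1.0, 0.0], [1.0, 0.0])  # 0.0 — identical direction
cosine_distance([1.0, 0.0], [0.0, 1.0])  # 1.0 — orthogonal
```

Smaller distances mean more similar documents, which is why `similarity_search_with_score` sorts ascending on this value.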

Tests

27 unit tests (tests/test_vectorstore.py) using a mock KG with realistic Datalog parsing:

  • Setup (creation, idempotency, custom collection name)
  • Add (texts, with metadata, explicit ids, documents)
  • Search (basic, k limit, with score, by vector, metadata roundtrip)
  • Get by ids (existing + missing)
  • Delete (by ids + none)
  • from_texts / afrom_texts (with metadatas, requires kg)
  • Sync bridge (all major methods via run_sync)
  • as_retriever (correct type, invokes search)

Example

examples/langchain/ex18_vectorstore.py — end-to-end demo:

  1. from_texts — bulk-load 6 documents with metadata
  2. similarity_search — find most similar docs to a query
  3. similarity_search_with_score — show distances
  4. as_retriever — wrap as a LangChain retriever
  5. Full LCEL chain: retriever → prompt → llm (when LM Studio is available)

Falls back to deterministic fake embeddings when no LLM server is running, so the example always works for demos and CI.
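A deterministic fake embedding can be as simple as hashing the text into a fixed-size vector; the sketch below is one way to do it, not necessarily what the example uses.

```python
import hashlib

def fake_embedding(text, dim=8):
    """Hypothetical deterministic fake embedding: derive a fixed-size
    vector from a hash of the text, so the same text always maps to the
    same vector and no LLM server is needed."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Scale each byte into [0, 1] so the values look like embedding weights.
    return [digest[i % len(digest)] / 255.0 for i in range(dim)]

vec = fake_embedding("Rust is fast")
```

Determinism is the point: identical input text always yields an identical vector, so similarity rankings are stable across demo runs and CI.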

Files changed

  • src/inputlayer/integrations/langchain/vectorstore.py — new (~330 lines)
  • src/inputlayer/integrations/langchain/__init__.py — export InputLayerVectorStore
  • tests/test_vectorstore.py — new (27 tests)
  • examples/langchain/ex18_vectorstore.py — new
  • examples/langchain/runner.py — register example 18

Test plan

  • `uv run pytest tests/test_vectorstore.py -v` — 27 unit tests
  • `uv run python -m examples.langchain.ex18_vectorstore` — runs against a server, demonstrates full lifecycle
  • `uv run python -m examples.langchain.runner 18` — runs via the runner

Design notes

  • Single-collection model: each InputLayerVectorStore instance maps to one relation. To use multiple collections, instantiate the store multiple times with different collection_name values.
  • Metadata as JSON: arbitrary metadata is JSON-encoded into a string column. This is the simplest cross-version-compatible approach until InputLayer supports nested types in schemas.
  • Search is currently a full scan + Python sort: a future optimization could use kg.vector_search() with HNSW indexes for large collections (~100K+ documents). For typical RAG corpora (1K-10K documents) the scan is fast enough.
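The full-scan strategy from the last note can be sketched in a few lines: compute a distance to every stored embedding, then keep the k nearest. The helper names and row shape are illustrative; the real implementation runs the distance function inside a Datalog query.

```python
import heapq
import math

def top_k_scan(query_vec, rows, k=3):
    """Sketch of a full-scan nearest-neighbor search: score every row,
    keep the k smallest distances. Adequate for ~1K-10K documents;
    an HNSW index would replace this at ~100K+ scale."""
    def dist(v):
        dot = sum(x * y for x, y in zip(query_vec, v))
        norm = (math.sqrt(sum(x * x for x in query_vec))
                * math.sqrt(sum(x * x for x in v)))
        return 1.0 - dot / norm
    # heapq.nsmallest avoids sorting the entire collection.
    return heapq.nsmallest(k, rows, key=lambda row: dist(row[1]))

rows = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [0.9, 0.1])]
top_k_scan([1.0, 0.0], rows, k=2)  # rows "a" and "c", nearest first
```

The scan is O(n) per query, which is why it stays fast for typical RAG corpora but motivates `kg.vector_search()` with HNSW indexes for large collections.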

… wrapper

  The old sync wrapper crashed inside running event loops (Jupyter, FastAPI,
  LangGraph). The new one uses a dedicated daemon thread with its own loop
  instead — the same pattern httpx uses. This was a prerequisite for the
  LangChain integration. InputLayerRetriever supports raw Datalog queries
  with an {input} placeholder as well as a vector search mode, and
  InputLayerTool exposes KG queries to LangChain agents. Both provide
  native async and sync paths via the run_sync bridge.