feat: reference contextual signal encoder with model descriptor by esteininger · Pull Request #11 · IABTechLab/agentic-audiences

esteininger · 2026-04-13T18:53:04Z

Summary

The scoring service consumes embeddings but the spec has no reference for producing them. This adds a contextual signal encoder — content in, ORTB-compatible embedding out — using the model metadata fields already defined in embedding_format.schema.json (version, embedding_space_id, metric).

No new schema — this is a reference implementation of what the spec already describes.

Public lane / private lane

Ships with sentence-transformers as the "public lane" — a standardized open-source model any party can use. The EmbeddingProvider interface supports private-lane providers. The spec metadata fields travel with every embedding regardless of which lane produced it.
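A minimal sketch of what a pluggable provider could look like. The PR text only says the interface has two methods, so the method names `encode` and `metadata`, the `ToyHashProvider` class, the `describe` helper, and the example space ID are all assumptions made for illustration. The toy provider is a deterministic stand-in for a private-lane backend, not a real embedding model.

```python
from __future__ import annotations

import hashlib
from typing import Protocol


class EmbeddingProvider(Protocol):
    """Hypothetical two-method shape; the PR's actual method names may differ."""

    def encode(self, text: str) -> list[float]: ...

    def metadata(self) -> dict[str, object]: ...


class ToyHashProvider:
    """Stand-in private-lane provider: deterministic, not semantically meaningful."""

    def __init__(self, dimension: int = 8) -> None:
        if not 1 <= dimension <= 32:
            raise ValueError("a SHA-256 digest only supplies 32 bytes")
        self.dimension = dimension

    def encode(self, text: str) -> list[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Map the first `dimension` bytes (0..255) into the range [-1.0, 1.0].
        return [(b / 127.5) - 1.0 for b in digest[: self.dimension]]

    def metadata(self) -> dict[str, object]:
        return {
            "model": "toy-hash",
            "dimension": self.dimension,
            "version": "0.0.1",
            # Made-up space ID, shown only to illustrate the field.
            "embedding_space_id": "aa://spaces/contextual/example/toy-hash",
            "metric": "cosine",
        }


def describe(provider: EmbeddingProvider, text: str) -> dict[str, object]:
    """Attach the provider's metadata to the vector it produced."""
    return {"vector": provider.encode(text), **provider.metadata()}
```

Because `describe` depends only on the protocol, a public-lane provider wrapping sentence-transformers and a private-lane provider would both pass through it unchanged, and the metadata would travel with the vector in either case.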

Output format

Uses fields from specs/v1.0/embedding_format.schema.json:

{
  "ext": {
    "ver": "1.0",
    "vector": [0.042, -0.118, "..."],
    "model": "all-MiniLM-L6-v2",
    "dimension": 384,
    "type": "context",
    "version": "2.0.0",
    "embedding_space_id": "aa://spaces/contextual/sentence-transformers/minilm-l6-v2",
    "metric": "cosine"
  }
}

Output plugs directly into POST /score as a user.data[] entry.
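As a rough illustration of that hand-off, the sketch below builds the envelope shown above and appends it to `user.data[]` in a request body. The helper names `build_embedding_entry` and `attach_to_score_request` are hypothetical, and the assumption that the request body has a top-level `user` object is inferred from the `user.data[]` path rather than confirmed by the scoring service's API.

```python
import copy
import json


def build_embedding_entry(vector, *, model, version, embedding_space_id,
                          metric="cosine", signal_type="context", ver="1.0"):
    """Wrap a raw vector in the ext envelope from the output format above."""
    if not vector:
        raise ValueError("vector must be non-empty")
    return {
        "ext": {
            "ver": ver,
            "vector": [float(x) for x in vector],
            "model": model,
            "dimension": len(vector),  # derived, so it cannot drift from the vector
            "type": signal_type,
            "version": version,
            "embedding_space_id": embedding_space_id,
            "metric": metric,
        }
    }


def attach_to_score_request(request, entry):
    """Return a copy of `request` with `entry` appended to user.data[]."""
    body = copy.deepcopy(request)
    body.setdefault("user", {}).setdefault("data", []).append(entry)
    return body


entry = build_embedding_entry(
    [0.042, -0.118, 0.3],  # shortened vector for illustration only
    model="all-MiniLM-L6-v2",
    version="2.0.0",
    embedding_space_id="aa://spaces/contextual/sentence-transformers/minilm-l6-v2",
)
payload = attach_to_score_request({}, entry)
print(json.dumps(payload, indent=2))
```

Deriving `dimension` from the vector length, rather than passing it separately, is one way to keep the two fields from disagreeing on the wire.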

Designed as a foundation for benchmarking and re-embedding

The pluggable provider interface and spec-defined metadata fields support two follow-on capabilities:

1. Benchmarking — run multiple providers against the same test corpus, compare results. The interface already supports swapping encoders; the spec metadata makes results reproducible.

2. Re-embedding / upgrade path — when a new model is adopted, embedding_space_id + version identify which vectors need migration. A migration tool re-encodes through a new provider, outputs to a new space, and both run in parallel. The scoring service partitions by model name, so old and new coexist.
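The upgrade path in item 2 can be sketched as a selection step followed by a partition step. This is an illustration under stated assumptions, not the migration tool itself: `needs_migration` and `partition_by_model` are hypothetical helper names, the stored records are assumed to be the `ext` objects from the output format, and the "v2" model and space ID are invented for the example.

```python
from collections import defaultdict


def needs_migration(ext, target_space_id, target_version):
    """True when a stored embedding is not already in the target space and version."""
    current = (ext.get("embedding_space_id"), ext.get("version"))
    return current != (target_space_id, target_version)


def partition_by_model(records):
    """Group stored embeddings by model name so old and new spaces can coexist."""
    groups = defaultdict(list)
    for ext in records:
        groups[ext["model"]].append(ext)
    return dict(groups)


OLD_SPACE = "aa://spaces/contextual/sentence-transformers/minilm-l6-v2"
NEW_SPACE = "aa://spaces/contextual/example/new-model"  # made-up target space

stored = [
    {"model": "all-MiniLM-L6-v2", "version": "2.0.0", "embedding_space_id": OLD_SPACE},
    {"model": "example-new-model", "version": "1.0.0", "embedding_space_id": NEW_SPACE},
]

# Only vectors outside the target space and version are queued for re-encoding.
to_migrate = [e for e in stored if needs_migration(e, NEW_SPACE, "1.0.0")]
by_model = partition_by_model(stored)
```

Comparing the `(embedding_space_id, version)` pair together means a version bump within the same space is also flagged, which matches the description of the two fields jointly identifying what needs migration.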

The provider interface is minimal (2 methods) so any embedding backend can be dropped in without changing the encoder, wire format, or scoring integration.
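To make the benchmarking idea concrete, here is a small sketch that runs several encoders over the same text pairs and reports a mean cosine similarity per embedding space. The two toy encoders, the `benchmark` function, and the example space IDs are assumptions for illustration; a real run would plug in actual providers and a labeled test corpus.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vowel_counts(text):
    """Toy encoder: counts of each vowel. Not a real embedding model."""
    return [float(text.lower().count(c)) for c in "aeiou"]


def length_features(text):
    """Toy encoder: character count and word count."""
    return [float(len(text)), float(len(text.split()))]


# Keyed by made-up embedding_space_id values so results stay attributable.
PROVIDERS = {
    "aa://spaces/contextual/example/vowels": vowel_counts,
    "aa://spaces/contextual/example/length": length_features,
}


def benchmark(pairs, providers):
    """Mean cosine similarity over the same text pairs, one score per provider."""
    results = {}
    for space_id, encode in providers.items():
        scores = [cosine(encode(a), encode(b)) for a, b in pairs]
        results[space_id] = sum(scores) / len(scores)
    return results


corpus = [("sports car review", "car review sports"), ("banana", "banana")]
report = benchmark(corpus, PROVIDERS)
```

Keying the report by space ID rather than by an ad hoc label is what lets two parties compare runs later: the same identifier that travels with each embedding also names the row in the results.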

Changes

- POST /encode endpoint (text or URL input)
- Uses spec-defined metadata fields (no custom schema)
- EmbeddingProvider interface for pluggable public/private backends
- sentence-transformers default provider
- 9 tests, Dockerfile

Test plan

- pytest tests/ -v (9 passing)
- Review field alignment with embedding_format.schema.json

Adds a public-lane encoder that generates ORTB-compatible contextual embeddings with a model descriptor envelope (name, version, embedding space ID, metric) for cross-party interoperability.

- POST /encode: text or URL → embedding segment with model metadata
- Pluggable provider interface for public/private lane pattern
- sentence-transformers default (shared embedding space)
- 8 tests covering wire format, model descriptor, and space identification

Ethan Steininger added 2 commits April 13, 2026 14:52

refactor: use spec-defined model metadata fields, not custom ModelDes…

52be04f

…criptor Removed the custom ModelDescriptor type. Now uses the fields already defined in specs/v1.0/embedding_format.schema.json (version, embedding_space_id, metric) directly on the embedding ext object.

esteininger wants to merge 2 commits into IABTechLab:main from esteininger:feat/contextual-signal-encoder
