refactor: SVS shared thread pool with rental model (MOD-9881)#925
refactor: SVS shared thread pool with rental model (MOD-9881)#925
Conversation
…l terminology** Rename `setNumThreads`/`getNumThreads` → `setParallelism`/`getParallelism` and `getThreadPoolCapacity` → `getPoolSize` across VectorSimilarity. Public info API fields (`numThreads`, `lastReservedThreads`, `NUM_THREADS`, `LAST_RESERVED_NUM_THREADS`) remain unchanged. No behavioral changes.
🛡️ Jit Security Scan Results✅ No security findings were detected in this PR
Security scan by Jit
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #925 +/- ##
==========================================
- Coverage 96.98% 96.90% -0.09%
==========================================
Files 129 129
Lines 7574 7628 +54
==========================================
+ Hits 7346 7392 +46
- Misses 228 236 +8 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
thpool tests fix fp16 tests
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 2163e56. Configure here.
align bm
rfsaliev
left a comment
There was a problem hiding this comment.
LGFM, but some suggestions.
src/VecSim/algorithms/svs/svs.h
Outdated
| // Singleton accessor for the shared SVS thread pool. | ||
| // Always valid — initialized with size 1 (write-in-place mode: 0 worker threads, | ||
| // only the calling thread participates). Resized on VecSim_UpdateThreadPoolSize() calls. | ||
| static std::shared_ptr<VecSimSVSThreadPoolImpl> getSharedThreadPool() { |
There was a problem hiding this comment.
It seems like the VecSimSVSThreadPoolImpl is expected to be always used as a singleton.
If this assumption is true, then I would suggest to use more generic idiom:
class VecSimSVSThreadPoolImpl {
...
private:
explicit VecSimSVSThreadPoolImpl(size_t num_threads = 1) {...}
public:
static std::shared_ptr<VecSimSVSThreadPoolImpl> instance() {...}
...
};- Such idiom prevents non-singleton usages.
- Moves TP lifetime managing responsibilities out of SVSIndex class
Moreover, I would move the full ownership to VecSimSVSThreadPool with possible hiding VecSimSVSThreadPoolImpl to the details namespace:
class VecSimSVSThreadPool {
...
public:
explicit VecSimSVSThreadPool(void *log_ctx = nullptr)
: pool_(details::VecSimSVSThreadPoolImpl::instance()),
parallelism_(std::make_shared<std::atomic<size_t>>(1)),
log_ctx_(log_ctx) {
assert(pool_ && "Pool must not be null");
}
static auto resize(size_t new_size) {
return details::VecSimSVSThreadPoolImpl::instance()->resize(new_size);
}
...
};| void manage_workers_after_run(const std::string &main_thread_error, RentedThreads &rented, | ||
| size_t rented_count) { |
There was a problem hiding this comment.
Why there are 3 arguments if rented.count() == rented_count here?
There was a problem hiding this comment.
no good reason :)
|
|
||
| // Move-only | ||
| RentedThreads(RentedThreads &&other) noexcept : slots_(std::move(other.slots_)) {} | ||
| RentedThreads &operator=(RentedThreads &&other) noexcept { |
There was a problem hiding this comment.
Is the move operator being used?
| // to keep slots alive even if the pool shrinks while the rental is active. | ||
| class RentedThreads { | ||
| public: | ||
| RentedThreads() = default; |
There was a problem hiding this comment.
Needs to be private or should also be used outside of rent()?
|
|
||
| size_t count() const { return slots_.size(); } | ||
|
|
||
| svs::threads::Thread &operator[](size_t i) { return slots_[i]->thread; } |
There was a problem hiding this comment.
Should we add and aseertion on the i values or the outside assertion is enough?
| for (auto &slot : slots_) { | ||
| if (rented.count() >= count) { | ||
| break; | ||
| } | ||
| // Run on the main function. | ||
| try { | ||
| f(0); | ||
| } catch (const std::exception &error) { | ||
| manage_exception_during_run(error.what()); | ||
| bool expected = false; | ||
| if (slot->occupied.compare_exchange_strong(expected, true, std::memory_order_acq_rel)) { | ||
| rented.add(slot); | ||
| } | ||
| } |
There was a problem hiding this comment.
Can be checked one time outside and use a local var for faster loop:
if (rented.count() < count) {
int rent = 0;
for (auto &slot : slots_) {
bool expected = false;
if (slot->occupied.compare_exchange_strong(expected, true, std::memory_order_acq_rel)) {
rented.add(slot);
if (++rent >= count) {
break;
}
}
}
}

Overview
Replaces per-index SVS thread pools with a single shared thread pool that can be physically resized at runtime, and introduces a rental-based execution model for
parallel_for. This eliminates over-provisioning (previously K indexes × N threads = K×N OS threads) and enables the SVS pool to actually grow/shrink when the external pool capacity changes — something the old design could not do.Also adds a new C API
VecSim_UpdateThreadPoolSize()that combines write-mode toggling with pool resizing, and deprecatesSVSParams::num_threads.Key Guarantees and Assumptions
The fundamental scheduling invariant
The shared SVS thread pool has the same size as the RediSearch worker thread pool (
search-workers). Every SVS multi-threaded operation (update, GC) is dispatched viaSVSMultiThreadJob, which submits N reserve jobs to the RediSearch worker pool — one per worker thread. Only the threads that actually check in (are not busy with other work) participate in the SVS operation. This means:rent()requests can never exceed the pool capacity, because each rented SVS thread corresponds to a RediSearch worker that checked in via a reserve job. If a worker is busy, it doesn't check in, so it's never counted inavailableThreadsand never requested from the SVS pool.setParallelism(n)is always called withn = min(availableThreads, problemSize), whereavailableThreads ≤ poolSize(). This is why the assertionn <= pool->size()is safe.ThreadSlots are held viashared_ptr; shrinking drops the pool's reference but the renter keeps the slot (and its OS thread) alive until~RentedThreadsreleases it. Resize never blocks on in-flight operations.Other invariants
VecSimSVSThreadPoolImpl::instance()). There is nonullptrstate; write-in-place mode uses a pool of size 1 (0 worker threads, calling thread only).parallelism_ >= 1— the calling thread always participates, so parallelism can never be 0.ThreadSlots are held viashared_ptr; shrinking drops the pool's reference but the renter keeps the slot (and OS thread) alive until~RentedThreads.shared_ptr<atomic<size_t>> parallelism_;setParallelism()on one index never affects another.Relationship with RediSearch
VecSim_UpdateThreadPoolSize(N)when the worker count changes, so VecSim can create/destroy the matching number of OS threads.Changes from Old Implementation
VecSimSVSThreadPoolImplVecSimSVSThreadPoolImplshared across all indexes viashared_ptrresize()clamped a logical counterresize()spawns/destroys OS threadsparallel_forthreadingatomic<bool>setNumThreads(n)/resize(n)— mutated the poolsetParallelism(n)— sets a per-indexatomic<size_t>, pool unchangedcapacity()(max pre-allocated)poolSize()(current shared pool size)manage_exception_during_run()threw unconditionallymanage_workers_after_run()only throws if an error occurredAPI Changes
Renamed methods
setNumThreads(n)setParallelism(n)SVSIndex,SVSIndexBasevirtual interfacegetNumThreads()getParallelism()SVSIndex,SVSIndexBasevirtual interfacegetThreadPoolCapacity()getPoolSize()SVSIndex,SVSIndexBasevirtual interfaceRemoved methods
VecSimSVSThreadPool::capacity()poolSize()for shared pool sizeVecSimSVSThreadPool::resize()setParallelism()for per-index control,VecSimSVSThreadPool::resize(size_t)(static) for global poolNew C API
New internal APIs
VecSimSVSThreadPoolImpl::instance()shared_ptr<VecSimSVSThreadPoolImpl>by valueVecSimSVSThreadPool(void* log_ctx)VecSimSVSThreadPool::resize(size_t)VecSimSVSThreadPool::poolSize()VecSimSVSThreadPoolImpl::rent(count, log_ctx)RentedThreads)Deprecated
VecSim_SetWriteMode()VecSim_UpdateThreadPoolSize()VecSim_UpdateThreadPoolSize) and unit tests. Not called by RediSearch — RediSearch will use onlyVecSim_UpdateThreadPoolSizeSVSParams::num_threadsVecSim_UpdateThreadPoolSize()New types
ThreadSlot— wrapssvs::threads::Thread+atomic<bool> occupied; non-copyable, non-movableRentedThreads— RAII guard; move-only; releases slots via lock-free atomic stores in destructorBehavioral Changes
setParallelism(0)resize(0)silently clamped to 1setParallelism(n > poolSize)search-workersn <= pool->size()— scheduling bugn > parallelisminparallel_forThreadingExceptionifn > size_n <= parallelism_on the wrapper — bug in SVS partitioningparallelism_initial valuelogCallback, asserts in debug, usesrented.count()for graceful degradation in releaseparallel_formanage_exception_during_run()always threwmanage_workers_after_run()only throws if an error actually occurredupdateSVSIndexWrapperlabels_to_moveis emptySVSParams::num_threadsset to non-zeroPublic info API fields — unchanged
The debug info field names
numThreadsandlastReservedThreads(and their string keysNUM_THREADS,LAST_RESERVED_NUM_THREADS) are unchanged. Their semantics shift slightly:numThreads→ now reportsgetPoolSize()(shared pool size, not per-index capacity)lastReservedThreads→ now reportsgetParallelism()(per-index parallelism for last operation)What Does NOT Change
deps/ScalableVectorSearch/) — no modificationsSVSMultiThreadJobpattern — still submits N reserve jobs to the external worker poolThreadstate machine — used as-is; lifecycle managed viashared_ptrFiles Changed
src/VecSim/algorithms/svs/svs_utils.hThreadSlot,RentedThreads, rental-basedVecSimSVSThreadPoolImpl, per-indexVecSimSVSThreadPoolwrappersrc/VecSim/algorithms/svs/svs.hnum_threadsdeprecation warningsrc/VecSim/algorithms/svs/svs_tiered.hupdateSVSIndexWrappersrc/VecSim/vec_sim.h/vec_sim.cppVecSim_UpdateThreadPoolSize()C APIsrc/VecSim/vec_sim_common.hSVSParams::num_threadsmarked as deprecated in commentstests/unit/test_svs_tiered.cppVecSim_UpdateThreadPoolSizein setup, removednum_threadsfrom params, updated assertionstests/unit/test_svs_fp16.cpptests/unit/test_svs.cppnumThreadsinfo field expects default; newNumThreadsParamIgnoredtesttests/benchmark/bm_utils.h/bm_vecsim_svs.hTesting
testThreadPoolrewritten to cover: default parallelism,setParallelismboundary assertions (ASSERT_DEATH),parallel_forwithn < parallelism, write-in-place mode, exception handlingTestDebugInfoThreadCount/TestDebugInfoThreadCountWriteInPlaceupdated:lastReservedThreadsnow starts at 1 (notnum_threads)ThreadsReservationtest usestieredIndexMock(num_threads)to properly size the shared poolNumThreadsParamIgnored— verifies that settingSVSParams::num_threadsemits a deprecation warning, does not affect the shared pool size, and that omitting it produces no warningnum_threads = 1inSVSParamsnow usetieredIndexMock(1)to resize the shared pool insteadMark if applicable
Note
High Risk
High risk because it rewrites SVS threading/concurrency primitives (shared pool, rental, resize semantics) and introduces a new global C API that changes write-mode behavior; bugs here could cause deadlocks, performance regressions, or incorrect parallel execution under load.
Overview
Refactors SVS to use a single shared, physically resizable thread pool with a rental-based
parallel_for, replacing per-index thread pools and renaming the public per-index controls fromget/setNumThreadstoget/setParallelism(andgetThreadPoolCapacitytogetPoolSize).Adds
VecSim_UpdateThreadPoolSize()to globally resize the shared SVS pool and toggle write mode (0 → in-place, >0 → async), and deprecates/ignoresSVSParams::num_threadswhile emitting a warning when set.Updates tiered SVS scheduling to use pool size for job creation, sets per-operation parallelism before backend updates/GC, skips backend locking when there’s nothing to move, and adjusts debug-info semantics/tests; adds extensive new unit tests covering pool resize/rental/concurrency and the deprecation behavior.
Reviewed by Cursor Bugbot for commit b90ac4d. Bugbot is set up for automated code reviews on this repo. Configure here.