[Draft PR] Support router as replica with pipelines by Bihan · Pull Request #3721 · dstackai/dstack

Bihan · 2026-03-31T12:32:47Z

The design document for this PR is here.

r4victor · 2026-04-08T06:19:13Z

src/dstack/_internal/server/background/pipeline_tasks/service_router_worker_sync.py

+
+
+class ServiceRouterWorkerSyncFetcher(Fetcher[ServiceRouterWorkerSyncPipelineItem]):
+    @sentry_utils.instrument_named_task("pipeline_tasks.ServiceRouterWorkerSyncFetcher.fetch")


I recently added @sentry_utils.instrument_pipeline_task – use it to avoid hardcoding pipeline_tasks prefix.
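A minimal sketch of the idea behind such a decorator, assuming it takes the task name and prepends the shared prefix itself. The real signature of instrument_pipeline_task is not shown in this thread; instrument_pipeline_task_sketch, PIPELINE_TASKS_PREFIX, and FetcherSketch are hypothetical names used only for illustration.

```python
import functools
from typing import Any, Awaitable, Callable

# Assumed prefix, copied from the hardcoded string in the diff above.
PIPELINE_TASKS_PREFIX = "pipeline_tasks"


def instrument_pipeline_task_sketch(name: str) -> Callable:
    """Hypothetical stand-in: builds the full task name from one shared prefix."""

    def decorator(func: Callable[..., Awaitable[Any]]) -> Callable[..., Awaitable[Any]]:
        full_name = f"{PIPELINE_TASKS_PREFIX}.{name}"

        @functools.wraps(func)
        async def wrapper(*args: Any, **kwargs: Any) -> Any:
            # A real implementation would start a Sentry transaction named full_name here.
            return await func(*args, **kwargs)

        # Exposed only so the sketch can be inspected.
        wrapper.task_name = full_name
        return wrapper

    return decorator


class FetcherSketch:
    @instrument_pipeline_task_sketch("ServiceRouterWorkerSyncFetcher.fetch")
    async def fetch(self) -> list:
        return []
```

With this shape, the call site names only the class and method, and the prefix lives in one place.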

r4victor · 2026-04-08T06:28:33Z

src/dstack/_internal/server/background/pipeline_tasks/service_router_worker_sync.py

+            run_model = sync_row.run
+            if run_model is None:
+                await session.delete(sync_row)
+                await session.commit()
+                return


How can run_model be None here?

I was thinking of the case where the run row is hard-deleted, so that sync_row.run becomes None. If that is not possible, we can delete this block.

But you defined run_id as non-optional with ondelete="CASCADE" - how can it be possible?

You are right. I'll delete this block.
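The point about a non-nullable foreign key with ondelete="CASCADE" can be checked in isolation. The sketch below uses SQLite from the standard library rather than the dstack models; the table and column names are simplified assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE service_router_worker_sync (
        id INTEGER PRIMARY KEY,
        run_id INTEGER NOT NULL REFERENCES runs(id) ON DELETE CASCADE
    )
    """
)
conn.execute("INSERT INTO runs (id) VALUES (1)")
conn.execute("INSERT INTO service_router_worker_sync (id, run_id) VALUES (10, 1)")

# Hard-deleting the run removes the sync row as part of the same statement,
# so a sync row that points at a missing run cannot be observed afterwards.
conn.execute("DELETE FROM runs WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM service_router_worker_sync").fetchone()[0]
```

Here remaining is 0, which is why the `run_model is None` branch should be unreachable under that schema.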

r4victor · 2026-04-08T06:34:42Z

src/dstack/_internal/server/background/pipeline_tasks/service_router_worker_sync.py

+                .options(
+                    selectinload(RunModel.project),
+                    selectinload(RunModel.jobs).selectinload(JobModel.project),
+                    selectinload(RunModel.jobs)
+                    .selectinload(JobModel.instance)
+                    .selectinload(InstanceModel.project),
+                )
+            )


This is potentially a very inefficient select – a run can have thousands of job submissions. Select only the jobs that the processing needs, i.e. only the router replica job. Also, every selectinload will be a separate query here – not sure if it's justified. joinedload may be better suited for a one-to-one relationship. Also, try to avoid loading all of the models' columns and use load_only to select only the necessary ones.

Please check whether the query proposed below addresses these concerns.

Avoid loading thousands of job submissions: no longer load RunModel.jobs unconditionally. The selectinload(RunModel.jobs.and_(...)) restricts the loaded jobs to only RUNNING + registered replicas, which are the only ones sync_router_workers_for_run_model() can use (router job selection and worker list building both ignore non‑running / unregistered jobs).

selectinload is intentional: RunModel.jobs is a one‑to‑many collection; using joinedload would duplicate the RunModel row per job.

joinedload for one-to-one/many-to-one: RunModel.project, JobModel.project, JobModel.instance, and InstanceModel.project are loaded with joinedload because these are scalar relationships from the run, job, and instance.

Use load_only: this limits the loaded columns to those required by sync_router_workers_for_run_model(run_for_sync) and _get_service_replica_client(job_model).

res = await session.execute(
    select(RunModel)
    .where(RunModel.id == item.run_id)
    .options(
        load_only(RunModel.id, RunModel.run_spec),
        selectinload(
            RunModel.jobs.and_(
                JobModel.status == JobStatus.RUNNING,
                JobModel.registered == true(),
            )
        )
        .load_only(
            JobModel.id,
            JobModel.status,
            JobModel.registered,
            JobModel.job_spec_data,
            JobModel.job_provisioning_data,
            JobModel.job_runtime_data,
        )
        .options(
            joinedload(JobModel.project).load_only(ProjectModel.id, ProjectModel.ssh_private_key),
            joinedload(JobModel.instance)
            .load_only(InstanceModel.id, InstanceModel.remote_connection_info)
            .joinedload(InstanceModel.project)
            .load_only(ProjectModel.id, ProjectModel.ssh_private_key),
        ),
    )
)

looks good, at least at a glance
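The row-duplication argument for keeping selectinload on the one-to-many collection can be illustrated at the SQL level. This is a sketch with SQLite and simplified, assumed table layouts. It does not use SQLAlchemy, and only mimics the shape of the queries that join-style and select-in-style loading produce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE runs (id INTEGER PRIMARY KEY, run_spec TEXT);
    CREATE TABLE jobs (
        id INTEGER PRIMARY KEY,
        run_id INTEGER NOT NULL,
        status TEXT NOT NULL,
        registered INTEGER NOT NULL
    );
    INSERT INTO runs VALUES (1, 'spec');
    INSERT INTO jobs VALUES (1, 1, 'running', 1), (2, 1, 'running', 1),
                            (3, 1, 'done', 0), (4, 1, 'failed', 0);
    """
)

# Join-style loading: the run columns are repeated once per job row.
joined = conn.execute(
    "SELECT runs.id, runs.run_spec, jobs.id FROM runs JOIN jobs ON jobs.run_id = runs.id"
).fetchall()

# Select-in-style loading with a filter: one run row, then a second query
# that fetches only the running, registered jobs.
run_rows = conn.execute("SELECT id, run_spec FROM runs WHERE id = 1").fetchall()
job_rows = conn.execute(
    "SELECT id FROM jobs WHERE run_id IN (1) AND status = 'running' AND registered = 1"
).fetchall()
```

The join returns four rows that each carry a copy of run_spec, while the two-query form returns one run row plus the two jobs the sync actually uses.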

r4victor · 2026-04-08T06:39:31Z

src/dstack/_internal/server/services/router_worker_sync.py

+    router_jobs = [
+        j
+        for j in run_model.jobs
+        if job_belongs_to_group(j, group_name) and j.status == JobStatus.RUNNING
+    ]
+    if not router_jobs or not is_replica_registered(router_jobs):
+        return None
+    return router_jobs[0]


Can there be multiple router jobs? If so, how does that work?

For the first iteration, I suggest restricting the router replica group to count: 1 via configuration validation. The current sync logic effectively assumes a single active router job. We can extend this later to support multiple router replicas for HA.

it's worth a comment!
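A minimal sketch of what such a validation could look like, assuming a simplified replica group with an integer count. ReplicaGroupSketch and validate_router_groups are hypothetical names, and the real configuration models are not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReplicaGroupSketch:
    name: str
    count: int
    router: Optional[dict] = None  # stands in for the router config


def validate_router_groups(groups: List[ReplicaGroupSketch]) -> None:
    """Reject configurations that the single-router sync logic cannot handle."""
    for group in groups:
        # The sync logic picks router_jobs[0], so more than one router job
        # per group would leave the extra routers unsynchronized.
        if group.router is not None and group.count != 1:
            raise ValueError(
                f"Replica group {group.name!r} defines a router and must have count: 1"
            )
```

Rejecting the configuration up front keeps the `router_jobs[0]` selection well defined until multiple router replicas are supported.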

r4victor · 2026-04-08T06:43:05Z

src/dstack/_internal/server/services/runs/__init__.py

+def run_spec_has_router_replica_group(run_spec: RunSpec) -> bool:
+    if run_spec.configuration.type != "service":
+        return False
+    cfg = run_spec.configuration
+    if not isinstance(cfg, ServiceConfiguration):
+        return False
+    return any(g.router is not None for g in cfg.replica_groups)
+
+
+async def ensure_service_router_worker_sync_row(


Why put these router-specific functions at the top of the runs services module?

I kept them there because they are used by the run lifecycle. Should I move them to src/dstack/_internal/server/services/router_worker_sync.py?

I mean at least they should not be at the top of the file.

r4victor · 2026-04-08T06:45:29Z

src/dstack/_internal/server/services/runs/__init__.py

                            ],
                        )
                    global_replica_num += 1
+            await ensure_service_router_worker_sync_row(session, run_model, run_spec)


I think in-place update supports replicas. What happens if a user adds a router replica in an in-place update, given that ensure_service_router_worker_sync_row() is called only in submit_run()?

Thanks for pointing that out. I need to call ensure_service_router_worker_sync_row after this.

r4victor · 2026-04-08T06:46:34Z

src/dstack/_internal/server/services/runs/__init__.py

+    if not run_spec_has_router_replica_group(run_spec):
+        return
+    res = await session.execute(
+        select(ServiceRouterWorkerSyncModel.id).where(
+            ServiceRouterWorkerSyncModel.run_id == run_model.id
+        )
+    )
+    if res.scalar_one_or_none() is not None:
+        return


How can it be that ServiceRouterWorkerSyncModel already exists for a run if ensure_service_router_worker_sync_row is called only on run submit?
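If the function also ends up being called from the in-place update path, as discussed in the previous thread, the existence check is what makes repeated calls safe. A sketch of that idempotent "ensure" pattern, using SQLite and an assumed, simplified table; ensure_sync_row is a hypothetical stand-in for the real function.

```python
import sqlite3


def ensure_sync_row(conn: sqlite3.Connection, run_id: int) -> bool:
    """Create the sync row for a run unless one exists; return True if created."""
    exists = conn.execute(
        "SELECT 1 FROM service_router_worker_sync WHERE run_id = ?", (run_id,)
    ).fetchone()
    if exists is not None:
        return False
    conn.execute("INSERT INTO service_router_worker_sync (run_id) VALUES (?)", (run_id,))
    return True


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE service_router_worker_sync (id INTEGER PRIMARY KEY, run_id INTEGER UNIQUE)"
)
# First call, e.g. on submit; second call, e.g. on an in-place update.
created_on_submit = ensure_sync_row(conn, 1)
created_on_update = ensure_sync_row(conn, 1)
row_count = conn.execute("SELECT COUNT(*) FROM service_router_worker_sync").fetchone()[0]
```

If the function really is called only once per run, the check is dead code and can go; if it is called from both paths, the check (or a unique constraint on run_id) is needed.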

r4victor · 2026-04-08T06:48:48Z

src/dstack/_internal/server/background/pipeline_tasks/service_router_worker_sync.py

+                return
+            run_model = sync_row.run
+            if run_model is None:
+                await session.delete(sync_row)


We generally use soft deletes in the dstack server for easier debugging and to keep historical data. Assuming there will be very few ServiceRouterWorkerSyncModel rows (one per service replica router), I'd also soft-delete it for consistency.
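A sketch of the soft-delete pattern, assuming a boolean `deleted` column. The column name is an assumption for illustration and is not taken from the actual model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE service_router_worker_sync (
        id INTEGER PRIMARY KEY,
        run_id INTEGER NOT NULL,
        deleted INTEGER NOT NULL DEFAULT 0
    )
    """
)
conn.execute("INSERT INTO service_router_worker_sync (id, run_id) VALUES (1, 42)")

# Soft delete: mark the row instead of removing it.
conn.execute("UPDATE service_router_worker_sync SET deleted = 1 WHERE id = 1")

# A fetcher would filter out deleted rows when picking work...
active = conn.execute(
    "SELECT id FROM service_router_worker_sync WHERE deleted = 0"
).fetchall()
# ...while the row itself stays available for debugging.
history = conn.execute("SELECT id, run_id, deleted FROM service_router_worker_sync").fetchall()
```

The trade-off is that every query selecting rows for processing has to include the `deleted = 0` filter.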

r4victor · 2026-04-08T06:50:11Z

src/dstack/_internal/server/models.py

    )


+class ServiceRouterWorkerSyncModel(PipelineModelMixin, BaseModel):


Let's put it somewhere at the end of the file so that "core" models come first.

r4victor · 2026-04-08T06:52:14Z

src/dstack/_internal/server/services/job_replica_http_client.py

@@ -0,0 +1,49 @@
+"""SSH-tunneled async HTTP client to a job's service port (same path as probes)."""


put this file in jobs services?

r4victor · 2026-04-08T06:53:05Z

src/dstack/_internal/server/services/router_worker_sync.py

@@ -0,0 +1,345 @@
+"""Reconcile SGLang router /workers with dstack's registered worker replicas (async, SSH-tunneled)."""


put this file in runs services

r4victor

Did a quick review of the pipeline code. Haven't looked into the worker sync logic.

jvstme · 2026-04-09T22:06:59Z

src/dstack/_internal/server/services/job_replica_http_client.py

+
+
+@asynccontextmanager
+async def _get_service_replica_client(


(nit) This function is supposed to be imported in other modules, so it shouldn't be private (prefixed with _).

jvstme · 2026-04-09T23:00:55Z

src/dstack/_internal/core/models/configurations.py

        Field(description="The shell commands to run for replicas in this group"),
    ] = []
+    router: Annotated[
+        Optional[AnyServiceRouterConfig],


AnyServiceRouterConfig has the policy and pd_disaggregation properties that are not applicable here. I think we might need a separate class for ReplicaGroup.router, without these options.

(see this thread)

jvstme · 2026-04-09T23:21:28Z

src/dstack/_internal/server/services/router_worker_sync.py

+    try:
+        async with _get_service_replica_client(router_job) as client:
+            await _update_workers_in_router_replica(client, target_workers)
+    except Exception as e:
+        logger.warning(
+            "%s: failed to sync workers with router: %r",
+            fmt(router_job),
+            e,
+        )


except Exception may result in bugs being silenced (warnings are not reported to Sentry, so we generally don't see them).

If there are any specific exceptions that should be ignored with a warning, consider listing them explicitly in except. I assume this is the case for communication issues with the router (such as SSH or HTTP errors), because the router can become unreachable or can be misconfigured or broken by the user.
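A sketch of the narrower handler, assuming the expected failures can be expressed as a tuple of exception types. EXPECTED_ROUTER_ERRORS and sync_with_router are hypothetical names; the real tuple would list the SSH tunnel and HTTP client exceptions actually raised, which this thread does not name.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger(__name__)

# Assumed placeholders for the communication errors worth tolerating.
EXPECTED_ROUTER_ERRORS = (ConnectionError, asyncio.TimeoutError)


async def sync_with_router(update_workers: Callable[[], Awaitable[None]]) -> bool:
    """Return True on success, False if the router could not be reached."""
    try:
        await update_workers()
    except EXPECTED_ROUTER_ERRORS as e:
        # Unreachable or broken router: expected in operation, so only warn.
        logger.warning("failed to sync workers with router: %r", e)
        return False
    # Anything else (KeyError, AttributeError, ...) propagates and gets reported.
    return True
```

Programming errors now surface as unhandled exceptions instead of being logged at warning level.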

jvstme · 2026-04-09T23:22:41Z

src/dstack/_internal/server/services/router_worker_sync.py

+async def _stream_response_body_bytes(resp: Response, max_bytes: int) -> bytes:
+    buf = bytearray()
+    async for chunk in resp.aiter_bytes():
+        buf.extend(chunk)
+        if len(buf) > max_bytes:
+            raise _ResponseTooLargeError()
+    return bytes(buf)


(nit) We have the join_byte_stream_checked function that appears to do the same thing

jvstme · 2026-04-09T23:57:55Z

src/dstack/_internal/proxy/gateway/services/registry.py

+    router_replicas = [r for r in service.replicas if r.is_router_replica]
+    if router_replicas:
+        replica_configs_for_nginx = [c for c in replica_configs if c.id == router_replicas[0].id]


What if the router replica is not yet registered, or temporarily unregistered (e.g., if it failed and is being restarted)? It seems that the gateway will then assume that the service doesn't have a router replica, and Nginx will direct incoming requests directly to worker replicas, which is not expected.

Also, do we actually need to register worker replicas on the gateway, considering the gateway should only communicate with the router replica? My initial proposal was not to register them, and I think that would fix the problem above, and also optimize and simplify a few things (no need for extra network communication, no need to distinguish between router and worker replicas on the gateway, etc).
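The fallback concern can be made concrete with a small sketch. It assumes the gateway knows from the service configuration whether a router is expected (the hypothetical `service_has_router` flag), rather than inferring it from which replicas happen to be registered. ReplicaSketch and select_nginx_upstreams are illustrative names only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReplicaSketch:
    id: str
    is_router_replica: bool = False


def select_nginx_upstreams(
    service_has_router: bool, registered: List[ReplicaSketch]
) -> List[ReplicaSketch]:
    """Pick the replicas Nginx may route to."""
    if not service_has_router:
        return list(registered)
    # Router configured: never fall back to workers, even while the router
    # replica is not registered yet or is being restarted.
    return [r for r in registered if r.is_router_replica]
```

With only workers registered and a router expected, the result is an empty upstream list rather than direct traffic to workers. Not registering workers on the gateway at all would make the flag unnecessary.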

jvstme · 2026-04-10T00:02:44Z

src/dstack/_internal/core/models/configurations.py

        CommandsList,
        Field(description="The shell commands to run for replicas in this group"),
    ] = []
+    router: Annotated[


Add excludes in core/compatibility, for client compatibility with older servers
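A generic sketch of the exclude idea, assuming the client drops a newly added field from the request body when the user did not set it. replica_group_excludes and serialize_group are hypothetical helpers and do not mirror the actual functions in core/compatibility.

```python
from typing import Any, Dict, Optional


def replica_group_excludes(router: Optional[Any]) -> Dict[str, bool]:
    """Fields to omit from the request body so that older servers accept it."""
    excludes: Dict[str, bool] = {}
    if router is None:
        # An older server does not know the `router` field and may reject it,
        # so drop it whenever the user left it unset.
        excludes["router"] = True
    return excludes


def serialize_group(group: Dict[str, Any]) -> Dict[str, Any]:
    """Serialize a replica group, skipping the excluded fields."""
    excludes = replica_group_excludes(group.get("router"))
    return {k: v for k, v in group.items() if not excludes.get(k)}
```

A configuration that does not use the router then serializes exactly as it did before the field existed.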

jvstme · 2026-04-10T00:12:21Z

src/dstack/_internal/proxy/gateway/services/registry.py

Is there anything to prevent exposing the router replica's /workers API on Nginx?

See the second comment in this thread

jvstme · 2026-04-10T01:01:32Z

src/dstack/_internal/server/background/pipeline_tasks/service_router_worker_sync.py

If you have some time, consider covering the pipeline with unit tests. I think all (or at least most of) our existing pipelines and background tasks currently have good coverage.

Bihan force-pushed the support_router_replica_with_pipelines branch from 2fe5e14 to bafd2d9 Compare April 1, 2026 07:22

Bihan requested review from jvstme and r4victor April 7, 2026 10:33

r4victor reviewed Apr 8, 2026


r4victor requested changes Apr 8, 2026


Bihan Rana added 4 commits April 9, 2026 10:27

Resolve Merge Conflict

f4c238c

Resolve pyright test

5763558

Resolve tests

f5a4c37

Optimize ServiceRouterWorkerSyncWorkerProcess select query

7b268cb

Bihan force-pushed the support_router_replica_with_pipelines branch from e155d17 to 7b268cb Compare April 9, 2026 10:36

jvstme reviewed Apr 10, 2026

3 participants
