feat: per-deployment internal RPC ClusterIP Service + status.rpcService #99
Merged
Adds a cluster-internal ClusterIP Service to every SeiNodeDeployment so
in-cluster consumers can dial a single stable DNS name
({deployment}-rpc.{namespace}.svc) rather than chasing ordinals on
per-node headless Services. kube-proxy L4 load-balances across ready
child pods via the existing sei.io/nodedeployment pod label.
Reconciled unconditionally — lives alongside (not replacing) the
.spec.networking / HTTPRoute path.
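A minimal sketch of the Service shape described above, using trimmed stand-in types rather than the real `corev1.Service`. `generateInternalService`, `internalService`, and `servicePort` are hypothetical names. The selector deliberately carries only the `sei.io/nodedeployment` label, so it matches ready pods of every revision, and the `-rpc` name suffix follows this commit (a later commit renames it).

```go
package main

import "fmt"

// Hypothetical, trimmed stand-ins for the corev1.Service fields this change sets.
type servicePort struct {
	Name string
	Port int32
}

type internalService struct {
	Name                     string
	Namespace                string
	Type                     string
	Selector                 map[string]string
	PublishNotReadyAddresses bool
	Ports                    []servicePort
}

// groupLabel is the pod label named in the description; child pods already carry it.
const groupLabel = "sei.io/nodedeployment"

// generateInternalService is pure: same inputs, same output, no API calls.
// Its selector uses only the group label, so it matches pods of every revision.
func generateInternalService(deployment, namespace string, ports []servicePort) internalService {
	return internalService{
		Name:                     deployment + "-rpc",
		Namespace:                namespace,
		Type:                     "ClusterIP",
		Selector:                 map[string]string{groupLabel: deployment},
		PublishNotReadyAddresses: false, // only Ready pods are added as endpoints
		Ports:                    ports,
	}
}

// dnsName returns the stable in-cluster name consumers dial.
func (s internalService) dnsName() string {
	return fmt.Sprintf("%s.%s.svc", s.Name, s.Namespace)
}

func main() {
	svc := generateInternalService("mychain", "nodes", []servicePort{{Name: "rpc", Port: 26657}})
	fmt.Println(svc.dnsName(), svc.Selector[groupLabel]) // mychain-rpc.nodes.svc mychain
}
```

Because the generator has no side effects, it can be unit-tested by comparing its output directly, which is what the "pure-generator" tests below refer to.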
- API: new RpcServiceStatus / RpcServicePorts types; additive pointer
field .status.rpcService on SeiNodeDeployment.
- Generator: pure generateInternalRpcService with named ports
(rpc/evm-http/evm-ws/rest/grpc) per the milestone interface contract.
- Reconcile: new reconcileInternalRpcService invoked from the deployment
reconcile loop; populates status.rpcService in-memory for the existing
single Status().Patch() flush.
- Orphan path: retain-policy now strips the internal Service's
ownerRef alongside the external one.
- Tests: pure-generator and fake-client reconcile coverage (status
stamping, ownerRef shape, idempotency, orphan path).
Ports use "evm-http" in the Service (not seiconfig's "evm-rpc") because
the milestone interface contract fixes those names for kube-native tools.
Refs: platform#96
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
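The orphan path in the list above amounts to removing the parent's entry from the Service's `metadata.ownerReferences`, after which Kubernetes garbage collection no longer deletes the Service along with its owner. The sketch below assumes a trimmed stand-in for `metav1.OwnerReference` and matches the owner by UID; `stripOwnerRef` is hypothetical, and the real controller would also have to write the change back to the API server.

```go
package main

import "fmt"

// ownerReference is a hypothetical, trimmed stand-in for metav1.OwnerReference.
type ownerReference struct {
	Kind string
	Name string
	UID  string
}

// stripOwnerRef returns refs without any entry whose UID matches ownerUID.
// Once the parent's ownerRef is gone, deleting the parent no longer
// cascade-deletes the Service.
func stripOwnerRef(refs []ownerReference, ownerUID string) []ownerReference {
	out := make([]ownerReference, 0, len(refs))
	for _, r := range refs {
		if r.UID != ownerUID {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	refs := []ownerReference{
		{Kind: "SeiNodeDeployment", Name: "mychain", UID: "uid-parent"},
		{Kind: "Example", Name: "other", UID: "uid-other"}, // hypothetical second owner
	}
	fmt.Println(stripOwnerRef(refs, "uid-parent"))
}
```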
Three concerns folded into one follow-up:

1. Rename `rpcService` → `internalService` (status field, types, Service name suffix, tests, godoc). The Service is the single internal access point; mode-neutral naming ages better than RPC-scoped naming, especially once the stateful ports below are dropped.
2. Drop the stateful ports (evm-ws 8546, grpc 9090) from the Service and status schema. A kube-proxy L4 load balancer picks a backend pod per connection, not per request, which breaks WebSocket subscriptions and pins each HTTP/2 gRPC connection (with all its multiplexed calls) to one pod; neither load-balances correctly. Remaining ports: rpc (26657), evm-http (8545), and rest (1317), all stateless HTTP request/response. Stateful consumers use the per-node headless Services.
3. Move internal Service orphan handling out of `orphanNetworkingResources` into a new `orphanInternalService` method. The internal Service's lifecycle is unconditional, so it should not be bundled with the networking-resources teardown. Added a test for `.spec.networking → nil` transitions confirming the internal Service survives.

All tests green (lint + test). CRD + DeepCopy regenerated via `make manifests generate`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
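Item 2's reasoning can be expressed as a filter over the original five named ports: only stateless request/response ports stay on the load-balanced Service. `namedPort`, its `Stateful` flag, and `internalServicePorts` are hypothetical stand-ins for the `corev1.ServicePort` list the generator builds; the port numbers are the ones listed in this commit.

```go
package main

import "fmt"

// namedPort is a hypothetical stand-in for corev1.ServicePort (name and port only).
type namedPort struct {
	Name     string
	Port     int32
	Stateful bool // long-lived or multiplexed connections that an L4 LB pins per connection
}

// allPorts lists the five ports from the original change, with the stateful ones flagged.
var allPorts = []namedPort{
	{Name: "rpc", Port: 26657},
	{Name: "evm-http", Port: 8545},
	{Name: "evm-ws", Port: 8546, Stateful: true},
	{Name: "rest", Port: 1317},
	{Name: "grpc", Port: 9090, Stateful: true},
}

// internalServicePorts keeps only stateless request/response ports, which
// spread correctly under kube-proxy's per-connection L4 balancing.
func internalServicePorts(ports []namedPort) []namedPort {
	var out []namedPort
	for _, p := range ports {
		if !p.Stateful {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	for _, p := range internalServicePorts(allPorts) {
		fmt.Printf("%s:%d\n", p.Name, p.Port)
	}
}
```

Consumers that need `evm-ws` or `grpc` would instead target a specific pod through its per-node headless Service, where connection pinning is the desired behavior.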
bdchatham added a commit that referenced this pull request on Apr 17, 2026:

…only ports)

Picks up the post-#99 controller binary:
- new `SeiNodeDeployment.status.internalService` field
- per-deployment ClusterIP Service with the stateless HTTP port set (rpc/evm-http/rest only; evm-ws and grpc deliberately excluded)
- internal Service lifecycle is independent of `.spec.networking`

Unblocks the autobake workflow (platform repo, M2b), which reads `status.internalService.name` to dial the chain's RPC.

Image: 189176372795.dkr.ecr.us-east-2.amazonaws.com/sei/sei-k8s-controller:20a1a9038725109d434f3940797200afaf75aa44
sha256:05ee5a60d3541c10e0409086381284a1e1695aabd771a14b049de170e1ac0a37

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
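A minimal sketch of how a consumer such as the autobake workflow might turn `status.internalService.name` into an EVM JSON-RPC endpoint. It assumes the consumer runs in-cluster, already knows the deployment's namespace, and targets the `evm-http` port (8545); `evmHTTPEndpoint` and the example names are hypothetical.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strconv"
)

// evmHTTPEndpoint builds an in-cluster URL for the internal Service's evm-http port.
// serviceName is the value read from status.internalService.name.
func evmHTTPEndpoint(serviceName, namespace string, port int) string {
	host := net.JoinHostPort(serviceName+"."+namespace+".svc", strconv.Itoa(port))
	u := url.URL{Scheme: "http", Host: host}
	return u.String()
}

func main() {
	fmt.Println(evmHTTPEndpoint("my-service", "nodes", 8545)) // http://my-service.nodes.svc:8545
}
```

Reading the Service name from status, rather than reconstructing it from the deployment name, keeps the consumer working across renames like the `rpcService` → `internalService` change.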
bdchatham added a commit that referenced this pull request on Apr 17, 2026 (#100), with the same message.
bdchatham added a commit that referenced this pull request on Apr 17, 2026:
Controller pods crash on startup with:
Failed to initialize OTel MeterProvider
error: building OTel resource: conflicting Schema URL:
https://opentelemetry.io/schemas/1.40.0 and
https://opentelemetry.io/schemas/1.26.0
resource.Merge rejects merging resources whose schema URLs differ.
resource.Default() reports schema v1.40.0 (embedded in the SDK at
v1.43.0), while cmd/telemetry.go hardcoded semconv/v1.26.0 as the
schema for the custom resource overlay.
Bump the semconv import to v1.40.0 so the two schema URLs agree.
All three symbols in use here (semconv.SchemaURL, ServiceName,
ServiceVersion) are stable across semconv versions — drop-in
substitution.
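The crash comes from how `resource.Merge` reconciles schema URLs: an empty URL defers to the other resource, equal URLs pass through, and two different non-empty URLs are rejected. The stdlib-only sketch below mirrors that rule to show why aligning the semconv version fixes startup; `mergeSchemaURL` is hypothetical and is not the OTel SDK's implementation.

```go
package main

import "fmt"

// mergeSchemaURL mirrors, in simplified form, how OTel's resource.Merge
// reconciles schema URLs: an empty URL defers to the other side, equal URLs
// pass through, and two different non-empty URLs are a conflict.
func mergeSchemaURL(a, b string) (string, error) {
	switch {
	case a == "":
		return b, nil
	case b == "":
		return a, nil
	case a == b:
		return a, nil
	default:
		return "", fmt.Errorf("conflicting Schema URL: %s and %s", a, b)
	}
}

func main() {
	sdkDefault := "https://opentelemetry.io/schemas/1.40.0" // resource.Default() at SDK v1.43.0
	before := "https://opentelemetry.io/schemas/1.26.0"     // semconv/v1.26.0 overlay
	after := "https://opentelemetry.io/schemas/1.40.0"      // semconv/v1.40.0 overlay

	if _, err := mergeSchemaURL(sdkDefault, before); err != nil {
		fmt.Println("before:", err)
	}
	if merged, err := mergeSchemaURL(sdkDefault, after); err == nil {
		fmt.Println("after:", merged)
	}
}
```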
Unblocks the controller image bump that #100 landed. Post-#99
controller pods stop CrashLoopBackOff and roll out cleanly, which
in turn unblocks SeiNodeDeployment.status.internalService for the
autobake workflow.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bdchatham added a commit that referenced this pull request on Apr 17, 2026, with the same message.
bdchatham added a commit that referenced this pull request on Apr 17, 2026:

Picks up the OTel schema-URL fix from #101, which unblocks the post-#99 controller image. The prior image (20a1a90) crashes at startup with:

Failed to initialize OTel MeterProvider
conflicting Schema URL: ...1.40.0 and ...1.26.0

Image: 189176372795.dkr.ecr.us-east-2.amazonaws.com/sei/sei-k8s-controller:d122d39d5863a391d879cee2abdab5808a631db3
sha256:6a0a11bd2b135777d7bf4973f4009553b49a3cd4d2bfe41e08947e6a1780fde4

Post-Flux-sync verification: run `kubectl -n sei-k8s-controller-system get pods` and expect all pods Running on the new image, with no CrashLoopBackOff.

Unblocks the autobake workflow (platform#101), which reads `status.internalService.name` from the post-#99 controller.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bdchatham added a commit that referenced this pull request on Apr 17, 2026, with the same message.
Summary
Adds a cluster-internal `ClusterIP` Service to every `SeiNodeDeployment` so in-cluster consumers can dial a single stable DNS name, `{deployment}-rpc.{namespace}.svc`, rather than picking an ordinal on the per-node headless Service and handling failover themselves. `kube-proxy` L4 load-balances across ready child pods.

Reconciled unconditionally: this path is independent of `.spec.networking` and lives alongside (not replacing) the existing external `reconcileNetworking` / HTTPRoute / External-DNS pipeline.

References
wave-autobake/milestones/M1.md

Interface contract (frozen)
- Service type `ClusterIP`, with `PublishNotReadyAddresses: false`.
- Selector `sei.io/nodedeployment=<deployment>` (already stamped on child pod templates via `generateSeiNode` → `spec.PodLabels[groupLabel]`).
- `OwnerReferences` point at the parent `SeiNodeDeployment` (cascade delete on `DeletionPolicy: Delete`; orphaned via `orphanNetworkingResources` on `Retain`).

Files changed
| File | Change |
|---|---|
| `api/v1alpha1/seinodedeployment_types.go` | New `RpcServiceStatus` / `RpcServicePorts` types; `.status.rpcService` pointer field. |
| `api/v1alpha1/zz_generated.deepcopy.go` | Regenerated. |
| `config/crd/sei.io_seinodedeployments.yaml`, `manifests/sei.io_seinodedeployments.yaml` | Regenerated. |
| `internal/controller/nodedeployment/internal_service.go` (new) | `generateInternalRpcService` (pure) + `reconcileInternalRpcService` (SSA, stamps status in-memory). |
| `internal/controller/nodedeployment/internal_service_test.go` (new) | Pure-generator and fake-client reconcile tests. |
| `internal/controller/nodedeployment/controller.go` | Invokes `reconcileInternalRpcService` unconditionally in the reconcile loop. |
| `internal/controller/nodedeployment/networking.go` | `orphanNetworkingResources` now also strips ownerRef on the internal Service. |

Judgment calls / deviations
- `evm-http` (not seiconfig's `evm-rpc`): the milestone brief explicitly pins the five port names in the Service spec. We keep the brief's names, which happen to be more readable to kube-native tools; the numeric port is unchanged.
- … `groupSelector`); the internal Service uses `groupOnlySelector` so kube-proxy continues to route to whichever pods are `Ready` across incumbents/entrants. Covered by `TestGenerateInternalRpcService_SelectorIgnoresRevision`.
- `reconcileInternalRpcService` mutates `group.Status.RpcService` in-memory. `updateStatus` calls the existing `Status().Patch()` at the end of every reconcile path (including the DNS-pending early return), so no second patch was added.
- `generateSeiNode` already writes `spec.PodLabels[groupLabel] = group.Name`, which flows through `noderesource.ResourceLabels` onto the StatefulSet's pod template. No changes are needed to the labeling path; the selector contract is already satisfied.

Manual testing
Test plan
- `make lint` clean.
- `make test` green (new tests: 9 generator + 5 reconcile/orphan cases).
- `make manifests generate` idempotent after commit.
- `make build` / `make ci` succeed.