federation: split controlplane hook into pre- and post-deploy#3847
federation: split controlplane hook into pre- and post-deploy#3847vakwetu wants to merge 1 commit intoopenstack-k8s-operators:mainfrom
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Switch the Stage 5 post_stage_run from federation-controlplane-config.yml to federation-controlplane-config-postdeploy.yml. The post-deploy hook is designed for pipelines where the OpenStackControlPlane CR already exists. It reads the live OSCP to determine the CA bundle secret name, preserves any existing secret data, and merges in the Keycloak CA cert before patching caBundleSecretName when not already set. The original federation-controlplane-config.yml (pre-deploy hook) is intended for the component pipeline where the OSCP does not yet exist at hook execution time. Relates-to: OSPCIX-1321 Depends-On: openstack-k8s-operators/ci-framework#3847 Signed-off-by: Ade Lee <alee@redhat.com> Co-authored-by: Claude Sonnet 4.6 <no-reply@anthropic.com>
xek
left a comment
There was a problem hiding this comment.
Nice split — separating the pre-deploy and post-deploy cases makes this much cleaner.
One concern: hook_controlplane_config_postdeploy.yml Step 5 still uses kubernetes.core.k8s state: patched for the OpenStackControlPlane CR. This is the same pattern that was hitting the "Failed to find exact match for core.openstack.org/v1beta1.OpenStackControlPlane" bug in the component pipeline. The SKMO/architecture pipeline may run a different kubernetes Python client version so it might work there, but it's the same underlying resource-discovery issue (kubernetes-client/python#1536, kubernetes.core#553 — both wontfix upstream).
If the SKMO pipeline is hitting the same error, the fix would be to replace Step 5 with oc patch using the plural resource name:
- name: Patch OpenStackControlPlane to set caBundleSecretName (when unset)
when: not cifmw_federation_oscp_has_ca_bundle | bool
ansible.builtin.command:
cmd: >-
oc patch openstackcontrolplanes.core.openstack.org controlplane
-n {{ cifmw_federation_run_osp_cmd_namespace }}
--type merge
-p '{"spec":{"tls":{"caBundleSecretName":"{{ cifmw_federation_ca_bundle_secret_name }}"}}}'
environment:
KUBECONFIG: "{{ cifmw_openshift_kubeconfig }}"
PATH: "{{ cifmw_path }}"
changed_when: trueKey detail: oc patch needs the plural name openstackcontrolplanes (not singular openstackcontrolplane). The CRD defines plural: openstackcontrolplanes, singular: openstackcontrolplane, shortNames: [osctlplane, oscp].
Also — the postdeploy file is missing a Step 7 that patches the OSCP with the Keystone federation config (httpdCustomization, customServiceConfig). The original had this as Step 8. Without it, the SKMO pipeline won't actually configure Keystone for OIDC.
For the pre-deploy file: this looks correct — the component pipeline doesn't need to patch the OSCP directly since kustomize_deploy handles it.
The pre-deploy fix should unblock OSPCIX-1321 for the component pipeline. Our PR #3845 has the same fix for the pre-deploy path (plus the IPA kube-rbac-proxy image fix for the LDAP job) — happy to close ours in favor of this one if you want, or we can keep the IPA fix separate.
xek
left a comment
There was a problem hiding this comment.
Correction on my earlier comment — you're right that the root cause is the execution flow, not the kubernetes.core.k8s resource discovery or plural/singular naming.
In the component pipeline the hook runs at pre_deploy stage. Before 398d55b, this only wrote kustomization files and secrets — kustomize_deploy would apply them later when creating the controlplane. After 398d55b, Steps 5 and 8 tried to directly patch the OSCP, but the CRD isn't even registered with the API server at pre_deploy time. The k8s_info returns ok with an empty resource list, _federation_oscp_has_ca_bundle resolves to False, and the patch task fires against a non-existent resource type.
The split into pre-deploy (kustomization-only) and post-deploy (direct patching for SKMO) is the right fix. state: patched in the postdeploy file should work fine since in the SKMO pipeline the OSCP will actually be running by then.
LGTM for the pre-deploy revert. Our PR #3845 can be narrowed down to just the IPA kube-rbac-proxy image fix (separate issue from the federation breakage).
c163ce6 to
d2313eb
Compare
|
tested - this woks both for SKMO and the keystone component jobs. |
evallesp
left a comment
There was a problem hiding this comment.
I have some problems reviewing this.
I see that we have mismatch cert name inside the secret.
ALso the way the secret name in the post is calculated, is make me hard to find if it boths run in the same pipeline the post adapts to the name created in the pre (by reading OSCP).
But if both are running in different pipelines, then it has different naming but we don't mind.
I think the certificate name inside the secret should be the same to avoid duplication, but I might be wrong.
| type: Opaque | ||
| metadata: | ||
| name: keystone-httpd-override | ||
| name: keycloakca |
There was a problem hiding this comment.
(non-blocking) question: Should this match keycloak-ca.crt ?
Here it creates: data: {KeyCloakCA: }
And in the post it merges at:
data: >-
{{
cifmw_federation_ca_bundle_existing_data |
combine({'keycloak-ca.crt': cifmw_federation_sso_ca.content})
}}
so: data: {KeyCloakCA: , keycloak-ca.crt: } ?
There was a problem hiding this comment.
No, it doesn't have to match. The way this works is you define a secret that contains all your custom ca certs. In this case, that secret would be keycloakca.
libcommon will read that secret and iterate over all the fields and extract the contents and add those to a chain of ca certs. The field names do not matter at all.
There was a problem hiding this comment.
@evallesp - the two playbooks run in different pipelines. One runs in the keystone federation component job and the other runs in the SKMO uni-job. There would be no reason to run this twice.
The pre-config job just reverts to what was there before so that we can be sure that the pre-existing component job just runs as before.
The post-config job is the one with newer, more robust code - as it needs to be because the control plane already exists for this case.
|
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/4f67b41cf27e4774a52daef7851faa70 ✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 14m 12s |
The existing hook_controlplane_config.yml assumed the OpenStackControlPlane (OSCP) was not yet present, which is correct for the component / CRC pipeline. The SKMO integration pipeline runs after kustomize_deploy has already brought up the OSCP, so it needs a different hook that reads the live OSCP, preserves existing CA bundle data, and patches the resource in place. Split into two task files: - hook_controlplane_config.yml (pre-deploy / component pipeline, unchanged) - hook_controlplane_config_postdeploy.yml (post-deploy / SKMO pipeline) Add a new playbook federation-controlplane-config-postdeploy.yml that wraps the new post-deploy task file. The architecture/automation/vars/multi-namespace-skmo.yaml is updated separately to call the new post-deploy playbook. Fix variable names in both task files to use the required cifmw_ prefix so they pass the var-naming[pattern] rule enforced by ansible-lint. Relates-To: OSPCIX-1321 Signed-off-by: Ade Lee <alee@redhat.com> Co-authored-by: Claude Sonnet 4.6 <no-reply@anthropic.com> Made-with: Cursor
Recent changes were made to hook_controlplane_config.yml to accomodate adding federation to the SKMO deployment. This deployment differs though, in that the control plane already exists in the SKMO case. The changes broke the existing component job.
The cases are sufficiently different that its just cleaner to just use two different playbooks. That way, we do not have to worry about getting different conditionals right. This PR just does this - reverting the existing playbook to its original state (for the component job) and adding a new one for the SKMO case.
split into two task files:
Add a new playbook federation-controlplane-config-postdeploy.yml that wraps the new post-deploy task file.
The architecture/automation/vars/multi-namespace-skmo.yaml is updated separately to call the new post-deploy playbook.
Fix variable names in both task files to use the required cifmw_ prefix so they pass the var-naming[pattern] rule enforced by ansible-lint.
Related-To: OSPCIX-1321