Cluster Upgrade

Check List

Prerequisites
Upgrade path TBC
Non Prod Upgrade TBC
OpenShift Release Notes TBC
Operator Release Notes TBC
Proactive Tickets TBC
Get approval (CAB) TBC
Plan for Test team resources TBC
Send Communications TBC
Backup NA
Before Upgrade Checks
Check cluster health TBC
Check core micro services TBC
Check and remove duplicates TBC
Stop Traffic (Front Door) TBC
Apply the Upgrade
Cluster Upgrade TBC
Upgrade Operators TBC
Post Upgrade Checks
Review the status of the Cluster Version Operator TBC
Review clusteroperators, nodes, workloads TBC
Verify core microservices TBC
Tahi Smoke Tests TBC
Check ArgoCD TBC
Check 3scale TBC
Send Communications TBC
Resume Traffic (Front Door) TBC
Close Tickets TBC

Cluster Upgrade

Prerequisites

Verify Upgrade path

Log in to the OpenShift cluster:

oc login --token=xxxxxxxxxxxxxxxxx --server=https://xxxxxxxxxxxxxxxxxx
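Because this runbook is used against both non-prod and prod, it can help to confirm which API server the session is connected to before changing anything. The sketch below is a hypothetical helper (`confirm_cluster` is not part of the original runbook). It assumes you pass the expected API URL yourself and uses only `oc whoami` and `oc whoami --show-server`.

```shell
# Hypothetical helper: fail unless oc is logged in to the expected API server.
# Usage: confirm_cluster https://api.<cluster>:6443
confirm_cluster() {
  local expected="$1"
  local actual
  # Plain assignment keeps the exit status of the command substitution.
  if ! actual="$(oc whoami --show-server 2>/dev/null)"; then
    echo "Not logged in; run oc login first" >&2
    return 1
  fi
  if [ "$actual" != "$expected" ]; then
    echo "Logged in to ${actual}, expected ${expected}" >&2
    return 1
  fi
  # oc whoami with no flags prints the current user name.
  echo "Logged in to ${actual} as $(oc whoami)"
}
```

A non-zero return can then be used to stop a wrapper script before any `oc patch` or `oc adm upgrade` call.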

Ensure that the cluster is available:

oc get clusterversion

If the upgrade moves to the next minor version, first set the update channel for the target version. In this example, the current channel is 4.11 and the target channel is stable-4.12.

Review the current update channel information and confirm that the channel is set to stable-4.12:

oc get clusterversion -o json | jq ".items[0].spec"

If it is not set to stable-4.12, patch the channel to stable-4.12:

 oc patch clusterversion version --type="merge" -p '{"spec":{"channel":"stable-4.12"}}'
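The check-then-patch steps above can be combined so the patch only runs when the channel actually differs. This is a sketch, not an official procedure: `ensure_channel` is a hypothetical function name, and it assumes a logged-in `oc` session with rights to patch the `version` ClusterVersion object.

```shell
# Hypothetical helper: set spec.channel on the ClusterVersion only if it differs.
# Usage: ensure_channel stable-4.12
ensure_channel() {
  local target="$1"
  local current
  if ! current="$(oc get clusterversion version -o jsonpath='{.spec.channel}')"; then
    echo "Could not read the current channel" >&2
    return 1
  fi
  if [ "$current" = "$target" ]; then
    echo "Channel already set to ${target}"
    return 0
  fi
  echo "Changing channel from ${current:-<unset>} to ${target}"
  # Same merge patch as the manual step, with the target channel substituted.
  oc patch clusterversion version --type=merge \
    -p "{\"spec\":{\"channel\":\"${target}\"}}"
}
```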

View the available updates and note the version number of the update that we want to apply:

oc adm upgrade
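To confirm that the version you intend to request is actually offered, the recommended updates can be read from the ClusterVersion status. A minimal sketch, assuming the target appears in `.status.availableUpdates`; `list_available_updates` and `target_is_available` are hypothetical names, and 4.12.36 is simply the example version used later in this runbook.

```shell
# Hypothetical helpers: list recommended update versions and test for a target.
list_available_updates() {
  # One version per line from the ClusterVersion status.
  oc get clusterversion version \
    -o jsonpath='{range .status.availableUpdates[*]}{.version}{"\n"}{end}'
}

target_is_available() {
  local target="$1"
  # -x: whole-line match, -F: treat dots literally, -q: exit status only.
  list_available_updates | grep -qxF -- "$target"
}
```

Usage: `target_is_available 4.12.36 && echo "4.12.36 is offered" || echo "4.12.36 is not in availableUpdates"`. If the target is missing, record the upgrade path question in the proactive Red Hat case rather than forcing the update.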

Non Prod Upgrade

Make sure the upgrade has been tested in non-prod. Non-prod Upgrade Change No:

OpenShift Release Notes

Analyse the OpenShift release notes for the target version and make any necessary changes.

Operator Release Notes

Check the installed operators for available upgrades and review their release notes for compatibility with the target OpenShift version.

Proactive Tickets

Raise a proactive support case with Red Hat and record the case number. Red Hat Support Case No:

Get approvals (CAB)

Raise a change request and get approval from the CAB.

Change Request No:

Plan for Test team resources

Inform any external testers and confirm their availability during the change window.

Send Communications

Send communications to stakeholders about the upgrade.

Backup NA

Before Upgrade Checks

Check cluster health

Confirm the general cluster status: no degraded or progressing operators, all nodes Ready and all pods running:

oc get clusterversion
oc get clusteroperators
oc get nodes -o wide
oc get pods -A | grep -v "Running\|Completed\|Terminated\|Succeeded"
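Scanning the full `oc get clusteroperators` table by eye is easy to get wrong on a large cluster. The sketch below prints only the operators whose conditions are not Available=True, Progressing=False, Degraded=False, so empty output means they all look healthy. `unhealthy_operators` is a hypothetical name; the column extraction uses standard `custom-columns` JSONPath filters, and a missing condition shows as `<none>` and is therefore flagged too.

```shell
# Hypothetical helper: print cluster operators that are not fully healthy.
unhealthy_operators() {
  oc get clusteroperators --no-headers \
    -o custom-columns='NAME:.metadata.name,AVAILABLE:.status.conditions[?(@.type=="Available")].status,PROGRESSING:.status.conditions[?(@.type=="Progressing")].status,DEGRADED:.status.conditions[?(@.type=="Degraded")].status' \
  | awk '$2 != "True" || $3 != "False" || $4 != "False"'
}
```

The same helper can be reused in the post-upgrade checks.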

Check cluster utilization:

oc adm top node

Check that there is enough spare capacity to drain nodes. Check each node's CPU and memory request allocation:

oc describe node <nodename> | grep -A10 Allocated
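Running the describe command node by node is repetitive. A minimal loop, assuming every node should be checked (add a label selector such as `-l node-role.kubernetes.io/worker` if only workers matter); `show_node_allocations` is a hypothetical name.

```shell
# Hypothetical helper: print the "Allocated resources" section for every node.
show_node_allocations() {
  local node
  # oc get -o name returns entries like node/worker-0.
  for node in $(oc get nodes -o name); do
    echo "== ${node#node/}"
    oc describe "$node" | grep -A10 'Allocated resources'
  done
}
```

Compare the request percentages against what a drained node would push onto the remaining nodes in the same pool.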

Check core micro services

Check the logs of the core microservices and make sure the services are healthy before starting the upgrade.

Check ArgoCD

Log in to each Argo CD instance and make sure all applications are synced.
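Where the Argo CD `Application` resources live on the cluster you are logged in to (for example with OpenShift GitOps), their sync and health status can also be read with `oc`. This is a sketch under that assumption: `out_of_sync_apps` is a hypothetical name, and instances on other clusters still need their own check.

```shell
# Hypothetical helper: list Argo CD applications that are not Synced and Healthy.
out_of_sync_apps() {
  oc get applications.argoproj.io -A --no-headers \
    -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,SYNC:.status.sync.status,HEALTH:.status.health.status' \
  | awk '$3 != "Synced" || $4 != "Healthy"'
}
```

Empty output means every application on this cluster reports Synced and Healthy.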

Apply the Upgrade

Cluster Upgrade

Apply the upgrade, replacing <target version> with the version noted from oc adm upgrade:

oc adm upgrade --to=<target version>

For example:

oc adm upgrade --to=4.12.36

Monitor the upgrade:

watch -n10 "oc get clusterversion && echo && oc get co && echo && oc get nodes -o wide"
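The `watch` command is good for eyeballing progress. To get a clear completion signal, the newest entry in the ClusterVersion history (the first element of `.status.history`) can be polled until it reports the target version with state `Completed`. A minimal sketch; `wait_for_upgrade` is a hypothetical name and the 60-second default interval is an arbitrary choice.

```shell
# Hypothetical helper: poll until the newest ClusterVersion history entry
# reports the target version as Completed.
# Usage: wait_for_upgrade 4.12.36 [interval-seconds]
wait_for_upgrade() {
  local target="$1" interval="${2:-60}"
  local version state
  while true; do
    version="$(oc get clusterversion version -o jsonpath='{.status.history[0].version}')"
    state="$(oc get clusterversion version -o jsonpath='{.status.history[0].state}')"
    echo "$(date '+%H:%M:%S') latest history entry: version=${version} state=${state}"
    if [ "$version" = "$target" ] && [ "$state" = "Completed" ]; then
      echo "Upgrade to ${target} completed"
      return 0
    fi
    sleep "$interval"
  done
}
```

Worker nodes may still be rolling after this point, so also confirm the machine config pools have finished updating with `oc get mcp`.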

Upgrade Operators

Upgrade any operators if required. Some operators must be upgraded before the cluster upgrade; confirm this while reviewing the operator release notes.

Post Upgrade Checks

Review the status of the Cluster Version Operator

Confirm the general cluster status, with no degraded or progressing operators:

oc get clusterversion
oc get clusteroperators

Review nodes, workloads

Confirm that all nodes are Ready and all pods are running:

oc get nodes -o wide
oc get pods -A | grep -v "Running\|Completed\|Terminated\|Succeeded"
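The grep above hides pods whose STATUS is `Running` even when their containers are not ready. A slightly stricter sketch also compares the READY counts, and flags any node whose STATUS is not exactly `Ready` (for example `Ready,SchedulingDisabled` left over from a drain). `not_ready_nodes` and `problem_pods` are hypothetical names; the awk columns assume the default `oc get` table layout.

```shell
# Hypothetical helper: nodes whose STATUS column is not exactly "Ready".
not_ready_nodes() {
  oc get nodes --no-headers | awk '$2 != "Ready"'
}

# Hypothetical helper: pods that are not Running with all containers ready.
problem_pods() {
  oc get pods -A --no-headers | awk '
    $4 == "Completed" { next }       # finished Job pods are expected
    {
      split($3, ready, "/")          # READY column, e.g. "1/2"
      if ($4 != "Running" || ready[1] != ready[2]) print
    }'
}
```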

Check cluster utilization:

oc adm top node

Verify core micro services

Check the logs of the core microservices and make sure the services are healthy after the upgrade.

Check ArgoCD

Log in to each Argo CD instance and make sure all applications are synced.

Send Communications

Inform stakeholders that the upgrade is complete.

Close Tickets

Once the cluster upgrade is complete, close the Red Hat support case and the change request.

Issues

Log any issues encountered during or after the upgrade.