Upgrade the global cluster

The platform consists of a global cluster and one or more workload clusters. The global cluster must be upgraded before any workload cluster.

This document walks you through the upgrade procedure for the global cluster.

If the global cluster is configured with the global DR (Disaster Recovery) solution, strictly follow the global DR procedure. Otherwise, follow the Standard procedure.


Standard procedure

Upload images

Copy the core package to any control plane node of the global cluster. Extract the package and cd into the extracted directory.

  • If the global cluster uses the built-in registry, run:

    bash upgrade.sh --only-sync-image=true
  • If the global cluster uses an external registry, you must also provide the registry address and credentials:

    bash upgrade.sh --only-sync-image=true --registry <registry-address> --username <username> --password <password>
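The two invocations above differ only in the registry arguments. As a convenience, a small wrapper like the following can select the right one; note that `REGISTRY`, `REG_USER`, and `REG_PASS` are hypothetical environment variables introduced here, not options of `upgrade.sh` itself:

```shell
# Sketch (not part of upgrade.sh): print the invocation to run, depending
# on whether external-registry settings are present in the environment.
build_sync_cmd() {
  if [ -n "${REGISTRY:-}" ]; then
    echo "bash upgrade.sh --only-sync-image=true --registry ${REGISTRY} --username ${REG_USER} --password ${REG_PASS}"
  else
    echo "bash upgrade.sh --only-sync-image=true"
  fi
}

build_sync_cmd   # prints the command; review it before executing manually
```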

If you plan to upgrade the Operator and Cluster Plugin while upgrading the global cluster, push the corresponding packages to the target cluster's registry in advance. For bulk upload instructions, see Push all packages at once.

INFO

Uploading images typically takes about 2 hours, depending on your network and disk performance.

If your platform is configured for global disaster recovery (DR), remember that the standby global cluster also requires image upload. Be sure to plan your maintenance window accordingly.

WARNING

When using violet to upload packages to a standby cluster, the parameter --dest-repo <VIP addr of standby cluster> must be specified.
Otherwise, the packages are uploaded to the image repository of the primary cluster, preventing the standby cluster from installing or upgrading extensions.

Also be aware that either the authentication info for the standby cluster's image registry or the --no-auth parameter MUST be provided.

For details of the violet push subcommand, please refer to Upload Packages.

Trigger the upgrade

After the image upload is complete, run the following command to start the upgrade process:

bash upgrade.sh --skip-sync-image

Wait for the script to finish before proceeding. The Upgrade button on the Functional Components tab becomes available about 10–15 minutes after the script completes; you can then upgrade the Operator and Cluster Plugin as described in the following steps.
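While waiting, a quick readiness check can help confirm the platform components have settled. The following filter is a sketch (it assumes platform components run in the cpaas-system namespace) that counts pods which are not fully ready from plain `kubectl get po` output:

```shell
# Sketch: count not-ready pods from `kubectl get po --no-headers` output.
# Completed (job) pods are ignored; anything else must be Running with a
# complete READY column (e.g. 2/2) to pass.
not_ready_count() {
  awk '{
    split($2, r, "/")
    if ($3 == "Completed") next
    if ($3 != "Running" || r[1] != r[2]) n++
  } END { print n+0 }'
}

# Usage against the live cluster:
# kubectl get po -n cpaas-system --no-headers | not_ready_count   # 0 means all ready
```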

Upgrade the global cluster

  1. Log in to the Web Console of the global cluster and switch to Administrator view.
  2. Navigate to Clusters > Clusters.
  3. Click on the global cluster to open its detail view.
  4. Go to the Functional Components tab.
  5. Click the Upgrade button.

Review the available component updates in the dialog, and confirm to proceed.

INFO
  • If the Alauda Container Platform GitOps plugin is installed in the global cluster and its pods are running abnormally after the upgrade, refer to .

Install Alauda Container Platform Cluster Enhancer Plugin

INFO

This step only ensures that the cluster enhancer plugin is installed. If the plugin is already installed, no action is needed.

  1. Navigate to Administrator.

  2. In the left sidebar, click Marketplace > Cluster Plugins and select the global cluster.

  3. Locate the Alauda Container Platform Cluster Enhancer plugin and click Install.

(Conditional) Upgrade Service Mesh Essentials

If Service Mesh v1 is installed, refer to the documentation before upgrading the workload clusters.

Post-upgrade

global DR procedure

Verify data consistency

Follow your regular global DR inspection procedures to ensure that data in the standby global cluster is consistent with the primary global cluster.

If inconsistencies are detected, contact technical support before proceeding.

On both clusters, run the following command to ensure no Machine nodes are in a non-running state:

kubectl get machines.platform.tkestack.io

If any such nodes exist, contact technical support to resolve them before continuing.
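To make that check easier to script, a filter like the one below prints only the non-running machines. This is a sketch: it assumes the status/phase appears in the second column of the `--no-headers` output, so adjust the column index (`$2`) if your output differs.

```shell
# Sketch: print machines whose status column is not "Running", reading
# `kubectl get machines.platform.tkestack.io --no-headers` on stdin.
non_running_machines() {
  awk '$2 != "Running" { print $1 }'
}

# kubectl get machines.platform.tkestack.io --no-headers | non_running_machines
```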

Uninstall the etcd sync plugin

  1. Access the Web Console of the standby cluster via its IP or VIP.
  2. Switch to the Administrator view.
  3. Navigate to Marketplace > Cluster Plugins.
  4. Make sure you have switched to the global cluster.
  5. Find the Alauda Container Platform etcd Synchronizer plugin and Uninstall it. Wait for the uninstallation to complete.

Upload images

Perform the Upload images step on both the standby cluster and the primary cluster.

See Upload images in Standard procedure for details.

Upgrade the standby cluster

INFO

Accessing the standby cluster Web Console is required to perform the upgrade.

Before proceeding, verify that the ProductBase resource of the standby cluster is correctly configured with the cluster VIP under spec.alternativeURLs.

If not, update the configuration as follows:

apiVersion: product.alauda.io/v1alpha2
kind: ProductBase
metadata:
  name: base
spec:
  alternativeURLs:
    - https://<standby-cluster-vip>
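A possible way to inspect and patch this from the command line is sketched below, assuming kubectl access to the standby cluster and that the resource is addressable as `productbase base`. The helper only builds the merge-patch payload; review it before applying.

```shell
# Sketch: build the merge-patch payload for ProductBase from a given VIP.
build_productbase_patch() {
  printf '{"spec":{"alternativeURLs":["https://%s"]}}' "$1"
}

# Inspect the current value:
# kubectl get productbase base -o jsonpath='{.spec.alternativeURLs}'
# Apply, substituting your standby VIP for the placeholder:
# kubectl patch productbase base --type=merge -p "$(build_productbase_patch <standby-cluster-vip>)"
```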

On the standby cluster, follow the steps in the Standard procedure to complete the upgrade.

Upgrade the primary cluster

After the standby cluster has been upgraded, proceed with the Standard procedure on the primary cluster.

Reinstall the etcd sync plugin

Before reinstalling, verify that port 2379 is properly forwarded from both global cluster VIPs to their control plane nodes.
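One way to check reachability from a control plane node is bash's built-in /dev/tcp redirection; this is just a sketch, and tools such as nc work equally well. Port 2379 is the etcd client port.

```shell
# Quick reachability probe using bash's /dev/tcp (no extra tools required).
port_open() {
  local host="$1" port="$2"
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}

# Run from a node that should reach both global cluster VIPs:
# port_open <primary-global-vip> 2379
# port_open <standby-global-vip> 2379
```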

To reinstall:

  1. Access the Web Console of the standby global cluster via its IP or VIP.
  2. Switch to Administrator view.
  3. Go to Marketplace > Cluster Plugins.
  4. Select the global cluster.
  5. Locate Alauda Container Platform etcd Synchronizer, click Install, and provide the required parameters.

To verify installation:

kubectl get po -n cpaas-system -l app=etcd-sync  # Ensure pod is 1/1 Running

kubectl logs -n cpaas-system $(kubectl get po -n cpaas-system -l app=etcd-sync --no-headers | awk '{print $1}' | head -1) | grep -i "Start Sync update"
# Wait until the logs contain "Start Sync update"

# Recreate the pod to trigger synchronization of resources with ownerReferences
kubectl delete po -n cpaas-system $(kubectl get po -n cpaas-system -l app=etcd-sync --no-headers | awk '{print $1}' | head -1)

Check Synchronization Status

Run the following to verify the synchronization status:

curl "$(kubectl get svc -n cpaas-system etcd-sync-monitor -ojsonpath='{.spec.clusterIP}')/check"

Explanation of output:

  • "LOCAL ETCD missed keys:" – Keys exist in the primary cluster but are missing in the standby. This often resolves after a pod restart.
  • "LOCAL ETCD surplus keys:" – Keys exist in the standby cluster but not in the primary. Review these with your operations team before deletion.
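For scripted checks, the /check output can be summarized into counts. The sketch below assumes one key per line under each of the two headers quoted above; verify the format against your actual output before relying on it.

```shell
# Sketch: summarize the etcd-sync /check output into missed/surplus counts.
summarize_check() {
  awk '
    /LOCAL ETCD missed keys:/  { mode="missed";  next }
    /LOCAL ETCD surplus keys:/ { mode="surplus"; next }
    NF && mode == "missed"  { m++ }
    NF && mode == "surplus" { s++ }
    END { printf "missed=%d surplus=%d\n", m+0, s+0 }'
}

# curl "$(kubectl get svc -n cpaas-system etcd-sync-monitor -ojsonpath='{.spec.clusterIP}')/check" | summarize_check
```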