KUMA Core availability under various scenarios

The availability of the KUMA Core in each failure scenario is described below:

  • The worker node on which the KUMA Core service is deployed fails or loses network connectivity.

    Access to the KUMA web interface is lost. After 6 minutes, Kubernetes initiates migration of the Core pod to an operational node of the cluster. Once the pod is deployed, which takes less than one minute, the KUMA web interface becomes available again at URLs based on the FQDN of the load balancer. To find out which worker node is currently hosting the Core, run the following command in the terminal of one of the controllers:

    k0s kubectl get pod -n kuma -o wide

    When the failed worker node recovers or its network connectivity is restored, the Core pod remains on its current worker node and is not migrated back to the recovered node. The recovered node can participate in the replication of the Core service's disk volume.
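    To follow the migration as it happens, the same query can be run with the standard kubectl watch option (a sketch relying on standard kubectl behavior rather than anything KUMA-specific):

    # Watch pods in the kuma namespace; the NODE column of the wide
    # output updates when the Core pod is rescheduled to another node.
    k0s kubectl get pod -n kuma -o wide --watch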

  • A worker node that holds a replica of the KUMA Core disk volume, but is not currently hosting the Core service, fails or loses network connectivity.

    The KUMA web interface remains available at URLs based on the FQDN of the load balancer. The network storage creates a replica of the currently operational Core disk volume on other healthy nodes. Access to KUMA at URLs based on the FQDNs of the remaining operational nodes is also not disrupted.
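    The state of the volume backing the Core can be checked with standard Kubernetes queries (a sketch; the exact names of the persistent volume and claim depend on the deployment, and replica placement is handled by the network storage layer):

    # List the persistent volume claims in the kuma namespace and the
    # persistent volumes behind them; a Bound status means the volume
    # is attached and usable.
    k0s kubectl get pvc -n kuma
    k0s kubectl get pv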

  • One or more cluster controllers become unavailable, but quorum is maintained.

    Worker nodes continue to operate normally, and access to KUMA is not disrupted. A controller failure extensive enough to break quorum leads to the loss of control over the cluster.

    How many machines are needed for high availability

    Controllers installed when deploying the cluster | Minimum number (quorum) of controllers to keep the cluster operational | Controllers that may fail without breaking quorum
    ------------------------------------------------ | ----------------------------------------------------------------------- | -------------------------------------------------
    1                                                | 1                                                                       | 0
    2                                                | 2                                                                       | 0
    3                                                | 2                                                                       | 1
    4                                                | 3                                                                       | 1
    5                                                | 3                                                                       | 2
    6                                                | 4                                                                       | 2
    7                                                | 4                                                                       | 3
    8                                                | 5                                                                       | 3
    9                                                | 5                                                                       | 4
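    The table follows the standard majority-quorum rule: with n controllers installed, the quorum is floor(n / 2) + 1, and the remaining n minus quorum controllers may fail without breaking it. A minimal shell sketch of the calculation (illustrative only; n is a placeholder for the number of installed controllers):

    # Majority quorum: more than half of the controllers must remain reachable.
    n=5                          # controllers installed when deploying the cluster
    quorum=$(( n / 2 + 1 ))      # minimum controllers to keep the cluster operational
    tolerated=$(( n - quorum ))  # controllers that may fail without breaking quorum
    echo "controllers=$n quorum=$quorum tolerated=$tolerated"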

  • All controllers of the Kubernetes cluster fail simultaneously.

    Control of the cluster is lost, and the cluster is not operational.

  • All worker nodes that host the Core pod and the replicas of the Core disk volume become unavailable simultaneously.

    Access to the KUMA web interface is lost. If all replicas of the Core disk volume are lost, data loss occurs.