---
canonical: "https://safekit.eviden.com/products/high-availability-software-for-application-clustering/kubernetes-k3s-the-simplest-high-availability-cluster-with-synchronous-replication-and-failover-between-two-redundant-servers/safekit-quick-installation-guide-with-kunenetes/"
llms_index: "https://safekit.eviden.com/llms.txt"
llms_section: "Web"
topics: "K3S High Availability with SafeKit: Install the k3s.safe Module for Failover, Configuring k3s.safe for Synchronous Replication & 2-Node SANless Clustering, SafeKit High Availability Limitations, High Availability Quick Installation Guide for Kubernetes, Overview of the SafeKit / Kubernetes solution, Installation and configuration of the SafeKit / Kubernetes solution on Linux (k3s.safe), What are the different scenarios in case of network isolation in a cluster?, 🔍 SafeKit High Availability Navigation Hub"
---

# K3S High Availability with SafeKit: Install the k3s.safe Module for Failover

## Configuring k3s.safe for Synchronous Replication & 2-Node SANless Clustering

[🧑 Contact us](https://safekit.eviden.com/products/high-availability-software-for-application-clustering/contact-us-for-safekit/)

[🎁 SafeKit free trial](https://safekit.eviden.com/products/high-availability-software-for-application-clustering/safekit-free-trial/)

[🏅 Free certification](https://training.my.evidian.com/mod/page/view.php?id=712)

[💰 Perpetual license cost](https://safekit.eviden.com/products/high-availability-software-for-application-clustering/get-a-quote-safekit-en/)

## SafeKit High Availability Limitations

### Why a replication of a few terabytes?

Resynchronization time after a failure ([step 3](https://safekit.eviden.com/products/high-availability-software-for-application-clustering/file-replication-byte-level-with-failover-mirror-cluster/#mirrorcluster)):

* 1 Gb/s network ≈ 3 hours for 1 terabyte.
* 10 Gb/s network ≈ 1 hour for 1 terabyte, or less depending on disk write performance.
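
As a rough cross-check of these orders of magnitude, the sketch below computes the transfer time for a given volume and link speed. The function name and the `efficiency` parameter are assumptions introduced for illustration: `efficiency` stands for the fraction of the link that is actually usable once protocol and disk overhead are counted, and 0.74 is only the value that happens to reproduce about 3 hours, not a measured SafeKit figure.

```shell
# Rough resynchronization-time estimate (illustrative only).
# Usage: estimate_resync_hours <size_TB> <link_Gbps> [efficiency]
# efficiency (0..1) is an assumed usable fraction of the link; default 1 (ideal).
estimate_resync_hours() {
    awk -v tb="$1" -v gbps="$2" -v eff="${3:-1}" 'BEGIN {
        bits = tb * 8e12                      # 1 TB = 10^12 bytes = 8 * 10^12 bits
        seconds = bits / (gbps * 1e9 * eff)   # link speed in bits per second
        printf "%.1f\n", seconds / 3600       # result in hours
    }'
}

estimate_resync_hours 1 1        # ideal 1 Gb/s link: prints 2.2
estimate_resync_hours 1 1 0.74   # assumed 74% usable throughput: prints 3.0
```

On a 10 Gb/s link the ideal network time drops well below an hour, which is consistent with disk write performance becoming the limiting factor.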

#### Alternative

* For a large volume of data, use [external shared storage](https://safekit.eviden.com/products/high-availability-software-for-application-clustering/san-vs-nas-shared-storage-for-a-failover-cluster/).
* More expensive, more complex.

### Why a replication < 1,000,000 files?

* Resynchronization time performance after a failure ([step 3](https://safekit.eviden.com/products/high-availability-software-for-application-clustering/file-replication-byte-level-with-failover-mirror-cluster/#mirrorcluster)).
* Time to check each file between both nodes.

#### Alternative

* Put the many files to replicate in a virtual hard disk / virtual machine.
* Only the files representing the virtual hard disk / virtual machine will be replicated and resynchronized in this case.

### Why a failover ≤ 32 replicated VMs?

* Each VM runs in an independent mirror module.
* Maximum of 32 mirror modules running on the same cluster.

#### Alternative

* Use an external shared storage and another VM clustering solution.
* More expensive, more complex.

### Why a LAN/VLAN network between remote sites?

* Automatic failover of the [virtual IP address](https://safekit.eviden.com/products/high-availability-software-for-application-clustering/how-a-virtual-ip-address-works/) with 2 nodes in the same subnet.
* Good bandwidth for resynchronization ([step 3](https://safekit.eviden.com/products/high-availability-software-for-application-clustering/file-replication-byte-level-with-failover-mirror-cluster/#mirrorcluster)) and good latency for [synchronous replication](https://safekit.eviden.com/products/high-availability-software-for-application-clustering/synchronous-replication-vs-asynchronous-replication/) (typically a round-trip of less than 2ms).

#### Alternative

* Use a [load balancer for the virtual IP address](https://safekit.eviden.com/products/high-availability-software-for-application-clustering/how-a-virtual-ip-address-works/) if the 2 nodes are in 2 subnets (supported by SafeKit, especially in the cloud).
* Use backup solutions with asynchronous replication for high-latency networks.

## High Availability Quick Installation Guide for Kubernetes

This guide explains how to set up a **mirror cluster** for Kubernetes using SafeKit, ensuring automatic failover and synchronous replication without the need for shared storage.


### 1. Overview

* **Architecture:** Uses a two-node system (Primary/Secondary).
* **Data Protection:** Implements real-time synchronous replication for **zero data loss** (RPO=0).

### 2. Installation

* **Software:** Install the SafeKit engine on both servers.
* **Module:** Download the pre-configured `k3s.safe` application module.

### 3. Configuration

* **Web Console:** Configure the specific folders containing the Kubernetes files.
* **Monitoring:** Start monitoring and protecting the Kubernetes application.

## Overview of the SafeKit / Kubernetes solution

The solution is described here: **[Kubernetes K3s High Availability: 2-Node Synchronous Replication & Failover](https://safekit.eviden.com/products/high-availability-software-for-application-clustering/kubernetes-k3s-the-simplest-high-availability-cluster-with-synchronous-replication-and-failover-between-two-redundant-servers/)**

## Installation and configuration of the SafeKit / Kubernetes solution on Linux (k3s.safe)

### 1. Download packages

* Download the free version of SafeKit 8.2 on Linux (safekitlinux\_xx.bin)
* Download the k3s.safe Linux module
* Download the k3sconfig.sh script
* Documentation (pptx)

Note: the k3sconfig.sh script installs K3S, MariaDB, NFS, SafeKit on 2 Linux Ubuntu 24.04 nodes.

[Download SafeKit (Linux) >](/products/high-availability-software-for-application-clustering/safekit-free-trial/)

[Download k3s.safe (Linux) >](/wp-content/uploads/downloads_safekit/version-82/modules_linux/k3s.safe)

[Download k3sconfig.sh >](/wp-content/uploads/downloads_safekit/version-82/modules_linux/k3sconfig.sh)

[Documentation (pptx) >](https://view.officeapps.live.com/op/view.aspx?src=https%3A%2F%2Fwww%2Eevidian%2Ecom%2Fsafekit%2Fdownloads%2Fversion-82%2Fslides-en%2Fsafekit82-k3s-en%2Epptx)

### 2. First on both nodes

On 2 Linux Ubuntu 24.04 nodes, as root:

* Make sure the node has internet access (could be through a proxy)
* Copy k3sconfig.sh, k3s.safe and the safekit\_xx.bin package into a directory and cd into it
* Rename the .bin file to `safekit.bin`
* Make sure k3sconfig.sh and safekit.bin are executable.
* Edit the k3sconfig.sh script and customize the environment variables according to your environment (including a virtual IP)
* Execute on both nodes: `./k3sconfig.sh prereq`

The script will:

* Install the required Debian packages: alien, nfs-kernel-server, nfs-common, mariadb-server
* Secure the MariaDB installation
* Create directories for file replication
* Prepare the NFS server for sharing replicated directories
* Install SafeKit
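
Before launching `./k3sconfig.sh prereq`, the file preparation described in this step can be verified with a small helper. This is a minimal sketch: `preflight_check` is a hypothetical function written for this guide, not something shipped with SafeKit or `k3sconfig.sh`.

```shell
# Pre-flight check of the working directory (illustrative helper, not part of SafeKit).
# Verifies that k3sconfig.sh and safekit.bin exist and are executable,
# and that the k3s.safe module file is present.
preflight_check() {
    dir="${1:-.}"
    status=0
    for f in k3sconfig.sh safekit.bin; do
        if [ ! -x "$dir/$f" ]; then
            echo "missing or not executable: $f"
            status=1
        fi
    done
    if [ ! -f "$dir/k3s.safe" ]; then
        echo "missing: k3s.safe"
        status=1
    fi
    return "$status"
}

# Typical use, as root, from the directory holding the three files:
#   chmod +x k3sconfig.sh safekit.bin
#   preflight_check . && ./k3sconfig.sh prereq
```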

### 3. On the first node

Execute on the first node: `./k3sconfig.sh first`

The script will:

* Create the K3S configuration database and the k3s user
* Create the replicated storage volume file (sparse file) and format it as an xfs filesystem
* Create the SafeKit cluster configuration and apply it
* Install and configure the k3s.safe module on the cluster
* Start the k3s module as "prim" on the first node
* Download, install and start k3s
* Download and install nfs-subdir-external-provisioner Helm chart
* Display K3S token (to be used during second node installation phase)

### 4. On the second node

Execute on the second node: `./k3sconfig.sh second <token>`

* `<token>` is the string displayed at the end of the `k3sconfig.sh first` execution on the first node.

The script will:

* Make sure the k3s module is started as prim on the first node
* Install k3s on the second node
* Start the k3s module

### 5. Check that the k3s SafeKit module is running on both nodes

Check with this command on both nodes: `/opt/safekit/safekit -H "*" state`

The reply should be similar to the following output.

```
/opt/safekit/safekit -H "*" state
---------------- Server=http://10.0.0.20:9010 ----------------
admin action=exec
--------------------- k3s State ---------------------

  Local  (127.0.0.1)    : PRIM (Service : Available)(Color : Green)
Success
---------------- Server=http://10.0.0.21:9010 ----------------
admin action=exec
--------------------- k3s State ---------------------

  Local  (127.0.0.1)    : SECOND (Service : Available)(Color : Green)
Success
```
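
For scripted checks, the state output can be parsed instead of read by eye. The sketch below assumes the output keeps the layout shown above (one `Local ... : STATE (Service : Available)` line per server); `check_mirror_state` is a hypothetical helper, and the exact output format may differ between SafeKit versions.

```shell
# Reads the output of `safekit -H "*" state` on standard input.
# Succeeds only if exactly one node reports PRIM and one reports SECOND,
# both with "Service : Available" (format assumed from the sample output).
check_mirror_state() {
    awk '
        /Local/ && /: PRIM /   && /Service : Available/ { prim++ }
        /Local/ && /: SECOND / && /Service : Available/ { second++ }
        END { exit !(prim == 1 && second == 1) }
    '
}

# Typical use on either node:
#   /opt/safekit/safekit -H "*" state | check_mirror_state && echo "mirror is PRIM/SECOND"
```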

### 6. Start the SafeKit web console to administer the cluster

* Connect a browser to the SafeKit web console url `http://server0-IP:9010`.
* You should see a page similar to the image.
* Check with Linux command lines that K3S is started on both nodes (started in `start_prim` and `start_second`) and that MariaDB is started on the primary node (started in `start_prim`).
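
The command-line check of K3S and MariaDB can be sketched as follows. The unit names are assumptions: `k3s` is the service name used elsewhere in this guide, and `mariadb` is the unit normally installed by the Ubuntu `mariadb-server` package. `report_units` is a hypothetical helper.

```shell
# Print the systemd state of each unit passed as an argument.
report_units() {
    for unit in "$@"; do
        if systemctl is-active --quiet "$unit"; then
            echo "$unit: active"
        else
            echo "$unit: not active"
        fi
    done
}

# On the primary node:    report_units k3s mariadb
# On the secondary node:  report_units k3s     (MariaDB runs on the primary only)
```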

[![Kubernetes cluster started in the SafeKit web console](https://safekit.eviden.com/wp-content/uploads/2024/03/14-monitoring-prim-second.png)](/wp-content/uploads/2024/03/14-monitoring-prim-second.png)

### 7. Testing

* Stop the PRIM node by scrolling down its contextual menu and clicking `Stop`.
* Verify that there is a failover on the SECOND node which should become ALONE (green).
* With Linux command lines, check the failover of services (stopped on node 1 in the `stop_prim` script and started on node 2 in the `start_prim` script). MariaDB and K3S should run on node 2.

![Warning](https://safekit.eviden.com/wp-content/uploads/2022/07/warning-small.png)

If ALONE (green) is not reached on node2, analyze why with the module log of node 2.

* Click on `node2` to display the module log.
* See this [example of a SQL Server module log](/wp-content/uploads/2024/03/24-module-log-script.png) where the service name in `start_prim` is invalid: the sqlserver.exe process is monitored, but because it is never started, the module eventually stops.

![Warning](https://safekit.eviden.com/wp-content/uploads/2022/07/warning-small.png)

If everything is okay, initiate a start on node1, which will resynchronize the replicated directories from node2.

If things go wrong, stop node2 and [force the start as primary](/wp-content/uploads/2024/03/10-monitoring-mirror-stop-stop-prim.png) of node1, which will restart with its locally healthy data at the time of the stop.

![Note](https://safekit.eviden.com/wp-content/uploads/2022/07/note.png)

[More information on tests in the User's Guide.](https://safekit.eviden.com/products/high-availability-software-for-application-clustering/safekit-user-guide-82/#tests)

[![Stop the module on the PRIM server](https://safekit.eviden.com/wp-content/uploads/2024/03/16-monitoring-prim-second-stop.png)](/wp-content/uploads/2024/03/16-monitoring-prim-second-stop.png)

### 8. Try the cluster with a Kubernetes application like WordPress

This step uses a WordPress installation as an example: a web portal with a backend database, both implemented by pods.

You can deploy your own application in the same way.

WordPress is automatically highly available:

* with its data (php + database) in persistent volumes replicated in real-time by SafeKit
* with a virtual IP address to access the WordPress site for users
* with automatic failover and automatic failback

Notes:

* The WordPress chart defines a load-balanced service that listens on the `<service.port>` and `<service.httpsport>` ports.
* WordPress is accessible through the URL `http://<virtual-ip>:<service.port>`.
* The virtual IP is managed by SafeKit and automatically switched in case of failover.
* By default, K3S implements load balancers with Klipper.
* Klipper listens on `<virtual-ip>:<service.port>` and routes the TCP/IP packets to the IP address and port of the WordPress pod that it has selected.

```
$ export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm install my-release bitnami/wordpress --set global.storageClass=nfs-client --set service.ports.http=8099,service.ports.https=4439
```

The previous helm command should download the WordPress image from `registry-1.docker.io`. You may encounter authentication issues on `registry-1.docker.io`. In this case, you should:

* Create `/etc/rancher/k3s/registries.yaml` on both nodes with the following content:

  ```
  configs:
    "registry-1.docker.io":
      auth:
        username: your_user_name
        password: your_password
      tls:
        insecure_skip_verify: true
  ```
* Stop and start k3s so that the file is taken into account: `systemctl stop k3s`, then `systemctl start k3s`.
* Execute `helm registry login -u your_user_name docker.io`, then enter your password
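
Once the release is deployed, a quick way to confirm that WordPress answers on the virtual IP is to poll the URL until it returns a successful HTTP status. This is a minimal sketch: `wait_for_http` is a hypothetical helper, `192.0.2.10` is a placeholder for your virtual IP, and 8099 is the HTTP port set in the helm command above.

```shell
# Poll a URL until it answers with a 2xx or 3xx HTTP status, or give up.
# Usage: wait_for_http <url> [attempts] [delay_seconds]
wait_for_http() {
    url="$1"; attempts="${2:-30}"; delay="${3:-10}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # curl prints 000 when the connection fails; keep going in that case
        code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url") || code=000
        case "$code" in
            2??|3??) echo "reachable (HTTP $code)"; return 0 ;;
        esac
        i=$((i + 1))
        sleep "$delay"
    done
    echo "not reachable after $attempts attempts"
    return 1
}

# Example (placeholder virtual IP):
#   wait_for_http "http://192.0.2.10:8099/"
```

Running the same check again after the failover test in step 7 shows whether the site stays reachable through the virtual IP.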

### 9. Support

* To get support, take 2 SafeKit `Snapshots` (2 .zip files), one per node.

![Note](https://safekit.eviden.com/wp-content/uploads/2022/07/note.png)

[Troubleshooting in the User's Guide.](https://safekit.eviden.com/products/high-availability-software-for-application-clustering/safekit-user-guide-82/#Troubleshooting)

[![Take the snapshots for support](https://safekit.eviden.com/wp-content/uploads/2024/03/30-snapshots-mirror.png)](/wp-content/uploads/2024/03/30-snapshots-mirror.png)

### 10. If necessary, configure a splitbrain checker

* See below ["What are the different scenarios in case of network isolation in a cluster?"](#isolation) to know if you need to configure a splitbrain checker.
* In the module configuration, click on `Advanced Configuration` (see image) to edit `userconfig.xml`.
* Declare the splitbrain checker by adding in the `<check>` section of `userconfig.xml`:

  ```
  <service>
   ...
   <check>
    ...
    <splitbrain ident="witness" exec="ping" arg="witness IP"/>
   </check>
  ```
* `Save and apply` the new configuration to redeploy the modified userconfig.xml file on both nodes (module must be stopped on both nodes to save and apply).

**Parameters**:

* `ident="witness"` identifies the witness with a resource name: `splitbrain.witness`. You can change this value to identify the witness.
* `exec="ping"` references the ping code to execute. Do not change this value.
* `arg="witness IP"` is an argument for the ping. Change this value with the IP of the witness (a robust element, typically a router).
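
Because the witness must be reachable from both nodes, it can be worth testing it with `ping` before declaring it in `userconfig.xml`. The sketch below is an illustration only: `check_witness` is a hypothetical helper and `192.0.2.1` is a placeholder address. The `-c` and `-W` options are those of the Linux iputils `ping`.

```shell
# Check that a candidate witness answers ping (3 probes, 2-second timeout each).
# Usage: check_witness <witness_ip>
check_witness() {
    if ping -c 3 -W 2 "$1" >/dev/null 2>&1; then
        echo "witness $1 reachable"
    else
        echo "witness $1 NOT reachable"
        return 1
    fi
}

# Run on each node, with your own witness address:
#   check_witness 192.0.2.1
```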

[![Enter the parameters](https://safekit.eviden.com/wp-content/uploads/2024/03/05-module-mirror-edit-config.png)](/wp-content/uploads/2024/03/05-module-mirror-edit-config.png)

## What are the different scenarios in case of network isolation in a cluster?

### A single network

* ![SafeKit primary secondary](https://safekit.eviden.com/wp-content/uploads/2023/02/safekit-prim-second-300.png)
* ![SafeKit alone alone](https://safekit.eviden.com/wp-content/uploads/2023/02/safekit-alone-alone-300.png)

When there is a network isolation, the default behavior is:

* as heartbeats are lost for each node, each node goes to ALONE and runs the application with its virtual IP address (double execution of the application modifying its local data),
* when the isolation is repaired, one ALONE node is forced to stop and to resynchronize its data from the other node,
* at the end the cluster is PRIM-SECOND (or SECOND-PRIM, according to the duplicate virtual IP address detection made by Windows).

### Two networks with a dedicated replication network

* ![SafeKit primary secondary](https://safekit.eviden.com/wp-content/uploads/2023/02/safekit-prim-second-300.png)
* ![SafeKit failover](https://safekit.eviden.com/wp-content/uploads/2023/02/safekit-prim-second-300.png)

When there is a network isolation, the behavior with a dedicated replication network is:

* a dedicated replication network is implemented on a private network,
* heartbeats on the production network are lost (isolated network),
* heartbeats on the replication network are working (not isolated network),
* the cluster stays in PRIM/SECOND state.

### A single network and a splitbrain checker

* ![SafeKit primary secondary](https://safekit.eviden.com/wp-content/uploads/2023/02/safekit-prim-second-300.png)
* ![SafeKit alone wait](https://safekit.eviden.com/wp-content/uploads/2023/02/safekit-alone-wait-300.png)

When there is a network isolation, the behavior with a split-brain checker is:

* a split-brain checker has been configured with the IP address of a witness (typically a router),
* the split-brain checker operates when a server goes from PRIM to ALONE or from SECOND to ALONE,
* in case of network isolation, before going to ALONE, both nodes test the IP address,
* the node which can access the IP address goes to ALONE, the other one goes to WAIT,
* when the isolation is repaired, the WAIT node resynchronizes its data and becomes SECOND.

Note: If the witness is down or disconnected, both nodes go to WAIT and the application is no longer running. That's why you must choose a robust witness like a router.
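
The outcomes described in this section can be restated as a small decision function. This is only a summary of the bullets and the note above, not SafeKit code; `splitbrain_outcome` is a hypothetical name, and the case where both isolated nodes still reach the witness is not described in this section, so the sketch reports it as such.

```shell
# Arguments: whether node 1 and node 2 can reach the witness ("yes" or "no").
# Prints the state of node 1 and node 2 during the network isolation.
splitbrain_outcome() {
    case "$1/$2" in
        yes/no) echo "ALONE WAIT" ;;
        no/yes) echo "WAIT ALONE" ;;
        no/no)  echo "WAIT WAIT" ;;   # witness down: application stopped on both nodes
        *)      echo "not described in this guide" ;;
    esac
}

splitbrain_outcome yes no   # prints: ALONE WAIT
splitbrain_outcome no no    # prints: WAIT WAIT
```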

## 🔍 SafeKit High Availability Navigation Hub

Explore SafeKit: Features, technical videos, documentation, and free trial

| Resource Type | Description | Direct Link |
| --- | --- | --- |
| **Key Features** | Why Choose SafeKit for Simple and Cost-Effective High Availability? | [See Why Choose SafeKit for High Availability](https://safekit.eviden.com/products/high-availability-software-for-application-clustering/#why-choose-safekit-for-ha "Discover SafeKit features for simple and cost-effective high availability") |
| **Use Cases** | Explore How SafeKit Ensures the High Availability of Critical Infrastructure | [See All Use Cases (OEM Software, Edge Servers, SCADA, and more)](https://safekit.eviden.com/products/high-availability-software-for-application-clustering/#safekit-use-cases "Explore SafeKit high availability use cases") |
| **Deployment Model** | All-in-One SANless HA: Shared-Nothing Software Clustering | [See SafeKit All-in-One SANless HA](https://safekit.eviden.com/products/high-availability-software-for-application-clustering/#all-in-one-sanless-ha "Learn about all-in-one SANless high availability with shared-nothing software clustering") |
| **HA Strategies** | SafeKit: Infrastructure (VM) vs. Application-Level High Availability | [See SafeKit HA & Redundancy: VM vs. Application Level](https://safekit.eviden.com/products/high-availability-software-for-application-clustering/#safekit-ha-redundancy-choices "Compare VM-level redundancy with SafeKit application-level high availability strategies") |
| **Technical Specifications** | Technical Limitations for SafeKit Clustering | [See SafeKit High Availability Limitations](https://safekit.eviden.com/products/high-availability-software-for-application-clustering/#safekit-ha-limitations "Technical requirements and limitations for SafeKit application clustering") |
| **Proof of Concept** | SafeKit: High Availability Configuration & Failover Demos | [See SafeKit Failover Tutorials](https://safekit.eviden.com/products/high-availability-software-for-application-clustering/#safekit-failover-tutorials "Step-by-step videos on SafeKit high availability, from installation to automated failover") |
| **Architecture** | How the SafeKit Mirror Cluster works (Real-Time Replication & Failover) | [See SafeKit Mirror Cluster: Real-Time Replication & Failover](https://safekit.eviden.com/products/high-availability-software-for-application-clustering/#safekit-mirror-cluster "See technical architecture and failover mechanism of SafeKit Mirror Cluster") |
| **Architecture** | How the SafeKit Farm Cluster works (Network Load Balancing & Failover) | [See SafeKit Farm Cluster: Network Load Balancing & Failover](https://safekit.eviden.com/products/high-availability-software-for-application-clustering/#safekit-farm-cluster "Technical overview of SafeKit Farm Cluster architecture with network load balancing") |
| **Competitive Advantages** | Comparison: SafeKit vs. Traditional High Availability (HA) Clusters | [See SafeKit vs. Traditional HA Cluster Comparison](https://safekit.eviden.com/products/high-availability-software-for-application-clustering/#safekit-ha-comparison "Detailed comparison of SafeKit software vs traditional hardware-based HA clusters") |
| **Technical Resources** | SafeKit High Availability: Documentation, Downloads & Trial | [See SafeKit HA Free Trial & Technical Documentation](https://safekit.eviden.com/products/high-availability-software-for-application-clustering/#safekit-ha-technical-resources "Access SafeKit free trial, technical documentation, and high availability white papers") |
| **Pre-configured Solutions** | SafeKit Application Module Library: Ready-to-Use HA Solutions | [See SafeKit High Availability Application Modules](https://safekit.eviden.com/products/high-availability-software-for-application-clustering/#safekit-ha-application-modules "Browse the library of pre-configured SafeKit modules for automated application failover") |

