Going Beyond Limits: Scalability Test CI for Kubernetes CNI Operator with Simulated Cluster

Sunyanan Choochotkaew
9 min read · Feb 16, 2023

CNI (Container Network Interface) is a framework for configuring Linux container network interfaces for Kubernetes Pods. It requires placing an executable file that implements the CNI on every Node. To do that, and to enable pod-to-pod communication, most CNI projects adopt the operator framework and let a controller do the magic. When the cluster grows larger, the controller can become a bottleneck. That's why we need a scalability test CI in CNI operator development.

Yet, do we really need to provision hundreds of nodes to run the test?

Multi-NIC CNI

Multi-NIC CNI operator is an open source project that I'm working on. This CNI helps deal with multi-network complexity and dynamicity by leveraging Multus CNI. Like other CNI projects, it is composed of a controller, a daemon process, and a CNI executable file.

The Multi-NIC CNI controller runs the daemon process and mounts the CNI binary onto every node via a Kubernetes DaemonSet. The controller communicates with the daemon to automatically discover the secondary interfaces of each host and creates a HostInterface custom resource. Based on the MultiNicNetwork definition, it calculates CIDR ranges for each host interface and creates CIDR and IPPool custom resources. When a pod is created or deleted, the CNI binary is executed. The CNI communicates with the daemon to select network device interfaces (NICs) and to allocate or deallocate IPs from the IPPool. These communications are shown in Fig. 1.

Fig 1. Multi-NIC CNI System Design
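For readers who prefer code to diagrams, here is a rough Go sketch of the two communication paths in Fig. 1: the controller asking the daemon for interface discovery, and the CNI asking the daemon for NIC selection and IP (de)allocation. The method names and types below are purely illustrative assumptions, not the project's actual daemon API.

// Illustrative sketch only: method names and payload types are assumptions,
// not the real Multi-NIC CNI daemon API.
package sketch

import "context"

// InterfaceInfo describes one secondary interface reported by a daemon.
type InterfaceInfo struct {
	InterfaceName string
	NetAddress    string
	HostIP        string
}

// DaemonClient summarizes the two directions of communication in Fig. 1.
type DaemonClient interface {
	// Controller side: discover the host's secondary interfaces
	// (the data used to create a HostInterface custom resource).
	GetInterfaces(ctx context.Context) ([]InterfaceInfo, error)
	// CNI side: select NICs and allocate/deallocate IPs from the IPPool.
	SelectNICs(ctx context.Context, podName, podNamespace string) ([]string, error)
	Allocate(ctx context.Context, podName, podNamespace string) (map[string]string, error)
	Deallocate(ctx context.Context, podName, podNamespace string) error
}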

kwok (Kubernetes WithOut Kubelet) and the system design for the scalability test

kwok (Kubernetes WithOut Kubelet) is a toolkit that can simulate a large number of fake nodes and pods without any kubelet process actually running.

The Multi-NIC CNI controller uses the spec.nodeName and status.hostIP attributes of each daemon pod to map the daemon to its host. These attributes are automatically set by the Kubernetes control plane and are immutable. Accordingly, to test the controller's scalability on discovery and CIDR calculation, we need daemon pods with many different node names and host IPs.
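For illustration, here is a minimal client-go sketch of how such a node-name-to-host-IP mapping can be derived from daemon pods. This is not the controller's actual code; the namespace and label selector are the ones that appear in the demo output later in this post.

// Sketch only: derive a nodeName -> hostIP map from the daemon pods.
package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func daemonHostMap(ctx context.Context, client kubernetes.Interface) (map[string]string, error) {
	// Namespace and selector as used in the demo's kubectl queries below.
	pods, err := client.CoreV1().Pods("multi-nic-cni-operator-system").List(ctx, metav1.ListOptions{
		LabelSelector: "app=multi-nicd",
	})
	if err != nil {
		return nil, err
	}
	hostMap := map[string]string{}
	for _, pod := range pods.Items {
		// Both fields are set by the control plane and are immutable,
		// which is exactly why the test needs many distinct fake values.
		hostMap[pod.Spec.NodeName] = pod.Status.HostIP
	}
	return hostMap, nil
}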

With kwok, we can create a number of fake nodes and configure the Multi-NIC CNI daemon (multi-nicd) DaemonSet to be scheduled onto those fake nodes. The test cluster is designed as shown in Fig. 2. First, the desired number of daemon-stub pods is created on the real node. This stub component serves the same API as the real daemon implementation, returning a couple of fake secondary-interface entries. Fake nodes are then created with the daemon-stub pod IPs. Thus, any request sent to a discovered multi-nicd daemon is actually served by the corresponding daemon-stub.

Fig 2. Scalability test design using kwok for Multi-NIC CNI operator
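To make the daemon-stub idea concrete, here is a minimal sketch of such a stub: a small HTTP server that answers interface discovery with fixed fake secondary interfaces. The endpoint path, port, and response fields are assumptions for illustration and do not reflect the real daemon or stub implementation.

// daemon-stub sketch: endpoint, port, and payload shape are assumptions.
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

type fakeInterface struct {
	InterfaceName string `json:"interfaceName"`
	NetAddress    string `json:"netAddress"`
	HostIP        string `json:"hostIP"`
}

func main() {
	http.HandleFunc("/interface", func(w http.ResponseWriter, r *http.Request) {
		// Every stub reports the same couple of fake secondary interfaces;
		// the controller only needs consistent-looking discovery data.
		resp := []fakeInterface{
			{InterfaceName: "eth1", NetAddress: "10.242.0.0/24", HostIP: "10.242.0.1"},
			{InterfaceName: "eth2", NetAddress: "10.242.1.0/24", HostIP: "10.242.1.1"},
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(resp)
	})
	log.Fatal(http.ListenAndServe(":11000", nil))
}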

Modifications to kwok

With the current release of kwok (v0.0.1), podIP is basically set by go-cni and hostIP is statically set to a single value:

funcMap = template.FuncMap{
	"NodeIP": func() string {
		return n.nodeIP
	},
	"PodIP": func() string {
		return n.ipPool.Get()
	},
}

However, the CNI daemon needs to share the host network (i.e., podIP and hostIP must be set to the same value as the corresponding fake node's IP). To make the status.hostIP of a fake multi-nicd pod comply with its spec.nodeName, we need to modify the kwok NodeController to save a mapping from node names to node IPs, and the PodController to apply that mapping. The key modifications are shown below.

// key modifications on node_controller.go

func (c *NodeController) WatchNodes(ctx context.Context, ch chan<- string, opt metav1.ListOptions) error {
	...
		switch event.Type {
		case watch.Added, watch.Modified:
			node := event.Object.(*corev1.Node)
			if c.needHeartbeat(node) {
				c.nodesSets.Put(node.Name)
				// add default nodeIP to map
				c.nodeIPStatus.SetCache(node.Name, nil)
				if c.needLockNode(node) {
					ch <- node.Name
				}
			}
		case watch.Deleted:
			node := event.Object.(*corev1.Node)
			if c.nodesSets.Has(node.Name) {
				// delete nodeIP from map
				c.nodeIPStatus.UnsetCache(node.Name)
				c.nodesSets.Delete(node.Name)
			}
		}
	}
	...
}

// key modifications on pod_controller.go

func (c *PodController) configurePod(pod *corev1.Pod) ([]byte, error) {
	...
	podFuncMap := template.FuncMap{}
	// update NodeIP/PodIP (host network) from map
	hostIP := c.nodeIPStatus.GetCache(pod.Spec.NodeName)
	if hostIP != nil {
		podFuncMap["NodeIP"] = func() string {
			return hostIP.(string)
		}
		if pod.Spec.HostNetwork {
			podFuncMap["PodIP"] = func() string {
				return hostIP.(string)
			}
		} else {
			podFuncMap["PodIP"] = func() string {
				return c.ipPool.Get()
			}
		}
	}
	...
	patch, err := configurePod(pod, c.podStatusTemplate, podFuncMap)
	...
}

(see the full modification)

Demo with 100-scale simulated cluster

Here I demonstrate how kwok can be used for 100-scale testing of the Multi-NIC CNI operator on a single-node kind cluster. Check out the script and templates here.

Step 1 Prepare Cluster

Let's start by creating a kind cluster with a large enough max-pods setting. As one fake node requires at least 4 service pods (kindnet, kube-proxy, daemon-stub, multi-nicd), we set max-pods to 1000.

# ./deploy/kind/kind-1000.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        max-pods: "1000"

make create-kind

Step 2 Load images and prepare controller

Next, build and load all images into the kind cluster with the build-load-images make target. Then, running the prepare-controller make target will deploy the Multi-NIC CNI controller, kwok, and the necessary resources (Multus's NetworkAttachmentDefinition).

make build-load-images
make prepare-controller

The script also patches the multi-nicd DaemonSet with the following fake image, kwok nodeSelector, and tolerations, so that the daemon pods are scheduled only onto the fake nodes.

...
image: fake-image
nodeSelector:
  type: kwok
tolerations:
- effect: NoSchedule
  key: kwok.x-k8s.io/node
  operator: Exists

Step 3 Simulate 100 nodes

./script.sh deploy_n_node 1 100

Running the script with the deploy_n_node function will (1) create multi-nicd-stub Pods and (2) use each multi-nicd-stub Pod's IP to create a fake Node.
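The actual work is done by a shell script and node templates, but the following client-go sketch shows roughly what each fake-Node creation boils down to. The annotation and field values here are illustrative assumptions; only the kwok.x-k8s.io/node taint and the type: kwok label mirror the DaemonSet patch above.

// Sketch of creating one fake Node whose internal IP is its daemon-stub pod IP.
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func createFakeNode(ctx context.Context, client kubernetes.Interface, name, stubPodIP string) error {
	node := &corev1.Node{
		ObjectMeta: metav1.ObjectMeta{
			Name:   name,
			Labels: map[string]string{"type": "kwok"}, // matches the DaemonSet nodeSelector
			// Assumed marker telling kwok to manage this node.
			Annotations: map[string]string{"kwok.x-k8s.io/node": "fake"},
		},
		Spec: corev1.NodeSpec{
			Taints: []corev1.Taint{
				{Key: "kwok.x-k8s.io/node", Effect: corev1.TaintEffectNoSchedule},
			},
		},
	}
	created, err := client.CoreV1().Nodes().Create(ctx, node, metav1.CreateOptions{})
	if err != nil {
		return err
	}
	// Node status is a subresource, so set the address in a separate update.
	// Using the daemon-stub pod IP here is what routes requests for the
	// "discovered" multi-nicd daemon to the corresponding stub.
	created.Status.Addresses = []corev1.NodeAddress{
		{Type: corev1.NodeInternalIP, Address: stubPodIP},
	}
	_, err = client.CoreV1().Nodes().UpdateStatus(ctx, created, metav1.UpdateOptions{})
	return err
}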

> kubectl get pods -n multi-nic-cni-operator-system -owide --sort-by=.metadata.name

NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
multi-nic-cni-operator-controller-manager-545fc6dff4-pb2m8 2/2 Running 0 10m 172.19.0.2 kind-1000-control-plane <none> <none>
multi-nicd-stub-1 1/1 Running 0 3m7s 10.244.0.6 kind-1000-control-plane <none> <none>
multi-nicd-stub-2 1/1 Running 0 3m7s 10.244.0.15 kind-1000-control-plane <none> <none>
multi-nicd-stub-3 1/1 Running 0 3m6s 10.244.0.11 kind-1000-control-plane <none> <none>
multi-nicd-stub-4 1/1 Running 0 3m7s 10.244.0.14 kind-1000-control-plane <none> <none>
multi-nicd-stub-5 1/1 Running 0 3m7s 10.244.0.13 kind-1000-control-plane <none> <none>
multi-nicd-stub-6 1/1 Running 0 3m7s 10.244.0.7 kind-1000-control-plane <none> <none>
multi-nicd-stub-7 1/1 Running 0 3m7s 10.244.0.12 kind-1000-control-plane <none> <none>
multi-nicd-stub-8 1/1 Running 0 3m7s 10.244.0.10 kind-1000-control-plane <none> <none>
multi-nicd-stub-9 1/1 Running 0 3m7s 10.244.0.9 kind-1000-control-plane <none> <none>
multi-nicd-stub-10 1/1 Running 0 3m7s 10.244.0.8 kind-1000-control-plane <none> <none>
...
> kubectl get nodes -owide --sort-by=.metadata.name

NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
kind-1000-control-plane Ready control-plane 46m v1.25.3 172.19.0.2 <none> Ubuntu 22.04.1 LTS 5.10.104-linuxkit containerd://1.6.9
kwok-node-1 Unknown agent 53s fake 10.244.0.6 <none> <unknown> <unknown> <unknown>
kwok-node-2 Unknown agent 51s fake 10.244.0.15 <none> <unknown> <unknown> <unknown>
kwok-node-3 Unknown agent 52s fake 10.244.0.11 <none> <unknown> <unknown> <unknown>
kwok-node-4 Unknown agent 51s fake 10.244.0.14 <none> <unknown> <unknown> <unknown>
kwok-node-5 Unknown agent 51s fake 10.244.0.13 <none> <unknown> <unknown> <unknown>
kwok-node-6 Unknown agent 53s fake 10.244.0.7 <none> <unknown> <unknown> <unknown>
kwok-node-7 Unknown agent 52s fake 10.244.0.12 <none> <unknown> <unknown> <unknown>
kwok-node-8 Unknown agent 52s fake 10.244.0.10 <none> <unknown> <unknown> <unknown>
kwok-node-9 Unknown agent 52s fake 10.244.0.9 <none> <unknown> <unknown> <unknown>
kwok-node-10 Unknown agent 52s fake 10.244.0.8 <none> <unknown> <unknown> <unknown>
...

Correspondingly, 100 fake multi-nicd pods are created, one assigned to each fake node.

> kubectl get po -n multi-nic-cni-operator-system --sort-by=.spec.nodeName -owide --selector app=multi-nicd
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
multi-nicd-4m9b6 1/1 Running 0 12m 10.244.0.6 kwok-node-1 <none> <none>
multi-nicd-d4xft 1/1 Running 0 12m 10.244.0.15 kwok-node-2 <none> <none>
multi-nicd-vzng2 1/1 Running 0 12m 10.244.0.11 kwok-node-3 <none> <none>
multi-nicd-pkrqw 1/1 Running 0 12m 10.244.0.14 kwok-node-4 <none> <none>
multi-nicd-wv6qj 1/1 Running 0 12m 10.244.0.13 kwok-node-5 <none> <none>
multi-nicd-4r9lz 1/1 Running 0 12m 10.244.0.7 kwok-node-6 <none> <none>
multi-nicd-ftn4t 1/1 Running 0 12m 10.244.0.12 kwok-node-7 <none> <none>
multi-nicd-sq4rg 1/1 Running 0 12m 10.244.0.10 kwok-node-8 <none> <none>
multi-nicd-255s5 1/1 Running 0 12m 10.244.0.9 kwok-node-9 <none> <none>
multi-nicd-vch2q 1/1 Running 0 12m 10.244.0.8 kwok-node-10 <none> <none>
...

After that, the controller detects these fake multi-nicd daemons and creates the HostInterface custom resources.

> kubectl get hostinterfaces --sort-by=.metadata.name
NAME AGE
kwok-node-1 5m16s
kwok-node-2 5m17s
kwok-node-3 5m17s
kwok-node-4 5m17s
kwok-node-5 5m17s
kwok-node-6 5m17s
kwok-node-7 5m16s
kwok-node-8 5m17s
kwok-node-9 5m16s
kwok-node-10 5m17s
...

Step 4 Deploy network definition

Let's deploy a MultiNicNetwork custom resource.

cat <<EOF | kubectl apply -f -
apiVersion: multinic.fms.io/v1
kind: MultiNicNetwork
metadata:
  name: multi-nic-sample
spec:
  subnet: "192.168.0.0/16"
  ipam: |
    {
      "type": "multi-nic-ipam",
      "hostBlock": 8,
      "interfaceBlock": 2,
      "vlanMode": "l3"
    }
  multiNICIPAM: true
  plugin:
    cniVersion: "0.3.0"
    type: ipvlan
    args:
      mode: l3
EOF

As a result, CIDR and IPPool custom resources are expected to be created by the controller.

> kubectl get cidr
NAME AGE
multi-nic-sample 8m27s
> kubectl get ippools
NAME AGE
multi-nic-sample-192.168.0.0-26 10m
multi-nic-sample-192.168.0.128-26 10m
multi-nic-sample-192.168.0.192-26 10m
...
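The /26 suffix on each IPPool is consistent with the ipam parameters above, if we read hostBlock and interfaceBlock as prefix bits carved out of the /16 subnet (my interpretation, not taken from the docs): 16 + 2 interface bits gives the /18 vlanCIDR visible in the IPPool shown later, and adding 8 host bits gives a /26 pod block per host interface. A quick sanity check:

// Sanity check of the pool prefix length, assuming hostBlock and interfaceBlock
// in the ipam config are extra prefix bits on top of the /16 subnet.
package main

import "fmt"

func main() {
	subnetPrefix := 16  // subnet: "192.168.0.0/16"
	interfaceBlock := 2 // up to 4 interfaces per host
	hostBlock := 8      // up to 256 hosts

	vlanPrefix := subnetPrefix + interfaceBlock
	poolPrefix := vlanPrefix + hostBlock
	fmt.Printf("vlanCIDR prefix: /%d, pool prefix: /%d, addresses per pool: %d\n",
		vlanPrefix, poolPrefix, 1<<(32-poolPrefix))
	// Output: vlanCIDR prefix: /18, pool prefix: /26, addresses per pool: 64
}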

The progress of the CIDR calculation can be quickly checked with the watch_network function.

./script.sh watch_network
NAME Total Available Processed Time
multi-nic-sample 100 100 100 2023-02-16T02:57:08Z

Step 5 Simulate CNI allocation request

The following command simulates adding two new Pods per fake Node for the first 10 fake Nodes.

./script.sh add_pod 1 10 1 2

After the CNI-stub job finishes, the IPPool should be updated.

# kubectl get ippool multi-nic-sample-192.168.0.0-26 -oyaml

apiVersion: multinic.fms.io/v1
kind: IPPool
metadata:
  creationTimestamp: "2023-02-16T02:57:06Z"
  finalizers:
  - finalizers.ippool.multinic.fms.io
  generation: 3
  name: multi-nic-sample-192.168.0.0-26
  resourceVersion: "6793"
  uid: a50e72fb-07d1-401b-aaf6-817baa6a3535
spec:
  allocations:
  - address: 192.168.0.1
    index: 1
    namespace: default
    pod: pod-kwok-node-1-1
  - address: 192.168.0.2
    index: 2
    namespace: default
    pod: pod-kwok-node-1-2
  excludes: []
  hostName: kwok-node-1
  interfaceName: eth0
  netAttachDef: multi-nic-sample
  podCIDR: 192.168.0.0/26
  vlanCIDR: 192.168.0.0/18

Pushed to GitHub Actions CI

With all the scripts prepared, the demo steps can easily be pushed to the CI.

# ./script.sh

test_step_scale() {
    deploy_network 8
    echo $(date -u +"%Y-%m-%dT%H:%M:%SZ")
    START=$(date +%s)
    time deploy_n_node 1 10
    time wait_n 10
    check_cidr 1 10
    time deploy_n_node 11 20
    time wait_n 20
    check_cidr 1 20
    time deploy_n_node 21 50
    time wait_n 50
    check_cidr 1 50
    time deploy_n_node 51 100
    time wait_n 100
    check_cidr 1 100
    time deploy_n_node 101 200
    time wait_n 200
    check_cidr 1 200
    test_step_clean
    export END=$(date +%s)
    echo $((END-START)) | awk '{print "Test time: "int($1/60)":"int($1%60)}'
}

# .github/workflows/integration_test.yaml
....
env:
  DAEMON_IMAGE_NAME: e2e-test/daemon-stub
  CNI_IMAGE_NAME: e2e-test/cni-stub
  CONTROLLER_IMAGE_NAME: e2e-test/multi-nic-cni-controller
  CLUSTER_NAME: kind-500
steps:
- uses: actions/checkout@v2
- name: Prepare tools
  run: |
    sudo chmod +x ./e2e-test/script.sh
- name: Tidy
  run: |
    go mod tidy
- name: Build controller
  uses: docker/build-push-action@v2
  with:
    context: .
    push: false
    tags: |
      ${{ env.CONTROLLER_IMAGE_NAME }}:latest
    file: ./Dockerfile
- name: Build daemon-stub
  uses: docker/build-push-action@v2
  with:
    context: e2e-test/daemon-stub
    push: false
    tags: |
      ${{ env.DAEMON_IMAGE_NAME }}:latest
    file: ./e2e-test/daemon-stub/Dockerfile
- name: Build cni-stub
  uses: docker/build-push-action@v2
  with:
    context: e2e-test/cni-stub
    push: false
    tags: |
      ${{ env.CNI_IMAGE_NAME }}:latest
    file: ./e2e-test/cni-stub/Dockerfile
- uses: engineerd/setup-kind@v0.5.0
  with:
    wait: 300s
    version: v0.11.1
    image: kindest/node:v1.20.7
    config: ./e2e-test/deploy/kind/kind-1000.yaml
    name: ${{ env.CLUSTER_NAME }}
- name: Load images to kind
  working-directory: ./e2e-test
  run: make load-images
- name: Prepare controller
  working-directory: ./e2e-test
  run: make prepare-controller
  shell: bash
- name: Test add/delete scale=200
  working-directory: ./e2e-test
  run: ./script.sh test_step_scale
  shell: bash
- name: Test allocate/deallocate
  working-directory: ./e2e-test
  run: ./script.sh test_allocate
  shell: bash
- name: Test taint/untaint
  working-directory: ./e2e-test
  run: ./script.sh test_taint
  shell: bash
- name: Test resilience
  working-directory: ./e2e-test
  run: ./script.sh test_resilience
  shell: bash

Fig 3. GitHub Actions workflow

Cheers !

Conclusion

With a few modifications and a stub Pod implementation, kwok can become very useful for scalability testing. Here I demonstrated one use case: a scalability test CI for the Multi-NIC CNI operator. These steps can also be adapted to other controllers that communicate with DaemonSet Pods, which scale with the number of Nodes.

Last but not least, if you're running HPC/AI workloads on a Kubernetes-like cluster and want to know more about how Multi-NIC CNI delivers network parallelism in OpenShift on IBM Cloud, check out our project, page, blog post, and video!
