Saturday, April 4, 2026

Deep Dive into Kubernetes Networking

 


A Kubernetes cluster is a dynamic network. Pods are ephemeral: a Pod's IP address changes on every restart.

Containers within a Pod share a single network namespace.

The Kubernetes Networking Model:

1) Every Pod receives a unique, cluster-wide IP address.

2) All Pods on the same node can communicate directly without NAT.

3) All Pods on different nodes can communicate directly without NAT.

4) The IP a Pod sees for itself is the same IP other Pods use to reach it [flat network].

Kubernetes specifies what is required; CNI plugins decide how to implement it.

Communication Patterns in Kubernetes:

Container to Container - within the same Pod, via loopback [127.0.0.1]

Pod to Pod - direct IP communication across nodes without address translation

Pod to Service - kube-proxy intercepts traffic and load-balances it to healthy endpoints

External to Service - exposed via NodePort, the LoadBalancer Service type, or an Ingress controller

Node to Pod - kubelet health checks and monitoring agents
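The first pattern can be illustrated outside a cluster. Because two containers in one Pod share a network namespace, they reach each other on 127.0.0.1; this sketch simulates that with two threads in a single process (a "sidecar" listener and an "app" client standing in for the two containers):

```python
# Two containers in one Pod share a network namespace, so they can talk
# over loopback. Here a "sidecar" thread listens on 127.0.0.1 and an
# "app" thread's role is played by the main thread connecting to it.
import socket
import threading

def sidecar(server: socket.socket) -> None:
    conn, _ = server.accept()          # accept the "app" container's connection
    with conn:
        data = conn.recv(1024)
        conn.sendall(b"pong:" + data)  # echo back with a prefix

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))          # loopback only, ephemeral port
server.listen(1)
port = server.getsockname()[1]

t = threading.Thread(target=sidecar, args=(server,))
t.start()

# The "app" reaches the sidecar on localhost; no Pod IP is needed.
with socket.create_connection(("127.0.0.1", port)) as c:
    c.sendall(b"ping")
    reply = c.recv(1024)

t.join()
server.close()
print(reply.decode())  # -> pong:ping
```

In a real Pod the two sides would be separate containers, but the loopback mechanics are identical.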


Kube-Proxy:

kube-proxy runs on every node, typically as a DaemonSet. It watches the API server for changes to Services and their endpoints; when a Service's selector matches running Pods, the control plane creates the corresponding endpoint objects. In iptables mode, kube-proxy maintains chains of iptables rules on each node that intercept Service traffic and forward it to backend Pods. In IPVS mode it uses the kernel's IP Virtual Server, a kernel-level virtual load balancer that can handle thousands of Services and route their traffic at the same time with better performance than long iptables chains.
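The proxy mode is selected in the kube-proxy configuration, usually stored in the kube-proxy ConfigMap in the kube-system namespace. A sketch of switching to IPVS mode (defaults vary by distribution):

```yaml
# Excerpt of a KubeProxyConfiguration; an empty mode defaults to iptables.
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"       # "iptables" or "ipvs"
ipvs:
  scheduler: "rr"  # round-robin; IPVS supports other schedulers too
```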

Pod-to-Service traffic is handled by kube-proxy; Pod-to-Pod communication is handled by the CNI plugin.
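As an illustration of the Service side, a minimal manifest (the names are hypothetical): when the selector below matches running Pods, the control plane records their Pod IPs as endpoints, and kube-proxy programs each node to forward Service traffic to them.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web            # hypothetical Service name
spec:
  selector:
    app: web           # Pods with this label become endpoints
  ports:
    - port: 80         # ClusterIP port clients connect to
      targetPort: 8080 # container port on the backend Pods
```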

CoreDNS:

CoreDNS is the cluster DNS server, deployed as a Deployment in the kube-system namespace.
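CoreDNS behavior is driven by its Corefile, stored in the coredns ConfigMap in kube-system. A typical default looks roughly like this (exact plugins vary by version and distribution):

```
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
}
```

The kubernetes plugin answers queries for Service and Pod records; everything else is forwarded upstream.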

Every Pod's /etc/resolv.conf is injected by the kubelet to point at CoreDNS.
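With the default dnsPolicy (ClusterFirst), the injected /etc/resolv.conf typically looks like the following; the nameserver is the kube-dns Service ClusterIP (commonly 10.96.0.10), and the search domains here assume a Pod in the default namespace of a cluster using the default cluster.local domain:

```
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
```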

Pod Networking:

Each Pod has its own network namespace and a fully isolated network stack. The namespace contains virtual NICs, a routing table, and iptables rules.

The infra [pause] container creates and owns the network namespace for the Pod. All application containers in the Pod join the infra container's namespace at startup.

Virtual Ethernet [veth] pair: two virtual NICs connecting the Pod side and the node side. One end lives inside the Pod's network namespace [eth0] and the other end is attached to the node, typically to a Linux bridge [cbr0].

Traffic flow: Pod [eth0] -> veth pair -> host bridge -> node routing table -> destination

Cross-Node Communication:

Pod traffic between nodes uses either an overlay approach or an underlay approach.

The overlay approach encapsulates traffic on the source node and decapsulates it on the destination node.

The underlay approach routes Pod IPs directly over the physical network.

Modern CNIs like Calico and Cilium support both approaches.

Overlay (VXLAN/Geneve) - universally compatible and cloud-friendly. Encapsulation adds roughly 50 bytes of overhead per packet, so the Pod MTU is typically lowered to 1450 on a standard 1500-byte network.

Underlay - requires the physical network to accept and route Pod IPs, typically via BGP routing.
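In Calico, for example, the choice between overlay and underlay is made per IP pool. A sketch, with a hypothetical Pod CIDR:

```yaml
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 10.244.0.0/16   # hypothetical Pod CIDR
  vxlanMode: Always     # overlay: encapsulate all cross-node traffic
  ipipMode: Never       # IP-in-IP is an alternative encapsulation
  natOutgoing: true     # SNAT traffic leaving the cluster
```

Setting vxlanMode and ipipMode to Never and peering Calico with the fabric over BGP gives the underlay approach instead.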
