kubernetes-pod-migration

Live Pod Migration in Kubernetes

What Are Containers?

Many modern applications run inside containers, which package software and its dependencies into a portable, consistent execution environment. Unlike virtual machines, containers share the host OS kernel instead of running their own, which makes them lightweight and efficient.

Why Kubernetes?

Managing hundreds or thousands of containers by hand is impractical. Kubernetes automates container deployment, scaling, and orchestration across clusters of machines. Its smallest deployable unit is the Pod: a group of one or more containers that share network and storage resources and are always scheduled together on the same node.

However, Kubernetes does not natively support live pod migration, that is, relocating a running Pod from one node to another without downtime. This is a major limitation for stateful workloads such as AI inference, databases, and real-time analytics.


Why Does Live Pod Migration Matter?

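Currently, when a Pod must be moved because of node failures, autoscaling, or resource rebalancing, Kubernetes follows a terminate-and-recreate model. The existing Pod is terminated, and its controller (for example, a Deployment or StatefulSet) creates a replacement Pod. The scheduler places that replacement on another node, where it starts from scratch.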

For stateful applications, this causes downtime, performance degradation, and the loss of any state held only in memory. Live migration would:


✔ Preserve application state, network connections, and execution progress.

✔ Enable smoother autoscaling and resource optimization.

✔ Improve fault tolerance by moving workloads off failing or draining nodes without service interruption.
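The following toy Python model contrasts the two approaches. It is not Kubernetes code. A Pod is reduced to a name, a node, and a dictionary of in-memory state. The names `Pod`, `terminate_and_recreate`, and `checkpoint_and_restore` are hypothetical and exist only for illustration.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Pod:
    """Toy stand-in for a Pod: its name, its node, and state held only in memory."""
    name: str
    node: str
    state: dict = field(default_factory=dict)


def terminate_and_recreate(pod: Pod, target_node: str) -> Pod:
    # Current model: the old Pod is deleted and a fresh one is started from
    # its spec, so anything that lived only in memory is lost.
    return Pod(name=pod.name, node=target_node)


def checkpoint_and_restore(pod: Pod, target_node: str) -> Pod:
    # Migration model: snapshot the running state, then resume from that
    # snapshot on the target node.
    snapshot = copy.deepcopy(pod.state)
    return Pod(name=pod.name, node=target_node, state=snapshot)


source = Pod("session-cache", "node-a", {"active_sessions": 1042})
recreated = terminate_and_recreate(source, "node-b")
migrated = checkpoint_and_restore(source, "node-b")
print(recreated.state)  # {}
print(migrated.state)   # {'active_sessions': 1042}
```

In a real system, the snapshot step means checkpointing a container's processes and memory, for example with CRIU, which Kubernetes checkpointing builds on. That is far more involved than copying a dictionary.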


How Kubernetes Handles Pod Movement Today

Default Rescheduling (Current Behavior)

Forensic Checkpointing (Introduced in Kubernetes v1.25)
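Forensic container checkpointing is exposed as a kubelet endpoint, `POST /checkpoint/{namespace}/{pod}/{container}`. It requires the `ContainerCheckpoint` feature gate and a container runtime with checkpoint support. The sketch below only builds that URL. The helper name `checkpoint_url` is hypothetical, and the sketch assumes the kubelet listens on its default port 10250. Sending the request also requires an authenticated HTTPS POST to the kubelet, which is left out here.

```python
from typing import Optional
from urllib.parse import quote, urlencode

KUBELET_DEFAULT_PORT = 10250  # default kubelet HTTPS port (assumed not overridden)


def checkpoint_url(node_address: str, namespace: str, pod: str, container: str,
                   timeout_seconds: Optional[int] = None,
                   port: int = KUBELET_DEFAULT_PORT) -> str:
    """Build the kubelet checkpoint URL for one container (hypothetical helper).

    Only constructs the URL; it does not authenticate or send the POST request.
    """
    # Percent-encode each path segment so unusual names cannot break the path.
    segments = (quote(part, safe="") for part in (namespace, pod, container))
    url = f"https://{node_address}:{port}/checkpoint/" + "/".join(segments)
    if timeout_seconds is not None:
        # Optional query parameter bounding how long checkpoint creation may take.
        url += "?" + urlencode({"timeout": timeout_seconds})
    return url


print(checkpoint_url("10.0.0.5", "default", "redis-0", "redis"))
# https://10.0.0.5:10250/checkpoint/default/redis-0/redis
```

By default, the kubelet writes the resulting archive as a tar file under `/var/lib/kubelet/checkpoints/`. The feature was designed for forensic analysis, and Kubernetes itself offers no API for restoring a Pod from that archive.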


Third-Party Solutions

No Kubernetes-native solution for live pod migration exists today.
Our research aims to close this gap.


Our Research: Building a Kubernetes-Native Live Migration System

We analyzed Kubernetes Enhancement Proposals (KEPs) to understand how new features integrate with the ecosystem.

Key Components

What We Built

A proof-of-concept live migration system that:

The design document for one of the PoCs is here: https://docs.google.com/document/d/1n4tEj2LaNzL7lkq6jqTy4O-3v2dTfrn0lIdWDaeifnM/edit?tab=t.0#heading=h.nxlv0abv8hql
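The linked document describes the actual PoC design. The sketch below is a generic illustration only and does not reproduce that design. It shows the ordering a checkpoint-based migration typically needs:

1. Checkpoint on the source node.
2. Transfer the archive.
3. Restore on the target node.
4. Delete the original only after the restored copy is ready.

`MigrationSteps`, `migrate`, and every hook are hypothetical; a real implementation would back them with the kubelet, the API server, and some file-transfer mechanism. The sketch also ignores state changes made after the checkpoint is taken, which a real live-migration system must freeze or otherwise reconcile.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MigrationSteps:
    """Hypothetical hooks standing in for kubelet, API server, and transfer calls."""
    checkpoint: Callable[[str, str], str]     # (pod, source_node) -> archive path
    transfer: Callable[[str, str, str], str]  # (archive, source, target) -> path on target
    restore: Callable[[str, str, str], str]   # (pod, target_node, archive) -> new pod name
    wait_ready: Callable[[str, float], bool]  # (new pod, timeout seconds) -> became ready?
    delete: Callable[[str], None]             # delete a pod by name


def migrate(pod: str, source: str, target: str,
            steps: MigrationSteps, timeout: float = 60.0) -> str:
    """Run checkpoint -> transfer -> restore, then retire the original pod."""
    archive = steps.checkpoint(pod, source)
    remote_archive = steps.transfer(archive, source, target)
    new_pod = steps.restore(pod, target, remote_archive)
    if not steps.wait_ready(new_pod, timeout):
        # Roll back: remove the failed replica and leave the original running.
        steps.delete(new_pod)
        raise RuntimeError(
            f"restored pod {new_pod} not ready on {target}; kept {pod} on {source}")
    # Only delete the original once the restored copy is ready to serve.
    steps.delete(pod)
    return new_pod
```

The original Pod keeps running until the restored copy reports ready, so a failed restore leaves the workload where it was instead of causing an outage.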


Next Steps & Research Directions

We are currently improving:

Join the discussion and contribute to the project! 🚀