What Job Interviews Taught Me About Kubernetes
The most common question in a Kubernetes technical interview is "what is a Pod?" The second most common is "what's the difference between a Deployment and a StatefulSet?" Both are valid. Both are almost irrelevant to 80% of the problems that actually break a cluster in production.
That bugged me for a long time. Until I realized that interviews don't measure what you can use — they measure what you can define fast under pressure. And that creates a deeply distorted map of Kubernetes — packed with API objects you'll barely touch, with almost no room for the operational decisions that actually matter.
My thesis: the gap between what interviews ask and what production demands is a valuable technical signal. Not about Kubernetes itself, but about which parts of the system are worth mastering first and which ones you can safely defer.
The Real Problem That Kubernetes Interviews Are Pointing At
Kubernetes has over 50 object types in its API. The average interview covers 8 to 12. The problem isn't coverage — it's selection.
The most frequently asked objects tend to be the easiest to define in one sentence:
- Pod: minimum executable unit
- Service: network abstraction over Pods
- ConfigMap / Secret: external configuration
- Ingress: HTTP routing
What almost never shows up in interviews but does show up in real postmortems:
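- PodDisruptionBudget: how many Pods can be down at the same time during voluntary disruptions like node drains and cluster upgrades (the pace of a rolling update is governed by the Deployment's `maxUnavailable` and `maxSurge` instead)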
- ResourceQuota / LimitRange: what stops a namespace from consuming more CPU or memory than expected, and what defaults a container gets when it declares none
- HorizontalPodAutoscaler with custom metrics (not CPU): scaling by message queue depth or P95 latency
- Affinity and Tolerations: why your machine learning workload ends up running on the smallest node if you don't configure `nodeSelector`
The distance between those two lists is the map I was missing when I first started taking Kubernetes seriously.
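For reference, this is roughly what the objects from the second list look like as manifests. It's a minimal sketch, not a recommended configuration. The `production` namespace, the `app: my-api` label, the quota numbers, the `accelerator` node label, the `gpu` taint key, and the image name are all hypothetical. The GPU request assumes the NVIDIA device plugin is installed on the cluster. The PDB assumes `my-api` runs at least three replicas, so one Pod at a time can be evicted during a drain. I've used `nodeSelector` for brevity; `affinity.nodeAffinity` expresses the same constraint with richer matching rules.

```yaml
# PodDisruptionBudget: during voluntary evictions (e.g. kubectl drain),
# keep at least 2 my-api Pods running. Rolling updates are paced by the
# Deployment's maxUnavailable/maxSurge, not by this object.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-api-pdb
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-api              # hypothetical label
---
# ResourceQuota: caps the total that all Pods in the namespace can claim.
# Pods that declare no cpu/memory values would be rejected by this quota;
# the LimitRange below supplies defaults so that doesn't happen.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: production
spec:
  hard:
    requests.cpu: "8"          # hypothetical numbers
    requests.memory: 16Gi
    limits.memory: 32Gi
    pods: "50"
---
# LimitRange: defaults applied to containers that don't set their own
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
  namespace: production
spec:
  limits:
    - type: Container
      defaultRequest:          # becomes resources.requests
        cpu: 250m
        memory: 256Mi
      default:                 # becomes resources.limits
        cpu: 500m
        memory: 512Mi
---
# Scheduling: only land on GPU nodes, and tolerate the taint that keeps
# ordinary Pods off them
apiVersion: v1
kind: Pod
metadata:
  name: trainer
  namespace: production
spec:
  nodeSelector:
    accelerator: nvidia-gpu    # hypothetical node label
  tolerations:
    - key: gpu                 # hypothetical taint key
      operator: Exists
      effect: NoSchedule
  containers:
    - name: trainer
      image: registry.example.com/trainer:1.0   # hypothetical image
      resources:
        limits:
          nvidia.com/gpu: 1    # requires the NVIDIA device plugin
```

If you apply these to a scratch namespace, `kubectl describe resourcequota -n production` shows how much of each quota is already used. That is usually the quickest way to see why new Pods are being rejected.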
What's Actually Worth Mastering First (And What to Look at Before Anything Else)
Let's get concrete. Before touching a production cluster — or preparing for a serious interview — there's a subset of concepts with the highest impact-to-complexity ratio:
Real Priority Checklist
# Verify a Deployment rolled out correctly
kubectl rollout status deployment/my-api -n production
# See recent events in a namespace (first place to look when something breaks)
kubectl get events -n production --sort-by='.lastTimestamp'
# See requests and limits configured on each pod in the namespace
kubectl get pods -n production -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].resources}{"\n"}{end}'
# Check if an HPA is active and what metrics it's using
kubectl get hpa -n production
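# List pods whose phase isn't Running (Pending, Failed, Unknown, plus Succeeded Job pods).
# Pods in CrashLoopBackOff still report phase Running, so also check the STATUS and RESTARTS columns of a plain kubectl get pods
kubectl get pods -n production --field-selector=status.phase!=Running
In my experience, these five commands cover most of the first diagnostics a team runs when something breaks. They're not the most sophisticated. They're the most useful.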
What to Understand Before Configuring Anything
Before you touch kubectl apply, it's worth having these straight:
- Requests vs Limits: `requests` is what the scheduler uses to decide which node a Pod lives on; `limits` is the usage ceiling. If you don't configure `requests`, the scheduler is flying blind. If you set memory `limits` too tight, the OOM killer will murder your process without warning (the container shows up as `OOMKilled`); a CPU limit that's too tight throttles the process instead of killing it.
- Liveness vs readiness probes: a failing `livenessProbe` makes the kubelet kill and restart the container; a failing `readinessProbe` pulls the Pod out of the Service's endpoints without killing it. Mixing them up is one of the quietest failures there is: either the Pod restarts in a loop, or traffic hits a container that isn't ready yet.
- Rolling update vs Recreate: the default strategy is `RollingUpdate`, but if your app can't tolerate two versions running in parallel (say, because a database migration isn't backwards-compatible), you need `Recreate` or a more explicit deploy strategy.
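Here's a sketch of how the first two points translate into a Deployment manifest. The image, port, probe paths (`/ready`, `/healthz`), and resource numbers are hypothetical placeholders; the right values come from measuring your own service.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-api
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-api
  template:
    metadata:
      labels:
        app: my-api
    spec:
      containers:
        - name: my-api
          image: registry.example.com/my-api:1.4.2   # hypothetical image
          ports:
            - containerPort: 8080
          resources:
            requests:          # what the scheduler reserves when picking a node
              cpu: 250m
              memory: 256Mi
            limits:            # over the memory limit: OOMKilled; over CPU: throttled
              cpu: "1"
              memory: 512Mi
          readinessProbe:      # on failure: removed from Service endpoints, not restarted
            httpGet:
              path: /ready     # hypothetical endpoint
              port: 8080
            periodSeconds: 5
            failureThreshold: 3
          livenessProbe:       # on failure: the kubelet restarts the container
            httpGet:
              path: /healthz   # hypothetical endpoint
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 10
            failureThreshold: 3
```

One design choice worth calling out: readiness is the probe that should track whether the app can serve traffic right now. A common way to avoid the restart loop is to keep the liveness check cheap and independent of downstream dependencies such as the database. Otherwise one slow dependency can make the kubelet restart every replica at once.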
# deploy strategy that avoids two versions running simultaneously
# useful when DB migrations are not backwards-compatible
# trade-off: every old Pod is terminated before new ones start, so expect a short outage
spec:
  strategy:
    type: Recreate
Where People Go Wrong (And What It Actually Costs)
The most common mistake I see in technical conversations about Kubernetes is treating it like Docker Compose with more YAML. It isn't.
Docker Compose solves "how do I run this set of containers on this machine." Kubernetes solves "how do I distribute workloads across a cluster of nodes, with scheduling, self-healing, and resource control." Different problems, different tools.
The hidden cost of using Kubernetes when Docker Compose is enough:
- Real operational overhead: a minimal functional cluster (control plane + 2 worker nodes) has fixed costs that don't disappear when there's no traffic.
- Longer debugging curve: when something breaks in Docker Compose, `docker logs <container>` is enough 90% of the time. In Kubernetes, the container log is just the first layer; then come Pod events, scheduler logs, and node state.
- Abstractions that hide the problem: Kubernetes can restart a container that's failing in a loop without anyone noticing if there are no alerts configured. The system "works", except that Pod has 300 restarts on it.
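To illustrate the missing alert, here's a minimal Prometheus alerting rule for that restart loop. It assumes the cluster runs Prometheus and kube-state-metrics, which exports `kube_pod_container_status_restarts_total`. The group name, alert name, namespace, and thresholds are hypothetical and should be tuned to your workloads.

```yaml
# Prometheus rule file (referenced from rule_files in prometheus.yml)
groups:
  - name: pod-health
    rules:
      - alert: PodRestartingTooOften
        # more than 3 restarts of any container in the last 15 minutes
        expr: increase(kube_pod_container_status_restarts_total{namespace="production"}[15m]) > 3
        for: 5m                  # condition must hold for 5 minutes before firing
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.namespace }}/{{ $labels.pod }} restarted more than 3 times in 15m"
```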
For a stack like this blog's — Next.js, PostgreSQL, and stateless services on Railway — Kubernetes would be overkill. Railway already handles scheduling, restart policies, and networking. Slapping a K8s cluster on top adds complexity with no visible operational gain.
The question isn't "is Kubernetes good?" It's "what specific problem does it solve in this context?"
Decision Matrix: When It Makes Sense and When It Doesn't
Before adopting Kubernetes — or before answering interview questions as if everything were K8s — it's worth running through this:
| Criterion | K8s makes sense | K8s is probably overkill |
|---|---|---|
| Number of services | 10+ microservices with independent scaling | 1-5 services, same deploy cadence |
| Advanced scheduling needs | GPU, affinity, topology spread | Standard CPU/RAM only |
| Multi-tenancy | Namespaces with per-team quotas | One team, one environment |
| Team availability | Someone dedicated to maintaining the cluster | No dedicated ops; everyone works on the backend |
| Current platform | On-prem or cloud without PaaS | Railway, Render, Fly.io already available |
| Stateful workloads | PostgreSQL with HA, Kafka, Redis cluster | External managed database |
If most of your checks land in the right column, the right answer for your system probably isn't Kubernetes. And that's fine.
On the topic of when a tool is the right answer even when it seems excessive, I wrote something similar in the analysis of formal methods and the future of programming — the pattern of "this seems like too much for my case" comes up more often than you'd expect.
The Limits of What You Can Conclude Without Logs or Production Data
I want to be explicit here: everything I've written so far is pattern analysis, not my own production measurements. There are things you can't conclude without real data:
- How much performance improves with K8s vs Docker Compose in a specific case: depends on node count, workload type, and networking overlay overhead. Without a reproducible experiment, any number you throw out is folklore.
- Whether HPA with custom metrics works well for your use case: Metrics Server only serves CPU and memory, so custom metrics need an adapter (Prometheus Adapter or KEDA, for example). The lag between the metric changing and the scale-out event varies by implementation. You have to measure it.
- How much the cluster costs to operate in person-hours: highly dependent on the team, the cloud provider, and how stable the workload is. The ranges floating around in blogs ("2 hours per week") are averages from very different contexts.
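To make the HPA point concrete, this is the kind of manifest whose behavior you'd be measuring. It's a sketch using the `autoscaling/v2` API with an `External` metric. It assumes a metrics adapter (for example, Prometheus Adapter) already exposes a queue-depth metric. The metric name `queue_messages_ready`, the `worker` Deployment, and every number are hypothetical.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker                       # hypothetical queue-consuming Deployment
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: External
      external:
        metric:
          name: queue_messages_ready   # hypothetical; depends on your adapter
        target:
          type: AverageValue
          averageValue: "30"           # aim for ~30 pending messages per replica
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100                   # at most double the replicas...
          periodSeconds: 60            # ...per minute
    scaleDown:
      stabilizationWindowSeconds: 300  # wait 5 minutes before scaling down
```

The `behavior` block is where the lag trade-off lives. Here, scale-up can at most double the replica count each minute. The 300-second scale-down window (the same as the default) keeps the HPA from flapping when the queue drains. Whether those numbers fit your traffic is exactly what you can only learn by measuring.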
What can be said with public evidence: the official Kubernetes documentation is the most up-to-date source for understanding object lifecycles. The CNCF's CKA/CKAD certification guides are the most honest map of what's considered baseline operational knowledge.
FAQ: What People Ask About Kubernetes and Technical Interviews
Do you need to know Kubernetes to get a job as a backend developer?
Depends on the role. For pure backend roles on teams with dedicated DevOps, often not. For full-stack or platform engineering roles, it's increasingly expected. The most useful signal is reading job postings for the segment you care about: if K8s shows up in more than 40% of the relevant job descriptions, it's worth investing time.
What's the difference between knowing kubectl and knowing Kubernetes?
kubectl is the command-line tool. Kubernetes is the system. Knowing kubectl without understanding the object model — what a controller loop is, what the scheduler does, how networking between Pods works — is like knowing how to write SQL without understanding what the query planner does. Works fine until something breaks.
When does it make sense to use Kubernetes for a personal project or early-stage startup?
Almost never at early-stage. The operational overhead of maintaining a cluster is real and constant. For personal projects or startups with fewer than 5 services, Railway, Fly.io, or Render give you 90% of the benefits with 10% of the complexity. The moment to migrate to K8s is when the cost of not having it — manual scaling, multi-tenancy, specialized workloads — exceeds the cost of operating it.
Do CKA/CKAD certifications actually teach what's used in production?
More than most interviews, yes. The CKA exam is hands-on: you solve real problems in a real cluster under a time limit. It's not perfect — there are production scenarios it doesn't cover — but the format is more honest than a definition quiz. The CKAD is more developer-oriented and covers exactly the subset I mentioned above: deployments, probes, resources, ConfigMaps.
What should you learn first if you're starting from zero with K8s?
In this order: (1) understand the basic object model — Pod, Deployment, Service, ConfigMap; (2) practice with minikube or kind locally; (3) read a real Kubernetes postmortem (Monzo's Engineering blog has several public ones); (4) configure liveness/readiness probes and resources on a service you own; (5) then look at Ingress, HPA, and PersistentVolumes. Skipping step 4 is the most common mistake.
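For step (2), a kind cluster with the same shape as the minimal cluster mentioned earlier (one control plane, two workers) takes a few lines of config. The file name `kind-cluster.yaml` is arbitrary. This is a sketch for local practice, not a production topology.

```yaml
# kind-cluster.yaml: one control-plane node plus two workers,
# each running as a container on your machine
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
```

Create it with `kind create cluster --config kind-cluster.yaml`. Once the nodes are ready, `kubectl get nodes` should list three of them. Each node is a container on your laptop, so you can break things freely and recreate the cluster in a minute or two.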
Does it make sense to learn Kubernetes if I'm working with Railway or similar PaaS platforms?
Yes, but with clear expectations. Understanding K8s gives you the mental model for what Railway is doing under the hood. You won't operate the cluster directly, but you'll understand why railway up restarts the container, how health checks work, or why a crash loop gives you 30 seconds of downtime instead of 5. It's abstraction-layer knowledge — it helps you debug what the platform is hiding.
Closing: What I Take Away From All This and What I Recommend Doing
Kubernetes interviews are a noisy radar. They measure definitions, not decisions. But that noise is useful: it tells you exactly what part of the system the industry considers "expected baseline knowledge" versus what's actually learned by operating it.
My take: if you want to understand Kubernetes for real, don't study for the interview — study the failures. Public postmortems from Google, Monzo, Cloudflare, or Shopify show what actually breaks and why. That gives you an operational map that no definition quiz ever will.
What I don't buy: the idea that Kubernetes is the default answer for any system that "needs to scale." Scaling has many forms. A well-designed index in PostgreSQL, a query that stops doing N+1, or a cache in the right place can solve the problem without adding 300 lines of YAML — something that also comes up when I talk about verifiable technical decisions or authentication tokens with actual judgment.
The concrete next step if you're into this: grab a real Kubernetes postmortem (the Monzo Engineering posts are public and detailed), read it with the checklist above in hand, and see which of those commands would have shortened the diagnosis time. Cloudflare's 2019 outage writeup is worth reading too as a model of detailed incident analysis, even though that outage wasn't Kubernetes-related. That's worth more than memorizing what a DaemonSet is.