What Job Interviews Taught Me About Kubernetes

The most common question in a Kubernetes technical interview is "what is a Pod?" The second most common is "what's the difference between a Deployment and a StatefulSet?" Both are fair questions. Neither has much to do with most of the problems that actually break a cluster in production.

That bugged me for a long time. Until I realized that interviews don't measure what you can use — they measure what you can define fast under pressure. And that creates a deeply distorted map of Kubernetes — packed with API objects you'll barely touch, with almost no room for the operational decisions that actually matter.

My thesis: the gap between what interviews ask and what production demands is a valuable technical signal. Not about Kubernetes itself, but about which parts of the system are worth mastering first and which ones you can safely defer.

The Real Problem That Kubernetes Interviews Are Pointing At

Kubernetes has over 50 object types in its API. The average interview covers 8 to 12. The problem isn't coverage — it's selection.

The most frequently asked objects tend to be the easiest to define in one sentence:

Pod: smallest deployable unit (one or more containers sharing network and storage)
Service: network abstraction over Pods
ConfigMap / Secret: external configuration
Ingress: HTTP routing

What almost never shows up in interviews but does show up in real postmortems:

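PodDisruptionBudget: how many Pods can be voluntarily evicted at once during a node drain, cluster upgrade, or autoscaler scale-down (a Deployment's rolling update is governed by maxUnavailable and maxSurge instead)
ResourceQuota / LimitRange: how much CPU and memory a namespace can claim in total, and what defaults and bounds each container gets, so one namespace can't starve the rest
HorizontalPodAutoscaler with custom or external metrics (not CPU): scaling by message queue depth or P95 latency
Node affinity, taints, and tolerations: why your machine learning workload lands on whatever general-purpose node has room instead of the GPU node you're paying for, unless you steer it with nodeSelector or node affinity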
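To make the less-interviewed objects above concrete, here is a minimal sketch of three of them. All names (my-api, team-a) and all numbers are hypothetical placeholders, not recommendations.

```yaml
# Hypothetical names and values throughout; adjust to your workloads.
# PodDisruptionBudget: a node drain may evict my-api Pods only while
# at least 2 of them stay available.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-api-pdb
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-api
---
# ResourceQuota: caps the total requests/limits all Pods in the namespace can claim.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    pods: "50"
---
# LimitRange: per-container defaults (for containers that omit them) plus an upper bound.
apiVersion: v1
kind: LimitRange
metadata:
  name: team-a-defaults
  namespace: team-a
spec:
  limits:
    - type: Container
      defaultRequest:   # applied when a container sets no requests
        cpu: 100m
        memory: 128Mi
      default:          # applied when a container sets no limits
        cpu: 500m
        memory: 512Mi
      max:              # no container in the namespace may set a higher memory limit
        memory: 2Gi
```

One interaction worth knowing: once a ResourceQuota constrains CPU or memory in a namespace, Pods that don't declare the corresponding requests or limits can be rejected at admission. A LimitRange with defaults fills those values in, so such Pods still get created.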

The distance between those two lists is the map I was missing when I first started taking Kubernetes seriously.

What's Actually Worth Mastering First (And What to Look at Before Anything Else)

Let's get concrete. Before touching a production cluster — or preparing for a serious interview — there's a subset of concepts with the highest impact-to-complexity ratio:

Real Priority Checklist

```bash
# Verify a Deployment rolled out correctly
kubectl rollout status deployment/my-api -n production

# See recent events in a namespace (first place to look when something breaks)
kubectl get events -n production --sort-by='.lastTimestamp'

# See requests and limits configured on every pod in the namespace, one pod per line
kubectl get pods -n production -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].resources}{"\n"}{end}'

# Check if an HPA is active and what metrics it's using
kubectl get hpa -n production

# List pods that are neither Running nor Succeeded (catches Pending, Failed, Unknown)
kubectl get pods -n production --field-selector=status.phase!=Running,status.phase!=Succeeded

# CrashLoopBackOff pods still report phase Running, so check the STATUS column as well
kubectl get pods -n production | grep -E 'CrashLoopBackOff|Error'
```

These commands cover a large share of the first-pass diagnostics a team runs when something breaks. They're not the most sophisticated. They're the most useful.

What to Understand Before Configuring Anything

Before you touch kubectl apply, it's worth having these straight:

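Requests vs limits: requests are what the scheduler uses to decide which node a Pod lands on; limits are the usage ceiling. If you set neither, the scheduler is flying blind and the Pod falls into the BestEffort QoS class, first in line for eviction when a node runs short of memory. If you set a memory limit too tight, the kernel's OOM killer terminates the container without warning (it shows up as OOMKilled); a tight CPU limit doesn't kill anything, it throttles the process instead.
Liveness vs readiness probes: a failing livenessProbe makes the kubelet kill and restart the container. A failing readinessProbe removes the Pod from the Service's endpoints without killing it. Mixing them up fails silently: a liveness probe that checks a slow downstream dependency restarts healthy Pods in a loop, and a missing readiness probe lets traffic hit a container that isn't ready yet.
Rolling update vs Recreate: the default strategy is RollingUpdate, but if your app can't tolerate two versions running in parallel (say, the new version ships database migrations the old version can't work with), you need Recreate or a more explicit deploy strategy. Recreate terminates every old Pod before starting new ones, so you trade version overlap for a window of downtime.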

```yaml
# Deploy strategy that avoids two versions running simultaneously.
# Useful when DB migrations are not backwards-compatible.
# Trade-off: all old Pods are terminated before new ones start, so expect downtime.
spec:
  strategy:
    type: Recreate
```
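Requests vs limits, the two probes, and the rollout strategy all live in the same Deployment spec. Here is a minimal sketch showing them together, this time with a RollingUpdate tuned so a rollout never drops below full capacity. The image, port, and the /healthz and /ready paths are hypothetical.

```yaml
# Minimal sketch, not a production manifest; names, image, port, and paths are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-api
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # keep all 3 replicas available during a rollout
      maxSurge: 1         # create at most 1 extra Pod at a time
  template:
    metadata:
      labels:
        app: my-api
    spec:
      containers:
        - name: api
          image: registry.example.com/my-api:1.0.0
          ports:
            - containerPort: 8080
          resources:
            requests:         # what the scheduler reserves on the node
              cpu: 250m
              memory: 256Mi
            limits:
              cpu: "1"        # usage above this is throttled, not killed
              memory: 512Mi   # usage above this gets the container OOMKilled
          readinessProbe:     # on failure: Pod leaves the Service endpoints, no restart
            httpGet:
              path: /ready
              port: 8080
            periodSeconds: 5
            failureThreshold: 3
          livenessProbe:      # on failure: kubelet restarts the container
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 10
            failureThreshold: 3
```

A common rule of thumb behind the probe split: the liveness endpoint checks only that the process itself is responsive, while dependency checks (database, downstream APIs) go into readiness. That way a downstream outage takes Pods out of rotation instead of restarting all of them.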

Where People Go Wrong (And What It Actually Costs)

The most common mistake I see in technical conversations about Kubernetes is treating it like Docker Compose with more YAML. It isn't.

Docker Compose solves "how do I run this set of containers on this machine." Kubernetes solves "how do I distribute workloads across a cluster of nodes, with scheduling, self-healing, and resource control." Different problems, different tools.

The hidden cost of using Kubernetes when Docker Compose is enough:

Real operational overhead: a minimal functional cluster (control plane + 2 worker nodes) has fixed costs that don't disappear when there's no traffic.
Longer debugging curve: when something breaks in Docker Compose, docker logs <container> is enough 90% of the time. In Kubernetes, the container log is just the first layer; then come Pod events, scheduling decisions, and node state.
Abstractions that hide the problem: Kubernetes can restart a container that's failing in a loop without anyone noticing if there are no alerts configured. The system "works" — except that Pod has 300 restarts on it.
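The "nobody noticed 300 restarts" failure is cheap to guard against if you already run Prometheus. A minimal sketch of two alerting rules, assuming Prometheus scrapes kube-state-metrics (which exposes kube_pod_container_status_restarts_total and kube_pod_container_status_waiting_reason); the rule names and thresholds are placeholders.

```yaml
# Prometheus alerting rules file (loaded via rule_files in prometheus.yml).
# Assumes kube-state-metrics is installed and scraped; thresholds are placeholders.
groups:
  - name: pod-restarts
    rules:
      - alert: PodRestartingOften
        # more than 3 container restarts in the last 15 minutes
        expr: increase(kube_pod_container_status_restarts_total{namespace="production"}[15m]) > 3
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }}) is restarting repeatedly"
      - alert: PodCrashLooping
        # a container has been waiting in CrashLoopBackOff for 5 minutes straight
        expr: max by (namespace, pod, container) (kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"}) == 1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.namespace }}/{{ $labels.pod }} is in CrashLoopBackOff"
```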

For a stack like this blog's — Next.js, PostgreSQL, and stateless services on Railway — Kubernetes would be overkill. Railway already handles scheduling, restart policies, and networking. Slapping a K8s cluster on top adds complexity with no visible operational gain.

The question isn't "is Kubernetes good?" It's "what specific problem does it solve in this context?"

Decision Matrix: When It Makes Sense and When It Doesn't

Before adopting Kubernetes — or before answering interview questions as if everything were K8s — it's worth running through this:

| Criterion | K8s makes sense | K8s is probably overkill |
|---|---|---|
| Number of services | 10+ microservices with independent scaling | 1-5 services, same deploy cadence |
| Advanced scheduling needs | GPU, affinity, topology spread | Standard CPU/RAM only |
| Multi-tenancy | Namespaces with per-team quotas | One team, one environment |
| Team availability | Someone dedicated to maintaining the cluster | No dedicated ops owner; everyone is a backend developer |
| Current platform | On-prem or cloud without PaaS | Railway, Render, Fly.io already available |
| Stateful workloads | PostgreSQL with HA, Kafka, Redis cluster | External managed database |

If most of your checks land in the right column, the right answer for your system probably isn't Kubernetes. And that's fine.
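For a sense of what the "advanced scheduling" row means in practice, here is a minimal sketch of steering a GPU workload. It assumes the GPU nodes are labeled node-pool=gpu and tainted nvidia.com/gpu:NoSchedule (both hypothetical here; check what your provider actually applies), and that the NVIDIA device plugin is installed so nodes advertise the nvidia.com/gpu resource.

```yaml
# Hypothetical names: training-job, the node-pool=gpu label, and the image.
apiVersion: v1
kind: Pod
metadata:
  name: training-job
spec:
  # Attraction: only schedule onto nodes carrying this label.
  nodeSelector:
    node-pool: gpu
  # Permission: tolerate the taint that keeps ordinary Pods off GPU nodes.
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
  containers:
    - name: trainer
      image: registry.example.com/trainer:1.0
      resources:
        limits:
          nvidia.com/gpu: 1   # extended resource advertised by the device plugin
```

The taint and toleration keep ordinary Pods off the expensive nodes but don't pull this Pod onto them; that's the selector's job. Strictly, the nvidia.com/gpu request alone already limits placement to nodes that advertise GPUs, so the nodeSelector matters most when you run several GPU pools or when the ML workload doesn't request a GPU at all.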

On the topic of when a tool is the right answer even when it seems excessive, I wrote something similar in the analysis of formal methods and the future of programming — the pattern of "this seems like too much for my case" comes up more often than you'd expect.

The Limits of What You Can Conclude Without Logs or Production Data

I want to be explicit here: everything I've written so far is pattern analysis, not my own production measurements. There are things you can't conclude without real data:

How much performance improves with K8s vs Docker Compose in a specific case: depends on node count, workload type, and networking overlay overhead. Without a reproducible experiment, any number you throw out is folklore.
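Whether HPA with custom metrics works well for your use case: Metrics Server only serves CPU and memory, so custom or external metrics need an adapter (Prometheus Adapter and KEDA are common choices), and the lag between a metric moving and the scale-out event depends on that adapter, the HPA's sync period, and its stabilization settings. You have to measure it.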
How much the cluster costs to operate in person-hours: highly dependent on the team, the cloud provider, and how stable the workload is. The ranges floating around in blogs ("2 hours per week") are averages from very different contexts.
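For reference, this is roughly what scaling on queue depth looks like with the autoscaling/v2 API. It's a sketch that assumes an external metrics adapter (Prometheus Adapter, for example) exposes a metric named queue_messages_ready; that metric name, the worker Deployment, and the queue label are hypothetical. The behavior block is where part of the lag comes from: scale-down waits out a stabilization window, and the HPA controller only re-evaluates on its sync period (15 seconds by default).

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker                       # hypothetical queue consumer
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: External
      external:
        metric:
          name: queue_messages_ready   # hypothetical; served by an external metrics adapter
          selector:
            matchLabels:
              queue: orders            # hypothetical queue label
        target:
          type: AverageValue
          averageValue: "100"          # aim for roughly 100 pending messages per replica
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0    # react to spikes immediately
      policies:
        - type: Percent
          value: 100                   # at most double the replica count...
          periodSeconds: 60            # ...per minute
    scaleDown:
      stabilizationWindowSeconds: 300  # wait 5 minutes before shrinking (the default)
```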

What can be said with public evidence: the official Kubernetes documentation is the most up-to-date source for understanding object lifecycles. The CNCF's CKA/CKAD certification guides are the most honest map of what's considered baseline operational knowledge.

FAQ: What People Ask About Kubernetes and Technical Interviews

Do you need to know Kubernetes to get a job as a backend developer?

Depends on the role. For pure backend roles on teams with dedicated DevOps, often not. For full-stack or platform engineering roles, it's increasingly expected. The most useful signal is reading job postings for the segment you care about: if K8s shows up in more than 40% of the relevant job descriptions, it's worth investing time.

What's the difference between knowing kubectl and knowing Kubernetes?

kubectl is the command-line tool. Kubernetes is the system. Knowing kubectl without understanding the object model — what a controller loop is, what the scheduler does, how networking between Pods works — is like knowing how to write SQL without understanding what the query planner does. Works fine until something breaks.

When does it make sense to use Kubernetes for a personal project or early-stage startup?

Almost never at early-stage. The operational overhead of maintaining a cluster is real and constant. For personal projects or startups with fewer than 5 services, Railway, Fly.io, or Render give you 90% of the benefits with 10% of the complexity. The moment to migrate to K8s is when the cost of not having it — manual scaling, multi-tenancy, specialized workloads — exceeds the cost of operating it.

Do CKA/CKAD certifications actually teach what's used in production?

More than most interviews, yes. The CKA exam is hands-on: you solve real problems in a real cluster under a time limit. It's not perfect — there are production scenarios it doesn't cover — but the format is more honest than a definition quiz. The CKAD is more developer-oriented and covers exactly the subset I mentioned above: deployments, probes, resources, ConfigMaps.

What should you learn first if you're starting from zero with K8s?

In this order: (1) understand the basic object model — Pod, Deployment, Service, ConfigMap; (2) practice with minikube or kind locally; (3) read a real Kubernetes postmortem (Monzo's Engineering blog has several public ones); (4) configure liveness/readiness probes and resources on a service you own; (5) then look at Ingress, HPA, and PersistentVolumes. Skipping step 4 is the most common mistake.
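For step 2, a multi-node kind cluster is worth the few extra seconds of setup, because scheduling, PodDisruptionBudgets, and node drains don't mean much on a single node. A minimal config sketch:

```yaml
# kind-config.yaml: one control plane and two workers, each running as a Docker container.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
```

Create the cluster with kind create cluster --config kind-config.yaml, then try kubectl drain kind-worker --ignore-daemonsets to watch evictions (and any PodDisruptionBudget you've defined) in action. The node name assumes the default cluster name, kind.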

Does it make sense to learn Kubernetes if I'm working with Railway or similar PaaS platforms?

Yes, but with clear expectations. Understanding K8s gives you a mental model for the kind of work platforms like Railway do for you. You won't operate a cluster directly, but you'll understand why railway up replaces the running container, how health checks work, or why a crash loop keeps a service down far longer than a single clean restart would. It's abstraction-layer knowledge: it helps you debug what the platform is hiding.

Kubernetes interviews are a noisy radar. They measure definitions, not decisions. But that noise is useful: it tells you exactly what part of the system the industry considers "expected baseline knowledge" versus what's actually learned by operating it.

My take: if you want to understand Kubernetes for real, don't study for the interview — study the failures. Public postmortems from Google, Monzo, Cloudflare, or Shopify show what actually breaks and why. That gives you an operational map that no definition quiz ever will.

What I don't buy: the idea that Kubernetes is the default answer for any system that "needs to scale." Scaling has many forms. A well-designed index in PostgreSQL, a query that stops doing N+1, or a cache in the right place can solve the problem without adding 300 lines of YAML — something that also comes up when I talk about verifiable technical decisions or authentication tokens with actual judgment.

The concrete next step if you're into this: grab a real Kubernetes postmortem (the Monzo Engineering posts are public and detailed), read it with the checklist above in hand, and see which of those commands would have shortened the diagnosis time. That's worth more than memorizing what a DaemonSet is.
