Your container crashes. docker ps shows it exited with code 137. You check docker inspect expecting confirmation that it ran out of memory, and it says OOMKilled: false. The host has 32 GB free. So what killed it? Exit code 137 is one of the most misread signals in Docker. It means something sent your process SIGKILL, usually because of memory, but the flag that's supposed to confirm a memory kill only catches one of the ways memory can kill a container. Here's what actually kills a container with 137, why your runtime never saw the limit coming, and how to stop it.
The first time this bit me was on a high-frequency trading SaaS we shipped: the container exited with 137, but OOMKilled showed false. I wasted an entire day chasing a flaky health check before realizing the host kernel had silently killed it.
OOMKilled: false doesn't mean it wasn't memory. It usually means the host kernel killed the process instead of Docker's cgroup OOM killer. The repeat-offender cause is a runtime (Node, Python, Go) that doesn't know the container limit exists and grows as if the host's RAM were available. Fix: tell the runtime the truth (--max-old-space-size, worker sizing, GOMEMLIMIT) and keep the runtime ceiling at 75–80% of the container limit.
What exit code 137 actually means
Start with the number, because it's not arbitrary. When a process is terminated by a signal, shells and container runtimes report its exit code as 128 + the signal number. Signal 9 is SIGKILL, so 128 + 9 = 137.
That's the entire meaning of 137: something sent your process SIGKILL. Not SIGTERM (the polite “please shut down” your app can catch and handle), but SIGKILL, the one no process can trap, ignore, or clean up after. The kernel just stops scheduling it and reclaims its pages.
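The arithmetic is easy to check with Python's signal module. The sketch below (describe_exit_code and kill_child_and_report are illustrative names, not a Docker API) decodes a shell-style exit code, and shows that a child killed with SIGKILL comes back to Python as -9, which shells and Docker report as 137. It assumes a POSIX system.

```python
import signal
import subprocess
import sys
from typing import Optional


def describe_exit_code(code: int) -> Optional[str]:
    """Return the signal name behind a shell-style exit code, or None.

    Shells and Docker report "terminated by signal N" as 128 + N.
    """
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        return None  # above 128 but not a signal number this platform knows


def kill_child_and_report() -> int:
    """Start a sleeping child, SIGKILL it, and return the shell-style exit code."""
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    child.kill()  # on POSIX, Popen.kill() sends SIGKILL
    returncode = child.wait()  # Python reports death-by-signal as a negative number
    return 128 - returncode if returncode < 0 else returncode


if __name__ == "__main__":
    print(describe_exit_code(137))  # SIGKILL
    print(describe_exit_code(143))  # SIGTERM
    print(kill_child_and_report())  # 137
```

The 143 case is the useful contrast: a container that exits 143 received SIGTERM and shut down on its own terms, while 137 means it never got the chance.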
SIGKILL has a short list of senders:
- The kernel's OOM killer, when the system or a cgroup is out of memory.
- Docker / your orchestrator, when docker stop hits its timeout and escalates from SIGTERM to SIGKILL.
- A failed health check in some orchestrators, which kills and restarts the container.
- A human or script running kill -9.
In practice, for a container that dies on its own, it's almost always memory. The other causes are real but rarer, and they're easy to rule out — which is the next step.
Why docker inspect says OOMKilled: false — the two different OOM killers
Here's the part that sends people in circles. You'd expect a memory kill to set the flag:
docker inspect my-container \
--format '{{.State.ExitCode}} {{.State.OOMKilled}} {{.State.FinishedAt}}'
# → 137 false 2026-05-27T18:42:11Z
Exit code 137, OOMKilled: false. Contradiction? No: there are two different OOM killers, and only one of them sets that flag.
- Docker's cgroup OOM killer. If you set a container memory limit (--memory) and the container exceeds it, the cgroup memory controller triggers the kill and Docker records OOMKilled: true. This is the clean case.
- The host kernel's global OOM killer. When the host runs low on memory, the Linux kernel picks a victim and SIGKILLs it directly, outside anything Docker mediates. Docker never sees a cgroup OOM event for the container, so it reports OOMKilled: false even though the cause was absolutely memory.
The takeaway: OOMKilled: false does not mean “not a memory problem.” It usually means you either didn't set a container memory limit at all (so the host kernel did the killing instead of the cgroup), or the host itself is under memory pressure. The flag tells you which killer fired, not whether memory was the cause.
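That takeaway can be turned into a small triage step. This is a minimal sketch, assuming you feed it one element of the JSON array that docker inspect prints. It reads only State.ExitCode, State.OOMKilled and HostConfig.Memory; the function names and the diagnosis strings are hypothetical.

```python
import json
import subprocess
from typing import Any, Dict


def triage_137(container: Dict[str, Any]) -> str:
    """Map docker inspect fields to the most likely killer of a container."""
    state = container["State"]
    exit_code = state["ExitCode"]
    oom_killed = state["OOMKilled"]
    mem_limit = container["HostConfig"]["Memory"]  # bytes; 0 means no limit

    if exit_code != 137:
        return "not a SIGKILL exit"
    if oom_killed:
        return "cgroup OOM: exceeded its own --memory limit"
    if mem_limit == 0:
        return "no memory limit: likely the host kernel's global OOM killer"
    return "limit set but OOMKilled is false: check dmesg, host memory pressure, docker stop timeouts"


def inspect_container(name: str) -> Dict[str, Any]:
    """Run docker inspect and return the first (only) object it prints."""
    out = subprocess.run(
        ["docker", "inspect", name], check=True, capture_output=True, text=True
    ).stdout
    return json.loads(out)[0]
```

Usage is a one-liner such as print(triage_137(inspect_container("my-container"))). Keeping the classifier separate from the subprocess call makes it easy to run against saved inspect output from a container that has already been removed.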
This single flag has probably cost my teams more debugging hours than any other Docker gotcha. I've seen senior engineers burn days assuming it wasn't memory-related, only to discover the host kernel OOM killer had stepped in because no container limit was set.
Running dmesg -T | grep -i -E "killed process|oom" on the host shows the OOM killer's victim, its PID, and how much memory it was using at the moment of death. That's the ground truth docker inspect can't give you when the host kernel did the killing.
The real cause: your runtime has no idea what the cgroup limit is
Set the memory limit, get OOMKilled: true, raise the limit, and it comes back anyway. At this point most people conclude they have a memory leak. Sometimes they do. More often, the runtime is sizing itself for a machine that doesn't exist.
A container is not a VM. It's a process tree with a cgroup memory limit wrapped around it. But most language runtimes, when they boot, ask the kernel how much memory the machine has — and the kernel answers with the host's total RAM, not the cgroup limit. The runtime then sizes its heap, buffer pools, and worker counts for that number.
A cgroup-unaware process reads 32 GB, decides it has plenty of headroom, and grows happily past your 512 MB limit. The cgroup controller kills it the moment it crosses the line. You raise the limit to 1 GB; the runtime still thinks it has 32 GB, still has no reason to run the garbage collector aggressively, and walks right past 1 GB too. The limit isn't the bug. The mismatch between what you capped and what the runtime believes is the bug.
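You can see the first half of that mismatch from inside any container. A minimal sketch, assuming Linux and no LXCFS-style virtualization of /proc: both os.sysconf and /proc/meminfo report the host's total RAM, which is the number a cgroup-unaware runtime sizes itself from. The helper names are illustrative.

```python
import os


def ram_from_sysconf() -> int:
    """Total RAM as the kernel reports it to any process (host-wide)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def ram_from_meminfo(text: str) -> int:
    """Parse MemTotal (reported in kB, i.e. KiB) out of /proc/meminfo contents."""
    for line in text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        meminfo_total = ram_from_meminfo(f.read())
    print(f"sysconf: {ram_from_sysconf() / 2**30:.1f} GiB")
    print(f"meminfo: {meminfo_total / 2**30:.1f} GiB")
    # Neither number reflects the container's --memory limit; that value only
    # lives in the cgroup files (memory.max on cgroup v2).
```

Run it in a container started with --memory=512m on a large host and both lines still print the host's RAM.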
How much memory your container thinks it has
This is where the fix has to be runtime-specific, because each runtime gets it wrong differently.
Node.js has a V8 heap with a default old-space ceiling that is not derived from your cgroup limit. Left alone, V8 will let the old space grow toward its default cap (gigabytes on a 64-bit build) regardless of the 512 MB container limit — so V8 never feels memory pressure, never runs a full GC in time, and the cgroup kills the process first. You have to tell V8 the truth:
# Container limited to 512MB → give V8 ~75-80% and leave room for
# non-heap memory (buffers, native addons, the runtime itself).
ENV NODE_OPTIONS="--max-old-space-size=384"
Python has no single heap knob, and it has two production traps that JVM-centric guides rarely mention. First, multiprocessing.cpu_count() (and anything built on it, like the common Gunicorn config of workers = 2 × cores + 1) reads the host's core count. On a 32-core host your 512 MB container tries to fork 65 workers, each with its own interpreter and memory, and the cgroup kills it almost immediately. Size workers from the limit, not the host:
CGROUP_V2 = "/sys/fs/cgroup/memory.max"
CGROUP_V1 = "/sys/fs/cgroup/memory/memory.limit_in_bytes"
# cgroup v1 reports "no limit" as a huge number (e.g. 9223372036854771712),
# so treat anything at or above ~1 EiB as unlimited.
UNLIMITED_THRESHOLD = 1 << 60
def cgroup_mem_limit_bytes():
    """Return the container memory limit in bytes, or None if unlimited or unknown."""
    for path in (CGROUP_V2, CGROUP_V1):
        try:
            with open(path) as f:
                raw = f.read().strip()
        except (FileNotFoundError, PermissionError):
            continue
        # cgroup v2 writes the literal string "max" when no limit is set
        if raw == "max":
            return None
        try:
            value = int(raw)
        except ValueError:
            continue
        if value >= UNLIMITED_THRESHOLD:
            return None
        return value
    return None
limit = cgroup_mem_limit_bytes()
# Rough baseline:
# - sync workers: ~150–300MB each for typical Python apps
# - tune based on your actual RSS under load
MEM_PER_WORKER = 200 * 1024 * 1024
# In gunicorn.conf.py, Gunicorn picks up this module-level setting.
workers = max(1, limit // MEM_PER_WORKER) if limit else 2
Second, Python on glibc can show alarming RSS growth that looks like a leak but is often allocator fragmentation: glibc's malloc keeps per-thread arenas that inflate resident memory under concurrency. Capping the arenas often drops RSS enough to stop the kills:
ENV MALLOC_ARENA_MAX=2
Go gets it more right than Node and Python, but it still knows nothing about your container limit. Without GOMEMLIMIT, Go's GC paces itself only by GOGC: by default it lets the heap grow to roughly twice the live heap before the next collection, so a service with a 300 MB live heap can target about 600 MB and walk straight past a 512 MB limit. Since Go 1.19, you can fix this with one env var:
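# GOMEMLIMIT is a soft limit: the GC runs more often as Go-managed memory approaches it.
# It counts the heap plus goroutine stacks and runtime metadata, so ~90% of the
# container limit (460MiB of 512MiB) leaves room for memory Go doesn't manage.
ENV GOMEMLIMIT=460MiB
It's a soft ceiling, not a hard cap (Go can exceed it, for example when the live heap itself outgrows it), so keep the container --memory limit 10–15% above this.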
For completeness, the JVM solved this years ago: it's container-aware by default on modern JDKs, and you size the heap as a percentage of the limit with -XX:MaxRAMPercentage=75.0. If you're on Node or Python, you have to do that math yourself.
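The per-runtime math is simple enough to script, for example in an entrypoint that reads the limit (with something like cgroup_mem_limit_bytes() above) before starting the app. This sketch is hypothetical: the function name and the 75% heap and 90% GOMEMLIMIT ratios come from the guidance in this section, not from any official tool, and it only produces strings. Exporting them as NODE_OPTIONS, GOMEMLIMIT or JAVA_TOOL_OPTIONS is left to your entrypoint.

```python
from typing import Dict

MIB = 1024 * 1024


def runtime_settings(limit_bytes: int,
                     heap_ratio: float = 0.75,
                     go_ratio: float = 0.90) -> Dict[str, str]:
    """Derive per-runtime memory settings from a container limit in bytes."""
    limit_mib = limit_bytes // MIB
    return {
        # V8 old-space ceiling in MiB, leaving room for non-heap memory
        "NODE_OPTIONS": f"--max-old-space-size={int(limit_mib * heap_ratio)}",
        # Soft limit on Go-managed memory, using a unit suffix Go accepts
        "GOMEMLIMIT": f"{int(limit_mib * go_ratio)}MiB",
        # The JVM reads the cgroup limit itself; it only needs the percentage
        "JAVA_TOOL_OPTIONS": f"-XX:MaxRAMPercentage={heap_ratio * 100:.1f}",
    }


if __name__ == "__main__":
    for name, value in runtime_settings(512 * MIB).items():
        print(f"{name}={value}")
```

For a 512 MiB limit this yields 384 for Node, 460MiB for Go and 75.0 for the JVM, the same values used in the snippets above, so the settings stay consistent when the limit changes.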
Debugging a 137: the commands that find the real killer
Don't guess. Walk the evidence in order — each command rules out a cause.
# 1. Confirm the exit code and which killer fired
docker inspect my-container \
--format '{{.State.ExitCode}} {{.State.OOMKilled}}'
# 2. If OOMKilled is false, ask the kernel directly
dmesg -T | grep -i -E "killed process|out of memory"
# Look for: "Killed process 12345 (node) total-vm:..., anon-rss:524288kB"
# anon-rss is how much RAM it was using when it died.
# 3. Watch live memory vs the limit (reproduce while this runs)
docker stats my-container --no-stream
# MEM USAGE / LIMIT → 511MiB / 512MiB means you're pinned at the cap
# 4. What limit is actually in effect?
docker inspect my-container --format '{{.HostConfig.Memory}}'
# 0 means NO limit set → the host kernel is your only backstop
# 5. Rule out the polite-shutdown-turned-violent case
docker logs --tail 50 my-container  # a SIGTERM the app ignores becomes SIGKILL once docker stop's timeout (10s by default) expires
If step 4 returns 0, that alone can explain a 137 with OOMKilled: false. With no cgroup limit, the container can grow until the host is starved and the global OOM killer steps in. Set a limit and the failure at least becomes legible: OOMKilled: true, killed at a number you chose.
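When there's a lot of kernel log to sift through, a small parser helps with step 2. This sketch rests on an assumption about the message format: recent kernels prefix the victim line with "Memory cgroup out of memory" for a cgroup OOM and "Out of memory" for the global killer, and print anon-rss in kB. Older kernels format this differently, so treat a non-match as "read it by hand". The function name parse_oom_kills is hypothetical.

```python
import re
from typing import Dict, List

# Matches victim lines such as:
#   [...] Memory cgroup out of memory: Killed process 12345 (node) total-vm:..., anon-rss:524288kB, ...
#   [...] Out of memory: Killed process 12345 (node) total-vm:..., anon-rss:524288kB, ...
OOM_LINE = re.compile(
    r"(?P<scope>Memory cgroup out of memory|Out of memory): "
    r"Killed process (?P<pid>\d+) \((?P<comm>[^)]*)\)"
    r".*?anon-rss:(?P<anon_kb>\d+)kB"
)


def parse_oom_kills(dmesg_text: str) -> List[Dict[str, object]]:
    """Extract OOM victims from dmesg output and say which killer fired."""
    kills = []
    for line in dmesg_text.splitlines():
        m = OOM_LINE.search(line)
        if not m:
            continue
        kills.append({
            "killer": "cgroup" if m["scope"].startswith("Memory cgroup") else "global",
            "pid": int(m["pid"]),
            "process": m["comm"],
            "anon_rss_mib": int(m["anon_kb"]) / 1024,
        })
    return kills
```

Feed it dmesg -T output captured on the host. A "global" victim with a container's process name is exactly the OOMKilled: false case described above.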
I still remember one production incident where docker stats showed the container pinned at 511MiB/512MiB right before it died. That single snapshot told us everything — the runtime was completely unaware of the limit and had grown aggressively until the cgroup killed it.
Why “just raise the memory limit” usually doesn't fix it
Raising the limit is the first thing everyone tries, and it's right often enough to be dangerous. It buys time when you were genuinely a little under-provisioned. It does nothing in the two cases that actually cause most repeat 137s:
- A cgroup-unaware runtime. Covered above: the process doesn't know about the old limit or the new one, so a bigger number just moves the cliff further out. It'll walk to that one too.
- A real leak. If memory grows monotonically with uptime or request count, no limit is high enough. You've turned a fast crash into a slow one and made it harder to diagnose, because it now takes hours instead of minutes.
Raising the limit is the correct fix in exactly one situation: the runtime is correctly sized, memory is stable (it plateaus, doesn't climb), and the plateau simply sits a bit above your cap. Then you were under-provisioned, and a higher limit, or right-sizing your ECS task or VM, is the answer. Sizing container memory deliberately is the flip side of the same problem we walk through in how we reduced an AWS bill by 40% without rewriting the application, where over- and under-provisioning both cost you.
The honest rule: if you don't know whether memory is stable or climbing, you're not ready to change the limit. Watch docker stats across a load cycle first.
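If you record memory samples from docker stats over a load cycle, a rough trend check can tell a plateau from a climb. A minimal sketch, assuming evenly spaced samples in MiB with the warmup ramp already dropped (a ramp-up would read as climbing). The 5% tolerance and the helper name are arbitrary illustrative choices, and a least-squares slope is just one simple way to measure the trend.

```python
from typing import Sequence


def memory_trend(samples_mib: Sequence[float], tolerance: float = 0.05) -> str:
    """Classify evenly spaced memory samples as 'stable' or 'climbing'.

    Fits a least-squares line and compares its total rise across the
    window to the mean usage. Needs at least two samples.
    """
    n = len(samples_mib)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples_mib) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples_mib))
    var = sum((i - mean_x) ** 2 for i in range(n))
    slope = cov / var  # MiB per sample
    rise = slope * (n - 1)  # rise the fitted line predicts over the window
    return "climbing" if rise > tolerance * mean_y else "stable"
```

A "stable" result with a plateau just above the cap is the one case where raising the limit is the right call; "climbing" points at a leak or an unaligned runtime.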
We once raised the limit from 512M to 2G in a Python service thinking it would solve the problem. The container lasted 40 minutes longer before crashing again. The real fix came only after we capped the number of Gunicorn workers based on actual cgroup memory.
Fixing it: align the runtime to the limit
The durable fix is a loop, not a single setting. Measure, align, cap, verify.
- Measure the real working set. Run under realistic load and watch docker stats. Note where memory plateaus; that's your floor.
- Set the container limit a margin above the plateau (a working set of ~400 MB → a 512 MB limit). In Compose, set the limit explicitly in compose.yaml:
  services:
    api:
      image: myapp:latest
      deploy:
        resources:
          limits:
            memory: 512M
- Tell the runtime that limit: --max-old-space-size for Node, worker math + MALLOC_ARENA_MAX for Python, MaxRAMPercentage for the JVM, GOMEMLIMIT for Go. The runtime's ceiling must sit below the container limit, with headroom for non-heap memory (native buffers, thread stacks, the runtime itself). Setting the heap equal to the container limit is a guaranteed 137; aim for 75–80%.
- Verify under load, then check it didn't just move the problem onto a heavier garbage-collection cost or a slower query path.
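The "margin above the plateau" step can be made explicit. In this sketch the 25% margin and rounding up to a 64 MiB boundary are illustrative assumptions, not rules; with those defaults, a ~400 MiB working set lands on the 512M limit used in the Compose example. The function name is hypothetical.

```python
import math


def container_limit_mib(plateau_mib: float, margin: float = 0.25, step_mib: int = 64) -> int:
    """Pick a container limit: plateau plus a safety margin, rounded up to step_mib."""
    if plateau_mib <= 0:
        raise ValueError("plateau must be positive")
    target = plateau_mib * (1 + margin)
    return math.ceil(target / step_mib) * step_mib


if __name__ == "__main__":
    print(f"memory: {container_limit_mib(400)}M")  # memory: 512M
```

Rounding to a fixed step keeps limits from drifting by a few MiB every time someone re-measures, which makes the values in Compose files and task definitions easier to compare.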
When the container keeps restarting — the OOM loop
A single 137 is a bug. A container that keeps restarting with 137 is usually a self-inflicted loop, and it's worth recognizing the shape because it pages people at 3 a.m.
The loop happens when the work that triggers the OOM also happens during startup: loading a large model, warming a cache, reading a big file into memory. The container boots, allocates past the limit while warming up, gets killed at 137, and the restart: always policy (or your orchestrator) dutifully starts it again into the exact same wall. Now it's flapping, burning CPU on repeated cold starts, and your health checks never go green.
Two things break the loop: fix the startup allocation (stream the file, lazy-load the model, raise the limit enough to survive warmup) and cap the restarts so a crashing container doesn't retry forever. Docker already doubles the delay between restarts, but restart: always never gives up; restart: on-failure with a retry cap (for example on-failure:5) at least stops the spin.
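For the "stream the file" fix, the idea is to keep peak memory at one chunk instead of the whole file. A minimal sketch, using SHA-256 hashing as a stand-in for whatever your warmup actually does with the data; the 1 MiB chunk size and the function names are arbitrary.

```python
import hashlib

CHUNK = 1024 * 1024  # 1 MiB per read bounds peak memory regardless of file size


def digest_all_at_once(path: str) -> str:
    # Reads the whole file into one bytes object: a 2 GB file needs ~2 GB of RSS.
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def digest_streaming(path: str) -> str:
    # Same result, but only one chunk is held in memory at a time.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            h.update(chunk)
    return h.hexdigest()
```

Both functions return the same digest; only the streaming one keeps its startup footprint flat as the file grows, which is what lets the container get past warmup under a fixed limit.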
One of our edtech services entered a brutal restart loop after we introduced a large in-memory cache warmup. It took us embarrassingly long to realize the OOM was happening before the health check could ever pass.
137s in Kubernetes and ECS
In Docker, the cgroup does the killing and docker inspect tells you what happened. In production orchestrators, both of those assumptions break.
Kubernetes introduces a second OOM path that's easy to miss. A container can get a 137 because it hit its own limits, or because the node ran out of memory and the kubelet started evicting pods to save itself. These look identical from the container's perspective. OOMKilled: true doesn't tell you which.
The eviction order follows QoS class. Guaranteed pods (where requests == limits) are evicted last. Burstable pods go before them. BestEffort pods (no requests or limits set) go first. In practice: set requests and limits equal for memory on every critical service, and for CPU too, since a pod only counts as Guaranteed when both match. Yes, it reserves capacity on lightly-loaded nodes. The alternative is watching your pod get evicted during a traffic spike because a neighbor consumed the node's RAM.
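The QoS rules are mechanical, which makes them easy to check before deploying. This sketch works on a simplified, hypothetical input (a list of containers, each with optional requests and limits dicts) rather than a real Kubernetes object. It follows the documented rule as I understand it: Guaranteed needs every container to set CPU and memory limits with requests equal to them (an unset request defaults to the limit), BestEffort means no container sets any CPU or memory request or limit, and everything else is Burstable.

```python
from typing import Dict, List, Optional

Resources = Dict[str, str]  # e.g. {"cpu": "500m", "memory": "512Mi"}
TRACKED = ("cpu", "memory")


def qos_class(containers: List[Dict[str, Optional[Resources]]]) -> str:
    """Classify a pod's QoS class from its containers' requests and limits.

    Quantities are compared as plain strings, so "1" and "1000m" count as
    different; a real check would parse Kubernetes quantities first.
    """
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests") or {}
        limits = container.get("limits") or {}
        for resource in TRACKED:
            if resource in requests or resource in limits:
                any_set = True
            limit = limits.get(resource)
            # An unset request defaults to the limit when a limit is given
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"
```

Setting only the memory request equal to the memory limit yields Burstable under these rules, which is the trap the CPU note above is about.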
docker inspect is useless here: by the time you look, the container has been restarted or the pod replaced, and many clusters don't run Docker at all. Check the orchestrator:
# Last terminated state of the container
kubectl describe pod <pod-name> | grep -A 10 "Last State"
# Reason: OOMKilled, Exit Code: 137
# Check whether node memory pressure triggered the eviction
kubectl describe node <node-name> | grep -A 5 "MemoryPressure"
Amazon ECS splits into two different enforcement models depending on compute type.
On ECS + EC2, memory enforcement works like plain Docker: the cgroup limit is set by the task definition's container-level memory field. If a container exceeds it, the cgroup kills it: 137 with OOMKilled: true. But if you set memory only at the task level and not per-container, Docker treats it as a soft limit. One greedy container can eat into its neighbors' headroom and push the entire EC2 host toward the kernel OOM killer, which puts you back at 137 with OOMKilled: false. Always set memory at the container level.
On ECS + Fargate, the enforcement is harder. Fargate reserves the task memory at the infrastructure layer; exceeding it terminates the task. There's no overcommit: you pay for every MB you declare, and you get killed faster if you're under-provisioned. dmesg doesn't exist in your context. The stop reason lives in ECS events:
aws ecs describe-tasks \
--cluster my-cluster \
--tasks <task-arn> \
--query 'tasks[0].{stop:stoppedReason,containers:containers[].{name:name,exit:exitCode,reason:reason}}'
# exitCode 137 in containers[] confirms a SIGKILL; stoppedReason and the container's reason say why
Across both orchestrators: set container-level limits explicitly, and check the platform's own event log first, not docker inspect.
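If you're scripting this, the same fields can be pulled from the full describe-tasks response (run the command with --output json and without --query). A minimal sketch: it reads only tasks[].stoppedReason and containers[].name, exitCode and reason from the DescribeTasks output; the function name is hypothetical, and an exit code of 137 still needs the reason text to confirm memory was the cause.

```python
import json
import sys
from typing import Any, Dict, List


def killed_containers(describe_tasks: Dict[str, Any]) -> List[Dict[str, Any]]:
    """List containers that exited 137, with the task's stop reason attached."""
    hits = []
    for task in describe_tasks.get("tasks", []):
        for container in task.get("containers", []):
            if container.get("exitCode") == 137:
                hits.append({
                    "container": container.get("name"),
                    "container_reason": container.get("reason"),
                    "stopped_reason": task.get("stoppedReason"),
                })
    return hits


if __name__ == "__main__":
    # e.g. aws ecs describe-tasks --cluster my-cluster --tasks <task-arn> --output json | python3 this_script.py
    for hit in killed_containers(json.load(sys.stdin)):
        print(hit)
```

Containers without an exitCode (for example, ones that never started) are skipped, so a task with several sidecars reports only the container that was actually killed.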
When NOT to cap container memory
Limits are good defaults, but a wrong limit causes the exact crash you're trying to prevent. Skip or loosen the cap when:
- You haven't measured the working set yet. A limit pulled from a round number instead of docker stats under load is just a randomly placed cliff. Measure first, cap second.
- The workload is legitimately spiky. Batch jobs, report generation, and ETL steps can have a working set many times their idle memory. Capping to the idle number guarantees a 137 the first time real data shows up. Size for the peak or run these unconstrained on a dedicated host.
- It's a single-tenant host running one workload. If the box exists to run this one container, a tight cgroup limit adds a failure mode without adding isolation you need — the host limit already bounds it.
- The workload loads a large model at startup. LLM inference and embedding services load multi-GB model weights before serving a single request. That startup footprint is the working set; it doesn't grow with traffic. Cap below the model size and you get a 137 before the first request completes. Size to the model, not your intuition about what feels reasonable.
The point of a memory limit is isolation between noisy neighbors and a legible failure mode — not micro-optimizing RAM. If a limit isn't buying you either, a too-tight one is pure downside.
Some teams I've worked with develop a religious attachment to tiny memory limits. They end up with constant OOM kills on legitimate batch jobs. A memory limit should serve a purpose — isolation or predictability — not become a source of self-inflicted pain.
Summary
- Exit code 137 means SIGKILL (128 + 9). For a container that dies on its own, it's almost always memory.
- OOMKilled: false is not “not memory.” It means the host kernel killed it (no limit set, or host pressure), not Docker's cgroup killer. Confirm with dmesg | grep -i oom.
- The repeat-offender cause is a cgroup-unaware runtime. Node and Python read the host's RAM, size themselves for a machine they don't have, and blow past your limit. Raising the limit just moves the cliff.
- Align the runtime to the limit: --max-old-space-size (Node), worker sizing + MALLOC_ARENA_MAX (Python), MaxRAMPercentage (JVM), GOMEMLIMIT (Go). Keep the runtime ceiling at ~75–80% of the container limit.
- A restart loop = the OOM happens during warmup. Fix the startup allocation and add restart backoff; don't let restart: always hammer it.
- Measure before you cap. A limit set without watching docker stats under load is a randomly placed cliff.
My rule after shipping dozens of services is simple: if you haven't measured the working set under load and aligned the runtime to the cgroup limit, the container isn't ready to ship.

