Should I use Alpine to reduce Docker image size?

For simple services with no native modules — possibly. For Node.js services that use bcrypt, sharp, or canvas; Python ML services; or PHP with compiled extensions — no. Alpine uses musl libc, and precompiled binaries and PyPI manylinux wheels are built for glibc. The mismatch causes runtime failures that don't always show up in CI. Use -slim instead: it's glibc-based, 3-5x smaller than the full image, and doesn't carry Alpine's compatibility risk.

What is a multi-stage Docker build and when do I need one?

A multi-stage build uses multiple FROM instructions in one Dockerfile. The first stage (the builder) installs the full build toolchain and compiles or transpiles the application. The second stage (the runtime) starts from a clean base and copies only the output from the builder. Build tools, dev dependencies, and intermediate files never make it into the final image. You need one whenever your build process requires tools that shouldn't be present at runtime — which is basically every compiled or transpiled service.

What is a distroless Docker image?

Distroless images (maintained by Google at gcr.io/distroless) contain only the application runtime and its direct dependencies: no shell, no package manager, no extra OS utilities. They're smaller than -slim images and harder to exploit because the attack surface is minimal. The downside: with no shell in the image, there is nothing to docker exec into when you need to debug a running container. A good choice for security-conscious production services, but only once your service is stable enough that you're not regularly shelling into it.

Reduce Docker Image Size: Measure First, Then Cut

How does Docker layer caching work and why does order matter?

Each instruction in a Dockerfile creates a layer. Docker caches each layer and reuses it on the next build if the instruction and all preceding layers are unchanged. If you put COPY . . before RUN npm install, every source file change invalidates the install cache — even if package.json didn't change. Putting dependency files first (COPY package*.json ./ → RUN npm ci → COPY . .) means the install only re-runs when the lockfile changes, which is far less often than source code.

Your Node or Python service deploys fine locally. Then you check ECR and the image is 1.1 GB. The next ECS deploy takes two minutes just on the pull. A cold-start instance takes another ninety seconds before it's healthy. You search “reduce docker image size” and land on a listicle: “use Alpine, use multi-stage builds, use .dockerignore.” All true. None of them explain how to find what's actually inflating your image — which is rarely where the list says it is.

On a healthcare analytics platform I built, our Node API image had crept to 1.8 GB. Auto-scaling events caused intermittent 502s — not from the service itself, but from cold starts taking so long the health check timed out before the first request was ever served.

TL;DR

Run docker history and dive before you change anything — most images have one or two obvious fat layers the listicles miss. Multi-stage builds are the highest-leverage fix: a TypeScript API can drop from ~800 MB to ~180 MB. Use -slim as your default base; Alpine works fine for pure services but breaks silently when native modules enter the picture. Layer order and a missing .dockerignore together can easily add two minutes to every CI build.

Why your Docker image is 1 GB — and where the weight actually hides

Most bloated images have the same handful of causes. The top listicles usually pin the blame on the wrong one.

Build tools left in the runtime layer. gcc, make, build-essential, the full Python dev headers, the Node.js build toolchain for native modules — these get installed to compile something, and then they stay. On a Debian base, installing build-essential alone adds ~200 MB.

Development dependencies. npm install with no --omit=dev, or a pip install -r requirements.txt that includes pytest, black, and mypy. You're shipping your test suite to production.

A heavyweight base image. node:24 is ~1.1 GB. It's based on Debian Bookworm and includes Python, curl, wget, git, and everything you'd want in a dev environment. You don't want that in a runtime image.

Unused intermediate layers. Docker layers are additive. A RUN apt-get install followed by RUN apt-get clean in a separate instruction still carries the full package cache in the first layer. The cache isn't removed; it's only hidden by the later layer, so the image size doesn't shrink.

Bad — the apt cache lives in layer 2 even after layer 3 removes it

RUN apt-get update
RUN apt-get install -y build-essential
RUN apt-get clean

Good — one layer, cache never persists

RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

The biggest surprise, when I first ran docker history on a production automation service, was leftover build tools adding 350 MB from a single native dependency — one package installed for a compile step that nobody cleaned up.

Measure first: `docker history` and `dive` find the fat before you cut

Every top-10 list jumps straight to “use Alpine.” Resist that impulse. Until you know which layer is responsible for your image size, you're guessing, and you might spend two hours on Alpine compatibility only to discover the real fat was your 300 MB node_modules of dev dependencies.

docker history is the starting point. It's built in and takes five seconds:

docker-history.sh — find the big layers fast

docker history --no-trunc \
  --format "{{.Size}}\t{{.CreatedBy}}" \
  myimage:latest \
  | sort -rh \
  | head -20

That shows you layers sorted by size. It'll tell you “this 400 MB layer is the RUN npm install instruction.” That's your target.
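When sizes need to be compared or totalled numerically, raw byte counts are easier to work with than mixed kB/MB/GB strings. The following is a minimal sketch, assuming a Docker CLI whose docker history accepts --human=false to print sizes in bytes; layer_report is a hypothetical helper name, not a Docker command:

```shell
# layer_report: read "BYTES<TAB>COMMAND" lines on stdin and print the N
# largest layers (default 10) with sizes converted to MB.
# Hypothetical helper for illustration; not part of Docker.
layer_report() {
  top="${1:-10}"
  tab="$(printf '\t')"
  sort -t "$tab" -k1,1 -rn \
    | head -n "$top" \
    | awk -F '\t' '{ printf "%.1f MB\t%s\n", $1 / 1000000, substr($2, 1, 100) }'
}

# Usage (requires Docker, so it is shown as a comment only):
#   docker history --human=false --no-trunc \
#     --format '{{.Size}}\t{{.CreatedBy}}' myimage:latest | layer_report 5
```

Sorting on bytes avoids relying on how sort -h treats suffixes such as "kB" and "MB", and the command text is truncated to 100 characters so each layer stays on one line.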

For anything less obvious — especially layers from a base image you don't control — dive is worth the thirty seconds to install:

install-dive.sh

# macOS
brew install dive

# Linux
DIVE_VERSION=$(curl -sL "https://api.github.com/repos/wagoodman/dive/releases/latest" \
  | grep '"tag_name"' | cut -d '"' -f4)
curl -Lo /tmp/dive.tar.gz \
  "https://github.com/wagoodman/dive/releases/download/${DIVE_VERSION}/dive_${DIVE_VERSION#v}_linux_amd64.tar.gz"
tar -xf /tmp/dive.tar.gz -C /usr/local/bin dive

dive-usage.sh — inspect layer-by-layer

dive myimage:latest
# Navigate layers on the left, see the file system diff on the right.
# Watch for: large /root/.cache, /tmp build artifacts, .git directories,
# test fixtures, dev dependencies that shouldn't be in the runtime image.

dive shows you the file system delta at each layer. You're looking for large directories that shouldn't be in a runtime image: /root/.cache/pip, .npm, .gradle, test fixtures, or a .git directory you forgot to exclude.

The moment that stuck with me: dive revealed 220 MB of pip wheel caches sitting in /root/.cache — invisible to docker history, accumulated across multiple dependency installs, and completely pointless in a runtime image.

Multi-stage builds: keep the build toolchain out of the runtime

This is the single highest-leverage change for most services. The build toolchain exists to produce an artifact. The artifact alone goes to production.

Figure: multi-stage Docker build flow. The builder stage carries the full toolchain and dev dependencies at roughly 800 MB; only the compiled artifact is copied forward into a roughly 180 MB runtime stage. The builder's toolchain exists only to produce the artifact, and the artifact alone is what ships.

Node.js example — TypeScript API down from ~800 MB to ~180 MB:

Dockerfile.node — multi-stage build for a Node API

FROM node:24-slim AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

FROM node:24-slim AS runtime
WORKDIR /app
ENV NODE_ENV=production
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=builder /app/dist ./dist
EXPOSE 3000
CMD ["node", "dist/index.js"]

The builder stage carries node_modules with dev deps, TypeScript, and the compiler. The runtime stage gets only the production node_modules and the compiled output. The build toolchain never touches the final image.

Python dependencies can be installed in a builder stage, where compilers and headers are available, and the installed tree copied into a slimmer runtime, so the final image never carries the build headers:

Dockerfile.python — build wheels in one stage, copy to slim runtime

FROM python:3.13-bookworm AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

FROM python:3.13-slim-bookworm AS runtime
WORKDIR /app
COPY --from=builder /install /usr/local
COPY . .
CMD ["python", "-m", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

For a Python data processing service, this refactor took us from 950 MB to 265 MB. The friction was real — untangling tightly coupled build steps in an existing Dockerfile takes more thought than a greenfield one — but once the first conversion was done, the pattern became muscle memory. Every new service on that platform started multi-stage from day one.

Picking a base image: the honest size-vs-pain comparison

Not all small base images are equal, and the tradeoffs aren't obvious from the size numbers alone.

Base image	Approx size	C library	Native module support	Recommendation
`node:24`	~1.1 GB	glibc	Full	Too big for production
`node:24-slim`	~230 MB	glibc	Good	Safe production default
`node:24-alpine`	~55 MB	musl	OK if no native modules	Risky with `bcrypt`, `sharp`, `canvas`
`gcr.io/distroless/nodejs20-debian12`	~170 MB	glibc	Good	Security hardening, no shell
`python:3.13-slim-bookworm`	~130 MB	glibc	Good	Safe production default
`python:3.13-alpine`	~50 MB	musl	OK if no native modules	Risky with `confluent-kafka`, NumPy, PyTorch
`php:8.3-fpm-alpine`	~70 MB	musl	Risky	Extensions must rebuild from source

Figure: decision tree for choosing a Docker base image. If the service uses native modules or compiled extensions, stay on a glibc-based -slim image; otherwise pick distroless for maximum security hardening or Alpine for pure APIs and workers. Alpine only earns its place when nothing has to compile.

The -slim variants of official images are almost always the right call for production services that use any native extension. They strip the dev tools but keep glibc, which is what most native code expects.

Distroless is worth it when you need the security hardening of no shell and no package manager — but you trade away the ability to docker exec into a running container for debugging. That matters more than most teams think until something breaks at 2 a.m.

My personal defaults after shipping dozens of services: node:24-alpine or python:3.13-alpine for services without native extensions, *-slim when native modules enter the picture, and public.ecr.aws/lambda/python:3.13 for Lambda functions. I've shipped distroless in a high-security e-commerce environment. It worked, but we had to invest significantly more in observability before we could give up the ability to exec into a container.

The alpine musl trap — where it bites Node, Python, and Laravel

Alpine is everywhere in Docker tutorials because the numbers are impressive: 55 MB instead of 1.1 GB. For pure services — API workers, queue consumers, Redis/DocumentDB clients with no native extensions — Alpine works fine in production. The trap is assuming that extends to everything.

Alpine uses musl libc instead of glibc. Most native binaries and precompiled wheels are built against glibc. When you run them on musl, one of three things happens: it works, it produces subtly wrong output, or it crashes with a linker error. The third scenario is the one tutorials skip.

Warning

Do not use Alpine for services that depend on native modules or precompiled binaries unless you have verified each one compiles and runs correctly on musl. The runtime cost of debugging a musl-glibc mismatch in production is far higher than the storage savings from a smaller image.
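Before auditing individual packages, it helps to confirm which libc an image actually ships. The following is a rough heuristic sketch: detect_libc is a hypothetical helper, and it assumes glibc systems provide getconf and that musl installs its loader as /lib/ld-musl-<arch>.so.1, which holds for stock Debian and Alpine images but is not guaranteed everywhere:

```shell
# detect_libc: print "glibc", "musl", or "unknown" for the current system.
# Heuristic only; hypothetical helper for illustration.
detect_libc() {
  # glibc reports its version through getconf; musl does not define this variable
  if getconf GNU_LIBC_VERSION >/dev/null 2>&1; then
    echo glibc
    return 0
  fi
  # musl installs its dynamic loader as /lib/ld-musl-<arch>.so.1
  for loader in /lib/ld-musl-*.so.1; do
    if [ -e "$loader" ]; then
      echo musl
      return 0
    fi
  done
  echo unknown
}

detect_libc
```

Pasting the function into a shell inside the container (for example one started with docker run --rm -it on the image in question) shows which side of the musl/glibc divide a native dependency will have to run on.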

Node.js. Native npm packages (bcrypt, sharp, canvas, anything built with node-gyp) either fail to install because the Alpine image lacks the system libraries and toolchain they need, or they install but crash at runtime. The node-pre-gyp download path usually 404s on Alpine; you fall back to compiling from source, which means installing a compiler toolchain again (build-base on Alpine, the counterpart of Debian's build-essential). Your Alpine image is no longer small.

Python. PyPI serves platform-specific wheels for manylinux (glibc-based). Alpine/musl doesn't match any manylinux tag, so unless a package also publishes musllinux wheels, pip falls back to building from source, which requires the C headers. Some packages (NumPy, cryptography, anything with Fortran dependencies) have source build times that wreck your CI pipeline. ML images with PyTorch are a common victim.

Laravel/PHP. PHP extensions compiled against glibc are not reusable on Alpine. The official php:8.3-alpine image requires you to rebuild every extension (pdo_pgsql, redis, imagick) from source inside the Alpine environment. Buildable, but imagick in particular has had Alpine build failures across major versions.

In a media processing service, sharp and canvas on Alpine caused segmentation faults that only appeared under load: not in dev, not in a quick smoke test, only when real traffic hit the image processing pipeline. It took nearly four days to trace back to a musl-glibc mismatch in a precompiled binary. After that I moved all media services to -slim and haven't gone back.
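For a Node project, one quick way to see whether native modules are in play is to look for compiled addons, which Node loads from files ending in .node. This is a minimal sketch, assuming dependencies are already installed locally; list_native_addons is a hypothetical helper, and it only sees addons built or downloaded for the platform where npm ran:

```shell
# list_native_addons: print every compiled Node addon (*.node file) under a
# directory (default: node_modules). Prints nothing if none are found.
# Hypothetical helper for illustration.
list_native_addons() {
  dir="${1:-node_modules}"
  [ -d "$dir" ] || return 0
  find "$dir" -type f -name '*.node' | sort
}

# Usage: any output means native code is in play, which per the text above
# is the signal to stay on a glibc-based -slim image.
#   list_native_addons node_modules
```

An empty result is not proof that Alpine is safe (a package can also shell out to a bundled binary), but a non-empty one is a cheap early warning.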

Layer caching: order your Dockerfile so the cache actually hits

Docker's layer cache is one of the most valuable tools for fast CI builds — and one of the most commonly wasted. The rule is simple: anything that changes frequently goes at the bottom; anything that changes rarely goes at the top.

The worst pattern is COPY . . before RUN npm install. Every source file change invalidates the install cache. A 90-second npm install runs on every single commit.

Figure: side-by-side comparison of Dockerfile instruction order. The bad order copies all source before installing dependencies, so every commit busts the cache; the good order copies the manifest and installs first, then copies source, so the install stays cached.

The fix is one reorder:

Dockerfile — cache-friendly layer order

FROM node:24-slim AS builder
WORKDIR /app

# These change rarely — copy first so they cache
COPY package*.json ./
RUN npm ci

# Source changes every commit — copy last
COPY . .
RUN npm run build

Now npm ci only re-runs when package.json or package-lock.json changes. Every other commit hits the cache and skips the install entirely.
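The reorder works because the cache for a COPY step depends only on the files being copied, not on the rest of the repository. The sketch below imitates that idea with a plain checksum over the manifests. It is an analogy under that assumption, not Docker's actual cache-key algorithm, and deps_fingerprint is a hypothetical helper:

```shell
# deps_fingerprint: checksum of the dependency manifests only, mirroring the
# set of files matched by "COPY package*.json ./".
# Hypothetical helper; an analogy for the cache key, not Docker's algorithm.
deps_fingerprint() {
  dir="${1:-.}"
  cat "$dir"/package*.json 2>/dev/null | cksum | awk '{ print $1 }'
}

# Usage: the value stays the same across source-only commits and changes
# only when package.json or package-lock.json changes.
#   deps_fingerprint .
```

The same fingerprint can double as a CI cache key for node_modules, since it changes under the same conditions that force npm ci to re-run.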

The same principle applies everywhere. In Python, copy requirements.txt and install before the COPY . . step. In Go, copy go.mod and go.sum before the source. Dependencies before source, always.

Reordering layers cut our average Docker build from over two minutes to under one — across every service on the platform, at zero infrastructure cost.

`.dockerignore` is the cheapest 50% you'll ever get

A .dockerignore tells Docker which files to leave out of the build context. Without one, COPY . . pulls in everything: .git, node_modules, test fixtures, coverage reports, .env files, local build output.

On a Node.js project with a populated node_modules, COPY . . without .dockerignore can copy 800 MB into the build context before npm ci even runs. That context transfers from your machine (or CI runner) to the Docker daemon on every build.

Docker tells you the context size on every build: => [internal] load build context ... 1.2GB. If that number is large, .dockerignore is your fastest win.
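To see what is behind that number before running a build, a rough preview of the largest top-level entries is usually enough. This is a minimal sketch: context_hogs is a hypothetical helper, it ignores any existing .dockerignore, and it relies on find supporting -mindepth and -maxdepth (GNU, BSD, and BusyBox find all do):

```shell
# context_hogs: list the largest top-level entries of a directory in KB, as a
# rough preview of what an unfiltered build context would send to the daemon.
# Hypothetical helper; it does not read .dockerignore.
context_hogs() {
  dir="${1:-.}"
  top="${2:-10}"
  # -mindepth 1 -maxdepth 1 selects every top-level entry, dotfiles included
  find "$dir" -mindepth 1 -maxdepth 1 -exec du -sk {} + \
    | sort -rn \
    | head -n "$top"
}

# Usage: context_hogs . 5
```

Anything near the top of that list that the build doesn't need (.git, node_modules, coverage output) is a candidate for .dockerignore.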

.dockerignore — baseline for a Node.js project

.git
.gitignore
node_modules
npm-debug.log*
dist
coverage
.env
.env.*
**/*.test.ts
**/*.spec.ts
docs
README.md
.dockerignore

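For Python, add **/__pycache__, .venv, .pytest_cache, **/*.pyc, htmlcov. The **/ prefix matters: without it, a pattern only matches at the root of the build context. For Next.js, add .next, since it gets built inside the container.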

The production cost of fat images: ECR storage, pull bandwidth, cold starts

Image size is treated as a vanity metric. It isn't. The downstream costs are measurable.

Figure: a 1 GB Docker image fans out into three production costs: ECR storage at roughly one dollar per month per service, inter-AZ pull bandwidth at roughly six dollars per month at twenty deploys a day, and cold-start latency of 60 to 90 seconds versus 8 to 12 seconds for a 200 MB image. Storage is the cheap part; pull bandwidth and cold-start 502s are what a fat image actually costs you.

ECR storage is $0.10/GB/month. A 1 GB image with 10 tags is $1/month, which is cheap in isolation. Multiply by 20 services and you're at $20/month before you count ECR data transfer.

Pull bandwidth is the more painful one for high-deploy teams. ECS tasks in private subnets pull from ECR through a NAT Gateway or VPC endpoint. Inter-AZ data transfer is $0.01/GB. At 20 deploys per day, a 1 GB image pulled once per deploy moves about 600 GB a month, or ~$6/month in transfer for that single service alone; if each of three tasks pulls its own copy, that triples to ~$18. A 200 MB image costs ~$1.20 on the same per-deploy basis. That math matters when you're running the kind of cost optimization covered in how we reduced an AWS bill by 40% without rewriting the application: image size is a frequently missed line item on the ECR and data transfer side.

Cold-start latency is the one that directly affects users. When ECS launches a new task (an auto-scaling event, a deploy, a restart after an OOMKill), it has to pull the image before the container starts. A 1 GB image on a fresh instance can take 60–90 seconds to pull and unpack, even on a standard 1 Gbps link. A 200 MB image takes 8–12 seconds. That's the difference between a scaling event that serves traffic before your health check timeout and one that doesn't.
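The storage and transfer figures above are simple multiplication, so it is easy to re-run them for your own image size and deploy rate. This is a minimal sketch using the prices quoted in this section as defaults ($0.10/GB-month for storage, $0.01/GB for transfer) and a 30-day month; both function names are hypothetical, and current AWS pricing should be checked before relying on the output:

```shell
# monthly_storage_usd SIZE_GB TAGS [PRICE_PER_GB_MONTH]
# Assumes every tag stores a full, unshared copy of the image (worst case).
monthly_storage_usd() {
  awk -v size="$1" -v tags="$2" -v price="${3:-0.10}" \
    'BEGIN { printf "%.2f\n", size * tags * price }'
}

# monthly_transfer_usd SIZE_GB PULLS_PER_DAY [PRICE_PER_GB] [DAYS]
# Assumes every pull transfers the whole image (no layer cache hits).
monthly_transfer_usd() {
  awk -v size="$1" -v pulls="$2" -v price="${3:-0.01}" -v days="${4:-30}" \
    'BEGIN { printf "%.2f\n", size * pulls * days * price }'
}

monthly_storage_usd 1 10      # 1 GB image, 10 tags
monthly_transfer_usd 1 20     # 1 GB image, 20 pulls per day
monthly_transfer_usd 0.2 20   # 200 MB image, 20 pulls per day
```

With those inputs the three calls print 1.00, 6.00, and 1.20. Both functions are deliberately pessimistic: shared layers between tags and layer cache hits on the host will lower the real numbers.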

When NOT to obsess over image size

Image size is worth optimizing — but not in all cases, and chasing the smallest possible number can introduce real production risk.

ML inference and embedding services. A fine-tuned LLaMA 7B checkpoint is several gigabytes before you open a single connection. There's no Alpine trick that removes the weights. Time spent on image size here is better spent moving the model weights to EFS or pulling from S3 at startup, so they're not baked into the image at all.

Services with complex native build requirements. If you're spending days debugging musl-glibc incompatibility across a suite of native PHP extensions, the engineering cost has already exceeded the storage savings. Use -slim and move on.

Single-use batch jobs. If the container runs for ten seconds and is never kept warm, cold-start latency is irrelevant. Don't apply the same scrutiny you'd give a long-running API.

When the image already pulls from cache. If your ECS tasks run on long-lived EC2 container instances, most pulls are cache hits at the layer level and the transfer cost of new layers is small. Fargate is the exception: it pulls the image fresh for each task. Optimize when you have pull latency evidence, not preemptively.

The goal isn't the smallest image — it's the image that starts fast, runs reliably, and doesn't cost you an incident because you picked the wrong libc.

Summary

Figure: a five-step Docker image optimization pipeline: measure with docker history and dive, cut the build context with .dockerignore, reorder layers so dependencies come first, split into multi-stage builds, and pick a slim base image. Measure first, base image last; that order kills 80% of the bloat before the Alpine-vs-slim debate even starts.

Measure before you cut. Run docker history --format "{{.Size}}\t{{.CreatedBy}}" sorted by size. Then use dive to see what's inside the big layers. Most images have one or two obvious sources of fat that a listicle won't point you at.
Multi-stage builds are the highest-leverage change for compiled and transpiled services. Keep the build toolchain out of the runtime image. A typical TypeScript API drops from ~800 MB to ~180 MB.
Use -slim as your default. Not Alpine, not full Debian, not distroless unless you need the security hardening and accept the no-shell debugging trade-off. -slim keeps glibc and removes the dev tools.
Alpine is fine for pure services; dangerous with native modules. Queue consumers, simple APIs, Redis/DocumentDB workers — Alpine holds up well. The moment bcrypt, sharp, confluent-kafka, or anything that compiles a C extension enters the picture, switch to -slim. musl will break them, often silently, often later than you'd like to find out.
Order layers correctly. COPY package*.json before COPY . . and keep dependency installs above source copies. Fast CI at zero cost.
Write .dockerignore before you write FROM. Context size matters, and node_modules in the build context is wasted bandwidth on every build.
The cost of a fat image compounds. ECR storage is cheap; inter-AZ pull bandwidth and cold-start latency are not. Run the math for your deployment frequency before you decide it doesn't matter.

After shipping dozens of services, my rule is simple: measure with docker history and dive before touching anything else. Multi-stage builds are almost always next. That sequence alone eliminates 80% of image bloat before you ever debate Alpine versus slim.

Frequently Asked Questions

Why is my Docker image so large?

Usually one of four reasons: your final image includes build tools or dev dependencies that should have been removed (fix with multi-stage builds), you're using a heavy base like node:24 instead of node:24-slim, your RUN commands split the package install and cleanup into separate layers so the cache persists, or you forgot .dockerignore and are copying node_modules or .git into the build context. Run docker history --format "{{.Size}}\t{{.CreatedBy}}" myimage:latest | sort -rh to find the offending layer in thirty seconds; the size has to be the first column for the sort to work.
