macOS tar destroys files on Linux: I validated it in my real Railway pipeline and documented the 3 cases nobody mentions
There's a Hacker News thread that resurfaced this week with 107 points about a 2024 article: tar on macOS creates archives that Linux can't extract cleanly. The community reacted the way it always does — "use GNU tar", "install gtar with Homebrew", "this has been known for years." And yeah, all of that is correct.
But there's something nobody's saying: the 3 specific scenarios where this actually breaks production are distinct failures, and each one needs a different fix. I learned this the hard way — a failed deploy at 11pm that took two hours to diagnose. My thesis is that "use GNU tar" is necessary but not sufficient if you don't know exactly why your particular case is exploding.
macOS tar Linux extraction errors in production: the context that matters
Ever since I migrated from Vercel to Railway in 2024 (a weekend that taught me more about real infrastructure than months of tutorials), my deployment pipeline depends on .tar.gz artifacts I generate on macOS and extract in Linux containers. For months it worked fine. Until it didn't.
The core problem is that BSD tar (the one that ships with macOS) and GNU tar (the default on Ubuntu and Debian; Alpine ships BusyBox tar unless you `apk add tar`) are not the same program. They share a name and basic syntax, but differ in how they handle extended metadata. macOS adds HFS+/APFS filesystem metadata that GNU tar doesn't expect to find, and when it does find it, it can silently ignore it, fail with warnings that don't interrupt the process, or — worst case — extract junk files alongside your build without telling you.
Check which version of tar you have on macOS:
# On macOS
tar --version
# Typical output:
# bsdtar 3.5.3 - libarchive 3.5.3 zlib/1.2.11 liblzma/5.0.5 bz2lib/1.0.8
# In your Linux container (Debian, Ubuntu — on Alpine you get BusyBox tar unless you `apk add tar`)
tar --version
# Typical output:
# tar (GNU tar) 1.34
# Copyright (C) 2021 Free Software Foundation, Inc.
They're not the same program. They never were.
The 3 real cases where macOS tar breaks a Railway pipeline
Case 1: Apple metadata ._* files
This was the first bug I hit. When macOS creates a .tar.gz from a folder you've touched with Finder (or that had extended attributes at some point), it includes ._filename files with HFS metadata. They're invisible in Finder, but they're sitting right there in the tar.
# Compressed from macOS without any precaution:
tar -czf artifact.tar.gz ./dist/
# In the Linux container, list the archive and look for AppleDouble entries
# (they can sit at any depth, so match "._" at the start or after a slash):
tar -tzf artifact.tar.gz | grep -E '(^|/)\._'
# Output:
# ./dist/._index.html
# ./dist/._main.css
# ./dist/._chunk-abc123.js
# ... (one for every file in the build)
In my specific case, I had a Railway script that grabbed the first .js file in the directory to calculate a verification hash. The script found ._chunk-abc123.js before chunk-abc123.js and the hash failed. The deploy completed, but the post-deploy verification fired an alert. It took me 90 minutes to connect those dots.
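That ordering bug is easy to reproduce without tar at all. A minimal sketch, with illustrative file names (the actual Railway script differed):

```shell
# Recreate the extracted build directory as GNU tar would leave it,
# including the AppleDouble file BSD tar smuggled in (names illustrative)
mkdir -p demo/dist
echo 'real build output'   > demo/dist/chunk-abc123.js
echo 'apple metadata blob' > demo/dist/._chunk-abc123.js

# "Grab the first .js file": '.' sorts before any letter, so the
# AppleDouble file wins and the hash gets computed over garbage
first_js=$(find demo/dist -name '*.js' | sort | head -n 1)
echo "hashing: $first_js"
sha256sum "$first_js"
```

Note that `find -name '*.js'` matches dotfiles, unlike a shell glob — which is exactly why the script tripped over a file no `ls *.js` would have shown.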
Fix for Case 1:
# Option A: Strip metadata before compressing (on macOS)
COPYFILE_DISABLE=1 tar -czf artifact.tar.gz ./dist/
# Option B: Filter during extraction (GNU tar on Linux honors --exclude on extract)
tar -xzf artifact.tar.gz --exclude='._*'
# Option C: What I actually use in my Railway Dockerfile
# Install GNU tar on macOS via Homebrew and use it explicitly
brew install gnu-tar
# Then in the build script:
gtar -czf artifact.tar.gz --exclude="._*" --exclude=".DS_Store" ./dist/
The COPYFILE_DISABLE=1 environment variable is the cleanest because it acts at creation time. But if you already have old .tar.gz files in storage, you need the extraction-side filtering option.
Case 2: Permissions that change silently
This one cost me more because there was no error. The deploy completed green, the app came up, but certain endpoints were returning 403s. The container couldn't read files that, on my local machine, had 644 permissions.
The problem: BSD tar on macOS can serialize permissions differently for files with APFS ACLs (Access Control Lists). When GNU tar extracts them, it interprets those permissions in a way that can result in different bits than the originals.
# On macOS, created a file and checked permissions:
ls -la config/settings.json
# -rw-r--r-- 1 juan staff 2048 Jun 15 22:31 config/settings.json
# Packed with BSD tar:
tar -czf config.tar.gz config/
# In the Linux container, extracted and checked:
tar -xzf config.tar.gz
ls -la config/settings.json
# -rw------- 1 1000 1000 2048 Jun 15 22:31 config/settings.json
# ↑ Permissions changed from 644 to 600 — group and others lost read access
It doesn't happen every time. It happens when the file had some extended attribute at some point in its history on the macOS filesystem. The kind of bug that shows up in production but not in staging because staging has a different file history.
Fix for Case 2:
# In the Dockerfile, force explicit permissions after extraction:
RUN tar -xzf artifact.tar.gz && \
    find ./config -name "*.json" -exec chmod 644 {} \; && \
    find ./scripts -name "*.sh" -exec chmod 755 {} \;
# Or better: in the macOS build script, normalize permissions before packing:
find ./dist -type f -exec chmod 644 {} \;
find ./dist -type d -exec chmod 755 {} \;
COPYFILE_DISABLE=1 gtar -czf artifact.tar.gz ./dist/
The second option is superior because it fixes the problem at the source, not the destination. If you fix it at the destination, you're depending on every Dockerfile having that fix — and eventually someone will create a new one without it.
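One way to make Case 2 fail loudly instead of silently is a permissions manifest: record the expected modes at build time and diff them after extraction. A sketch assuming GNU findutils on the Linux side (`-printf` is a GNU extension); the manifest file names are illustrative:

```shell
# Simulate an extracted tree (stand-in for the real artifact)
mkdir -p app/config
echo '{}' > app/config/settings.json
chmod 644 app/config/settings.json

# Build side: record "mode path" for every file
find app -type f -printf '%m %p\n' | sort > permissions.manifest

# Extraction side: recompute and diff; any drift aborts the deploy
find app -type f -printf '%m %p\n' | sort > permissions.actual
if ! diff -u permissions.manifest permissions.actual; then
    echo "permission drift detected" >&2
    exit 1
fi
echo "permissions verified"
```

Ship `permissions.manifest` inside the artifact and the 644-became-600 bug turns into an explicit red build instead of a 403 in production.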
Case 3: Paths with spaces in filenames
This is the quietest one, and the one the original HN article doesn't cover in nearly enough detail. If you pack from macOS and any file in the path has a space (which Finder makes completely normal), extraction behavior on Linux depends on the exact version of GNU tar and how you process the file list.
# On macOS I had an assets directory:
ls "dist/static/Open Graph/"
# og-image.png
# og-video.mp4
# After packing with BSD tar, the path looked like:
tar -tzf artifact.tar.gz | grep "Open"
# dist/static/Open Graph/og-image.png
# dist/static/Open Graph/og-video.mp4
# On Linux, extracting with a script that processed the list:
for file in $(tar -tzf artifact.tar.gz); do
    # ⚠️ This breaks: "Open" and "Graph/og-image.png" are two separate tokens
    echo "Processing: $file"
done
The tar itself extracts correctly with tar -xzf. The problem appears when any downstream script processes the file list assuming no spaces. In my case it was a CDN invalidation script that read paths from the tar to know which caches to flush.
Fix for Case 3:
# Bad: iterate with for over $(tar -t...)
for file in $(tar -tzf artifact.tar.gz); do
    invalidate_cache "$file"  # breaks with spaces
done
# Good: use read to handle spaces correctly
tar -tzf artifact.tar.gz | while IFS= read -r file; do
    invalidate_cache "$file"  # works with spaces
done
# Better: prevent the problem on macOS by renaming before packing
find ./dist -depth -name "* *" -exec bash -c 'mv "$0" "${0// /_}"' {} \;  # -depth renames children before parents
COPYFILE_DISABLE=1 gtar -czf artifact.tar.gz ./dist/
Renaming at the source is more robust because you eliminate the root cause. The read-based iteration is a patch that works but that the next developer will break when they copy the loop without understanding why it was written that way.
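If renaming isn't an option (a public asset URL may depend on the space), a third approach is to stop parsing tar's listing entirely and walk the extracted tree NUL-delimited. `invalidate_cache` here is a stub standing in for the real CDN call:

```shell
# Simulate the extracted tree with a space in the path
mkdir -p "dist/static/Open Graph"
touch "dist/static/Open Graph/og-image.png"

invalidate_cache() {  # stub: the real script calls the CDN API here
    echo "invalidated: $1"
}

# -print0 / read -d '' survives spaces (and even newlines) in paths
find dist -type f -print0 | while IFS= read -r -d '' file; do
    invalidate_cache "$file"
done
```

The NUL delimiter is the only byte that can't appear in a Unix path, which is why this pattern is immune where whitespace splitting isn't.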
The mistakes I made before I understood the pattern
Mistake 1: Trusting that "tar worked before, it'll always work." The ._* files appeared after I started opening that assets folder with Finder for previews. Before Finder touched it, no metadata. After Finder, yes. The pipeline was the same; the filesystem wasn't.
Mistake 2: Only reading exit codes. GNU tar extracts ._* files with exit code 0. No error. Your deploy is "green" and in production you've got garbage stuffed into your build directory. You need post-extraction validation, not just exit codes.
Mistake 3: Installing gtar but still using tar in the scripts. After brew install gnu-tar, on macOS the binary is called gtar, not tar. If you keep writing tar in your build script, you're still using BSD tar. I did this for a week.
# Check which tar your script is actually running:
which tar # /usr/bin/tar → BSD tar (macOS default)
which gtar # /opt/homebrew/bin/gtar → GNU tar (Homebrew)
# If you want tar to be GNU tar without changing your scripts:
echo 'export PATH="/opt/homebrew/opt/gnu-tar/libexec/gnubin:$PATH"' >> ~/.zshrc
source ~/.zshrc
tar --version # Should now show GNU tar
This PATH override is what I ended up using to keep existing scripts untouched.
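For scripts that have to run on both macOS and Linux, a defensive variant is to resolve the binary explicitly and fail fast if it isn't GNU tar. A sketch; `TAR` is just an illustrative variable name:

```shell
# Prefer gtar (Homebrew on macOS), fall back to plain tar (Linux)
if command -v gtar >/dev/null 2>&1; then
    TAR=gtar
else
    TAR=tar
fi

# Abort immediately instead of silently packing with BSD tar
if ! "$TAR" --version 2>/dev/null | head -n 1 | grep -q 'GNU tar'; then
    echo "ERROR: GNU tar not found; refusing to build artifact" >&2
    exit 1
fi
echo "using GNU tar: $TAR"
```

This would have caught my "installed gtar, kept typing tar" week on the first run.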
My current Railway setup
After validating all three cases, my macOS build pipeline now looks like this:
#!/bin/bash
# scripts/build-artifact.sh
# Generates the tar.gz for Railway deployment
set -euo pipefail
BUILD_DIR="./dist"
OUTPUT="artifact-$(date +%Y%m%d-%H%M%S).tar.gz"
# 1. Normalize permissions before packing
echo "→ Normalizing permissions..."
find "$BUILD_DIR" -type f -exec chmod 644 {} \;
find "$BUILD_DIR" -type d -exec chmod 755 {} \;
find "$BUILD_DIR" -name "*.sh" -exec chmod 755 {} \;
# 2. Remove macOS metadata files
echo "→ Cleaning Apple metadata..."
find "$BUILD_DIR" -name "._*" -delete
find "$BUILD_DIR" -name ".DS_Store" -delete
# 3. Create the tar with GNU tar and no extended metadata
echo "→ Packing with GNU tar..."
COPYFILE_DISABLE=1 gtar \
  --exclude="._*" \
  --exclude=".DS_Store" \
  --exclude=".AppleDouble" \
  --exclude=".LSOverride" \
  -czf "$OUTPUT" \
  "$BUILD_DIR"
# 4. Verify the resulting tar has no Apple metadata files
METADATA_COUNT=$(tar -tzf "$OUTPUT" | grep -c -E '(^|/)\._' || true)
if [ "$METADATA_COUNT" -gt 0 ]; then
  echo "❌ ERROR: tar contains $METADATA_COUNT Apple metadata files"
  exit 1
fi
echo "✅ Artifact created: $OUTPUT"
echo " Files: $(tar -tzf "$OUTPUT" | wc -l | tr -d ' ')"
And in the Railway Dockerfile, the extraction step has its own verification:
# Dockerfile — relevant fragment
FROM node:20-alpine AS runner
WORKDIR /app
# Copy the artifact
COPY artifact.tar.gz .
# Extract with explicit verification
RUN tar -xzf artifact.tar.gz && \
    rm artifact.tar.gz && \
    # Verify no ._* files snuck through
    METADATA=$(find . -name "._*" | wc -l) && \
    if [ "$METADATA" -gt 0 ]; then \
        echo "ERROR: Apple metadata detected in extraction" && exit 1; \
    fi && \
    echo "Clean extraction: $(find . -type f | wc -l) files"
This double verification — at creation time and at extraction time — is what gives me actual confidence. I don't trust that the process is always perfect; I trust that if it fails, I'll know before the deploy and not after.
This connects to something bigger than tar
A few weeks ago I wrote about my YAML specs for agents and about migrating from pgbackrest to Barman. In both cases the pattern was the same: a standard tool that "works" in most cases, until it hits the specific production edge case. Tar is just another instance of this.
The real risk isn't that tar is hard. It's that tar is so familiar that nobody considers it a failure point. When the deploy breaks at 11pm, nobody thinks "probably tar." And that's exactly why these bugs hurt more than they should.
FAQ: macOS tar Linux extraction errors in production
Why does macOS tar generate ._* files and when do they appear?
The ._filename files are AppleDouble files — the container format macOS uses to carry resource forks and extended attributes when writing to filesystems or archive formats that can't store them natively. They appear when a file has extended attributes: special permissions, Finder metadata, color tags, or simply because Finder opened the folder to show previews. They don't appear on all files; they appear on the ones the macOS filesystem touched in certain ways. It's non-deterministic from the developer's perspective.
Is COPYFILE_DISABLE=1 enough or do I still need GNU tar?
COPYFILE_DISABLE=1 prevents BSD tar from including extended metadata at creation time. It's sufficient for Case 1 (the ._* files). For Case 2 (permissions with ACLs) and Case 3 (paths with spaces in downstream scripts), you need GNU tar and permission normalization. In practice I use both together because the cost is zero and the combination covers more ground.
Does GNU tar on macOS via Homebrew have any tradeoffs?
The only real tradeoff is that it installs as gtar, not tar, to avoid breaking the system. If you override PATH so that tar points to GNU tar, you need to be aware that some macOS system tools assume BSD tar with specific behaviors. In practice, after 18 months using the PATH override I haven't had a single problem — but it's something you should know going in.
Does this affect GitHub Actions or only local builds?
It mainly affects local builds on macOS and any CI runner running on macOS. GitHub Actions runners on Ubuntu already use GNU tar, so the problem doesn't show up there. The real risk is when you compress on local macOS and upload the artifact for a Linux system to extract — which is exactly the workflow for manual or semi-manual deployments.
Is there a way to detect whether an existing tar.gz has Apple metadata without extracting it?
Yes, one line:
# List AppleDouble entries (top level or nested) without extracting
tar -tzf artifact.tar.gz | grep -E '(^|/)\._'
# If it returns nothing, the tar is clean of Apple metadata
You can include this as a CI validation before publishing the artifact.
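Wrapped as a CI gate it could look like this. The artifact name is illustrative, and the pattern also catches `._*` entries nested under subdirectories, which a bare `^\._` would miss:

```shell
# Build a clean sample artifact so the gate has something to inspect
mkdir -p dist && echo 'hello' > dist/index.html
tar -czf artifact.tar.gz dist/

# Count AppleDouble entries at any depth; grep -c exits 1 on zero
# matches, so '|| true' keeps the pipeline from failing the script
count=$(tar -tzf artifact.tar.gz | grep -c -E '(^|/)\._' || true)
if [ "$count" -gt 0 ]; then
    echo "ERROR: artifact.tar.gz contains $count Apple metadata files" >&2
    exit 1
fi
echo "artifact.tar.gz is clean"
```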
Why doesn't Docker build protect against this?
A docker build just copies the .tar.gz into the image as an opaque blob — Docker never inspects what's inside a tar it's copying. The problem happens when your Dockerfile does RUN tar -xzf and extracts the polluted tar inside the container. Docker sees a command that exits with code 0 and assumes everything is fine.
The fix is easy. The trap isn't technical.
GNU tar + COPYFILE_DISABLE=1 + post-extraction verification solves all three cases. The technical part is documented above and you can copy it in five minutes.
The real trap is attitudinal: tar is so old and so familiar that nobody adds it to the list of things that can fail. I didn't have it on the list either. Until I had a broken deploy at 11pm with completely green logs and half an hour of staring at code that had no bugs in it.
If you're working with Kimi K2, Claude, or any LLM for code generation, none of them will warn you about this problem unless you already know about it and ask explicitly. If your stack touches Railway or any containerized infra, the problem can appear with no visible indicator.
My concrete recommendation: audit the tars you currently have in production or in storage with tar -tzf file.tar.gz | grep -E '(^|/)\._'. If it returns results, you have work to do. If it returns nothing, good — but add the verification to your pipeline anyway so the next local macOS build doesn't silently break that guarantee.
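To audit a whole storage directory in one pass, here's a sketch; `STORAGE_DIR` is an assumed location, adjust it to wherever your artifacts live:

```shell
STORAGE_DIR="${STORAGE_DIR:-./artifacts}"
mkdir -p "$STORAGE_DIR"

dirty=0
for archive in "$STORAGE_DIR"/*.tar.gz; do
    [ -e "$archive" ] || continue  # glob matched nothing: directory is empty
    if tar -tzf "$archive" | grep -q -E '(^|/)\._'; then
        echo "DIRTY: $archive"
        dirty=$((dirty + 1))
    else
        echo "clean: $archive"
    fi
done
echo "archives with Apple metadata: $dirty"
```

Run it once as a one-off audit, then keep it as the CI step that guards the artifact bucket.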
This is exactly the kind of problem that shows up in Railway logs as a symptom of something else entirely. And that's precisely what makes it expensive.
Source: Hacker News