Why your 'correct' C code is probably undefined behavior

├── "C's undefined behavior surface is dangerously larger than working programmers realize, and modern optimizers actively exploit it"
│  ├── Thomas Habets (blog.habets.se) → read

Habets argues that the gap between what C programmers think they're writing and what the standard actually permits has become a crisis. He catalogs everyday operations (signed overflow, shifts by the type width or more, dereferencing one-past-end pointers, strict aliasing violations, `memcpy` from NULL with `n == 0`, uninitialized reads) that the standard treats as UB, and shows how GCC and Clang increasingly turn that permission into surprising assembly that breaks 'obviously correct' code.

│  └── @lycopodiopsida (Hacker News, 427 pts) → view

By submitting the post and driving it to 427 points with 577 comments, the submitter and the HN audience signal-boost the position that this UB surface area is underappreciated. The community's strong engagement reflects broad agreement that the disconnect between mental model and abstract-machine semantics is a real, ongoing source of bugs.

├── "This is a systemic language-safety problem that justifies migrating off C, not just writing more careful C"
│  └── top10.dev editorial (top10.dev) → read below

The editorial frames Habets's catalog as evidence supporting CISA's directive to plan migrations away from memory-unsafe languages. It points to the Linux kernel's long tail of patches — notably Kees Cook's battle against compilers optimizing away null checks after dereferences — as proof that even the world's most scrutinized C codebase can't durably defend itself against UB exploitation, making the problem structural rather than a matter of programmer discipline.

└── "The root cause is an education and mental-model failure: C is taught as 'portable assembly' but the standard defines an abstract machine"
  └── top10.dev editorial (top10.dev) → read below

The editorial argues the entire story is the gap between what the standard says and what programmers think it says. Engineers learn C from K&R, colleagues, or existing code — none of which trains them to treat the abstract machine as the source of truth, so they read 'int x = a + b' as hardware addition rather than as a constrained operation the compiler may assume never overflows.

What happened

A new blog post from Thomas Habets, making the rounds on Hacker News with 427 points, makes a provocative claim in its title: everything in C is undefined behavior. The title is not meant literally. The post is a careful walk through how the C standard's UB rules, combined with modern optimizing compilers, make the line between "working code" and "compiler-permitted nasal demons" far thinner than most C programmers assume.

Habets enumerates the surface area: signed integer overflow, shifting by widths ≥ the type size, dereferencing a pointer one-past-end, ordering comparisons (`<`, `>`) between pointers into different objects, violating strict aliasing, reading uninitialized memory, calling `memcpy` with a NULL source (even when `n == 0`), and modifying the same scalar twice between sequence points. Each of these is something working C programmers do, or accidentally do, every day. The standard says the compiler is allowed to assume these never happen. GCC and Clang increasingly take that permission and run with it.
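Two items from that list can be sidestepped with small wrappers. The sketch below is illustrative only: `shl_checked` and `memcpy_checked` are hypothetical helper names, not functions from Habets's post or from any library, and the shift helper assumes `unsigned` has no padding bits (true on mainstream targets).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: a left shift that is defined for every shift count.
 * Plain `x << s` is undefined when s >= the width of the promoted operand. */
static unsigned shl_checked(unsigned x, unsigned s)
{
    /* Assumption: no padding bits, so sizeof * CHAR_BIT is the value width. */
    const unsigned width = (unsigned)(sizeof x * CHAR_BIT);
    return s < width ? x << s : 0u;
}

/* Hypothetical helper: never hands memcpy a possibly-NULL pointer,
 * even when the byte count is zero. */
static void *memcpy_checked(void *dst, const void *src, size_t n)
{
    if (n == 0)
        return dst;                      /* nothing to copy, skip the call */
    assert(dst != NULL && src != NULL);  /* non-empty copies need real buffers */
    return memcpy(dst, src, n);
}
```

Returning 0 for an oversized shift is a design choice made for this sketch; a caller might reasonably prefer to reject the input instead.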

The post lands at a moment when this debate is no longer academic. CISA has explicitly told vendors to plan migrations off memory-unsafe languages. The Linux kernel has accumulated a long tail of patches forced by compiler UB exploitation — the most famous being Kees Cook's running battle to keep null checks from being optimized away after a pointer dereference. Habets's contribution isn't novel research; it's a clean consolidation of the indictment, with code examples that compile to surprising assembly.

Why it matters

The gap between "what the standard says" and "what C programmers think it says" is the entire story. Most working engineers learned C from K&R, a colleague, or by reading existing code — none of which trains you to treat the abstract machine as the source of truth. When you write `int x = a + b;` you mean "add two integers, possibly overflowing." The standard means "if `a + b` overflows the signed range, the program's behavior is undefined and the compiler may assume it does not."
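If "add two integers, possibly overflowing" is what you mean, the check has to happen before the addition, because a test written after the fact (such as `a + b < a`) already depends on the overflow the compiler may assume away. A minimal sketch follows; `add_checked` is a hypothetical name chosen for illustration. GCC and Clang also provide `__builtin_add_overflow`, and C23 adds `ckd_add` in `<stdckdint.h>`, which serve the same purpose.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: stores a + b in *out and returns true when the sum
 * fits in an int; returns false (leaving *out untouched) when it would
 * overflow. Only in-range subtractions are evaluated, so no step is UB. */
static bool add_checked(int a, int b, int *out)
{
    assert(out != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}
```

A caller would write `if (!add_checked(a, b, &x)) { /* handle overflow */ }` rather than inspecting `x` afterwards.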

That assumption isn't theoretical. GCC famously uses signed-overflow-is-UB to optimize loop induction variables: for `for (int i = 0; i <= n; i++)` it may assume `i` never overflows, so when `n == INT_MAX` the compiled loop can run forever or exit early, and the standard says either outcome is correct. Clang's `-fsanitize=undefined` exists precisely because there is no other way to find these bugs at runtime; they're invisible to ordinary testing because the compiler's output "works" until the day it doesn't.
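One way to write an inclusive loop that stays defined even at `n == INT_MAX` is to test for the last element before the increment, so `i++` never executes with `i == INT_MAX`. This is a sketch rather than a recommended idiom; `sum_inclusive` is a hypothetical function, and the `long long` accumulator is assumed to be wide enough, which holds when `int` is 32 bits.

```c
#include <assert.h>

/* Hypothetical example: sums the integers 0..n inclusive.
 * The exit test runs before the increment, so i is never incremented
 * past n and the loop is well defined for every non-negative n. */
static long long sum_inclusive(int n)
{
    assert(n >= 0);              /* precondition chosen for this sketch */
    long long total = 0;
    for (int i = 0; ; i++) {
        total += i;
        if (i == n)
            break;               /* leave before i++ could overflow */
    }
    return total;
}
```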

The community reaction on HN split along familiar lines. The C defenders argue — fairly — that UB exists because the standard had to accommodate trap-on-overflow CPUs, segmented memory models, and non-IEEE float hardware that existed in 1989. The critics counter that none of that hardware has shipped in volume for thirty years, and the cost of keeping the abstraction is now paid in CVEs. Both sides are right, which is why the resolution isn't "fix C" — the resolution is that new code increasingly isn't being written in C.

Rust, Zig, and even modern C++ (with `-fwrapv`, sanitizers, bounds-checked containers, and `std::span`) have meaningfully smaller UB surfaces. The Linux kernel's Rust experiment, now four years in, has shipped Rust drivers in mainline. Android's media stack rewrite in Rust took memory-safety bugs in that subsystem to roughly zero. The data on counterfactuals is no longer ambiguous.

What's interesting about Habets's framing is that it sidesteps the language wars and just shows you the C. The point isn't "Rust is better" — the point is that the C you think you're writing is not the C the compiler is compiling. Once you internalize that, every line of pointer arithmetic in your codebase becomes a small bet on the compiler's mood.

What this means for your stack

If you maintain C — and most infrastructure engineers do, even if it's just OpenSSL, glibc, or the kernel headers your service depends on — there are concrete moves worth making in 2026.

Turn on UBSan in CI, not just locally. `-fsanitize=undefined,address` catches a wide class of UB at runtime, and the overhead is acceptable for test suites. The cost of leaving it off is that you ship UB to production and find it via crash reports. Several large codebases — PostgreSQL, SQLite, curl — now gate merges on sanitizer-clean test runs. If your C project's CI doesn't run a sanitizer build, that's the single highest-ROI change you can make this quarter.
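As an illustration of what a sanitizer build can surface, consider the classic midpoint computation. The sketch below assumes a build along the lines of `cc -g -fsanitize=undefined,address file.c` (the exact command is an assumption; only the `-fsanitize` flag comes from the text above). Both function names are hypothetical. With that build, a test that calls `midpoint_naive` with two large values is expected to produce a signed-overflow report instead of a silently wrong answer.

```c
#include <assert.h>
#include <limits.h>

/* Naive midpoint: lo + hi can exceed INT_MAX, which is signed overflow (UB).
 * It happens to work for small inputs, which is why ordinary tests miss it. */
static int midpoint_naive(int lo, int hi)
{
    return (lo + hi) / 2;
}

/* Overflow-free for 0 <= lo <= hi: hi - lo cannot exceed INT_MAX,
 * and lo + (hi - lo) / 2 cannot exceed hi. */
static int midpoint_safe(int lo, int hi)
{
    assert(0 <= lo && lo <= hi);
    return lo + (hi - lo) / 2;
}
```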

Use `-fwrapv` and `-fno-strict-aliasing` if you can't audit every cast. These flags trade some optimization for behavior most programmers expect. The Linux kernel has used both for years. Yes, you give up a few percent on integer-heavy code. You also stop the compiler from deleting your overflow check.
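Where a cast can be audited, the usual aliasing-safe replacement for reading one type's bytes as another is `memcpy`, which compilers typically reduce to a plain register move. A minimal sketch, with hypothetical helper names and the assumption that `float` and `uint32_t` have the same size:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The cast version, *(uint32_t *)&f, reads a float object through an
 * incompatible lvalue type and violates strict aliasing.
 * Copying the bytes is defined regardless of aliasing flags. */
static uint32_t float_bits(float f)
{
    uint32_t bits;
    assert(sizeof bits == sizeof f);   /* assumption: 32-bit float */
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Inverse direction, same technique. */
static float bits_to_float(uint32_t bits)
{
    float f;
    assert(sizeof f == sizeof bits);
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Code written this way behaves the same with or without `-fno-strict-aliasing`, so the flag becomes a safety net rather than a requirement.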

For new code: pick the smallest amount of C you can get away with. A 200-line `.c` file wrapping a stable C API and a 50KLOC Rust application around it is now a normal architecture. The boundary is the dangerous part; minimize it.

And if you're an SRE rather than a C author: the relevant question for your stack isn't "is C unsafe" — it's "which of my dependencies' release notes mention UBSan, fuzzing, or Rust ports?" That's the leading indicator of which transitive deps will eat you in 2027.

Looking ahead

The long arc here is unmistakable: C will remain the substrate of operating systems and embedded software for decades, but the volume of new C being written is in steady decline, while the willingness of compiler authors to interpret the standard aggressively is steadily rising. Those two trends compose badly. Habets's post is best read not as an attack on C (he clearly loves the language) but as a practitioner's plea to take the standard literally, because your compiler already does. The C you're actually compiling is the C in the spec, not the C in your head. Acting otherwise has a name in the industry now, and it's pronounced "CVE."

Hacker News 450 pts 594 comments

Everything in C is undefined behavior

muvlon · Hacker News

Yes there is tons of surprising and weird UB in C, but this article doesn't do a great job of showcasing it. It barely scratches the surface. Here's a way weirder example: volatile int x = 5; printf("%d in hex is 0x%x.\n", x, x); This is totally fine if x is just an int, but the v

beeforpork · Hacker News

The UB in unaligned pointers is even worse: an unaligned pointer in itself is UB, not only an access to it. So even implicit casting a void*v to an int*i (like 'i=v' in C or 'f(v)' when f() accepts an int*) is UB if the cast pointer is not aligned to int. It is important to unders

quelsolaar · Hacker News

The 5 stages of learning about UB in C:
- Denial: "I know what signed overflow does on my machine."
- Anger: "This compiler is trash! why doesn't it just do what I say!?"
- Bargaining: "I'm submitting this proposal to wg14 to fix C..."
- Depression: "Can you rely

greysphere · Hacker News

The examples aren't really undefined behavior. They are examples that could become UB based on input/circumstances. Which if you are going to be that generous, every function call is UB because it could exceed stack space. Which is basically true in any language (up to the equivalent def o

bestouff · Hacker News

The problem of UB is not really that it may crash in some architecture. The real problem is that the compiler expects UB code to NOT happen, so if you write UB code anyway the compiler (and especially the optimizer) is allowed to translate that to anything that's convenient for its happy path.
