The Setup

A 30+ year old system — 32-bit C, dozens of executables, no tests. A previous team had tried migrating to 64-bit. Code compiled cleanly, then crashed at runtime.

Not compile errors, not linker failures. The binaries built, started, and produced memory corruption — silent data destruction. The kind of bugs where you stare at correct-looking code for hours and nothing makes sense.

The conclusion: too much effort. And the 32-bit platform still works. The project was shelved.

The immediate survival problem, Y2038, I’d already solved with _TIME_BITS=64 (a glibc feature macro, used together with _FILE_OFFSET_BITS=64, that makes time_t 64-bit while long and pointers stay 32-bit). But full 64-bit is still the long-term goal because the platform vendor is discontinuing 32-bit support. You can buy time with containers and extended support contracts, but eventually the migration has to happen. So the question remained: why did the previous attempt fail when the code compiled fine?

The Answer Was in 4 Lines

In the system’s central type definition header, four typedefs use long instead of fixed-width types. The pattern:

typedef signed long   four_byte_int;   // "four bytes." Except on LP64.
typedef unsigned long four_byte_uint;  // Same problem.
typedef unsigned long ptr_as_int;      // Stores pointers as integers.
typedef unsigned long color_val;       // UI color values. Also 4 bytes. Also not.

On a 32-bit system, long is 4 bytes. On 64-bit Linux (LP64), long is 8 bytes. These types are named after their 4-byte size but silently become 8-byte types on a 64-bit build.

Every struct containing these types changes layout. Every memcpy using sizeof(long) copies the wrong number of bytes. Binary file formats become unreadable. Pointers stored as integer types get truncated when cast back.
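To make the layout change concrete, here is a minimal sketch. The record structs and the layout_drift helper are hypothetical, invented for illustration; only the four_byte_int typedef mirrors the pattern shown above.

```c
#include <stddef.h>
#include <stdint.h>

/* The unsafe pattern from the central header. */
typedef signed long four_byte_int;

/* Hypothetical record, invented for illustration. */
struct record_unsafe {
    char          tag;
    four_byte_int value;  /* 4 bytes on ILP32, 8 bytes on LP64 */
    char          flag;
};

/* Same record with a fixed-width field. */
struct record_fixed {
    char    tag;
    int32_t value;        /* 4 bytes on both ABIs */
    char    flag;
};

/* Bytes by which the unsafe struct outgrows the fixed one on this ABI. */
size_t layout_drift(void)
{
    return sizeof(struct record_unsafe) - sizeof(struct record_fixed);
}
```

On an ILP32 build the two structs have the same size and layout_drift() returns 0. On a typical x86-64 Linux build the unsafe struct grows from 12 to 24 bytes, and every field after tag moves to a new offset.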

And none of this produces a compile error. It compiles. It links. It runs. It destroys data.

Why Nobody Saw It

Because the rest of the codebase was clean.

The system’s primary types — LONG, ULONG, COUNT — are defined as int, not long:

// Linux
typedef int LONG;           // 4 bytes on ILP32 and LP64
typedef unsigned int ULONG; // 4 bytes on ILP32 and LP64

On LP64, int stays 4 bytes. These types are safe. Whether the original developers did this intentionally or not, the result is that the vast majority of the codebase — the database transaction layer, the UI framework, most business logic — uses types that don’t change size.

This created a false sense of security. The code compiled. Most of it worked. But thousands of occurrences of the four unsafe types scattered across the codebase were corrupting memory wherever struct layout or data size mattered.

Without type-level classification (separating safe types from unsafe types) the previous team had no way to know where to look. Everything compiled. Some things crashed. The connection was invisible.
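One way to sketch that classification is a small table comparing each typedef's actual size against the 4 bytes it had on the 32-bit build. This is an illustrative probe, not the tooling used in the project; the type_check struct, the CHECK macro, and both functions are hypothetical names.

```c
#include <stddef.h>
#include <stdio.h>

/* Typedefs as they appear in the article. */
typedef int           LONG;
typedef unsigned int  ULONG;
typedef signed long   four_byte_int;
typedef unsigned long four_byte_uint;
typedef unsigned long ptr_as_int;
typedef unsigned long color_val;

/* One row per type: its size on this ABI versus its size on ILP32. */
struct type_check {
    const char *name;
    size_t      actual;
    size_t      ilp32_size;
};

#define CHECK(type) { #type, sizeof(type), 4 }

static const struct type_check checks[] = {
    CHECK(LONG), CHECK(ULONG),
    CHECK(four_byte_int), CHECK(four_byte_uint),
    CHECK(ptr_as_int), CHECK(color_val),
};

/* Number of types whose size differs from the 32-bit build. */
size_t count_unsafe(void)
{
    size_t n = 0;
    for (size_t i = 0; i < sizeof checks / sizeof checks[0]; i++)
        if (checks[i].actual != checks[i].ilp32_size)
            n++;
    return n;
}

/* Prints one classification line per type. */
void report(FILE *out)
{
    for (size_t i = 0; i < sizeof checks / sizeof checks[0]; i++)
        fprintf(out, "%-16s %zu bytes  %s\n", checks[i].name, checks[i].actual,
                checks[i].actual == checks[i].ilp32_size ? "safe" : "UNSAFE");
}
```

Built for ILP32, every row reports safe. Built for LP64, the two int-based types stay safe and the four long-based types are flagged, which is the split the rest of the analysis depends on.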

Mapping the Blast Radius

Once you know which types are dangerous, you can map exactly what breaks. I compiled with the typedef fixes and -Wconversion, then classified every warning and error:

Pointer storage in integer types (~20 sites)

The UI framework stores its this pointers in one of the 4-byte integer types. On LP64 the pointer is 8 bytes, but the surrounding code still treats the slot as 4 bytes, so the upper 32 bits vanish. When the value is retrieved and cast back to a pointer, you get an address in the bottom 4 GB of memory, which may or may not be mapped.

This is almost certainly what caused the runtime crashes. It’s a use-after-truncation bug: invisible on 32-bit, almost certain to crash or corrupt memory on 64-bit once a pointer lands above 4 GB, and it produces no compile warning.
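The truncation can be sketched in isolation. Both helpers below are hypothetical and simulate the slot with explicit casts; the real framework code is not shown in this article.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round-trips a pointer through a 4-byte slot, as the surrounding code assumes. */
bool survives_4byte_slot(const void *p)
{
    uint32_t slot = (uint32_t)(uintptr_t)p;        /* upper 32 bits dropped on LP64 */
    const void *back = (const void *)(uintptr_t)slot;
    return back == p;
}

/* Round-trips a pointer through a pointer-sized integer slot. */
bool survives_uintptr_slot(const void *p)
{
    uintptr_t slot = (uintptr_t)p;
    return (const void *)slot == p;
}
```

On a 32-bit build both functions return true for every pointer. On LP64 the first one returns false for any address above 4 GB, and the explicit casts keep the compiler quiet about it.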

sizeof(long) in memcpy/memset (~15 sites)

Code that uses sizeof(long) where it means “4 bytes.” On LP64, sizeof(long) is 8, causing buffer overflows in memcpy, wrong byte counts in database bind operations, and broken index structures with hardcoded sizes.
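A minimal sketch of the mechanical fix, assuming a 4-byte destination buffer. The bind_buf struct and both functions are hypothetical; the unsafe original is shown only in a comment.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 4-byte destination, e.g. one bound database column. */
struct bind_buf {
    unsigned char bytes[4];
};

/* Unsafe original pattern (not compiled here):
 *     memcpy(buf->bytes, &v, sizeof(long));   // 8 bytes on LP64: overflow
 */
void store_value(struct bind_buf *buf, int32_t v)
{
    /* Size the copy from the object itself, and fail the build on mismatch. */
    _Static_assert(sizeof v == sizeof buf->bytes, "value and buffer must match");
    memcpy(buf->bytes, &v, sizeof v);
}

int32_t load_value(const struct bind_buf *buf)
{
    int32_t v;
    memcpy(&v, buf->bytes, sizeof v);
    return v;
}
```

Taking the size from the object being copied, rather than from a type name, means the byte count cannot drift away from the buffer when a typedef changes.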

Binary file formats (4+ formats)

Caches, permission data, sort buffers — all contain structs with long fields. 32-bit binaries write 4-byte fields. 64-bit binaries expect 8-byte fields. The files become unreadable across the boundary.
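One possible way to take the ABI out of a file format is to write a version byte and then each field at a fixed width and byte order, instead of dumping the struct. The sketch below assumes a made-up two-field record; cache_rec, REC_VERSION, and the function names are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum { REC_VERSION = 2, REC_SIZE = 1 + 4 + 4 };  /* version byte + two 32-bit fields */

/* Hypothetical in-memory record. */
struct cache_rec {
    int32_t  id;
    uint32_t perms;
};

/* Stores a 32-bit value as 4 little-endian bytes. */
static void put_u32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
    p[2] = (unsigned char)((v >> 16) & 0xFF);
    p[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Loads a 32-bit value from 4 little-endian bytes. */
static uint32_t get_u32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Encodes the record; the result does not depend on struct layout. */
size_t rec_write(unsigned char out[REC_SIZE], const struct cache_rec *r)
{
    out[0] = REC_VERSION;
    put_u32(out + 1, (uint32_t)r->id);
    put_u32(out + 5, r->perms);
    return REC_SIZE;
}

/* Decodes the record; returns -1 on short input or unknown version. */
int rec_read(const unsigned char *in, size_t len, struct cache_rec *r)
{
    if (len < REC_SIZE || in[0] != REC_VERSION)
        return -1;
    r->id    = (int32_t)get_u32(in + 1);
    r->perms = get_u32(in + 5);
    return 0;
}
```

A reader for files written by the old 32-bit binaries would branch on the version byte (or on its absence) and decode the legacy layout there.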

A typedef contradiction buried in the source tree

Two files deep in the codebase redefine LONG as typedef long LONG, contradicting the safe typedef int LONG used everywhere else. If any compilation unit picks up the wrong header, LONG becomes 8 bytes in that unit while remaining 4 bytes in all the others. Same struct, different layout, depending on which header wins. It’s an ABI split that stays invisible until runtime.
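A compile-time guard can turn this kind of split into a build error. The sketch assumes the assertions live in a header that sits next to the struct definitions; txn_header and txn_header_size are hypothetical names.

```c
#include <stddef.h>

typedef int LONG;  /* the safe definition from the central header */

/* Hypothetical struct shared between compilation units. */
struct txn_header {
    LONG id;
    LONG length;
};

/* If a stray header makes LONG 8 bytes in some unit, that unit stops compiling. */
_Static_assert(sizeof(LONG) == 4, "LONG must be 4 bytes in every compilation unit");
_Static_assert(sizeof(struct txn_header) == 8, "txn_header layout changed");

size_t txn_header_size(void)
{
    return sizeof(struct txn_header);
}
```

The guard only helps in units that actually include it, so it complements deleting or aligning the two contradicting files rather than replacing that step.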

The Fix

Replace long with int32_t, uint32_t, and uintptr_t. 4 lines. The types now mean what their names always claimed.
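Spelled out, the four lines might look like the sketch below. The mapping of each typedef to a fixed-width type is my reading of the names and comments shown earlier, and the static assertions are an addition, not part of the original header.

```c
#include <stdint.h>

typedef int32_t   four_byte_int;   /* signed, exactly 4 bytes */
typedef uint32_t  four_byte_uint;  /* unsigned, exactly 4 bytes */
typedef uintptr_t ptr_as_int;      /* wide enough to hold a pointer */
typedef uint32_t  color_val;       /* UI color value, exactly 4 bytes */

/* Make the names' promises checkable at build time. */
_Static_assert(sizeof(four_byte_int) == 4, "four_byte_int must be 4 bytes");
_Static_assert(sizeof(four_byte_uint) == 4, "four_byte_uint must be 4 bytes");
_Static_assert(sizeof(color_val) == 4, "color_val must be 4 bytes");
_Static_assert(sizeof(ptr_as_int) >= sizeof(void *), "ptr_as_int must hold a pointer");
```

Note that ptr_as_int still changes size between ILP32 and LP64. That is intentional, and it is why the sites that store it in 4-byte slots need individual attention.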

Then follow the compiler warnings:

Category                          Sites        Effort
Pointer storage API               ~20          Redesign to use uintptr_t or void*
sizeof(long) → sizeof(int32_t)    ~15          Mechanical replacement
Binary format versioning          4+ formats   Version bump + migration reader
Contradicting typedef             2 files      Delete or align with central header
Remaining conversion warnings     ~50          Individual review

Scope: 6–9 weeks. One engineer.

And here’s the leverage: the Y2038 migration (_TIME_BITS=64) and the LP64 migration fix different code sites. Run them in parallel, share the canary deployment, and the combined scope is 7–11 weeks — not the 18 you’d expect by adding them up.

The Embedded Interpreter (And Why It Doesn’t Matter)

The system includes an embedded interpreter based on Python 2.4. On LP64, it produces a dozen hard errors related to LONG_BIT. This looks like a blocker — until you realize the interpreter is dying anyway.

Instead of migrating a 20-year-old interpreter fork to 64-bit, containerize it. 32-bit container, communication with the 64-bit system via the existing network protocol (confirmed LP64-safe). The interpreter stays 32-bit until its replacement is ready. Containment, not migration.

What Actually Happened

The previous team compiled, saw it work, deployed, and got memory corruption. Without type classification, they had no diagnostic path. “It compiled and crashed” became “too much effort.”

With classification:

  • 62 compile errors → 3 real 64-bit errors (the rest: missing includes, dead code)
  • Thousands of warnings → mostly deprecation noise
  • Runtime crashes → 4 typedefs, ~20 pointer truncation sites, ~15 sizeof mismatches

The codebase wasn’t broken. It was 99% clean with a 1% poison distributed across thousands of occurrences of four types. The fix is mechanical once you know which types to change. The hard part was figuring out which were dangerous and which were safe.

“It compiled” is not “it works.” On LP64, the most dangerous bugs are the ones the compiler accepts without complaint. Type classification before migration is the difference between “too much effort” and “6–9 weeks.”


4 lines in a header file. That’s what broke the other 40,000.