Common Misconceptions About Compilers

A myth‑busting overview of widespread misunderstandings in how compilers operate, optimize, and relate to code and hardware.

  • Compilers Aren’t Perfect Optimizers:
    They aim to improve code rather than guarantee optimal output in all cases. Practical constraints and design trade‑offs limit perfect optimization, and machine‑specific cost models are imperfect by nature.

  • Undefined Behavior Is a Tool, Not Just a Loophole:
    UB exists to enable aggressive optimization: the compiler may assume UB never occurs and simplify code on that assumption. The benefit is not unconditional, though; where the compiler cannot prove code is UB‑free, it may have to stay conservative to preserve correct behavior.
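A minimal C sketch of the kind of folding UB licenses (the function name `always_greater` is illustrative):

```c
#include <limits.h>

/* Because signed overflow is undefined behavior in C, the compiler may
 * assume x + 1 never overflows and fold this comparison to a constant 1,
 * emitting no add at all. (If x == INT_MAX at runtime, the program has UB.) */
int always_greater(int x) {
    return x + 1 > x;
}
```

GCC and Clang at `-O2` typically compile this to `return 1`; if overflow is made defined (e.g. with `-fwrapv`), the comparison must actually be performed.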

  • Undefined Behavior Can Be Defined by the Compiler:
    Compilers are free to give a defined meaning to behavior the standard leaves undefined (e.g., making a null dereference read as 0), but doing so typically costs performance because it forecloses optimizations.
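A concrete instance: GCC and Clang accept `-fwrapv`, which defines signed overflow as two's‑complement wrap‑around, trading some optimization opportunities for predictability. The same wrap‑around can be expressed portably through unsigned arithmetic (the name `wrapping_add` is illustrative):

```c
#include <stdint.h>

/* Unsigned overflow is well defined in C, so routing the addition through
 * uint32_t gives wrap-around semantics without relying on compiler flags.
 * (The cast back to int32_t is implementation-defined in ISO C, but wraps
 * on all mainstream compilers.) */
int32_t wrapping_add(int32_t a, int32_t b) {
    return (int32_t)((uint32_t)a + (uint32_t)b);
}
```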

  • The "99% Correctness" Fallacy:
    Machine‑learning cost models in compilers may be 99% accurate per decision, but applied at the scale of a whole program that error margin compounds; such models are approximate guides, not guarantees.

  • Compilers Don't Optimize Data Locality Automatically:
    Data layout decisions (such as structure‑of‑arrays vs. array‑of‑structures) are outside the compiler's control: it optimizes instructions and code placement, but it will not restructure your data for cache‑friendly access patterns. Engineers must guide memory layout themselves.
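A sketch of the two layouts (type and function names are illustrative). A compiler will happily vectorize either loop, but it will not transform one layout into the other:

```c
#define N 4

/* Array-of-structures: all of one particle's fields sit together, so a
 * loop over just x strides past y, z, and mass on every iteration. */
struct ParticleAoS { float x, y, z, mass; };

/* Structure-of-arrays: each field is contiguous, so a loop over x
 * streams through a single dense run of floats. */
struct ParticlesSoA { float x[N], y[N], z[N], mass[N]; };

float sum_x_aos(const struct ParticleAoS *p) {
    float s = 0;
    for (int i = 0; i < N; i++) s += p[i].x;   /* stride: sizeof(struct) */
    return s;
}

float sum_x_soa(const struct ParticlesSoA *p) {
    float s = 0;
    for (int i = 0; i < N; i++) s += p->x[i];  /* stride: sizeof(float) */
    return s;
}
```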

  • Inlining Remains Crucial Despite Other Optimizations:
    Inlining often yields a larger performance gain than any other single transformation, in large part because it exposes call sites to further optimization, and it remains a cornerstone of compiler optimization.
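A minimal illustration of why inlining enables other optimizations (function names are illustrative):

```c
/* A trivial accessor. Kept out-of-line it costs a call; once inlined,
 * the compiler sees the multiply directly and can fold constant
 * arguments. */
static inline int square(int x) { return x * x; }

/* After inlining, square(3) + square(4) typically constant-folds to 25
 * at compile time, leaving no calls or multiplies in the emitted code. */
int folded(void) { return square(3) + square(4); }
```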

  • Separate Compilation Isn’t Always Superior:
    While separate compilation enables parallel and incremental builds, unity builds or link‑time optimization sometimes outperform it in large, complex projects.

  • Middle-End Is Not Fully Platform-Agnostic:
    Though the middle end tries to abstract away target details, LTO and whole‑program analysis show that the final optimizations depend on the link stage and on knowledge of the entire program.

  • Compilers Are Complex and Fallible:
    Even mature compilers like GCC and Clang contain subtle bugs triggered by rare edge cases; their enormous codebases make reliability a continual challenge.

The full post is available here.