C++20 Modules with GCC11 (feabhas.com)
139 points by ingve on Aug 18, 2021 | 108 comments


> Modules promise to significantly change the structure of C++ codebases and possibly signal headers’ ultimate demise (but probably not in my lifetime). It also opens the door to potentially have a unified build system and package manager, similar to Rust’s Cargo package manager; though I imagine standardising a unified build system would be one bloody battle.

I wonder how many folks, like me, have stopped using C++ because it's too much effort to manage the build system and the overhead of headers adds too much to compile time?

An enormous draw for alternatives like Rust, Zig, Nim et al. is that they simply don't have these problems.


Honestly, the modern build system is my favourite feature of Rust.

No messing with autoconf, or makefiles that spit out some random gcc error I have to debug.

To be fair, however, I am also not too interested in C++ and its 30 years of foot guns, so take it for what it's worth.


In my experience, Rust's ability to rid itself of autoconf comes not from any inherent benefit but from the fact that Rust code typically does not support systems where autoconf would really be required. For the mostly sane platform differences, the popular ones are probably hardcoded in some dependency crate with "-std" in its name so you don't have to see it. Unless, of course, it turns out not to support your less common platform, or the check is broken, in which case you lob a GitHub issue at them and they deal with it.


Yes, cargo is perfect if you are willing to always compile from source, everything is Rust, and you are willing to deal with everything else yourself in build.rs scripts.

Google and Microsoft aren't using cargo for their projects; rather, they had to come up with their own build tools.


At Microsoft, we do use cargo internally but we make use of an internal crate registry.


Google's bazel/blaze predates Rust. It has the advantage of supporting multiple languages.


and it is pretty much unusable in the traditional Unix setting, where you are supposed to vendor nothing and use all the libraries of the OS, however broken they may be


That's less a limitation of the build system and more an intentional policy to ensure reproducible binaries etc.

Technically bazel can load any external files, but they must be tracked as workspace dependencies, which is more verbose:

E.g.

    new_local_repository(
        name = "system_libs",
        # pkg-config --variable=libdir x11
        path = "/usr/lib/x86_64-linux-gnu",
        build_file_content = """
    cc_library(
        name = "x11",
        srcs = ["libX11.so"],
        visibility = ["//visibility:public"],
    )
    """,
    )


That's a feature, no?


I think there's an argument to be made that some developers might be willing to sacrifice portability to some lesser-used platforms in order to make the ecosystem a bit cleaner for the platforms it does support. This is definitely a tradeoff though, so it makes sense, and is good, that different ecosystems make different choices here, as it allows people to choose whichever suits them best.


Rust does support a surprisingly deep set of platforms, but it achieves this by baking platform-specific knowledge into Rust itself. See, for example, how it finds errno: it's just a big list of special cases. It is pleasant to not need the configure scripts, but a downside is that no new platform can participate without first patching Rust.

https://doc.rust-lang.org/src/std/sys/unix/os.rs.html#38


(Into the standard library, not into Rust itself. Anyone can write this, you don't need to patch Rust first just to call some functions.)


I believe you can write your own completely separate standard library if you want to (much like each platform typically has its own libc). It's just that it's easier to manage the first-party supported platforms as one codebase.


Now try to integrate it on mixed codebase like Android or Fuchsia.


How does Rust handle problems like "call this function if it exists?" Is there support in build scripts?


It doesn't. Indeed, when it comes to libc and other system libraries (or really most C libraries in general), the Rust ecosystem relies on hardcoding the ABI of every supported platform. It's theoretically possible to use bindgen (which parses C headers) as a build step, but this is rarely done.


Zig doesn't have a package manager like Rust does. There are competing "non-official" implementations that are not compatible with each other. Zig is in the same state as C++ right now with vcpkg, build2, Conan, etc...

But there is an effort[0] from the "core team" (meaning Andrew Kelley, the author of Zig) to push for an "official solution".

Maybe, the same way Zig makes cross-compiling C/C++ easier, we might see a Zig package manager making C/C++ packaging easier too.

[0] https://github.com/ziglang/zig/issues/943


To provide some more context, here is a snippet from the latest release notes[0]:

> Having a package manager built into the Zig compiler is a long-anticipated feature. Zig 0.8.0 does not have this feature.

> If the package manager works well, people will use it, which means building Zig projects will involve compiling more lines of Zig code, which means the Zig compiler must get faster, better at incremental compilation, and better at resource management.

> Therefore, the package manager depends on finishing the Self-Hosted Compiler, since it is planned to have these improved performance characteristics, while the Bootstrap Compiler is not planned to have them.

I can't help but mention, I am incredibly excited about how well the self-hosted compiler is coming along. Progress is swift, and all the ambitious design decisions (fully incremental compilation[1], extremely fast debug builds[2], low memory usage, multi-threading) are intact. I expect it to make quite a splash once it becomes generally usable.

I don't expect the package manager to take nearly as long to develop.

[0]: https://ziglang.org/download/0.8.0/release-notes.html#Packag...

[1]: https://vimeo.com/491488902

[2]: https://twitter.com/andy_kelley/status/1416485475125141504


I long for the times when there were fewer package managers, not more. Maybe two at most (one for distros and one for developers, say), but isn't it time for some kind of standardisation? One per language is not justified.


Well, a dedicated package manager such as cargo sure is convenient. Maybe what we currently need is rather a "meta package manager" that knows of many package managers and can orchestrate them?


> stopped using C++ because [...] and the overhead of headers adds too much to compile time?

> An enormous draw for alternatives, like Rust, Zig, Nim et al is that they simply don't have these problems.

I'm using Nim full time at the moment in a large project.

Like Rust, it certainly does have the compile-time problem due to parsing lots of files! On every compile, Nim processes every source file in the program, including libraries, and this can take minutes.

Unlike C++, where each source file can be compiled in parallel and after a small change most files usually don't need to be recompiled, Nim's scan of the whole program's source files is single-threaded, so it's the slowest part of the build. Nim compiles to C, and the C compilation and linking steps are relatively fast because the C compilations run in parallel and are cached.


Neither does C++ when using IDE build tools.

It is just that many anti-IDE folks never get around to adopting them.


The only problem with this approach is that C++ IDEs are silos of their own: Visual Studio on Windows, Xcode on macOS, whatever the heck you want to use on Linux... you quickly end up maintaining a lot of project files if you have a cross-platform project. Damned if you do, damned if you don't...


Not every project needs to be cross platform.


Correct. And e.g. Visual Studio is kind of nice if you only need to develop for Windows. It's just that in my experience the pain with build systems invariably starts to increase when you actually have to go cross platform.


At the office, C++ fans are using modules to fight back against the vast simplifications Go and Rust bring with respect to builds and dependency management. Alas, modules are a step forward for C++... but...

* It'll be decades before our existing C++ code base is converted to modules, meaning the same cmake, dpkg, and all manner of nonsense...

* C libraries used by C++ are not impacted

* Modules are unlikely to help with OS headers

I'm trying to move to Go when I can and pure Rust otherwise. After many years of C++ build hell on top of legions of internal C++ code, building has become a cosmic pain in the butt.


> It'll be decades before our existing c++ code base is converted to modules

> I'm trying to move to go when I can and pure rust otherwise.

I don't understand in which universe replacing includes with imports takes more time than a full rewrite in a different language.


It is not as if the same effort will be dedicated to switching to modules as to a rewrite (which probably won't happen for the OP anyway).


And 40 years of mature libraries.


In my experience the important libraries I need to interact with are C libraries, not C++ libraries. Of course certain niches have very important C++ libraries, but I don't think the C++ library story in general is that great. Care to expand?


Not OP, but several domains have large non-trivial C++ codebases. For example, numerics, computational geometry and computer graphics have large mature libraries that would need to be rewritten by subject experts. The number of subject experts in these fields is not that large to start with, and the algorithms are complex and non-trivial, so "exothermic" explosion to new languages like Rust is unlikely to happen very fast.


CUDA, SYCL, DPC++, Qt, MFC and WinUI, Metal Shading Language, DirectX, most COM APIs on Windows, WinRT infrastructure, macOS IOKit/DriverKit, ARM Mbed, Arduino, the native implementation of most well-known ML libraries, all major game-engine middleware, LLVM and GCC, just off the top of my head.


Not the OP, but the first libraries that come to my mind are Qt and wxWidgets (for GUI programming).


Mmm. 40 years ago is 1981. So if you do have a 1981 API, it's completely incompatible with how you'd actually write C++ in 2021 and will need a wrapper, putting it squarely in the same situation as Rust, where you would also want to wrap this 1981 API.


> So, if you do have a 1981 API it's completely incompatible with how you'd actually write C++ in 2021

Is it, though? I have plenty of .h files in my /usr/include with copyrights between 1980 and 1985. Sure, it's not modern best practice, but at least it's possible... and for a one-off task there's definitely not much point in writing a wrapper.


Yes. So, modern C++ says you want a unique_ptr to any owned resource you're not sharing, and otherwise a shared_ptr (or some equivalent smart pointer). Your 1981 API doesn't do that because those don't exist in 1981.

Maybe it gives you a raw pointer to new resources. In 2021 those are just loose references which don't signify ownership, you need the wrapper to avoid this confusion leaking into your program.

Maybe it gives you an integer handle which needs to be passed to some other API call to "release" the handle when you're done with it. Again you'll want a wrapper to handle this properly in your code or you may leak resources. You'll also want a wrapper because one integer is the same as another, but you probably shouldn't call finished_with_cow(dog_number) or clean_up_dog(cow_number), even though those compile.

Being forty years old, it invariably won't be thread safe, so you'd also want to wrap it for that reason, unless (as is often the case in C++) your code isn't thread safe anyway.

"at least it's possible" isn't a differentiator. As I said, you could also access this 40-year-old API from Rust, but you'd want to build a wrapper for it because its behaviour is so violently different from what's expected in the modern ecosystem.


> Maybe it gives you a raw pointer to new resources. In 2021 those are just loose references which don't signify ownership, you need the wrapper to avoid this confusion leaking into your program.

you can use unique_ptr directly for that. take for instance the 197x API, fopen: you can just do

    // requires C++20 (lambda in an unevaluated context)
    using file_handle = std::unique_ptr<FILE, decltype([] (FILE* file) { fclose(file); })>;
and then store a file_handle in your class.

> Maybe it gives you an integer handle which needs to be passed to some other API call to "release" the handle when you're done with it. Again you'll want a wrapper to handle this properly in your code or you may leak resources. You'll also want a wrapper because one integer is the same as another, but you probably shouldn't finished_with_cow(dog_number) or clean_up_dog(cow_number) even though those compile.

it needs a one-time class to coax an int into behaving like a pointer, but otherwise it's the same principle


Right, you want to wrap this code, exactly like Rust.

The point isn't that this is impossible; it's that in both cases you want to write a wrapper to use them ergonomically in your modern software. C++ doesn't magically make 1980s code more respectful of object-lifetime management than you'd think by reading Stroustrup's book at the time, just because modern C++ (and the modern edition of the book) spends more time on this problem.

I think Jason Turner put together a little example about this type of wrapping as the saner way forward for people who've persuaded themselves they can't stomach a C++ ABI change because they have binaries with no source or their system is too complicated to re-test. Maybe as a C++ Weekly episode.


> Right, you want to wrap this code, exactly like Rust

In theory. In practice if it's a one-off task I'm definitely not going to bother and will just use the 1981 API directly, especially if it's used in a single file for instance. There's nothing I find more stupid than code style dogmatism - if it compiles it's kosher in the right circumstances.


Ever heard of UNIX?

Naturally you are being pedantic by focusing on exactly 40 years.


Which parts of the UNIX APIs feel like good fits for modern C++ to you?

This reminds me of BeOS which claimed to be a C++ operating system and had abstract classes like "BStatable" representing things that can be "stat'd" and which comes with a bunch of methods that use out-pointers for integers that can in turn be bit-compared to constants like S_IWUSR. Very C++. Much object orientation. Wow.

[ Behind the scenes of course it's the stat() system call you remember from Unix and C. ]

It's true that there are really significant libraries people care about for C++ and some of those libraries existed five, ten, in a few cases even twenty-five years ago, but the older they are the uglier they get and more likely they are to drive people away from C++ rather than toward it. If your argument becomes something like "Why use Rust when C++ has COM?" you might as well reveal your membership of the Rust Evangelism Strike Force.


Nice post!

When playing with modules, I noticed a major caveat that makes me dislike them very much: the standard says that modules must form an acyclic graph. This means you still cannot get rid of forward declarations, and you must put interfaces and implementations into different files for any realistically sized code base. So, in terms of effort, you still have to maintain the equivalent of headers, just with a different name and less preprocessor macro interference.
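For illustration, a minimal sketch of what that interface/implementation split looks like with modules (the file names and the .ixx extension follow MSVC convention and are assumptions; GCC accepts ordinary file names):

```cpp
// math.ixx -- module interface unit: declares what the module exports,
// playing much the same role a header used to.
export module math;
export int square(int x);

// math.cpp -- module implementation unit: no exports, just definitions.
module math;
int square(int x) { return x * x; }

// main.cpp -- consumer: imports the module instead of #include-ing a header.
import math;
int main() { return square(3) == 9 ? 0 : 1; }
```

So the interface file is still a separate artifact you maintain alongside the implementation, which is the parent's point.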


Oh that's unfortunate.


Unfortunate, yes. But it keeps with the spirit of a language that is designed for both single-pass compilation and separate compilation. A module system that mimics other languages like D or Python (I don't know enough about Rust) would require a major change to how C++ compilation works. The compilers are by now complex enough that I'm pretty sure compiler developers aren't keen on rewriting their frontends for multi-pass compilation.


> I don't know enough about rust

Rust allows cyclic dependencies between modules within a single crate, but not between crates.


That is true of most module systems anyway, and it was already a huge battle to get them adopted at all.


Which module systems are you thinking of? The only ones that I think come close are the ones in Pascal and its successors.


Mesa, Mesa/Cedar, Modula-2, Modula-3, CLU, Eiffel, Ada, Oberon, Oberon-2, Oberon-07, Component Pascal, Active Oberon, COM/WinRT, .NET.

Probably there are others as well.


What makes you include .NET? C# and VB.NET handle cyclical dependencies between classes/source files just fine. Partial classes are the most extreme example of that. Assemblies are different, but so are static/dynamic libraries.


Assemblies are what map to modules in concept, especially given polyglot projects.

If you want to stay in one language, then F# imposes compilation order; Visual Studio even has support for rearranging file order in the project file.


I know this has to integrate with existing C++.

But wow, is that multiple degrees more complex than in most other languages.


C++ supports platforms multiple degrees more complex than other languages.

You're not gonna use Rust or Zig to write code for some obscure DSP CPU with a ridiculous architecture and no OS.


Actually, maybe you would choose Rust. The core consideration is going to be whether somebody is going to target LLVM on that CPU. If the CPU has a perfectly nice GCC backend then maybe Rust is a bad choice, but if it's LLVM anyway, or if "land an actual high level language on this hardware so that I don't need to write machine code by hand" is part of the goal of your work anyway, then Rust looks reasonable.

Of course with no OS you can't run Rust's standard library as this expects OS services to underpin it. But you can have core features, and if you've got "no OS" you presumably were expecting to write anything you needed anyway.

The reality for modern C++ is that you're on a Microsoft platform, or you're on GCC, or you're on Clang; there is no fourth vendor, and outside those three you are screwed and nothing works. The era of $$$ proprietary compilers for weird targets ended and isn't coming back; C++ got too complicated for them and they gradually stopped implementing new features. But if you're on Clang you already have LLVM, which is a long way toward having Rust. If you're on a Microsoft platform, that is the OS; I know, it seems like they could do more, but they didn't.

When you say "ridiculous architecture": if you got too far off piste you can't have modern C++ either. Even as you stray into the black zone stuff gets very fragile and I wouldn't risk it. Your CPU's integer arithmetic isn't technically required to be two's complement, for example, but good luck fixing all the weird things that break if it isn't.


What I heard from people working with obscure platforms:

- They often don't support any C/C++ compiler.

- If they support a fully custom one, it's often not that great. Potentially limited to C or even a subset of C (or a very small subset of C++).

- If they base it on gcc/llvm, either they are not too obscure or you sooner or later run into compiler bugs. Also, in that case it's not too much of a stretch to e.g. support Rust instead of C++ if you want to.


Also what I thought...

Although, this is for people who make modules, not for module users.

As long as modules are written and compiled, they're both faster and easier to use and reuse than headers+libraries, although I'm curious to see whether this covers multithreaded and single-threaded libs, debug/release symbols, 32- and 64-bit, and so on. So far all those options mean there are 8 versions of each lib.

I've heard that now libraries built with one version of MSVC are compatible with future versions of MSVC, which means I don't need to rebuild libraries each time I switch to a new version of MSVC.


it's the C++ way


I was waiting for C++20 modules to finally come back to using C++ again... and what a disappointment.

You still have to predeclare everything... and you can't have circular dependencies...

They learnt nothing, and they missed an opportunity to be relevant again.

C++ is definitely dead, at least to me.

This is so sad...


I've been waiting half a decade for modules


Half a decade? Man, I was hoping for something better than header files in the 90s!


> Half a decade? Man, I was hoping for something better than header files in the 90s!

It's hard to get something better than header files; modules don't provide the same functionality, as far as I can see.

I think headers will remain purely for practical purposes.


Since Mesa, languages with modules have also supported conditional compilation and file inclusion.

Check Modula-2 documentation, or Turbo Pascal for MS-DOS or Apple's Object Pascal.


Well I did say "for practical purposes" :-)

Header files serve as a library interface for almost all languages out there, and are so simple that for the most part they can automatically be used from other languages to access the library.

C++ Modules, as far as I can tell anyway, are for C++ programs to use only, to access C++-only libraries.


Since when do other languages compile C header files?


> Since when do other languages compile C header files?

I dunno, I never claimed that.


> Header files serve as a library interface for almost all languages out there, and are so simple that for the most part they can automatically be used from other languages to access the library


> Header files serve as a library interface for almost all languages out there, and are so simple that for the most part they can automatically be used from other languages to access the library

What does that have to do with compilation? I use C headers all the time from Java projects and I've never had to compile them. Not manually anyway.

Headers are usable from Python, Tcl, Lua, Java, C# and a whole lot of other languages that similarly don't require compilation and don't require much effort further than issuing a single command.

The modules being proposed as a replacement for headers cannot, and will never, provide this functionality.


How do you make javac or csc read stdio.h?


> How do you make javac or csc read stdio.h?

Why would I need to? SWIG execution is part of the build process (a dependency).

We seem to be arguing past each other: my original point is that modules are not a complete replacement for headers because headers are used to specify an interface that all languages can, right now, in a very practical sense, use.

In practical terms, headers work as an interface to the rest of the system that all other languages currently consume. Modules cannot (and there is no indication that they ever will be able to) provide this functionality.


So, for nearly one third of a century :P


I'm pretty sure modules and concepts have basically taken my entire life to happen.


This seems like it tries to be all things to all people. I wonder if it will succeed.


I'm not sure if you're talking about modules or C++ itself.


It's not; it's still the same bullshit where you need to predeclare everything that is written below in your file or elsewhere.

C++20 got modules from 60 years ago, when people didn't know how to design modules.

They haven't looked at how languages such as D/Rust/Zig/Swift do it; instead they went in blind and built a "new" thing with the limitations of their old design.

What a shame.


Unfortunately, modules support in major build systems (e.g. cmake) is lagging behind, and without it modules are somewhat useless for general C++ projects. I've only heard of build2 having some support for modules.


Visual Studio's MSBuild is doing quite alright,

https://devblogs.microsoft.com/cppblog/moving-a-project-to-c...


So will clang, gcc, and msvc all have different implementations of modules?


They're different codebases, so as far as I know, they'll have different implementations, sure.

But modules themselves are standardized, and so you should expect that after some time, you'll have stuff that works on all implementations. I don't believe that any of them are complete yet, which is why you see these differences currently.

(Caveats: I have followed the development of modules semi closely but am by no means an expert.)


There's been a strong effort to have binary compatibility between compilers. The initial effort was funded by Intel when they launched Itanium and the resulting C++ ABI is used for other processor targets as well. So gcc, clang, and Intel's proprietary compiler are binary compatible modulo bugs.

But modules are very new and I doubt that enough corner cases are nailed down yet.


> So gcc, clang, and Intel's proprietary compiler are binary compatible modulo bugs.

For generated code, definitely. But, for example, their standard libraries are different, so you still can't, say, compile something with gcc that uses std::string and expect to link it against something compiled with clang that uses std::string, unless you are careful to use the headers (and .a files) of the same libc++ or libstdc++.

I consider this a feature, not a bug -- diversity in implementation is a plus.


Yeah, all the compilers will have different implementations and ABIs, and it'd be an unrealistic expectation for them to be completely compatible. Instead, IIRC, the current approach is to have a compatible format for specific common tasks. There has been some effort in SG15 to ensure consistency/interoperability across module implementations, which led to https://wg21.link/P1688R0. This was mainly due to the difficulty of scanning dependencies for build systems, so there needs to be a standardized format for consuming module information. One caveat: this group has been inactive for a while due to the pandemic, unfortunately.


Yes. There are no plans for binary module compatibility between compilers, or between different versions of the same compiler, for gcc and Clang.


In the C world, we have pkgconf.

There is no real chance of it becoming part of the C standard, because C is just that compact and simple.

Yet pkgconf would be a perfect base for a pan-C package metadata standard outside of the standard itself.


pkgconf already sucks on Linux when your packages are configurable, and sucks even more on Windows


> pkgconf already sucks on Linux when your packages are configurable, and sucks even more on Windows

Yes, but to be fair most things suck on Windows because that userbase only recently decided that portability is a good thing.

As we move forward and Microsoft products stop being the odd one out, the suckage will get lower.


Ah pkgconf now works on IBM, ARM, Sony, Nintendo, Unisys, Cisco, Green Hills, Apple, Google's platforms?


None of those systems you mention particularly care about portability, right? That means your pkgconf support on those platforms is going to be as broken as on Windows, if not more so.


What portability? Pkgconf is GNU/Linux only anyway.


Sorry, mea culpa - I kept reading it as 'pkg-config'.


I think you mean "in the Linux world" because pkgconf is not exclusive to C and doesn't work at all in many domains where C is in widespread use.


Yes, you will never manage to run it everywhere where C runs.

But as part of the wider C world, it's probably the most portable semblance of a standardised package metadata system.


Compact and simple?

I guess someone is still using K&R C book as reference instead of ISO C18.


How is it less compact and simple than it was 10 or 20 years ago? It's still the same language, in spirit and in practice.


Except K&R C will stop compiling in C23, and C is so simple that most people who think they know C will gloriously fail a pub quiz that covers all ISO C versions and major language extensions.


I mean, that's nice and everything, but you're the one who brought up pre-standard (K&R) C.

I have a copy of the 2nd edition "The C Programming Language" by K&R from 1988 which discusses what is in effect the standard C89 language. Appendix C calls out the substantive changes compared to their earlier book, it is three pages long.

It's true that these changes are significant, C would be a very different language without enum for example, or with "old-style" (already deprecated by the time the book was written in 1988) function declarations. But the language is still very compact.

The book explains and describes essentially the entire language, and outlines the standard library, in under 300 pages. Only a few years later, the 2nd edition of Stroustrup's book "The C++ Programming Language" is more than twice the length, and introduces numerous already anachronistic features in notional support of C, which had meanwhile deprecated them. This takes up about 700 pages.

Stroustrup not only grandfathers in features C had by then cast off (such as weird function definitions) but also introduces crazy nonsense that I hope nobody ever used in the name of "backward compatibility" e.g. hey why not overload both of the increment operators (themselves bad ideas C++ inherited from C) with the same function? Imagine trying to pretend to yourself that you believe this is a reasonable thing to do, while also pretending to believe operator overloading delivers "unsurprising" semantics.


To properly compare the C standard with C++ in page count, you need to include POSIX as well.

Also you should be comparing ISO standard documents editions, not books.


The argument was that C is "compact and simple". The modestly sized book is both an excellent introduction and a good manual for the complete language (for C89, anyway). The book isn't why the language is compact and simple; rather, the book can be that small because the language is compact and simple.

In contrast Stroustrup's work is much bigger, and yet things that would later be claimed to be very important don't warrant a mention. Even assuming we accept that Bjarne is simply retrospectively imagining a motivation he never had at the time, and so these were indeed tiny inconsequential features when the book was written, somehow the "survey" of the C++ language in the much longer book is inadequate while K&R cover their entire language very well in fewer pages. Simply this is because in fact C is a "compact and simple" language.

People make a lot of claims about C that I disagree with, but it certainly is simple. If I could make modest adjustments in the light of what we know now, to what was standardised in ANSI C in 1989 - a few of them would perhaps simplify it further (don't do implicit narrowing just emit a diagnostic and give up; don't try to make volatile a type qualifier) but many would make it more complicated instead (more type safety, sum-types, and of course an actual length+pointer string type) just arguably to our considerable benefit.


K&R C is not 10 years ago. It's 30-50 years ago, and (AFAIK) it wasn't even properly standardized, and (AFAIK) it still mostly works with recent compilers.

Everybody thinks they know shit and yet they don't know all the details even after 5 years of practice (and let's not even look at C++!). Welcome to reality.

However, a programmer's goal is not to know all the (often irrelevant) details. And it's not to win a pub quiz.

The goal is to know enough to be able to communicate ideas in a clear and straightforward way, and to be productive.


Ah it isn't so simple after all.

Then let's limit the questions to what is UB, which is needed for daily programming activities, especially when one cares about security assessments. I am betting on the same outcome.


This has nothing to do with K&R or whatever new C standard there is. You're talking trash.

The rules around UB can be annoying, but I don't think this is a recent development, either. Probably compiler optimizations are to blame for some of this. For actual practice and pedestrian code, obscure UB bugs have little relevance.

Food for thought: C compilers are free to insert runtime checks to abort when UB occurs.


UB was standardized in C99 and optimizers started really going after it maybe 5-10 years after that. There was a widespread push to make the issue more widely known about 8 years ago, if I recall correctly, and most people still don't fully understand it. I wouldn't say it's "new information", but it's certainly relatively new in the overall lifetime of the language (i.e. even a lot of C99 code from that era is malformed on today's compilers).

I understand the basic concept of UB but some of the rules can have nuance and be easy to run afoul of. If you’re not running UBSAN on non-trivial codebases (lots of LOC and/or multiple devs), you are likely to have UB. Same goes for ASAN and TSAN issues. I always have humility around this topic as I’m not a compiler author. Heck, the compiler authors I’ve worked with themselves know the knives are sharp and work with humility.


Totally agree with your post. I don't know what your conclusion is, though.

My conclusion is that C lets me do what I want straightforwardly, I still haven't found this quality in a different language (there is inertia, I've only tried Zig for an hour for example and gave up in frustration). Also, to some degree I understand where most UB definitions are coming from and even agree with some of them.

The basic idea behind UB is to give the compiler a free pass (optimization, and not having to detect corruption) if the program has a bug anyway.

In some cases there can be legitimate surprises, when the programmer made some assumptions about guarantees that C can't give and that a newer language (without the historical baggage of supporting some architecture) can give, which might result in the compiler transforming a little bug into something bigger (more dangerous?). An example that I've personally learned at some point is integer shifting and masking in conjunction with implicit type promotion to int (which is signed).

All in all though, while I may not be the most sophisticated programmer and most of my programs are never exposed to hacking attempts, just anecdotally I've had very, very few headaches from UB.


> My conclusion is that C lets me do what I want straightforwardly, I still haven't found this quality in a different language

C is the language where you've found things to be straightforward? I'm curious what you're working on where this is the case. I'd reach for Python, Go, Java, Swift, or C#, depending on what I'm trying to do (even C++ can be better, but it comes with its own unique additional set of footguns, so some C acolytes tend to shy away from it). For most projects (at least those starting from scratch today) I'd probably choose Rust if C/C++ would otherwise be the only appropriate choice & use Python/Go/Swift/Java/C# elsewhere.

> All in all though, while I may not be the most sophisticated programmer and most of my programs are never exposed to hacking attempts, just anecdotally I've had very, very few headaches from UB.

and

> An example that I've personally learned at some point is integer shifting and masking in conjunction with implicit type promotion to int (which is signed).

Is an interesting mix of statements. I can say that I've worked on quite a few large codebases with a myriad of contributors, & I've always uncovered UB when I went looking for it. Some of it harmless, others less so. It's a "this probably is invisible 90% of the time you run it locally" kind of thing, so unless you have thorough tests you can run under UBSAN, you're likely just unaware of the scope of the problem. That being said, I don't want to blow it out of proportion. I'd say memory safety & thread safety are bigger classes of issues. As with UB, if you're not running sanitizers with very large test suites, you're unlikely to see it [1][2].

As for why UB is a real problem conceptually, this used to be well-formed and correctly behaving pre-C99:

```
uint8_t *x = <pointer to some buffer>;
uint64_t y = *(uint64_t *)x;
```

I don't mean in a "this just happened to work" way but in a "this was valid semantics of the language" way. Now it requires a memcpy. It's also entirely possible this is valid in C & not C++? That's what I mean by UB being very subtle. Yes, I understand why UB is important to allow compilers to generate more optimal code. I think Rust strikes a better balance here - most code doesn't need that tradeoff, and where it's important, that fact can be explicitly annotated.

Ultimately, to me the fact that even the Linux kernel is subject to these kinds of issues[3], gives me enough pause to believe it's a real problem in the ecosystem (among many others). So other, safer languages, are usually better if you can use them. Most experienced developers should have a number of languages under their belts they're fluent in and be able to switch to languages they haven't seen before (albeit there's always a cost & it's not fun, so I don't seek it out as often as I maybe should).

[1] https://reviews.llvm.org/D51170 [2] https://bugs.llvm.org/show_bug.cgi?id=23293 [3] https://software.intel.com/content/www/us/en/develop/blogs/n...


You can escape stars with backslash so they won't be interpreted as a section in italics: *** can be achieved like this: \\*\\*\\*

> C is the language where you've found things to be straightfoward? I'm curious what you're working on where this is the case.

Recently I've done networking, threading, embedded, GUI (2D/3D), compilers/interpreters - most is completely from scratch. I'm not a world-class expert in any of these domains but I'd like to think that I'm decent, especially last year I've finally made a leap in my understanding of systems design. In general, the stuff that I manage to complete gets exceptionally few bug reports and has good performance without any optimization.

I'm not claiming that my statements have any scientific accuracy, just that C is what turned out to be subjectively the most versatile, productive, and fun one for me - even though (or because) it meant going through a steep learning curve, learning how to do most things from scratch. I've tried quite a few languages before, I have for example significantly invested in: bash/sh, python, Javascript, "modern" C++, Haskell. And coded in a few others, like Java, C#, Postscript, Erlang, Perl... Today I'm still doing some Python for some scripting / admin stuff, but I don't enjoy it anymore.

If I had to find one reason why I don't like those languages, it's that they make it so hard/impossible or obscure to do the simplest thing that a program needs to do: copy data. Shaping the code and data architecture to the problem from the ground up brings wins in the long run. It seems to me like most languages optimize for brevity and cleverness in toy examples or short script-like programs, while larger architectures simply don't profit from the conventions or syntax that the more "convenient" languages set up - beyond a few thousand lines of code those are only getting in the way. I don't need one predefined way to do "objects" or "interfaces" - the preexisting stuff is usually not a good solution to the problem. It's better to approach problems by looking at what data representations are needed and how they can be transformed into each other, and this can usually be achieved directly by mostly copying data around. C doesn't even encourage the programmer to try anything more complex, and this tends to result in programs that solve the problem in a straightforward way.


Trying to attack me won't do a thing, remember comp.lang.c? Plenty of Usenet experience over here.

If you don't like having C's weaknesses pointed out, don't engage.

I can play language lawyer quite well, including security standards.

Yes compilers are free to do many things to fix C, except that many don't.


> Yes compilers are free to do many things to fix C, except that many don't.

But this is the point. The language is quite compact and simple and if you think like a computer, it just "makes sense". Counting pages of the standards is not an accurate measure of language complexity.

Certain subtleties are not even forced by the language, rather they arise from real world requirements (such as performance, implementation simplicity, modularity, compatibility, etc).

By and large, it's not the language, but rather what people do with it. The language gets out of the way.


so excited!


Now make a left-pad.cpp module and unpublish it.


That joke doesn't even make sense. C++ modules aren't a package manager. They can't fetch code from a central repository.



