
The report gets it wrong. C and C++ can both be made memory safe with small changes. The cost of doing that is likely to be lower than the cost of either deploying CHERI or rewriting in Rust. And, the protections are likely to be stronger than what CHERI offers (CHERI tries really hard to just let existing C code do whatever the heck it does).

There's a ton of literature on ways to make C/C++ safe. I think that the only reason why that path isn't being explored more is that it's the "less fun" option - it doesn't involve blue sky thoughts about new hardware or new languages.



I think what you’re doing with Fil-C is cool, but I wouldn’t call a 200x slowdown a “small change.”

One of the interesting things that Rust has demonstrated is that you don’t have to choose between performance and safety and, in fact, that safety improvements in languages can actually result in faster programs (e.g. due to improved alias analysis). New technology/sexiness advantage aside, I think this is a significant driver of adoption.


> I think what you’re doing with Fil-C is cool, but I wouldn’t call a 200x slowdown a “small change.”

If you're bringing up the 200x, then you don't get what's going on.

It's extremely useful right now to have a compiler that's substantially correct so I don't have to deal with miscompiles as I grow the corpus.

Once I have a large enough corpus of tests, then I'll start optimizing. Writing compiler optimizations incrementally on top of a totally reliable compiler is just sensible engineering practice.

So, if you think that 200x is meaningful, then it's because you don't know how language/compiler development works, you haven't read my manifesto, and you have no idea where the 200x is coming from (hint: almost all optimizations are turned off for now so I have a reliable compiler to grow a corpus with).

> One of the interesting things that Rust has demonstrated is that you don’t have to choose between performance and safety and, in fact, that safety improvements in languages can actually result in faster programs (e.g. due to improved alias analysis). New technology/sexiness advantage aside, I think this is a significant driver of adoption.

You have to rewrite your code to use Rust. You don't have to rewrite your code to use Fil-C. So, Rust costs more, period. And it costs more in exactly the kind of way that cannot be fixed. Fil-C's perf can be fixed. The fact that Rust requires rewriting your code cannot be fixed.

We can worry about making Fil-C fast once there's a corpus of stuff that runs on it. Until then, saying speed is a shortcoming of Fil-C is an utterly disingenuous argument. I can't take you seriously if you're making that argument.


> So, if you think that 200x is meaningful, then it's because you don't know how language/compiler development works, you haven't read my manifesto, and you have no idea where the 200x is coming from (hint: almost all optimizations are turned off for now so I have a reliable compiler to grow a corpus with).

I actually did, the first day you made it public. A friend also sent it to me because you link my blog in it. Again, I think it's cool, and I'm going to keep following your progress, because I think Rust alone is not a panacea.

I've worked on and in LLVM for about 5 years now (and I've contributed to a handful of programming languages and runtimes over the past decade), so I feel comfortable saying that I know a bit about how compilers and language development work. Not enough to say that I'm an infallible expert, but enough to know that it's very hard to claw back performance when doing the kinds of things you're doing (isoheaps, caps). Isotyped heaps, in particular, are a huge pessimization on top of ordinary heap allocation, especially when you get into codebases with more than a few hundred unique types[1].

To be clear: I don't think performance is a sufficient reason to not do memory safety. I've previously advocated for people running sanitizer-instrumented binaries in production, because the performance hit is often acceptable. But again: Rust gets you both performance and safety, and is increasingly the choice for shops that are looking to migrate off of their legacy codebases anyways. It's also easier to justify training a junior engineer to write safe code that can be integrated into a pre-existing codebase.

> You don't have to rewrite your code to use Fil-C.

If I read correctly, you provide an example of a union below that needs to be rewritten for Fil-C. That's probably an acceptable tradeoff in many codebases, but it sounds like there are well-formed C programs that Fil-C currently rejects.

[1]: https://security.apple.com/blog/towards-the-next-generation-...


> I've worked on and in LLVM for about 5 years now (and I've contributed to a handful of programming languages and runtimes over the past decade), so I feel comfortable saying that I know a bit about how compilers and language development work. Not enough to say that I'm an infallible expert, but enough to know that it's very hard to claw back performance when doing the kinds of things you're doing (isoheaps, caps). Isotyped heaps, in particular, are a huge pessimization on top of ordinary heap allocation, especially when you get into codebases with more than a few hundred unique types[1].

Isoheaps suck a lot more in kernel than they do in user. I don't think it's accurate to say that isoheaps are a "huge pessimization". It's not huge, that's for sure.

For sure, right now, memory usage of Fil-C is just not an issue. The cost of isoheaps is not an issue.

Also, Fil-C is engineered to allow GC, and I haven't made the switch because there are some good reasons not to do it. That's an example of something where I want to pick based on data. I'll pick GC or not depending on what performs better and is most ergonomic for folks, and that's the kind of choice best made after I have a massive corpus.

> If I read correctly, you provide an example of a union below that needs to be rewritten for Fil-C. That's probably an acceptable tradeoff in many codebases, but it sounds like there are well-formed C programs that Fil-C currently rejects.

Yeah but it's not a rewrite.

If you want to switch to Rust, it's not a matter of changing a union - it's changing everything.

If you want to switch to Fil-C, then yeah, some of your unions, and most of your mallocs, will change.

For example, it took about two to three weeks, working about 2 hrs/day, to convert OpenSSH to the point where the client works. I don't think you'd be able to rewrite OpenSSH in Rust on that kind of schedule.


Hi pizlonator, I'm working on a solution with similar goals (I think), but a bit of a different approach. It's a tool that auto-translates[1] (reasonable) C code to a memory-safe subset of C++. The goal is to get it reliable enough that it can be simply inserted as an (optional) build step, so that the source code can be maintained in its original form.

I'm under the impression that you're more of a low-level/compiler person, but I suggest that a higher level language like (a memory-safe subset of) C++ actually makes for a more desirable "intermediate representation" language, as it's amenable to maintaining information about the "intent" of the code, which can be helpful for optimization. It also allows programmers to provide manually optimized memory-safe implementations for performance-critical parts of the code.

The memory-safe subset of C++ is somewhat analogous to Rust's in terms of performance and in that it depends on a non-trivial static checker, but it imposes less onerous restrictions than Rust on single-threaded code.

The auto-translation tool already does the non-trivial (optimization) task of determining whether any (raw) pointer is being used as an array iterator or not. But further work is needed to bring the resulting code closer to optimal performance. The task of optimizing a high-level "intermediate representation" language like (memory-safe) C++ is roughly analogous to optimizing lower-level IR languages, but the results should be more effective because you have more information about the original code, right?

I think this project could greatly benefit from the kind of effort you've displayed in yours.

[1]: https://github.com/duneroadrunner/SaferCPlusPlus-AutoTransla...


That's cool!

My plan for Fil-C is to introduce stricter types as an optionally available thing while preserving the property that it's fast to convert C code to Fil-C.

C++ is easiest to describe, at the guts, in terms of C-style reasoning about pointers. So, the easiest path to convincingly make C++ safe is to convincingly make C safe first, and then implement the C++ stuff around that. It works out that way in the guts of clang/llvm, since my missing C++ support is largely about (a) some missing jank and glue in the frontend that isn't even that interesting and (b) missing llvm IR ops in the FilPizlonatorPass.


> the easiest path to convincingly make C++ safe is to convincingly make C safe first

Yeah, with all the static analysis, I did end up straying from the easy path. Ugh :) But actually, one thing that C++ provides that I found made things easier is destructors. I mean, I provide a couple of raw pointer replacement types that rely on ("transparently wrapped") target objects checking for any (replacement) pointers still targeting them when they get destroyed.

As you indicated in another comment, you explicitly choose to expose/require zalloc() because you didn't want to make malloc() too "magical" (by hiding the indirect type deduction). In that vein, one maybe nice thing about the "safe C++ subset" solution is that it exposes the entirety of the run-time safety mechanisms, in the sense that it's all in the library code and you can even step through it in the debugger. (It also gives you the option to catch any exceptions thrown by said safety mechanisms. You know, if exceptions are your thing. Otherwise you can provide your own custom "fault handling" code (if you want to log the error, or dump the stack or whatever).)

> There's a ton of literature on ways to make C/C++ safe. I think that the only reason why that path isn't being explored more is that it's the "less fun" option - it doesn't involve blue sky thoughts about new hardware or new languages.

I can't think of any other reason that makes sense either. Anyway, the first thing is to dispel the notion that C and C++ cannot be safe, and it seems like your project is likely to be the first to demonstrate it on some staple C libraries. I'm looking forward to it.


Do you have a forecast as to what the slowdown will be after optimizations are implemented? 20x? 2x? 1.2x? 1.02x?

Thanks.


somewhere between 1.02x and 2x


What kind of small changes? It seems strange to me that other languages would bother implementing complicated garbage collectors and borrow checkers if all you need is a small change from C.


See here: https://github.com/pizlonator/llvm-project-deluge/blob/delug...

I just got the OpenSSH client to work last night.

Here's an example of the kinds of changes you have to make: https://github.com/pizlonator/deluded-openssh-portable/commi...

Most of the changes are just using zalloc and friends instead of malloc and friends. If I reaaaallly wanted to, I could have made it automatic (like, `malloc(sizeof(Foo))` could be interpreted by the compiler as being just `zalloc(Foo, 1)` ... I didn't do that because I sorta think it's too magical and C programmers don't like too much magic).


How do you handle unions?

I'm also having a hard time fully convincing myself of this:

> the allocator will return a pointer to memory that had always been exactly that type. Use-after-free does not lead to type confusion in Fil-C

In the worst case, this seems like you must simply never reallocate memory, or we're discarding parts of the type. If I successively allocate integer arrays of growing lengths, it seems to me it must either return memory that had previously been used with a different type (e.g., an int[5] and an int[3] occupying the same memory at disjoint times), or address space usage in such a program is quadratic, or we're not considering array length as "part of the type", i.e., we're discarding it. (I'm not sure if this is acceptable or not. I think that should be fine, but I'll have to think harder.)


> How do you handle unions?

This union is fine:

    union {
        int* x;
        foo* y;
    };
It's fine because they're both pointers. This union is also fine:

    union {
        int x;
        float y;
    };
It's fine because Fil-C treats both members as "ints" in the underlying type system.

This union has to change:

    union {
        char* a;
        int b;
    };
You can turn it into a struct or you can move the `char*` member out of it.

> In the worst case, this seems like you must simply never reallocate memory, or we're discarding parts of the type. If I successively allocate integer arrays of growing lengths, it seems to me it must either return memory that had previously been used with a different type (e.g., an int[5] and an int[3] occupying the same memory at disjoint times), or address space usage in such a program is quadratic, or we're not considering array length as "part of the type", i.e., we're discarding it. (I'm not sure if this is acceptable or not. I think that should be fine, but I'll have to think harder.)

int[3] and int[5] are both integer typed but have different sizes. The allocator also uses size segregation. It so happens that the virtual memory used for int[3] will never be reused for int[5] for that reason.

There's no problem with this; it's just a more aggressive version of segregated allocation.

The allocator still returns physical pages when they go free. It's just the virtual memory that only gets reused in a way that conforms to type.

And, that part of the system is the most well-tested. The isoheap allocator has been shipping in WebKit for years.


> This union is fine:

I can type pun int & float pointers. (I think this is the same sort of behavior that I'm going to note at the end.)

> You can turn it into a struct or you can move the `char*` member out of it.

A struct is a product type. A union (at least one combined with a tag) is a sum type. I suppose one could use a struct like a union, and the members not corresponding to the variant are just wasted memory …

This is also a breakage from vanilla C, but it's not the first in your language, so I'm assuming that's acceptable. (And honestly sum types beyond `enum` are pretty rare in C, as C really doesn't help you.)

> [isoheaps]

Yeah, so I think your isoheaps are "memory-safe". I would stress that Rust's guarantees are a good bit stricter, and remove behavior you'd still see in your language. Your language wouldn't crash … but it would still go on to do undefined-ish things. (E.g., you'd get an integer, but it might not be the one you expect, or you might write to memory that is in use elsewhere, assuming the write was in-bounds within a current allocation.)


> I can type pun int & float pointers. (I think this is the same sort of behavior that I'm going to note at the end.)

Yes. And yes.

> A struct is a product type. A union (at least one combined with a tag) is a sum type. I suppose one could use a struct like a union, and the members not corresponding to the variant are just wasted memory …

Yup, wasted memory.

> This is also a breakage from vanilla C, but it's not the first in your language, so I'm assuming that's acceptable. (And honestly sum types beyond `enum` are pretty rare in C, as C really doesn't help you.)

Exactly. I'm fine with the kind of breakage where I don't have to rewrite someone's code to fix it. I'm fine with breakage that doesn't slow me down, basically. It's nuanced.

> Yeah, so I think your isoheaps are "memory-safe". I would stress that Rust's guarantees are a good bit stricter, and remove behavior you'd still see in your language. Your language wouldn't crash … but it would still go on to do undefined-ish things. (E.g., you'd get an integer, but it might not be the one you expect, or you might write to memory that is in use elsewhere, assuming the write was in-bounds within a current allocation.)

You're 100% right. It's a trade-off.

Here's the cool bit though: Fil-C has no `unsafe` and no other `unsafe`-like escape hatch. In other words, Fil-C has a lower bar than Rust, but that bar is much more straightforward to meet.


Attaching capabilities to pointers is sort of what CHERI does, isn't it? And presumably CHERI can have better performance thanks to the direct hardware support. (Your manifesto mentions a 200x performance impact currently.)


CHERI's capabilities are more permissive. For example, if you use-after-free in CHERI, then you can access the "free" memory (or whatever ends up there after another malloc) without trapping regardless of what type ends up there.

Fil-C never allows pointer memory to be reused as primitive memory, or vice versa, and use-after-free means you're at least pointing at data of the same type. Also, Fil-C has a clear path to using concurrent GC, and then it wouldn't have the UaF problem at all, while CHERI has no path to concurrent GC (they can stop the world, kinda maybe).

It's not meaningful to conclude anything from Fil-C's current perf. In the limit, it's easier to make Fil-C fast than it is to make CHERI fast. For example, in CHERI, if you want to have a thin pointer then you have to throw safety out of the window. The Fil-C plan is to give you thin pointers provided that you opt into more static typing.



