What do you think won't show up in a profiler with GC? With the right profiler I...

mirekrusin · on April 29, 2022

It won't show what we're talking about in this thread: deallocations of a large number of allocations when they all lose their last root reference at once.

spullara · on April 30, 2022

It absolutely will show them, because they cost nothing. 0 will show up in the profiler, which is how much deallocation costs with GC.

mirekrusin · on April 30, 2022

Magic.

In rc/arc you can achieve the same by making release a no-op.

kragen · on April 30, 2022

I don't know what you mean by "making release a no-op", but with reference counting it inherently takes O(N) time to deallocate an object containing N references, and O(M) time to deallocate M objects containing no references. With a generational copying collector, it takes O(1) time to deallocate M unreferenced objects each containing N references in the nursery — or rather it takes O(P+Q) time, where P is the number of objects that don't get deallocated and Q is the number of root references that need to be scanned, the ones in the stack plus the cards marked by your write barrier or whatever.

You can't achieve that with reference counting, with malloc/free, or with a non-copying tracing collector. You can achieve it with regions.

On the other hand, you may not need to. Allocating and initializing those objects took O(MN) time, so your program can't get an asymptotic speedup by deallocating them in O(1) time instead of O(MN) time. So it really depends on the constant factors, which nowadays depend strongly on things like cache miss rates and cache line sharing between cores.
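
To make the cost model above concrete, here is a toy sketch in Python. It is not a real collector and only counts how many objects each scheme has to touch. The names `Obj`, `rc_release`, `minor_collect`, and `demo` are made up for illustration, and the "nursery" is modeled as a plain reachability walk from the roots, ignoring write barriers and card marking.

```python
class Obj:
    """A heap object holding references to other objects (hypothetical model)."""
    def __init__(self, refs=()):
        self.refs = list(refs)
        self.rc = 0                      # used only by the reference-counting model
        for child in self.refs:
            child.rc += 1                # each stored reference bumps the child's count


def rc_release(obj):
    """Drop one reference to obj; return how many objects had to be freed."""
    freed = 0
    pending = [obj]
    while pending:
        o = pending.pop()
        o.rc -= 1
        if o.rc == 0:
            freed += 1                   # one free() per dead object
            pending.extend(o.refs)       # every child loses a reference in turn
    return freed


def minor_collect(roots):
    """Model a copying nursery collection; return how many survivors were visited."""
    seen = set()
    pending = list(roots)
    while pending:
        o = pending.pop()
        if id(o) not in seen:
            seen.add(id(o))              # a real collector would copy o here
            pending.extend(o.refs)
    return len(seen)                     # dead objects are never touched


def demo(m=10_000):
    # m dead leaf objects hanging off one root that is about to be dropped
    dead_root = Obj(Obj() for _ in range(m))
    dead_root.rc = 1                     # the single root reference
    # a small live set that survives the collection
    live = Obj([Obj(), Obj()])
    return rc_release(dead_root), minor_collect([live])


rc_work, gc_work = demo()
print(rc_work, gc_work)                  # 10001 3
```

In this model the reference-counting work grows with the number of dead objects, while the copying work depends only on what survives, which is the P in O(P+Q). It says nothing about constant factors, which is the caveat in the last paragraph.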

mirekrusin · on May 1, 2022

Allocation in a region is just a pointer update, which is quite cheap, and deallocation is constant time. You will get a speedup if you can replace a large number of allocations with an arena. In GC languages you don't have the luxury of specifying custom allocators for selected parts of the program. But the main argument was that tracing GC hides all of this from you: you can't profile your program and explicitly see which call stacks contribute to large deallocations. Just because the GC hides the cost, defers it, and spreads it over time doesn't mean it's zero cost. On the contrary, tracing will have more overhead. In the vast majority of programs the power consumption overhead is not relevant, though; if it's traded for programming ergonomics, it's a win.
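
As a rough illustration of the region/arena point, here is a minimal bump allocator sketched in Python over a `bytearray`. The class name `Arena` and its methods are hypothetical, and a real arena would live in a language with manual memory control. The sketch only shows the shape of the idea: allocating moves one offset, and freeing everything resets that offset.

```python
class Arena:
    """Toy bump allocator over a fixed buffer (illustrative only)."""

    def __init__(self, capacity):
        self.buf = bytearray(capacity)
        self.top = 0                     # offset of the next free byte

    def alloc(self, size, align=8):
        """Reserve size bytes; align is assumed to be a power of two."""
        start = (self.top + align - 1) & ~(align - 1)   # round up to alignment
        end = start + size
        if end > len(self.buf):
            raise MemoryError("arena exhausted")
        self.top = end                   # the whole allocation is this pointer update
        return memoryview(self.buf)[start:end]

    def reset(self):
        """Free every allocation at once, regardless of how many were made."""
        self.top = 0


arena = Arena(64)
a = arena.alloc(3)
a[:] = b"abc"                            # write through the returned view
b = arena.alloc(5)                       # starts at the next 8-byte boundary
arena.reset()                            # constant-time "deallocation" of a and b
```

Note that `reset` does no per-object work, so there is also nothing per-object for a profiler to attribute. The cost shows up at the single point where the arena is released.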