> I'd personally think it would be interesting to test artificially increasing the sizeof() of strings from 24 bytes (the minimum needed to store a pointer, size and capacity) to something like 64 bytes
The "optimal" implementation would actually have different allocation sizes when used mostly for processing and when used mostly for storage: for processing one would prefer bigger defaults, for storage, one needs as little as possible initial overhead.
That could be achieved by raising the knowledge of that distinction and introducing actually (at least) two kind of strings. And even more than that can be gained by recognizing that whenever the strings are involved in some more complex structures, some additional restrictions for totality of all associated strings could also be exploited: allocating the strings separately is an overhead by itself.
If I remember correctly, Turbo Pascal strings had the default size of 256 bytes and that's how much was on the stack for each, and that was fast even decades ago. That was not efficient for having a lot of them in the structures, if a lot of them could be smaller, but that was what a programmer had to care separately - still the typical processing of strings and working with the limited number of them was nicely covered by that default.
The 2^n-1 sizes are cache-line don’t need explanation, I think. 27 was the maximum length of a volume name in MFS, the original Macintosh file system (which was a strange mix of backwards (no directories) and forwards (255 character file names) thinking).
Str32 was used in AppleTalk (and probably a design error or bug; Str31 is a more ‘natural’ type)
The "optimal" implementation would actually have different allocation sizes when used mostly for processing and when used mostly for storage: for processing one would prefer bigger defaults, for storage, one needs as little as possible initial overhead.
That could be achieved by raising the knowledge of that distinction and introducing actually (at least) two kind of strings. And even more than that can be gained by recognizing that whenever the strings are involved in some more complex structures, some additional restrictions for totality of all associated strings could also be exploited: allocating the strings separately is an overhead by itself.
If I remember correctly, Turbo Pascal strings had the default size of 256 bytes and that's how much was on the stack for each, and that was fast even decades ago. That was not efficient for having a lot of them in the structures, if a lot of them could be smaller, but that was what a programmer had to care separately - still the typical processing of strings and working with the limited number of them was nicely covered by that default.