See ASN.1's extensibility rules. If you mark a record type (SEQUENCE, in ASN.1 parlance) as extensible, then when you later add fields (members, in ASN.1 parlance) the encoding rules have to make it possible to skip/ignore those fields when decoded by software implementing the pre-extension type. PER/OER will include a length for all the extension fields for this purpose, but it can be one length for all the extension fields in each round of extensions rather than one per (which would only save on the type "tag" in TLV).
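To make that concrete, here is a minimal (hypothetical) extensible SEQUENCE; everything after the `...` marker can be skipped by a pre-extension decoder, and PER/OER wrap the extension addition group in a length for exactly that purpose:

```asn1
Person ::= SEQUENCE {
    name   UTF8String,
    age    INTEGER,
    ...,                          -- extension marker: a v1 decoder stops here
    [[ 2: email UTF8String ]]     -- version-2 extension addition group
}
```

The `[[ 2: ... ]]` version brackets group all the fields added in one round of extensions, which is what lets the encoding carry one length for the whole group rather than one per field.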
> the decoder can skip decoding fields
This is mainly true for on-line decoders that produce `{path, leaf value}` tuples _and_ which take paths or path filters as arguments.
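As an illustration of that style of decoder (the names and shapes here are hypothetical, not any particular library's API), a path-filtered walk can prune whole subtrees before descending into them; in a real on-line decoder the pruned subtrees would never be decoded at all:

```python
def iter_leaves(obj, path=(), path_filter=None):
    """Yield (path, leaf value) tuples from a nested structure.

    If a path_filter is given, subtrees whose path it rejects are
    pruned without being walked -- the on-line-decoder analogue of
    skipping fields the caller never asked for.
    """
    if path_filter is not None and not path_filter(path):
        return  # prune: never descend into this subtree
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from iter_leaves(value, path + (key,), path_filter)
    else:
        yield (path, obj)
```

For example, a filter that only accepts paths under `"user"` causes everything else to be skipped without being visited.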
> Now, you may disagree that tolerating unknown fields is a feature (as many people do), but one must understand the context where protobuf was designed, namely the situation where it takes time to roll out new versions of binaries that process the data (either in API calls or on stored files), and thus the ability to design schema evolution with backward and forward compatibility is worth a few more cycles during encoding.

This is the sort of thing I mean when I complain about the ignorant reinvention of the wheel that we all seem to engage in. It's natural and easy to do that, but it's not really a good thing.
Extensibility, versioning, and many many other issues in serialization are largely not new, and have been well-known and addressed for decades. ASN.1, for example, had no extensibility functionality in the early 1980s, but the only encoding rules at the time (BER/DER/CER), being TLV encodings, naturally supported extensibility ("skipping over unknown fields"). Later formal support for extensibility was added to ASN.1 so as to support non-TLV encodings.
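To see why TLV encodings get "skipping over unknown fields" for free, here is a toy Python sketch (the one-octet tag and length format is an illustrative assumption, not real BER): because every field carries its own length, a decoder built against an older schema knows exactly how far to jump past a tag it does not recognize.

```python
def decode_tlv(data, known_tags):
    """Decode a toy TLV stream, silently skipping fields with unknown tags.

    data:       bytes laid out as [tag octet][length octet][value...] repeated
    known_tags: dict mapping tag -> field name for the schema this
                decoder was built against
    """
    fields = {}
    i = 0
    while i < len(data):
        tag = data[i]
        length = data[i + 1]
        value = data[i + 2 : i + 2 + length]
        if tag in known_tags:
            fields[known_tags[tag]] = value
        # Unknown tag: the length octet tells us how far to skip,
        # so a pre-extension decoder tolerates post-extension fields.
        i += 2 + length
    return fields
```

A "v1" decoder that only knows tag `0x01` decodes the same byte string a "v2" decoder does, minus the field it has never heard of.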
ASN.1 also has elaborate support for "typed holes", which is what is referred to as "references" in [0].
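The classic (1988-style) form of such a typed hole, sketched here with made-up field names:

```asn1
Envelope ::= SEQUENCE {
    contentType  OBJECT IDENTIFIER,          -- identifies the type in the hole
    content      ANY DEFINED BY contentType  -- the hole itself
}
```

Modern ASN.1 expresses the same idea with information object classes and open types, but the shape is unchanged: an identifier field names the type carried by the hole.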
ASN.1 gets a lot of hate, mainly for:

a) its syntax being ugly (true) and hard to parse (true-ish);

b) the standard being non-free (it used to be non-free, but it's free now, in PDF form anyway);

c) lack of tooling (no longer true).
(c) in particular is silly, because if one invents a new syntax and new encoding rules, one has to write the non-existent tooling from scratch anyway.
And every time someone re-invents ASN.1 they miss important features that they were unaware of.
Meanwhile, ASN.1 is pluggable as to encoding rules, and it's easy enough to extend the syntax too. So ASN.1 even covers XML and JSON. There's no other syntax/standard one can say that for!
Next time anyone invents a new syntax and/or encoding rules, do please carefully look at what's come before.
You make it sound like protobuf was invented yesterday. Sure, it's not as old as ASN.1, but protobuf is now about a quarter century old and battle-tested as the main interchange format for at least one giant company with gazillions of projects that needed to interoperate.
One of the design requirements was simplicity and ease of implementation, and for all the love in the world I can muster for ASN.1, I must admit it's far from simple.
IIRC, complete and open implementations of ASN.1 were (and are) rare, and the matrix of covered features didn't quite overlap between languages.