> invariants, validity checks, run-time type checks, etc. should all be performed as early (and often) as possible
Having worked on numerous large-scale distributed systems, I strongly disagree (and I would think the author would too).
There is a tradeoff to strictness, which is coupling. You want to validate and type check as much as necessary, but not more. If a distributed system is over-coupled, it becomes difficult to change incrementally, and incremental change is the only way to change a distributed system. A canonical example of this is closed enums, which don't tolerate unknown values. Good luck adding a new enum value if every part of your system is strictly validating it. Each part of the system should validate what it needs in order to guarantee that it can operate correctly, and no more.
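To make the closed-enum point concrete, here's a minimal Rust sketch (the PaymentStatus type and the wire strings are hypothetical):

    // A strictly closed enum would reject "REFUNDED" the moment a newer
    // peer starts sending it; an explicit Unknown escape hatch lets
    // consumers that don't care carry the value through untouched.
    #[derive(Debug)]
    enum PaymentStatus {
        Pending,
        Settled,
        Unknown(String), // tolerate values this build has never seen
    }

    fn parse_status(raw: &str) -> PaymentStatus {
        match raw {
            "PENDING" => PaymentStatus::Pending,
            "SETTLED" => PaymentStatus::Settled,
            other => PaymentStatus::Unknown(other.to_string()),
        }
    }

    fn main() {
        // A peer deployed ahead of us starts emitting a new status:
        println!("{:?}", parse_status("REFUNDED")); // Unknown("REFUNDED")
    }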
It's not a matter of distributed systems per se, but a matter of systems where only some parts are updated in prod at any one time.
You could easily imagine a huge release where the whole codebase gets pushed to prod at the same time. Then the whole issue of versioning APIs disappears. However, it would require more discipline (probably unachievable at Google scale), so most companies prefer the slowly-burning garbage fire of versioned APIs and backward/forward compatibility of messages.
While atomic big-bang releases have their benefits (and drawbacks), I don't think there's any way to avoid dealing with data backward compatibility in at least some way. Old versions of data always exist in your system (in databases, message queues, replay logs, caches, retry loops in external parties, even on the wire at the time of the release in some cases). While explicit and long term API versioning may not be needed in some release processes, a strategy for coping with old data becomes necessary past a certain scale. Migrating all extant data (not just your RDBMS) at the same time as the big-bang release is not practical.
> Each part of the system should validate what it needs
Easier said than done. Each part of the system can't know what it needs, since you've just added a new thing it might need to contend with.
The point of closed enums is to force these checks, even if all you're doing is adding a new fall-through to ignore it, since at least that means you've considered it.
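In Rust terms (hypothetical Status enum), that forced consideration looks like this:

    enum Status {
        Active,
        Suspended,
        // Archived,  // uncommenting this breaks `describe` on purpose
    }

    fn describe(s: Status) -> &'static str {
        // Today this match is exhaustive. The day Archived is added, it
        // stops compiling (error[E0004]: non-exhaustive patterns) until
        // someone adds an arm -- even a deliberate `_ => "other"`
        // fall-through, which at least proves the case was considered.
        match s {
            Status::Active => "active",
            Status::Suspended => "suspended",
        }
    }

    fn main() {
        println!("{}", describe(Status::Active));
    }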
> Each part of the system can't know what it needs,
Yeah it can. It needs what it needed before.
If you have a system that is shipping stuff to customers, it doesn't have to care about the new childhoodPetName field that you've added for a different part of the system.
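As a sketch of what that looks like in practice (assuming the serde and serde_json crates; the field names are made up to match the example):

    use serde::Deserialize;

    // A hypothetical consumer that only ships orders: it declares the
    // two fields it actually needs and nothing else.
    #[derive(Deserialize)]
    struct ShippingInfo {
        customer_id: u64,
        address: String,
    }

    fn main() {
        // A producer elsewhere has started sending childhoodPetName.
        let payload = r#"{"customer_id": 7,
                          "address": "12 Main St",
                          "childhoodPetName": "Rex"}"#;
        // serde ignores unknown fields by default; opting in to
        // #[serde(deny_unknown_fields)] would be exactly the
        // over-coupling described upthread.
        let info: ShippingInfo = serde_json::from_str(payload).unwrap();
        println!("ship to customer {}: {}", info.customer_id, info.address);
    }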
This is especially relevant if you handle any kind of description of the real world, e.g. in robotics or medicine, and have multiple distributed systems and codebases all communicating with each other.
The inability to gracefully ignore cases that don't fit the expected data model is a weakness most contemporary programming languages and type systems share.
If the field is childhoodPetName and the other case labels are, e.g., mothersMaidenName, and the common operation is input sanitization, then this new field would introduce a security hole if it's just ignored.
It's a matter of correctness enforced at the language level, and it's why languages like Rust enforce exhaustive match checks at compile time by default.
I agree this is where enum coupling is beneficial. What I'm saying is that, if those fields are later used for other things in your system (like persisting to a DB or running reports or something), you likely don't want to be re-validating the strings as closed enums, that's probably over-coupled.
I don't think there's a silver bullet here, or some rule of thumb that can save us from these schema issues. It's messy and imperfect; we have static type systems that can help us locally but they are not global silver bullets. Unfortunately we still have to use our judgment to find the right balance, because the "perfect" solution requires knowledge of the future.
By definition the field is ignored by your program, so even if it contains malicious text, e.g. raw HTML, the system in question can't do anything harmful with it.
Sanitization occurs at the type level, so if the field contains data of type DangerousRawHTML, and there is a component in your system that blindly renders any field other than those of type SanitisedHTML, then that component is the location of your vulnerability. A type system like Rust's would also catch this.
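A rough sketch of that type-level separation in Rust (the escaping here is illustrative, not a real sanitizer):

    struct DangerousRawHTML(String);
    struct SanitisedHTML(String);

    impl DangerousRawHTML {
        // The only way to obtain a SanitisedHTML value is through this
        // function, so a renderer that accepts only SanitisedHTML can
        // never be handed unescaped input by mistake.
        fn sanitise(self) -> SanitisedHTML {
            SanitisedHTML(self.0.replace('<', "&lt;").replace('>', "&gt;"))
        }
    }

    // render(&DangerousRawHTML(...)) is a compile-time type error.
    fn render(html: &SanitisedHTML) {
        println!("{}", html.0);
    }

    fn main() {
        let raw = DangerousRawHTML("<b>hi</b>".into());
        render(&raw.sanitise());
    }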
Don't conflate "correct rust program semantics" with "correct real-world model semantics".