I would say everybody smart is doing that, but a lot of the dumb money in AI right now is just wrappers on the GPT API That makes for a flashy demo with no underlying substance or expertise.
They are 100% better for classification at a given compute budget. They can account for information before and after e.g. a token for token classification and use that information to classify.
They are there, you just have to look. Tasksource, NuNER, Flan, T0. There’s not a lot, but still at least a few good zero shot models in both architectures.
It's because you need to mess with embeddings or even train new heads on top of a network to use it. LLMs just use tokens-in tokens-out, they don't classify with softmax over classes, they softmax over vocabulary tokens. LLMs are more convenient
When you have hundreds or thousands of examples, BERT works great. But that is very restricting.