I think the big reason why BERT and T5 have fallen out of favor is the lack of z...

jerrygenser · on July 19, 2024

Yes, but you can use an LLM to label data and then train a BERT model, which then costs a small fraction of the time and money to run compared with the original LLM.
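A rough sketch of that workflow, stdlib-only so it runs anywhere. Everything here is an assumption for illustration: `llm_label` is a hypothetical stand-in for a real LLM call, and the tiny Naive Bayes "student" is a stand-in for an actual BERT fine-tune. The shape is the point: pay for the expensive model once to get labels, then serve the cheap one.

```python
import math
from collections import Counter, defaultdict

def llm_label(text: str) -> str:
    """Hypothetical teacher. In practice this would be a prompt to an LLM."""
    hits = ("great", "love", "good")
    return "positive" if any(w in text.lower() for w in hits) else "negative"

def tokenize(text):
    return text.lower().split()

def train_student(texts, labels):
    """Fit a multinomial Naive Bayes model (stand-in for a BERT fine-tune)."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    vocab = set()
    for text, label in zip(texts, labels):
        toks = tokenize(text)
        word_counts[label].update(toks)
        vocab.update(toks)
    return word_counts, class_counts, vocab

def predict(model, text):
    """Return the class with the highest Laplace-smoothed log score."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # ignore words the student never saw
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

unlabeled = [
    "I love this phone",
    "great battery and great screen",
    "terrible battery",
    "the screen broke, awful",
    "good value, love it",
    "awful support, terrible experience",
]
silver_labels = [llm_label(t) for t in unlabeled]  # expensive step, done once
student = train_student(unlabeled, silver_labels)  # cheap model from here on
prediction = predict(student, "terrible screen")
```

With a real setup you would swap `train_student` for a fine-tuning run and spot-check a sample of the LLM labels by hand before trusting them.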

hdhshdhshdjd · on July 19, 2024

Shhh, don’t tell everybody the secret. ;-)

Karrot_Kream · on July 20, 2024

Lol isn't everyone doing it? That's how I bootstrapped my BERT fine-tunes.

hdhshdhshdjd · on July 20, 2024

I would say everybody smart is doing that, but a lot of the dumb money in AI right now is just wrappers on the GPT API. That makes for a flashy demo with no underlying substance or expertise.

robrenaud · on July 20, 2024

Is the encoder style arch better for representing classification tasks at a given compute budget than a causal LM?

Is this because the final representation in BERT-style models is more globally focused, rather than being optimized for next-token prediction?

jerrygenser · on July 20, 2024

They are 100% better for classification at a given compute budget. They can account for information both before and after a given position (e.g., the token being labeled in token classification) and use that information to classify.
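To make the "before and after" point concrete, here is a toy sketch of the two attention masks. The function names and the example sentence are made up for illustration, and this only shows what a single attention layer is allowed to look at, not a full model.

```python
def causal_mask(n):
    """Decoder-style mask: position i may attend to positions 0..i only."""
    return [[j <= i for j in range(n)] for i in range(n)]

def bidirectional_mask(n):
    """Encoder-style mask: every position may attend to every position."""
    return [[True] * n for _ in range(n)]

def visible(tokens, mask, i):
    """Tokens that position i is allowed to attend to under the given mask."""
    return [tok for tok, ok in zip(tokens, mask[i]) if ok]

tokens = ["Apple", "hired", "Tim", "Cook", "yesterday"]
i = tokens.index("Tim")

# A causal LM tagging "Tim" has not seen "Cook" yet.
causal_view = visible(tokens, causal_mask(len(tokens)), i)

# An encoder tagging "Tim" sees the whole sentence, including "Cook".
encoder_view = visible(tokens, bidirectional_mask(len(tokens)), i)
```

For token classification the right-hand context matters: deciding that "Tim" starts a person name is easier when "Cook" is visible.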

byefruit · on July 19, 2024

Yes, no zero-shot. Few-shot is possible for some use cases with SetFit: https://github.com/huggingface/setfit and the very recent FastFit: https://github.com/IBM/fastfit (https://arxiv.org/pdf/2404.12365)

deepsquirrelnet · on July 20, 2024

They are there, you just have to look. Tasksource, NuNER, Flan, T0. There’s not a lot, but still at least a few good zero shot models in both architectures.

visarga · on July 20, 2024

It's because you need to mess with embeddings or even train new heads on top of a network to use it. LLMs are just tokens-in, tokens-out: they don't classify with a softmax over classes, they softmax over vocabulary tokens. LLMs are more convenient.
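A small sketch of that difference, with made-up logits and a toy five-word vocabulary (all numbers and names are assumptions for illustration). It also assumes each label maps to a single vocabulary token, which real tokenizers do not guarantee.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

classes = ["negative", "positive"]

# Encoder + trained classification head: one logit per class.
head_logits = [0.3, 2.1]
head_probs = dict(zip(classes, softmax(head_logits)))

# Causal LM: one logit per vocabulary token for the next position.
vocab = ["the", "negative", "positive", "banana", "."]
lm_logits = [1.0, 0.2, 2.5, -1.0, 0.5]
lm_probs = dict(zip(vocab, softmax(lm_logits)))

# To read a class off the LM, keep only the label tokens and renormalize.
label_mass = {c: lm_probs[c] for c in classes}
z = sum(label_mass.values())
lm_class_probs = {c: p / z for c, p in label_mass.items()}
```

The head needs training but spends all of its probability mass on the classes. The LM needs no new head, but some mass leaks to tokens that are not labels at all.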