
> Edit: I guess one solution would be to use a pretrained ALBERT and finetune to get the initial model and then use model distillation to get a smaller, faster model.

Hugging Face did just that with DistilBERT (distilled from BERT rather than ALBERT): https://medium.com/huggingface/distilbert-8cf3380435b5
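
For anyone curious what the distillation step looks like, here is a minimal sketch of the soft-target loss in plain Python. It assumes the common temperature-scaled KL formulation; the function names and the default temperature are my own choices for illustration, not taken from the linked post or from any library.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_total for x in scaled]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Multiplying by temperature**2 is the usual convention for keeping
    gradient magnitudes comparable across temperatures (an assumption
    here, not something stated in the thread).
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
    return kl * temperature ** 2

# Hypothetical logits for a 3-class task: the finetuned teacher is
# confident, the small student is less so.
teacher = [4.0, 1.0, -2.0]
student = [2.0, 1.5, -1.0]
loss = distillation_loss(student, teacher)
```

In practice this term is usually combined with the ordinary cross-entropy on the hard labels, and the student is trained on the weighted sum.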


