halflings on Jan 9, 2020 | on: ALBERT: A Lite BERT for Self-Supervised Learning o...

> Edit: I guess one solution would be to use a pretrained ALBERT and finetune to get the initial model and then use model distillation to get a smaller, faster model.

Huggingface did just that: https://medium.com/huggingface/distilbert-8cf3380435b5
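For anyone unfamiliar with the distillation step, here is a rough sketch of the soft-target loss a student model is usually trained on. It is not the exact DistilBERT recipe (which combines several loss terms). It only shows the core idea: match the teacher's temperature-softened output distribution. The function names and the default temperature of 2.0 are my own choices for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The T^2 factor keeps gradient magnitudes comparable across temperatures.
    The temperature value here is an illustrative assumption, not a tuned one.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(
        pi * (math.log(pi) - math.log(qi))
        for pi, qi in zip(p, q)
        if pi > 0
    )
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [1.0, 2.0, 3.0]
    print(distillation_loss(teacher, teacher))          # identical logits
    print(distillation_loss([3.0, 2.0, 1.0], teacher))  # mismatched logits
```

In practice this term is typically mixed with the ordinary hard-label loss, and the logits would come from the fine-tuned teacher and the smaller student on the same batch.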