Best practical method of knowledge distillation available?
TL;DR: From what I’ve seen online, knowledge distillation generally performs worse than training a model from scratch on the data. Is there a method of KD where this doesn’t happen, so I get close to the performance of a model trained from scratch?
So I’ve recently been interested in making DL models more useful for everyday tasks. Given their size, I'm trying to run these models on consumer devices without much loss in quality, but rn from what I’ve seen, this just feels like trying to fit an elephant into his pants.
Basically it tears every time I try. I found quantization to be cool, but I need to reduce the size even more tbh. So I found knowledge distillation. From what I’ve seen, though, while it's theoretically fantastic, in practice knowledge distillation sucks, and it's probably worse than just straight up training the model from scratch on the dataset.
So is there a used and proven method of knowledge distillation that I can use? One that will give me accuracy at least very close to a model trained from scratch on the dataset?
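For reference, the "classic" recipe people usually mean by KD is Hinton-style soft-target distillation: train the student on a blend of the teacher's temperature-softened output distribution and the ground-truth labels. A minimal numpy sketch of that loss (function names, `T`, and `alpha` defaults are illustrative choices, not from any particular library):

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T gives a softer distribution.
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Hinton-style KD: blend soft-target KL with hard-label cross-entropy.

    alpha weighs the soft-target term; T is the distillation temperature.
    """
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    # KL(teacher || student); the T^2 factor keeps gradient magnitudes
    # comparable to the hard-label term as T grows.
    kl = np.sum(p_teacher * (np.log(p_teacher + 1e-12)
                             - np.log(p_student + 1e-12)), axis=-1)
    soft_loss = (T ** 2) * kl.mean()
    # Standard cross-entropy on the ground-truth labels (temperature 1).
    p_hard = softmax(student_logits, 1.0)
    ce = -np.log(p_hard[np.arange(len(labels)), labels] + 1e-12).mean()
    return alpha * soft_loss + (1 - alpha) * ce
```

When the student's logits match the teacher's, the KL term vanishes and only the hard-label cross-entropy remains, which is why tuning `alpha` and `T` matters so much in practice.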
Have you seen this article by Google?
https://arxiv.org/abs/2305.02301
https://blog.research.google/2023/09/distilling-step-by-step-outperforming.html
They claim they were able to distill PaLM into T5 for a reasoning task (a 2000-times difference in size), and the distilled T5 outperformed PaLM.
code is here:
https://github.com/google-research/distilling-step-by-step
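The core idea in the paper is a multi-task objective: the small model is trained to predict both the task label and the rationale (chain-of-thought) text extracted from the large teacher model. A minimal numpy sketch of that combined loss (the names and the `lam` weight are my own shorthand, not identifiers from their repo):

```python
import numpy as np

def token_cross_entropy(logits, targets):
    # Mean token-level cross-entropy over a sequence of target token ids.
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

def step_by_step_loss(label_logits, label_targets,
                      rationale_logits, rationale_targets, lam=1.0):
    """Distilling step-by-step style multi-task objective:
    L = L_label + lam * L_rationale, where the rationale targets are the
    reasoning tokens generated by the large teacher (e.g. PaLM)."""
    return (token_cross_entropy(label_logits, label_targets)
            + lam * token_cross_entropy(rationale_logits, rationale_targets))
```

The point is that the rationale term acts as extra supervision per example, which is how they get away with far less labeled data than standard fine-tuning or vanilla distillation.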
They seem to have distilled knowledge from a larger, general model into a smaller, specialised model and outperformed the larger model on a single task. Thanks for the paper. I wonder if I can specialise it to a subset of the original tasks and then try to outperform the original model.
I think you can try a similar approach for another task; in my view, the approach can be generalized to different tasks.
Can you elaborate?