Abstract: Knowledge distillation (KD) is a model compression technique that transfers knowledge from a complex and well-trained teacher model to a compact student model, thereby enabling the student ...
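As background for the setting the abstract describes, the sketch below shows the classic soft-target distillation loss (Hinton et al., 2015), in which the student matches the teacher's temperature-softened output distribution while still fitting the hard labels. This is a minimal illustration only; the function name and the `temperature` and `alpha` hyperparameters are illustrative assumptions, not this paper's specific method.

```python
# Minimal sketch of the standard soft-target KD loss (Hinton et al., 2015).
# `temperature` and `alpha` are illustrative hyperparameters, not values
# taken from this paper.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 4.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend softened teacher guidance with the usual hard-label loss."""
    # Soften both output distributions with the temperature.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # KL divergence between the softened distributions; the T^2 factor keeps
    # its gradient magnitude comparable to the hard-label term.
    kd_term = F.kl_div(soft_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2
    # Ordinary cross-entropy against the ground-truth labels.
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term
```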