Abstract: In recent years, the most commonly used loss function for end-to-end monaural target speaker extraction, especially with time-domain deep neural networks, has been the Scale-Invariant Signal-to-Distortion ...
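For reference, the sketch below shows the commonly cited definition of the scale-invariant signal-to-distortion ratio (SI-SDR) and the negated-SI-SDR training loss. It follows the general textbook formulation, not code from this work. The estimate is projected onto the reference with the optimal scale alpha = <s_hat, s> / ||s||^2, and SI-SDR = 10 log10(||alpha s||^2 / ||s_hat - alpha s||^2). The names `si_sdr` and `neg_si_sdr_loss`, the `eps` stabilizer, and the zero-mean preprocessing step are illustrative assumptions.

```python
import math
from typing import Sequence


def si_sdr(estimate: Sequence[float], target: Sequence[float], eps: float = 1e-8) -> float:
    """Scale-invariant SDR (in dB) of an estimated signal against a reference.

    Illustrative sketch; the zero-mean step and eps are assumptions.
    """
    if len(estimate) != len(target) or len(target) == 0:
        raise ValueError("signals must be non-empty and of equal length")
    # Remove DC offsets (a common, but optional, preprocessing step).
    mean_e = sum(estimate) / len(estimate)
    mean_t = sum(target) / len(target)
    est = [x - mean_e for x in estimate]
    ref = [x - mean_t for x in target]
    # Optimal scale alpha = <est, ref> / ||ref||^2 projects est onto ref,
    # which is what makes the measure invariant to the estimate's gain.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    alpha = dot / (ref_energy + eps)
    s_target = [alpha * r for r in ref]
    # Everything not explained by the scaled reference counts as distortion.
    distortion = [e - s for e, s in zip(est, s_target)]
    target_energy = sum(s * s for s in s_target)
    distortion_energy = sum(d * d for d in distortion)
    return 10.0 * math.log10((target_energy + eps) / (distortion_energy + eps))


def neg_si_sdr_loss(estimate: Sequence[float], target: Sequence[float]) -> float:
    """Training loss: minimizing -SI-SDR maximizes SI-SDR (hypothetical helper)."""
    return -si_sdr(estimate, target)


if __name__ == "__main__":
    reference = [1.0, -1.0, 2.0, -2.0]
    # Add noise orthogonal to the reference, so alpha is (almost exactly) 1.
    noisy = [r + n for r, n in zip(reference, [1.0, 1.0, -1.0, -1.0])]
    print(si_sdr(noisy, reference))                  # about 3.98 dB, i.e. 10*log10(10/4)
    print(si_sdr([3.0 * x for x in noisy], reference))  # unchanged by rescaling
```

Because the optimal scale is recomputed for every estimate, multiplying the network output by any nonzero gain leaves the score unchanged. This is the property that makes SI-SDR attractive as a training objective for time-domain extraction networks.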