Speaker Two: That's the letter t. It makes the t sound.
Speaker One: t, t, tiger. Aahh!
Speaker Two: t, t, tractor.
Speaker One: That's the letter p. It makes the p ...
Abstract: Recently, pre-trained models with phonetic supervision have demonstrated their advantages for crosslingual speech recognition in data efficiency and information sharing across languages.