Doesn’t this really seem more like speech-to-phoneme-stream?


Doesn’t this really seem more like speech-to-phoneme-stream? In languages where spelling diverges strongly from pronounciation (like English & French), we should expect phonetic pronounciation to spelling translation to be at least as complex as speech to phonetic pronounciation, if it used the same models (which it basically never does — having a NN guess the spelling for an unfamiliar word would be a nightmare!).