mimic (and most probably flite) does support SSML, or at least some parts of it. I used it to specify the pronunciation of some words in Valhalla's English Pirate instructions.

Even with SSML support, switching the language gets a bit more difficult when the voice engine has multiple voices for the same language. I guess that has to be resolved somehow on the TTS side. Once it is, we can indeed give the TTS engine a full phrase that mixes different languages.
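To make the "full phrase with different languages" idea concrete, here is a minimal sketch of how such a phrase could be assembled. It uses the `<lang>` element and `xml:lang` attribute from the SSML 1.1 spec; I have not checked whether mimic or flite honor them, and the `build_ssml` helper and the sample phrase are made up for illustration.

```python
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(parts, default_lang="en-US"):
    """Build one SSML document from (text, lang) pairs.

    lang may be None, meaning "use the default language".
    Hypothetical helper: whether a given engine switches voices
    on <lang> is an assumption, not something verified here.
    """
    chunks = []
    for text, lang in parts:
        body = escape(text)  # escape &, < and > in the spoken text
        if lang and lang != default_lang:
            # Mark the foreign-language span so the engine can switch voice.
            chunks.append(f"<lang xml:lang={quoteattr(lang)}>{body}</lang>")
        else:
            chunks.append(body)
    return (
        f'<speak version="1.1" xmlns="{SSML_NS}" '
        f"xml:lang={quoteattr(default_lang)}>"
        + " ".join(chunks)
        + "</speak>"
    )


if __name__ == "__main__":
    print(build_ssml([("Turn left onto", None), ("Rue de l'Église", "fr-FR")]))
```

This only tags the language of each span. Picking one of several voices for that language would still be up to the engine, which is the part that has to be solved on the TTS side.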

As for country->lang matching, there is no need for it. OSM data gives the names by language, not by country code.
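For illustration, OSM stores localized names under keys of the form `name:<lang>` (for example `name:fr`), so the language comes straight from the tag. A rough sketch of collecting them, where `names_by_language` and the sample tags are hypothetical and the suffix pattern is only a heuristic for skipping non-language keys such as `name:left`:

```python
import re

# Heuristic: a 2-3 letter language code, optionally followed by subtags
# (e.g. "fr", "zh-Hant"). This is an assumption, not an OSM rule.
LANG_SUFFIX = re.compile(r"[a-z]{2,3}(-[A-Za-z0-9]+)*")


def names_by_language(tags):
    """Return {language: name} for every name:<lang> key in an OSM tag dict."""
    result = {}
    for key, value in tags.items():
        if not key.startswith("name:"):
            continue
        suffix = key[len("name:"):]
        if LANG_SUFFIX.fullmatch(suffix):
            result[suffix] = value
    return result


if __name__ == "__main__":
    sample = {
        "name": "Bruxelles - Brussel",
        "name:fr": "Bruxelles",
        "name:nl": "Brussel",
        "name:left": "not a language",
    }
    print(names_by_language(sample))
```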

In general, I think we do need a decent TTS, whether open source or not. In some parts of the code you can get by with pre-recorded audio, but that cannot be done for all applications.

