

I once got KoboldCpp working with their collection of TTS models plus the WavTokenizer system. Here’s the wiki page on it.
It may not sound as natural as a commercial voice model, but it may be enough to whet your appetite if other solutions feel overwhelmingly complicated.
Nice post, Hendrik. Thanks for sharing your knowledge and helping people out :)