Hey Greg,
Great article.
I especially liked your observation that audio quality starts before the TTS model runs. The sample-first workflow feels like a much more practical approach than treating long-form narration as a one-click conversion problem.
It sparked an idea for me, and I'd love to get your thoughts on it if you're open to connecting.
Thanks for sharing this.