The CSS Speech Module sounds like such a fantastic idea, and I really wish it had gained traction, but it hasn't. Here's how the W3C describes its status:
Speech contains properties to specify how a document is rendered by a speech synthesizer: volume, voice, speed, pitch, cues, pauses, etc. There was already an ACSS (Aural CSS) module in CSS2, but it was never correctly implemented and it was not compatible with the Speech Synthesis Markup Language (SSML), W3C's language for controlling speech synthesizers. The ACSS module of CSS2 has therefore been split in two parts: speech (for actual speech, compatible with SSML) and audio (for sound effects on other devices). The speech properties in level 3 will be similar to those in level 2, but have different values. (The old properties can still be used with the deprecated 'aural' media type, but the new ones should be used inside the new 'speech' medium, as well as in style sheets for 'all' media.)
I don't think there has been enough implementer interest or public outcry to help these properties gain traction. These days, a lot more attention seems to go to the Web Speech API. This goes to show that even when something is specified, we have to tell browser vendors that we WANT and NEED those features so they prioritize them. Otherwise, they may never get adopted.