© 2026 Hashnode
Meta has made a major return to open-source AI with its new Omnilingual ASR system—capable of transcribing speech in more than 1,600 languages, including over 500 that previously had no AI support. The platform also supports “Bring Your Own Language”...

TL;DR The LLM4S Scala library has been extended to fully support speech-to-text (ASR) and text-to-speech (TTS) as first-class modalities. This GSoC 2025 contribution adds a comprehensive speech subsystem, enabling voice input/output alongside text. K...

Introduction Creating an Automatic Speech Recognition (ASR) system using Wav2Vec 2.0 on E2E’s Cloud GPU server is a compelling endeavor that brings together cutting-edge technology and robust infrastructure. Leveraging the power of Wav2Vec 2.0, a sta...

To give my Ai learning a context to ground into, I'm writing a funny little app I've called VoxPlan (in Python) which allows you to organise goals and tasks in a hierarchical tree and display them in an interactive GUI. I'm very interested in explori...

Background With the development of unsupervised pre-training, exemplified by Wav2Vec 2.0 released in 2020, these models could learn directly from the raw audio without the need for human labels. So the raw training data could be scaled to 1 million h...
