#asr articles | Hashnode

Ilya · bjrn.hashnode.dev · Jun 23 · 13 min read

I Processed 190,000 Podcast Episodes on a GPU Server in My Apartment

There is a server in my apartment that processed around 190,000 podcast episodes in three months. That is roughly 253,000 hours of audio. If I had pushed that through public transcription APIs, the tr

Dhruv Thakur · vad-voice-activity-detection.hashnode.dev · Jan 29 · 6 min read

VAD Voice Activity Detection

What is VAD (Voice Activity Detection)? Voice Activity Detection (VAD) is a digital signal processing technique that distinguishes human speech from background noise or silence. It acts as a "gatekeeper" in audio pipelines, ensuring systems only proces...

Dhruv Thakur · psyrar.hashnode.dev · Jan 29 · 6 min read

VAD(Voice Activity Detection)

What is VAD (Voice Activity Detection)? Voice Activity Detection (VAD) is a digital signal processing technique that distinguishes human speech from background noise or silence. It acts as a "gatekeeper" in audio pipelines, ensuring systems only proc...

LogicVerse Solutions · skillmx.hashnode.dev · Nov 11, 2025 · 3 min read

Meta Returns to Open Source AI with Omnilingual ASR Models

Meta has made a major return to open-source AI with its new Omnilingual ASR system—capable of transcribing speech in more than 1,600 languages, including over 500 that previously had no AI support. The platform also supports “Bring Your Own Language”...

Anshuman Awasthi · awesomegsoc.hashnode.dev · Sep 13, 2025 · 5 min read

Bringing Voice to LLM4S: Speech-to-Text and Text-to-Speech in Scala

TL;DR The LLM4S Scala library has been extended to fully support speech-to-text (ASR) and text-to-speech (TTS) as first-class modalities. This GSoC 2025 contribution adds a comprehensive speech subsystem, enabling voice input/output alongside text. K...

Kaustubh Sharma · kaustubhtech.hashnode.dev · Jul 25, 2025 · 6 min read

Understanding Unexpected System Reboots

When Windows systems reboot unexpectedly, it can be challenging to determine the root cause. This newsletter provides comprehensive guidance on investigating these mysterious events using event logs, system files, and virtualization-specific tools. T...

Maksim Panfilov · m.z3r.io · Apr 4, 2025 · 26 min read

RTTM format specification and its application

Rich Transcription Time Marked (RTTM) is a widely used, text-based format for annotating audio and video, representing results of speech recognition, speaker diarization, and related metadata. Developed by NIST in the early 2000s, RTTM files consist ...

Akriti Upadhyay · akritiu.hashnode.dev · Jan 2, 2024 · 13 min read

How to Make an Automatic Speech Recognition System with Wav2Vec 2.0 on E2E’s Cloud GPU Server

Introduction Creating an Automatic Speech Recognition (ASR) system using Wav2Vec 2.0 on E2E’s Cloud GPU server is a compelling endeavor that brings together cutting-edge technology and robust infrastructure. Leveraging the power of Wav2Vec 2.0, a sta...

Richard Thompson · richardmthompson.hashnode.dev · Oct 18, 2023 · 7 min read

Searching for a Python-based Speech Recognition Engine (for CPU Inference)

To give my AI learning a context to ground into, I'm writing a funny little app I've called VoxPlan (in Python) which allows you to organise goals and tasks in a hierarchical tree and display them in an interactive GUI. I'm very interested in explori...

Suvro Banerjee · ai-projects.hashnode.dev · Apr 15, 2023 · 9 min read

OpenAI Whisper - a neural net for speech to text

Background With the development of unsupervised pre-training, exemplified by Wav2Vec 2.0 released in 2020, these models could learn directly from the raw audio without the need for human labels. So the raw training data could be scaled to 1 million h...
