Jan 29 · 4 min read · Voice input is becoming table stakes for modern web apps. Users expect to tap a mic button and speak instead of typing — especially on mobile. Here is how I added real-time voice transcription to a React form using OpenAI Whisper. The Architecture Th...
Join discussionNov 24, 2025 · 3 min read · 📝 Quick Summary: noScribe is a free and open-source AI-based software for automated audio transcription, primarily designed for qualitative social research and journalistic use. It leverages Whisper and pyannote for transcription and speaker identif...
Join discussionNov 7, 2025 · 4 min read · 前言 本文檔對比三種使用 Whisper 進行語音轉文字的方式,包含速度、成本與實際使用建議。測試音訊為 56 分 27 秒(約 52 MB)的訪談錄音。 【 總結 】 追求速度與性價比平衡:RunPod RTX 4090 速度最快(75 秒) 成本合理($0.023/次) 無文件限制 適合:中高頻使用(>50 次/天) 零部署需求:Fireworks V3 Turbo API 呼叫即可使用 成本中等($0.051/次) 速度中等(100 秒) 適合:低頻使用(<50 次/...
Join discussionNov 5, 2025 · 2 min read · 📖 What Are Speech Models? Speech models are advanced AI systems designed to convert spoken language into written text (speech-to-text) and sometimes even the reverse (text-to-speech).They are the foundation of voice assistants, transcription tools, ...
Join discussion
Oct 8, 2025 · 5 min read · TL;DR: Built a privacy-first voice-to-text app for Linux that works completely offline. Types directly into any application. Open-source, production-ready, comprehensive test coverage. The Problem I wanted to create a Linux voice typing app that: Wo...
Join discussion
Oct 9, 2025 · 5 min read · Czy kiedykolwiek marzyłeś o tym, żeby automatycznie transkrybować spotkania i tłumaczyć je na swój język? W tym artykule pokażę Ci, jak stworzyć własny system AI do transkrypcji i tłumaczenia audio używając Python i modeli Hugging Face. 🎯 Co to jest...
Join discussion
Jul 7, 2025 · 5 min read · Whisper is computationally intensive, and running it efficiently at scale or on constrained devices is a real engineering challenge. To make Whisper more suitable for production deployment, I explored two independent and hardware-aware optimization s...
Join discussion
Jul 7, 2025 · 5 min read · The Whisper model by OpenAI is a powerful multilingual, multitask ASR system. However, this generality can become a liability when adapting Whisper to a specific language and task. For example, transcribing Azerbaijani speech without translation or l...
Join discussion