#audio-processing articles

disha · audioseparation.hashnode.dev · Apr 29 · 5 min read

Beyond Black Boxes: Architecting a Text-Guided Audio Separator

We’ve spent the last decade teaching machines to understand pixels, paragraphs, and polygons. But audio? We still treat it like a sealed container. You get one file. One mix. If you want just the bass...

Vishal Uttam Mane · vishal-uttam-mane-mutli-modal.hashnode.dev · Mar 31 · 4 min read

Multimodal AI: Combining Text, Image, Audio, and Video for Intelligent Systems

Multimodal AI represents a significant evolution in artificial intelligence, moving beyond single-modality systems (such as text-only or vision-only models) to architectures capable of understanding a...

Yurii · blog.paramako.com · Jan 7 · 9 min read

Rust Audio Programming: Oscillator – Handle frequency changes smoothly [PART 2]

In Part 1, we built a simple sine wave generator and saved it as a .wav file. Now it’s time to make it sing — or at least play a short melody. We’ll modify our existing code to switch between different notes over time, building up something that feel...

Shreesha Hanagud · mcuselectioncatalogue.hashnode.dev · Nov 5, 2025 · 6 min read

Edge AI Needs the Right Microcontroller – Here’s How to Choose

👇 Scroll down to download the catalogue. Introduction: In embedded systems, the microcontroller (MCU) is more than just a chip; it’s the brain that controls how the system thinks, behaves, and performs. As we move into smarter applications, such as Edg...

Artur Kud · arturkud.hashnode.dev · Oct 9, 2025 · 5 min read

🧠 NeuroNote-vibe: How to Build Your Own AI Audio Transcription and Translation System

Have you ever dreamed of automatically transcribing meetings and translating them into your own language? In this article, I'll show you how to build your own AI system for audio transcription and translation using Python and Hugging Face models. 🎯 What is...

Junluan Tsui · ble-voice.hashnode.dev · Aug 6, 2025 · 8 min read

From KT142A Voice Chip to Practical Project Development: A Comprehensive Guide

Introduction: The KT142A is a robust voice chip that seamlessly integrates MP3 hardware decoding capabilities. It supports multiple storage media and offers flexible playback control mechanisms. This article will meticulously detail the process of ap...

Yurii · blog.paramako.com · Jun 21, 2025 · 6 min read

Rust Audio Programming: Oscillator – Build a sine wave [PART 1]

Hey there 👋! Whether you’re a seasoned rustacean, a curious audio hacker, or just someone who thinks synths are cool — welcome aboard! This is the Rust Audio Programming series, and today we’re diving into the basics of audio programming by building...

Akash Khandelwal · mydevchronicles.hashnode.dev · Jun 7, 2025 · 4 min read

🧠 From Frustration to Automation: My Journey to Transcribing YouTube Member-Only Videos with Whisper and Google Colab

Introduction: There are several online tools that can generate transcripts simply by providing a YouTube video URL. These work well for public videos, making them a convenient solution in most cases. However, when it came to member-only videos, I quic...

Maksim Panfilov · m.z3r.io · Apr 4, 2025 · 26 min read

RTTM format specification and its application

Rich Transcription Time Marked (RTTM) is a widely used, text-based format for annotating audio and video, representing results of speech recognition, speaker diarization, and related metadata. Developed by NIST in the early 2000s, RTTM files consist ...

Emotion Systems · emotion-systems.hashnode.dev · Feb 27, 2025 · 3 min read

Innovations in Audio Description Services: How Emotion Systems is Leading the Way

Making media accessible to everyone has become crucial. Audio description services help visually impaired audiences enjoy movies, TV shows, and online content. Emotion Systems is leading the way with its advanced audio processing in media supp...
