Category: Whisper and On-device AI Optimization
-
Whisper Fundamentals: Understanding OpenAI’s Speech Recognition Model Architecture
A tour of Whisper's encoder-decoder architecture, compute bottlenecks, and model-size trade-offs to understand before optimizing it for on-device mobile deployment.
-
Real-time Whisper Is a Battery Nightmare (Here’s How to Fix It)
Streaming Whisper drains 1% battery per minute. VAD, adaptive inference, and thermal management strategies to build production-ready on-device speech recognition.
-
On-Device Inference: Running Whisper Efficiently with ONNX and Core ML
ONNX Runtime with the Core ML execution provider beats native Core ML for Whisper on iOS. Here's why, with conversion scripts, memory tricks, and real benchmark data.
-
Optimizing Whisper for Mobile: Model Quantization and Compression Techniques
Comparing post-training quantization, static quantization, and QAT for deploying Whisper on mobile, with real implementation failures and the ONNX workaround that actually works.