📝 Moozz · Transcription Test Bench

Transcribe a sung vocal into lyrics with word-level timings (Whisper large-v3 on GPU). Use a vocal from a previous stems test or upload one. This is speech-to-text only; phoneme alignment is a later pipeline stage.

New transcription

How it works: Whisper large-v3 runs through faster-whisper on an L4 GPU. The output is a list of phrases and their words, each with start_ms/end_ms. Results are best on a clean vocal stem, so run the stems service first. Singing ASR is imperfect; the editor allows correction downstream.
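
A rough sketch of how that output could be produced with faster-whisper, not the service's actual code. The helper names (to_ms, to_phrases, transcribe_vocal), the "text" and "words" keys, and the float16 compute type are assumptions for illustration; only start_ms/end_ms and the model choice come from the description above. faster-whisper reports times in seconds, so the sketch converts them to integer milliseconds.

```python
from types import SimpleNamespace


def to_ms(seconds):
    """Convert a time in seconds to integer milliseconds."""
    return int(round(seconds * 1000))


def to_phrases(segments):
    """Map Whisper-style segments to phrases + words with start_ms/end_ms.

    Each segment is expected to expose .start, .end, .text and .words,
    and each word .start, .end and .word (the shape faster-whisper returns
    when word timestamps are enabled). The output keys "text" and "words"
    are assumed names, not a documented schema.
    """
    phrases = []
    for seg in segments:
        words = [
            {
                "text": w.word.strip(),
                "start_ms": to_ms(w.start),
                "end_ms": to_ms(w.end),
            }
            for w in (seg.words or [])
        ]
        phrases.append(
            {
                "text": seg.text.strip(),
                "start_ms": to_ms(seg.start),
                "end_ms": to_ms(seg.end),
                "words": words,
            }
        )
    return phrases


def transcribe_vocal(path):
    """Hypothetical entry point: run large-v3 on the GPU for one vocal file."""
    # Imported lazily so the conversion helpers work without a GPU install.
    from faster_whisper import WhisperModel

    # compute_type="float16" is an assumed setting for an L4-class GPU.
    model = WhisperModel("large-v3", device="cuda", compute_type="float16")
    segments, _info = model.transcribe(path, word_timestamps=True)
    return to_phrases(segments)


# Stand-in segment for trying the conversion without running the model.
demo_segment = SimpleNamespace(
    start=1.25,
    end=2.5,
    text=" hello world",
    words=[
        SimpleNamespace(start=1.25, end=1.75, word=" hello"),
        SimpleNamespace(start=2.0, end=2.5, word=" world"),
    ],
)
demo_phrases = to_phrases([demo_segment])
```

Keeping the conversion separate from the model call means the phrase/word shape can be checked with stand-in segments, as in demo_segment, before any GPU run.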

Transcriptions