📝 Moozz · Transcription Test Bench

Transcribe a sung vocal into lyrics with word-level timings (Whisper large-v3 on GPU). Use a vocal from a previous stems test or upload one. This is speech-to-text only — phoneme alignment is a later pipeline stage.

New transcription

Vocal from a stems test

…or upload a vocal (.wav)

Language

How it works: Whisper large-v3 (via faster-whisper) runs on an L4 GPU. The output is a list of phrases, each containing its words, and every phrase and word carries start_ms/end_ms timings. Results are best on a clean vocal stem, so run the stems service first. Singing ASR is imperfect; the editor allows correction downstream.
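As a rough illustration of the phrases + words output described above, here is a minimal sketch of how faster-whisper segments could be mapped to that shape. The function names (`to_ms`, `segments_to_phrases`, `transcribe_vocal`), the dict layout beyond `start_ms`/`end_ms`, and the `float16` compute type are assumptions for illustration, not the service's actual code.

```python
from types import SimpleNamespace


def to_ms(seconds: float) -> int:
    """Convert a timestamp in seconds to whole milliseconds."""
    return int(round(seconds * 1000))


def segments_to_phrases(segments) -> list:
    """Map faster-whisper style segments to phrases + words with ms timings.

    Each segment needs .start, .end, .text and .words; each word needs
    .word, .start and .end. The output dict layout is hypothetical.
    """
    phrases = []
    for seg in segments:
        words = [
            {
                "text": w.word.strip(),
                "start_ms": to_ms(w.start),
                "end_ms": to_ms(w.end),
            }
            for w in (seg.words or [])
        ]
        phrases.append(
            {
                "text": seg.text.strip(),
                "start_ms": to_ms(seg.start),
                "end_ms": to_ms(seg.end),
                "words": words,
            }
        )
    return phrases


def transcribe_vocal(path: str, language=None) -> list:
    """Transcribe a vocal file on GPU (requires faster-whisper and CUDA)."""
    # Imported lazily so the pure conversion above works without a GPU setup.
    from faster_whisper import WhisperModel

    # compute_type="float16" is an assumed setting, not confirmed by the page.
    model = WhisperModel("large-v3", device="cuda", compute_type="float16")
    segments, _info = model.transcribe(
        path, language=language, word_timestamps=True
    )
    return segments_to_phrases(segments)


# Stand-in segment so the conversion can be tried without a model.
demo_segments = [
    SimpleNamespace(
        start=0.5,
        end=1.25,
        text=" hello world",
        words=[
            SimpleNamespace(word=" hello", start=0.5, end=0.75),
            SimpleNamespace(word=" world", start=0.75, end=1.25),
        ],
    )
]
demo_phrases = segments_to_phrases(demo_segments)
```

Keeping the conversion separate from the model call means the timing logic can be checked on any machine, while `transcribe_vocal` is only invoked where a GPU is available.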

Transcriptions


Detail