DAC-SE1: High-Fidelity Speech Enhancement via Discrete Audio Tokens

Abstract

Recent autoregressive transformer-based speech enhancement (SE) methods have shown promising results by leveraging advanced semantic understanding and contextual modeling of speech. However, these approaches often rely on complex multi-stage pipelines and low-sampling-rate codecs, which limits them to narrow, task-specific speech enhancement. In this work, we introduce DAC-SE1, a simplified language-model-based SE framework that leverages discrete high-resolution audio representations; DAC-SE1 preserves fine-grained acoustic details while maintaining semantic coherence. Our experiments show that DAC-SE1 surpasses state-of-the-art autoregressive SE methods both on objective perceptual metrics and in a MUSHRA human evaluation. We release our codebase and model checkpoints to support further research in scalable, unified, and high-quality speech enhancement.

Mel Spectrogram Comparison

Comparison of speech enhancement methods showing mel spectrograms for each approach.

Panels shown for each sample: Clean Reference, Noisy Input, VoiceFixer, LLaSE-G1, DAC-SE1 (Ours).