Siamese SIREN: Audio Compression with
Implicit Neural Representations

ICML 2023 Neural Compression Workshop

Luca A. Lanzendörfer and Roger Wattenhofer


Siamese SIREN

Comparison of noisy and denoised reconstructions for the libri3 and choice samples.¹
Notice the stereo effect in the noisy reconstruction: the two reconstructions share the same underlying signal but carry different noise.
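
The idea behind the noise estimate can be illustrated with a small sketch. It assumes, for illustration only, that each of the two reconstructions equals the clean signal plus independent noise, and that the noise estimate is taken as the difference between the two reconstructions; the function name `siamese_noise_estimate` and the synthetic 440 Hz tone are hypothetical and not taken from the paper's code.

```python
import math
import random

def siamese_noise_estimate(recon_a, recon_b):
    """Return (signal_estimate, noise_estimate) from two reconstructions.

    Assumption: both inputs contain the same signal with different noise,
    so the signal cancels in the difference and the noise averages down
    in the mean.
    """
    signal = [(a + b) / 2 for a, b in zip(recon_a, recon_b)]
    noise = [a - b for a, b in zip(recon_a, recon_b)]
    return signal, noise

# Synthetic stand-in for two noisy reconstructions of one second of audio.
rng = random.Random(0)
sr = 8000
clean = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
noise_a = [rng.gauss(0.0, 0.05) for _ in clean]
noise_b = [rng.gauss(0.0, 0.05) for _ in clean]
recon_a = [s + n for s, n in zip(clean, noise_a)]
recon_b = [s + n for s, n in zip(clean, noise_b)]

signal_est, noise_est = siamese_noise_estimate(recon_a, recon_b)
```

In this toy setting the difference contains only noise, which is what makes it usable as a noise profile for a subsequent noise reduction step.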

Ground Truth

Noisy Reconstruction

Noise Estimate

Denoised Reconstruction

¹https://librosa.org/doc/main/recordings.html

Noise Reduce without Noise Estimate

Comparison of noise reduction with and without a noise estimate. We observe a strong cut-off when no noise estimate is provided.

Ground Truth

Reconstruction without Noise Estimate

Reconstruction with Noise Estimate

The following experiments use three random LibriSpeech and three random GTZAN samples each.

Quantized SIREN comparison

Comparison of different SIREN architectures. All reconstructions are computed after network quantization. We observe that the original SIREN, without positional encoding, is unable to reconstruct the signal.
PE refers to positional encoding, optimized SIREN to a SIREN with frequency scaling ω=100, and NR to the Noise Reduce algorithm.
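
As a rough illustration of the two ingredients named above, the following sketch shows a sinusoidal positional encoding of a time coordinate and a single sine-activated layer with frequency scaling ω. The exact encoding (powers of two times π), the number of frequencies, and the toy weights are assumptions made for this example; only the value ω=100 comes from the text.

```python
import math

def positional_encoding(t, num_frequencies):
    """Map scalar t to [sin(2^k pi t), cos(2^k pi t)] for k = 0..num_frequencies-1.

    Assumption: a common Fourier-feature style encoding, not necessarily
    the one used in the paper.
    """
    features = []
    for k in range(num_frequencies):
        angle = (2 ** k) * math.pi * t
        features.append(math.sin(angle))
        features.append(math.cos(angle))
    return features

def sine_layer(inputs, weights, biases, omega=100.0):
    """One SIREN-style layer: sin(omega * (W x + b))."""
    return [
        math.sin(omega * (sum(w * x for w, x in zip(row, inputs)) + b))
        for row, b in zip(weights, biases)
    ]

# Encode t = 0 with 4 frequencies (8 features) and pass it through a
# two-unit layer with hand-picked toy weights.
features = positional_encoding(0.0, num_frequencies=4)
hidden = sine_layer(features, weights=[[0.01] * 8, [-0.01] * 8], biases=[0.0, 0.0])
```

The encoding gives the network access to high-frequency components of the input coordinate directly, while ω controls how quickly the sine activations oscillate with respect to the layer's pre-activations.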

Ground Truth

Original SIREN

PE + SIREN

Optimized SIREN

Siamese SIREN + NR

Effect of decreasing network size

Comparison of reconstruction quality as the network size is reduced. We observe strong signal corruption at smaller network sizes.
How to read the column labels: [number of shared layers x layer width, number of siamese layers x layer width, number of parameters]
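
To make the label format concrete, here is a small sketch that parses a label such as `[2x256, 1x128, 513k]` into named fields. The `SirenConfig` dataclass and `parse_label` helper are hypothetical names introduced only for this example; a bare `0` is read as "no layers of this kind".

```python
from dataclasses import dataclass

@dataclass
class SirenConfig:
    shared_layers: int
    shared_width: int
    siamese_layers: int
    siamese_width: int
    num_parameters: int

def _parse_layers(field):
    """Parse 'NxW' into (N, W); a bare '0' means no such layers."""
    field = field.strip()
    if field == "0":
        return 0, 0
    count, width = field.split("x")
    return int(count), int(width)

def parse_label(label):
    """Parse '[shared, siamese, params]' as used in the column labels."""
    shared, siamese, params = label.strip("[] ").split(",")
    params = params.strip()
    # A trailing 'k' denotes thousands of parameters.
    num = int(params[:-1]) * 1000 if params.endswith("k") else int(params)
    return SirenConfig(*_parse_layers(shared), *_parse_layers(siamese), num)

config = parse_label("[2x256, 1x128, 513k]")
```

Here `config` describes two shared layers of width 256, one siamese layer of width 128, and roughly 513,000 parameters.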

Ground Truth

[3x256, 0, 843k]

[2x256, 1x128, 513k]

[2x128, 1x64, 142k]

[2x64, 1x32, 42k]

Shared Layer and Siamese Layer Configuration

Comparison of different combinations of shared and siamese layers. We observe the best trade-off with two shared layers and one siamese layer.
How to read the column labels: [number of shared layers x layer width, number of siamese layers x layer width, number of parameters]
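
The split between shared and siamese layers can be sketched as a shared trunk followed by two identically shaped heads with independently drawn weights. This is a structural illustration only: the class name `SiameseSirenSketch`, the tiny layer widths, the uniform initialization, and the linear output layer per head are assumptions for this example and do not reproduce the paper's implementation or parameter counts.

```python
import math
import random

def init_layer(rng, fan_in, fan_out):
    # Assumption: simple uniform init, not the SIREN initialization scheme.
    bound = 1.0 / fan_in
    weights = [[rng.uniform(-bound, bound) for _ in range(fan_in)]
               for _ in range(fan_out)]
    return weights, [0.0] * fan_out

def dense(x, layer):
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def sine(x, omega=100.0):
    return [math.sin(omega * v) for v in x]

class SiameseSirenSketch:
    def __init__(self, in_dim, shared_widths, siamese_width, seed=0):
        rng = random.Random(seed)
        self.shared = []
        dim = in_dim
        for width in shared_widths:
            self.shared.append(init_layer(rng, dim, width))
            dim = width
        # Two heads with the same shape but independently drawn weights.
        self.heads = [
            (init_layer(rng, dim, siamese_width), init_layer(rng, siamese_width, 1))
            for _ in range(2)
        ]

    def forward(self, x):
        # The shared trunk is evaluated once for both heads.
        for layer in self.shared:
            x = sine(dense(x, layer))
        outputs = []
        for hidden, out in self.heads:
            h = sine(dense(x, hidden))
            outputs.append(dense(h, out)[0])
        return tuple(outputs)

# Two shared layers and one siamese layer per head, at toy widths.
model = SiameseSirenSketch(in_dim=8, shared_widths=[16, 16], siamese_width=8)
out_a, out_b = model.forward([0.0, 1.0] * 4)
```

Because only the heads are duplicated, the trunk's parameters are stored once, which is one way to see why sharing more layers changes the size of the network.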

Ground Truth

[3x256, 0, 843k]

[2x256, 1x128, 513k]

[1x256, 2x128, 151k]

[0, 3x128, 75k]
