Differentiable All-pole Filters for Time-varying Audio Systems

Chin-Yun Yu 1, Christopher Mitcheltree 1, Alistair Carson 2, Stefan Bilbao 2, Joshua D. Reiss 1, and György Fazekas 1

1 Centre for Digital Music, Queen Mary University of London
2 Acoustics and Audio Group, University of Edinburgh


Abstract


Infinite impulse response filters are an essential building block of many time-varying audio systems, such as audio effects and synthesisers. However, their recursive structure impedes end-to-end training of these systems using automatic differentiation. Although non-recursive filter approximations like frequency sampling and frame-based processing have been proposed and widely used in previous works, they are approximations and cannot accurately reflect the gradient of the original system. We alleviate this difficulty by re-expressing a time-varying all-pole filter to backpropagate the gradients through itself, so the filter implementation is not bound to the technical limitations of automatic differentiation frameworks. This implementation can be employed within any audio system containing filters with poles for efficient gradient evaluation. We demonstrate its training efficiency and expressive capabilities for modelling real-world dynamic audio systems on a phaser, time-varying subtractive synthesiser, and feed-forward compressor. We make our code available and provide the trained audio effect and synth models in a VST plugin.




Figure 1: The forward (left) and backpropagation (right) flow chart of a third-order time-varying all-pole filter.
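The idea in Figure 1 can be illustrated with a minimal pure-Python sketch. It assumes the recursion y[n] = x[n] - sum_i a_i[n] y[n-i] with zero initial state; the function names `allpole_forward` and `allpole_backward` are hypothetical and this is not the released filter code. Under that assumption, the gradient with respect to the input is obtained by running the same kind of all-pole recursion backwards in time with time-shifted coefficients, and the coefficient gradients follow from it directly.

```python
def allpole_forward(x, a):
    """Time-varying all-pole filter (assumed form, zero initial state).

    x: list of T input samples.
    a: list of T rows, each holding the M coefficients a_1[n] .. a_M[n].
    Computes y[n] = x[n] - sum_{i=1..M} a_i[n] * y[n-i].
    """
    order = len(a[0])
    y = [0.0] * len(x)
    for n in range(len(x)):
        acc = x[n]
        for i in range(1, order + 1):
            if n - i >= 0:
                acc -= a[n][i - 1] * y[n - i]
        y[n] = acc
    return y


def allpole_backward(grad_y, a, y):
    """Backpropagate an upstream gradient dL/dy through the filter.

    dL/dx[n]   = dL/dy[n] - sum_i a_i[n+i] * dL/dx[n+i]   (recursion run backwards)
    dL/da_i[n] = -dL/dx[n] * y[n-i]
    """
    T, order = len(y), len(a[0])
    grad_x = [0.0] * T
    for n in range(T - 1, -1, -1):
        acc = grad_y[n]
        for i in range(1, order + 1):
            if n + i < T:
                acc -= a[n + i][i - 1] * grad_x[n + i]
        grad_x[n] = acc
    grad_a = [
        [-grad_x[n] * y[n - i] if n - i >= 0 else 0.0 for i in range(1, order + 1)]
        for n in range(T)
    ]
    return grad_x, grad_a


# First-order example with a constant coefficient of 0.5 and a unit impulse.
impulse_response = allpole_forward([1.0, 0.0, 0.0], [[0.5]] * 3)
# → [1.0, -0.5, 0.25]
```

Because the backward pass is itself a recursive filter, it can reuse whatever fast recursive implementation is used for the forward pass instead of unrolling the recursion in an automatic differentiation graph.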

Listening Samples


We provide listening examples for our experiments modeling a phaser (Electro-Harmonix Small Stone), a time-varying subtractive synthesizer (Roland TB-303 Bass Line), and a feed-forward compressor (LA-2A Leveling Amplifier). All three systems are trained end-to-end with gradient descent to match target audio recorded from the analog device.

1. Phaser (Electro-Harmonix Small Stone)




Figure 2: Discrete-time phaser model considered in this work, where K = 4. APF represents a time-varying all-pass filter with difference equation and BQ is a biquad filter.

For our first experiment, we use our time-domain filter to model the Electro-Harmonix Small Stone phaser pedal using the differentiable phaser architecture shown in Figure 2. The Small Stone's circuit consists of four cascaded analog all-pass filters, a through-path for the input signal, and a feedback path, which makes it topologically similar to our phaser implementation. The pedal has one knob, which controls the LFO rate, and a switch that engages the feedback loop.
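As a rough illustration of this topology, the sketch below cascades four first-order all-pass sections whose coefficient is swept by a sine LFO, adds a through-path, and optionally feeds the cascade output back through a one-sample delay. Several things are assumptions rather than details from this page: the textbook all-pass form y[n] = p[n] x[n] + x[n-1] - p[n] y[n-1], the sine LFO mapping (`center`, `depth`), the `mix` parameter, and the function name `phaser`. The biquad (BQ) stage from Figure 2 is omitted.

```python
import math


def phaser(x, fs, lfo_rate_hz, num_stages=4, depth=0.4, center=0.5,
           feedback=0.0, mix=0.5):
    """Toy time-varying phaser (illustrative only, not the trained model).

    Each stage is a first-order all-pass section with a shared,
    LFO-modulated coefficient p[n] = center + depth * sin(2*pi*rate*n/fs).
    Keep |center| + |depth| < 1 so every section stays stable.
    """
    prev_in = [0.0] * num_stages   # x[n-1] of each all-pass stage
    prev_out = [0.0] * num_stages  # y[n-1] of each all-pass stage
    fb_state = 0.0                 # cascade output delayed by one sample
    out = []
    for n, sample in enumerate(x):
        p = center + depth * math.sin(2.0 * math.pi * lfo_rate_hz * n / fs)
        v = sample + feedback * fb_state
        for k in range(num_stages):
            y = p * v + prev_in[k] - p * prev_out[k]
            prev_in[k], prev_out[k] = v, y
            v = y
        fb_state = v
        # Through-path plus all-pass path: their sum creates the moving notches.
        out.append((1.0 - mix) * sample + mix * v)
    return out


# One second of a 220 Hz sine through the phaser at an LFO rate of 0.6 Hz.
fs = 44100
tone = [math.sin(2.0 * math.pi * 220.0 * n / fs) for n in range(fs)]
wet = phaser(tone, fs, lfo_rate_hz=0.6)
```

The recursion over `prev_out` is the part that contains poles, which is where a time-domain all-pole filter applies.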

Parameter Config.   LFO Rate    Feedback   Hop Size (L / Fs)   Input   Target   TD (Ours)   FS
SS-A                ≈ 2.3 Hz    off        10 ms
SS-B                ≈ 0.6 Hz    off        40 ms
SS-C                ≈ 0.09 Hz   off        160 ms
SS-D                ≈ 1.4 Hz    on         10 ms
SS-E                ≈ 0.4 Hz    on         40 ms
SS-F                ≈ 0.06 Hz   on         160 ms


2. Time-varying Subtractive Synthesizer (Roland TB-303 Bass Line)




Figure 3: Diagram of the differentiable synth modelling process. Our time-domain filter component is shown in green.

For our second experiment, we use our time-domain filter to model the Roland TB-303 Bass Line synthesizer, which defined the acid house electronic music movement of the late 1980s. The TB-303 is an ideal synth for our use case because its defining feature is a resonant low-pass filter whose cutoff frequency is modulated quickly by an envelope to create its signature squelchy, “liquid” sound. We model it using the time-varying subtractive synth architecture shown in Figure 3, which consists of three main components: a monophonic oscillator, a time-varying biquad filter, and a waveshaper for adding distortion to the output. The dataset is made from Sample Science’s royalty-free Abstract 303 sample pack, consisting of 100 synth loops at 120 BPM recorded dry from a hardware TB-303 clone.
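The three-component signal chain can be sketched as follows. This is a simplified illustration, not the trained model: it assumes a naive sawtooth oscillator, the widely used Audio EQ Cookbook low-pass biquad formulas, a direct-form recursion whose coefficients are recomputed every sample, and a tanh waveshaper. The names `lowpass_biquad_coeffs`, `acid_voice`, and `drive`, and the envelope shape in the usage lines, are hypothetical.

```python
import math


def lowpass_biquad_coeffs(cutoff_hz, q, fs):
    """Audio EQ Cookbook low-pass biquad, normalised so that a0 = 1."""
    w0 = 2.0 * math.pi * cutoff_hz / fs
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = [(1.0 - cos_w0) / 2.0 / a0, (1.0 - cos_w0) / a0, (1.0 - cos_w0) / 2.0 / a0]
    a = [-2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return b, a


def acid_voice(f0_hz, cutoffs_hz, q, fs, drive=2.0):
    """Sawtooth -> time-varying low-pass biquad -> tanh waveshaper.

    cutoffs_hz holds one cutoff frequency per output sample, so the
    filter coefficients change at audio rate.
    """
    x1 = x2 = y1 = y2 = 0.0
    phase = 0.0
    out = []
    for fc in cutoffs_hz:
        saw = 2.0 * phase - 1.0
        phase = (phase + f0_hz / fs) % 1.0
        b, a = lowpass_biquad_coeffs(fc, q, fs)
        y = b[0] * saw + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, saw
        y2, y1 = y1, y
        out.append(math.tanh(drive * y))
    return out


# Hypothetical decaying envelope sweeping the cutoff from 8000 Hz down to 100 Hz.
fs = 48000
num_samples = fs // 4
exponent = 2.0
cutoffs = [100.0 + 7900.0 * (1.0 - n / num_samples) ** exponent
           for n in range(num_samples)]
note = acid_voice(f0_hz=55.0, cutoffs_hz=cutoffs, q=4.0, fs=fs)
```

The feedback terms `a[0] * y1` and `a[1] * y2` are the recursive part of the biquad, which is why the filter cannot be evaluated exactly as a plain convolution when the cutoff moves this quickly.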

Table 2.1: Entire test set concatenated together.

Filter Config.   Inference Method   Target   TD (Ours)   FS 128   FS 256   FS 512   FS 1024   FS 2048   FS 4096   LSTM 64
Coeff. TD N/A
Coeff. FS N/A N/A
Low-pass TD N/A
Low-pass FS N/A N/A
RNN TD N/A N/A N/A N/A N/A N/A N/A


Table 2.2: Five different test set audio clips repeated 16 times each for easier comparison.

Example Number   Filter Config.   Inference Method   Target   TD (Ours)   FS 128   FS 256   FS 512   FS 1024   FS 2048   FS 4096   LSTM 64
1 Coeff. TD N/A
1 Coeff. FS N/A N/A
1 Low-pass TD N/A
1 Low-pass FS N/A N/A
1 RNN TD N/A N/A N/A N/A N/A N/A N/A
2 Coeff. TD N/A
2 Coeff. FS N/A N/A
2 Low-pass TD N/A
2 Low-pass FS N/A N/A
2 RNN TD N/A N/A N/A N/A N/A N/A N/A
3 Coeff. TD N/A
3 Coeff. FS N/A N/A
3 Low-pass TD N/A
3 Low-pass FS N/A N/A
3 RNN TD N/A N/A N/A N/A N/A N/A N/A
4 Coeff. TD N/A
4 Coeff. FS N/A N/A
4 Low-pass TD N/A
4 Low-pass FS N/A N/A
4 RNN TD N/A N/A N/A N/A N/A N/A N/A
5 Coeff. TD N/A
5 Coeff. FS N/A N/A
5 Low-pass TD N/A
5 Low-pass FS N/A N/A
5 RNN TD N/A N/A N/A N/A N/A N/A N/A


3. Feed-forward Compressor (LA-2A Leveling Amplifier)


For our third experiment, we use our time-domain filter to learn the parameters of the Universal Audio LA-2A analog compressor. We optimise our proposed differentiable feed-forward compressor to match the target audio, examining its capability to replicate and infer the parameters of dynamic range controllers. We train and evaluate our compressor on the SignalTrain dataset, which consists of paired data recorded at 44.1 kHz from the LA-2A compressor with different peak reduction values.
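For readers unfamiliar with the structure, a feed-forward compressor can be sketched as a level detector, a static gain curve, and a gain smoother with separate attack and release coefficients. The version below is one common textbook design, not necessarily the parameterisation used in the paper; the function name `feedforward_compressor` and all parameter names are assumptions. The smoothing step is a one-pole recursive filter whose coefficient switches between attack and release from sample to sample, so it is a time-varying filter with a pole.

```python
import math


def feedforward_compressor(x, fs, threshold_db, ratio, attack_ms, release_ms,
                           makeup_db=0.0):
    """Textbook feed-forward compressor (illustrative sketch).

    The gain is computed from the input signal only (feed-forward),
    then smoothed by a one-pole filter with attack/release switching.
    """
    attack = 1.0 - math.exp(-1.0 / (attack_ms * 1e-3 * fs))
    release = 1.0 - math.exp(-1.0 / (release_ms * 1e-3 * fs))
    makeup = 10.0 ** (makeup_db / 20.0)
    smoothed = 1.0  # smoothed linear gain, starts at unity
    out = []
    for sample in x:
        level_db = 20.0 * math.log10(max(abs(sample), 1e-9))
        over_db = max(level_db - threshold_db, 0.0)
        # Static curve: above threshold, output level rises 1/ratio dB per input dB.
        target = 10.0 ** (-over_db * (1.0 - 1.0 / ratio) / 20.0)
        # Falling gain uses the attack coefficient, recovering gain uses release.
        coeff = attack if target < smoothed else release
        smoothed += coeff * (target - smoothed)
        out.append(sample * smoothed * makeup)
    return out


# A 0 dBFS square-like burst compressed with a -20 dB threshold and 4:1 ratio.
fs = 44100
burst = [1.0 if (n // 100) % 2 == 0 else -1.0 for n in range(fs // 10)]
squashed = feedforward_compressor(burst, fs, threshold_db=-20.0, ratio=4.0,
                                  attack_ms=5.0, release_ms=100.0)
```

Threshold, ratio, attack, and release are the kind of parameters that most stock DAW compressors expose.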

Config.   Peak Reduction   Input   Target   TD (Ours)   FS
LA-D      25
LA-E      50
LA-F      75


Plugins




Figure 4: The Neutone FX plugin user interface.

We make the trained effect models accessible using the Neutone platform and SDK. This enables most users to experiment with the models via a real-time VST plugin in their preferred digital audio workstation (DAW) on arbitrary input audio. Older CPUs may struggle to run the models in real time.

Instructions

  1. Download and install the Neutone FX plugin.
  2. Download a model file from the links below.
  3. Open the plugin in your preferred digital audio workstation.
  4. Click on "load your own" at the top of the Neutone FX plugin interface and select one of the models you just downloaded.
  5. Use the four custom knobs to control the model.

Phaser

We provide the trained phaser model files for the six different parameter configurations explored in the paper. The phaser is controlled as follows:

  • Knob A: LFO rate (0.05 Hz to 3 Hz)
  • Knob B: LFO stereo offset (0 to 2π)

We recommend setting your DAW sampling rate to 44.1 kHz and using an M1 Pro MacBook or better for best results. We plan to add an optimized, TorchScriptable C++ extension for our filter in the near future, which will significantly reduce the computational overhead of the current implementation.

Phaser Neutone Files
  • SS-A TD (feedback off, rate knob 3 o’clock (f0 ≈ 2.3 Hz))
  • SS-B TD (feedback off, rate knob 12 o’clock (f0 ≈ 0.6 Hz))
  • SS-C TD (feedback off, rate knob 9 o’clock (f0 ≈ 0.09 Hz))
  • SS-D TD (feedback on, rate knob 3 o’clock (f0 ≈ 1.4 Hz))
  • SS-E TD (feedback on, rate knob 12 o’clock (f0 ≈ 0.4 Hz))
  • SS-F TD (feedback on, rate knob 9 o’clock (f0 ≈ 0.06 Hz))

Time-varying Subtractive Synthesizer

We first provide time-domain and frequency-sampling versions of the low-pass biquad configuration of our synth implementation, without the modulation extraction neural network, so that users can familiarize themselves with the synth and test whether it can run in real time on their CPU. The synth is controlled as follows:

  • Knob A: Oscillator pitch (F#1 to C4)
  • Knob B: Decaying envelope exponent (0.2 to 3.0)
  • Knob C: Filter cutoff frequency (100 Hz to 8000 Hz)
  • Knob D: Filter resonance Q-factor (0.7071 to 8.0)

Sound and envelopes are generated continuously by the synth while input audio is detected. Due to the four-parameter limit, the synth has fixed note-on duration, oscillator gain, oscillator shape, and distortion gain parameters. All parameter values, ranges, and controllability can be modified by editing and running scripts/export_neutone_synth.py.

Acid Synth Neutone Files

Next, we provide all trained synth model configurations from the paper. They include the modulation extraction neural network that generates all control parameters except for oscillator pitch. As a result, these models are controlled with just the A knob:

  • Knob A: Oscillator pitch (F#1 to C4)

Note that the modulation extraction neural network and STFT operator are currently non-causal and do not support streaming. As a result, the generated audio is not deterministic: it depends on the buffer alignment between the DAW and the input audio. Artefacts can also be heard at buffer transition points. We plan to fix these issues in the near future. For now, we recommend setting your DAW's sampling rate to 48 kHz and the buffer size to 512 samples or greater for the best possible performance with these models.

Acid Synth Model Neutone Files

Feed-forward Compressor

The learned parameters for the feed-forward compressor are summarized in Table 6 of the paper. They can be applied to most default compressor plugins included in popular DAWs.

Citation



  @misc{ycy2024diffapf,
      title={Differentiable All-pole Filters for Time-varying Audio Systems},
      author={Chin-Yun Yu and Christopher Mitcheltree and Alistair Carson and Stefan Bilbao and Joshua D. Reiss and György Fazekas},
      year={2024},
      eprint={2404.07970},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
  }