ConvNet API

namespace convnet

Functions

ConvNetConfig parse_config_json(const nlohmann::json &config)

Parse ConvNet configuration from JSON.

Parameters:

config – JSON configuration object

Returns:

ConvNetConfig

std::unique_ptr<ModelConfig> create_config(const nlohmann::json &config, double sampleRate)

Config parser for ConfigParserRegistry.

class _Head
#include <convnet.h>

Public Functions

inline _Head()

_Head(const int in_channels, const int out_channels, std::vector<float>::iterator &weights)

void process_(const Eigen::MatrixXf &input, Eigen::MatrixXf &output, const long i_start, const long i_end) const

class BatchNorm
#include <convnet.h>

Batch normalization layer.

Runs in inference (production) mode, so it reduces to an elementwise affine layer. It applies y = (x - mean) / sqrt(variance + eps) * weight + bias, which simplifies to y = scale * x + loc, where scale = weight / sqrt(variance + eps) and loc = bias - scale * mean.
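The folding of the four batch-norm quantities into a single scale and offset can be sketched as below. This is a minimal standalone illustration using std::vector rather than Eigen. The names FoldedAffine, fold_batchnorm and apply_affine are hypothetical and are not part of convnet.h.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper type, not part of convnet.h.
struct FoldedAffine
{
  std::vector<float> scale; // weight / sqrt(variance + eps), one entry per channel
  std::vector<float> loc; // bias - scale * mean, one entry per channel
};

// Fold per-channel batch-norm statistics into an elementwise affine transform.
inline FoldedAffine fold_batchnorm(const std::vector<float>& mean, const std::vector<float>& variance,
                                   const std::vector<float>& weight, const std::vector<float>& bias,
                                   const float eps)
{
  const std::size_t dim = mean.size();
  assert(variance.size() == dim && weight.size() == dim && bias.size() == dim);
  FoldedAffine out;
  out.scale.resize(dim);
  out.loc.resize(dim);
  for (std::size_t c = 0; c < dim; ++c)
  {
    out.scale[c] = weight[c] / std::sqrt(variance[c] + eps);
    out.loc[c] = bias[c] - out.scale[c] * mean[c];
  }
  return out;
}

// Apply y = scale * x + loc to one sample x of channel c.
inline float apply_affine(const FoldedAffine& a, const std::size_t c, const float x)
{
  return a.scale[c] * x + a.loc[c];
}
```

Folding once at construction time means the per-sample work is one multiply and one add per channel, with no square root or division in the audio path.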

Public Functions

inline BatchNorm()

Default constructor.

BatchNorm(const int dim, std::vector<float>::iterator &weights)

Constructor with weights.

Parameters:
  • dim – Dimension of the input

  • weights – Iterator to the weights vector. Will be advanced as weights are consumed.
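The "advanced as weights are consumed" convention lets several layers read consecutive slices of one flat weight buffer. A minimal sketch of that pattern follows. The take helper and ToyLayer type are hypothetical, and the layout (dim gains followed by dim offsets) is an assumption for illustration, not the actual BatchNorm layout. The extra end argument is a bounds check added in this sketch and is not part of the documented signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy n values starting at `weights`, then advance the
// caller's iterator past them. `end` is only used for the bounds check.
inline std::vector<float> take(std::vector<float>::iterator& weights, const std::vector<float>::iterator end,
                               const std::size_t n)
{
  assert(end - weights >= static_cast<std::ptrdiff_t>(n));
  std::vector<float> out(weights, weights + static_cast<std::ptrdiff_t>(n));
  weights += static_cast<std::ptrdiff_t>(n);
  return out;
}

// Hypothetical layer that reads `dim` gains and then `dim` offsets.
struct ToyLayer
{
  std::vector<float> gain;
  std::vector<float> offset;

  // Members are initialized in declaration order, so gain is read before offset.
  ToyLayer(const std::size_t dim, std::vector<float>::iterator& weights, const std::vector<float>::iterator end)
  : gain(take(weights, end, dim))
  , offset(take(weights, end, dim))
  {
  }
};
```

Because the iterator is passed by reference, constructing two ToyLayer objects in sequence from the same iterator leaves it pointing just past the second layer's weights, which makes it easy to check that the whole buffer was consumed.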

void process_(Eigen::MatrixXf &input, const long i_start, const long i_end) const

Process input in-place.

Parameters:
  • input – Input matrix to process

  • i_start – Start index

  • i_end – End index

class ConvNet : public nam::Buffer
#include <convnet.h>

Convolutional neural network model.

A ConvNet consists of multiple ConvNetBlocks with increasing dilation factors, followed by a head layer that produces the final output.
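To get a feel for how the dilation factors determine how much input history the stacked blocks see, the receptive field can be computed as in the sketch below. The function receptive_field is hypothetical, and kernel_size is an assumption: the constructor documented here does not expose a kernel size, so it is taken as a parameter. Only the blocks are counted; the head layer is not.

```cpp
#include <cassert>
#include <vector>

// Receptive field, in samples, of a stack of dilated convolutions with one
// layer per entry in `dilations`, all sharing the same kernel size.
// Each layer adds (kernel_size - 1) * dilation samples of history.
inline long receptive_field(const std::vector<int>& dilations, const int kernel_size)
{
  assert(kernel_size >= 1);
  long rf = 1; // a network with no layers sees only the current sample
  for (const int d : dilations)
  {
    assert(d >= 1);
    rf += static_cast<long>(kernel_size - 1) * d;
  }
  return rf;
}
```

With a kernel size of 2 and dilations {1, 2, 4, 8}, this gives 1 + 1 + 2 + 4 + 8 = 16 samples, which shows why doubling dilations grow the history exponentially with depth.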

Public Functions

ConvNet(const int in_channels, const int out_channels, const int channels, const std::vector<int> &dilations, const bool batchnorm, const activations::ActivationConfig &activation_config, std::vector<float> &weights, const double expected_sample_rate = -1.0, const int groups = 1)

Constructor.

Parameters:
  • in_channels – Number of input channels

  • out_channels – Number of output channels

  • channels – Number of channels in the hidden layers

  • dilations – Vector of dilation factors, one per block

  • batchnorm – Whether to use batch normalization

  • activation_config – Activation function configuration

  • weights – Model weights vector

  • expected_sample_rate – Expected sample rate in Hz (-1.0 if unknown)

  • groups – Number of groups for grouped convolution

~ConvNet() = default

Destructor.

virtual void process(NAM_SAMPLE **input, NAM_SAMPLE **output, const int num_frames) override

Process audio frames.

Parameters:
  • input – Input audio buffers

  • output – Output audio buffers

  • num_frames – Number of frames to process

virtual void SetMaxBufferSize(const int maxBufferSize) override

Resize all buffers to handle maxBufferSize frames.

Parameters:

maxBufferSize – Maximum number of frames to process in a single call

class ConvNetBlock
#include <convnet.h>

A single block in a ConvNet.

Consists of a dilated convolution, optional batch normalization, and activation.
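The three stages of a block can be illustrated with a single-channel sketch. This is not the library's implementation: dilated_conv1d and toy_block are hypothetical names, the tap ordering (kernel[0] applied to the oldest sample) is an assumption, tanh stands in for whatever the activation configuration selects, and batch normalization is represented by an already folded scale and offset.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Single-channel dilated convolution over the "valid" region: every output
// needs (kernel.size() - 1) * dilation samples of history, so the result is
// that many samples shorter than the input.
inline std::vector<float> dilated_conv1d(const std::vector<float>& x, const std::vector<float>& kernel,
                                         const float bias, const std::size_t dilation)
{
  assert(!kernel.empty() && dilation >= 1);
  const std::size_t history = (kernel.size() - 1) * dilation;
  if (x.size() <= history)
    return {};
  std::vector<float> y(x.size() - history);
  for (std::size_t t = 0; t < y.size(); ++t)
  {
    float acc = bias;
    for (std::size_t k = 0; k < kernel.size(); ++k)
      acc += kernel[k] * x[t + k * dilation]; // taps are spaced `dilation` samples apart
    y[t] = acc;
  }
  return y;
}

// Toy block: dilated convolution, optional affine step, then tanh.
inline std::vector<float> toy_block(const std::vector<float>& x, const std::vector<float>& kernel,
                                    const float bias, const std::size_t dilation, const bool batchnorm,
                                    const float scale, const float loc)
{
  std::vector<float> y = dilated_conv1d(x, kernel, bias, dilation);
  for (float& v : y)
  {
    if (batchnorm)
      v = scale * v + loc; // inference-mode batch norm as an affine map
    v = std::tanh(v);
  }
  return y;
}
```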

Public Functions

inline ConvNetBlock()

Default constructor.

void set_weights_(const int in_channels, const int out_channels, const int _dilation, const bool batchnorm, const activations::ActivationConfig &activation_config, const int groups, std::vector<float>::iterator &weights)

Set the parameters (weights) of this block.

Parameters:
  • in_channels – Number of input channels

  • out_channels – Number of output channels

  • _dilation – Dilation factor for the convolution

  • batchnorm – Whether to use batch normalization

  • activation_config – Activation function configuration

  • groups – Number of groups for grouped convolution

  • weights – Iterator to the weights vector. Will be advanced as weights are consumed.

void SetMaxBufferSize(const int maxBufferSize)

Resize buffers to handle maxBufferSize frames.

Parameters:

maxBufferSize – Maximum number of frames to process in a single call

void Process(const Eigen::MatrixXf &input, const int num_frames)

Process an input matrix directly (newer API, similar to WaveNet).

Parameters:
  • input – Input matrix (channels x num_frames)

  • num_frames – Number of frames to process

void process_(const Eigen::MatrixXf &input, Eigen::MatrixXf &output, const long i_start, const long i_end)

Process input using explicit start and end indices (legacy method, kept for compatibility).

Parameters:
  • input – Input matrix

  • output – Output matrix

  • i_start – Start index in input

  • i_end – End index in input

Eigen::Block<Eigen::MatrixXf> GetOutput(const int num_frames)

Get output from last Process() call.

Parameters:

num_frames – Number of frames to return

Returns:

Block reference to the output

long get_out_channels() const

Get the number of output channels.

Returns:

Number of output channels

Public Members

Conv1D conv

The dilated convolution layer.

struct ConvNetConfig : public nam::ModelConfig
#include <convnet.h>

Configuration for a ConvNet model.

Public Functions

virtual std::unique_ptr<DSP> create(std::vector<float> weights, double sampleRate) override

Construct a DSP object from this configuration.

Parameters:
  • weights – Model weights (taken by value to allow move for WaveNet)

  • sampleRate – Expected sample rate in Hz

Returns:

Unique pointer to a DSP object

Public Members

int channels
std::vector<int> dilations
bool batchnorm
activations::ActivationConfig activation
int groups
int in_channels
int out_channels