ConvNet API

namespace convnet

Functions

ConvNetConfig parse_config_json(const nlohmann::json &config)

Parse ConvNet configuration from JSON.

Parameters: config – JSON configuration object
Returns: ConvNetConfig

std::unique_ptr<ModelConfig> create_config(const nlohmann::json &config, double sampleRate): Config parser for ConfigParserRegistry.

class _Head

#include <convnet.h>

Public Functions

inline _Head()

_Head(const int in_channels, const int out_channels, std::vector<float>::iterator &weights)

void process_(const Eigen::MatrixXf &input, Eigen::MatrixXf &output, const long i_start, const long i_end) const

class BatchNorm

#include <convnet.h>

Batch normalization layer.

The layer runs in production (inference) mode, so it is really just an elementwise affine layer. It applies

y = (x - mean) / sqrt(variance + eps) * weight + bias

which simplifies to

y = scale * x + loc
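The simplification above follows from expanding the first formula: scale = weight / sqrt(variance + eps) and loc = bias - scale * mean. A minimal sketch of that folding, using plain std::vector rather than Eigen so that it stands alone. The names FoldedAffine, fold_batchnorm and apply_affine are hypothetical and are not part of convnet.h.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for the folded per-channel parameters.
struct FoldedAffine
{
  std::vector<float> scale;
  std::vector<float> loc;
};

// Fold batch-norm statistics into y = scale * x + loc, one entry per channel.
FoldedAffine fold_batchnorm(const std::vector<float>& mean, const std::vector<float>& variance,
                            const std::vector<float>& weight, const std::vector<float>& bias,
                            const float eps)
{
  const std::size_t dim = mean.size();
  assert(variance.size() == dim && weight.size() == dim && bias.size() == dim);
  FoldedAffine out;
  out.scale.resize(dim);
  out.loc.resize(dim);
  for (std::size_t i = 0; i < dim; ++i)
  {
    out.scale[i] = weight[i] / std::sqrt(variance[i] + eps);
    out.loc[i] = bias[i] - out.scale[i] * mean[i];
  }
  return out;
}

// Apply the folded affine map in place to the samples of one channel.
void apply_affine(std::vector<float>& x, const float scale, const float loc)
{
  for (float& v : x)
    v = scale * v + loc;
}
```

Folding once at load time leaves a single multiply and add per sample at processing time. Whether the library stores its weights in exactly this order is not stated here.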

Public Functions

inline BatchNorm(): Default constructor.

BatchNorm(const int dim, std::vector<float>::iterator &weights)

Constructor with weights.

Parameters:

dim – Dimension of the input
weights – Iterator to the weights vector. Will be advanced as weights are consumed.
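Passing the iterator by reference lets several layers read their parameters one after another from a single flat weights vector. A rough sketch of the pattern follows. take_weights is a hypothetical helper, and its end argument is an extra bounds check added for this illustration; the documented constructor takes only the iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy `count` values starting at `it`, then leave `it`
// pointing at the first value that was not consumed.
std::vector<float> take_weights(std::vector<float>::iterator& it,
                                const std::vector<float>::iterator end,
                                const std::size_t count)
{
  assert(static_cast<std::size_t>(end - it) >= count);
  const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(count);
  std::vector<float> out(it, it + n);
  it += n; // the caller sees the advanced position
  return out;
}
```

Calling the helper twice on the same iterator yields consecutive, non-overlapping slices. Once every layer has been constructed, the iterator should equal the end of the vector, which makes a convenient check that the weight count matches the architecture.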

void process_(Eigen::MatrixXf &input, const long i_start, const long i_end) const

Process input in-place.

Parameters:

input – Input matrix to process
i_start – Start index
i_end – End index

class ConvNet : public nam::Buffer 

#include <convnet.h>

Convolutional neural network model.

A ConvNet consists of multiple ConvNetBlocks with increasing dilation factors, followed by a head layer that produces the final output.
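Increasing dilation factors let a short stack of blocks see a long window of past input. As an illustration, the receptive field of a stack of stride-1 dilated convolutions can be estimated as below. The kernel_size argument is an assumption, since it is not part of the documented interface, and the sketch ignores any extra context the head layer might add.

```cpp
#include <cassert>
#include <vector>

// Receptive field, in frames, of a stack of stride-1 dilated 1-D convolutions.
// kernel_size is an assumption; it does not appear in the documented interface.
long receptive_field(const std::vector<int>& dilations, const int kernel_size)
{
  assert(kernel_size >= 1);
  long frames = 1;
  for (const int d : dilations)
    frames += static_cast<long>(kernel_size - 1) * d; // each block adds (k - 1) * dilation
  return frames;
}
```

With dilations {1, 2, 4, 8} and an assumed kernel size of 2, this gives 16 frames.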

Public Functions

ConvNet(const int in_channels, const int out_channels, const int channels, const std::vector<int> &dilations, const bool batchnorm, const activations::ActivationConfig &activation_config, std::vector<float> &weights, const double expected_sample_rate = -1.0, const int groups = 1)

Constructor.

Parameters:

in_channels – Number of input channels
out_channels – Number of output channels
channels – Number of channels in the hidden layers
dilations – Vector of dilation factors, one per block
batchnorm – Whether to use batch normalization
activation_config – Activation function configuration
weights – Model weights vector
expected_sample_rate – Expected sample rate in Hz (-1.0 if unknown)
groups – Number of groups for grouped convolution
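For context on the groups parameter: in the usual definition of a grouped convolution, input and output channels are split into groups equal parts and each output channel reads only from the inputs in its own group. The sketch below counts the weights under that definition. The function name and the kernel_size argument are hypothetical, and the layout the library's Conv1D actually uses is not confirmed here.

```cpp
#include <cassert>

// Weight count (bias excluded) of a 1-D convolution split into `groups` groups,
// following the common definition of grouped convolution.
long grouped_conv_weight_count(const int in_channels, const int out_channels,
                               const int kernel_size, const int groups)
{
  assert(groups >= 1);
  assert(in_channels % groups == 0 && out_channels % groups == 0);
  // Each output channel connects to in_channels / groups inputs per kernel tap.
  return static_cast<long>(out_channels) * (in_channels / groups) * kernel_size;
}
```

With 16 input and 16 output channels and an assumed kernel size of 2, groups = 1 gives 512 weights and groups = 4 gives 128, so the weight count shrinks by a factor of groups.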

~ConvNet() = default: Destructor.

virtual void process(NAM_SAMPLE **input, NAM_SAMPLE **output, const int num_frames) override

Process audio frames.

Parameters:

input – Input audio buffers
output – Output audio buffers
num_frames – Number of frames to process

virtual void SetMaxBufferSize(const int maxBufferSize) override

Resize all buffers to handle maxBufferSize frames.

Parameters: maxBufferSize – Maximum number of frames to process in a single call

class ConvNetBlock

#include <convnet.h>

A single block in a ConvNet.

Consists of a dilated convolution, optional batch normalization, and activation.

Public Functions

inline ConvNetBlock(): Default constructor.

void set_weights_(const int in_channels, const int out_channels, const int _dilation, const bool batchnorm, const activations::ActivationConfig &activation_config, const int groups, std::vector<float>::iterator &weights)

Set the parameters (weights) of this block.

Parameters:

in_channels – Number of input channels
out_channels – Number of output channels
_dilation – Dilation factor for the convolution
batchnorm – Whether to use batch normalization
activation_config – Activation function configuration
groups – Number of groups for grouped convolution
weights – Iterator to the weights vector. Will be advanced as weights are consumed.

void SetMaxBufferSize(const int maxBufferSize)

Resize buffers to handle maxBufferSize frames.

Parameters: maxBufferSize – Maximum number of frames to process in a single call

void Process(const Eigen::MatrixXf &input, const int num_frames)

Process the input matrix directly (new API, similar to WaveNet).

Parameters:

input – Input matrix (channels x num_frames)
num_frames – Number of frames to process

void process_(const Eigen::MatrixXf &input, Eigen::MatrixXf &output, const long i_start, const long i_end)

Process input using explicit start and end indices (legacy method, kept for compatibility).

Parameters:

input – Input matrix
output – Output matrix
i_start – Start index in input
i_end – End index in input

Eigen::Block<Eigen::MatrixXf> GetOutput(const int num_frames)

Get output from last Process() call.

Parameters: num_frames – Number of frames to return
Returns: Block reference to the output

long get_out_channels() const

Get the number of output channels.

Returns: Number of output channels

Public Members

Conv1D conv: The dilated convolution layer.

struct ConvNetConfig : public nam::ModelConfig 

#include <convnet.h>

Configuration for a ConvNet model.

Public Functions

virtual std::unique_ptr<DSP> create(std::vector<float> weights, double sampleRate) override

Construct a DSP object from this configuration.

Parameters:

weights – Model weights (taken by value to allow move for WaveNet)
sampleRate – Expected sample rate in Hz

Returns:

Unique pointer to a DSP object

Public Members

int channels

std::vector<int> dilations

bool batchnorm

activations::ActivationConfig activation

int groups

int in_channels

int out_channels
