ConvNet API
-
namespace convnet
Functions
-
ConvNetConfig parse_config_json(const nlohmann::json &config)
Parse ConvNet configuration from JSON.
- Parameters:
config – JSON configuration object
- Returns:
Parsed ConvNet configuration
-
std::unique_ptr<ModelConfig> create_config(const nlohmann::json &config, double sampleRate)
Config parser for ConfigParserRegistry.
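ConfigParserRegistry's own interface is not documented on this page. As a generic illustration of the registry pattern that create_config plugs into, the sketch below maps an architecture name to a parser with the same (config, sample rate) shape. Every name in it (ParserRegistry, ModelConfigBase, ConvNetConfigSketch, create_convnet_config) is hypothetical, and a std::map<std::string, std::string> stands in for nlohmann::json so it builds with the standard library alone.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Stand-ins (hypothetical): a flat string map instead of nlohmann::json,
// and a minimal base class instead of nam::ModelConfig.
using JsonLike = std::map<std::string, std::string>;

struct ModelConfigBase {
  virtual ~ModelConfigBase() = default;
  double sample_rate = -1.0;
};

struct ConvNetConfigSketch : ModelConfigBase {
  int channels = 0;
};

// Same shape as create_config(config, sampleRate).
using Parser = std::function<std::unique_ptr<ModelConfigBase>(const JsonLike&, double)>;

// Maps an architecture name to the parser that understands its config.
class ParserRegistry {
 public:
  void register_parser(const std::string& architecture, Parser parser) {
    assert(parser);  // an empty std::function cannot be called later
    parsers_[architecture] = std::move(parser);
  }

  std::unique_ptr<ModelConfigBase> parse(const std::string& architecture,
                                         const JsonLike& config, double sample_rate) const {
    auto it = parsers_.find(architecture);
    if (it == parsers_.end())
      throw std::runtime_error("no parser registered for " + architecture);
    return it->second(config, sample_rate);
  }

 private:
  std::map<std::string, Parser> parsers_;
};

// Hypothetical ConvNet entry point in the style of create_config.
std::unique_ptr<ModelConfigBase> create_convnet_config(const JsonLike& config,
                                                       double sample_rate) {
  auto result = std::make_unique<ConvNetConfigSketch>();
  result->channels = std::stoi(config.at("channels"));
  result->sample_rate = sample_rate;
  return result;
}
```

Looking the parser up by name keeps the loader generic: adding an architecture means registering one more function rather than editing a central switch.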
-
class _Head
- #include <convnet.h>
-
class BatchNorm
- #include <convnet.h>
Batch normalization layer.
In production (inference) mode this is just an elementwise affine layer. It applies y = (x - mean) / sqrt(variance + eps) * weight + bias, which simplifies to y = scale * x + loc with scale = weight / sqrt(variance + eps) and loc = bias - mean * scale.
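The folding can be checked numerically. This sketch computes scale and loc for one channel and compares the folded form with the full expression; the struct and function names and the parameter values are illustrative only.

```cpp
#include <cassert>
#include <cmath>

// Per-channel batch-norm parameters (running statistics plus learned affine).
struct BatchNormParams {
  float mean, variance, weight, bias, eps;
};

// Fold the statistics into a single affine map y = scale * x + loc.
void fold(const BatchNormParams& p, float& scale, float& loc) {
  assert(p.variance + p.eps > 0.0f);
  scale = p.weight / std::sqrt(p.variance + p.eps);
  loc = p.bias - p.mean * scale;
}

// Full (unfolded) batch-norm expression, for comparison.
float batchnorm_full(const BatchNormParams& p, float x) {
  return (x - p.mean) / std::sqrt(p.variance + p.eps) * p.weight + p.bias;
}
```

Folding once when weights are loaded means inference costs one multiply and one add per element.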
Public Functions
-
inline BatchNorm()
Default constructor.
-
BatchNorm(const int dim, std::vector<float>::iterator &weights)
Constructor with weights.
- Parameters:
dim – Dimension of the input
weights – Iterator to the weights vector. Will be advanced as weights are consumed.
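Passing the iterator by reference lets consecutive layers read from one flat weight vector, each starting where the previous one stopped. A minimal sketch, assuming (this page does not confirm it) a layout of dim running means, dim running variances, dim weights, dim biases, then a single eps; FoldedBatchNorm, take and read_batchnorm are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical folded batch norm: y = scale * x + loc, per channel.
struct FoldedBatchNorm {
  std::vector<float> scale, loc;
};

// Copies `dim` values starting at `it` and advances the caller's iterator past them.
static std::vector<float> take(std::vector<float>::iterator& it, int dim) {
  std::vector<float> out(it, it + dim);
  it += dim;
  return out;
}

// Assumed layout: mean[dim], variance[dim], weight[dim], bias[dim], eps.
FoldedBatchNorm read_batchnorm(int dim, std::vector<float>::iterator& weights) {
  assert(dim > 0);
  const std::vector<float> mean = take(weights, dim);
  const std::vector<float> variance = take(weights, dim);
  const std::vector<float> weight = take(weights, dim);
  const std::vector<float> bias = take(weights, dim);
  const float eps = *(weights++);

  FoldedBatchNorm bn;
  bn.scale.resize(dim);
  bn.loc.resize(dim);
  for (int c = 0; c < dim; ++c) {
    bn.scale[c] = weight[c] / std::sqrt(variance[c] + eps);
    bn.loc[c] = bias[c] - mean[c] * bn.scale[c];
  }
  return bn;
}
```

After the call, the iterator points at the first value belonging to the next layer.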
-
void process_(Eigen::MatrixXf &input, const long i_start, const long i_end) const
Process input in-place.
- Parameters:
input – Input matrix to process
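i_start – Index of the first column (frame) to process
i_end – End column index (exclusive): columns i_start through i_end - 1 are processed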
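Because the layer is a per-channel affine map, processing a frame range in place means scaling and shifting each column in the range. The sketch assumes a channels x frames matrix stored column-major in a std::vector (a stand-in for Eigen::MatrixXf) and treats i_end as one past the last frame; Matrix and batchnorm_process_inplace are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major channels x frames matrix, standing in for Eigen::MatrixXf.
struct Matrix {
  int rows, cols;
  std::vector<float> data;
  Matrix(int r, int c) : rows(r), cols(c), data(static_cast<std::size_t>(r) * c, 0.0f) {}
  float& operator()(int r, int c) { return data[static_cast<std::size_t>(c) * rows + r]; }
};

// Apply y = scale[ch] * x + loc[ch] in place to frames i_start .. i_end - 1.
void batchnorm_process_inplace(Matrix& x, const std::vector<float>& scale,
                               const std::vector<float>& loc, long i_start, long i_end) {
  assert(static_cast<int>(scale.size()) == x.rows && scale.size() == loc.size());
  assert(0 <= i_start && i_start <= i_end && i_end <= x.cols);
  for (long col = i_start; col < i_end; ++col)
    for (int ch = 0; ch < x.rows; ++ch)
      x(ch, static_cast<int>(col)) = scale[ch] * x(ch, static_cast<int>(col)) + loc[ch];
}
```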
-
class ConvNet : public nam::Buffer
- #include <convnet.h>
Convolutional neural network model.
A ConvNet is a stack of ConvNetBlocks, one per entry in dilations (usually increasing), each using its own dilation factor, followed by a head layer that produces the final output.
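Stacking dilated convolutions grows the receptive field additively: a causal convolution with kernel size k and dilation d looks back (k - 1) * d samples, so the block stack sees 1 + sum of (k - 1) * d_i samples. The kernel size is not stated on this page, so the sketch takes it as a parameter; the head layer is not counted, and receptive_field is a hypothetical helper.

```cpp
#include <cassert>
#include <vector>

// Number of input samples that influence one output sample of a stack of
// causal dilated convolutions sharing the same kernel size.
long receptive_field(const std::vector<int>& dilations, int kernel_size) {
  assert(kernel_size >= 1);
  long field = 1;
  for (int d : dilations) {
    assert(d >= 1);
    field += static_cast<long>(kernel_size - 1) * d;
  }
  return field;
}
```

With kernel size 2, doubling dilations 1, 2, 4, ..., 2^(n-1) give a receptive field of exactly 2^n samples.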
Public Functions
-
ConvNet(const int in_channels, const int out_channels, const int channels, const std::vector<int> &dilations, const bool batchnorm, const activations::ActivationConfig &activation_config, std::vector<float> &weights, const double expected_sample_rate = -1.0, const int groups = 1)
Constructor.
- Parameters:
in_channels – Number of input channels
out_channels – Number of output channels
channels – Number of channels in the hidden layers
dilations – Vector of dilation factors, one per block
batchnorm – Whether to use batch normalization
activation_config – Activation function configuration
weights – Model weights vector
expected_sample_rate – Expected sample rate in Hz (-1.0 if unknown)
groups – Number of groups for grouped convolution
-
~ConvNet() = default
Destructor.
-
virtual void process(NAM_SAMPLE **input, NAM_SAMPLE **output, const int num_frames) override
Process audio frames.
- Parameters:
input – Input audio buffers
output – Output audio buffers
num_frames – Number of frames to process
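The NAM_SAMPLE ** arguments are arrays of per-channel buffers. A sketch of how a host might lay them out, assuming NAM_SAMPLE is float (your build may define it differently) and that each channel pointer addresses num_frames contiguous samples; ChannelBuffers and copy_through are hypothetical, the latter standing in for a model's process().

```cpp
#include <cassert>
#include <vector>

using Sample = float;  // assumption: stands in for NAM_SAMPLE

// Hypothetical stand-in for process(): copies input channel 0 to every output channel.
void copy_through(Sample** input, Sample** output, int num_output_channels, int num_frames) {
  for (int ch = 0; ch < num_output_channels; ++ch)
    for (int i = 0; i < num_frames; ++i)
      output[ch][i] = input[0][i];
}

// Owns per-channel storage and exposes the Sample** view the API expects.
// The pointers stay valid as long as `storage` is not resized.
struct ChannelBuffers {
  std::vector<std::vector<Sample>> storage;
  std::vector<Sample*> pointers;
  ChannelBuffers(int channels, int frames)
      : storage(channels, std::vector<Sample>(frames, 0.0f)), pointers(channels) {
    assert(channels > 0 && frames >= 0);
    for (int ch = 0; ch < channels; ++ch) pointers[ch] = storage[ch].data();
  }
  Sample** data() { return pointers.data(); }
};
```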
-
virtual void SetMaxBufferSize(const int maxBufferSize) override
Resize all buffers to handle maxBufferSize frames.
- Parameters:
maxBufferSize – Maximum number of frames to process in a single call
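Because buffers are sized once by SetMaxBufferSize, presumably so that process() need not allocate, a host that receives longer blocks has to split them. A minimal sketch of that chunking; process_in_chunks and its process_block callback are hypothetical stand-ins for the host loop and process().

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Calls process_block(offset, frames) on consecutive chunks of at most
// max_buffer_size frames that together cover total_frames.
void process_in_chunks(int total_frames, int max_buffer_size,
                       const std::function<void(int, int)>& process_block) {
  assert(max_buffer_size > 0 && total_frames >= 0);
  for (int offset = 0; offset < total_frames; offset += max_buffer_size) {
    const int frames = std::min(max_buffer_size, total_frames - offset);
    process_block(offset, frames);
  }
}
```

In a real host, process_block would advance each channel pointer by offset before calling process() with frames.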
-
struct ConvNetConfig : public nam::ModelConfig
- #include <convnet.h>
Configuration for a ConvNet model.
Public Members
-
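int channels
Number of channels in the hidden layers.
-
std::vector<int> dilations
Dilation factors, one per block.
-
bool batchnorm
Whether to use batch normalization.
-
activations::ActivationConfig activation
Activation function configuration.
-
int groups
Number of groups for grouped convolution.
-
int in_channels
Number of input channels.
-
int out_channels
Number of output channels.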
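A configuration like this can be sanity-checked before the model is built. The sketch uses a hypothetical mirror struct (ActivationConfig omitted) and checks constraints that grouped convolution generally imposes, such as the hidden channel count being divisible by groups; whether NAM performs exactly these checks is not stated here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mirror of ConvNetConfig (activation omitted for brevity).
struct ConvNetConfigSketch {
  int in_channels = 1;
  int out_channels = 1;
  int channels = 0;
  std::vector<int> dilations;
  bool batchnorm = false;
  int groups = 1;
};

// Returns an empty string if the config looks usable, else a description of the problem.
std::string validate(const ConvNetConfigSketch& c) {
  if (c.in_channels < 1 || c.out_channels < 1) return "in/out channels must be positive";
  if (c.channels < 1) return "channels must be positive";
  if (c.dilations.empty()) return "need at least one dilation (one per block)";
  for (int d : c.dilations)
    if (d < 1) return "dilations must be positive";
  if (c.groups < 1 || c.channels % c.groups != 0) return "channels must be divisible by groups";
  return "";
}
```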
-
class ConvNetBlock
- #include <convnet.h>
A single block in a ConvNet.
Consists of a dilated convolution, optional batch normalization, and activation.
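The block's data flow can be sketched for a single channel: a causal dilated convolution, an optional per-channel affine map (the folded batch norm), then an activation. Assumptions for illustration only: kernel size 2, one channel, tanh as the activation, and zero history before the first sample. The real block works on multi-channel Eigen matrices through Conv1D, and block_forward is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Single-channel sketch of conv -> optional folded batch norm -> activation.
// Kernel size 2: v[t] = w0 * x[t - dilation] + w1 * x[t] + b, with x[t] = 0 for t < 0.
std::vector<float> block_forward(const std::vector<float>& x, int dilation, float w0, float w1,
                                 float b, bool batchnorm, float bn_scale, float bn_loc) {
  assert(dilation >= 1);
  std::vector<float> y(x.size());
  for (std::size_t t = 0; t < x.size(); ++t) {
    const float past = t >= static_cast<std::size_t>(dilation) ? x[t - dilation] : 0.0f;
    float v = w0 * past + w1 * x[t] + b;
    if (batchnorm) v = bn_scale * v + bn_loc;  // folded batch norm
    y[t] = std::tanh(v);                       // assumed activation
  }
  return y;
}
```

Dilation lets the block reach dilation samples into the past while still using only two taps.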
Public Functions
-
inline ConvNetBlock()
Default constructor.
-
void set_weights_(const int in_channels, const int out_channels, const int _dilation, const bool batchnorm, const activations::ActivationConfig &activation_config, const int groups, std::vector<float>::iterator &weights)
Set the parameters (weights) of this block.
- Parameters:
in_channels – Number of input channels
out_channels – Number of output channels
_dilation – Dilation factor for the convolution
batchnorm – Whether to use batch normalization
activation_config – Activation function configuration
groups – Number of groups for grouped convolution
weights – Iterator to the weights vector. Will be advanced as weights are consumed.
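How many values set_weights_ consumes depends on the layer shapes. In a grouped 1-D convolution each output channel sees only in_channels / groups inputs, so the kernel holds out_channels * (in_channels / groups) * kernel_size values. The sketch counts these; the kernel size, whether the convolution has a bias, and the batch-norm layout (four per-channel vectors plus eps) are assumptions about the weight format, and block_weight_count is a hypothetical helper.

```cpp
#include <cassert>

// Hypothetical count of floats a block would read from the weights iterator.
// Assumes: grouped conv kernel, optional conv bias, then optional batch norm
// stored as mean, variance, weight, bias (one value per channel each) plus eps.
long block_weight_count(int in_channels, int out_channels, int kernel_size, int groups,
                        bool conv_bias, bool batchnorm) {
  assert(groups >= 1 && in_channels % groups == 0 && out_channels % groups == 0);
  long count = static_cast<long>(out_channels) * (in_channels / groups) * kernel_size;
  if (conv_bias) count += out_channels;
  if (batchnorm) count += 4L * out_channels + 1;
  return count;
}
```

Under the same assumptions, summing this over all blocks and the head gives the expected length of the weights vector, a useful consistency check when loading a model.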
-
void SetMaxBufferSize(const int maxBufferSize)
Resize buffers to handle maxBufferSize frames.
- Parameters:
maxBufferSize – Maximum number of frames to process in a single call
-
void Process(const Eigen::MatrixXf &input, const int num_frames)
Process the input matrix directly and keep the result for retrieval with GetOutput(). This is the newer API, similar to WaveNet's.
- Parameters:
input – Input matrix (channels x num_frames)
num_frames – Number of frames to process
-
void process_(const Eigen::MatrixXf &input, Eigen::MatrixXf &output, const long i_start, const long i_end)
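Process the columns of input from i_start up to i_end, writing the result to output. Legacy index-based method, kept for compatibility.
- Parameters:
input – Input matrix
output – Output matrix
i_start – Index of the first column (frame) of input to process
i_end – End column index in input (exclusive)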
-
Eigen::Block<Eigen::MatrixXf> GetOutput(const int num_frames)
Get output from last Process() call.
- Parameters:
num_frames – Number of frames to return
- Returns:
Block reference to the output
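Process() and GetOutput() separate computing from reading: results go into a buffer that SetMaxBufferSize() sizes in advance, and GetOutput() returns a block (a view) of it rather than a copy. The sketch mimics that contract with a hypothetical single-channel GainBlock whose gain stands in for the block's real computation; the View struct replaces Eigen::Block, and the view is assumed to cover the first num_frames frames.

```cpp
#include <cassert>
#include <vector>

// Non-owning view, standing in for Eigen::Block.
struct View {
  const float* data;
  int size;
};

// Hypothetical single-channel block following the Process()/GetOutput() contract.
class GainBlock {
 public:
  explicit GainBlock(float gain) : gain_(gain) {}

  void SetMaxBufferSize(int max_frames) { output_.assign(max_frames, 0.0f); }

  void Process(const std::vector<float>& input, int num_frames) {
    assert(num_frames <= static_cast<int>(output_.size()));  // no reallocation here
    assert(num_frames <= static_cast<int>(input.size()));
    for (int i = 0; i < num_frames; ++i) output_[i] = gain_ * input[i];
  }

  View GetOutput(int num_frames) const {
    assert(num_frames <= static_cast<int>(output_.size()));
    return View{output_.data(), num_frames};
  }

 private:
  float gain_;
  std::vector<float> output_;
};
```

In the sketch the view aliases the internal buffer, so the next Process() call overwrites what it shows; copy the data first if it must outlive that call.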
-
long get_out_channels() const
Get the number of output channels.
- Returns:
Number of output channels
Public Members
-
Conv1D conv
The dilated convolution layer.