Parameters to be measured

Inference time

  • How much time the forward pass of the model takes (a minimal timing sketch is shown below).
  • 1 / inference time = FPS we can get from the model
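
A minimal sketch of timing the forward pass in PyTorch, assuming a model and an input_tensor of the right shape are already available (the CUDA synchronization calls only matter on GPU):

import time
import torch

def measure_inference_time(model, input_tensor, runs=100):
    model.eval()
    with torch.no_grad():
        # warm-up runs so lazy initialization and caching don't skew the timing
        for _ in range(10):
            model(input_tensor)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(runs):
            model(input_tensor)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        seconds_per_pass = (time.perf_counter() - start) / runs
    return seconds_per_pass, 1.0 / seconds_per_pass  # inference time, FPS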

FLOPs: Floating Point Operations

  • Count the total number of floating-point operations: addition, subtraction, multiplication, and division.
  • A large memory footprint doesn't necessarily mean the FLOPs are higher.

FLOPS

  • Floating Point Operations per Second
  • The higher the number of operations per second, the faster the inference of the model.

Name        Unit     Value
kiloFLOPS   kFLOPS   10^3
megaFLOPS   MFLOPS   10^6
gigaFLOPS   GFLOPS   10^9
teraFLOPS   TFLOPS   10^12
petaFLOPS   PFLOPS   10^15
exaFLOPS    EFLOPS   10^18
zettaFLOPS  ZFLOPS   10^21
yottaFLOPS  YFLOPS   10^24

MACs

  • Multiply-Accumulate computations
  • In a neuron, most of the computation consists of multiplications followed by additions, for example:

    W1 * I1 + W2 * I2 + W3 * I3

  • NOTE: 1 MAC = 2 FLOPs

    Since the operation consists of 1 multiplication and 1 addition.

  • NOTE: a dot product of length n = n multiplications + (n - 1) additions = 2n - 1 FLOPs
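
As a quick check of that note, a small hypothetical helper that counts the FLOPs of a dot product of length n:

def dot_product_flops(n):
    # n multiplications plus (n - 1) additions
    return 2 * n - 1

# the 3-term sum W1 * I1 + W2 * I2 + W3 * I3 above:
# 3 multiplications + 2 additions = 5 FLOPs
assert dot_product_flops(3) == 5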

How to calculate?

  • We can define each layer's computation in terms of its operations; a sketch of how these per-layer counters can be wired together with forward hooks follows this list.
  • Here we consider the batch size to be 1. If the batch size increases, the FLOPs also increase linearly.
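
As a rough sketch of the overall mechanism, the per-layer _*_flops functions defined in the sections below can be attached as PyTorch forward hooks and summed over one forward pass. The count_flops driver and the layer-to-function mapping here are illustrative, not a library API:

import torch
import torch.nn as nn

def count_flops(model, input_tensor, flops_fns):
    # flops_fns maps a layer class to its counter, e.g. {nn.Linear: _linear_flops}
    total = 0
    handles = []

    def make_hook(fn):
        def hook(module, inp, out):
            # forward hooks receive the input as a tuple
            nonlocal total
            total += fn(module, inp[0], out)
        return hook

    for module in model.modules():
        fn = flops_fns.get(type(module))
        if fn is not None:
            handles.append(module.register_forward_hook(make_hook(fn)))

    with torch.no_grad():
        model(input_tensor)

    for handle in handles:
        handle.remove()
    return total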

Fully Connected Layer

  • LAYER = (INPUT NODES) * (OUTPUT NODES) + BIAS

    where * is the dot product discussed previously.

  • So, for calculating FLOPs, we just multiply the number of input nodes by the number of output nodes. We could also add the bias term, but as an approximation we can leave it out.


def _linear_flops(module, inp, out):
    # per output element: in_features multiplications and (in_features - 1) additions
    mul = module.in_features
    add = module.in_features - 1
    total_ops = (mul + add) * out.numel()
    return total_ops
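
A quick sanity check of that formula on a hypothetical nn.Linear(128, 64) layer (bias ignored, as above):

import torch
import torch.nn as nn

layer = nn.Linear(128, 64, bias=False)
x = torch.randn(1, 128)
out = layer(x)

# (128 multiplications + 127 additions) per output node, 64 output nodes
assert _linear_flops(layer, x, out) == (2 * 128 - 1) * 64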

Activations

  • Most activations don't add any multiplication overhead, but they do involve some simpler arithmetic operations.

ReLU

  • LAYER = INPUT NODES
# y = max(x,0) 

def _relu_flops(module, inp, out):
    return inp.numel()

Tanh

  • LAYER = INPUT NODES * 5
# y = (e^x - e^-x) / (e^x + e^-x)

def _tanh_flops(module, inp, out):
    # exp, exp^-1, sub, add, div for each element
    total_ops = 5 * inp.numel()
    return total_ops

Sigmoid

  • LAYER = INPUT NODES * 4
# y = 1 / (1 + e^(-x)) 


def _sigmoid_flops(module, inp, out):
    # negate, exp, add, div for each element
    total_ops = 4 * inp.numel()
    return total_ops

Pooling Layer

  • Depends on type of Pooling and Stride

MaxPool 1D, 2D, 3D

  • LAYER = Max(INPUT NODES)
# FLOPs = number of output elements

def _maxpool_flops(module, inp, out):
    total_ops = out.numel()
    return total_ops

Average Pool 1D, 2D, 3D

  • LAYER = Average(INPUT NODES)
# (kernel size + 1) operations per output element

import torch

def _avgpool_flops(module, inp, out):
    # kernel_size may be an int or a tuple; kernel_ops = product of the kernel dims
    kernel_ops = torch.prod(torch.tensor(module.kernel_size)).item()
    # pool: kernel_ops additions, avg: 1 division per output element
    total_ops = (kernel_ops + 1) * out.numel()
    return total_ops

DropOut

  • LAYER = dropout probability * INPUT NODES (only while training)
  • At inference time it can be considered zero; a sketch follows
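
Following the same hook pattern as the other layers, a sketch that simply counts dropout as zero at inference time:

def _dropout_flops(module, inp, out):
    # dropout is an identity mapping at inference time, so no FLOPs are counted
    return 0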

Batch Normalization: Only while training

  • LAYER: y = gamma * (x - mean) / sqrt(variance + epsilon) + beta
# 4 operations per input element

def _bn_flops(module, inp, out):
    nelements = inp.numel()
    # subtract, divide, gamma, beta
    total_ops = 4 * nelements
    return total_ops

Softmax

  • LAYER: y_i = e^(z_i) / SUM_j(e^(z_j))

def _softmax_flops(module, inp, out):
    batch_size, nfeatures = inp.size()
    # exp: nfeatures, add: nfeatures-1, div: nfeatures
    total_ops = batch_size * (3 * nfeatures - 1)
    return total_ops

Convolutions

  • LAYER = Number of Kernel x Kernel Shape x Output Shape

def _convNd_flops(module, inp, out):
    kernel_ops = module.weight.size()[2:].numel()  # k_h x k_w
    bias_ops = 1 if module.bias is not None else 0
    # (batch x out_c x out_h x out_w) x (in_c / groups x k_h x k_w + bias)
    total_ops = out.nelement() * \
        (module.in_channels // module.groups * kernel_ops + bias_ops)
    return total_ops
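
For example, a hypothetical 3x3 convolution with 64 input channels, 128 output channels and a 56x56 output map costs about 231 million multiply-accumulates per image, which is exactly what the hook above counts:

import torch
import torch.nn as nn

conv = nn.Conv2d(64, 128, kernel_size=3, padding=1, bias=False)
x = torch.randn(1, 64, 56, 56)
out = conv(x)  # shape (1, 128, 56, 56)

# 128 x 56 x 56 output elements, each needing 64 x 3 x 3 multiply-accumulates
assert _convNd_flops(conv, x, out) == 128 * 56 * 56 * 64 * 3 * 3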

Depthwise convolution

  • The filter and the input are split channel-wise and convolved separately; the resulting feature maps are then stacked back together.
  • The number of operations is reduced here (see the comparison sketch after this list):

    LAYER = Number of Kernel x Kernel Shape x Output Shape(without channel)
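
In PyTorch a depthwise convolution is an nn.Conv2d whose groups equals the number of input channels, so the same _convNd_flops hook already covers it; a quick hypothetical comparison:

import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)
standard = nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False)
depthwise = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64, bias=False)

out_standard = standard(x)
out_depthwise = depthwise(x)

# depthwise convolves each channel on its own, so it is in_channels times cheaper
print(_convNd_flops(standard, x, out_standard))    # 64 x 3 x 3 x 64 x 56 x 56
print(_convNd_flops(depthwise, x, out_depthwise))  # 1 x 3 x 3 x 64 x 56 x 56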

Pointwise convolution

  • A 1x1 filter is applied at every spatial position of the input, across all of its channels.

  • The number of operations is reduced here (a sketch follows this list):

    LAYER = Number of Kernel x Kernel Shape(1x1) x Output Shape(without channel)
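
Similarly, a pointwise convolution is just a 1x1 nn.Conv2d, so each output element costs in_channels multiply-accumulates under the same counting convention (a hypothetical check):

import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)
pointwise = nn.Conv2d(64, 128, kernel_size=1, bias=False)
out = pointwise(x)

# 128 x 56 x 56 output elements, each needing 64 x 1 x 1 multiply-accumulates
assert _convNd_flops(pointwise, x, out) == 128 * 56 * 56 * 64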