File Information

import cv2
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

OpenCV

Read a sample image and inspect its properties: height, width, and channels.

Reads in BGR format
shape gives the dimensions of the image
shape returns rows, columns, and channels, in that order
rows means height, columns means width, channels is 3 for color (a grayscale array has only rows and columns)

img = cv2.imread("image_formats/test.png")

Row,Column,Channel = img.shape

print("Rows = ", Row, "Columns = ", Column)

Rows =  534 Columns =  800

print("Height = ", Row, "Width = ", Column, "Channels = ", Channel)

Height =  534 Width =  800 Channels =  3
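
The unpacking above assumes a color image with three shape values. As a hedged sketch, the helper below handles both cases; `describe_shape` is a hypothetical name, not part of OpenCV, and the zero-filled arrays stand in for loaded images. It also guards against `None`, which `cv2.imread` returns when a file cannot be read.

```python
import numpy as np

def describe_shape(image):
    """Return (height, width, channels) for a color or grayscale array."""
    if image is None:
        # cv2.imread returns None when the file cannot be read
        raise ValueError("image was not loaded")
    if image.ndim == 2:
        # Grayscale arrays have no channel axis, so report one channel
        height, width = image.shape
        return height, width, 1
    height, width, channels = image.shape
    return height, width, channels

# Zero-filled stand-ins with the same dimensions as the sample image
color = np.zeros((534, 800, 3), dtype=np.uint8)
gray = np.zeros((534, 800), dtype=np.uint8)

print(describe_shape(color))  # (534, 800, 3)
print(describe_shape(gray))   # (534, 800, 1)
```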

PIL

Reads color images in RGB order (the actual mode depends on the file)
size gives the dimensions of the image
size returns columns and rows, in that order, i.e. (width, height)
rows means height, columns means width
mode is a string that defines the type and depth of a pixel in the image

img_pil = Image.open("image_formats/test.png")

1 (1-bit pixels, black and white, stored with one pixel per byte)
L (8-bit pixels, grayscale)
RGB (3x8-bit pixels, true color)
RGBA (4x8-bit pixels, true color with transparency mask)
CMYK (4x8-bit pixels, color separation)
HSV (3x8-bit pixels, Hue, Saturation, Value color space)

img_pil.mode

'RGB'

Column,Row = img_pil.size

print("Rows = ", Row, "Columns = ", Column)

Rows =  534 Columns =  800
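
The ordering difference between PIL's size and NumPy's shape is easy to trip over, so here is a small sketch of it. It uses a blank image created with `Image.new` as a stand-in for the sample file; the variable names are illustrative only.

```python
import numpy as np
from PIL import Image

# A blank stand-in with the same dimensions as the sample image.
# PIL takes and reports size as (width, height).
blank = Image.new("RGB", (800, 534))

width, height = blank.size
array_shape = np.asarray(blank).shape  # NumPy reports (height, width, channels)
bands = blank.getbands()               # one band name per channel

# Converting to mode "L" drops the channel axis in the NumPy view
gray_shape = np.asarray(blank.convert("L")).shape

print("size  =", (width, height))
print("shape =", array_shape)
print("bands =", bands)
print("gray  =", gray_shape)
```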

Numpy

Using OpenCV

png_np_img = np.asarray(img)
plt.imshow(png_np_img[:, :, ::-1])  # reverse BGR to RGB, which matplotlib expects

# If it is grayscale: plt.imshow(png_np_img, cmap='gray')
# Show the NumPy array characteristics

print("shape is ", png_np_img.shape)
print("dtype is ", png_np_img.dtype)
print("ndim is ", png_np_img.ndim)
print("itemsize is ", png_np_img.itemsize) # size in bytes of each array element
print("nbytes is ", png_np_img.nbytes) # total size in bytes of the whole array

shape is  (534, 800, 3)
dtype is  uint8
ndim is  3
itemsize is  1
nbytes is  1281600

Using PIL

png_np_img = np.asarray(img_pil)
plt.imshow(png_np_img) # this will plot it in a Jupyter notebook
# If it is grayscale: plt.imshow(png_np_img, cmap='gray')
# Show the NumPy array characteristics
print("shape is ", png_np_img.shape)
print("dtype is ", png_np_img.dtype)
print("ndim is ", png_np_img.ndim)
print("itemsize is ", png_np_img.itemsize) # size in bytes of each array element
print("nbytes is ", png_np_img.nbytes) # total size in bytes of the whole array

shape is  (534, 800, 3)
dtype is  uint8
ndim is  3
itemsize is  1
nbytes is  1281600
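
The relation between these attributes can be checked directly: nbytes is the number of elements multiplied by itemsize. The following sketch uses a zero-filled array of the same shape as a stand-in for the image.

```python
import numpy as np

# Zero-filled stand-in with the same shape and dtype as the sample image
image = np.zeros((534, 800, 3), dtype=np.uint8)

height, width, channels = image.shape
expected_nbytes = height * width * channels * image.itemsize
print(expected_nbytes, image.nbytes)  # 1281600 1281600

# The same pixels stored as 32-bit floats take four times the memory
as_float = image.astype(np.float32)
print(as_float.itemsize, as_float.nbytes)  # 4 5126400
```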

Access Pixel

In GIMP, the coordinate system (x,y) has its origin at the top-left corner: x increases to the right and y increases downward.

At (600,100), GIMP reports the R,G,B values (21,131,172).

Using OpenCV

Uses BGR format
Accesses using row first, then column
GIMP (X,Y) = (600,100) corresponds to (ROW,COL) = (100,600)

row = 100
col = 600
pixel_b, pixel_g, pixel_r = img[row, col]
print("BGR values = ", pixel_b, pixel_g, pixel_r)

BGR values =  172 131 21

Using PIL

Uses RGB format
Accesses using column first, then row
GIMP (X,Y) = (600,100) corresponds to (COL,ROW) = (600,100)

row = 100
col = 600
pixel_r, pixel_g, pixel_b = img_pil.getpixel((col,row))
print("BGR values = ", pixel_b, pixel_g, pixel_r)

BGR values =  172 131 21

Using Numpy

Uses RGB/BGR depending on input
Accesses using row first, then column for all formats
GIMP (X,Y) = (600,100) corresponds to (ROW,COL) = (100,600)

row = 100
col = 600
pixel_b, pixel_g, pixel_r = np.array(img)[row,col]
print("BGR values = ", pixel_b, pixel_g, pixel_r)

BGR values =  172 131 21

row = 100
col = 600
pixel_r, pixel_g, pixel_b = np.array(img_pil)[row,col]
print("BGR values = ", pixel_b, pixel_g, pixel_r)

BGR values =  172 131 21
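
The two conventions above, (x,y) versus (row,col) and BGR versus RGB, can be wrapped in one small sketch. `pixel_at` is a hypothetical helper, and the array is a zero-filled stand-in with only the pixel from the text filled in.

```python
import numpy as np

def pixel_at(array, x, y):
    """Read the pixel at GIMP-style (x, y) from an array indexed as [row, col]."""
    return array[y, x]

# Zero-filled stand-in for the OpenCV image, with the one known pixel set
bgr = np.zeros((534, 800, 3), dtype=np.uint8)
bgr[100, 600] = (172, 131, 21)  # B, G, R values from the text

# Reversing the channel axis gives an RGB view of the same data
rgb = bgr[:, :, ::-1]

bgr_pixel = tuple(int(v) for v in pixel_at(bgr, x=600, y=100))
rgb_pixel = tuple(int(v) for v in pixel_at(rgb, x=600, y=100))
print("BGR:", bgr_pixel)  # BGR: (172, 131, 21)
print("RGB:", rgb_pixel)  # RGB: (21, 131, 172)
```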

Channels

Using OpenCV

b,g,r = cv2.split(img)

f, axarr = plt.subplots(2,2)
axarr[0,0].imshow(img[:, :, ::-1])  # reverse BGR to RGB, which matplotlib expects
axarr[0,1].imshow(b)
axarr[1,0].imshow(g)
axarr[1,1].imshow(r)


Using PIL

r,g,b = img_pil.split()
# r = img_pil.getchannel(0)
# g = img_pil.getchannel(1)
# b = img_pil.getchannel(2)

f, axarr = plt.subplots(2,2)
axarr[0,0].imshow(img_pil)
axarr[0,1].imshow(b)
axarr[1,0].imshow(g)
axarr[1,1].imshow(r)


Using Numpy

arr = np.array(img_pil)

r = arr[:,:,0]
g = arr[:,:,1]
b = arr[:,:,2]

f, axarr = plt.subplots(2,2)
axarr[0,0].imshow(arr)
axarr[0,1].imshow(b)
axarr[1,0].imshow(g)
axarr[1,1].imshow(r)


arr = np.array(img)

b = arr[:,:,0]
g = arr[:,:,1]
r = arr[:,:,2]

f, axarr = plt.subplots(2,2)
axarr[0,0].imshow(arr[:, :, ::-1])  # arr is BGR here; reverse to RGB for matplotlib
axarr[0,1].imshow(b)
axarr[1,0].imshow(g)
axarr[1,1].imshow(r)

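
Splitting by slicing can be reversed with `np.stack`. As a minimal sketch, the block below splits a small synthetic BGR array (a stand-in for the real image) and merges the channels back in RGB order, which gives the same result as reversing the channel axis.

```python
import numpy as np

# Small synthetic BGR array: 4 rows, 6 columns, 3 channels
bgr = np.arange(4 * 6 * 3, dtype=np.uint8).reshape(4, 6, 3)

# Split by slicing the last axis; each result is a 2-D (rows, columns) view
b = bgr[:, :, 0]
g = bgr[:, :, 1]
r = bgr[:, :, 2]

# Merge the channels back, this time in RGB order
rgb = np.stack([r, g, b], axis=-1)

print(b.shape, rgb.shape)                      # (4, 6) (4, 6, 3)
print(np.array_equal(rgb, bgr[:, :, ::-1]))    # True
```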

Arrangement

Column major matrix

Height X Width X Channel
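Each sample is stored as a column-major matrix (height, width) of float[Channels], so the channel values of one pixel sit next to each other: (b00, g00, r00, b10, g10, r10, b01, g01, r01, b11, g11, r11). The subscripts are (width position, height position). The offsets below assume a 10 x 10 image with one byte per value.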

Offset (byte) :  0  1  2  3  4  5  6  7  8 ...29 30 31 32 33 34 35 36 37 ...
Height Pos    :  0  0  0  0  0  0  0  0  0 ... 0  0  0  1  1  1  1  1  1 ...
Width Pos     :  0  0  0  1  1  1  2  2  2 ... 9  9  9  0  0  0  1  1  1 ...
Color Index   :  B  G  R  B  G  R  B  G  R ... B  G  R  B  G  R  B  G  R ...
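
The offsets in this table follow a simple formula. The sketch below assumes the 10-pixel-wide, 3-channel, one-byte-per-value layout implied by the table; `hwc_offset` is a hypothetical helper name.

```python
def hwc_offset(h, w, c, width, channels):
    """Flat offset of channel c at (h, w) when channels are interleaved."""
    return (h * width + w) * channels + c

WIDTH, CHANNELS = 10, 3
CHANNEL_NAMES = "BGR"

# Reproduce a few columns of the table: first pixel, second pixel,
# last value of row 0, and first value of row 1
for h, w, c in [(0, 0, 0), (0, 1, 0), (0, 9, 2), (1, 0, 0)]:
    offset = hwc_offset(h, w, c, WIDTH, CHANNELS)
    print(f"offset {offset:2d}: height {h}, width {w}, color {CHANNEL_NAMES[c]}")
```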

Row major matrix

Channel X Height X Width
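Each sample is stored as one row-major (height, width) plane per channel, so all values of one channel sit next to each other: (b00, b10, b01, b11, g00, g10, g01, g11, r00, r10, r01, r11). The subscripts are (width position, height position). The offsets below assume a 10 x 10 image with one byte per value.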

Offset (byte) :  0  1  2  3 ... 9 10 11 12 13 ...90 91 92 93 ... 99 100 ... 199 200 ... 299 
Color Index   :  B  B  B  B ... B  B  B  B  B ... B  B  B  B ...  B   G ...   G   R ...   R
Height Pos    :  0  0  0  0 ... 0  0  0  0  0 ... 9  9  9  9 ...  9   0 ...   9   0 ...   9
Width Pos     :  0  1  2  3 ... 9  0  1  2  3 ... 0  1  2  3 ...  9   0 ...   9   0 ...   9
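
The planar offsets can be computed the same way, and NumPy can convert between the two arrangements with a transpose. This is a sketch assuming the 10 x 10, 3-channel layout of the table; `chw_offset` is a hypothetical helper name.

```python
import numpy as np

def chw_offset(c, h, w, height, width):
    """Flat offset of (h, w) in channel plane c when channels are stored as planes."""
    return (c * height + h) * width + w

HEIGHT, WIDTH, CHANNELS = 10, 10, 3

# Height x Width x Channel array whose values are their own flat offsets
hwc = np.arange(HEIGHT * WIDTH * CHANNELS).reshape(HEIGHT, WIDTH, CHANNELS)

# Move the channel axis to the front and copy into contiguous memory
chw = np.ascontiguousarray(hwc.transpose(2, 0, 1))
flat = chw.ravel()

print(chw.shape)                                # (3, 10, 10)
print(chw_offset(1, 0, 0, HEIGHT, WIDTH))       # 100, start of the G plane
print(chw_offset(2, 0, 0, HEIGHT, WIDTH))       # 200, start of the R plane
```

The copy made by `np.ascontiguousarray` matters: `transpose` alone only changes how the array is indexed, not how the bytes are laid out in memory.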

TensorRT cuDNN Format

legacy ("HWC") mode (CPU and GPU without cuDNN): Channels are tuples of scalars

Each sample is stored as a column-major matrix (height, width) of float[numChannels] (r00, g00, b00, r10, g10, b10, r01, g01, b01, r11, g11, b11).
- input : [C x W x H x N] or ARRAY[1..N] OF ARRAY[1..H] OF ARRAY[1..W] OF ARRAY[1..C]
- output : [C' x W' x H' x N] or ARRAY[1..N] OF ARRAY[1..H'] OF ARRAY[1..W'] OF ARRAY[1..C']
- filter : [C' x W" x H" x C] or ARRAY[1..C] OF ARRAY[1..H"] OF ARRAY[1..W"] OF ARRAY[1..C']
cuDNN ("CHW") mode (GPU only): Channels are planes (b00, b10, b01, b11, g00, g10, g01, g11, r00, r10, r01, r11).
- input : [W x H x C x N] or ARRAY[1..N] OF ARRAY[1..C] OF ARRAY[1..H] OF ARRAY[1..W]
- output : [W' x H' x C' x N] or ARRAY[1..N] OF ARRAY[1..C'] OF ARRAY[1..H'] OF ARRAY[1..W']
- filter : [W" x H" x C x C'] or ARRAY[1..C'] OF ARRAY[1..C] OF ARRAY[1..H"] OF ARRAY[1..W"]

where:

using ' for output and " for filter
N = samples
W, H = width, height (W', H' for output, W", H" for kernel)
C = input channels
- 3 for color images, 1 for B&W images
- for hidden layer: dimension of activation vector for each pixel
C' = output channels = dimension of activation vector for each pixel
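
In NumPy terms, ARRAY[1..N] OF ARRAY[1..H] OF ARRAY[1..W] OF ARRAY[1..C] corresponds to a C-ordered array of shape (N, H, W, C), and the CHW form to shape (N, C, H, W). The following is a minimal sketch of converting a batch between the two, using small made-up dimensions; it illustrates the layouts only and is not tied to any particular framework's API.

```python
import numpy as np

N, H, W, C = 2, 4, 5, 3  # small illustrative batch: samples, height, width, channels

# HWC batch: channel values of each pixel are adjacent in memory
nhwc = np.arange(N * H * W * C, dtype=np.float32).reshape(N, H, W, C)

# CHW batch: each channel is a contiguous plane; copy to really reorder memory
nchw = np.ascontiguousarray(nhwc.transpose(0, 3, 1, 2))

print(nhwc.shape, nchw.shape)  # (2, 4, 5, 3) (2, 3, 4, 5)

# Strides (in bytes) show which axis varies fastest in each layout
print(nhwc.strides)  # (240, 60, 12, 4): channel is the fastest axis
print(nchw.strides)  # (240, 80, 20, 4): width is the fastest axis

# The same logical element is reachable in both layouts
print(nhwc[1, 3, 4, 2] == nchw[1, 2, 3, 4])  # True
```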