File Information

import cv2
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

OpenCV

Read a sample image and inspect its properties: height, width, and number of channels.

  • cv2.imread returns color images in BGR channel order
  • shape gives the dimensions of the image
  • shape returns (rows, columns, channels)
  • rows is the height, columns is the width, and channels is 3 for color; an image loaded as grayscale has a 2-D shape with no channel entry
img = cv2.imread("image_formats/test.png")
Row,Column,Channel = img.shape
print("Rows = ", Row, "Columns = ", Column)
Rows =  534 Columns =  800
print("Height = ", Row, "Width = ", Column, "Channels = ", Channel) 
Height =  534 Width =  800 Channels =  3
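
The unpacking above assumes a three-channel array. A grayscale array has only two dimensions, and cv2.imread returns None instead of raising when a file cannot be read. The sketch below shows one way to handle both cases. It uses synthetic NumPy arrays in place of a real file, and image_dims is a hypothetical helper name, not part of OpenCV.

```python
import numpy as np

def image_dims(arr):
    """Return (height, width, channels) for an image array.

    Hypothetical helper: a 2-D array is treated as single-channel.
    """
    if arr is None:
        # cv2.imread signals a failed read by returning None
        raise ValueError("image could not be read")
    if arr.ndim == 2:
        height, width = arr.shape
        return height, width, 1
    height, width, channels = arr.shape
    return height, width, channels

# Synthetic stand-ins with the same dimensions as the sample image
color = np.zeros((534, 800, 3), dtype=np.uint8)
gray = np.zeros((534, 800), dtype=np.uint8)

print(image_dims(color))  # (534, 800, 3)
print(image_dims(gray))   # (534, 800, 1)
```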

PIL

  • Reads color images in RGB channel order
  • size gives the dimensions of the image
  • size returns (columns, rows), that is (width, height)
  • rows is the height, columns is the width
  • mode is a string that defines the type and depth of a pixel in the image. Common values:
    • 1 (1-bit pixels, black and white, stored with one pixel per byte)
    • L (8-bit pixels, grayscale)
    • RGB (3x8-bit pixels, true color)
    • RGBA (4x8-bit pixels, true color with transparency mask)
    • CMYK (4x8-bit pixels, color separation)
    • HSV (3x8-bit pixels, Hue, Saturation, Value color space)
img_pil = Image.open("image_formats/test.png")
img_pil.mode
'RGB'
Column,Row = img_pil.size
print("Rows = ", Row, "Columns = ", Column)
Rows =  534 Columns =  800
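
The order of size is the reverse of a NumPy shape, which is easy to get wrong. The sketch below shows the two side by side, using a blank image created with Image.new instead of the sample file (the 800 x 534 dimensions are copied from the output above).

```python
import numpy as np
from PIL import Image

# Blank 800 x 534 RGB image; Image.new takes size as (width, height)
demo = Image.new("RGB", (800, 534))

width, height = demo.size   # PIL order: (width, height)
arr = np.asarray(demo)      # NumPy order: (height, width, channels)

print(demo.mode)   # RGB
print(demo.size)   # (800, 534)
print(arr.shape)   # (534, 800, 3)

# Converting to 8-bit grayscale changes the mode and drops the channel axis
gray = demo.convert("L")
print(gray.mode, np.asarray(gray).shape)  # L (534, 800)
```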

Numpy

Using OpenCV

png_np_img = np.asarray(img)
plt.imshow(png_np_img[:, :, ::-1]) # reverse BGR to RGB, the order matplotlib expects

# If the image is grayscale: plt.imshow(png_np_img, cmap='gray')
# Show the NumPy array characteristics

print("shape is ", png_np_img.shape)
print("dtype is ", png_np_img.dtype)
print("ndim is ", png_np_img.ndim)
print("itemsize is ", png_np_img.itemsize) # size in bytes of each array element
print("nbytes is ", png_np_img.nbytes) # total size in bytes of all array elements
shape is  (534, 800, 3)
dtype is  uint8
ndim is  3
itemsize is  1
nbytes is  1281600

Using PIL

png_np_img = np.asarray(img_pil)
plt.imshow(png_np_img) # this will display the image in a Jupyter notebook
# If the image is grayscale: plt.imshow(png_np_img, cmap='gray')
# Show the NumPy array characteristics
print("shape is ", png_np_img.shape)
print("dtype is ", png_np_img.dtype)
print("ndim is ", png_np_img.ndim)
print("itemsize is ", png_np_img.itemsize) # size in bytes of each array element
print("nbytes is ", png_np_img.nbytes) # total size in bytes of all array elements
shape is  (534, 800, 3)
dtype is  uint8
ndim is  3
itemsize is  1
nbytes is  1281600
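
The printed figures are related: nbytes is the number of elements multiplied by itemsize. A small check on a zero-filled array of the same shape (standing in for the loaded image) makes the arithmetic explicit.

```python
import numpy as np

arr = np.zeros((534, 800, 3), dtype=np.uint8)

# size is the element count, i.e. the product of the shape
assert arr.size == 534 * 800 * 3
# nbytes is the total buffer size: element count times bytes per element
assert arr.nbytes == arr.size * arr.itemsize
print(arr.nbytes)  # 1281600

# The same image stored as float32 needs four times the memory
arr_f32 = arr.astype(np.float32)
print(arr_f32.itemsize, arr_f32.nbytes)  # 4 5126400
```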

Access Pixel

  • In GIMP, the coordinate system (x,y) has its origin at the top-left corner: x is the column and y is the row

  • At (600,100) in GIMP, the pixel has the R,G,B values (21,131,172)

Using OpenCV

  • Uses BGR format
  • Indexes by row first, then column
  • The GIMP position (X,Y) = (600,100) becomes (ROW,COL) = (100,600)
row = 100
col = 600
pixel_b, pixel_g, pixel_r = img[row, col]
print("BGR values = ", pixel_b, pixel_g, pixel_r)
BGR values =  172 131 21

Using PIL

  • Uses RGB format
  • getpixel takes the column first, then the row
  • The GIMP position (X,Y) = (600,100) is passed unchanged as (COL,ROW) = (600,100)
row = 100
col = 600
pixel_r, pixel_g, pixel_b = img_pil.getpixel((col,row))
print("BGR values = ", pixel_b, pixel_g, pixel_r)
BGR values =  172 131 21

Using Numpy

  • Channel order is RGB or BGR, depending on which library loaded the image
  • Indexes by row first, then column, whatever the source
  • The GIMP position (X,Y) = (600,100) becomes (ROW,COL) = (100,600)
# Array built from the OpenCV image: channels are in BGR order
row = 100
col = 600
pixel_b, pixel_g, pixel_r = np.array(img)[row,col]
print("BGR values = ", pixel_b, pixel_g, pixel_r)
BGR values =  172 131 21
# Array built from the PIL image: channels are in RGB order
row = 100
col = 600
pixel_r, pixel_g, pixel_b = np.array(img_pil)[row,col]
print("BGR values = ", pixel_b, pixel_g, pixel_r)
BGR values =  172 131 21
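
The two conventions above (x,y versus row,col, and RGB versus BGR) can be checked without any image file. The sketch below plants the pixel value quoted from GIMP into a synthetic array; the array and its dimensions are stand-ins for the sample image.

```python
import numpy as np

# Synthetic RGB image with one known pixel at x=600, y=100
rgb = np.zeros((534, 800, 3), dtype=np.uint8)
x, y = 600, 100
rgb[y, x] = (21, 131, 172)   # arrays are indexed [row, col] = [y, x]

# Reversing the last axis swaps RGB and BGR; the result is a view, not a copy
bgr = rgb[:, :, ::-1]

print(rgb[y, x].tolist())  # [21, 131, 172]
print(bgr[y, x].tolist())  # [172, 131, 21]
```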

Channels

Using OpenCV

b,g,r = cv2.split(img)  # channels come back in B, G, R order
f, axarr = plt.subplots(2,2)
axarr[0,0].imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))  # matplotlib expects RGB
axarr[0,1].imshow(b)
axarr[1,0].imshow(g)
axarr[1,1].imshow(r)

Using PIL

r,g,b = img_pil.split()
# Alternatively, fetch one channel at a time:
#r = img_pil.getchannel(0)
#g = img_pil.getchannel(1)
#b = img_pil.getchannel(2)
f, axarr = plt.subplots(2,2)
axarr[0,0].imshow(img_pil)
axarr[0,1].imshow(b)
axarr[1,0].imshow(g)
axarr[1,1].imshow(r)

Using Numpy

arr = np.array(img_pil)  # PIL image: channels are in RGB order

r = arr[:,:,0]
g = arr[:,:,1]
b = arr[:,:,2]
f, axarr = plt.subplots(2,2)
axarr[0,0].imshow(arr)
axarr[0,1].imshow(b)
axarr[1,0].imshow(g)
axarr[1,1].imshow(r)
arr = np.array(img)  # OpenCV image: channels are in BGR order

b = arr[:,:,0]
g = arr[:,:,1]
r = arr[:,:,2]
f, axarr = plt.subplots(2,2)
axarr[0,0].imshow(arr[:,:,::-1])  # reverse BGR to RGB for matplotlib
axarr[0,1].imshow(b)
axarr[1,0].imshow(g)
axarr[1,1].imshow(r)
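
Splitting by slicing can be reversed by stacking the channels along the last axis. The sketch below uses a small random array as a stand-in for the image and rebuilds it in BGR order. Note that plt.imshow draws a single 2-D channel with its default colormap unless cmap='gray' is passed.

```python
import numpy as np

rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)  # tiny stand-in image

# Split: each channel is a 2-D (height, width) view into the array
r, g, b = rgb[:, :, 0], rgb[:, :, 1], rgb[:, :, 2]

# Merge in the opposite order to get a BGR image
bgr = np.dstack([b, g, r])

assert bgr.shape == rgb.shape
assert np.array_equal(bgr, rgb[:, :, ::-1])

# Keeping one channel and zeroing the others gives a color-tinted image
red_only = np.zeros_like(rgb)
red_only[:, :, 0] = r
```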

Arrangement

Column major matrix

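  • Height X Width X Channel (HWC); as a NumPy shape this is (height, width, channel)
  • Each sample is stored with the channel index varying fastest, then width, then height, so the channel values of a pixel sit next to each other: (b00, g00, r00, b10, g10, r10, b01, g01, r01, b11, g11, r11). Subscripts are (width, height). The table below assumes an image 10 pixels wide with 1-byte values.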
Offset (byte) :  0  1  2  3  4  5  6  7  8 ...29 30 31 32 33 34 35 36 37 ...
Height Pos    :  0  0  0  0  0  0  0  0  0 ... 0  0  0  1  1  1  1  1  1 ...
Width Pos     :  0  0  0  1  1  1  2  2  2 ... 9  9  9  0  0  0  1  1  1 ...
Color Index   :  B  G  R  B  G  R  B  G  R ... B  G  R  B  G  R  B  G  R ...
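
The offsets in the table follow from a single formula: offset = (h * W + w) * C + c. The sketch below checks it against NumPy for the 10-pixel-wide case; hwc_offset is a hypothetical helper name, and the height of 10 is an assumption chosen to match the width.

```python
import numpy as np

H, W, C = 10, 10, 3   # width of 10 as in the table; height is assumed

def hwc_offset(h, w, c):
    """Flat offset of value (h, w, c) in an interleaved H x W x C buffer."""
    return (h * W + w) * C + c

# Fill a buffer so that each element's value equals its own flat offset
hwc = np.arange(H * W * C).reshape(H, W, C)

assert hwc[0, 1, 0] == hwc_offset(0, 1, 0) == 3    # B of pixel (h=0, w=1)
assert hwc[0, 9, 2] == hwc_offset(0, 9, 2) == 29   # R of the last pixel in row 0
assert hwc[1, 0, 0] == hwc_offset(1, 0, 0) == 30   # B of the first pixel in row 1
```

With 1-byte values, as in a uint8 image, the element offset is also the byte offset.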

Row major matrix

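  • Channel X Height X Width (CHW); as a NumPy shape this is (channel, height, width)
  • Each sample is stored with the width index varying fastest, then height, then channel, so each channel forms one contiguous plane: (b00, b10, b01, b11, g00, g10, g01, g11, r00, r10, r01, r11). Subscripts are (width, height). The table below assumes a 10 x 10 image with 1-byte values.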
Offset (byte) :  0  1  2  3 ... 9 10 11 12 13 ...90 91 92 93 ... 99 100 ... 199 200 ... 299 
Color Index   :  B  B  B  B ... B  B  B  B  B ... B  B  B  B ...  B   G ...   G   R ...   R
Height Pos    :  0  0  0  0 ... 0  0  0  0  0 ... 9  9  9  9 ...  9   0 ...   9   0 ...   9
Width Pos     :  0  1  2  3 ... 9  0  1  2  3 ... 0  1  2  3 ...  9   0 ...   9   0 ...   9
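
For this planar layout the offset is (c * H + h) * W + w, which gives the plane boundaries at 100 and 200 seen in the table. The sketch below converts an interleaved array to planar form and checks the formula; chw_offset is a hypothetical helper name.

```python
import numpy as np

H, W, C = 10, 10, 3   # 10 x 10 image, as in the table

def chw_offset(c, h, w):
    """Flat offset of value (c, h, w) in a planar C x H x W buffer."""
    return (c * H + h) * W + w

hwc = np.arange(H * W * C).reshape(H, W, C)

# transpose only reorders the axes (a view with new strides);
# ascontiguousarray then rewrites the data in planar order
chw = np.ascontiguousarray(hwc.transpose(2, 0, 1))

assert chw.shape == (C, H, W)
assert chw[1, 2, 3] == hwc[2, 3, 1]

flat = chw.ravel()
assert chw_offset(0, 9, 9) == 99    # last value of the first plane
assert chw_offset(1, 0, 0) == 100   # first value of the second plane
assert chw_offset(2, 0, 0) == 200   # first value of the third plane
assert flat[chw_offset(1, 0, 0)] == hwc[0, 0, 1]
```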

TensorRT cuDNN Format

  • legacy ("HWC") mode (CPU, and GPU without cuDNN): Channels are tuples of scalars

    Each sample is stored as a column-major matrix (height, width) of float[numChannels] (r00, g00, b00, r10, g10, b10, r01, g01, b01, r11, g11, b11).

    • input : [C x W x H x N] or ARRAY[1..N] OF ARRAY[1..H] OF ARRAY[1..W] OF ARRAY[1..C]
    • output : [C' x W' x H' x N] or ARRAY[1..N] OF ARRAY[1..H'] OF ARRAY[1..W'] OF ARRAY[1..C']
    • filter : [C' x W" x H" x C] or ARRAY[1..C] OF ARRAY[1..H"] OF ARRAY[1..W"] OF ARRAY[1..C']
  • cuDNN ("CHW") mode (GPU only): Channels are planes (b00, b10, b01, b11, g00, g10, g01, g11, r00, r10, r01, r11).

    • input : [W x H x C x N] or ARRAY[1..N] OF ARRAY[1..C] OF ARRAY[1..H] OF ARRAY[1..W]
    • output : [W' x H' x C' x N] or ARRAY[1..N] OF ARRAY[1..C'] OF ARRAY[1..H'] OF ARRAY[1..W']
    • filter : [W" x H" x C x C'] or ARRAY[1..C'] OF ARRAY[1..C] OF ARRAY[1..H"] OF ARRAY[1..W"]

where:

  • using ' for output and " for filter
  • N = samples
  • W, H = width, height (W', H' for output, W", H" for kernel)
  • C = input channels
    • 3 for color images, 1 for grayscale images
    • for hidden layer: dimension of activation vector for each pixel
  • C' = output channels = dimension of activation vector for each pixel
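
With a batch of N samples, the two input layouts above correspond to NumPy shapes (N, H, W, C) and (N, C, H, W), often abbreviated NHWC and NCHW. The sketch below converts between them with a transpose; the batch is synthetic and the variable names are illustrative.

```python
import numpy as np

N, H, W, C = 2, 4, 5, 3
nhwc = np.arange(N * H * W * C, dtype=np.float32).reshape(N, H, W, C)

# Move the channel axis in front of height and width, then make the
# result contiguous so that each channel really is one plane in memory
nchw = np.ascontiguousarray(nhwc.transpose(0, 3, 1, 2))

assert nchw.shape == (N, C, H, W)
assert nchw[1, 2, 3, 4] == nhwc[1, 3, 4, 2]

# The inverse permutation restores the original layout
back = nchw.transpose(0, 2, 3, 1)
assert np.array_equal(back, nhwc)
```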