Image Formats¶

An introduction to the formats in which OpenCV, PIL, and NumPy represent an image after reading it, and to the memory layouts used when passing that image to TensorRT or cuDNN.

toc: true
comments: true
layout: post
categories: [OpenCV, PIL, TensorRT, Python]
image: images/trtlogo.png
author: Prince

File Information¶


In [37]:

import cv2
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

OpenCV¶

Read a sample image and inspect its properties: height, width, and number of channels.

cv2.imread reads the image in BGR channel order
shape gives the dimensions of the image
shape returns rows, columns, and channels, in that order
rows is the height, columns is the width, and channels is 3 for a color image; an image read as grayscale has a two-dimensional shape with no channel entry

In [38]:

img = cv2.imread("image_formats/test.png")

In [39]:

Row,Column,Channel = img.shape

In [40]:

print("Rows = ", Row, "Columns = ", Column)

Rows =  534 Columns =  800

In [41]:

print("Height = ", Row, "Width = ", Column, "Channels = ", Channel)

Height =  534 Width =  800 Channels =  3
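The three-way unpacking above assumes a color image. An array read as grayscale has only two dimensions, so the same unpacking would raise a `ValueError`. Below is a minimal sketch of a helper that handles both cases. The name `image_dims` is hypothetical, and synthetic NumPy arrays stand in for decoded images so the snippet runs without a file.

```python
import numpy as np

def image_dims(arr):
    """Return (height, width, channels) for a 2-D or 3-D image array."""
    if arr.ndim == 2:            # grayscale: shape is (rows, cols)
        height, width = arr.shape
        return height, width, 1
    if arr.ndim == 3:            # color: shape is (rows, cols, channels)
        return tuple(arr.shape)
    raise ValueError(f"expected a 2-D or 3-D array, got {arr.ndim} dimensions")

color = np.zeros((534, 800, 3), dtype=np.uint8)   # stand-in for a color image
gray = np.zeros((534, 800), dtype=np.uint8)       # stand-in for a grayscale image

print(image_dims(color))
print(image_dims(gray))
```

It is also worth checking the result of `cv2.imread` before using it, since it returns `None` rather than raising an error when the file cannot be read.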

PIL¶

Reads color images in RGB channel order
size gives the dimensions of the image
size returns (columns, rows), that is (width, height), the reverse of the order in a NumPy shape
rows is the height, columns is the width
mode is a string that defines the type and depth of a pixel in the image

In [42]:

img_pil = Image.open("image_formats/test.png")

1 (1-bit pixels, black and white, stored with one pixel per byte)
L (8-bit pixels, grayscale)
RGB (3x8-bit pixels, true color)
RGBA (4x8-bit pixels, true color with transparency mask)
CMYK (4x8-bit pixels, color separation)
HSV (3x8-bit pixels, Hue, Saturation, Value color space)

In [48]:

img_pil.mode

Out[48]:

'RGB'

In [49]:

Column,Row = img_pil.size

In [50]:

print("Rows = ", Row, "Columns = ", Column)

Rows =  534 Columns =  800
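As a quick check of the points above, the following sketch builds blank images with `Image.new` instead of reading a file, then prints the PIL `size`, the band names, and the shape of the corresponding NumPy array for several modes. The 800x534 dimensions mirror the test image. The `shapes` dictionary is only there for illustration.

```python
import numpy as np
from PIL import Image

width, height = 800, 534
shapes = {}
for mode in ["L", "RGB", "RGBA", "CMYK", "HSV"]:
    im = Image.new(mode, (width, height))   # Image.new takes (width, height)
    arr = np.asarray(im)                    # NumPy shape is (height, width[, bands])
    shapes[mode] = arr.shape
    print(mode, im.size, im.getbands(), arr.shape)
```

Note how `size` stays (800, 534) for every mode, while the NumPy shape starts with the height and gains a trailing band axis only for multi-band modes.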

Numpy¶

Using OpenCV¶

In [46]:

png_np_img = np.asarray(img)
# img is in BGR order while matplotlib expects RGB, so red and blue appear swapped here
plt.imshow(png_np_img)

# For a grayscale image, use plt.imshow(png_np_img, cmap='gray')
# Show the NumPy array characteristics

print("shape is ", png_np_img.shape)
print("dtype is ", png_np_img.dtype)
print("ndim is ", png_np_img.ndim)
print("itemsize is ", png_np_img.itemsize) # size in bytes of each array element
print("nbytes is ", png_np_img.nbytes) # total size in bytes of the whole array

shape is  (534, 800, 3)
dtype is  uint8
ndim is  3
itemsize is  1
nbytes is  1281600
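The relationship between these attributes can be checked directly: `nbytes` is the number of elements multiplied by `itemsize`. The sketch below uses a zero-filled array with the same shape and dtype as the test image. The `float32` conversion is an added illustration of how the total size grows once the data is converted for inference.

```python
import numpy as np

arr = np.zeros((534, 800, 3), dtype=np.uint8)   # same shape and dtype as above

height, width, channels = arr.shape
expected = height * width * channels * arr.itemsize
print(arr.size, arr.itemsize, arr.nbytes, expected)

# The same image as 32-bit floats takes four times as many bytes.
as_float = arr.astype(np.float32)
print(as_float.itemsize, as_float.nbytes)
```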

Using PIL¶

In [47]:

png_np_img = np.asarray(img_pil)
plt.imshow(png_np_img) # renders the image inline in a Jupyter notebook
# For a grayscale image, use plt.imshow(png_np_img, cmap='gray')
# Show the NumPy array characteristics
print("shape is ", png_np_img.shape)
print("dtype is ", png_np_img.dtype)
print("ndim is ", png_np_img.ndim)
print("itemsize is ", png_np_img.itemsize) # size in bytes of each array element
print("nbytes is ", png_np_img.nbytes) # total size in bytes of the whole array

shape is  (534, 800, 3)
dtype is  uint8
ndim is  3
itemsize is  1
nbytes is  1281600

Access Pixel¶

GIMP reports pixel positions as (x,y): x is the column, counted from the left edge, and y is the row, counted from the top edge.

At (600,100) in GIMP, the pixel has the R,G,B values (21,131,172).

Using OpenCV¶

Uses BGR channel order
Indexes by row first, then column
The GIMP position (X,Y) = (600,100) becomes (ROW,COL) = (100,600)

In [92]:

row = 100
col = 600
pixel_b, pixel_g, pixel_r = img[row][col]
print("BGR values = ", pixel_b, pixel_g, pixel_r)

BGR values =  172 131 21

Using PIL¶

Uses RGB channel order
Indexes by column first, then row
The GIMP position (X,Y) = (600,100) stays (COL,ROW) = (600,100)

In [93]:

row = 100
col = 600
pixel_r, pixel_g, pixel_b = img_pil.getpixel((col,row))
# PIL returns R, G, B; print in B, G, R order to compare with the OpenCV result
print("BGR values = ", pixel_b, pixel_g, pixel_r)

BGR values =  172 131 21

Using Numpy¶

Uses RGB or BGR channel order, depending on which library produced the array
Indexes by row first, then column, whatever the source
The GIMP position (X,Y) = (600,100) becomes (ROW,COL) = (100,600)

In [101]:

# OpenCV
row = 100
col = 600
pixel_b, pixel_g, pixel_r = np.array(img)[row,col]
print("BGR values = ", pixel_b, pixel_g, pixel_r)

BGR values =  172 131 21

In [102]:

# PIL
row = 100
col = 600
pixel_r, pixel_g, pixel_b = np.array(img_pil)[row,col]
# the array is in R, G, B order; print in B, G, R order to compare with the OpenCV result
print("BGR values = ", pixel_b, pixel_g, pixel_r)

BGR values =  172 131 21
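The two conventions compared above (x,y versus row,col, and RGB versus BGR) can be reproduced with plain NumPy. The sketch below uses a synthetic array with one known pixel rather than the actual test image, and converts between channel orders by reversing the last axis.

```python
import numpy as np

# Synthetic stand-in for the test image: an RGB array with one known pixel.
rgb = np.zeros((534, 800, 3), dtype=np.uint8)
x, y = 600, 100                      # GIMP-style (x, y) coordinates
rgb[y, x] = (21, 131, 172)           # NumPy indexes as [row, col] = [y, x]

bgr = rgb[:, :, ::-1]                # reverse the channel axis: RGB -> BGR

print("RGB:", rgb[y, x])
print("BGR:", bgr[y, x])
```

The reversed slice is a view with a negative stride, not a copy. If a consumer needs packed memory, `np.ascontiguousarray(bgr)` produces one.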

Channels¶

Using OpenCV¶

In [63]:

b,g,r = cv2.split(img)

In [64]:

f, axarr = plt.subplots(2,2)
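# matplotlib expects RGB, so convert the BGR image before displaying it
axarr[0,0].imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))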
axarr[0,1].imshow(b)
axarr[1,0].imshow(g)
axarr[1,1].imshow(r)

Out[64]:

<matplotlib.image.AxesImage at 0x15f9b7f70>

Using PIL¶

In [69]:

r,g,b = img_pil.split()
# Alternative: fetch one channel at a time from the PIL image
#r = img_pil.getchannel(0)
#g = img_pil.getchannel(1)
#b = img_pil.getchannel(2)

In [70]:

f, axarr = plt.subplots(2,2)
axarr[0,0].imshow(img_pil)
axarr[0,1].imshow(b)
axarr[1,0].imshow(g)
axarr[1,1].imshow(r)

Out[70]:

<matplotlib.image.AxesImage at 0x15fc4b880>

Using Numpy¶

In [75]:

# PIL
arr = np.array(img_pil)

r = arr[:,:,0]
g = arr[:,:,1]
b = arr[:,:,2]

In [76]:

f, axarr = plt.subplots(2,2)
axarr[0,0].imshow(arr)
axarr[0,1].imshow(b)
axarr[1,0].imshow(g)
axarr[1,1].imshow(r)

Out[76]:

<matplotlib.image.AxesImage at 0x16000c8e0>

In [77]:

# OpenCV
arr = np.array(img)

b = arr[:,:,0]
g = arr[:,:,1]
r = arr[:,:,2]

In [78]:

f, axarr = plt.subplots(2,2)
# arr is in BGR order; reverse the channel axis so matplotlib receives RGB
axarr[0,0].imshow(arr[:,:,::-1])
axarr[0,1].imshow(b)
axarr[1,0].imshow(g)
axarr[1,1].imshow(r)

Out[78]:

<matplotlib.image.AxesImage at 0x160146fd0>
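Splitting is reversible: the planes can be stacked back into an image in any channel order. The sketch below uses a small random array in place of the test image. It splits a BGR array, merges the planes as RGB with `np.dstack`, and builds a "red only" image, one possible way to display a single channel in its own color instead of as a 2-D plane.

```python
import numpy as np

rng = np.random.default_rng(0)
bgr = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)  # small synthetic BGR image

# Split: each slice is a 2-D (height, width) plane.
b, g, r = bgr[:, :, 0], bgr[:, :, 1], bgr[:, :, 2]

# Merge in a different order to obtain an RGB image.
rgb = np.dstack((r, g, b))

# Keep only the red plane (in RGB order) and leave the other channels at zero.
red_only = np.zeros_like(rgb)
red_only[:, :, 0] = r

print(rgb.shape, red_only.shape)
```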

Arrangement¶

Column major matrix¶

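Height x Width x Channel (HWC)
The channel values of each pixel are stored next to each other: the channel index varies fastest, then the width position, then the height position. Each sample is stored as a column-major matrix (height, width) of float[Channels] (b00, g00, r00, b10, g10, r10, b01, g01, r01, b11, g11, r11), where the two digits give the width and height positions. The table below assumes an image 10 pixels wide.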

In [ ]:

Offset (byte) :  0  1  2  3  4  5  6  7  8 ...27 28 29 30 31 32 33 34 35 ...
Height Pos    :  0  0  0  0  0  0  0  0  0 ... 0  0  0  1  1  1  1  1  1 ...
Width Pos     :  0  0  0  1  1  1  2  2  2 ... 9  9  9  0  0  0  1  1  1 ...
Color Index   :  B  G  R  B  G  R  B  G  R ... B  G  R  B  G  R  B  G  R ...
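For this interleaved layout the flat offset of an element follows from its position as `(h * W + w) * C + c`. The sketch below assumes a 10x10 image with three channels and one element per offset. `hwc_offset` is a hypothetical helper name. The array is filled with `np.arange` so that every element holds its own flat offset, which makes the formula easy to verify.

```python
import numpy as np

H, W, C = 10, 10, 3                              # assumed image size
hwc = np.arange(H * W * C).reshape(H, W, C)      # each element stores its own flat offset

def hwc_offset(h, w, c, width=W, channels=C):
    """Flat element offset of (h, w, c) in a contiguous HWC array."""
    return (h * width + w) * channels + c

print(hwc_offset(0, 9, 0))   # first channel of the last pixel in row 0
print(hwc_offset(1, 0, 0))   # first channel of the first pixel in row 1
```

For elements wider than one byte (for example `float32`), multiply the element offset by `itemsize` to get the byte offset.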

Row major matrix¶

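Channel x Height x Width (CHW)
Each channel is stored as a complete plane: the width position varies fastest, then the height position, then the channel index. Each sample is stored as a row-major matrix of float[Channels] of (height, width) (b00, b10, b01, b11, g00, g10, g01, g11, r00, r10, r01, r11). The table below assumes a 10x10 image, so each plane holds 100 values.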

In [ ]:

Offset (byte) :  0  1  2  3 ... 9 10 11 12 13 ...90 91 92 93 ... 99 100 ... 199 200 ... 299 
Color Index   :  B  B  B  B ... B  B  B  B  B ... B  B  B  B ...  B   G ...   G   R ...   R
Height Pos    :  0  0  0  0 ... 0  1  1  1  1 ... 9  9  9  9 ...  9   0 ...   9   0 ...   9
Width Pos     :  0  1  2  3 ... 9  0  1  2  3 ... 0  1  2  3 ...  9   0 ...   9   0 ...   9
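In the planar layout the flat offset is `(c * H + h) * W + w`. The sketch below again assumes a 10x10 image with three channels. It converts an HWC array to CHW with `transpose` and checks the formula against the result. `chw_offset` is a hypothetical helper name.

```python
import numpy as np

H, W, C = 10, 10, 3                              # assumed image size
hwc = np.arange(H * W * C).reshape(H, W, C)

# Move the channel axis to the front, then force a packed copy so that the
# in-memory order really is plane by plane (transpose alone only returns a view).
chw = np.ascontiguousarray(hwc.transpose(2, 0, 1))

def chw_offset(c, h, w, height=H, width=W):
    """Flat element offset of (c, h, w) in a contiguous CHW array."""
    return (c * height + h) * width + w

flat = chw.ravel()
print(chw.shape, chw_offset(0, 1, 0), chw_offset(1, 0, 0), chw_offset(2, 9, 9))
```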

TensorRT cuDNN Format¶

In each entry below, the bracket form lists the fastest-varying dimension first, while the ARRAY form lists the slowest-varying dimension first.

legacy ("HWC") mode (CPU and GPU without cuDNN): channels are tuples of scalars

Each sample is stored as a column-major matrix (height, width) of float[numChannels] (r00, g00, b00, r10, g10, b10, r01, g01, b01, r11, g11, b11).
- input : [C x W x H x N] or ARRAY[1..N] OF ARRAY[1..H] OF ARRAY[1..W] OF ARRAY[1..C]
- output : [C' x W' x H' x N] or ARRAY[1..N] OF ARRAY[1..H'] OF ARRAY[1..W'] OF ARRAY[1..C']
- filter : [C' x W" x H" x C] or ARRAY[1..C] OF ARRAY[1..H"] OF ARRAY[1..W"] OF ARRAY[1..C']

cuDNN ("CHW") mode (GPU only): channels are planes (b00, b10, b01, b11, g00, g10, g01, g11, r00, r10, r01, r11).
- input : [W x H x C x N] or ARRAY[1..N] OF ARRAY[1..C] OF ARRAY[1..H] OF ARRAY[1..W]
- output : [W' x H' x C' x N] or ARRAY[1..N] OF ARRAY[1..C'] OF ARRAY[1..H'] OF ARRAY[1..W']
- filter : [W" x H" x C x C'] or ARRAY[1..C'] OF ARRAY[1..C] OF ARRAY[1..H"] OF ARRAY[1..W"]

where:

' marks output dimensions and " marks filter (kernel) dimensions
N = number of samples
W, H = width, height (W', H' for output, W", H" for kernel)
C = input channels
- 3 for color images, 1 for B&W images
- for a hidden layer: dimension of the activation vector for each pixel
C' = output channels = dimension of activation vector for each pixel
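Putting the pieces together, an image read with OpenCV (HWC, BGR, `uint8`) usually has to be rearranged before it matches the CHW input layout described above. The sketch below is one possible preprocessing step, not a required recipe. The function name `to_nchw`, the BGR to RGB swap, and the scaling to [0, 1] are all assumptions that depend on how the target network was trained.

```python
import numpy as np

def to_nchw(bgr_hwc):
    """Convert one HWC uint8 BGR image to a (1, C, H, W) float32 RGB batch."""
    rgb = bgr_hwc[:, :, ::-1]                      # BGR -> RGB (assumed model input order)
    scaled = rgb.astype(np.float32) / 255.0        # assumed scaling to [0, 1]
    chw = scaled.transpose(2, 0, 1)                # HWC -> CHW
    return np.ascontiguousarray(chw[np.newaxis])   # add N = 1 and pack the memory

image = np.zeros((534, 800, 3), dtype=np.uint8)    # stand-in for cv2.imread output
image[100, 600] = (172, 131, 21)                   # BGR pixel from the examples above
batch = to_nchw(image)
print(batch.shape, batch.dtype)
```

The final `np.ascontiguousarray` matters because `transpose` only changes strides. Without the copy, the underlying buffer would still be in HWC order when handed to a GPU library.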