🖼️ Image Formats
2022-09-03
Image Formats¶
An introduction to how OpenCV, PIL, and NumPy represent an image after reading it, and how those layouts relate to the formats used by TensorRT/cuDNN.
- categories: [OpenCV, PIL, TensorRT, Python]
- author: Prince
In [37]:
import cv2
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
OpenCV¶
Read a sample image and inspect its properties: height, width, and number of channels.
- Reads in BGR format
- shape gives the size of the image and returns rows, columns, and channels
- rows means height, columns means width, channels is 3 for color; an image read as grayscale has a two-dimensional shape with no channel entry
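The three-way unpacking of shape works only when the array has a channel axis. A minimal sketch, using synthetic NumPy arrays as stand-ins for what cv2.imread returns (an assumption, so the snippet runs without an image file), shows one way to handle both cases. The helper name describe is hypothetical.

```python
import numpy as np

# Stand-ins for cv2.imread results: a color image is (rows, cols, 3),
# an image read as grayscale is just (rows, cols).
color = np.zeros((534, 800, 3), dtype=np.uint8)
gray = np.zeros((534, 800), dtype=np.uint8)


def describe(image):
    """Return (height, width, channels) for a 2-D or 3-D image array."""
    height, width = image.shape[:2]
    channels = image.shape[2] if image.ndim == 3 else 1
    return height, width, channels


print(describe(color))  # (534, 800, 3)
print(describe(gray))   # (534, 800, 1)
```

Note that cv2.imread returns None when the file cannot be read, so it is worth checking for None before touching shape.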
In [38]:
img = cv2.imread("image_formats/test.png")
In [39]:
Row,Column,Channel = img.shape
In [40]:
print("Rows = ", Row, "Columns = ", Column)
Rows = 534 Columns = 800
In [41]:
print("Height = ", Row, "Width = ", Column, "Channels = ", Channel)
Height = 534 Width = 800 Channels = 3
PIL¶
- Reads in RGB format
- size gives the size of the image and returns (columns, rows), that is (width, height)
- rows means height, columns means width
- mode is a string that defines the type and depth of a pixel in the image
In [42]:
img_pil = Image.open("image_formats/test.png")
- 1 (1-bit pixels, black and white, stored with one pixel per byte)
- L (8-bit pixels, grayscale)
- RGB (3x8-bit pixels, true color)
- RGBA (4x8-bit pixels, true color with transparency mask)
- CMYK (4x8-bit pixels, color separation)
- HSV (3x8-bit pixels, Hue, Saturation, Value color space)
In [48]:
img_pil.mode
Out[48]:
'RGB'
In [49]:
Column,Row = img_pil.size
In [50]:
print("Rows = ", Row, "Columns = ", Column)
Rows = 534 Columns = 800
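The difference between PIL's (width, height) order and NumPy's (height, width, channels) order is easy to trip over. A small sketch, assuming an in-memory image created with Image.new rather than the test.png used above (the names img_demo and gray_demo are hypothetical):

```python
import numpy as np
from PIL import Image

# Hypothetical in-memory image: 800 pixels wide, 534 pixels high.
img_demo = Image.new("RGB", (800, 534))

width, height = img_demo.size   # PIL reports (width, height)
arr = np.asarray(img_demo)      # NumPy reports (height, width, channels)

print(img_demo.mode, img_demo.getbands())  # RGB ('R', 'G', 'B')
print(width, height)                       # 800 534
print(arr.shape)                           # (534, 800, 3)

# Converting to mode "L" leaves a single 8-bit channel and a 2-D array.
gray_demo = img_demo.convert("L")
print(gray_demo.mode, np.asarray(gray_demo).shape)  # L (534, 800)
```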
Numpy¶
Using OpenCV¶
In [46]:
png_np_img = np.asarray(img)
plt.imshow(png_np_img) # img is BGR, so red and blue appear swapped in this plot
# if it's grayscale: plt.imshow(png_np_img, cmap='gray')
# Show the NumPy characteristics
print("shape is ", png_np_img.shape)
print("dtype is ", png_np_img.dtype)
print("ndim is ", png_np_img.ndim)
print("itemsize is ", png_np_img.itemsize) # size in bytes of each array element
print("nbytes is ", png_np_img.nbytes) # total size in bytes of the whole array
shape is (534, 800, 3)
dtype is uint8
ndim is 3
itemsize is 1
nbytes is 1281600
Using PIL¶
In [47]:
png_np_img = np.asarray(img_pil)
plt.imshow(png_np_img) # this will plot it in a Jupyter notebook
# or if it's grayscale: plt.imshow(png_np_img, cmap='gray')
# Show the NumPy characteristics
print("shape is ", png_np_img.shape)
print("dtype is ", png_np_img.dtype)
print("ndim is ", png_np_img.ndim)
print("itemsize is ", png_np_img.itemsize) # size in bytes of each array element
print("nbytes is ", png_np_img.nbytes) # total size in bytes of the whole array
shape is (534, 800, 3)
dtype is uint8
ndim is 3
itemsize is 1
nbytes is 1281600
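The nbytes value is the product of the shape entries and itemsize. A short sketch with a synthetic array of the same shape (an assumption in place of the real image) checks this, and shows how the buffer grows when the data is converted to float32, which is a common input type for inference engines:

```python
import numpy as np

# Same shape and dtype as the image above, filled with zeros.
arr_u8 = np.zeros((534, 800, 3), dtype=np.uint8)

expected = 534 * 800 * 3 * arr_u8.itemsize
print(arr_u8.nbytes, expected)           # 1281600 1281600

# float32 uses 4 bytes per element, so the buffer is 4x larger.
arr_f32 = arr_u8.astype(np.float32)
print(arr_f32.itemsize, arr_f32.nbytes)  # 4 5126400
```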
Access Pixel¶
- In GIMP, the coordinate system (x,y) has its origin at the top-left corner: x is the column and y is the row
- At (600,100) in GIMP we have following R,G,B values (21,131,172)
Using OpenCV¶
- Uses BGR format
- Accesses using row first, then column
- The format (X,Y) -> (600,100) or (ROW,COL) -> (100,600)
In [92]:
row = 100
col = 600
pixel_b, pixel_g, pixel_r = img[row, col]
print("BGR values = ", pixel_b, pixel_g, pixel_r)
BGR values = 172 131 21
Using PIL¶
- Uses RGB format
- Accesses using column first, then row
- The format (X,Y) -> (600,100) or (COL,ROW) -> (600,100)
In [93]:
row = 100
col = 600
pixel_r, pixel_g, pixel_b = img_pil.getpixel((col,row))
print("BGR values = ", pixel_b, pixel_g, pixel_r)
BGR values = 172 131 21
Using Numpy¶
- Uses RGB/BGR depending on input
- Accesses using row first, then column for all formats
- The format (X,Y) -> (600,100) or (ROW,COL) -> (100,600)
In [101]:
# OpenCV
row = 100
col = 600
pixel_b, pixel_g, pixel_r = np.array(img)[row,col]
print("BGR values = ", pixel_b, pixel_g, pixel_r)
BGR values = 172 131 21
In [102]:
# PIL
row = 100
col = 600
pixel_r, pixel_g, pixel_b = np.array(img_pil)[row,col]
print("BGR values = ", pixel_b, pixel_g, pixel_r)
BGR values = 172 131 21
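Both points above, the (x,y) to [row, col] swap and the BGR versus RGB channel order, can be checked on a synthetic array. This is a minimal sketch that assumes a zero-filled stand-in for the OpenCV image and plants the pixel values quoted from GIMP:

```python
import numpy as np

# Synthetic BGR image (stand-in for cv2.imread output): 534 rows x 800 columns.
bgr = np.zeros((534, 800, 3), dtype=np.uint8)

x, y = 600, 100             # GIMP-style coordinates: x = column, y = row
bgr[y, x] = (172, 131, 21)  # B, G, R values quoted above

# Reversing the last axis turns BGR into RGB (this is a view, not a copy).
rgb = bgr[:, :, ::-1]

print(bgr[y, x].tolist())   # [172, 131, 21]
print(rgb[y, x].tolist())   # [21, 131, 172]
```

With OpenCV available, cv2.cvtColor(img, cv2.COLOR_BGR2RGB) does the same conversion and returns a new array. The reversed view has a negative stride on the channel axis, so np.ascontiguousarray may be needed before handing it to a library that expects packed memory.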
Channels¶
Using OpenCV¶
In [63]:
b,g,r = cv2.split(img)
In [64]:
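f, axarr = plt.subplots(2,2)
# matplotlib expects RGB, so convert the BGR image before showing it
axarr[0,0].imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
axarr[0,1].imshow(b)
axarr[1,0].imshow(g)
axarr[1,1].imshow(r)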
Out[64]:
<matplotlib.image.AxesImage at 0x15f9b7f70>
Using PIL¶
In [69]:
r,g,b = img_pil.split()
#r = img_pil.getchannel(0)
#g = img_pil.getchannel(1)
#b = img_pil.getchannel(2)
In [70]:
f, axarr = plt.subplots(2,2)
axarr[0,0].imshow(img_pil)
axarr[0,1].imshow(b)
axarr[1,0].imshow(g)
axarr[1,1].imshow(r)
Out[70]:
<matplotlib.image.AxesImage at 0x15fc4b880>
Using Numpy¶
In [75]:
# PIL
arr = np.array(img_pil)
r = arr[:,:,0]
g = arr[:,:,1]
b = arr[:,:,2]
In [76]:
f, axarr = plt.subplots(2,2)
axarr[0,0].imshow(arr)
axarr[0,1].imshow(b)
axarr[1,0].imshow(g)
axarr[1,1].imshow(r)
Out[76]:
<matplotlib.image.AxesImage at 0x16000c8e0>
In [77]:
# OpenCV
arr = np.array(img)
b = arr[:,:,0]
g = arr[:,:,1]
r = arr[:,:,2]
In [78]:
f, axarr = plt.subplots(2,2)
axarr[0,0].imshow(arr[:,:,::-1]) # reverse BGR to RGB for matplotlib
axarr[0,1].imshow(b)
axarr[1,0].imshow(g)
axarr[1,1].imshow(r)
Out[78]:
<matplotlib.image.AxesImage at 0x160146fd0>
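Splitting and re-merging channels with plain NumPy can be sketched as follows. The small random array is an assumption standing in for a real BGR image:

```python
import numpy as np

# Small random BGR image: 4 rows, 5 columns, 3 channels.
rng = np.random.default_rng(0)
bgr = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)

# Split: each channel is a 2-D (rows, cols) plane.
b, g, r = (bgr[:, :, i] for i in range(3))

# Merge in a different order: stacking along the last axis gives an RGB image.
rgb = np.dstack((r, g, b))

print(b.shape)                               # (4, 5)
print(rgb.shape)                             # (4, 5, 3)
print(np.array_equal(rgb, bgr[:, :, ::-1]))  # True
```

When a single 2-D channel is passed to plt.imshow, matplotlib applies its default colormap, so cmap='gray' is needed to see the channel as intensities.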
Arrangement¶
Column major matrix¶
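- Height X Width X Channel (HWC): the channel values of one pixel are adjacent in memory
- Each sample is stored as a column-major matrix (height, width) of float[Channels]: (b00, g00, r00, b10, g10, r10, b01, g01, r01, b11, g11, r11), where bxy is the blue value at column x, row y. The offsets listed next are for a 10-pixel-wide image with one byte per value.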
In [ ]:
Offset (byte) :  0  1  2  3  4  5  6  7  8 ... 27 28 29 30 31 32 33 34 35 ...
Height Pos    :  0  0  0  0  0  0  0  0  0 ...  0  0  0  1  1  1  1  1  1 ...
Width Pos     :  0  0  0  1  1  1  2  2  2 ...  9  9  9  0  0  0  1  1  1 ...
Color Index   :  B  G  R  B  G  R  B  G  R ...  B  G  R  B  G  R  B  G  R ...
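For this interleaved layout the flat offset of element (h, w, c) is (h * W + w) * C + c. A minimal sketch, assuming a 10 x 10 image with 3 one-byte channels (the height of 10 is an assumption, and hwc_offset is a hypothetical helper), checks the formula against a NumPy array whose elements store their own flat offsets:

```python
import numpy as np

H, W, C = 10, 10, 3

# Each element holds its own position in the flat buffer.
hwc = np.arange(H * W * C).reshape(H, W, C)


def hwc_offset(h, w, c, width=W, channels=C):
    """Flat offset of (row h, column w, channel c) in an HWC buffer."""
    return (h * width + w) * channels + c


print(hwc_offset(0, 9, 2))  # 29, last channel of the last pixel in row 0
print(hwc_offset(1, 0, 0))  # 30, first channel of the first pixel in row 1
print(hwc[1, 0, 0])         # 30
```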
Row major matrix¶
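- Channel X Height X Width (CHW): each channel is a separate, contiguous plane
- Each sample is stored as Channels planes, each a row-major (height, width) matrix of float: (b00, b10, b01, b11, g00, g10, g01, g11, r00, r10, r01, r11). The offsets listed next are for a 10 x 10 image with one byte per value.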
In [ ]:
Offset (byte) :  0  1  2  3 ...  9 10 11 12 13 ... 90 91 92 93 ... 99 100 ... 199 200 ... 299
Color Index   :  B  B  B  B ...  B  B  B  B  B ...  B  B  B  B ...  B   G ...   G   R ...   R
Height Pos    :  0  0  0  0 ...  0  1  1  1  1 ...  9  9  9  9 ...  9   0 ...   9   0 ...   9
Width Pos     :  0  1  2  3 ...  9  0  1  2  3 ...  0  1  2  3 ...  9   0 ...   9   0 ...   9
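For the planar layout the flat offset of element (c, h, w) is (c * H + h) * W + w. The sketch below assumes a 10 x 10 image with 3 channels, converts an HWC array to CHW with transpose, and checks the formula. The helper chw_offset is hypothetical.

```python
import numpy as np

H, W, C = 10, 10, 3

# Start from an HWC array, then reorder the axes to CHW.
hwc = np.arange(H * W * C).reshape(H, W, C)
# transpose only changes strides; ascontiguousarray actually moves the data.
chw = np.ascontiguousarray(hwc.transpose(2, 0, 1))


def chw_offset(c, h, w, height=H, width=W):
    """Flat offset of (channel c, row h, column w) in a CHW buffer."""
    return (c * height + h) * width + w


flat = chw.ravel()
print(chw.shape)                                  # (3, 10, 10)
print(chw_offset(1, 0, 0))                        # 100, start of the second plane
print(chw_offset(2, 9, 9))                        # 299, last element
print(flat[chw_offset(1, 0, 0)] == hwc[0, 0, 1])  # True
```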
TensorRT cuDNN Format¶
legacy ("HWC") mode (CPU and GPU without cuDNN): channels are tuples of scalars.
Each sample is stored as a column-major matrix (height, width) of float[numChannels] (r00, g00, b00, r10, g10, b10, r01, g01, b01, r11, g11, b11).
- input : [C x W x H x N] or ARRAY[1..N] OF ARRAY[1..H] OF ARRAY[1..W] OF ARRAY[1..C]
- output : [C' x W' x H' x N] or ARRAY[1..N] OF ARRAY[1..H'] OF ARRAY[1..W'] OF ARRAY[1..C']
- filter : [C' x W" x H" x C] or ARRAY[1..C] OF ARRAY[1..H"] OF ARRAY[1..W"] OF ARRAY[1..C']
cuDNN ("CHW") mode (GPU only): channels are planes (b00, b10, b01, b11, g00, g10, g01, g11, r00, r10, r01, r11).
- input : [W x H x C x N] or ARRAY[1..N] OF ARRAY[1..C] OF ARRAY[1..H] OF ARRAY[1..W]
- output : [W' x H' x C' x N] or ARRAY[1..N] OF ARRAY[1..C'] OF ARRAY[1..H'] OF ARRAY[1..W']
- filter : [W" x H" x C x C'] or ARRAY[1..C'] OF ARRAY[1..C] OF ARRAY[1..H"] OF ARRAY[1..W"]
where:
- using ' for output and " for filter
- N = samples
- W, H = width, height (W', H' for output, W", H" for filter)
- C = input channels
  - 3 for color images, 1 for B&W images
  - for hidden layer: dimension of activation vector for each pixel
- C' = output channels = dimension of activation vector for each pixel
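Putting the pieces together, an image read by OpenCV (HWC, BGR, uint8) usually has to be rearranged before it matches the planar [N, C, H, W] input described above. The following is only a sketch under stated assumptions: the network is assumed to expect RGB float32 values scaled to [0, 1], which depends entirely on how the model was trained, and to_nchw is a hypothetical helper name.

```python
import numpy as np


def to_nchw(bgr_hwc):
    """Convert one HWC uint8 BGR image to a packed NCHW float32 batch of size 1."""
    rgb = bgr_hwc[:, :, ::-1]                     # BGR -> RGB (assumed model input order)
    scaled = rgb.astype(np.float32) / 255.0       # assumed normalization to [0, 1]
    chw = scaled.transpose(2, 0, 1)               # HWC -> CHW
    return np.ascontiguousarray(chw[np.newaxis])  # add N axis and pack the buffer


# Synthetic stand-in for the test image, with the pixel examined earlier.
image = np.zeros((534, 800, 3), dtype=np.uint8)
image[100, 600] = (172, 131, 21)  # B, G, R

batch = to_nchw(image)
print(batch.shape, batch.dtype)     # (1, 3, 534, 800) float32
print(batch.flags["C_CONTIGUOUS"])  # True

# Recover the 8-bit values of that pixel, now in R, G, B plane order.
print(np.rint(batch[0, :, 100, 600] * 255).astype(int).tolist())  # [21, 131, 172]
```

The contiguous copy matters because transpose and the reversed slice only change strides, while a buffer copied to the GPU is read in raw memory order.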