This blog series is my notes from learning TensorFlow and Python. Since I am a rookie, please point out any mistakes.

1. Training data preparation

Training data preparation mainly consists of the following three parts:

* how to parse the KITTI dataset for vehicle detection training
* how to augment the data to increase the diversity of the training set
* how to supply batches of training data to the model during the training stage

First, go to the KITTI official website, http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=2d, and download the data along with the development kit: http://kitti.is.tue.mpg.de/kitti/devkit_object.zip

The KITTI dataset pairs each image with one annotation file in txt format. Each file contains N rows of 15 space-separated columns (a 16th column, the confidence score, is appended only in result files used for evaluation). The columns are:

| Column(s) | Name | Description |
|---|---|---|
| 1 | Category | Target class, one of 8: 'Car', 'Van', 'Truck', 'Pedestrian', 'Person_sitting', 'Cyclist', 'Tram', 'Misc', or 'DontCare' |
| 2 | Truncation | Whether the target extends beyond the image boundary: 0 = non-truncated, 1 = truncated |
| 3 | Occlusion | 0 = fully visible, 1 = partly occluded, 2 = largely occluded, 3 = unknown |
| 4 | Observation angle | Range [-pi..pi] |
| 5-8 | 2D bbox | 0-based pixel coordinates: [left, top, right, bottom] |
| 9-11 | 3D dimensions | 3D object dimensions: height, width, length (in meters) |
| 12-14 | 3D location | 3D object location x, y, z in camera coordinates (in meters) |
| 15 | Rotation | Rotation ry around the Y-axis in camera coordinates, range [-pi..pi] |
| 16 | Confidence score | Only in test results; floating point, used to draw the precision/recall curve |

Remarks: 'DontCare' marks regions that were left unlabeled, for example because they lie outside the range of the laser scanner. During evaluation, detections in these regions are automatically ignored. They can also be excluded during training, to prevent them from repeatedly triggering hard-example mining.
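To make the layout concrete, here is how one such row splits into the fields above (the numeric values are made up for illustration):

```python
# One made-up label row in the 15-column KITTI format (values are illustrative)
line = "Car 0.00 0 -1.57 599.41 156.40 629.75 189.25 2.85 2.63 12.34 0.47 1.49 69.44 -1.56"
data = line.split()

category = data[0]                                 # column 1: class name
truncated = float(data[1])                         # column 2: truncation flag
occluded = int(data[2])                            # column 3: occlusion level
alpha = float(data[3])                             # column 4: observation angle
left, top, right, bottom = map(float, data[4:8])   # columns 5-8: 2D bbox
print(category, (left, top, right, bottom))
```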

Since we only perform vehicle detection here, we only care about the category and bbox information. In addition, the 'Car', 'Van', and 'Truck' classes are merged into a single positive-sample target class, and everything else is treated as background.

First, we need to read the annotation files in batches:
```python
# readKITTI.py: parse the KITTI dataset
import os

# Get the list of files with the specified suffix
def get_filelist(path, ext):
    # List everything in the folder
    filelist_temp = os.listdir(path)
    filelist = []
    # Keep the base name of every file whose suffix matches,
    # e.g. all txt annotation files
    for i in filelist_temp:
        if os.path.splitext(i)[1] == ext:
            filelist.append(os.path.splitext(i)[0])
    return filelist

# Parse one annotation file and return its bounding box info (Nx4)
def get_bbox(filename):
    bbox = []
    # Make sure the file exists
    if os.path.exists(filename):
        with open(filename) as fi:
            label_data = fi.readlines()
        # Read the annotation lines one by one
        for l in label_data:
            data = l.split()
            # Record the bounding box if the target is a vehicle
            if data[0] in ['Van', 'Car', 'Truck']:
                bbox.append((float(data[4]), float(data[5]),
                             float(data[6]), float(data[7])))
    return bbox

# Read the bounding box info of the annotation files in batches
def get_bboxlist(rootpath, imagelist):
    bboxlist = []
    for i in imagelist:
        bboxlist.append(get_bbox(rootpath + i + '.txt'))
    return bboxlist
```
By calling the functions above, we can read the KITTI dataset into the form we need:

```python
import readKITTI

IMAGE_DIR = './image/training/image_2/'
LABEL_DIR = 
```
2. Data expansion

Data augmentation is commonly used when training deep learning models. Among the techniques, random scaling, cropping, and random flipping are probably the most widely used and effective. (As for contrast adjustment, color adjustment, PCA jittering and the like, their benefit is harder to judge.)

For object detection there is one important extra requirement: while transforming the image, we must also keep the target bounding boxes valid and correct.

Scaling

To train the later model in batches, we usually fix all input images to the same size, so resizing the images (and unifying their color format) is necessary.
```python
# imAugment.py: functions for data augmentation
import cv2

# Scale the image to the specified size; also adjust the bounding
# boxes and the color format
def imresize(in_img, in_bbox, out_w, out_h, is_color=True):
    # A string argument is treated as a file path
    if isinstance(in_img, str):
        in_img = cv2.imread(in_img)
    # Get the image width and height
    height, width = in_img.shape[:2]
    out_img = cv2.resize(in_img, (out_w, out_h))
    # Adjust the image color
    if is_color == True and in_img.ndim == 2:
        out_img = cv2.cvtColor(out_img, cv2.COLOR_GRAY2BGR)
    elif is_color == False and in_img.ndim == 3:
        out_img = cv2.cvtColor(out_img, cv2.COLOR_BGR2GRAY)
    # Adjust the bounding boxes
    s_h = out_h / height
    s_w = out_w / width
    out_bbox = []
    for i in in_bbox:
        out_bbox.append((i[0]*s_w, i[1]*s_h, i[2]*s_w, i[3]*s_h))
    return out_img, out_bbox
```
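The bounding-box part of imresize is just a per-coordinate scale. A minimal standalone sketch of that step (`scale_bbox` is an illustrative helper, not part of the original code):

```python
def scale_bbox(bboxes, in_w, in_h, out_w, out_h):
    # Scale (left, top, right, bottom) boxes with the same factors
    # used to resize the image
    s_w, s_h = out_w / in_w, out_h / in_h
    return [(l * s_w, t * s_h, r * s_w, b * s_h) for (l, t, r, b) in bboxes]

# Halving a 400x200 image halves every coordinate
boxes = scale_bbox([(100.0, 50.0, 300.0, 150.0)], 400, 200, 200, 100)
```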
Flip horizontally

For vehicle detection there is no need to flip vertically, so here we only flip horizontally, adjusting the bounding boxes accordingly.

```python
# imAugment.py: functions for data augmentation
# Flip the image horizontally; also adjust the bounding boxes
def immirror(in_img, in_bbox):
    # A string argument is treated as a file path
    if isinstance(in_img, str):
        in_img = cv2.imread(in_img)
    # Flip the image horizontally
    out_img = cv2.flip(in_img, 1)
    # Get the image width
    width = out_img.shape[1]
    # Relocate every target on the flipped image
    out_bbox = []
    for i in in_bbox:
        out_bbox.append((width - i[2], i[1], width - i[0], i[3]))
    return out_img, out_bbox
```
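Note the swap in the flipped coordinates: the new left edge comes from the old right edge (new_left = width - old_right) and vice versa; otherwise left would exceed right after the flip. A standalone check (`mirror_bbox` is an illustrative helper):

```python
def mirror_bbox(bboxes, width):
    # After a horizontal flip, left and right edges swap roles:
    # new_left = width - old_right, new_right = width - old_left
    return [(width - r, t, width - l, b) for (l, t, r, b) in bboxes]

boxes = mirror_bbox([(10.0, 5.0, 30.0, 20.0)], 100)
```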
Random crop

In fact, random cropping comes with several constraints and precautions, mainly the following:

* A minimum crop size must be specified; a crop block that is too small is useless for training.
* An image that is already too small should not be cropped further.
* Since we cannot reliably tell whether a cut-off target is still a valid, recognizable target, the crop region should contain all of the bounding boxes.

```python
# imAugment.py: functions for data augmentation
import random

# Randomly crop the image; also adjust the bounding boxes.
# min_hw is the minimum width/height of the crop block
def imcrop(in_img, in_bbox, min_hw):
    # A string argument is treated as a file path
    if isinstance(in_img, str):
        in_img = cv2.imread(in_img)
    # Get the image width and height
    height, width = in_img.shape[:2]
    # If the image is too small, skip cropping
    if height <= min_hw and width <= min_hw:
        return in_img, in_bbox
    # To keep valid targets from being truncated by the crop, the crop
    # region must cover all of them: find the smallest rectangle
    # containing every target
    min_x1, min_y1, min_x2, min_y2 = width-1, height-1, 0, 0
    for i in in_bbox:
        min_x1 = min(min_x1, int(i[0]))
        min_y1 = min(min_y1, int(i[1]))
        min_x2 = max(min_x2, int(i[2]))
        min_y2 = max(min_y2, int(i[3]))
    # Randomly generate a rectangle around that minimum bounding
    # rectangle, keeping it inside the image
    rand_x1, rand_y1, rand_x2, rand_y2 = 0, 0, width, height
    if min_x1 <= 1:
        rand_x1 = 0
    else:
        rand_x1 = random.randint(0, min(min_x1, max(width - min_hw, 1)))
    if min_y1 <= 1:
        rand_y1 = 0
    else:
        rand_y1 = random.randint(0, min(min_y1, max(height - min_hw, 1)))
    if min_x2 >= width or rand_x1 + min_hw >= width:
        rand_x2 = width
    else:
        rand_x2 = random.randint(max(rand_x1 + min_hw, min_x2), width)
    if min_y2 >= height or rand_y1 + min_hw >= height:
        rand_y2 = height
    else:
        rand_y2 = random.randint(max(rand_y1 + min_hw, min_y2), height)
    # Crop the image
    out_img = in_img[rand_y1:rand_y2-1, rand_x1:rand_x2-1]
    # Adjust the bounding boxes
    out_bbox = []
    for i in in_bbox:
        out_bbox.append((i[0]-rand_x1, i[1]-rand_y1,
                         i[2]-rand_x1, i[3]-rand_y1))
    return out_img, out_bbox
```
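The first step of imcrop, finding the smallest rectangle covering every target, can be checked on its own (`enclosing_rect` is an illustrative helper):

```python
def enclosing_rect(bboxes):
    # Smallest rectangle containing every (left, top, right, bottom) box
    lefts, tops, rights, bottoms = zip(*bboxes)
    return (min(lefts), min(tops), max(rights), max(bottoms))

# Two overlapping boxes: the enclosure takes the extreme edge of each side
rect = enclosing_rect([(10, 20, 50, 60), (40, 5, 90, 55)])
```

Any crop window that contains this rectangle is guaranteed not to truncate a labeled target.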
Here are the results (top: the original picture; below: horizontal flip, scaling, and random crop):

3. Batch generation

During training we need to generate batches one by one. The parameters usually required include: batch size, training image size, color, whether to shuffle the data, whether to crop randomly, and so on. Based on this, the code that supplies batches follows.
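One detail in the generator is worth calling out: images and labels live in two separate lists, so they are shuffled with the same random seed to keep them paired. The trick can be verified in isolation:

```python
import random

images = ["img_%d" % i for i in range(10)]
labels = ["lbl_%d" % i for i in range(10)]

# Re-seeding with the same value before each shuffle yields the same
# permutation, so image/label pairs stay aligned
r_seed = random.random()
random.seed(r_seed)
random.shuffle(images)
random.seed(r_seed)
random.shuffle(labels)
```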
```python
# genBatch.py: supplies training data during the training stage
# coding=utf-8
import random
import cv2
import readKITTI
import imAugment

class genBatch:
    image_dir, label_dir = [], []
    image_list, bbox_list = [], []
    initOK = False
    readPos = 0

    # Initialize and read the data
    def initdata(self, imagedir, labeldir):
        self.image_dir, self.label_dir = imagedir, labeldir
        # List the images, then read the matching label files
        # (the '.png' image extension is assumed in this reconstruction)
        self.image_list = readKITTI.get_filelist(imagedir, '.png')
        self.bbox_list = readKITTI.get_bboxlist(labeldir, self.image_list)
        # The data must be non-empty, with as many labels as images
        if len(self.image_list) > 0 and len(self.image_list) == len(self.bbox_list):
            self.initOK = True
        else:
            print("The amount of images is %d, while the amount of "
                  "corresponding label is %d" % (len(self.image_list), len(self.bbox_list)))
            self.initOK = False
        return self.initOK

    # Generate a new batch
    def genbatch(self, batchsize, newh, neww, iscolor=True, isshuffle=False,
                 mirrorratio=0.0, cropratio=0.0):
        if self.initOK == False:
            print("The initdata() function must be successfully called first.")
            return []
        batch_data, batch_bbox = [], []
        for i in range(batchsize):
            # Wrap around once the data has been fully traversed
            if self.readPos >= len(self.image_list) - 1:
                self.readPos = 0
                if isshuffle == True:
                    # Use the same random seed so images and labels
                    # are shuffled into the same order
                    r_seed = random.random()
                    random.seed(r_seed)
                    random.shuffle(self.image_list)
                    random.seed(r_seed)
                    random.shuffle(self.bbox_list)
            img =
```