This blog series documents my study of TensorFlow and Python. As I am still a beginner, there may be mistakes; please point them out if you find any.

The whole project can be downloaded from Baidu Cloud: https://pan.baidu.com/s/1f2JPJpE7m5M2kSifMP0-Lw (password: 9p8v)

I. Training data preparation

Preparing the training data mainly involves the following three parts:

* How to parse the KITTI dataset for vehicle-detection training
* How to augment the data to increase the diversity of the training set
* How to supply the model with batches of training data during the training stage

1. Reading the KITTI dataset

First, download the vehicle detection dataset from the KITTI website: http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=2d

Specifically, only the following 3 archives are needed (an email address is required to obtain the download links):

* Download left color images of object data set (12 GB): http://www.cvlibs.net/download.php?file=data_object_image_2.zip
* Download training labels of object data set (5 MB): http://www.cvlibs.net/download.php?file=data_object_label_2.zip
* Download object development kit (1 MB): http://kitti.is.tue.mpg.de/kitti/devkit_object.zip

The KITTI dataset pairs each image with one annotation file. The annotation file is in TXT format and contains N rows of 15 space-separated columns. The 15 columns are:

| Column(s) | Name | Description |
|-----------|------|-------------|
| 1 | Category | Target category, 8 classes: 'Car', 'Van', 'Truck', 'Pedestrian', 'Person_sitting', 'Cyclist', 'Tram', 'Misc', or 'DontCare' |
| 2 | Truncation | Whether the target extends beyond the image boundary: 0 = non-truncated, 1 = truncated |
| 3 | Occlusion | 0 = fully visible, 1 = partly occluded, 2 = largely occluded, 3 = unknown |
| 4 | Observation angle | Range [-pi..pi] |
| 5-8 | 2D bbox | Pixel coordinates starting from 0: [left, top, right, bottom] |
| 9-11 | 3D dimensions | 3D object dimensions: height, width, length (in meters) |
| 12-14 | 3D location | 3D object location x, y, z in camera coordinates (in meters) |
| 15 | Rotation angle | Rotation ry around the Y-axis in camera coordinates, range [-pi..pi] |
| 16 | Confidence score | Test results only; floating point, used for drawing p/r curves |
Note: 'DontCare' marks regions that were left unlabeled, for example because they lie outside the range of the laser scanner. During evaluation, detections in these regions are automatically ignored. They can also be ignored during training, to prevent them from repeatedly turning up as false hard negatives during hard-negative mining.
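As an illustration of the format above, one annotation line can be parsed with plain string splitting. The sample line below is made up for demonstration; only the field layout follows the table above:

```python
# A made-up KITTI-style label line: category, truncation, occlusion, alpha,
# 2D bbox (left, top, right, bottom), 3D dimensions, 3D location, rotation_y
line = "Car 0.00 0 -1.58 587.01 173.33 614.12 200.12 1.65 1.67 3.64 -0.65 1.71 46.70 -1.59"

fields = line.split()                        # 15 space-separated columns
category = fields[0]                         # column 1: target category
bbox = tuple(float(v) for v in fields[4:8])  # columns 5-8: [left, top, right, bottom]

print(category)  # Car
print(bbox)      # (587.01, 173.33, 614.12, 200.12)
```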

Since we only perform vehicle detection here, we only care about the category and the bbox information. In addition, the 3 classes 'Car', 'Van', and 'Truck' are merged into a single positive-sample class, and everything else is treated as background.

First, we need a way to read the annotation files in batch:
```python
# readKITTI.py: functions for parsing the KITTI dataset
import os

# Get the list of files (without extension) that have the specified suffix
def get_filelist(path, ext):
    # List everything in the folder
    filelist_temp = os.listdir(path)
    filelist = []
    # Collect all annotation files by comparing suffixes
    for i in filelist_temp:
        if os.path.splitext(i)[1] == ext:
            filelist.append(os.path.splitext(i)[0])
    return filelist

# Parse an annotation file and return its bounding box information (N x 4)
def get_bbox(filename):
    bbox = []
    # Check whether the file exists
    if os.path.exists(filename):
        with open(filename) as fi:
            label_data = fi.readlines()
        # Read the annotation of each line in turn
        for l in label_data:
            data = l.split()
            # Record the bounding box if the line describes a vehicle target
            if data[0] in ['Van', 'Car', 'Truck']:
                bbox.append((float(data[4]), float(data[5]),
                             float(data[6]), float(data[7])))
    return bbox

# Fetch the bounding box information of a batch of annotation files
def get_bboxlist(rootpath, imagelist):
    bboxlist = []
    for i in imagelist:
        bboxlist.append(get_bbox(rootpath + i + '.txt'))
    return bboxlist
```
By calling the functions above, we can read the KITTI dataset into the form we need:
```python
import readKITTI

IMAGE_DIR = './image/training/image_2/'
LABEL_DIR = './label/training/label_2/'
imagelist = readKITTI.get_filelist(IMAGE_DIR, '.png')
bboxlist = readKITTI.get_bboxlist(LABEL_DIR, imagelist)
```
2. Data augmentation

Data augmentation is commonly used when training deep learning models. Among the techniques, random scaling, cropping, and random flipping are probably the most widely used and most effective. (As for contrast adjustment, color adjustment, and PCA-based methods, their benefit is harder to judge.)

For object detection, one point is especially important: while transforming the image, we must also keep each target's bounding box valid and correct.
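One simple way to enforce this, sketched below, is to clamp every box to the image bounds after each transform and drop boxes that end up empty. This helper (`clip_bbox`) is my own addition for illustration and is not part of the original project:

```python
# Hypothetical helper (not in the original project): clamp a list of
# (left, top, right, bottom) boxes to the image bounds, dropping boxes
# that become empty after clamping.
def clip_bbox(bboxes, width, height):
    clipped = []
    for left, top, right, bottom in bboxes:
        left, right = max(0, left), min(width, right)
        top, bottom = max(0, top), min(height, bottom)
        # Keep only boxes that still have positive area
        if right > left and bottom > top:
            clipped.append((left, top, right, bottom))
    return clipped

print(clip_bbox([(-10, 5, 50, 40), (90, 90, 120, 130)], 100, 100))
# [(0, 5, 50, 40), (90, 90, 100, 100)]
```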

Scaling

For subsequent batch training, we usually fix all input images to the same size, so resizing the images is necessary, along with unifying their color format.
```python
# imAugment.py: functions for data augmentation
import cv2

# Scale an image to the specified size, adjusting its bounding boxes
# and color format at the same time
def imresize(in_img, in_bbox, out_w, out_h, is_color=True):
    # If a file path was passed in, load the image first
    if isinstance(in_img, str):
        in_img = cv2.imread(in_img)
    # Get the image width and height
    height, width = in_img.shape[:2]
    out_img = cv2.resize(in_img, (out_w, out_h))
    # Adjust the image color format
    if is_color == True and in_img.ndim == 2:
        out_img = cv2.cvtColor(out_img, cv2.COLOR_GRAY2BGR)
    elif is_color == False and in_img.ndim == 3:
        out_img = cv2.cvtColor(out_img, cv2.COLOR_BGR2GRAY)
    # Rescale the bounding boxes
    s_h = out_h / height
    s_w = out_w / width
    out_bbox = []
    for i in in_bbox:
        out_bbox.append((i[0]*s_w, i[1]*s_h, i[2]*s_w, i[3]*s_h))
    return out_img, out_bbox
```
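The bounding-box arithmetic in `imresize` can be checked in isolation: scaling a 200x100 image down to 100x50 should halve every coordinate. The standalone function below (`rescale_bbox` is my own name for this sketch) repeats just the coordinate math:

```python
# Rescale (left, top, right, bottom) boxes when an image of size
# (width, height) is resized to (out_w, out_h), mirroring the math above.
def rescale_bbox(bboxes, width, height, out_w, out_h):
    s_w, s_h = out_w / width, out_h / height
    return [(l * s_w, t * s_h, r * s_w, b * s_h) for l, t, r, b in bboxes]

boxes = rescale_bbox([(40.0, 20.0, 120.0, 80.0)], 200, 100, 100, 50)
print(boxes)  # [(20.0, 10.0, 60.0, 40.0)]
```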
Flip horizontally

For vehicle detection there is no need to flip vertically, so here we only flip horizontally and mirror the bounding boxes accordingly.
```python
# imAugment.py: functions for data augmentation

# Flip an image horizontally, adjusting its bounding boxes at the same time
def immirror(in_img, in_bbox):
    # If a file path was passed in, load the image first
    if isinstance(in_img, str):
        in_img = cv2.imread(in_img)
    # Flip the image horizontally
    out_img = cv2.flip(in_img, 1)
    # Get the image width
    width = out_img.shape[1]
    # Relocate each target on the flipped image; the old right edge becomes
    # the new left edge, so the two x-coordinates are swapped to keep
    # left < right
    out_bbox = []
    for i in in_bbox:
        out_bbox.append((width - i[2], i[1], width - i[0], i[3]))
    return out_img, out_bbox
```
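The coordinate mapping used when mirroring can also be verified on its own. Note that after a horizontal flip the original left edge becomes the new right edge, so the two x-coordinates must be swapped to keep left < right (`mirror_bbox` is my own name for this sketch):

```python
# Mirror (left, top, right, bottom) boxes horizontally in an image of the
# given width; left/right are swapped so left < right still holds.
def mirror_bbox(bboxes, width):
    return [(width - r, t, width - l, b) for l, t, r, b in bboxes]

print(mirror_bbox([(10, 5, 30, 40)], 100))  # [(70, 5, 90, 40)]
```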
Random crop

In fact, random cropping comes with several constraints and caveats, mainly the following:

* A minimum crop size must be specified; otherwise crops that are too small are useless for training.
* Images that are already too small should not be cropped further.
* Since we cannot reliably judge whether a partially cut-off target is still a valid, recognizable target, the crop region should contain all of the bounding boxes.

```python
# imAugment.py: functions for data augmentation
import random

# Randomly crop an image, adjusting its bounding boxes at the same time;
# min_hw is the minimum width/height of the cropped block
def imcrop(in_img, in_bbox, min_hw):
    # If a file path was passed in, load the image first
    if isinstance(in_img, str):
        in_img = cv2.imread(in_img)
    # Get the image width and height
    height, width = in_img.shape[:2]
    # If the image is too small, give up cropping
    if height <= min_hw and width <= min_hw:
        return in_img, in_bbox
    # To keep valid targets from being truncated by the crop, the crop region
    # must contain all targets: find the smallest rectangle enclosing them
    min_x1, min_y1, min_x2, min_y2 = width - 1, height - 1, 0, 0
    for i in in_bbox:
        min_x1 = min(min_x1, int(i[0]))
        min_y1 = min(min_y1, int(i[1]))
        min_x2 = max(min_x2, int(i[2]))
        min_y2 = max(min_y2, int(i[3]))
    # Randomly generate a rectangle around that enclosing box, making sure
    # the crop window stays inside the image
    rand_x1, rand_y1, rand_x2, rand_y2 = 0, 0, width, height
    if min_x1 <= 1:
        rand_x1 = 0
    else:
        rand_x1 = random.randint(0, min(min_x1, max(width - min_hw, 1)))
    if min_y1 <= 1:
        rand_y1 = 0
    else:
        rand_y1 = random.randint(0, min(min_y1, max(height - min_hw, 1)))
    if min_x2 >= width or rand_x1 + min_hw >= width:
        rand_x2 = width
    else:
        rand_x2 = random.randint(max(rand_x1 + min_hw, min_x2), width)
    if min_y2 >= height or rand_y1 + min_hw >= height:
        rand_y2 = height
    else:
        rand_y2 = random.randint(max(rand_y1 + min_hw, min_y2), height)
    # Crop the image (slicing already excludes the end index, so no -1 needed)
    out_img = in_img[rand_y1:rand_y2, rand_x1:rand_x2]
    # Shift the bounding boxes into the cropped coordinate frame
    out_bbox = []
    for i in in_bbox:
        out_bbox.append((i[0] - rand_x1, i[1] - rand_y1,
                         i[2] - rand_x1, i[3] - rand_y1))
    return out_img, out_bbox
```
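The first step of `imcrop`, finding the smallest rectangle that contains every target, can be checked on its own (`enclosing_rect` is my own name for this standalone sketch of that step):

```python
# Compute the smallest (x1, y1, x2, y2) rectangle enclosing all boxes,
# as imcrop does before choosing a random crop window.
def enclosing_rect(bboxes, width, height):
    x1, y1, x2, y2 = width - 1, height - 1, 0, 0
    for l, t, r, b in bboxes:
        x1, y1 = min(x1, int(l)), min(y1, int(t))
        x2, y2 = max(x2, int(r)), max(y2, int(b))
    return x1, y1, x2, y2

print(enclosing_rect([(10, 20, 50, 60), (30, 5, 80, 40)], 200, 100))
# (10, 5, 80, 60)
```

Any crop window that contains this rectangle (and the whole image, as a fallback) is guaranteed not to truncate a target.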
Here are the results (top: the original image; below: horizontal flip, scaling, and random crop):



3. Batch generation


During training we need to generate batches one by one. The parameters typically required include: the batch size, the size of the training images, color or grayscale, whether to shuffle the data, whether to crop randomly, and so on. Based on this, here is the code that supplies the batches:
```python
# genBatch.py: supplies training data during the training stage
# coding=utf-8
import random
import cv2
import readKITTI
import imAugment

class genBatch:
    image_dir, label_dir = [], []
    image_list, bbox_list = [], []
    initOK = False
    readPos = 0

    # Initialize by reading the data
    def initdata(self, imagedir, labeldir):
        self.image_dir, self.label_dir = imagedir, labeldir
        self.image_list = readKITTI.get_filelist(imagedir, '.png')
        self.bbox_list = readKITTI.get_bboxlist(labeldir, self.image_list)
        # The data must be non-empty, and the numbers of images and labels must match
        if len(self.image_list) > 0 and len(self.image_list) == len(self.bbox_list):
            self.initOK = True
        else:
            print("The amount of images is %d, while the amount of "
                  "corresponding labels is %d"
                  % (len(self.image_list), len(self.bbox_list)))
            self.initOK = False
        return self.initOK

    # Generate a new batch
    def genbatch(self, batchsize, newh, neww, iscolor=True, isshuffle=False,
                 mirrorratio=0.0, cropratio=0.0):
        if self.initOK == False:
            print("The initdata() function must be successfully called first.")
            return []
        batch_data, batch_bbox = [], []
        for i in range(batchsize):
            # Start over once the data has been fully traversed
            if self.readPos >= len(self.image_list) - 1:
                self.readPos = 0
                if isshuffle == True:
                    # Use the same random seed so that images and labels
                    # are shuffled into the same order
                    r_seed = random.random()
                    random.seed(r_seed)
                    random.shuffle(self.image_list)
                    random.seed(r_seed)
                    random.shuffle(self.bbox_list)
            img = cv2.imread(self.image_dir + self.image_list[self.readPos] + '.png')
            bbox = self.bbox_list[self.readPos]
            self.readPos += 1
            # Crop with the specified probability; note that cropping must
            # happen before resizing
            if cropratio > 0 and random.random() < cropratio:
                img, bbox = imAugment.imcrop(img, bbox, min(neww, newh))
            # Adjust the image size and color format
            img, bbox = imAugment.imresize(img, bbox, neww, newh, iscolor)
            # Mirror with the specified probability
            if mirrorratio > 0 and random.random() < mirrorratio:
                img, bbox = imAugment.immirror(img, bbox)
            batch_data.append(img)
            batch_bbox.append(bbox)
        return batch_data, batch_bbox
```
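The shuffle step above relies on reseeding the random generator with the same seed before each of the two shuffles, which guarantees the image list and the label list are permuted identically, keeping each image paired with its own annotation. A minimal check of that trick (the file names below are made up):

```python
import random

# Shuffling two lists with the same seed produces the same permutation,
# so image/label pairs stay aligned (the trick used in genbatch).
images = ['a.png', 'b.png', 'c.png', 'd.png']
labels = ['a.txt', 'b.txt', 'c.txt', 'd.txt']

seed = 0.42
random.seed(seed)
random.shuffle(images)
random.seed(seed)
random.shuffle(labels)

# Check that each image is still paired with its own label
pairs = [(i.split('.')[0], l.split('.')[0]) for i, l in zip(images, labels)]
print(all(i == l for i, l in pairs))  # True
```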