This blog series documents my study of TensorFlow and Python. Since I am a beginner, please point out any mistakes.

The whole project can be downloaded from Baidu Cloud: link:
<> password: 9p8v

I. Training data preparation

Preparing the training data mainly involves the following three parts:

* How to parse the KITTI vehicle detection dataset
* How to augment the data to increase the diversity of the training data
* How to supply batches of training data to the model during the training stage
1. Reading the KITTI dataset

First, go to the KITTI official website and download the vehicle detection dataset.

Specifically, you only need to download the following 3 archives (an email address is required to receive the download links):

Download left color images of object data set (12 GB)

Download training labels of object data set (5 MB)

Download object development kit (1 MB)

The KITTI dataset pairs each image with one annotation file. The annotation file is in TXT format and contains N rows of 15 space-separated columns (a 16th column appears only in result files). The columns are:

| Column | Name | Description |
| --- | --- | --- |
| 1 | Category | Object category, one of 8 classes: 'Car', 'Van', 'Truck', 'Pedestrian', 'Person_sitting', 'Cyclist', 'Tram', 'Misc', or 'DontCare' |
| 2 | Truncation | Whether the object extends beyond the image boundary; 0: non-truncated, 1: truncated |
| 3 | Occlusion | 0 = fully visible, 1 = partly occluded, 2 = largely occluded, 3 = unknown |
| 4 | Observation angle | Observation angle of the object, range [-pi..pi] |
| 5-8 | 2D bbox | Pixel coordinates (0-based): [left, top, right, bottom] |
| 9-11 | 3D dimensions | 3D object dimensions: height, width, length (in meters) |
| 12-14 | 3D location | 3D object location x, y, z in camera coordinates (in meters) |
| 15 | Rotation ry | Rotation ry around the Y-axis in camera coordinates, range [-pi..pi] |
| 16 | Confidence score | Result files only; floating point, used for drawing the precision/recall curve |
Note: 'DontCare' marks regions that were ignored during annotation, for example because they are out of range of the laser scanner. During evaluation, results in these regions are automatically ignored. These regions can also be ignored during training, to prevent them from being picked up over and over by hard negative mining.

Since we only perform vehicle detection here, we only care about the category and bbox information. In addition, the 3 classes 'Car', 'Van', and 'Truck' are merged into the positive-sample class, and everything else is treated as background.
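As a quick illustration, a single label line can be split into its fields like this (the values below are invented, but follow the column layout described above):

```python
# A hypothetical KITTI label line (values invented for illustration)
line = "Car 0.00 0 1.85 387.63 181.54 423.81 203.12 1.67 1.87 3.69 -16.53 2.39 58.49 1.57"
fields = line.split()

category = fields[0]                    # column 1: category
truncated = float(fields[1])            # column 2: truncation flag
occluded = int(fields[2])               # column 3: occlusion level
alpha = float(fields[3])                # column 4: observation angle
bbox = [float(v) for v in fields[4:8]]  # columns 5-8: [left, top, right, bottom]

print(category, bbox)
```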

First, we need to read the annotation files in batch:
```python
# For parsing the KITTI dataset
import os

# Get the list of files with the specified suffix
def get_filelist(path, ext):
    # Get all files in the folder
    filelist_temp = os.listdir(path)
    filelist = []
    # Check the suffix to collect all annotation files
    for i in filelist_temp:
        if os.path.splitext(i)[1] == ext:
            filelist.append(os.path.splitext(i)[0])
    return filelist

# Parse an annotation file and return its bounding box information, shape Nx4
def get_bbox(filename):
    bbox = []
    # Check whether the file exists
    if os.path.exists(filename):
        with open(filename) as fi:
            label_data = fi.readlines()
            # Read the annotation of each line in turn
            for l in label_data:
                data = l.split()
                # Record the bounding box if a vehicle target is present
                if data[0] in ['Van', 'Car', 'Truck']:
                    bbox.append((float(data[4]), float(data[5]),
                                 float(data[6]), float(data[7])))
    return bbox

# Get the bounding box information of the annotation files in batch
def get_bboxlist(rootpath, imagelist):
    bboxlist = []
    for i in imagelist:
        bboxlist.append(get_bbox(rootpath + i + '.txt'))
    return bboxlist
```
By calling the functions above, we can read the KITTI dataset into the form we need:
```python
import readKITTI

IMAGE_DIR = './image/training/image_2/'
LABEL_DIR = './label/training/label_2/'
imagelist = readKITTI.get_filelist(IMAGE_DIR, '.png')
bboxlist = readKITTI.get_bboxlist(LABEL_DIR, imagelist)
```
2. Data augmentation

Data augmentation is commonly used when training deep learning models. Among the techniques, random scaling, cropping, and random flipping are probably the most widely used and effective. (As for contrast adjustment, color adjustment, PCA, and the like, their benefit is debatable.)

For object detection, one thing is crucial: while transforming the image, we must also keep the target bounding boxes valid and correct.


Scaling

To batch images for subsequent model training, we usually fix all input images to the same size, so resizing is necessary; the color format is unified at the same time.
```python
# Provide some functions for data augmentation
import cv2

# Scale the image to the specified size, handling the bounding boxes
# and color format at the same time
def imresize(in_img, in_bbox, out_w, out_h, is_color=True):
    # If a file path is given, load the image first
    if isinstance(in_img, str):
        in_img = cv2.imread(in_img)
    # Get the image width and height
    height, width = in_img.shape[:2]
    out_img = cv2.resize(in_img, (out_w, out_h))
    # Adjust the image color format
    if is_color == True and in_img.ndim == 2:
        out_img = cv2.cvtColor(out_img, cv2.COLOR_GRAY2BGR)
    elif is_color == False and in_img.ndim == 3:
        out_img = cv2.cvtColor(out_img, cv2.COLOR_BGR2GRAY)
    # Adjust the bounding boxes; use float division
    # (safe under both Python 2 and 3)
    s_h = float(out_h) / height
    s_w = float(out_w) / width
    out_bbox = []
    for i in in_bbox:
        out_bbox.append((i[0]*s_w, i[1]*s_h, i[2]*s_w, i[3]*s_h))
    return out_img, out_bbox
```
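To see what the bounding-box adjustment in imresize does numerically, here is a minimal standalone sketch (scale_bboxes is my own name, not part of the project) that applies the same width and height scale factors:

```python
def scale_bboxes(bboxes, in_w, in_h, out_w, out_h):
    # The same scale factors imresize derives from the old and new image size
    s_w = out_w / float(in_w)
    s_h = out_h / float(in_h)
    return [(l * s_w, t * s_h, r * s_w, b * s_h) for (l, t, r, b) in bboxes]

# Halving a 1000x500 image halves every coordinate
print(scale_bboxes([(100.0, 50.0, 300.0, 150.0)], 1000, 500, 500, 250))
```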
Flip horizontally

For vehicle detection there is no need for vertical flips, so we only flip horizontally, and flip the bounding boxes accordingly.
```python
# Provide some functions for data augmentation
# Flip the image horizontally, handling the bounding boxes at the same time
def immirror(in_img, in_bbox):
    # If a file path is given, load the image first
    if isinstance(in_img, str):
        in_img = cv2.imread(in_img)
    # Flip the image horizontally
    out_img = cv2.flip(in_img, 1)
    # Get the image width
    width = out_img.shape[1]
    # Relocate the targets on the flipped image; note the old right edge
    # becomes the new left edge, so left and right swap
    out_bbox = []
    for i in in_bbox:
        out_bbox.append((width - i[2], i[1], width - i[0], i[3]))
    return out_img, out_bbox
```
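The coordinate math here is easy to get backwards: mirroring maps x to width - x, so the old right edge becomes the new left edge. A standalone sketch (mirror_bboxes is my own name for it):

```python
def mirror_bboxes(bboxes, width):
    # After a horizontal flip, the old right edge becomes the new left
    # edge and vice versa, keeping left <= right in the output
    return [(width - r, t, width - l, b) for (l, t, r, b) in bboxes]

# A box near the left edge ends up near the right edge
print(mirror_bboxes([(10, 5, 30, 25)], 100))
```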
Random crop

In fact, random cropping comes with quite a few constraints and caveats, mainly the following:

* A minimum crop size must be specified; otherwise, a crop that is too small is useless for training.
* An image that is already too small should not be cropped further.
* Since we cannot reliably tell whether a partially cut-off target is still a valid, recognizable one, the crop region should include all of the bounding boxes.

```python
# Provide some functions for data augmentation
import random

# Randomly crop the image, handling the bounding boxes at the same time;
# min_hw is the minimum width and height of the cropped block
def imcrop(in_img, in_bbox, min_hw):
    # If a file path is given, load the image first
    if isinstance(in_img, str):
        in_img = cv2.imread(in_img)
    # Get the image width and height
    height, width = in_img.shape[:2]
    # If the image is too small, give up cropping
    if height <= min_hw and width <= min_hw:
        return in_img, in_bbox
    # To prevent valid targets from being truncated by the crop, the crop
    # region should include all targets.
    # Find the smallest rectangle that contains all targets
    min_x1, min_y1, min_x2, min_y2 = width-1, height-1, 0, 0
    for i in in_bbox:
        min_x1 = min(min_x1, int(i[0]))
        min_y1 = min(min_y1, int(i[1]))
        min_x2 = max(min_x2, int(i[2]))
        min_y2 = max(min_y2, int(i[3]))
    # Based on that minimum rectangle, randomly generate a crop rectangle
    # while keeping it within the image
    rand_x1, rand_y1, rand_x2, rand_y2 = 0, 0, width, height
    if min_x1 <= 1:
        rand_x1 = 0
    else:
        rand_x1 = random.randint(0, min(min_x1, max(width - min_hw, 1)))
    if min_y1 <= 1:
        rand_y1 = 0
    else:
        rand_y1 = random.randint(0, min(min_y1, max(height - min_hw, 1)))
    if min_x2 >= width or rand_x1 + min_hw >= width:
        rand_x2 = width
    else:
        rand_x2 = random.randint(max(rand_x1 + min_hw, min_x2), width)
    if min_y2 >= height or rand_y1 + min_hw >= height:
        rand_y2 = height
    else:
        rand_y2 = random.randint(max(rand_y1 + min_hw, min_y2), height)
    # Crop the image (the slice end is already exclusive)
    out_img = in_img[rand_y1:rand_y2, rand_x1:rand_x2]
    # Adjust the bounding boxes
    out_bbox = []
    for i in in_bbox:
        out_bbox.append((i[0]-rand_x1, i[1]-rand_y1,
                         i[2]-rand_x1, i[3]-rand_y1))
    return out_img, out_bbox
```
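The "smallest rectangle containing all targets" computed inside imcrop can be expressed compactly on its own (enclosing_rect is a hypothetical helper, not part of the project):

```python
def enclosing_rect(bboxes):
    # Smallest [left, top, right, bottom] rectangle containing all boxes;
    # a random crop that includes this rectangle truncates no target
    return (min(b[0] for b in bboxes), min(b[1] for b in bboxes),
            max(b[2] for b in bboxes), max(b[3] for b in bboxes))

print(enclosing_rect([(10, 20, 50, 60), (30, 5, 80, 40)]))
```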
Here are the results: (the top image is the original; below it are the horizontal flip, the scaled version, and a random crop.)

3. Batch generation

During the training stage we need to generate batches one by one. The parameters typically needed include: batch size, training image size, color, whether to shuffle the data, whether to crop randomly, and so on. Based on this, here is the code that supplies batches:
```python
# coding=utf-8
# Used to provide training data in the training stage
import random
import cv2
import readKITTI
import imAugment

class genBatch:
    image_dir, label_dir = [], []
    image_list, bbox_list = [], []
    initOK = False
    readPos = 0

    # Initialize and read the data
    def initdata(self, imagedir, labeldir):
        self.image_dir, self.label_dir = imagedir, labeldir
        self.image_list = readKITTI.get_filelist(imagedir, '.png')
        self.bbox_list = readKITTI.get_bboxlist(labeldir, self.image_list)
        # The data must be non-empty, and the numbers of images and labels must match
        if len(self.image_list) > 0 and len(self.image_list) == len(self.bbox_list):
            self.initOK = True
        else:
            print("The amount of images is %d, while the amount of "
                  "corresponding labels is %d"
                  % (len(self.image_list), len(self.bbox_list)))
            self.initOK = False
        return self.initOK

    # Generate a new batch
    def genbatch(self, batchsize, newh, neww, iscolor=True, isshuffle=False,
                 mirrorratio=0.0, cropratio=0.0):
        if self.initOK == False:
            print("The initdata() function must be successfully called first.")
            return []
        batch_data, batch_bbox = [], []
        for i in range(batchsize):
            # When the data has been fully traversed, start over
            if self.readPos >= len(self.image_list):
                self.readPos = 0
                if isshuffle == True:
                    # Use the same random seed so that images and labels
                    # are shuffled into the same order
                    r_seed = random.random()
                    random.seed(r_seed)
                    random.shuffle(self.image_list)
                    random.seed(r_seed)
                    random.shuffle(self.bbox_list)
            img = cv2.imread(self.image_dir + self.image_list[self.readPos] + '.png')
            bbox = self.bbox_list[self.readPos]
            self.readPos += 1
            # Crop with the specified probability; note that cropping
            # must happen before resizing
            if cropratio > 0 and random.random() < cropratio:
                img, bbox = imAugment.imcrop(img, bbox, min(neww, newh))
            # Adjust the image size and color
            img, bbox = imAugment.imresize(img, bbox, neww, newh, iscolor)
            # Mirror with the specified probability
            if mirrorratio > 0 and random.random() < mirrorratio:
                img, bbox = imAugment.immirror(img, bbox)
            batch_data.append(img)
            batch_bbox.append(bbox)
        return batch_data, batch_bbox
```
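The same-seed shuffle used in genbatch can be demonstrated in isolation: reseeding with the identical value before each shuffle guarantees that two equal-length lists are permuted in the same order, so images stay paired with their labels.

```python
import random

# Toy stand-ins for the image and bbox lists (names invented for illustration)
images = ['000000', '000001', '000002', '000003']
labels = ['bbox_0', 'bbox_1', 'bbox_2', 'bbox_3']

# Seed identically before each shuffle so the permutations match
r_seed = random.random()
random.seed(r_seed)
random.shuffle(images)
random.seed(r_seed)
random.shuffle(labels)

# Every image is still paired with its own label
print(list(zip(images, labels)))
```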