Multi-oriented and Scale Invariant License Plate Detection Based on Convolutional Neural Networks

Jing Han, Jian Yao*, Jiao Zhao, Jingmin Tu and Yahui Liu

School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, Hubei, P.R.China

*EMail: jian.yao@whu.edu.cn

*Web: http://www.scholat.com/jianyao http://cvrs.whu.edu.cn

1. Abstract

License plate detection (LPD) is the first and key step in license plate recognition. State-of-the-art object detection algorithms based on deep learning provide a promising way to perform LPD, but two main challenges remain. First, existing methods usually enclose objects with horizontal rectangles, which are not always suitable because license plates in images appear in multiple orientations as a result of rotation and perspective distortion. Second, the scale of license plates often varies, which makes multi-scale detection difficult. To address these problems, we propose a novel method of multi-oriented and scale invariant license plate detection (MOSI-LPD) based on convolutional neural networks. Our MOSI-LPD tightly encloses multi-oriented license plates with bounding parallelograms, regardless of the license plate scales. To obtain bounding parallelograms, we first parameterize the edge points of license plates by relative positions. Next, we design mapping functions between oriented regions and horizontal proposals. Then, we enforce the symmetry constraint in the loss function and train the model with a multi-task loss. Finally, we map region proposals to three edge points of a nearby license plate, and infer the fourth point to form bounding parallelograms. To achieve scale invariance, we first design anchor boxes based on the inherent shapes of license plates. Next, we search different layers to generate region proposals with multiple scales. Finally, we up-sample the last layer and combine proposal features extracted from different layers to recognize true license plates. Experimental results demonstrate that the proposed method outperforms existing approaches in detecting license plates with different orientations and multiple scales.

2. Approach

2.1. Overall Structure

The overall structure of our multi-oriented and scale-invariant license plate detection (MOSI-LPD) is illustrated in Fig. 1.

Fig. 1. The overall structure of our MOSI-LPD.

Our MOSI-LPD follows the basic structure of Faster R-CNN, the classic region-based deep learning network for object detection. It consists of two sub-networks: (1) a region proposal network (RPN) generating proposals that probably contain a license plate; (2) a detection network classifying the proposals and regressing license plate positions. The two sub-networks share the fundamental convolutional neural network (CNN) structure.

To achieve multi-oriented and scale invariant detection, we propose several key modifications. For the RPN sub-network, license plate proposals are generated on both the “Conv5” layer and the “Conv4” layer, and the results are combined to produce stronger proposals. The anchor boxes are set based on prior knowledge of the inherent shapes of license plates. For the detection sub-network, RoI pooling is conducted on the up-sampled “Conv5” layer, and features extracted from the “Conv4” layer are added to facilitate multi-scale detection. We estimate three edge points of the license plates by regressing relative positions from horizontal proposals. The fourth edge point is inferred based on the symmetry constraint to form final bounding parallelograms that tightly enclose the multi-oriented license plates.

2.2. Multi-oriented Detection Based on Bounding Parallelograms

We propose novel strategies to tightly enclose the multi-oriented license plates with bounding parallelograms.

We first reformat the edge point coordinates of the license plates. For each license plate, we reformulate the coordinates of three edge points as their positions relative to the central point. Next, we design mapping functions to regress oriented regions from horizontal proposals. The central point of a parallelogram is mapped from the central point of a proposal via scale-invariant translations, while the relative positions of the edge points are mapped from the proposal side lengths via log-space translations. Then, we enforce the symmetry constraint in the loss function by adding a symmetry loss defined as the sum of the diagonal relative positions. Finally, we train the model by minimizing a multi-task loss consisting of the classification loss, the regression loss and the symmetry loss.
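As a rough illustration of the symmetry term, the sketch below assumes that the three predicted corners are stored as offsets from the central point and that the two corners lying on one diagonal sit at indices 0 and 2. For an exact parallelogram those two offsets cancel, so their sum is zero. The L1 penalty, the function names and the weight `lambda_sym` are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def symmetry_loss(rel, diag=(0, 2)):
    """L1 norm of the sum of the two diagonal corner offsets.

    rel  : array-like of shape (3, 2), corner offsets (dx, dy) from the centre.
    diag : indices of the two corners on one diagonal (assumed ordering).
    """
    rel = np.asarray(rel, dtype=float)
    i, j = diag
    return float(np.abs(rel[i] + rel[j]).sum())

def multi_task_loss(cls_loss, reg_loss, rel, lambda_sym=1.0):
    """Classification + regression + weighted symmetry term.

    The weighting scheme is hypothetical; the source only names the three terms.
    """
    return cls_loss + reg_loss + lambda_sym * symmetry_loss(rel)

# Offsets of an exact parallelogram: the diagonal pair cancels.
exact = [[-40.0, -12.0], [38.0, -10.0], [40.0, 12.0]]
# A slightly inconsistent prediction: the diagonal pair no longer cancels.
skewed = [[-40.0, -12.0], [38.0, -10.0], [42.0, 11.0]]
```

With these inputs the symmetry term is zero for `exact` and positive for `skewed`, which is the behaviour the constraint is meant to reward.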

In the inference stage, license plate proposals generated by the RPN are transformed by the learned mapping functions to three edge points of a nearby parallelogram. The fourth edge point is inferred based on the symmetry property to form the final bounding parallelogram.
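The inference of the fourth edge point follows from the geometry of a parallelogram: opposite corners are symmetric about the centre, so the two diagonals share a midpoint. A minimal sketch, assuming the three known corners are given in consecutive order around the plate (the ordering and the function name are assumptions, since the source does not specify them):

```python
import numpy as np

def complete_parallelogram(p1, p2, p3):
    """Return the four corners of a parallelogram from three consecutive ones.

    Both diagonals share a midpoint, (p1 + p3) / 2 == (p2 + p4) / 2,
    which gives p4 = p1 + p3 - p2.
    """
    p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p1, p2, p3))
    p4 = p1 + p3 - p2
    return np.stack([p1, p2, p3, p4])

# Illustrative coordinates only.
corners = complete_parallelogram((10, 20), (110, 30), (115, 60))
```

Here the fourth corner comes out at (15, 50), and both diagonals meet at (62.5, 40).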

2.3. Scale Invariant Detection

We design effective strategies to detect license plates with multiple scales.

Firstly, in the RPN sub-network, prior knowledge is considered and multiple layers are exploited to generate better license plate proposals. We analyze the scale ranges and aspect ratios of the license plates, and set the anchor boxes based on the resulting statistics. This prior knowledge of the inherent license plate shapes enables the proposals to better match the license plates. In addition, we search for license plate proposals on multiple output layers rather than only on the last convolutional layer. The results with complementary scales are combined to produce stronger proposals for license plates with multiple scales.
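The idea of deriving anchors from shape statistics can be sketched as follows. Each anchor is defined by an area and a width-to-height ratio; the numeric values below are placeholders and not the statistics measured on the dataset, and `make_anchors` is a hypothetical helper name.

```python
import numpy as np

def make_anchors(areas, aspect_ratios):
    """Return an (N, 2) array of anchor (width, height) pairs.

    For each area A and ratio r = width / height:
        height = sqrt(A / r), width = r * height.
    """
    anchors = []
    for area in areas:
        for ratio in aspect_ratios:
            height = np.sqrt(area / ratio)
            width = ratio * height
            anchors.append((width, height))
    return np.array(anchors)

# Placeholder areas (in pixels) and wide aspect ratios typical of plate-like shapes.
anchors = make_anchors(areas=[400.0, 1600.0, 6400.0], aspect_ratios=[2.0, 3.0, 4.0])
```

In practice the areas and ratios would be chosen from the measured distribution of plate sizes and shapes in the training data.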

Secondly, in the detection sub-network, convolutional layers with different resolutions are also exploited to extract better features. We up-sample the “Conv5” layer to the size of the “Conv4” layer via a deconvolution operation, and combine the up-sampled layer “Conv5-2x” with the “Conv4” layer. To extract features for the generated proposals, we perform RoI pooling on the combined layers, which performs better than the traditional method that projects the proposals only onto the last “Conv5” layer.
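The shape bookkeeping of this fusion step can be sketched with plain arrays. Two simplifications are assumed here: nearest-neighbour up-sampling stands in for the learned deconvolution, and the two layers are combined by channel concatenation, which is only one possible reading of "combine". Feature maps are laid out as (channels, height, width), and the function names are hypothetical.

```python
import numpy as np

def upsample2x(feat):
    """Nearest-neighbour 2x up-sampling of a (C, H, W) feature map.

    Stand-in for the learned deconvolution described in the text.
    """
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def fuse(conv4, conv5):
    """Up-sample conv5 to the conv4 resolution and concatenate along channels."""
    conv5_2x = upsample2x(conv5)
    if conv5_2x.shape[1:] != conv4.shape[1:]:
        raise ValueError("conv4 must have twice the spatial size of conv5")
    return np.concatenate([conv4, conv5_2x], axis=0)

# Toy feature maps: conv4 is 4x6, conv5 is 2x3.
conv4 = np.zeros((3, 4, 6))
conv5 = np.ones((2, 2, 3))
fused = fuse(conv4, conv5)
```

RoI pooling would then operate on `fused`, so each proposal sees both the finer “Conv4” detail and the up-sampled “Conv5” semantics.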

3. Experimental Results

We have conducted a series of experiments to evaluate the proposed MOSI-LPD. The same experiments were also performed on corresponding baseline models for comparison.

3.1. Overall Performance

We randomly sampled 10,000 images from all test data containing license plates with different orientations and multiple scales. Our MOSI-LPD was tested on this subset to give a brief overview of its performance. Representative detection results are shown in Fig. 2.

Fig. 2. Representative detection results of our MOSI-LPD: (a) results on license plates with different orientations (severely, modestly and slightly skewed in successive rows); (b) results on license plates with multiple scales (tiny, medium and large in successive rows); (c) results on special or low-resolution license plates in the first row, and rare cases of false or missed detections (indicated by yellow ellipses) in the second row.

For in-depth statistical analysis, we calculate the precision, recall, F-measure and average IoU of the detection results. Table 1 compares the performance of our MOSI-LPD with that of some state-of-the-art LPD methods:

(1) Traditional method based on boundary features and color features, which is denoted as BOCO-LPD;

(2) Backbone region-based deep learning method of Faster R-CNN.

Table 1. Comparison between our MOSI-LPD, BOCO-LPD and Faster R-CNN.
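For reference, precision, recall and F-measure follow their standard definitions in terms of true positives, false positives and false negatives. The sketch below assumes those counts have already been obtained by matching detections to ground truth; the matching rule (for example, an IoU threshold) is not specified here, and the counts in the example are illustrative only.

```python
def detection_metrics(tp, fp, fn):
    """Return (precision, recall, f_measure) from detection counts.

    precision = TP / (TP + FP)
    recall    = TP / (TP + FN)
    f_measure = harmonic mean of precision and recall
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

# Illustrative counts, not results from the experiments.
precision, recall, f_measure = detection_metrics(tp=90, fp=10, fn=30)
```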

3.2. Multi-oriented Detection Based on Bounding Parallelograms

We constructed three test subsets based on the skew angles of the license plates. The first “Slight” subset contained license plates that were skewed within 5 degrees. The second “Modest” subset had license plates skewed between 5 and 25 degrees, while license plates in the last “Severe” subset were skewed over 25 degrees. These subsets were used to evaluate the proposed strategy for multi-oriented detection based on bounding parallelograms. Representative detection results of our MOSI-LPD, the backbone Faster R-CNN and two state-of-the-art multi-oriented text detection methods (RRPN and TextBoxes++) are shown in Fig. 3. The statistical analysis is reported in Table 2.

Fig. 3. Representative detection results: (a) MO-LPD; (b) Faster R-CNN; (c) RRPN; (d) TextBoxes++. In each sub-figure, license plates in the first to the last row were severely, modestly and slightly skewed, respectively.

Table 2. Comparison between MO-LPD, Faster R-CNN, RRPN and TextBoxes++ on the “Slight”, “Modest” and “Severe” test subsets.

3.3. Scale Invariant Detection

We built three test subsets according to the scales of the license plates. The first “Tiny” subset contained license plates smaller than 300 pixels and the second “Medium” subset had license plates between 300 and 1,200 pixels, while license plates in the last “Large” subset were bigger than 1,200 pixels. These subsets were used to evaluate the proposed strategy for scale invariant detection. Representative detection results of our MOSI-LPD and MO-LPD are shown in Fig. 4. The statistical analysis is reported in Table 3.

Fig. 4. Representative detection results: (a) our MOSI-LPD; (b) MO-LPD. In each sub-figure, license plates in the first to the last row were tiny, medium and large, respectively.

Table 3. Comparison between our MOSI-LPD and MO-LPD on the “Tiny”, “Medium” and “Large” test subsets.

3.4. Robustness

We manually blurred the test images and added noise to them. Our MOSI-LPD was tested on these data to evaluate its robustness to challenging conditions. Representative detection results are shown in Fig. 5. The statistical analysis is reported in Table 4.

Fig. 5. Representative detection results of our MOSI-LPD on challenging data: (a) performance on blurred images; (b) performance on images with noise.

Table 4. Comparison between performances on original data and challenging data.
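The type and strength of the blur and noise are not specified above. Purely as an illustration of how such degraded test images could be produced, the sketch below applies a mean (box) filter and additive Gaussian noise to a grayscale image; the filter type, window size, noise level and function names are all assumptions.

```python
import numpy as np

def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise to a uint8 image and clip to [0, 255]."""
    noisy = img.astype(float) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def box_blur(img, k):
    """Blur a 2-D grayscale image with a k x k mean filter (k odd, edge-padded)."""
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    height, width = img.shape
    # Sum the k*k shifted copies of the image, then divide by the window size.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + height, dx:dx + width]
    return np.rint(out / (k * k)).astype(np.uint8)

rng = np.random.default_rng(0)
image = np.full((6, 8), 100, dtype=np.uint8)   # toy constant image
blurred = box_blur(image, k=3)
noisy = add_gaussian_noise(image, sigma=10.0, rng=rng)
```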

3.5. Detection Speed

We evaluated the detection speed of our MOSI-LPD on various test subsets. We recorded the average time costs of the shared fundamental convolutional neural network (Conv), the unshared structure of the region proposal network (Proposal) and the unshared structure of the detection sub-network (Detection), respectively. Based on these statistics, we further report the average detection time of the overall system. The comparison with the backbone Faster R-CNN is reported in Table 5.

Table 5. The average time costs of our MOSI-LPD and the backbone framework Faster R-CNN on various test subsets (unit: second).

Dataset

We have collected more than 7,000 images containing license plates with different orientations and multiple scales. All the license plates are manually labeled with their exact four edge points.

Click here to get the images: Images

Click here to get the annotations: Annotations

Note:

Images with filenames prefixed by "ne_" are negative samples containing objects similar to license plates (such as traffic signs, trademarks and banners).

The annotation for each image is stored in the corresponding xml file, and the label for each license plate is stored in an "object" element. The "bndbox" sub-element gives the coordinates of the horizontal bounding box, and the "structure" sub-element provides the coordinates of the four edge points.
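A minimal sketch of reading such an annotation with Python's standard library is given below. Only the element names "object", "bndbox" and "structure" come from the description above; the root element and the field names inside "bndbox" and "structure" in the sample are hypothetical placeholders, so the reader simply returns whatever child fields it finds.

```python
import xml.etree.ElementTree as ET

def read_annotation(xml_text):
    """Return one dict per "object" element.

    Each dict maps "bndbox" and "structure" to the raw {tag: text} fields
    found inside that sub-element; no field names are assumed.
    """
    root = ET.fromstring(xml_text)
    plates = []
    for obj in root.iter("object"):
        plate = {}
        for name in ("bndbox", "structure"):
            node = obj.find(name)
            fields = {} if node is None else {
                child.tag: (child.text or "").strip() for child in node
            }
            plate[name] = fields
        plates.append(plate)
    return plates

# Hypothetical sample: the inner field names are placeholders, not the real schema.
sample = """<annotation>
  <object>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>115</xmax><ymax>60</ymax></bndbox>
    <structure><x1>10</x1><y1>20</y1><x2>110</x2><y2>30</y2></structure>
  </object>
</annotation>"""
plates = read_annotation(sample)
```

To read a file from disk, the same function can be called on the file's text content; the returned strings can then be converted to numbers once the actual field names are known.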

Citation

Jing Han, Jian Yao, Jiao Zhao, Jingmin Tu and Yahui Liu. “Multi-oriented and Scale Invariant License Plate Detection Based on Convolutional Neural Networks”, submitted to Sensors, 2019.