ANGLE-PROXIMITY BASED TEXT BLOCK VERIFICATION METHOD FOR TEXT DETECTION IN SCENE IMAGES

BASAVANNA M.1*, SHIVAKUMARA P.2, SRIVATSA S.K.3, HEMANTHA KUMAR G.4
1Research Scholar, School of Computing Science, VELS University, Chennai-Tamil Nadu-India
2School of Computing, National University of Singapore, Singapore
3Senior Professor, St. Joseph College of Engineering, Chennai-Tamil Nadu-India
4Department of Studies in Computer Science, University of Mysore-Karnataka-India
* Corresponding Author : basavanna_m@yahoo.com

Received : 06-11-2011     Accepted : 09-12-2011     Published : 12-12-2011
Volume : 3     Issue : 4       Pages : 245 - 250
Int J Mach Intell 3.4 (2011):245-250
DOI : http://dx.doi.org/10.9735/0975-2927.3.4.245-250

Conflict of Interest : None declared
Acknowledgements/Funding : This research work is supported by University Grants Commission (UGC), New Delhi, India (F.No.UGC/MRP(s)/800/10-11/KAMY022).

Cite - MLA : BASAVANNA M., et al "ANGLE-PROXIMITY BASED TEXT BLOCK VERIFICATION METHOD FOR TEXT DETECTION IN SCENE IMAGES." International Journal of Machine Intelligence 3.4 (2011):245-250. http://dx.doi.org/10.9735/0975-2927.3.4.245-250


Copyright : © 2011, BASAVANNA M., et al, Published by Bioinfo Publications. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.

Abstract

Text block verification is important in enhancing text detection accuracy in natural scene images because it is hard to develop general or objective heuristics to differentiate text and non-text blocks. In this paper, we propose new objective heuristics to verify the blocks detected by a text detection method, based on angle information and the proximity of pixels within the blocks. The angle of each detected block is computed using PCA to find the direction of the text block. Similarly, the proximity between pixels in the detected block is estimated to measure their closeness. The method then combines these two heuristics to verify text blocks and obtain a better result. We conduct experiments on different databases to show that the performance of the text detection method improves in terms of recall, precision and f-measure with the proposed text block verification methods. The databases include the benchmark ICDAR-2003 competition data, our own data captured by a high resolution camera, and data captured by a low resolution mobile camera.

Keywords

Scene text detection, Angle computation, Proximity estimation, Text block verification.

Introduction

Due to complex backgrounds and variations in fonts, font sizes, color and contrast of text in natural scene images, developing a general algorithm for text block verification to improve a text detection method's accuracy is challenging in the field of image processing and pattern recognition [1]. Accurate text detection in scene images is useful in many applications, such as tracking the license plates of moving vehicles and guiding blind persons walking on the road. Nowadays, text detection in scene images has drawn the attention of researchers because text detection and extraction bridge the gap between low level and high level features with the help of OCR and semantics [2]. It is noted from the literature on text detection in scene images, especially on ICDAR-2003 data, that the developed methods give good recall even for complex background images but fail to achieve good precision due to many false positives [1]. Jung et al. [3] have proposed a text extraction system (TES) with four stages: text detection, text localization, text extraction and enhancement, and recognition. Among these stages, text detection and text verification are critical to the overall method performance. In the last decade, many methods have been proposed to address image and video text detection and localization problems, and some of them have achieved impressive results for specific applications [3-7]. However, accurate text detection in natural scene images is still challenging due to variations in text font, font size, color and alignment orientation, and it is often affected by complex background, illumination changes, image distortion and degradation [8].
The existing methods of text detection and localization can be roughly categorized into two groups: region based and connected component (CC) based. Region based methods use texture analysis for feature extraction and classifiers to separate text from non-text regions. Because text regions have textural properties distinct from non-text ones, these methods can detect text accurately even when images are noisy. On the other hand, CC based methods directly segment candidate text components by edge detection or color clustering. The non-text components are then pruned with heuristic rules or classifiers. Since the number of segmented candidate components is relatively small, CC based methods have lower computational cost, and the located text components can be directly used for recognition.
Although the existing methods have reported promising text detection performance, several problems remain to be solved. For region based methods, the speed is relatively slow and the performance is sensitive to font and font size. On the other hand, CC based methods cannot segment text components accurately when a complex background is present in the images [8]. Lienhart and Wernicke [9] have proposed a method for segmenting text in images and videos based on a multi-layer feed forward network and temporal information. Automatic detection and recognition of signs from natural scenes is proposed in [10], based on multiresolution and multiscale edge detection, adaptive searching, color analysis and affine rectification in a hierarchical framework. A method for Devanagari and Bangla text extraction from natural scene images is proposed in [11] based on connected component analysis and mathematical morphology operations to achieve good accuracy. However, this method works well only for those two scripts, since the features are extracted based on the headline. Pan et al. [12] have proposed a robust method to detect text in natural scene images based on a cascade AdaBoost classifier, where non-text components are filtered out using Markov Random Fields (MRF). Chen and Yuille [13] have presented a method for text detection in natural scene images based on statistical analysis and an AdaBoost classifier. Epshtein et al. [14] have proposed a method based on the stroke width transform for text detection in scene images; this method works well as long as the stroke width is constant within the character shape. A hybrid approach for text detection in natural scene images is proposed in [8] to overcome the problems of existing methods. That work introduces a text region detector to estimate text confidence and scale information in an image pyramid, which helps segment candidate text components by local binarization.
Recently, dot text detection in camera images has been proposed based on FAST points [15], and an active contour method [16] has also been explored for locating text in camera images. To overcome the problems of scene text, Kunishige et al. [17] have tried to separate scene text and graphics based on environmental context features. This work makes it evident that scene text in camera images makes the problem more complex, and hence the methods achieve low precision.
It is observed from the literature that there are several methods for text detection in scene images, but none of them achieves good precision on this problem because they rely on fixed thresholds, ad hoc rules and tuned classifier parameters. Further, most of the existing methods use geometrical properties of the blocks to eliminate false positives. These properties work well when the character shape is preserved and the text is of uniform size, but not for scene text in natural scene images. In addition, few methods have addressed the issue of text block verification to improve text detection accuracy. Hence, in this paper, we propose two objective heuristics based on the direction and the pixel proximity of the blocks detected by the text detection method.

Proposed Methodology

The input to the proposed text block verification method (TBV) is the set of blocks detected by the text detection method [18], which works based on a run length criterion (RLM). It is observed from the results of text detection methods that, for horizontal text images, the direction of a text block is almost equal to zero angle with respect to the x-axis, since the text components in the block are usually aligned in the horizontal direction, while the direction of a non-text block may not be equal to zero angle, since the components in the block may not have the alignment found in text blocks. To exploit this observation, we use PCA to compute the angle of each block detected by the text detection method, which gives the direction of the block. The reason for using PCA is that it gives an accurate principal axis for objects that have a regular shape. For instance, the principal axis gives zero angle for a horizontal text block in our work, while for non-text objects (blocks) the principal axis may not give zero angle, since the pixels in a non-text object may not be aligned in a particular direction. Note that the method does not expect PCA to give exactly zero angle with respect to the x-axis for text blocks. The method feeds the x and y coordinates of pixels to PCA to compute the angle of each block; the angle is computed from the Sobel edges corresponding to pixels in the detected blocks, where the Sobel edge map is obtained by applying the Sobel edge operator to the input image. If PCA gives an angle between 0 and 10 degrees with respect to the x-axis, then we consider the block a text block; otherwise, it is considered a non-text block. The range 0-10 is fixed based on sample experimental results. Sometimes, due to complex background, a non-text block may yield the same angle as a text block. Therefore, we present one more heuristic based on the proximity between pixels.
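The PCA angle test described above can be sketched as follows. This is a minimal illustration, assuming a block is given as a binary Sobel edge mask; the function names and the folding of the principal-axis angle into [0, 90] degrees are our own conventions, not taken from the paper.

```python
import numpy as np

def block_angle_degrees(edge_mask):
    """Principal-axis angle (degrees, w.r.t. the x-axis) of the edge
    pixels in a block, computed via PCA on their (x, y) coordinates."""
    ys, xs = np.nonzero(edge_mask)               # edge-pixel coordinates
    coords = np.column_stack((xs, ys)).astype(float)
    coords -= coords.mean(axis=0)                # centre the point cloud
    cov = np.cov(coords, rowvar=False)           # 2x2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    principal = eigvecs[:, np.argmax(eigvals)]   # axis of largest variance
    angle = np.degrees(np.arctan2(principal[1], principal[0]))
    # Fold into [0, 90]: a direction is sign- and orientation-agnostic.
    folded = abs(angle) % 180
    return min(folded, 180 - folded)

def is_text_block_by_angle(edge_mask, max_angle=10.0):
    # Blocks whose principal axis lies within 0-10 degrees of the
    # x-axis are kept as text (threshold from the paper's experiments).
    return block_angle_degrees(edge_mask) <= max_angle
```

A wide horizontal band of edge pixels yields an angle near zero and passes the test, while a diagonal pixel run yields an angle near 45 degrees and is rejected.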
One can notice from the edges of detected blocks that the pixels in a text block are close to each other because text pixels form clusters, while in non-text blocks the distance, or proximity, between pixels may not be as small, because pixels in non-text objects are usually scattered. Inspired by the work presented in [19] on text frame classification using proximity based features, we choose this feature for text block verification in this work. The feature is extracted by computing the distances from the first pixel to the remaining pixels, from the second pixel to the remaining pixels, and so on. As a result, we get a proximity matrix for each block. To measure the proximity, we compute the standard deviation of the proximity matrix of each block. If the standard deviation of the block is less than 1, then we consider the block a text block; otherwise, it is considered a non-text block. Finally, we combine both the PCA based and the proximity based text block verification to get better results. [Fig-1] shows sample results for PCA, proximity and the combined text block verification. For the input image shown in [Fig-1] (a), the text detection method detects text with bounding boxes as shown in [Fig-1] (b). Text block verification (TBV) by PCA is shown in [Fig-1] (c), where it eliminates some of the non-text blocks but not at the cost of text blocks. [Fig-1] (d) shows text block verification by proximity, where the proximity feature eliminates some of the non-text blocks without sacrificing text blocks. [Fig-1] (e) shows a sample result for the combined text block verification method, where the method eliminates all non-text blocks without affecting text blocks. The final output can be seen in [Fig-1] (f).

Experimental Results

We use the text detection method proposed in [18] for detecting text blocks in an image. The detected text blocks are verified by the proposed heuristics. In this experimental study, we have found that the proposed text block verification methods are useful in improving the performance of the text detection method. For this purpose, we consider the ICDAR 2003 competition scene text data as a benchmark database [20]. We also test our method on our own data captured by a high resolution camera and by a low resolution mobile camera to assess the effectiveness of the method. In this work, we have considered 1218 images in total: 210 from the ICDAR 2003 competition, 523 from a high resolution camera and 485 from a mobile camera. We use recall, precision, f-measure and average processing time (APT) for measuring the performance of the proposed method, since these measures are common and used in most papers on text detection. We manually count the number of text blocks in each image as the ground truth, i.e., the actual number of text blocks. A detected block is counted as a true text block when its area overlaps more than 80% with the ground truth area; otherwise, we count it as a misdetection [6]. We follow the instructions given in [6] for computing the measures.
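The evaluation protocol above can be sketched as follows. This is a minimal interpretation of the >80% overlap rule credited to [6]: boxes are assumed to be (x1, y1, x2, y2) tuples, overlap is measured against the ground-truth area, and the greedy one-to-one matching is our simplification, not a detail given in the paper.

```python
def overlap_ratio(det, gt):
    """Intersection area of two boxes (x1, y1, x2, y2), divided by
    the ground-truth box area."""
    ix = max(0, min(det[2], gt[2]) - max(det[0], gt[0]))
    iy = max(0, min(det[3], gt[3]) - max(det[1], gt[1]))
    gt_area = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return (ix * iy) / gt_area if gt_area else 0.0

def evaluate(detections, ground_truth, thresh=0.8):
    """Recall, precision and f-measure: a detection is a true positive
    only if it overlaps an unmatched ground-truth block by > thresh."""
    matched, tp = set(), 0
    for det in detections:
        for i, gt in enumerate(ground_truth):
            if i not in matched and overlap_ratio(det, gt) > thresh:
                matched.add(i)
                tp += 1
                break
    recall = tp / len(ground_truth) if ground_truth else 0.0
    precision = tp / len(detections) if detections else 0.0
    f = (2 * recall * precision / (recall + precision)
         if recall + precision else 0.0)
    return recall, precision, f
```

Under this rule, a false positive leaves recall untouched but lowers precision, which is exactly the quantity the proposed verification step is designed to raise.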

Experiments on ICDAR-2003 Competition Data

Text detection in the ICDAR-2003 dataset is challenging due to complex background, non-uniform illumination and other unfavorable characteristics of scene text. Therefore, false positive elimination is not as easy as differentiating text and non-text in document images. Hence, we consider this dataset as a benchmark to show that the proposed text block verification method helps improve the text detection method's accuracy in terms of recall, precision and f-measure, at the cost of processing time. Sample results are shown in [Fig-2], where for the input image shown in [Fig-2] (a), text blocks are detected by the method in [18] as shown in [Fig-2] (b). For the detected text blocks, text block verification by PCA removes some of the false positives without sacrificing true text blocks, as shown in [Fig-2] (c); the same is true for text block verification by proximity applied to the detected text blocks of [Fig-2] (b), as shown in [Fig-2] (d). The final result given by the combined PCA and proximity verification is shown in [Fig-2] (e), where true text blocks are preserved and false positives are removed. [Fig-2] (f) shows the output of the text block verification methods. This experiment shows that the proposed text block verification methods are useful for preserving true text blocks. The experimental results are reported in [Table-1], where the recall, precision and f-measure of PCA, Proximity and PCA+Proximity are higher than those of the run length based text detection method (RLM). However, the average processing time in seconds (APT) is higher for the proposed text block verification methods than for text detection-RLM, because of the additional verification steps.

Experiment on High Resolution Camera Images

This experiment shows that the proposed text verification methods work well for images captured by a high resolution camera. This dataset is not as complex as the ICDAR-2003 dataset, since its images have less complex backgrounds; however, all the images in this dataset include scene text. [Fig-3] shows sample results for the text block verification methods. It is observed from [Fig-3] that text block verification by the PCA+Proximity method preserves all the true text blocks detected by the RLM method [Fig-3] (b) and eliminates non-text blocks, as shown in [Fig-3] (c), (d), (e) and (f). The experimental results are reported in [Table-2]. According to the results in [Table-2], the proposed text block verification method (PCA+Proximity) helps improve recall, precision and f-measure compared to those of text detection-RLM, at the cost of computational time.

Experiment on Low Resolution Mobile Camera Images

This experiment shows that the proposed text verification methods work well even for low resolution mobile camera images. This data is complex, as the images are of low resolution compared to the ICDAR-2003 and high resolution camera images. Since the images are of low resolution, it is difficult to study the features of text and non-text components; however, text detection-RLM is capable of detecting text in low resolution images. Sample results for the text block verification methods are shown in [Fig-4]. It is observed from [Fig-4] that text block verification by the PCA+Proximity method preserves all the true text blocks detected by the RLM method [Fig-4] (b) and eliminates non-text blocks, as shown in [Fig-4] (c), (d), (e) and (f). According to the results in [Table-3], the proposed text block verification method (PCA+Proximity) is useful in improving recall, precision and f-measure compared to those of text detection-RLM, at the cost of computational time. This shows that the proposed text block verification methods are good for both high resolution and low resolution images.
Conclusion

In this paper, we have proposed new text block verification methods based on angle and proximity information for verifying the text blocks detected by the text detection method. We have used an existing run length based text detection method for detecting text blocks in images. We have added the new text block verification methods to this text detection method to show that they help improve text detection accuracy. These text block verification methods can be used with any other text detection method to eliminate false positives. Experimental results show that the proposed text block verification methods work well for different datasets. We plan to extend these methods to verifying non-horizontal text blocks detected by a non-horizontal text detection method.

Acknowledgement

This research work is supported by University Grants Commission (UGC), New Delhi, India (F.No.UGC/MRP(s)/800/10-11/KAMY022).

References

[1] Zhang J., Goldgof D. and Kasturi R. (2008) In Proc. ICPR, pp 1-4.

[2] Doermann D., Liang J. and Li H. (2003) In Proc. ICDAR, pp 606-616.

[3] Jung K., Kim K.I. and Jain A.K. (2004) Pattern Recognition, pp 977-997.

[4] Jung K. (2001) Pattern Recognition Letters, pp 1503-1515.

[5] Ye Q., Huang Q., Gao W. and Zhao D. (2005) Image and Vision Computing, pp 565-576.

[6] Chen D., Odobez J.M. and Bourlard H. (2004) Pattern Recognition, pp 595-608.

[7] Wu V., Manmatha R. and Riseman E.M. (1999) IEEE Transactions on PAMI, pp 1224-1229.

[8] Pan Y.F., Hou X. and Liu C.L. (2011) IEEE Transactions on Image Processing, pp 800-813.

[9] Lienhart R. and Wernicke A. (2002) IEEE Transactions on CSVT, pp 256-268.

[10] Chen X., Yang J., Zhang J. and Waibel A. (2004) IEEE Transactions on Image Processing, pp 87-99.

[11] Bhattacharya U., Parui S.K. and Mondal S. (2009) In Proc. ICDAR, pp 171-175.

[12] Pan Y.F., Hou X. and Liu C.L. (2008) In Proc. DAS, pp 35-42.

[13] Chen X. and Yuille A. (2004) In Proc. CVPR, pp 366-373.

[14] Epshtein B., Ofek E. and Wexler Y. (2010) In Proc. CVPR, pp 2963-2970.

[15] Du Y., Ai H. and Lao S. (2011) In Proc. ICDAR, pp 435-439.

[16] Navon Y., Kluzner V. and Ophir B. (2011) In Proc. ICDAR, pp 222-226.

[17] Kunishige Y., Yaokai F. and Uchida S. (2011) In Proc. ICDAR, pp 1049-1053.

[18] Basavanna M., Shivakumara P., Srivatsa S.K. and Hemantha Kumar G. (2011) In Proc. IICAI.

[19] Shivakumara P., Trung Quy Phan and Chew Lim Tan (2010) IEEE Transactions on CSVT, pp 1520-1532.

[20] Lucas S.M., Panaretos A., Sosa L., Tang A., Wong S. and Young R. (2003) In Proc. ICDAR, pp 1-6.

Images
Fig. 1- Text block verification based on angle and proximity information
Fig. 2- Experiment on ICDAR-2003 competition scene data
Fig. 3- Experiment on high resolution camera images
Fig. 4- Experiment on low resolution mobile camera images
Table 1- Sample result on ICDAR 2003 competition data
Table 2- Sample result on Camera (HR) image data
Table 3- Sample result on Mobile (LR) images data