CONTENT BASED IMAGE RETRIEVAL THROUGH OBJECT EXTRACTION

PATIL S.M.1*, KOLEKAR U.D.2*
1Department of Information Technology, Bharati Vidyapeeth College of Engineering, Navi Mumbai, MS, India
2Department of Elec. & Telecommunication, Lokmanya Tilak College of Engineering, Navi Mumbai, MS, India
* Corresponding Author : udkolekar@gmail.com

Received : 01-01-2011     Accepted : 12-01-2011     Published : 15-07-2011
Volume : 1     Issue : 1       Pages : 1 - 4
J Pattern Intell 1.1 (2011):1-4

Cite - MLA : PATIL S.M. and KOLEKAR U.D. "CONTENT BASED IMAGE RETRIEVAL THROUGH OBJECT EXTRACTION." Journal of Pattern Intelligence 1.1 (2011):1-4.

Cite - APA : PATIL S.M., KOLEKAR U.D. (2011). CONTENT BASED IMAGE RETRIEVAL THROUGH OBJECT EXTRACTION. Journal of Pattern Intelligence, 1 (1), 1-4.

Cite - Chicago : PATIL S.M. and KOLEKAR U.D. "CONTENT BASED IMAGE RETRIEVAL THROUGH OBJECT EXTRACTION." Journal of Pattern Intelligence 1, no. 1 (2011):1-4.

Copyright : © 2011, PATIL S.M. and KOLEKAR U.D., Published by Bioinfo Publications. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.

Abstract

We propose a content based image retrieval system based on object extraction through image segmentation. A general and powerful multiscale segmentation algorithm automates the segmentation process, the output of which is assigned novel colour and texture descriptors which are both efficient and effective. Query strategies consisting of a semi-automated and a fully automated mode are developed and shown to produce good results. We then show the superiority of our approach over the global histogram approach, which demonstrates that the ability to access images at the level of objects is essential for CBIR.

Introduction

Image retrieval has traditionally been based on manually inserted captions describing the scene, which can then be searched using keywords. Caption insertion is a very subjective procedure and quickly becomes extremely tedious and time consuming, especially for large image databases, which are becoming ever more common with the growing availability of digital cameras and scanners. There is thus an urgent need for effective content-based image retrieval (CBIR) systems.
We believe the key to effective CBIR performance lies in the ability to access the image at the level of objects. This is because users generally want to search for images containing particular object(s) of interest, and thus the ability to represent, index and query images at the level of objects is critical [3]. In this paper, we present a framework for CBIR based on unsupervised segmentation of images into classes and querying using properties of these classes. As these segmented classes are homogeneous in some sense (in our case, colour and texture), they correlate well with the identity of objects. By decomposing images as combinations of objects in this manner, querying becomes more meaningful and intuitive than it is with global image properties. This is obviously true for images with distinct foreground objects, but the rationale also holds for ‘background’ images where no interesting foreground objects are present. Images belonging to the latter category can be thought of as consisting of combinations of classes with homogeneous colour and texture (for example, images of the seaside generally consist of the beach and the sea, images of sunset scenes generally consist of the reddish sky and dark silhouettes, and so forth) and querying is made more effective by being based on these class combinations which characterise the scene.
In our CBIR implementation, images are first segmented based on joint colour and textural features using our previously developed unsupervised multiscale segmentation algorithm [6,7]. The segmentation process is completely unsupervised and performed off-line for each image. Following this, we represent each image using effective and compact colour and textural descriptors of its classes. We then structure the descriptor database following a relational model which allows its implementation on powerful relational database engines. Class attribute queries are processed using a parallel strategy which results in significant speed-up in the retrieval process if parallel processor machines are used.
We first briefly describe the segmentation algorithm employed. We then discuss the descriptors assigned to each class, and finally present our query strategy as well as preliminary results from queries on our image database testbed consisting of various natural images.

Unsupervised Segmentation

Our unsupervised segmentation algorithm involves the following steps:
1. Normalised colour and texture features (three for colour and two for texture) are mapped to a multidimensional feature space. Spatial information is incorporated into the process by including spatial features in the feature space. The colour space used is S-CIE L*a*b*, the spatial extension of the perceptually uniform CIE L*a*b*, originally developed by Zhang and Wandell [10]. This colour space takes into account the appearance of fine-patterned colours to the human visual system. Textural features, meanwhile, are generated by taking the logarithm of the energies of the 2-D complex wavelet coefficients [8] and retaining the top two principal components.
2. Significant features which correspond to clusters in the feature space are assumed to be representations of underlying classes, the recovery of which is achieved using the mean shift procedure [4], a robust kernel based decomposition method (see the sketch below). The kernel size used was fixed for all images, resulting in a decomposition into an appropriate number of classes for each image.
3. With the number of classes and the properties of each class determined via step 2, a Bayesian multiscale processing approach, which models the inherent uncertainty in the joint specification of class and position spaces using the Multiscale Random Field model [1], is used for the subsequent classification process.

Typical segmentation maps of images in our database are shown in [Fig-2].
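As a concrete illustration of the clustering in step 2, the following Python sketch maps each pixel to a joint colour and spatial feature space and recovers classes with scikit-learn's MeanShift. The feature set (plain CIE L*a*b* rather than S-CIE L*a*b*, and no wavelet texture channels), the spatial scaling, the bandwidth and the subsampling are all assumptions for the example, not the configuration used in the paper.

```python
# Illustrative mean shift decomposition of a colour + spatial feature space.
import numpy as np
from skimage import io, color
from sklearn.cluster import MeanShift

def segment_classes(path, bandwidth=12.0, subsample=20):
    rgb = io.imread(path)
    lab = color.rgb2lab(rgb)              # three colour features per pixel
    h, w, _ = lab.shape
    ys, xs = np.mgrid[0:h, 0:w]
    feats = np.column_stack([
        lab.reshape(-1, 3),
        0.5 * ys.reshape(-1, 1),          # spatial features, scaled (assumed)
        0.5 * xs.reshape(-1, 1),
    ])
    ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)  # fixed kernel size
    ms.fit(feats[::subsample])            # fit on a subsample for speed
    labels = ms.predict(feats)            # then label every pixel
    return labels.reshape(h, w)           # segmentation map
```

Because the bandwidth is held fixed, the number of recovered classes adapts to each image, mirroring the fixed kernel size described above.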

Describing the Classes

Once an image has been segmented, we proceed to extract a description of each class, with the set of class descriptions constituting a description of the image. A class descriptor has to embody the class characteristics (which typically translates to representing a particular object) in an effective fashion to facilitate efficient indexing and accurate retrieval. Designing an effective class descriptor is thus more difficult than designing feature extractors for segmentation, and the two should be seen as separate processes.

Colour Descriptors

In order to represent the colour distribution of each class, we store the colour histograms of the pixels of the class. Each histogram is based on bins of width 10 in each dimension of the S-CIE L*a*b* colour space. This spacing yields 10 bins in the L* dimension and 40 bins in each of the a* and b* dimensions, for a total of 90 numbers as colour descriptors.
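A minimal sketch of this descriptor follows. The channel ranges (L* in [0, 100], a* and b* in [-200, 200]) are assumptions chosen only to reproduce the 10 + 40 + 40 = 90 bins quoted above; the paper does not state its bin ranges.

```python
# Per-class colour descriptor: one normalised histogram per Lab channel.
import numpy as np

def colour_descriptor(lab_pixels):
    """lab_pixels: (n, 3) array of one class's pixels in L*a*b*."""
    bin_edges = [
        np.arange(0, 101, 10),        # L*: 10 bins
        np.arange(-200, 201, 10),     # a*: 40 bins (range is an assumption)
        np.arange(-200, 201, 10),     # b*: 40 bins (range is an assumption)
    ]
    n = max(len(lab_pixels), 1)
    hists = [np.histogram(lab_pixels[:, ch], bins=edges)[0] / n
             for ch, edges in enumerate(bin_edges)]
    return np.concatenate(hists)      # 10 + 40 + 40 = 90 numbers
```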
To evaluate the dissimilarity between the colour histograms of two classes/objects, we apply the Kolmogorov-Smirnov (K-S) distance, as originally proposed in [5]. The K-S distance essentially measures the difference between two probability distribution functions. If F1(k) and F2(k) are two independent sample distribution functions (i.e. cumulative histograms) defined such that

$$F_j(k) = \frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\{x_{j,i} \le k\}, \qquad j = 1, 2,$$

where n is the number of data samples, so that 1 ≤ i ≤ n, then the K-S distance is the maximum difference between the two distributions over all k:

$$D_{KS}(F_1, F_2) = \max_{k}\,|F_1(k) - F_2(k)|.$$
The overall colour dissimilarity measure between two classes with colour histograms x_COL and y_COL is taken to be the root mean square of the K-S distances of each of the L*, a* and b* histograms:

$$d_{COL}(x_{COL}, y_{COL}) = \sqrt{\tfrac{1}{3}\left(D_{KS,L^*}^2 + D_{KS,a^*}^2 + D_{KS,b^*}^2\right)}.$$

As K-S distances lie between 0 and 1, the colour similarity measure, s_COL(x_COL, y_COL), is simply taken as:

$$s_{COL}(x_{COL}, y_{COL}) = 1 - d_{COL}(x_{COL}, y_{COL}).$$
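The colour matching step can be sketched as follows: the K-S distance is the maximum absolute difference between the cumulative sums of two normalised histograms, and the three per-channel distances are combined by root mean square.

```python
# Colour similarity between two 90-number descriptors via K-S distances.
import numpy as np

def ks_distance(hist1, hist2):
    """K-S distance between two normalised histograms on the same bins."""
    return np.max(np.abs(np.cumsum(hist1) - np.cumsum(hist2)))

def colour_similarity(x_col, y_col):
    """x_col, y_col: 90-number descriptors, split into L*, a*, b* parts."""
    parts_x = np.split(x_col, [10, 50])
    parts_y = np.split(y_col, [10, 50])
    dists = [ks_distance(hx, hy) for hx, hy in zip(parts_x, parts_y)]
    d_col = np.sqrt(np.mean(np.square(dists)))  # RMS over the three channels
    return 1.0 - d_col                          # similarity = 1 - dissimilarity
```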

Texture Descriptors

For each class, texture is described by the distribution of the magnitudes of its complex wavelet coefficients, f(x_TEX). Using four levels of the 2-D complex wavelet transform (which yields six complex subbands at every level) produces a total of 24 subbands; the magnitude of each is converted into a histogram and modelled as a generalised Rayleigh distribution

$$f(x) = \frac{\beta\,x^{\beta-1}}{\sigma^{\beta}}\exp\!\left(-\left(\frac{x}{\sigma}\right)^{\beta}\right), \qquad x \ge 0,$$

where σ and β are chosen to achieve the same mean and variance as the input sample distribution.

Thus, for each class, the generalised Rayleigh model parameters σi and βi are calculated for each of the 24 histograms, for a total of 48 numbers as texture descriptors. To compute the texture dissimilarity between two classes, we:

1. Generate probability distribution functions for each of the 24 subbands of each class using the stored values of σi and βi.
2. Apply the K-S distance between histograms corresponding to the same subband.
3. Calculate the overall texture dissimilarity measure as the root mean square of all the K-S distances.

The final texture similarity measure is given by subtracting the dissimilarity measure from unity. A sketch of this procedure follows.
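The sketch below illustrates the procedure under two assumptions: the generalised Rayleigh model is approximated by a two-parameter Weibull distribution, and scipy's maximum-likelihood fit stands in for the paper's moment matching.

```python
# Texture descriptor (fit per subband) and texture similarity (K-S + RMS).
import numpy as np
from scipy.stats import weibull_min

def texture_descriptor(subband_mags):
    """subband_mags: 24 arrays of complex wavelet coefficient magnitudes."""
    params = []
    for mags in subband_mags:
        beta, _, sigma = weibull_min.fit(mags, floc=0)   # shape, loc, scale
        params.append((sigma, beta))
    return params                                        # 48 numbers in total

def texture_similarity(params_x, params_y):
    grid = np.linspace(0.0, 10.0, 512)                   # evaluation grid (assumed)
    dists = []
    for (sx, bx), (sy, by) in zip(params_x, params_y):
        cdf_x = weibull_min.cdf(grid, bx, scale=sx)      # regenerated distribution
        cdf_y = weibull_min.cdf(grid, by, scale=sy)
        dists.append(np.max(np.abs(cdf_x - cdf_y)))      # K-S per subband
    d_tex = np.sqrt(np.mean(np.square(dists)))           # RMS over 24 subbands
    return 1.0 - d_tex                                   # similarity from unity
```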

Image Retrieval

The class descriptor database is structured using a relational model. This allows it to be implemented on powerful commercial relational database engines, and queries and retrieval to be described using SQL's Select and Join operations [9]. For example, as the first step, descriptors of particular classes of an image can be extracted from the database using a simple Select operation, as sketched below.
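A hypothetical sketch of such a layout, using Python's built-in sqlite3 in place of a commercial engine; the table and column names are invented for the example and are not the authors' schema.

```python
# Hypothetical relational layout for the class-descriptor database.
import sqlite3

conn = sqlite3.connect("descriptors.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS class_descriptors (
        image_id  INTEGER,
        class_id  INTEGER,
        size_frac REAL,   -- class size as a fraction of the image
        colour    BLOB,   -- the 90-number colour histogram
        texture   BLOB,   -- the 48 generalised Rayleigh parameters
        PRIMARY KEY (image_id, class_id)
    )
""")

# First step of a query: a simple Select pulls the descriptors of the
# queried class out of the database.
query_image_id, query_class_id = 1, 0
row = conn.execute(
    "SELECT colour, texture FROM class_descriptors"
    " WHERE image_id = ? AND class_id = ?",
    (query_image_id, query_class_id),
).fetchone()
```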

Querying Strategy

There are two modes of operation for our image retrieval system: a semi-automated mode and a fully automated mode. In the semi-automated mode, the user composes a query by submitting an image and, on seeing the segmentation map, selects the class or classes to match. There is also an option of selecting the relative importance of the classes (should there be more than one in the query composition); by default, all classes in a query are considered equally important.
All ‘compound’ queries, i.e. queries based on more than one class, are first decomposed into ‘simple’ queries, i.e. queries based on a single class. The similarity match for each simple query is calculated as follows (a sketch combining the earlier pieces appears after the list):
1. Colour and texture descriptors for the queried class are retrieved from the descriptor database.
2. The similarity measures for colour and texture are computed for classes in the database whose sizes (specified as a fraction of the image) are at least 25% of the size of the queried class.
3. The overall similarity measure is taken to be the weighted combination of the colour and texture similarity measures, with the weights set by the user. By default, colour and texture similarities are weighted equally.
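Drawing the earlier sketches together, a simple query might look as follows; `ClassDesc`, the database row format and the reuse of `colour_similarity` and `texture_similarity` from the previous sketches are our assumptions.

```python
# One simple query: size filter, then weighted colour/texture similarity.
from collections import namedtuple

# A class and its descriptors (names are ours, not the paper's).
ClassDesc = namedtuple("ClassDesc", "size_frac colour texture")

def simple_query(query, db_rows, w_colour=0.5, w_texture=0.5):
    """db_rows: iterable of (image_id, ClassDesc) pairs from the database."""
    results = {}
    for image_id, cand in db_rows:
        if cand.size_frac < 0.25 * query.size_frac:      # size filter (step 2)
            continue
        sim = (w_colour * colour_similarity(query.colour, cand.colour)
               + w_texture * texture_similarity(query.texture, cand.texture))
        # Keep the best-matching class per image.
        results[image_id] = max(sim, results.get(image_id, 0.0))
    return results
```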
The SQL Join operation on the simple queries' match lists will obtain the set of common images, with the best match maximising the similarity measures, weighted according to their relative importance. As simple queries can be processed in parallel, significant speed-up in the retrieval process is possible with parallel processor machines.
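A sketch of the compound query under the same assumptions, with the simple queries farmed out to a process pool to illustrate the parallel speed-up (the in-memory intersection stands in for the SQL Join):

```python
# Compound query: parallel simple queries, joined and ranked by weighted sum.
from concurrent.futures import ProcessPoolExecutor

def compound_query(query_classes, weights, db_rows):
    """query_classes: ClassDesc per queried class; weights: their importance."""
    with ProcessPoolExecutor() as pool:
        match_lists = list(pool.map(simple_query, query_classes,
                                    [db_rows] * len(query_classes)))
    # Join: keep only images matched by every simple query.
    common = set.intersection(*(set(m) for m in match_lists))
    scores = {img: sum(w * m[img] for w, m in zip(weights, match_lists))
              for img in common}
    return sorted(scores, key=scores.get, reverse=True)   # best match first
```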
In the fully automated mode, the user has only to submit a query image and the algorithm handles the rest. In this case, we first perform simple queries on the classes of the image which constitute at least 10% of the image. In the absence of a theoretical foundation for determining the relative importance of the classes, we simply sum the top 10 similarity measures of the match lists of each of these classes. This step indicates which classes have relatively high matching scores and thus a higher probability of being an ‘object of interest’. Finally, a compound query is performed on the two classes of the image with the highest matching scores.
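Finally, a sketch of the fully automated mode, reusing `simple_query` and `compound_query` from above; the 10% threshold, top-10 sum and top-two selection follow the description in the text.

```python
# Fully automated mode: score candidate classes, then compound-query the best two.
def automated_query(image_classes, db_rows):
    """image_classes: ClassDesc for every class of the query image."""
    candidates = [c for c in image_classes if c.size_frac >= 0.10]

    # Score each candidate by the sum of its top-10 similarities.
    def class_score(c):
        sims = sorted(simple_query(c, db_rows).values(), reverse=True)
        return sum(sims[:10])

    top_two = sorted(candidates, key=class_score, reverse=True)[:2]
    # Compound query on the two best classes, weighted equally.
    return compound_query(top_two, weights=[1.0, 1.0], db_rows=db_rows)
```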

Results

We have performed a variety of queries on our small image database testbed for both the semi-automated and the fully automated modes. Preliminary results are encouraging, as shown in [Fig-3] and [Fig-4]. We are currently expanding our image database to include as many varied natural images as possible.
Results were generated using a one-class default query for the zebra and tiger images, while automated retrieval was used for the sunset and autumn scene images, where no interesting foreground objects are present. Retrieval performance is particularly good for the sunset and zebra images, while results for the tiger and autumn scene images are also reasonable.
We also compared the performance of our method with and without the pre-segmentation stage (i.e. in the latter case, querying based on global histograms). For fair comparison, the fully automated mode is used for the approach with the pre-segmentation stage.
[Table-1] compares the image retrieval rates between the two approaches for leopard, bear, sunset and winter scene images. The approach with the pre-segmentation stage performs better for all image categories tested, although the global histogram approach produces reasonable results, especially for sunset images. These results are consistent with our belief that the key to effective CBIR performance lies in the ability to access images at the level of objects.

References

[1] Bouman C. and Shapiro M. (1994) IEEE Transactions on Image Processing, 3(2):162–177.

[2] Burges C. (1998) Data Mining and Knowledge Discovery, 2(2).

[3] Carson C., Thomas M., Belongie S., Hellerstein J. and Malik J. (1999) In Proceedings of the 3rd International Conference on Visual Information Systems.

[4] Fukunaga K. and Hostetler L. (1975) IEEE Transactions on Information Theory, 21:32–40.

[5] Graffigne C., Geman D., Geman S. and Dong P. (1990) IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(7):609–628.

[6] Kam A. and Fitzgerald W. (2000) Proceedings of the 3rd International Conference on Computer Vision, Pattern Recognition and Image Processing (CVPRIP 2000), volume 2, pages 54–57, Atlantic City, New Jersey, USA, February 27-March 3, 2000.

[7] Kam A. and Fitzgerald W. (1999) Proceedings of the 33rd Asilomar Conference on Signals, Systems and Computers, Pacific Grove, California, USA.

[8] Kingsbury N. (1999) Proceedings of the IEEE Conference on Acoustics, Speech and Signal Processing (ICASSP’99), Phoenix, Arizona, USA, SPTM 3.6.  

[9] Smith J. (1997) PhD thesis, Graduate School of Arts and Sciences, Columbia University, USA, February.

[10] Zhang X. and Wandell B. (1998) Color Image Fidelity Metrics Evaluated Using Image Distortion Maps. Signal Processing, 70:201–214.  

Images
Fig. 1- Decomposing an image by segmentation into classes corresponding to ‘objects’
Fig. 2- Typical segmentation maps of images in our database
Fig. 3- Example of a one-class default query: Query image and the selected class (with green borders, top left), with the retrieved images, depicted with the matching class, arranged from highest similarity, from left to right, top to bottom
Fig. 4- Example of an automated retrieval: Query image (with green borders, top left) and the retrieved images, arranged from highest similarity, from left to right, top to bottom
Table 1- Comparison of image retrieval rates between the approach with the pre-segmentation stage and the global histogram approach, for leopard, bear, sunset and winter scene images