CONTENT BASED IMAGE RETRIEVAL THROUGH OBJECT EXTRACTION

PATIL S.M.1*, KOLEKAR U.D.2*
1Department of Information Technology, Bharati Vidyapeeth College of Engineering, Navi Mumbai, MS, India
2Department of Elec. & Telecommunication, Lokmanya Tilak College of Engineering, Navi Mumbai, MS, India
* Corresponding Author : udkolekar@gmail.com

Received : 01-01-2011     Accepted : 12-01-2011     Published : 15-07-2011
Volume : 1     Issue : 1       Pages : 1 - 4
J Pattern Intell 1.1 (2011):1-4

Cite - MLA : PATIL S.M. and KOLEKAR U.D. "CONTENT BASED IMAGE RETRIEVAL THROUGH OBJECT EXTRACTION." Journal of Pattern Intelligence 1.1 (2011):1-4.

Cite - APA : PATIL S.M., KOLEKAR U.D. (2011). CONTENT BASED IMAGE RETRIEVAL THROUGH OBJECT EXTRACTION. Journal of Pattern Intelligence, 1 (1), 1-4.

Cite - Chicago : PATIL S.M. and KOLEKAR U.D. "CONTENT BASED IMAGE RETRIEVAL THROUGH OBJECT EXTRACTION." Journal of Pattern Intelligence 1, no. 1 (2011):1-4.

Copyright : © 2011, PATIL S.M. and KOLEKAR U.D., Published by Bioinfo Publications. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.

Abstract

We propose a content based image retrieval system based on object extraction through image segmentation. A general and powerful multiscale segmentation algorithm automates the segmentation process, the output of which is assigned novel colour and texture descriptors which are both efficient and effective. Query strategies consisting of a semi-automated and a fully automated mode are developed and shown to produce good results. We then show the superiority of our approach over the global histogram approach, which demonstrates that the ability to access images at the level of objects is essential for CBIR.

Introduction

Image retrieval has traditionally been based on manually inserted captions describing the scene, which can then be searched using keywords. Caption insertion is a very subjective procedure and quickly becomes extremely tedious and time consuming, especially for large image databases, which are becoming ever more common with the growing availability of digital cameras and scanners. There is thus an urgent need for effective content-based image retrieval (CBIR) systems.
We believe the key to effective CBIR performance lies in the ability to access the image at the level of objects. This is because users generally want to search for images containing particular object(s) of interest, and thus the ability to represent, index and query images at the level of objects is critical [3]. In this paper, we present a framework for CBIR based on unsupervised segmentation of images into classes and querying using properties of these classes. As these segmented classes are homogeneous in some sense (in our case, colour and texture), they correlate well with the identity of objects. By decomposing images as combinations of objects in this manner, querying becomes more meaningful and intuitive than it is with global image properties. This is obviously true for images with distinct foreground objects, but the rationale also holds for ‘background’ images where no interesting foreground objects are present. Images belonging to the latter category can be thought of as consisting of combinations of classes with homogeneous colour and texture (for example, images of the seaside generally consist of the beach and the sea, images of sunset scenes generally consist of the reddish sky and dark silhouettes, and so forth) and querying is made more effective by being based on these class combinations which characterise the scene.
In our CBIR implementation, images are first segmented based on joint colour and textural features using our previously developed unsupervised multiscale segmentation algorithm [6,7]. The segmentation process is completely unsupervised and performed off-line for each image. Following this, we represent each image using effective and compact colour and textural descriptors of its classes. We then structure the descriptor database following a relational model which allows its implementation on powerful relational database engines. Class attribute queries are processed using a parallel strategy which results in significant speed-up in the retrieval process if parallel processor machines are used.
We first briefly describe the segmentation algorithm employed. We then discuss the descriptors assigned to each class, and finally present our query strategy as well as preliminary results from queries on our image database testbed consisting of various natural images.

Unsupervised Segmentation

Our unsupervised segmentation algorithm involves the following steps:
1. Normalised colour and texture features (three for colour and two for texture) are mapped to a multidimensional feature space. Spatial information is incorporated into the process by including spatial features in the feature space. The colour space used is S-CIE L*a*b*, the spatial extension of the perceptually uniform CIE L*a*b*, originally developed by Zhang and Wandell [10]. This colour space takes into account the appearance of fine-patterned colours to the human visual system. Textural features, meanwhile, are generated by taking the logarithm of the energies of the 2-D complex wavelet coefficients [8] and retaining the top two principal components.
2. Significant features which correspond to clusters in the feature space are assumed to be representations of underlying classes, the recovery of which is achieved using the mean shift procedure [4], a robust kernel based decomposition method (see the sketch below). The kernel size used was fixed for all images, resulting in a decomposition into an appropriate number of classes for each image.
3. With the number of classes and the properties of each class determined via step 2, a Bayesian multiscale processing approach, which models the inherent uncertainty in the joint specification of class and position spaces using the Multiscale Random Field model [1], is used for the subsequent classification process.

Typical segmentation maps of images in our database are shown in [Fig-2].
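As a concrete illustration of the clustering in step 2, the following Python sketch maps each pixel to a joint colour and spatial feature space and recovers classes with scikit-learn's MeanShift. The feature set (plain CIE L*a*b* rather than S-CIE L*a*b*, and no wavelet texture channels), the spatial scaling, the bandwidth and the subsampling are all assumptions for the example, not the configuration used in the paper.

```python
# Illustrative mean shift decomposition of a colour + spatial feature space.
import numpy as np
from skimage import io, color
from sklearn.cluster import MeanShift

def segment_classes(path, bandwidth=12.0, subsample=20):
    rgb = io.imread(path)
    lab = color.rgb2lab(rgb)              # three colour features per pixel
    h, w, _ = lab.shape
    ys, xs = np.mgrid[0:h, 0:w]
    feats = np.column_stack([
        lab.reshape(-1, 3),
        0.5 * ys.reshape(-1, 1),          # spatial features, scaled (assumed)
        0.5 * xs.reshape(-1, 1),
    ])
    ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)  # fixed kernel size
    ms.fit(feats[::subsample])            # fit on a subsample for speed
    labels = ms.predict(feats)            # then label every pixel
    return labels.reshape(h, w)           # segmentation map
```

Because the bandwidth is held fixed, the number of recovered classes adapts to each image, mirroring the fixed kernel size described above.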

Describing the Classes

Once an image has been segmented, we proceed to extract a description of each class, with the set of class descriptions constituting a description of the image. A class descriptor has to embody the class characteristics (which typically translates to representing a particular object) in an effective fashion to facilitate efficient indexing and accurate retrieval. Designing an effective class descriptor is thus more difficult than designing feature extractors for segmentation, and the two should be seen as separate processes.

Colour Descriptors

In order to represent the colour distribution of each class, we store the colour histograms of the pixels of the class. Each histogram is based on bins of width 10 in each dimension of the S-CIE L*a*b* colour space. This spacing yields 10 bins in the L* dimension and 40 bins in each of the a* and b* dimensions, for a total of 90 numbers as colour descriptors.
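A minimal sketch of this descriptor follows. The channel ranges (L* in [0, 100], a* and b* in [-200, 200]) are assumptions chosen only to reproduce the 10 + 40 + 40 = 90 bins quoted above; the paper does not state its bin ranges.

```python
# Per-class colour descriptor: one normalised histogram per Lab channel.
import numpy as np

def colour_descriptor(lab_pixels):
    """lab_pixels: (n, 3) array of one class's pixels in L*a*b*."""
    bin_edges = [
        np.arange(0, 101, 10),        # L*: 10 bins
        np.arange(-200, 201, 10),     # a*: 40 bins (range is an assumption)
        np.arange(-200, 201, 10),     # b*: 40 bins (range is an assumption)
    ]
    n = max(len(lab_pixels), 1)
    hists = [np.histogram(lab_pixels[:, ch], bins=edges)[0] / n
             for ch, edges in enumerate(bin_edges)]
    return np.concatenate(hists)      # 10 + 40 + 40 = 90 numbers
```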
To evaluate the dissimilarity between the colour histograms of two classes/objects, we apply the Kolmogorov-Smirnov (K-S) distance, as originally proposed in [5]. The K-S distance essentially measures the difference between two probability distribution functions. If F1(k) and F2(k) are two independent sample distribution functions (i.e. cumulative histograms) defined such that

$$F_j(k) = \frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\{x_{j,i} \le k\}, \qquad j = 1, 2,$$

where n is the number of data samples, so that 1 ≤ i ≤ n, then the K-S distance is the maximum difference between the two distributions over all k:

$$D_{KS}(F_1, F_2) = \max_{k}\,|F_1(k) - F_2(k)|.$$
The overall colour dissimilarity measure between two classes with colour histograms x_COL and y_COL is taken to be the root mean square of the K-S distances of each of the L*, a* and b* histograms:

$$d_{COL}(x_{COL}, y_{COL}) = \sqrt{\tfrac{1}{3}\left(D_{KS,L^*}^2 + D_{KS,a^*}^2 + D_{KS,b^*}^2\right)}.$$

As K-S distances lie between 0 and 1, the colour similarity measure, s_COL(x_COL, y_COL), is simply taken as:

$$s_{COL}(x_{COL}, y_{COL}) = 1 - d_{COL}(x_{COL}, y_{COL}).$$
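The colour matching step can be sketched as follows: the K-S distance is the maximum absolute difference between the cumulative sums of two normalised histograms, and the three per-channel distances are combined by root mean square.

```python
# Colour similarity between two 90-number descriptors via K-S distances.
import numpy as np

def ks_distance(hist1, hist2):
    """K-S distance between two normalised histograms on the same bins."""
    return np.max(np.abs(np.cumsum(hist1) - np.cumsum(hist2)))

def colour_similarity(x_col, y_col):
    """x_col, y_col: 90-number descriptors, split into L*, a*, b* parts."""
    parts_x = np.split(x_col, [10, 50])
    parts_y = np.split(y_col, [10, 50])
    dists = [ks_distance(hx, hy) for hx, hy in zip(parts_x, parts_y)]
    d_col = np.sqrt(np.mean(np.square(dists)))  # RMS over the three channels
    return 1.0 - d_col                          # similarity = 1 - dissimilarity
```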

Texture Descriptors

For each class, texture is described by the distribution of the magnitudes of its complex wavelet coefficients, f(x_TEX). Using four levels of the 2-D complex wavelet transform (which yields six complex subbands at every level) produces a total of 24 subbands; the magnitude of each is converted into a histogram and modelled as a generalised Rayleigh distribution

$$f(x) = \frac{\beta\,x^{\beta-1}}{\sigma^{\beta}}\exp\!\left(-\left(\frac{x}{\sigma}\right)^{\beta}\right), \qquad x \ge 0,$$

where σ and β are chosen to achieve the same mean and variance as the input sample distribution.

Thus, for each class, the generalised Rayleigh model parameters σi and βi are calculated for each of the 24 histograms, for a total of 48 numbers as texture descriptors. To compute the texture dissimilarity between two classes, we:

1. Generate probability distribution functions for each of the 24 subbands of each class using the stored values of σi and βi.
2. Apply the K-S distance between histograms corresponding to the same subband.
3. Calculate the overall texture dissimilarity measure as the root mean square of all the K-S distances.

The final texture similarity measure is given by subtracting the dissimilarity measure from unity. A sketch of this procedure follows.
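The sketch below illustrates the procedure under two assumptions: the generalised Rayleigh model is approximated by a two-parameter Weibull distribution, and scipy's maximum-likelihood fit stands in for the paper's moment matching.

```python
# Texture descriptor (fit per subband) and texture similarity (K-S + RMS).
import numpy as np
from scipy.stats import weibull_min

def texture_descriptor(subband_mags):
    """subband_mags: 24 arrays of complex wavelet coefficient magnitudes."""
    params = []
    for mags in subband_mags:
        beta, _, sigma = weibull_min.fit(mags, floc=0)   # shape, loc, scale
        params.append((sigma, beta))
    return params                                        # 48 numbers in total

def texture_similarity(params_x, params_y):
    grid = np.linspace(0.0, 10.0, 512)                   # evaluation grid (assumed)
    dists = []
    for (sx, bx), (sy, by) in zip(params_x, params_y):
        cdf_x = weibull_min.cdf(grid, bx, scale=sx)      # regenerated distribution
        cdf_y = weibull_min.cdf(grid, by, scale=sy)
        dists.append(np.max(np.abs(cdf_x - cdf_y)))      # K-S per subband
    d_tex = np.sqrt(np.mean(np.square(dists)))           # RMS over 24 subbands
    return 1.0 - d_tex                                   # similarity from unity
```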

Image Retrieval

The class descriptor database is structured using a relational model. This allows it to be implemented on powerful commercial relational database engines, and queries and retrieval to be described using SQL's Select and Join operations [9]. For example, as the first step, descriptors of particular classes of an image can be extracted from the database using a simple Select operation, as sketched below.
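A hypothetical sketch of such a layout, using Python's built-in sqlite3 in place of a commercial engine; the table and column names are invented for the example and are not the authors' schema.

```python
# Hypothetical relational layout for the class-descriptor database.
import sqlite3

conn = sqlite3.connect("descriptors.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS class_descriptors (
        image_id  INTEGER,
        class_id  INTEGER,
        size_frac REAL,   -- class size as a fraction of the image
        colour    BLOB,   -- the 90-number colour histogram
        texture   BLOB,   -- the 48 generalised Rayleigh parameters
        PRIMARY KEY (image_id, class_id)
    )
""")

# First step of a query: a simple Select pulls the descriptors of the
# queried class out of the database.
query_image_id, query_class_id = 1, 0
row = conn.execute(
    "SELECT colour, texture FROM class_descriptors"
    " WHERE image_id = ? AND class_id = ?",
    (query_image_id, query_class_id),
).fetchone()
```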

Querying Strategy

There are two modes of operation for our image retrieval system: a semi-automated mode and a fully automated mode. In the semi-automated mode, the user composes a query by submitting an image and, on seeing the segmentation map, selects the class or classes to match. There is also an option of selecting the relative importance of the classes (should there be more than one in the query composition); by default, all classes in a query are considered equally important.
All ‘compound’ queries, i.e. queries based on more than one class, are first decomposed into ‘simple’ queries, i.e. queries based on a single class. The similarity match for each simple query is calculated as follows (a sketch combining the earlier pieces appears after the list):
1. Colour and texture descriptors for the queried class are retrieved from the descriptor database.
2. The similarity measures for colour and texture are computed for classes in the database whose sizes (specified as a fraction of the image) are at least 25% of the size of the queried class.
3. The overall similarity measure is taken to be the weighted combination of the colour and texture similarity measures, with the weights set by the user. By default, colour and texture similarities are weighted equally.
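Drawing the earlier sketches together, a simple query might look as follows; `ClassDesc`, the database row format and the reuse of `colour_similarity` and `texture_similarity` from the previous sketches are our assumptions.

```python
# One simple query: size filter, then weighted colour/texture similarity.
from collections import namedtuple

# A class and its descriptors (names are ours, not the paper's).
ClassDesc = namedtuple("ClassDesc", "size_frac colour texture")

def simple_query(query, db_rows, w_colour=0.5, w_texture=0.5):
    """db_rows: iterable of (image_id, ClassDesc) pairs from the database."""
    results = {}
    for image_id, cand in db_rows:
        if cand.size_frac < 0.25 * query.size_frac:      # size filter (step 2)
            continue
        sim = (w_colour * colour_similarity(query.colour, cand.colour)
               + w_texture * texture_similarity(query.texture, cand.texture))
        # Keep the best-matching class per image.
        results[image_id] = max(sim, results.get(image_id, 0.0))
    return results
```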
The SQL Join operation on the simple queries' match lists will obtain the set of common images, with the best match maximising the similarity measures, weighted according to their relative importance. As simple queries can be processed in parallel, significant speed-up in the retrieval process is possible with parallel processor machines.
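A sketch of the compound query under the same assumptions, with the simple queries farmed out to a process pool to illustrate the parallel speed-up (the in-memory intersection stands in for the SQL Join):

```python
# Compound query: parallel simple queries, joined and ranked by weighted sum.
from concurrent.futures import ProcessPoolExecutor

def compound_query(query_classes, weights, db_rows):
    """query_classes: ClassDesc per queried class; weights: their importance."""
    with ProcessPoolExecutor() as pool:
        match_lists = list(pool.map(simple_query, query_classes,
                                    [db_rows] * len(query_classes)))
    # Join: keep only images matched by every simple query.
    common = set.intersection(*(set(m) for m in match_lists))
    scores = {img: sum(w * m[img] for w, m in zip(weights, match_lists))
              for img in common}
    return sorted(scores, key=scores.get, reverse=True)   # best match first
```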
In the fully automated mode, the user has only to submit a query image and the algorithm handles the rest. In this case, we first perform simple queries on the classes of the image which constitute at least 10% of the image. In the absence of a theoretical foundation for determining the relative importance of the classes, we simply sum the top 10 similarity measures of the match lists of each of these classes. This step indicates which classes have relatively high matching scores and thus a higher probability of being an ‘object of interest’. Finally, a compound query is performed on the two classes of the image with the highest matching scores.
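Finally, a sketch of the fully automated mode, reusing `simple_query` and `compound_query` from above; the 10% threshold, top-10 sum and top-two selection follow the description in the text.

```python
# Fully automated mode: score candidate classes, then compound-query the best two.
def automated_query(image_classes, db_rows):
    """image_classes: ClassDesc for every class of the query image."""
    candidates = [c for c in image_classes if c.size_frac >= 0.10]

    # Score each candidate by the sum of its top-10 similarities.
    def class_score(c):
        sims = sorted(simple_query(c, db_rows).values(), reverse=True)
        return sum(sims[:10])

    top_two = sorted(candidates, key=class_score, reverse=True)[:2]
    # Compound query on the two best classes, weighted equally.
    return compound_query(top_two, weights=[1.0, 1.0], db_rows=db_rows)
```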

Results

We have performed a variety of queries on our small image database testbed for both the semi-automated and the fully automated modes. Preliminary results are encouraging, as shown in [Fig-3] and [Fig-4]. We are currently expanding our image database to include as many varied natural images as possible.
Results were generated using a one-class default query for the zebra and tiger images, while automated retrieval was used for the sunset and autumn scene images, where no interesting foreground objects are present. Retrieval performance is particularly good for the sunset and zebra images, while results for the tiger and autumn scene images are also reasonable.
We also compared the performance of our method with and without the pre-segmentation stage (i.e. in the latter case, querying based on global histograms). For fair comparison, the fully automated mode is used for the approach with the pre-segmentation stage.
[Table-1] compares the image retrieval rates between the two approaches for leopard, bear, sunset and winter scene images. The approach with the pre-segmentation stage performs better for all image categories tested, although the global histogram approach produces reasonable results, especially for sunset images. These results are consistent with our belief that the key to effective CBIR performance lies in the ability to access images at the level of objects.

References

[1] Bouman C. and Shapiro M. (1994) IEEE Transactions on Image Processing, 3(2):162–177.

[2] Burges C. (1998) Data Mining and Knowledge Discovery, 2(2).

[3] Carson C., Thomas M., Belongie S., Hellerstein J. and Malik J. (1999) In Proceedings of the 3rd International Conference on Visual Information Systems.

[4] Fukunaga K. and Hostetler L. (1975) IEEE Transactions on Information Theory, 21:32–40.

[5] Graffigne C., Geman D., Geman S. and Dong P. (1990) IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(7):609–628.

[6] Kam A. and Fitzgerald W. (2000) Proceedings of the 3rd International Conference on Computer Vision, Pattern Recognition and Image Processing (CVPRIP 2000), volume 2, pages 54–57, Atlantic City, New Jersey, USA, February 27-March 3, 2000.

[7] Kam A. and Fitzgerald W. (1999) Proceedings of the 33rd Asilomar Conference on Signals, Systems and Computers, Pacific Grove, California, USA.

[8] Kingsbury N. (1999) Proceedings of the IEEE Conference on Acoustics, Speech and Signal Processing (ICASSP’99), Phoenix, Arizona, USA, SPTM 3.6.  

[9] Smith J. (1997) PhD thesis, Graduate School of Arts and Sciences, Columbia University, USA, February.

[10] Zhang X. and Wandell B. (1998) Color Image Fidelity Metrics Evaluated Using Image Distortion Maps. Signal Processing, 70:201–214.  

Images
Fig. 1- Decomposing an image by segmentation into classes corresponding to ‘objects’
Fig. 2- Typical segmentation maps of images in our database
Fig. 3- Example of a one-class default query: Query image and the selected class (with green borders, top left), with the retrieved images, depicted with the matching class, arranged from highest similarity, from left to right, top to bottom
Fig. 4- Example of an automated retrieval: Query image (with green borders, top left) and the retrieved images, arranged from highest similarity, from left to right, top to bottom
Table 1- Comparison of image retrieval rates between the approach with the pre-segmentation stage and the global histogram approach, for leopard, bear, sunset and winter scene images