Image Storage and Retrieval in OpenSolaris
Antonin Brjetchka - lead student researcher
Sanjiv K. Bhatia - faculty advisor
Drew Garrett, Dante Avery, Prasad Raghavendra - student researchers
University of Missouri – Saint Louis, St. Louis, MO, U.S.A.
Final Report [PDF]
Second Progress Report [PDF]
First Progress Report [PDF]
Statement of Purpose
In this proposal, we plan to develop a utility that can be used to store and organize image files in OpenSolaris. Our aim is to expand the utilization of OpenSolaris by providing adequate means to store a large number of image files and to retrieve them quickly and accurately. Our faculty advisor, Dr. Sanjiv Bhatia, and one of his graduate students, Debangshu Goswami, have developed a content-based image retrieval (CBIR) system named Robust Image Search Engine, or RISE. RISE has been developed on Sun’s hardware using Oracle and Java Advanced Imaging (JAI) tools. RISE is effective in retrieving images that are similar to a given image. However, it is limited in that it has no facility for image annotations and hence cannot support text-based image retrieval (TBIR). We will adapt the implementation of RISE using open source tools such as ImageMagick and MySQL to provide both CBIR and TBIR capabilities.
Background
With the increasing popularity and mobility of digital imaging devices, such as cameras, scanners, home entertainment systems, and cell phones, there is an equally increasing demand for building, storing, and querying large image databases. This task becomes increasingly complex as millions of internet users generate image files at an exponential rate. The currently dominant image retrieval techniques use human language to describe the contents of an image, an approach known as text-based image retrieval (TBIR). The drawbacks of this technique are poor accuracy and the retrieval of possibly unrelated images. These drawbacks are due to the varying perception of a single image by different individuals, and the existence of different ways to describe a single image in human language. While humans can easily discern objects in an image, they are unable to describe its full semantic content in an unambiguous language that can be tokenized for automatic indexing and retrieval [2]. This problem ensues from the fact that some image content that is important to one person may not be relevant to another. Furthermore, some content of the image may even be left out, which further increases the margin of error. Finally, this method is severely limited because it is time intensive and requires a human to annotate images, and hence is impractical for large databases [4]. Attempts to solve this problem led to the development of a technique known as content-based image retrieval (CBIR).
One of the first CBIR systems resulted from the QBIC (Query by Image Content) project at IBM [3]. QBIC pioneered the idea of using a probability distribution function to describe color in the images. Later work in CBIR concentrated on creating indices to describe the images that improved the efficiency and effectiveness of the retrieval process.
There are two query methods in CBIR: query-by-example and query-by-memory [1]. Query-by-example allows the user to select an image to be used for the query. The system then retrieves images that have a similar probability distribution function for color as the query image. In query-by-memory, the user may select one or more image features, such as texture, color, and shape, from memory to be used in the query. RISE was developed as a CBIR system that provides an image repository with an interface for querying and maintaining an image database. However, RISE completely ignored the TBIR aspect of retrieval, which may be of tremendous use for query and retrieval in a real application. RISE has a web-based interface that can be extended to offer the capabilities of both CBIR and TBIR systems. Having both methods of image retrieval will further increase the usability and scalability of the system.
The TBIR system of RISE will allow the user to specify descriptive tags as images are inserted into the database. These tags may later be used to query the database, providing functionality similar to popular websites such as Google and Flickr. The user will be able to select the method of image retrieval; options will be provided for TBIR, CBIR, or a combination of both methods. These options will enable the user to narrow or widen the range of results. The use of TBIR will yield a wider, less accurate range of results, while using both techniques in a single query will increase result accuracy. By using TBIR and CBIR together, images that have similar color attributes but contain undesirable tags, or contain matching tags but different color attributes, will be omitted, thus producing more desirable results.
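The combined query described above can be illustrated with a minimal C sketch. It is not the RISE implementation: the `Candidate` structure, the `filter_combined` function, and the idea of a single distance threshold are all assumptions made for illustration. A candidate is kept only if it carries the requested tag and its color signature is close enough to the query image.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one image returned by the database. */
typedef struct {
    const char *name;
    const char *const *tags;   /* NULL-terminated list of tag strings */
    double color_distance;     /* distance between this image's signature
                                  and the query image's signature */
} Candidate;

/* TBIR test: does the candidate carry the requested tag? */
bool has_tag(const Candidate *c, const char *tag)
{
    for (const char *const *t = c->tags; *t != NULL; t++) {
        if (strcmp(*t, tag) == 0)
            return true;
    }
    return false;
}

/* Combined TBIR + CBIR filter (illustrative only).
 * Keeps the candidates that match the tag AND lie within max_distance
 * of the query image. The array is compacted in place, preserving order,
 * and the number of kept candidates is returned. */
size_t filter_combined(Candidate *cands, size_t n,
                       const char *tag, double max_distance)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (has_tag(&cands[i], tag) &&
            cands[i].color_distance <= max_distance) {
            cands[kept++] = cands[i];
        }
    }
    return kept;
}
```

With this filter, an image tagged "beach" whose colors differ strongly from the query is dropped, as is an image with similar colors but no "beach" tag. Only images satisfying both conditions remain.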
RISE uses the distribution of colors in different regions of the image to compute its signature at multiple resolutions. The signature of an image in the database is computed by systematically dividing the image into a set of small blocks of pixels and then computing the average color of each block [4]. This is based on the Discrete Cosine Transform (DCT) that forms the basis of the popular JPEG image file format [4]. In RISE, this process can be applied to any image format, which allows for great flexibility of the system. RISE subdivides an image following a quadtree structure [7], computes the average color of each pixel block within the subdivided image, and stores the computed averages in the quadtree to form the signature of the image. RISE extracts the RGB data of an image and converts it into the L*a*b* color space [6]; this provides a uniform way of comparing images of different file formats.
RISE uses the L*a*b* color space for its perceptual linearity. The L*a*b* color model was developed by the CIE (Commission Internationale d’Eclairage) to describe colors as perceived by the human eye [4]. This model represents colors in a three-dimensional space. The vertical axis corresponds to the luminance channel L*, the contrast between green and red is represented by the axis a*, and the contrast between yellow and blue is represented by the axis b*. This model was designed to be device independent and to be used as a reference. It provides a linear response with respect to human visual perception [4]. The conversion from RGB to the L*a*b* color space is based on the Rec. 709 standard [5]. The conversion goes through CIE XYZ, a set of three linear light components that embed the spectral properties of human color perception.
In RISE, each image is resized to 512 x 512 pixels, and the resized image is divided into 8x8 blocks. We decided on the 8x8 pixel block in accordance with JPEG compression, where each 8x8 block is passed through the Discrete Cosine Transform to separate the frequencies. The DC value of the block, at location (0,0), gives us the average pixel value in the block, up to a constant scale factor. The average color values of four 8x8 blocks, which together make up a 16x16 block, represent leaf nodes in the quadtree, and each parent node contains the average of the values of its child nodes; thus the root node contains the average value for the whole image. This setup allows two images to be compared at different levels, so that a degree of similarity may be determined. Using a commercial relational database management system (RDBMS) to store and query signatures of images improves the efficiency of the system [4]. The signatures of images are stored in a table in the RDBMS. The columns of the table include the name of the image and the average L*a*b* values at each level of the quadtree structure. We plan to extend RISE by adding attributes in the RDBMS to store tokens to be used by the TBIR system. Using Structured Query Language (SQL) allows the system to specify the level of comparison of two images. In addition, an RDBMS provides an index for faster access to data. An index contains an entry for each value that appears in the indexed column(s) of the relation and provides direct, fast access to rows [4].
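The averaging scheme above can be sketched in C for a single color channel. For a 512 x 512 image, the 8x8 blocks form a 64 x 64 grid of averages; averaging 2x2 groups repeatedly gives grids of 32 x 32 (the 16x16-pixel leaves), 16 x 16, and so on down to the 1 x 1 root. The function names below are hypothetical, and the mean absolute difference used to compare two levels is an assumption made for illustration; the source does not specify the distance measure RISE uses.

```c
#include <math.h>

/* Average each block x block tile of a size x size single-channel image
 * stored in row-major order. out receives (size/block) * (size/block)
 * averages, also in row-major order. size must be a multiple of block. */
void block_averages(const float *img, int size, int block, float *out)
{
    int n = size / block;
    for (int by = 0; by < n; by++) {
        for (int bx = 0; bx < n; bx++) {
            double sum = 0.0;
            for (int y = 0; y < block; y++) {
                for (int x = 0; x < block; x++)
                    sum += img[(by * block + y) * size + (bx * block + x)];
            }
            out[by * n + bx] = (float)(sum / (block * block));
        }
    }
}

/* Build the next level up in the quadtree: each parent value is the mean
 * of its 2x2 children. child is n x n (n even); parent receives
 * (n/2) x (n/2) values. */
void reduce_level(const float *child, int n, float *parent)
{
    int half = n / 2;
    for (int y = 0; y < half; y++) {
        for (int x = 0; x < half; x++) {
            const float *c = &child[(2 * y) * n + 2 * x];
            parent[y * half + x] = (c[0] + c[1] + c[n] + c[n + 1]) / 4.0f;
        }
    }
}

/* Assumed similarity measure: mean absolute difference between two
 * n x n levels of the same depth. Smaller means more similar. */
double level_distance(const float *a, const float *b, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n * n; i++)
        sum += fabs((double)a[i] - (double)b[i]);
    return sum / (n * n);
}
```

Calling `block_averages` once with a block size of 8 and then `reduce_level` repeatedly until a single value remains yields one value per quadtree node for that channel. Comparing two images at a coarse level is cheap, and finer levels can be consulted only for candidates that pass the coarse comparison.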
Approach
RISE will be implemented on OpenSolaris in C, for its speed and efficiency. In addition, MySQL and ImageMagick will be used in this project.
MySQL is an open source and widely available RDBMS. Its speed, reliability, and ease of use make it well suited to this project. MySQL will be used to store and maintain image signatures, and to provide fast and efficient search and retrieval of the images. This project will take advantage of the ability of MySQL to be embedded into applications; this feature will eliminate the complexity of creating a separate database in order to use RISE. Embedded MySQL eliminates the administration of a stand-alone database server, and provides a fully featured database with a very small footprint. Furthermore, embedding MySQL into the application also eliminates the overhead of establishing a connection and of client-server communication.
ImageMagick will be used for image-specific operations; this will additionally improve system efficiency. ImageMagick will give us the capability to read multiple image formats; in the original RISE, this capability came from the use of JAI tools. MagickWand, a C application program interface (API) of ImageMagick, will be used to convert images and calculate their average values. Image processing will be fast and efficient, since MagickWand is already optimized to perform these operations. This will also allow us to concentrate on developing the algorithm, rather than on manipulating the images. MagickWand provides a wide variety of functions to retrieve information about an image; these functions will allow us to fine-tune the signature calculation process and increase the overall accuracy and performance of the system.
End Result
At the end of this project, we expect to have an application that provides the ability to store, search, and retrieve images through a web-based graphical user interface (GUI). In addition, this implementation of RISE will provide both TBIR and CBIR methods of image retrieval. This will allow the user to describe the content to be searched for or to specify an example image.
References:
[1] E. L. van den Broek, P. M. Kisters, and L. G. Vuurpijl. Design guidelines for a content-based image retrieval color-selection interface. In Proceedings of the Conference on Dutch Directions in HCI, Amsterdam, Holland, 2004.
[2] S. Climer and S. K. Bhatia. Image database indexing using JPEG coefficients. Pattern Recognition, 35(11):2479–2488, November 2002.
[3] M. Flickner, et al. Query by Image Content: The QBIC System. IEEE Computer, 28(9):23–32, September 1995.
[4] D. Goswami, S. K. Bhatia, and A. Samal. RISE: A Robust Image Search Engine. Pattern Recognition Theory and Applications. E. A. Zoeller (ed.). Nova Science Publishers, 2007.
[5] ITU-R Recommendation BT.709. Basic parameter values for the HDTV standard for the studio and for international programme exchange. Technical Report BT.709 [formerly CCIR Rec. 709], International Telecommunications Union, 1211 Geneva 20, Switzerland, 1993.
[6] C. Poynton. A guided tour of color space. In New Foundations for Video Technology: Proceedings of the SMPTE Advanced Television and Electronic Imaging Conference, San Francisco, CA, February 1995, pages 167–180.
[7] H. Samet. The quadtree and related hierarchical data structures. ACM Computing Surveys, 16(2):187–260, June 1984.