Evaluation of Image Retrieval Systems: Role of User Feedback


Samantha K. Hastings


INTELLECTUAL ACCESS TO A GROWING NUMBER OF NETWORKED image repositories is but a small part of the much larger problem of intellectual access to new information formats. As more and more information becomes available in digital formats, it is imperative that we understand how people retrieve and use images. Several studies have investigated how users search for images, but there are few evaluation studies of image retrieval systems. Preliminary findings from research in progress indicate a need for improved browsing tools, image manipulation software, feedback mechanisms, and query analysis. Comparisons are made to previous research results from a study of intellectual access to digital art images. This discussion will focus on the problems of image retrieval identified in current research projects, report on an evaluation project in process, and propose a framework for evaluation studies of image retrieval systems that emphasizes the role of user feedback.


Problems with the retrieval of images are complicated by a lack of knowledge of how people search for, and use, images. There is a proliferation of image databases available on servers connected to the Web. As the number of available images increases, it becomes more difficult to find the image that meets a specific information need. In addition, many of the documents that are being converted into electronic formats contain images. Traditional retrieval and indexing methods for providing access to large text databases do not offer adequate access to the images. Text retrieval research has a history of several thousand years. Retrieval research for images has been going on for approximately ten years, and we are just now beginning to examine the content of images instead of viewing them as black boxes described by textual descriptors.

The differences between text and images necessitate that research in retrieval techniques for images begin with an understanding of how people search for images, how images are indexed, how images are used, what input users provide, and what manipulations of the images are needed for specific tasks. When the focus is narrowed to digital art images, the problem is even more complex because there are queries of art that are not specific or dependent on content. The investigation of intellectual access to art images is a small piece of the retrieval problem, but the nature of how people search art images reflects the difficulty of the problem. This is not just an indexing problem; sophisticated technology does not solve it, and pattern-matching algorithms seem to work only with known-item searches.


The major problems with the retrieval of digital images may be divided into four main categories: technical, semantic, content, and relativity. Technical problems include load time and bandwidth, lack of standard formats, color match systems, the size of image files in general, compression losses, and resolution variables. Most of these technical issues are capable of being resolved (Lynch, 1991; Besser & Trant, 1995). If we assume that bandwidth will increase, compression algorithms will improve, color match systems will be standardized, and needed resolutions will become available, then these technical problems should not consume us in the investigation of intellectual access to digital art images.

Semantic or concept-based problems deal with image retrieval terminology. Controlled vocabularies and standards to enable uniform access are used for concept-based indexing and retrieval. Projects such as the Art and Architecture Thesaurus, ICONCLASS, The Thesaurus for Graphic Materials (TGM), The Consortium for the Computer Interchange of Museum Information (CIMI), The Art Museum Image Consortium (AMICO), and many European projects attempt to standardize the language and retrieval mechanisms used to search for images (Barnett & Petersen, 1989; Busch, 1992; Moen, 1998).

We know that terms contained in a user’s query are important indicators for indexed retrieval of images (Enser, 1995; Armitage & Enser, 1997; Jorgensen, 1996). Natural language searching is also investigated in a hypermedia environment with information in text nodes connected to an image for generation of a descriptor for the image (Dunlop & Van Rijsbergen, 1993). However, it is clear that using text to index a nontextual medium leaves much to be desired. Enser (1995) states “linguistic identifiers, in the form of indexing terms, titles and captions, attached to images within a collection offer little promise as an effective pictorial information retrieval procedure” (p. 156).

Content-based issues in the retrieval of images are the current focus of at least twenty research groups (Gupta, Santini, & Jain, 1997). Early research by Rorvig (1990) suggests that users presented with an image do not require textual descriptions. Content research includes systems that automatically identify and extract one or more of the following image attributes: color, shape, texture, spatial similarity, and text contained in an image. For example, Lunin (1994) presents a solid case for the use of texture for automated retrieval of fabric designs. Gupta, Santini, and Jain (1997) give a detailed discussion of the capabilities of content-based image retrieval systems, and Gudivada and Raghavan (1995) provide an excellent overview. Examples of such systems include Query by Image Content (QBIC), ART MUSEUM using Query by Visual Example (QVE), CORE, the Chabot project from UC Berkeley, Virage for multimedia management, and Photobook. In addition to the work being conducted with still images, Goodrum (1997) and Turner (1995) have both looked at automatic indexing for video and moving images. Recently, Turner (1998) used closed-captioning as a source for index terms in the retrieval of moving images.
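To make the idea of attribute extraction concrete, the following minimal sketch compares images by color alone, using quantized color histograms and histogram intersection. It is an illustration only and is not the method of any system named above; the function names, the four-level quantization, and the toy pixel data are assumptions introduced here.

```python
from collections import Counter


def color_histogram(pixels, levels=4):
    """Quantize each RGB channel into `levels` bins; return normalized bin frequencies."""
    if not pixels:
        return {}
    step = 256 // levels
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    return {bin_: n / total for bin_, n in counts.items()}


def histogram_intersection(h1, h2):
    """Sum of per-bin minimums; 1.0 means identical color distributions."""
    return sum(min(share, h2.get(bin_, 0.0)) for bin_, share in h1.items())


# Hypothetical "images" given as flat lists of (R, G, B) pixels.
red_painting = [(250, 10, 10)] * 8 + [(10, 10, 250)] * 2
reddish_painting = [(240, 20, 20)] * 6 + [(10, 250, 10)] * 4
blue_painting = [(10, 10, 250)] * 10

query = color_histogram(red_painting)
scores = {
    "reddish_painting": histogram_intersection(query, color_histogram(reddish_painting)),
    "blue_painting": histogram_intersection(query, color_histogram(blue_painting)),
}
# Candidates ordered from most to least similar in color to the query.
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy case the query shares more of its color distribution with the reddish painting than with the blue one, so the reddish painting ranks first. A measure of this kind captures color only; it carries no information about what an image depicts.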

Of course, there are problems with content-based retrieval systems. For example, a search using the AltaVista search engine (which uses Virage for image retrieval) and limited to photos with the search term “Homer,” retrieves two busts of the Greek Homer, six photos of Homer Simpson, a photo of a Winslow Homer painting, and so on. Most interesting is that when you click on “visually similar images” under a photo of a bust of the Greek Homer, the returns include many curious and questionable images but no other bronze busts or images of Homer.

The last category of problems in the retrieval of digital images deals with relativity issues. Relativity includes problems surrounding the aboutness of an image. Queries that deal with thematic and iconographical concepts or ask “Why is?” are particularly difficult to address in automated image retrieval systems. Shatford (1986) clearly interprets Panofsky’s theory of meaning. Shatford distinguishes Panofsky’s factual and expressional meaning as determining what the picture is of and what it is about. She concludes that, at the iconographical level, an image “cannot be indexed with any degree of consistency” (p. 45).

There are a number of user-centered approaches focused on query analysis and image retrieval tasks presented by Enser (1995), Hastings (1995), Jorgensen (1996), and Keister (1994). More work on user needs and query types in content-based retrieval is needed. Armitage and Enser (1997) continue their work with an additional collection of user queries and a suggested matrix for classification of the query terms based on Panofsky’s categories.

Based on the Jorgensen finding that category use may depend on the task in which a user is involved, Fidel (1997) questions “should the design and evaluation of image databases be guided by the tasks involved in image retrieval?” (p. 186). Using Jorgensen’s attribute classes, Fidel analyzed 100 actual requests from an agency with a large collection of stock photos similar to the one in Enser’s study. Fidel refines the question to whether performance measurements should apply to all retrieval tasks or “does each task require its own measurement?” (p. 186). The summary of searching-behavior characteristics is presented in the categories of data pole and object pole. In the data pole, images provide information, and relevance criteria can often be determined ahead of time. In the object pole, images are objects, relevance criteria are invoked when viewing the images, and browsing the whole answer set is required. Fidel concludes that, for the image-retrieval tasks analyzed in the study, “precision and recall as used for text retrieval might not be adequate tests in image retrieval” (p. 198).
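For reference, the text-retrieval measures that Fidel questions can be computed as in the short sketch below. The image identifiers are hypothetical, and the function is a generic illustration rather than a procedure taken from Fidel's study.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set judged against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical answer set and relevance judgments.
retrieved_ids = ["img01", "img02", "img03", "img04"]
relevant_ids = ["img02", "img04", "img09"]

precision, recall = precision_recall(retrieved_ids, relevant_ids)
```

Both measures presuppose that the relevant set can be fixed before the search. That assumption fits the data pole, where relevance criteria can often be determined ahead of time, but it is harder to satisfy at the object pole, where relevance criteria are invoked only when viewing the images.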

O’Connor (in press) focuses on the users and uses to circumvent some of the difficulties in describing images in words. User generation of captions and verbal responses are gathered from a collection of 300 diverse images. The role of user feedback is highlighted in the belief that indexing must have an active functional quality to be effective (O’Connor, 1994). In addition, O’Connor is investigating the ability of people to rapidly browse many images without the constraints of categorizations. In this “show-me-the-pictures” approach lies great promise for increased retrieval effectiveness. Combined with user-supplied functional captions and responses, some of the problems and challenges inherent in the relativity category of image retrieval may be met.

However, the major problem of intellectual access to digitized images in a networked environment remains largely unsolved (Mostafa, 1994; Rasmussen, 1997). Reliable measures for evaluating image retrieval systems need to be developed or revised from text retrieval methods. We do know that providing surrogate or thumbnail representations of an image for browsing greatly improves access to a collection (Besser, 1990), but we are still unsure when and how to match the need to browse with the retrieval task or query.

Cawkell (1992) points out that co-citation patterns reveal very little communication and collaboration between the content-based and concept-based researchers. Unfortunately, this remains a difficult obstacle in the design and testing of image retrieval systems.


In a previous study of intellectual access to digital art images, all aspects of search and retrieval in an art image database were analyzed (Hastings, 1994). The study investigated how variations in the retrieval parameters and access points affected the queries by art historians when they conduct research using an art image database. Access points include existing information about the collection such as artist, title, provenance, and suggestions from participants for additional access points. Categories of query complexity were compared to image complexities. The current study compares the findings from identified user queries, user-supplied access terms, and retrieval tasks on the Web to previous findings.

For the purposes of the current study, “intellectual access” is defined as the image searcher’s ability to find and use (retrieve) the image that meets a stated need. A “query” consists of either a stated need or an expression of intended use. “Image” is used to represent a surrogate representation of a real painting. The following research questions frame the study:

1. Are there categories of queries that can be met by thumbnail (small surrogate) images?

2. Is there a relationship between queries and manipulation of images?

3. Do queries contain indicators to access points used for the retrieval of images?

4. Are there identifiable categories of images that increase the ability to browse a collection of images?

5. Are there identifiable image manipulations that need to be added to satisfy queries in the networked database of images?


The population of this study is image searchers on the Web. The subset of the population for this study is students in the School of Library and Information Science and the School of Visual Arts at the University of North Texas and members of the Image-L listserv. The selection of the sample within this population subset is based on subject interest (Caribbean paintings) and willingness of the subjects to participate in the study. It must be noted that the sample is self-selected, and sometimes it is not possible to match online survey data with interview data.

The Collection

The images used for this study are of paintings in the Bryant West Indies Collection housed in the Special Collections Department at the Main Library, University of Central Florida. There are sixty-six Caribbean paintings with a special focus on Haitian art. The collection contains paintings acquired from 1965 through 1990. Images of the paintings are stored on a Kodak Photo CD and are the property of the researcher. The images and thumbnails of the paintings are available in JPEG format at the University of North Texas Web site (http://www.unt.edu/Bryantart).


The summer 1997 indexing and abstracting class at the School of Library and Information Science constructed a database of index fields for the digital images of the Bryant Collection of Caribbean Art. Each image record contains a unique image identifier (code), a corresponding thumbnail, and information for each index field. The fields include artist name, working title, index terms, abstracts, dimensions, and assigned categories for content and style. The user can view a high-resolution image of the painting by clicking on the thumbnail from the database template. Thumbnail images are available for browsing by random order (see Figure 1) and by categories of content or style. The project team assigned the categories of content and style.


The index is assembled from controlled vocabularies and terms applied by the project team. The index includes thesaurus terms and is hypertext-linked to the thumbnail templates. In order to collect user-defined terms, a note form is included on each thumbnail template for searchers to add their own terms (see Figure 2). In addition, users are asked to rate the assigned index terms.
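The record structure and feedback features described above might be modeled along the following lines. This is a minimal sketch, not the project's actual database: the class and function names, the sample values, and the numeric rating scale are assumptions, and only a subset of the index fields is included.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ImageRecord:
    code: str                # unique image identifier
    artist: str
    working_title: str
    index_terms: list        # terms assigned by the project team
    content_category: str
    style_category: str
    user_terms: list = field(default_factory=list)    # terms added by searchers
    term_ratings: dict = field(default_factory=dict)  # assigned term -> list of ratings

    def add_user_term(self, term):
        """Store a searcher-supplied term once, normalized to lowercase."""
        term = term.strip().lower()
        if term and term not in self.user_terms:
            self.user_terms.append(term)

    def rate_term(self, term, rating):
        """Record one user's rating of an assigned index term (scale is assumed)."""
        if term not in self.index_terms:
            raise ValueError(f"{term!r} is not an assigned index term")
        self.term_ratings.setdefault(term, []).append(rating)

    def average_rating(self, term):
        ratings = self.term_ratings.get(term)
        return mean(ratings) if ratings else None


def browse_by(records, attribute):
    """Group image codes by a category attribute, e.g. 'content_category'."""
    groups = {}
    for record in records:
        groups.setdefault(getattr(record, attribute), []).append(record.code)
    return groups


# Hypothetical records; artists, titles, terms, and categories are invented.
records = [
    ImageRecord("B001", "Artist A", "Market Day", ["market", "women"], "daily life", "style A"),
    ImageRecord("B002", "Artist B", "Ceremony", ["ritual", "drums"], "religion", "style A"),
]
records[1].add_user_term("  Voodoo ")
records[1].add_user_term("voodoo")  # duplicate after normalization, ignored
records[1].rate_term("ritual", 4)
records[1].rate_term("ritual", 5)

by_content = browse_by(records, "content_category")
```

Keeping user-supplied terms separate from team-assigned terms lets the two vocabularies be compared later instead of being merged at collection time.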


A user survey is available online and responses are sent to an e-mail account. T. J. Russell, research assistant, designed the Web pages. Russell conducted all pilot tests and contributed an integral part to the project. The introductory page for the project is represented in Figure 3. Survey and user-supplied data from approximately 200 responses are used for the preliminary analysis reported below. Additional data are currently being collected. Analysis is an ongoing process, and the preliminary results reported here will be expanded.


Data Analysis

The data are being analyzed in three stages. First, the preliminary data from the online surveys and query statements are categorized and classified. The data are arranged in tables by query type. When possible, interview data are matched to each query, access points suggested, and image(s) used.

The second stage of analysis ranks user responses to existing index terms and looks for patterns in the searches for images on the Web. These patterns are derived from the tables produced in the first stage of data analysis. Relationships are noted for associations between query type and (1) display of the images; (2) access points or combinations of access points; and (3) stated requests for manipulations. The data are examined for patterns of variation.
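The association tables used in the second stage can be thought of as simple cross-tabulations of query type against another variable. The sketch below shows one way to build them; the response records are invented for illustration and do not reproduce the study data, and the field names are assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical coded responses, one per query.
responses = [
    {"query_type": "identification", "display": "thumbnail", "requested": []},
    {"query_type": "identification", "display": "thumbnail", "requested": []},
    {"query_type": "identification", "display": "full", "requested": ["enlarge"]},
    {"query_type": "subject", "display": "full", "requested": ["enlarge", "compare"]},
    {"query_type": "subject", "display": "full", "requested": ["compare"]},
]


def cross_tabulate(rows, row_key, col_key):
    """Count co-occurrences of row_key values with col_key values.

    col_key may hold a single value or a list of values per response.
    """
    table = defaultdict(Counter)
    for row in rows:
        values = row[col_key]
        if not isinstance(values, (list, tuple)):
            values = [values]
        for value in values:
            table[row[row_key]][value] += 1
    return table


by_display = cross_tabulate(responses, "query_type", "display")
by_manipulation = cross_tabulate(responses, "query_type", "requested")
```

Reading across a row of `by_manipulation` shows which manipulations were requested for each query type, which is the kind of pattern the second stage looks for.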

The third stage of analysis compares the current data to previously collected data from a study of intellectual access to digital art images. Assertions were discovered from the analysis of the data and concepts were formed. The following concepts listed in Table 1 were developed from the assertions to describe the process of searching and retrieving digitized art images:

1. There are types and levels of queries used by art historians for searching photographic and digital art images.

2. The queries of art historians change when searching digital images. They become more complex, and they build on retrieved answer sets to create new queries.

3. There are computer functions needed for different levels of queries.

4. There is a relationship among level of query, access points, and computer manipulations for intellectual access to art images.

5. Some level one queries (see Table 1) can be answered without images.

6. Some level four queries (see Table 1) cannot be answered by the image or with primary textual information. Secondary subject resources are needed.

7. Digital images provide browse-style searchers with more opportunity to winnow for relevant retrieval sets.

8. Images can be described by level of complexity based on the analysis of color, composition, complexity, contrast, perspective, proportion, and style.

9. Queries of style retrieved more complex images.


Levels of Complexity      Queries                             Access Points

Level 1: Least Complex    Includes identification queries     Includes text fields and
                          for who, where, when                image in general

Level 2: Complex          For queries of the type “What       Includes sorted text
                          are?”; requires sorting of the      information and images
                          text information in the answer
                          set

Level 3: More Complex     Includes queries of style,          Includes style, keywords,
                          subject, how, and ID of objects     and complex images
                          or act-

Level 4: Most Complex     Includes queries for meaning,       Includes style and subject
                          subject, and why


Levels of Complexity      Manipulations

Level 1: Least Complex    Use of search, sort, and display

Level 2: Complex          Use of search, select, sort, display, and enlarge

Level 3: More Complex     Use of compare, enlarge, mark, resolution, and style

Level 4: Most Complex     Use of style & subject searches plus access to
                          full-text secondary subject resources

Table 1 lists the major components of intellectual access identified in the analysis of the study data by level of query complexity. Level one represents the least complex query level and level four represents the most complex. The table explains how the discovered concepts depend on complexity of the query and are linked to access points, computer manipulations, and traits of the image.
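The link between query level and required manipulations in Table 1 can be expressed as a simple lookup, as in the sketch below. The manipulation names follow the table; the function name and the set of functions assumed to be available on a Web site are hypothetical and serve only to show how a gap would be detected.

```python
# Manipulations listed in Table 1 for each level of query complexity.
MANIPULATIONS_BY_LEVEL = {
    1: {"search", "sort", "display"},
    2: {"search", "select", "sort", "display", "enlarge"},
    3: {"compare", "enlarge", "mark", "resolution", "style"},
    4: {"style search", "subject search", "full-text secondary subject resources"},
}


def missing_manipulations(level, available):
    """Return the Table 1 manipulations for `level` that a system does not offer."""
    return MANIPULATIONS_BY_LEVEL[level] - set(available)


# Assumed feature set for a basic Web image database (not taken from the study).
web_functions = {"search", "select", "sort", "display"}

gaps = {level: missing_manipulations(level, web_functions) for level in MANIPULATIONS_BY_LEVEL}
```

Under these assumptions, level one queries are fully supported, level two queries lack only the ability to enlarge, and the higher levels lack most of what they require.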

The previously defined categories showed a direct correlation between type of query and index access points and between type of query and complexity of image (Hastings, 1995). The results of the comparison to current data collected from the Web are discussed in the following section.


The major difference in the data collected on the Web compared to previous data is the lack of ability to manipulate the images to meet the stated need in the query. Queries from the Web searches fit into two categories. The first category is a combination of levels 1 and 2 (see Table 1) from the previous study. Almost 60 percent of the queries collected asked for identification of the artist, activities, or place. The second category, the remaining 40 percent of the queries, asked something about the subject of the painting, especially if the painting included voodoo ritual symbologies. This may change as we continue to collect and analyze data.

We are not able to compare computer manipulations or access points used at this time. Queries requiring a manipulation of the image to provide the answer could not be answered because it was not possible to compare images in sets or to zoom in on or enlarge sections of the paintings.

The original research questions used to frame the current study are listed below with the findings we can support at this time:

1. Are there categories of queries that can be met by thumbnail (small surrogate) images? Almost 60 percent of the queries collected were answered with the use of thumbnail images. In the next stage of analysis, we will look at whether browsing the thumbnails could have answered the queries.

2. Is there a relationship between queries and manipulation of images? Several queries requested that portions of each image in a retrieved set be enlarged and compared on the same screen. The requested manipulations of the images were not available in this first set.

3. Do queries contain indicators to the access points used in retrieving needed images? For the queries that used text search terms, most of them appeared to have used the index and thesaurus as a guide in the formulation of the query.

4. Are there identifiable categories of images that increase the ability to browse a collection of images? The majority of users in the current set of data used the browse by category option, but it is unknown if that was from curiosity about the categories or from a relationship between their queries and the available categories. We do know from the survey and interview data that users suggest their own categories for sorting images for browsing and seemed to prefer the random categorization of images.

5. Are there identifiable image manipulations that need to be added to meet queries in the networked database of images? User notes from the online survey and interviews indicate that users need to be able to compare images, form images into sets for comparison, and have the ability to zoom-in or enlarge sections of the images.


The purpose of this study is to investigate how people query and retrieve digital art images on the Web. The study provides new information about the retrieval of images in a distributed network environment. However, there are also several problem areas discovered in this attempt to collect data from the Web. The very nature of the Web complicates the attempt to study how people access and use images because it is difficult to correlate online survey data with interview data. It is also difficult to separate duplicate responses. The Web environment presses the issue of testing because it continues to develop without waiting for the results from scholarly inquiry. Despite the complexities and lack of control over the environment, we are able to present three findings based on the data analysis.

We now know that browsing, manipulation of the images, and need for user interaction are important aspects of the search for images on the Web. As discussed in the implications section above, the capabilities to zoom in on, enlarge, and group the images were not available on the Web. Image searchers on the Web need the additional capabilities that image manipulation software offers. For example, users with queries about the style of a painting often want to zoom in on, and enlarge, an area to study color or brush strokes. Users with queries from the “compare” category need to be able to group different sets of images for comparison. It is especially important for users to be able to move and manipulate high-resolution images, not just the thumbnails. The conclusion is that the more complex the query, the more options for manipulation are required.

The responses collected from the survey form indicate the need for users to add their own descriptors and index terms in the search process. The application of relevance feedback mechanisms needs to dramatically improve. As we continue to collect and analyze the terms supplied by the users of the Caribbean art images, we will look for patterns or relationships between the supplied terms and the query.

The ability to browse the images becomes even more important on the Web. Thumbnail surrogates, as representations of the high-resolution image, are used as access points. However, thumbnails as surrogates present their own problems. Automatic extractions often capture only part of the high-resolution image, and there is little control over what part is used. We need to look at the importance of thumbnail categories to aid browsing. So far, there are more users of the random browse category than the supplied categories of content and style. It is important that users have the capability of applying their own categories for sorting and browsing. It may be that there are indicators in a query that system designers can use to supply possible categories.
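The difference between a thumbnail that represents the whole painting and one that captures only part of it comes down to simple arithmetic, sketched below. Center cropping is used here only as an assumed example of how an automatic extraction could discard part of an image; the function names and the sample dimensions are hypothetical.

```python
def scale_to_fit(width, height, max_side):
    """Thumbnail size that keeps the whole image and its aspect ratio (never upscales)."""
    scale = min(1.0, max_side / max(width, height))
    return max(1, round(width * scale)), max(1, round(height * scale))


def center_crop_box(width, height):
    """Largest centered square as (left, top, right, bottom); anything outside is lost."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side


def fraction_kept_by_crop(width, height):
    """Share of the original image area that survives a centered square crop."""
    side = min(width, height)
    return (side * side) / (width * height)


# Hypothetical landscape-format scan of a painting.
full_width, full_height = 3000, 2000

whole_thumbnail = scale_to_fit(full_width, full_height, 150)
crop_box = center_crop_box(full_width, full_height)
kept = fraction_kept_by_crop(full_width, full_height)
```

For this example a square crop keeps about two-thirds of the painting and drops a strip from each side, while scaling to fit keeps the full composition at a smaller size. Which behavior is acceptable depends on whether the detail a searcher needs lies in the discarded area.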

Finally, the whole problem of “relativity” or queries of “why” is largely unsolved. We are finding some attempts by users to add dimensions of their own knowledge to the subject of a painting–especially for queries about meaning in the paintings, such as voodoo rituals. It is this role for user feedback that brought on the discussion of what is needed to effectively evaluate an image retrieval system.


Based on the work of the researchers mentioned in the background section of this article and the preliminary results of the current study, a combination of methods for evaluation of image retrieval systems is suggested in Table 2.


Query or Retrieval Task       Retrieval or Search Tools     Evaluation Method

Identification of known       Index text and fields         User & relevance feedback
item or image                 Browse images                 Relevant? Yes or No
                                                            Measures of time &

Identification of unknown     Select & display sets of      User supplied terms &
item(s) in image and/or       images                        categories for browsing
index                         Sort sets                     Survey form
                              Enlarge                       Online user feedback
                                                            mechanisms
                                                            Measures of time & effort

Investigations of style       Content-based retrieval       Log analysis
and image content such as     tools                         Screen captures
color, texture, shape, and                                  Survey form
so on

Queries asking “why” and      Random browsing and           Amount of user effort
investigations for            extensive answer set          Observation of browsing
“aboutness”                   displays                      behavior and answer set
                              May require secondary         development
                              resources–e.g.,               Capture retrieved sets and
                              biographical and              compare to query/task



The important questions that arise from the suggested framework are:

* How and when are user feedback mechanisms that include opportunities for user knowledge added to the database?

* What is the nature of browsing in an image database and what types of flexibility need to be inherent in the system?

* What types of manipulation of the images are needed and when? and finally,

* How do user interaction and feedback improve the retrieval of images?


Armitage, L. H., & Enser, P. G. B. (1997). Analysis of user need in image archives. Journal of Information Science, 23(4), 287-299.

Barnett, P. J., & Petersen, T. (1989). Subject analysis and AAT/MARC implementation. Art Documentation, 8(4), 171-190.

Besser, H. (1990). Visual access to visual images: The UC Berkeley image database project. Library Trends, 38(4), 787-798.

Besser, H., & Trant, J. (1995). Introduction to imaging: Issues in constructing an image database. Santa Monica, CA: The Getty Art History Information Program.

Busch, J. A. (1992). Overview of art information endeavors. Bulletin of the American Society for Information Science, 18, 8-13.

Cawkell, A. E. (1992). Selected aspects of image processing and management: Review and future prospects. Journal of Information Science, 18(3), 179-192.

Dunlop, M. D., & Van Rijsbergen, C. J. (1993). Hypermedia and free text retrieval. Information Processing & Management, 29(3), 287-298.

Enser, P. G. B. (1995). Pictorial information retrieval. Journal of Documentation, 51(2), 126-170.

Fidel, R. (1997). The image retrieval task: Implications for the design and evaluation of image databases. New Review of Hypermedia and Multimedia, 3, 181-199.

Goodrum, A. (1997). Evaluation of text-based and image-based representations for moving image documents. Unpublished doctoral dissertation, University of North Texas, Denton.

Gudivada, V. N., & Raghavan, V. V. (1995). Content-based image retrieval systems. Computer, 28(9), 18-22.

Gupta, A.; Santini, S.; & Jain, R. (1997). In search of information in visual media. Communications of the ACM, 40(12), 34-42.

Hastings, S. K. (1994). An exploratory study of intellectual access to digitized art images. Unpublished doctoral dissertation, Florida State University, Tallahassee.

Hastings, S. K. (1995). Query categories in a study of intellectual access to digitized art images. In T. Kinney (Ed.), ASIS ’95 (Proceedings of the 58th annual meeting of the American Society for Information Science, October 9-12, 1995, Chicago, IL) (pp. 3-8). Medford, NJ: American Society for Information Science.

Jorgensen, C. (1996). Indexing images: Testing an image description template. In P. Solomon (Ed.), ASIS ’96 (Proceedings of the 59th annual meeting of the American Society for Information Science, October 21-24, 1996, Baltimore, MD) (pp. 209-213). Medford, NJ: American Society for Information Science.

Keister, L. H. (1994). User types and queries: Impact on image access systems. In R. Fidel, T. Bellardo Hahn, E. M. Rasmussen, & P. J. Smith (Eds.), Challenges in indexing electronic text and images (pp. 7-22). Medford, NJ: Learned Information.

Layne, S. S. (1986). Analyzing the subject of a picture: A theoretical approach. Cataloging & Classification Quarterly, 6(3), 39-62.

Lunin, L. (1994). Analyzing art objects for an image database. In R. Fidel, T. Bellardo Hahn, E. M. Rasmussen, & P. J. Smith (Eds.), Challenges in indexing electronic text and images (pp. 57-72). Medford, NJ: Learned Information.

Lynch, C. A. (1991). The technologies of electronic imaging. Journal of the American Society for Information Science, 42(8), 578-585.

Moen, W. E. (1998). Accessing distributed cultural heritage information. Communications of the ACM, 41(4), 44-48.

Mostafa, J. (1994). Digital image representation and access. Annual Review of Information Science and Technology, 29, 91-135.

Panofsky, E. (1955). Meaning in the visual arts: Papers in and on art history. Garden City, NY: Doubleday.

Rasmussen, E. M. (1997). Indexing images. Annual Review of Information Science and Technology, 32, 169-196.

Rorvig, M. E. (1990). Intellectual access to graphic information (issue theme). Library Trends, 38(4), 639-815.

Turner, J. (1995). Comparing user-assigned terms with indexer-assigned terms for storage and retrieval of moving images: Research results. In T. Kinney (Ed.), ASIS ’95 (Proceedings of the 58th annual meeting of the American Society for Information Science, October 9-12, 1995, Chicago, IL) (pp. 9-12). Medford, NJ: American Society for Information Science.


Enser, P. G. B. (1993). Query analysis in a visual information retrieval context. Journal of Document and Text Management, 1(1), 25-52.

Layne, S. S. (1994). Some issues in the indexing of images. Journal of the American Society for Information Science, 45(8), 583-588.

Markey, K. (1986). Subject access to visual resources collections: A model for computer construction of thematic catalogs. New York: Greenwood Press.

O’Connor, B. C. (1996). Explorations in indexing and abstracting: Pointing, virtue, and power. Englewood, CO: Libraries Unlimited.

O’Connor, B. C.; O’Connor, M. K.; & Abbas, J. M. (1999). User reactions to access mechanisms: An exploration based on captions for images. Journal of the American Society for Information Science, 50(8), 681-697.

Samantha K. Hastings, School of Library and Information Science, University of North Texas, P. O. Box 311068, Denton, TX 76203-1068

LIBRARY TRENDS, Vol. 48, No. 2, Fall 1999, pp. 438-452

SAMANTHA K. HASTINGS is a faculty member in the School of Library and Information Sciences, University of North Texas in Denton. Ms. Hastings teaches a variety of courses including indexing and abstracting and telecommunications. She runs a program of study for digital image managers with the help of a grant from the federal Institute of Museum and Library Services. In addition, the grant funds a study investigating the impact of Web access to the collections at the African American Museum of Art in Dallas, Texas.

COPYRIGHT 1999 University of Illinois at Urbana-Champaign
