Relevance Evaluation

Relevance Evaluation measures the accuracy, effectiveness, efficiency, and usefulness of a search engine. In the prototype, the relevance criterion measures how relevant the retrieved websites are to the query.

Methodology

In the prototype, relevance is measured by how many words in a retrieved website match the query. This method is used by the Google Rankings Ultimate SEO Tool as keyword density (SEO Tools – Keyword Density). That is, the relevance criterion of CUPE is based on keyword density. Keyword density may not be appropriate for websites that consist mostly of images and sounds rather than text. However, it is generally an efficient method to assess how closely the retrieved websites are related to the query. The keyword density and relevance rate are calculated as follows.

Keyword Density, ki = (number of occurrences of the query word in a retrieved web page / total number of words in the web page) × 100

Equation 5: Keyword Density

Average Relevance Rate = (∑ ki) / n, where ki = keyword density of the i-th retrieved web page and n = total number of retrieved websites

Equation 6: Relevance Rate
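
Equations 5 and 6 can be sketched in a few lines of Python. This is an illustrative implementation only; the paper does not publish its exact word-tokenization rules, so the word-splitting regex here is an assumption.

```python
import re

def keyword_density(query_word: str, page_text: str) -> float:
    """Equation 5: percentage of words in the page matching the query word.
    Tokenization by a simple regex is an assumption, not the paper's method."""
    words = re.findall(r"[A-Za-z']+", page_text.lower())
    if not words:
        return 0.0
    matches = sum(1 for w in words if w == query_word.lower())
    return matches / len(words) * 100

def average_relevance_rate(densities: list[float]) -> float:
    """Equation 6: mean keyword density over the n retrieved websites."""
    return sum(densities) / len(densities) if densities else 0.0

# Example: "welfare" appears twice among the eight words of a page.
d = keyword_density("welfare", "welfare policy and welfare reform in the US")
rate = average_relevance_rate([d, 0.0, 10.0])
```

Here `d` is 2/8 × 100 = 25.0, and the average relevance rate over three pages with densities 25.0, 0.0, and 10.0 is 35/3 ≈ 11.67.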

Outline of the Relevance evaluation

With the Relevance Rate equation, the overall method used to measure relevance in the paper is as follows:

  • First, queries are created from the words on the home page of each digital library, filtering out stop words and symbols (e.g., prepositions and other common words). The resulting queries are closely related to the subject area and contents of the digital library.
  • Next, each query is entered into the digital library's search engine.
  • Then, the relevance of the results is measured for each retrieved website by keyword density, using Equations 5 and 6. The higher the sum of the keyword densities, the higher the relevance rate.
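The first step above, building queries from a home page, can be sketched as follows. The stop-word list here is a small hypothetical example; the paper does not specify the exact set of words that were filtered out.

```python
import re

# Hypothetical stop-word set; the paper's actual filter list is not published.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "in", "on", "for", "to", "with"}

def build_queries(home_page_text: str) -> list[str]:
    """Extract candidate query words from a home page, dropping stop words
    and keeping each remaining word once, in order of first appearance."""
    words = re.findall(r"[A-Za-z]+", home_page_text.lower())
    seen, queries = set(), []
    for w in words:
        if w not in STOP_WORDS and w not in seen:
            seen.add(w)
            queries.append(w)
    return queries

qs = build_queries(
    "The Library of Congress: legislation and welfare policy in the United States"
)
```

For this sample sentence, `qs` keeps subject-bearing words such as "library", "congress", "legislation", and "welfare", while stop words like "the" and "and" are dropped.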

Details of the designed computer program

To measure the relevance of search results, the program used to measure search response times was modified so that it also measures relevance from the retrieved query results. The same queries used to measure search response time are used to measure relevance.

The number of queries varies widely: some digital libraries yield only a few queries from the title and paragraphs of their home page, while others yield many. Likewise, some digital libraries return only a few search results, while others return many. For example, when the query "welfare" is entered into the search engine of the THOMAS digital library (political science and law subject area), 189 websites are retrieved.

The relevance program therefore measures relevance with limitations. Only the top thirty retrieved websites are evaluated for each query, because investigating all retrieved websites for all queries would take too long. All chosen queries, however, are used to measure the relevance of each digital library. The keyword density of each retrieved website is measured against the words in the title and paragraphs extracted from the website's source code. An average relevance rate is then calculated for each candidate digital library. Lastly, the averages are scored on a 5-point scale, as in the other evaluations.
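The top-thirty cap and per-library averaging can be sketched like this. The aggregation order (average per query, then average over queries) is an assumption; the paper does not state it explicitly.

```python
def library_relevance(per_query_densities: dict[str, list[float]],
                      limit: int = 30) -> float:
    """Average relevance rate for one digital library (sketch).

    per_query_densities maps each query to the keyword densities of its
    retrieved websites, in rank order; only the top `limit` results per
    query are used, mirroring the thirty-website cap in the evaluation.
    """
    rates = []
    for query, densities in per_query_densities.items():
        top = densities[:limit]
        if top:
            rates.append(sum(top) / len(top))
    return sum(rates) / len(rates) if rates else 0.0

# Two queries: "welfare" averages to 3.0, "health" to 1.0 → library rate 2.0.
rate = library_relevance({"welfare": [2.0, 4.0], "health": [1.0]})
```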

Results of Relevance Evaluation

As a result, about 13% of the sixty-two candidate digital libraries show good relevance performance. The U.S. Department of Health & Human Services (HHS.gov) digital library shows the highest relevance in the evaluation, with 149 queries. On the other hand, about 35% of the digital libraries show the lowest relevance.

Analyses of the Relevance Evaluation

Since the experiment investigates sixty-two digital libraries across fifteen subject domains, a fixed set of queries could not be settled and sampled for all subject domains and all digital libraries. To differentiate each digital library within its subject domain, all words in the title and paragraphs of its home page, gathered from the page's source code, are used as queries. The number of queries varies from 0 to 634; the Art History Resources on the Web digital library has the most, at 634. As a result, the relevance program took a very long time to run, but it was decided that more queries would give more accurate relevance results.

Limitations 

Nevertheless, the relevance evaluation has limitations. Two-word combinations are not used as queries, and synonyms of query words are not counted as matches. Digital libraries that consist mainly of images score poorly under the relevance program, because it measures only keyword density; Science Photo Library is one such case.

Difficulties

There are some difficulties in executing the relevance program. Digital libraries do not use a standardized URL form for their search engines, so it was not simple to evaluate all sixty-two digital libraries in one program that embraces many different search URLs and methods. Three digital libraries could not be evaluated because their search URLs do not accept new queries. For example, the Online Medieval & Classical Library uses Google search and JavaScript without exposing a search URL. Africa Focus: Sights and Sounds of a Continent (University of Wisconsin Digital Collections) uses session id numbers, such as ‘http://digicoll.library.wisc.edu/WebZ/SearchOrBrowse?sessionid=01-57990-112842062’, instead of query strings, and the id numbers are not known in advance. The URL of Library of Congress: American History & Culture is too ambiguous to be used in the relevance program. Although these three digital libraries provide search engines through which users can retrieve information from their websites, they cannot be evaluated by the relevance program. The Digital Past digital library also cannot be evaluated, because it has broken links.
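One way a single program can embrace many different search-URL formats is a per-library template table, sketched below. The library names and example.org URLs are hypothetical placeholders, not the actual URL formats of the libraries in the study.

```python
from urllib.parse import quote_plus

# Hypothetical templates with example.org placeholders; each real digital
# library in the study uses its own, different query-URL format, and a few
# (session-id or JavaScript-only search pages) fit no template at all.
SEARCH_URL_TEMPLATES = {
    "library-a": "http://www.example.org/search?q={query}",
    "library-b": "http://www.example.org/cgi-bin/find?term={query}&max=30",
}

def search_url(library: str, query: str) -> str:
    """Build the search URL for one library's engine (illustrative sketch)."""
    return SEARCH_URL_TEMPLATES[library].format(query=quote_plus(query))

url = search_url("library-a", "welfare reform")
```

`quote_plus` URL-encodes the query, so "welfare reform" becomes "welfare+reform" in the resulting URL.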

Analysis based on Scores of Each Digital library

According to the relevance rates, 13% of the sixty-two digital libraries show good relevance rates (more than 1.5). However, 40% show very low relevance rates (less than 1), including the digital libraries that could not be evaluated because their search engines or search URLs were inaccessible. Within that 40% group, 35% of the digital libraries show relevance rates above 0.4. Overall, the relevance rates are much lower than the results of the other evaluations.


Figure 1. Relevance Rates, based on the scores

Analysis based on Each Subject Domain

Overall, the average relevance rates in each subject domain are low, as Figure 2 shows. Digital libraries in the Geography and Military Science subject domains show better relevance rates than the other domains, while those in the Music subject domain show the lowest. We should note, however, that the music libraries provide sheet music that contains little text, and the relevance program examines only text. Thus, the program's result does not let us conclude that the digital libraries in the music subject domain provide poor relevance.


Figure 2. Averages of Relevance Rates based on Subject Domains

*More details are in the paper, Chapter VI (Performance Evaluation), Section 2 (Relevance Evaluation). This website and the paper were developed by the same author.
