creativelibrarian.com

The Creative Librarian is a hub for matters important to today's librarians and information scientists. There is a definite lean towards electronic issues, though it isn't restricted to those. Hopefully this site will also be useful for informing non-librarians about these issues, as so many of them affect us all.

How well do search engines index the OA repositories?

Open Access News

Frank McCown and three co-authors, Search Engine Coverage of the OAI-PMH Corpus, IEEE Internet Computing, March/April 2006.

Abstract: The major search engines are competing to index as much of the Web as possible. Having indexed much of the surface Web, search engines are now using a variety of approaches to index the deep Web. At the same time, institutional repositories and digital libraries are adopting the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) to expose their holdings, some of which are indexed by search engines and some of which are not. To determine how much of the current OAI-PMH corpus search engines index, we harvested nearly 10M records from 776 OAI-PMH repositories. From these records we extracted 3.3M unique resource identifiers and then conducted searches on samples from this collection. Of this OAI-PMH corpus, Yahoo indexed 65%, followed by Google (44%) and MSN (7%). Twenty-one percent of the resources were not indexed by any of the three search engines.
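For readers curious what the harvesting step in the abstract might look like in practice, here is a minimal sketch in Python. It is not the authors' code. It assumes records arrive as an OAI-PMH ListRecords response in the oai_dc format, treats any dc:identifier that looks like a URL as a resource identifier, and uses a made-up sample response and a made-up "indexed" set in place of real repositories and real search engine queries. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace used by oai_dc records.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Made-up ListRecords response (an assumption for illustration only).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:identifier>http://example.org/paper1.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:identifier>http://example.org/paper1.pdf</dc:identifier>
          <dc:identifier>http://example.org/paper2.pdf</dc:identifier>
          <dc:identifier>local-call-number-42</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def extract_resource_urls(xml_text):
    """Return the set of unique URL-like dc:identifier values in a response."""
    root = ET.fromstring(xml_text)
    urls = set()
    for element in root.iter(f"{{{DC_NS}}}identifier"):
        text = (element.text or "").strip()
        if text.startswith(("http://", "https://")):
            urls.add(text)
    return urls


def coverage(sample_urls, is_indexed):
    """Fraction of sampled URLs for which is_indexed(url) is true."""
    sample = list(sample_urls)
    if not sample:
        return 0.0
    return sum(1 for url in sample if is_indexed(url)) / len(sample)


urls = extract_resource_urls(SAMPLE_RESPONSE)

# Stand-in for querying a search engine: a hypothetical set of indexed URLs.
pretend_index = {"http://example.org/paper1.pdf"}
print(sorted(urls))
print(coverage(urls, lambda url: url in pretend_index))
```

The duplicate URL and the non-URL identifier are both dropped, which mirrors the step of reducing harvested records to unique resource identifiers before sampling. A real harvest would also need to follow resumption tokens across many responses, which this sketch leaves out.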

On one hand, Yahoo!, the engine with the widest coverage, still indexes only 65% of the OA literature sampled here. On the other hand, how much of the proprietary literature do you think it sees?

This entry was posted on Friday, March 10th, 2006 at 11:46 am and is filed under Open Access.