creativelibrarian.com

The Creative Librarian is a hub for matters important to today’s librarians and information scientists. It leans towards electronic issues but isn’t restricted to them. Hopefully this site will also help inform non-librarians about these issues, since so many of them affect us all.

How well do search engines index the OA repositories?

Open Access News

Frank McCown and three co-authors, Search Engine Coverage of the OAI-PMH Corpus, IEEE Internet Computing, March/April 2006.

Abstract: The major search engines are competing to index as much of the Web as possible. Having indexed much of the surface Web, search engines are now using a variety of approaches to index the deep Web. At the same time, institutional repositories and digital libraries are adopting the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) to expose their holdings, some of which are indexed by search engines and some of which are not. To determine how much of the current OAI-PMH corpus search engines index, we harvested nearly 10M records from 776 OAI-PMH repositories. From these records we extracted 3.3M unique resource identifiers and then conducted searches on samples from this collection. Of this OAI-PMH corpus, Yahoo indexed 65%, followed by Google (44%) and MSN (7%). Twenty-one percent of the resources were not indexed by any of the three search engines.
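For readers curious what "extracting unique resource identifiers" from harvested records might look like in practice, here is a minimal sketch. It is not the authors' code. It assumes the resource URLs sit in Dublin Core `dc:identifier` elements of an `oai_dc` ListRecords response, which is a common convention but not something the abstract confirms. The sample XML, the `example.org` URLs, and the function names `extract_resource_urls` and `sample_urls` are all hypothetical.

```python
import random
import xml.etree.ElementTree as ET

# Namespace URI for the Dublin Core element set used by oai_dc records.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}

# Hypothetical, trimmed ListRecords response (illustration only).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:identifier>http://example.org/papers/1.pdf</dc:identifier>
          <dc:identifier>doi:10.0000/example</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:identifier>http://example.org/papers/1.pdf</dc:identifier>
          <dc:identifier>http://example.org/papers/2.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def extract_resource_urls(xml_text):
    """Return the set of unique http(s) URLs found in dc:identifier elements."""
    root = ET.fromstring(xml_text)
    urls = set()
    for node in root.iterfind(".//dc:identifier", NS):
        value = (node.text or "").strip()
        # Assumption: only identifiers that look like web URLs are searchable.
        if value.startswith(("http://", "https://")):
            urls.add(value)
    return urls


def sample_urls(urls, k, seed=0):
    """Draw a reproducible random sample of at most k URLs."""
    rng = random.Random(seed)
    pool = sorted(urls)  # sort first so the sample does not depend on set order
    return rng.sample(pool, min(k, len(pool)))


if __name__ == "__main__":
    unique = extract_resource_urls(SAMPLE_RESPONSE)
    print(len(unique), "unique URLs")
    print(sample_urls(unique, 1))
```

Each sampled URL would then be submitted to a search engine to see whether it is indexed. That step is left out here because it depends on each engine's own query interface.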

On the one hand, Yahoo!, the engine with the highest coverage, still indexes only 65% of the OA literature. On the other hand, how much of the proprietary literature do you think it sees?

This entry was posted on Friday, March 10th, 2006 at 11:46 am and is filed under Open Access.