Library 102 Second Class Session    

 
 

Search Engines
 

There are three main characteristics of search engines. First, they index billions of sites, whereas directories usually index millions or thousands. Second, they are created by indexing software rather than by human selection. Third, they function best when the search query is specific; they are not very effective for browsing.

A search engine works first by "crawling" the web for new sites or sites that have changed. Crawlers are also referred to as "spiders".
The spider programs turn the new information over to the indexing program, which usually indexes every word on the page and compiles a database. Next, the retrieval program matches words in the database with the search query and decides the relevancy ranking. The relevancy ranking is determined mathematically, usually based on such factors as the popularity of the site (how many sites link to it), the number of times the search term appears on the page, the proximity of search terms to each other, and where on the page the terms appear. The interface, created in HTML (the search boxes, etc.), allows the user to access the data.
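The indexing and retrieval steps described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any real search engine is built: the three pages and their addresses are made up, and the ranking uses just one of the factors named above (how many times the search terms appear on the page).

```python
from collections import defaultdict

# Hypothetical mini "web": page address -> page text (stand-ins for crawled pages).
PAGES = {
    "a.example/cats": "cats are small cats that purr",
    "b.example/dogs": "dogs chase cats",
    "c.example/fish": "fish swim",
}

def build_index(pages):
    """Indexing step: map every word to the pages it appears on, with counts."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word][url] = index[word].get(url, 0) + 1
    return index

def search(index, query):
    """Retrieval step: keep pages containing every query word, rank by word count."""
    words = query.lower().split()
    if not words:
        return []
    matches = set(index.get(words[0], {}))
    for word in words[1:]:
        matches &= set(index.get(word, {}))
    scores = {url: sum(index[w][url] for w in words) for url in matches}
    # Highest score first; ties are broken alphabetically by address.
    return sorted(scores, key=lambda url: (-scores[url], url))

INDEX = build_index(PAGES)
print(search(INDEX, "cats"))
```

Note that the search looks only at the index, never at the pages themselves, which is why a real engine can return a page that has since been removed from the web.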

Each search engine indexes only a fraction of the web. The sites indexed will vary from one search engine to another, so you may need to try your search in more than one engine. At the time of your search, the search engine is retrieving entries from its index, not the actual pages; as a result, it may retrieve pages that no longer exist.

Typical search engine search options are by phrase, title, URL site, domain, language, date, file type, and Boolean operators.

The University of California at Berkeley Library sponsors an excellent tutorial and table of features for recommended search engines.

 

Search Strategies
 

There is not just one right or wrong way to search the Web, but because the Web is so vast, it is important to find an efficient way to search. That is why a search strategy is important. Some components of a search strategy are as follows:

1. Try to formulate what you are looking for in a complete sentence and identify the key concepts or words in that sentence.

2. Identify concepts or terms that can be searched as a phrase and words that may have synonyms or terms that may need to be more specific.

3. Do a preliminary search and examine your first results. Decide if you need to narrow or broaden your terms and keep examining your results until you find what you want.

4. Use Boolean connectors or advanced search options to refine your search.
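
To see how Boolean connectors narrow or broaden a search, it can help to think of each search term as the set of pages that contain it. The sketch below uses made-up page names and a made-up index; AND keeps only pages in both sets, OR combines the sets, and NOT removes pages from a set.

```python
# Hypothetical index: search term -> set of page names containing that term.
INDEX = {
    "jaguar": {"zoo-guide", "car-review", "rainforest-facts"},
    "cat": {"zoo-guide", "rainforest-facts", "pet-care"},
    "car": {"car-review"},
}

def pages(term):
    """Return the set of pages containing the term (empty if the term is unknown)."""
    return INDEX.get(term, set())

narrowed = pages("jaguar") & pages("cat")    # jaguar AND cat
broadened = pages("jaguar") | pages("cat")   # jaguar OR cat
excluded = pages("jaguar") - pages("car")    # jaguar NOT car

print(sorted(narrowed))
print(sorted(broadened))
print(sorted(excluded))
```

In this example AND and NOT both return fewer pages than a search for "jaguar" alone, while OR returns more.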

 

Search Engines: Google

 

Larry Page and Sergey Brin, doctoral candidates at Stanford University's school of computer science, are the creators of what is now known all over the world as Google. Their academic backgrounds gave them the basis for Google. In academic publishing, citing previously published works gives rank and authority. Annotation refers to the description and evaluation of a citation. Rank is given to those publications which contain citations, and to those citations which are cited often themselves. Based on the ideas of citation and annotation, Larry Page called his project BackRub. It consisted of a way to discover links (citations) on the web, store them, and expose who was linking to any site on the Web. The theory is that the more citations a work has, the more important the work.
 
But the next important step was to figure out a way to rank who was linking to a site. Page and Sergey Brin, a math prodigy, developed a ranking system with higher ranking for authoritative sites and lower ranking for unimportant ones. They created an algorithm, which they called PageRank after Larry Page, that could take into account the number of links into a site and the number of links into each of the linking sites. PageRank brought up relevant results where the ordering of follow-up pages was also good.
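
The idea that a link counts for more when it comes from a page that is itself well linked can be sketched as follows. This is a simplified textbook version of a PageRank-style calculation, not Google's actual code: the four-page web is made up, and the damping value of 0.85 and the 50 rounds of calculation are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # every page starts with an equal score
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical four-page web: A, B and C all link to D; D links back to A.
LINKS = {"A": ["D"], "B": ["D"], "C": ["D"], "D": ["A"]}
RANKS = pagerank(LINKS)
for page in sorted(RANKS, key=RANKS.get, reverse=True):
    print(page, round(RANKS[page], 3))
```

Page D ranks highest because three pages link to it. Page A has only one incoming link, but that link comes from the highly ranked D, so A still ranks well above B and C, which have no incoming links at all.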
 
They realized that since PageRank worked by analyzing links, the bigger the Web got, the better the search engine would be.
As a result, they named their invention Google, after googol, the term for the number 1 followed by 100 zeroes. 1
The first version of Google was released on the Stanford web site in August 1996.
Since August 1996, Google has expanded tremendously and now provides a wide range of services. To find out more about the services Google offers, click on these links:

 

This page lists available Google services.
 
"Google Scholar provides a simple way to broadly search for scholarly literature. From one place, you can search across many disciplines and sources: peer-reviewed papers, theses, books, abstracts and articles, from academic publishers, professional societies, preprint repositories, universities and other scholarly organizations. Google Scholar helps you identify the most relevant research across the world of scholarly research."

 

"Finding books with Google Book Search is as easy as finding websites with Google Web Search; just enter the keyword or phrase you're looking for into the Google Book Search box. For example, when you search for "rock climbing" or for a phrase like " one small step for man ," we'll find all the books whose contents match your search terms. Click on a book title and you'll see basic info about the book just like you'd see in a card catalog. You might also see a few snippets - sentences of your search term in context. If a publisher or author has given us permission, you'll see a full page and be able to browse within the book to see more pages. If the book is out of copyright, you'll see a full page and you can page forward or back to see the full book. Clicking on "Search within this book," allows you to perform more searches within the book you've selected. You can click on any of the "Buy this Book" links to go straight to an online bookstore where you can buy the book. In many cases, you can also click "Find this book in a library" to find a local library where you can borrow it."
 
"Google's Image Search is the most comprehensive on the Web, with billions of images indexed and available for viewing. To use Image Search, select the "images" tab or visit http://images.google.com. Enter a query in the image search box, then click on the "Search" button. On the results page, just click the thumbnail to see a larger version of the image, as well as the web page on which the image is located.

The images identified by the Google Image Search service may be protected by copyrights. Although you can locate and access the images through our service, we cannot grant you any rights to use them for any purpose other than viewing them on the web. Accordingly, if you would like to use any images you have found through our service, we advise you to contact the site owner to obtain the requisite permissions.

WARNING: The results you see with this feature may contain mature content. Google considers a number of factors when determining whether an image is relevant to your search request. Because these methods are not entirely foolproof, it's possible some inappropriate pictures may be included among the images you see. (The mature content filter is only available from an English interface.)"

What does copyright mean? Very briefly, it is the right of an author, artist, or publisher (the creator of the work) to retain ownership of works and to produce, or contract others to produce, copies. Copyright is an important consideration in Google Book Search and Google Image Search.

 

 

 _______________________________________________________________________________________________________

1 Battelle, John, The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture
       (New York: Portfolio, 2005) 69-77.

 

 

Specialized Search Engines

These engines have a specific focus:

 

Scirus - distinguishes itself from existing search engines by concentrating on scientific content only and by searching both web and journal sources. It enables scientists, students and anyone searching for scientific information to chart and pinpoint data, locate university sites, and find reports and articles in a clutter-free, user-friendly and efficient manner.
Searchedu.com - allows you to limit your searches to specific types of internet sites such as education, government, military, and e-books.
FedWorld.gov - comprehensive locator for government information.

 

A link for finding specialized search engines: http://dmoz.org/Computers/Internet/Searching/Search_Engines/Specialized/

Meta Search Engines
 

Unlike search engines, meta search engines do not compile their own databases; they send the search simultaneously to the major search engines and return the results from several different search engines. Because meta search engines tend to equalize the search to the lowest common denominator, they are best used for simple searches. Very complex search logic or searches with several words will usually not produce good results in a meta search engine. Also, meta search engines usually retrieve only about 10% of the results from any one search engine's database. 1
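
The way a meta search engine passes one query to several engines and combines what comes back can be sketched as follows. Everything here is hypothetical: the two "engines" are stand-in functions that return fixed lists, no real search engine is contacted, and the merging method (taking turns between the engines and dropping duplicates) is just one simple possibility.

```python
# Hypothetical stand-ins for real search engines: each takes a query and
# returns a ranked list of page addresses. No real engine API is used here.
def engine_one(query):
    return ["site-a.example", "site-b.example", "site-c.example"]

def engine_two(query):
    return ["site-b.example", "site-d.example"]

def meta_search(query, engines, per_engine=2):
    """Send one query to every engine and merge the top results from each."""
    result_lists = [engine(query)[:per_engine] for engine in engines]
    merged = []
    # Interleave: take every engine's first hit, then every second hit, and so on.
    for position in range(per_engine):
        for results in result_lists:
            if position < len(results) and results[position] not in merged:
                merged.append(results[position])
    return merged

print(meta_search("rock climbing", [engine_one, engine_two]))
```

Because only the top two results are taken from each engine in this sketch, site-c.example never appears in the merged list, even though the first engine found it. This is the same kind of trade-off the paragraph above describes.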

Although the University of California at Berkeley Library has created a table of features for meta search engines, UC Berkeley's library does not recommend using meta search engines. This is the link to this table: http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/MetaSearch.html#Recommend

Yippy, Dogpile, SurfWax

________________________________________________________________________________________

 

1 Barker, Joe, "Meta-Search Engines," Finding Information on the Internet: A TUTORIAL, 10 May 2000,
        Library, UC Berkeley, 15 June 2000 <http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/MetaSearch.htm>

 

 

 

Proprietary Databases
 
 

Although they may appear to be normal web sites, Proprietary Databases are not. Proprietary means exclusively owned or private. If you have access to one of these databases, it is because someone has paid for that privilege. These databases are not searched by web search engines and, therefore, their information does not show up in normal web search results. These databases make up part of what is referred to as The Invisible or Deep Web. The proprietary databases that are available to COS students are linked to the COS Library Web Page under the category: Academic Electronic Databases

 

Proprietary Databases are one part of the invisible web (invisible to search engines). Usually these databases require a paid subscription. Sometimes the site may be password-protected for members of an organization only; sometimes the site may offer free searching but require registration; newspaper sites often require registration.
General search engines don't necessarily search file formats such as .PDF, audio, video, or images, so information in these formats also becomes part of the invisible web.
Database-driven web sites are also part of the invisible web, where the information only becomes available after a search is done at that particular site. Thomas: Legislative Information on the Internet, a database providing access to major legislation, the Congressional Record, committee information, and historical documents, and The Internet Movie Database are examples of this type of site.

 

The Library site at the University of California at Berkeley offers a more in-depth discussion of the Invisible Web: http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/InvisibleWeb.html

 

_________________________________________________________________________________
The above information on the Invisible or Deep Web was taken from the following:

 

Clyde, Anne. "The Invisible Web" Teacher Librarian. April 2002.

 

 

For questions and comments please E-mail ginah@cos.edu

 

 

 

Last Updated: 10/29/2014 8:03 AM