| Every day we use Internet services, and search engines in particular, to look for information. Search results are usually called hits and are presented as a list. The results may consist of web pages, images, and other types of files. Some search engines also collect data available in databases or open directories. Unlike web directories, which are maintained by human editors, search engines operate algorithmically or combine human and algorithmic input.
Web search engines work by storing data about a very large number of web pages, which they retrieve from the Internet. These pages are retrieved by a web crawler, also known as a spider: an automated browser that follows every link it finds. The content of each page is then analyzed to determine how it should be indexed. Words are extracted, for example, from titles, headings and subheadings, or special fields called meta tags. Data about the pages is stored in an index for use in later queries. Some search engines, such as Google, store all or part of the source page (called a cache) as well as information about the page, while others, such as AltaVista, store every word of every page they have found. The cached page always contains the text that was actually indexed, so it can be very helpful when the content of the live page has changed and the search terms no longer appear in it.
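The crawl-and-index cycle described above can be illustrated with a minimal Python sketch. It is not how any particular search engine is implemented. The `PAGES` dictionary, the `example` URLs, and the names `PageParser`, `tokenize`, and `crawl_and_index` are all hypothetical; an in-memory dictionary stands in for real HTTP requests, and only titles, headings, and the `keywords` meta tag are indexed.

```python
from collections import defaultdict, deque
from html.parser import HTMLParser
import re

# Hypothetical in-memory "web": URL -> HTML source (stands in for real HTTP fetches).
PAGES = {
    "a.example/": (
        '<html><head><title>Search engines</title>'
        '<meta name="keywords" content="crawler, index"></head>'
        '<body><h1>How spiders work</h1><a href="b.example/">next</a></body></html>'
    ),
    "b.example/": (
        '<html><head><title>Web directories</title></head>'
        '<body><h2>Human editors</h2><a href="a.example/">back</a></body></html>'
    ),
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class PageParser(HTMLParser):
    """Collects outgoing links plus words from titles, headings, and meta keywords."""

    INDEXED_TAGS = {"title", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []
        self._open = []  # stack of currently open indexed tags

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and attrs.get("name") == "keywords":
            self.words += tokenize(attrs.get("content") or "")
        elif tag in self.INDEXED_TAGS:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if self._open and self._open[-1] == tag:
            self._open.pop()

    def handle_data(self, data):
        # Only text inside a title or heading is indexed in this sketch.
        if self._open:
            self.words += tokenize(data)


def crawl_and_index(start_url, pages):
    """Follow links breadth-first and build an inverted index: word -> set of URLs."""
    index = defaultdict(set)
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:  # link points outside the toy web
            continue
        parser = PageParser()
        parser.feed(html)
        for word in parser.words:
            index[word].add(url)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


index = crawl_and_index("a.example/", PAGES)
print(sorted(index["crawler"]))
print(sorted(index["editors"]))
```

The inverted index maps each word to the pages that contain it, so a later query only needs a dictionary lookup instead of a scan over every stored page.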
Once a user has typed keywords into the search field, the engine looks them up in its index and returns a list of the best-matching web pages according to its criteria, normally with a short summary containing the title of the document and sometimes parts of the text. Some search engines offer an advanced feature called proximity search, which lets users specify how far apart the search terms may be.
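Proximity search can be sketched with a positional index, which records where each word occurs in each document. The example below is an illustration under simple assumptions, not the method of any specific engine: the function names `build_positional_index` and `proximity_search` and the sample documents are hypothetical, and distance is measured in word positions.

```python
from collections import defaultdict
import re


def build_positional_index(docs):
    """Map each word to {doc_id: [positions]}, with positions counted in words."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            index[word][doc_id].append(pos)
    return index


def proximity_search(index, first, second, max_distance):
    """Return documents where the two words occur within max_distance words of each other."""
    first_docs = index.get(first, {})
    second_docs = index.get(second, {})
    hits = []
    # Only documents that contain both words can match.
    for doc_id in first_docs.keys() & second_docs.keys():
        if any(
            abs(p - q) <= max_distance
            for p in first_docs[doc_id]
            for q in second_docs[doc_id]
        ):
            hits.append(doc_id)
    return sorted(hits)


# Hypothetical sample documents.
docs = {
    "d1": "search engines rank web pages",
    "d2": "search the catalogue of rare engines",
}
index = build_positional_index(docs)
print(proximity_search(index, "search", "engines", 1))  # ['d1']
print(proximity_search(index, "search", "engines", 5))  # ['d1', 'd2']
```

In `d2` the two words are five positions apart, so the document matches only when the allowed distance is at least 5.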
The usefulness of a search engine depends on the relevance of the results it returns. There may be millions of web pages that contain a particular word or phrase, but only some of them are relevant to the user. Most search engines therefore rank the results so that the "best" ones appear first.
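One textbook way to order results is TF-IDF weighting, shown in the sketch below. It is only an illustration of ranking in general: the text does not say which method any engine uses, and real engines combine many more signals. The function name `rank` and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, docs):
    """Return doc ids ordered by a simple TF-IDF score, best match first."""
    doc_tokens = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    scores = {}
    for term in tokenize(query):
        # Document frequency: how many documents contain the term at all.
        df = sum(1 for counts in doc_tokens.values() if term in counts)
        if df == 0:
            continue
        # Rare terms get a higher weight; the +1 keeps the weight positive.
        idf = math.log(n_docs / df) + 1.0
        for doc_id, counts in doc_tokens.items():
            if counts[term]:
                scores[doc_id] = scores.get(doc_id, 0.0) + counts[term] * idf
    # Sort by descending score, breaking ties by doc id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical sample documents.
docs = {
    "a": "spider spider crawler",
    "b": "spider index",
    "c": "cooking recipes",
}
print(rank("spider", docs))  # ['a', 'b']
```

Document `a` ranks above `b` because it mentions the query term twice, and `c` is omitted because it does not contain the term.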
How a search engine decides which pages are the best matches, and in what order the results should be shown, varies from one engine to another. The methods also change over time as Internet usage changes and new techniques are developed. |