If you’ve spent any time on the Internet, you may have heard a little about spiders, crawlers, and
robots. These little creatures are programs that literally crawl around the Web, cataloging data so that
it can be searched. In the most basic sense all three programs — crawlers, spiders, and robots — are
essentially the same. They all “collect” information about each and every web URL.
This information is then cataloged according to the URL at which it is located and is stored in
a database. Then, when a user uses a search engine to locate something on the Web, the references
in the database are searched and the search results are returned.
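The collect-and-catalog cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the TOY_WEB dictionary stands in for the real Web (no network requests are made), and the names crawl and search are hypothetical, not taken from any actual search engine.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch pages over HTTP; this is an assumption
# made so the sketch runs without a network connection.
TOY_WEB = {
    "http://example.com/": (
        "welcome to the home page",
        ["http://example.com/a", "http://example.com/b"],
    ),
    "http://example.com/a": ("a page about spiders", ["http://example.com/b"]),
    "http://example.com/b": ("a page about robots", ["http://example.com/"]),
}


def crawl(seed, web):
    """Follow links outward from the seed URL and catalog each page's text by URL."""
    catalog = {}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        # Skip URLs already cataloged and links that lead nowhere.
        if url in catalog or url not in web:
            continue
        text, links = web[url]
        catalog[url] = text   # store the collected data under its URL
        queue.extend(links)   # schedule the linked pages for a visit
    return catalog


def search(catalog, word):
    """Return the cataloged URLs whose text contains the given word."""
    return sorted(url for url, text in catalog.items() if word in text.split())


catalog = crawl("http://example.com/", TOY_WEB)
print(search(catalog, "spiders"))
```

The crawler never visits the same URL twice, which is what keeps it from looping forever when pages link back to one another.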
Every search engine contains or is connected to a system of databases, where data about each URL
on the Web (collected by crawlers, spiders, or robots) is stored. These databases are massive storage
areas that contain multiple data points about each URL.
The data might be arranged in any number of different ways, and will be ranked according to a
method of ranking and retrieval that is usually proprietary to the company that owns the search
engine.
All of the parts of the search engine are important, but the search algorithm is the cog that makes
everything work. It might be more accurate to say that the search algorithm is the foundation on
which everything else is built. How a search engine works is based on the search algorithm, or the
way that data is discovered by the user.
In very general terms, a search algorithm is a problem-solving procedure that takes a problem,
evaluates a number of possible answers, and then returns the solution to that problem. A search
algorithm for a search engine takes the problem (the word or phrase being searched for), sifts through
a database that contains cataloged keywords and the URLs those words are related to, and then returns
pages that contain the word or phrase that was searched for, either in the body of the page or in a
URL that points to the page.
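As a rough illustration of that lookup, the sketch below builds a small keyword-to-URL table (often called an inverted index) and queries it. The PAGES data and the function names are hypothetical. Two simplifications are assumed: the words in a page's own address stand in for "a URL that points to the page," and a multi-word query matches pages containing every word rather than the exact phrase. Real search engines use proprietary and far more elaborate methods.

```python
import re
from collections import defaultdict

# Hypothetical cataloged data: URL -> body text of the page.
PAGES = {
    "http://example.com/seo-tips": "ten tips for better ranking",
    "http://example.com/crawlers": "how crawlers catalog the web",
    "http://example.com/about": "about this site and seo",
}


def tokenize(text):
    """Split text into lowercase keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each keyword to the set of URLs it is related to."""
    index = defaultdict(set)
    for url, body in pages.items():
        # Catalog words from the page body and from the URL itself.
        for word in tokenize(body) + tokenize(url):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs related to every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)


index = build_index(PAGES)
print(search(index, "seo"))
print(search(index, "crawlers catalog"))
```

Because the table is built once, ahead of time, each query only has to look up a few keywords instead of rereading every page.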
This neat little trick is accomplished differently according to the algorithm that’s being used. There are
several classifications of search algorithms, and each search engine uses algorithms that are slightly
different. That’s why a search for one word or phrase will yield different results from different search
engines. Some of the most common types of search algorithms include the following:
List search: A list search algorithm searches through specified data looking for a single
key. The data is searched in a very linear, list-style method. The result of a list search is
usually a single element, which means that searching through billions of web sites could
be very time-consuming, but would yield a smaller search result.
Tree search: Envision a tree in your mind. Now, examine that tree either from the roots out
or from the leaves in. This is how a tree search algorithm works. The algorithm searches a
data set from the broadest to the most narrow, or from the most narrow to the broadest.
Data sets are like trees; a single piece of data can branch to many other pieces of data, and