Searching the Internet
Most people no longer need reminding of how important the Internet is as an information source. However, it is essential to search it effectively and thoughtfully, which means understanding what search engines are, how they work, and which search techniques give good results.
Most of you will be familiar with search engines, but they are only one tool for finding websites; subject directories and the invisible web are others. This workshop looks at the differences between these tools.
Search Engines
What is a search engine?
A search engine is an index of websites. The index is built by software programs called 'spiders' or 'web crawlers', which randomly select web pages to include in it. Search engines are larger than subject directories, but still cover only a small proportion of the Internet. They generally cannot index information held in databases that are accessed through web pages.
How to use a search engine
You search a search engine by entering keywords. Be very specific, because search engines index hundreds of millions or even billions of web pages. The more words you search for, the fewer results you will get.
Some handy search hints to remember are:
- Most search engines return results containing all of the words you type.
- You will often need to put "quotation marks" around a phrase to find exact matches.
- The advanced search offers more specific search options. These may include searching particular domains (e.g. .edu) or websites, using Boolean expressions, and limiting by file type, date, language and more.
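To see why adding words narrows a search, here is a minimal sketch, assuming a simplified engine that returns only pages containing every query word and treats a quoted phrase as requiring the words together, in that order. The tiny page collection and the function names `build_index`, `search` and `phrase_search` are hypothetical illustrations; real search engines such as Google use far more elaborate matching and ranking.

```python
import re

# Hypothetical mini collection of "web pages" (URL -> text); not real data.
PAGES = {
    "a.example/1": "Stem cell research raises ethical questions",
    "a.example/2": "Research on plant cell walls",
    "a.example/3": "Ethics of research funding",
    "a.example/4": "Cell research: stem cells in medicine",
}

def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Inverted index: each word maps to the set of pages that contain it."""
    index = {}
    for url, text in pages.items():
        for word in tokenize(text):
            index.setdefault(word, set()).add(url)
    return index

def search(index, query):
    """AND search: return only pages containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))  # copy, so the index is not modified
    for word in words[1:]:
        results &= index.get(word, set())  # each extra word can only shrink the set
    return results

def phrase_search(pages, index, phrase):
    """Exact-phrase search: the words must appear together, in this order."""
    words = tokenize(phrase)
    n = len(words)
    matches = set()
    for url in search(index, phrase):  # candidates already contain all the words
        tokens = tokenize(pages[url])
        if any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1)):
            matches.add(url)
    return matches

if __name__ == "__main__":
    index = build_index(PAGES)
    for q in ["research", "cell research", "stem cell research"]:
        print(q, "->", sorted(search(index, q)))
    print('"stem cell research" ->', sorted(phrase_search(PAGES, index, "stem cell research")))
```

With this sample collection, each extra word shrinks the result set (4, then 3, then 2 pages), and quoting the phrase leaves only the page where "stem cell research" appears in that exact order.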
Each search engine works in a different way. Check its Help section or Search Tips to find the most effective way to search it.
A list of search engines can be found at CDU’s Computing, Internet and Search Engines page. For the purposes of this session we will be using Google.
Advantages and disadvantages of using search engines
| ADVANTAGES | DISADVANTAGES |
|---|---|
| Indexes for search engines can be very large because they are created by software. | Sites are randomly selected and indexed for keywords by software. They are not evaluated for quality or relevance. |
| You can search by keywords. | You need to be very specific when selecting keywords because there are so many web pages. |
| Search engines are much larger than subject directories. | A search can return far too many results to check. |
| A search returns a large number of results. | Results may not be arranged in order of relevance (see the section on ranking for more information). You may find the most useful site at the end of the list, and you may also find irrelevant sites. |
Subject Gateways and Directories
Another good way of locating useful information is to use a subject gateway. Around the world, higher education institutions and other bodies, such as national libraries, have created what are called “subject gateways”. These are often more than lists of links: they may also include evaluations of each site's content and usefulness, and may be updated regularly.
A subject directory is an index of websites arranged by topic. Sites included in the index are selected and evaluated by people according to specific selection criteria.
Subject directories work on a hierarchical menu system, from broad subject areas down to precise topics. Many subject directories also include a keyword search.
Subject directories are most useful for finding general topics because they are usually collections of the author's recommended websites on a topic. If you are looking for a very specific topic, use a subject specific directory or a search engine instead.
Subject gateways and portals have the same features and structure as subject directories, but they concentrate on a particular discipline or subject rather than covering many subjects.
Examples include SOSIG for the social sciences and EEVL for engineering. A directory of subject gateways can be found at Pinakes. Whilst you can use more familiar directory sites such as Yahoo, these may not be as useful as the more formal gateways.
Advantages and disadvantages of subject directories
| ADVANTAGES | DISADVANTAGES |
|---|---|
| Websites are selected and evaluated by people who are often experts on the subject. | Collections are small because selecting and evaluating sites is labour intensive, and the selection may be subjective and biased. Directories may also not be kept up to date and may not contain all of the relevant sites on a topic. |
| Directories can be browsed from a main subject area, making them easier to use for people who may be unfamiliar with a topic. | It may be difficult to work out which main subject will contain the topic you want. |
| Websites on similar topics are grouped together by subject in the directory. | Sites may be classified under different subjects in different directories; there are no standard subject groupings. |
| Irrelevant search results are reduced, as subject directories are often fairly small collections of the author's recommended sites on a topic. | Searching the directory only searches the information contained in its records, such as the page title, subject areas and a description of the site. Pages within websites are often not included. |
Source: Sherman, Chris and Price, Gary (2001) The Invisible Web: uncovering information sources the search engines can't see, CyberAge Books, Medford, N.J.
How to use a subject gateway
To use a subject directory, think broadly and laterally: which general area is your topic likely to be placed in? You can also use a keyword search if the directory has a search facility.
Librarians' Index to the Internet is an example of a subject directory.

Subjects are arranged in broad categories, with subtopics listed underneath each main topic. Choose a main topic to see its subtopics.
For example, if you are looking for information on stem cell research, you could check the main sections for:
- Science
Subcategories in which you may find information include:
- Genetics
- Stem Cells
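Browsing a subject directory amounts to walking down a tree of categories, from broad subject to precise topic. The sketch below models that hierarchy as nested dictionaries. The category names are hypothetical and do not reproduce the real layout of Librarians' Index to the Internet or any other directory; `subtopics` and `find_paths` are illustrative helper names.

```python
# Hypothetical fragment of a subject directory's menu hierarchy.
# Category names are illustrative only, not any real directory's layout.
DIRECTORY = {
    "Science": {
        "Genetics": {
            "Stem Cells": {},
            "Genomes": {},
        },
        "Physics": {},
    },
    "Health": {
        "Medical Research": {
            "Stem Cells": {},
        },
        "Nutrition": {},
    },
}

def subtopics(tree, path):
    """Browse: follow a path of category names and list the subtopics beneath it."""
    node = tree
    for name in path:
        node = node[name]  # raises KeyError if the category does not exist
    return sorted(node)

def find_paths(tree, topic, prefix=()):
    """Return every menu path that leads to a category with this name."""
    paths = []
    for name, children in tree.items():
        here = prefix + (name,)
        if name == topic:
            paths.append(here)
        paths.extend(find_paths(children, topic, here))
    return paths

if __name__ == "__main__":
    print(subtopics(DIRECTORY, []))                       # main sections
    print(subtopics(DIRECTORY, ["Science"]))              # one level down
    print(subtopics(DIRECTORY, ["Science", "Genetics"]))  # precise topics
    for path in find_paths(DIRECTORY, "Stem Cells"):
        print(" > ".join(path))
```

In this made-up tree, "Stem Cells" can be reached both through Science > Genetics and through Health > Medical Research, which is why it pays to think broadly and laterally when choosing a main section.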
Invisible Web
The Invisible Web, also called the ‘deep web’, consists of information that is not found by traditional search engines such as Google.
It is made up of information held in databases, archived material, websites that have not been indexed by search engines, information protected by firewalls, and interactive tools.
Some of these invisible web resources require a subscription to access, but many do not. Once you locate a freely available invisible web resource you can search within it.
Links to invisible web databases are available via CDU’s Computing, Internet and Search Engines links. Some particularly good examples of invisible web databases are:
The Educator’s Reference Desk
Contains over 2000 lesson plans and over 3000 links to education information. Also provides access to the ERIC database – the world’s largest source of information on education research and practice.
Infomine
Infomine is a virtual library of Internet resources relevant to faculty, students and research staff at the university level.
PubMed
Provides access to over 14 million medical citations, including links to full text articles and related resources.
Further References
To find out more about the invisible web, see the invisible web tutorial produced by the University of California, Berkeley Library.
You can also find more information in the following title, which is available at the CDU Library:
Sherman, Chris and Price, Gary (2001) The Invisible Web: uncovering information sources the search engines can't see, CyberAge Books, Medford, N.J. Call Number 025.04 SHER
How to find information in the invisible web
Invisible Web Directories provide links to invisible sources of information, arranged by subject area. These subject areas can be very broad, so you may need to think broadly and laterally to find relevant sources.
You will usually not find a webpage that directly answers your question by using an invisible web directory. Instead, you will find a list of other searchable internet sources, such as databases, that should provide useful information for your topic.
Advantages and disadvantages of using the invisible web
| ADVANTAGES | DISADVANTAGES |
|---|---|
| Invisible web resources specialise in particular subject areas, so you will get more comprehensive and relevant results. | Not all invisible web resources are listed in invisible web directories, so you may still miss relevant resources. Tip: use a search engine to find invisible web resources by searching for general keywords together with the term 'database', or use a subject directory, as many of the resources they index are from the invisible web. |
| Search interfaces on invisible web resources are designed for the type of search you are doing; for example, searching a phone directory will find phone numbers more successfully than a search engine. | It can be difficult to know which resource to select from an invisible web directory. |
| Invisible web resources are often created by organisations and institutions that are authorities on the subjects they cover, so they are often more reliable sources. | Information on the invisible web still needs to be evaluated carefully, as not all invisible web sources are credible. |
| A lot of the information on the invisible web cannot be found elsewhere on the web. | Information may change quickly and become unavailable, or may become part of the visible web. |
Source: Sherman, Chris and Price, Gary (2001) The Invisible Web: uncovering information sources the search engines can't see, CyberAge Books, Medford, N.J.
How to use the invisible web
Example: PubMed
PubMed is an example of a search engine for the invisible web.

Using our example topic from the evaluation of information sources, you can use PubMed to find material on “stem cell research” that a traditional search engine like Google will not find.
Reflection
We must be aware that, in practice, search engines, web directories, portals and similar tools, however helpful they are for many purposes, do not operate according to some kind of pure information science. Commercial, political or other interests may well influence how they display search results, carry advertisements, present “sponsored links” that are not always clearly identified, or include or exclude websites according to some policy or hidden bias. Also, to gain more influence and/or advertising revenue, some of them tend to exaggerate their strengths and capabilities and play down their weaknesses and limitations…