search engine details
The information on this page is largely out of date.
It is being left here for historical reasons for
another few months, after which it will be deleted.
- AlltheWeb
- AltaVista
- AOL
- Ask Jeeves
- dmoz (The Open Directory)
- FAST
- Google
- Gigablast
- Inktomi
- LookSmart
- MSN
- Netscape
- Overture
- Teoma
- WiseNut
- Yahoo
- booleans
- capitalisation
- comments
- dynamic content
- frames
- inclusion / exclusion
- load time
- meta tags (keywords / descriptions)
- meta tags (robots)
- plurals
- spam
- spell checks
- stemming
- stopwords
- truncation
- word frequency
- search engine info links
***************************************
ALL THE WEB
likely to be using Yahoo search engine (Feb '04)
offers a "convert" facility (Jul '03)
indexes Excel and Powerpoint files (Jul '03)
powered by FAST (FAST offers paid inclusion) (Jul '03)
owned by Yahoo (Jul '03)
suggests alternative spellings if search query appears misspelled (Jul '03)
supplies results to Terra Lycos in most countries (Jul '03)
can use search field as calculator with functions: * + / - ( ) ^ (May '03)
offers "URL Investigation" - enter URL and get lots of info (Mar '03)
uses proximity of search terms when ranking (Jan '03)
finds Word docs, PDF files & Flash (filetype:msword / pdf / flash) (Jan '03)
skins available (Jan '03)
supports Boolean (AND, OR, and ANDNOT) - set search specification menu to "boolean expression" (Jan '03)
uses "RANK" - gives preferences to search results containing given keywords (Jan '03)
meta description used on results page but doesn't affect rankings (Jan '02)
uses both page content & link analysis to determine relevancy (Sep '01)
ALTAVISTA
likely to be using Yahoo search engine (Feb '04)
owned by Yahoo (Jul '03)
suggests alternative spellings if search query appears misspelled (Jul '03)
case sensitive when quotes are used (also option in advanced search) (Jun '03)
database contains 540M image files + 11M audio/video files (Jun '03)
rumours that paid inclusion pages given preference (Dec '02)
doesn't spider sites on free web hosts (Jul '02)
claims to use over 100 different factors in its ranking algorithm (Jul '02)
600M pages indexed (Apr '02)
associated custom logos/icons/taglines/text links can be purchased (Nov '01)
neither regional nor global databases updated May-October 2001 (Nov '01)
uses "*" as a wildcard - needs a minimum of 3 chars - eg "sma*" finds "small" and "smart" (Oct '01)
uses proximity operator "NEAR" - finds query terms within 10 words of each other (Oct '01)
only indexes the first 100K of a page (Oct '01)
doesn't crawl dynamically allocated URLs but they can be submitted manually (Sep '01)
denies that newly submitted pages initially ranked lower (Sep '01)
uses link popularity - but first takes into account relevancy of external link text (Sep '01)
but newly submitted pages may fare badly until links found to them (Sep '01)
may get different search results at different times of the day (uses different servers) (Aug '01)
directory pages from LookSmart - Looksmart sites also appear in search results (Aug '01)
uses meta description (Aug '01)
"Partner Listings" come from GoTo - (AltaVista's own ads called "Featured Listings") (Aug '01)
indexes frameset pages, and the frames they link to, as tho they were separate pages (Aug '01)
ensure important content on high level pages (Aug '01)
crawl level & depth varies, therefore submit important pages separately (Aug '01)
takes 4-6 weeks for pages to appear in listings using Basic Submit (Jun '01)
Express Inclusion allows up to 500 submissions with weekly reindexing (Jun '01)
Express Inclusion - from $39 (1st URL) to $12 (URLs 101-500) per URL for 6 months (Jun '01)
image search (displays images) (May '01)
redirects to local country version (May '01)
limit on number of submitted pages per site per day removed (Apr '01)
uses periods and semi-colons for phrases - see Research Buzz article (Apr '01)
550M pages indexed (Mar '01)
in simple searches, first finds the exact phrase, then all the words (Mar '01)
include keywords in a) domain name, b) file name, c) first line of text, d) title (Dec '00)
favours longer pages? (850 words +) (Nov '98) - or 500 max? (Apr '00)
only takes into account two occurrences of keywords - excessive repetitions may be penalised (Mar '00)
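The "NEAR" operator noted above (query terms within 10 words of each other) can be sketched roughly as follows. This is only an illustration in Python: the function name and the way distance is counted (difference in word positions) are assumptions, not AltaVista's documented behaviour.

```python
def near(text, term_a, term_b, max_distance=10):
    """Return True if term_a and term_b occur within max_distance words.

    Distance is taken as the difference between word positions, which is
    an assumption made for this sketch.
    """
    words = text.lower().split()
    positions_a = [i for i, word in enumerate(words) if word == term_a.lower()]
    positions_b = [i for i, word in enumerate(words) if word == term_b.lower()]
    return any(abs(a - b) <= max_distance
               for a in positions_a
               for b in positions_b)


sample = "search engines rank pages by many factors including link popularity"
close_enough = near(sample, "search", "popularity")               # 9 words apart
too_far = near(sample, "search", "popularity", max_distance=5)
```

With the default of 10 words the two terms count as near each other; tightening the limit to 5 words rejects the same page.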
AOL
AOL recommended sites + results from Google (Aug '02)
uses paid listings from Google (May '02)
AOL bought Netscape in 1998 (May '02)
"web sites" results from dmoz (updates every 2 weeks) (May '01)
"web pages" results from Inktomi (Feb '01)
has optional popularity search (Sep '00)
ASK JEEVES
owns Excite & iWon (Mar '04)
no paid submission for large clients (1000+ pages) (Mar '04)
owns and uses results from Teoma (Jan '03)
paid listings from Google (Jul '02)
owns and supplies results to Ask.com (Jun '02)
paid submission - $30 1st URL, then $18 (per year) - reindexes every 7 days (Feb '02)
about 5 percent of its knowledgebase is made up of paid inclusion answers (Nov '00)
DMOZ (OPEN DIRECTORY PROJECT)
entry in dmoz no longer boosts Google rankings? (hopefully!) (Mar '09)
allegations of corrupt editors manipulating submissions and entries (Mar '09)
long ago admitted being unable to cope with all the submissions it receives (Feb '09)
need to become an editor to be certain of getting a site included in the directory (Feb '09)
there is a forum for dmoz users at http://www.resource-zone.com/ (Jul '03)
a site can be submitted to both a topical category and a geographical category (Jul '02)
used by AltaVista, Google, HotBot, Netscape, Oingo, Ask Jeeves + 200 other sites (Nov '01)
2.8M entries (Aug '01)
prefer top level domains, but will accept multiple submissions from a domain if unique content (May '01)
keywords need to be in submitted title & description (May '01)
doesn't like dynamic content ("?" in URL) (May '01)
claims 99.75% of its sites are live (May '01)
FAST
indexes Word docs (filetype:msword), PDF files (filetype:pdf), Flash (filetype:flash) (Jan '03)
supplies HotBot Europe with results (Oct '02)
2 billion pages indexed (Jun '02)
updates 30% of its index every 7 days, the rest every 28 days (Mar '02)
paid inclusions 38 Euros per URL - ensures spidering every 24 hours (Jan '02)
FAST database used by all Lycos Europe sites, Tiscali, T-Online, & Evreka (Jan '02)
doesn't read meta keywords (Jan '02)
powers PepeSearch (Oct '01)
indexes all the text regardless of how big the page (Oct '01)
indexes "stop" words ("and" "the" etc) (Oct '01)
doesn't crawl dynamic links (Sep '01)
uses both page content & link analysis to determine relevancy (Sep '01)
GIGABLAST
powers Clusty, Snap and Blingo (Jan '05)
1 billion pages indexed (Jan '05)
GOOGLE
indexes up to 101K of page text (Mar '04)
no longer supplying Yahoo with search engine results (Feb '04)
now using "stemming" - will find plurals etc (Nov '03)
uses Boolean ".OR." (Jul '03)
translation facility (Jul '03)
doesn't index meta tags or use meta data when ranking (Jul '03)
supplies paid listings to Lycos Europe Tripod sites (Jul '03)
directory data from the Open Directory (Jul '03)
provides backup results for Yahoo search in most countries (Jul '03)
the #1 search engine by queries worldwide (Jun '03)
offers spell check (alternative spellings for possible typos) (Jun '03)
cached versions of spidered pages available (Jun '03)
paid listings to Earthlink, AOL, Ask Jeeves / Teoma, AT&T, InfoSpace, Yahoo Japan (Jun '03)
rumoured to downgrade pages with affiliate links coded with javascript "onmouseover" (May '03)
position of AdWords listings depends on bid amount and the clickthru rate (Jul '03)
AdWords can be geographically targeted (Jul '03)
provides results to AOL, CompuServe, AOL.com & Netscape (Aug '02)
only indexes the first 100K of a web page (Jul '02)
bans sites which offer incentives for people to link to them (Jun '02)
provides paid Q&A service (May '02)
allows software programs to query Google's database directly via Google Web APIs (May '02)
uses asterisk (*) as a wild word (Jan '02)
index contains 22M pdf files + doc, ppt, xls, rtf & ps files (Nov '01)
crawls every 28 days but updates take a little longer (Sep '01)
indexes dynamically generated pages (Sep '01)
page ranking takes into account internal as well as external links (Jun '01)
links from pages that are heavily linked to are ranked more highly (Jun '01)
links from pages containing many links are ranked less highly (Jun '01)
takes into account word proximity when ranking pages re query phrases (Jun '01)
words in title, heading, large fonts & bold are given higher rank consideration (Jun '01)
all pages on a site are ranked individually (Jun '01)
relevance of link text improves ranking (Jun '01)
submitted site not included in index till Google discovers a link to it (Jun '01)
indexes alt text used with graphics (Jun '01)
entering first & last name + zip brings up telephone number - and vice versa (Apr '01)
doesn't allow unauthorised access by meta search engines (Feb '01)
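The link-related notes above (links from heavily linked pages count for more, links from pages containing many links count for less) match the publicly described PageRank idea. A minimal Python sketch of that idea follows. The page names, the damping value of 0.85 and the fixed iteration count are assumptions for illustration, and this is not a description of what Google actually runs.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Simplified PageRank-style scoring.

    links maps each page to the list of pages it links to.
    A page's score is shared equally among its outgoing links, so a link
    from a page with many links passes on less than a link from a page
    with few. Pages with no outgoing links simply pass nothing on here.
    """
    pages = set(links) | {target for targets in links.values() for target in targets}
    scores = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        updated = {page: (1 - damping) / len(pages) for page in pages}
        for source, targets in links.items():
            if not targets:
                continue
            share = damping * scores[source] / len(targets)
            for target in targets:
                updated[target] += share
        scores = updated
    return scores


# Hypothetical four-page site: "c" is linked to by every other page.
example_links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["a", "b", "c"]}
scores = link_scores(example_links)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph "c" comes out on top because every other page links to it, and "a" follows because its main link comes from the highly scored "c".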
INKTOMI
uses meta description tag (Jul '03)
suggests alternative spellings if search query appears misspelled (Jul '03)
shares results with LookSmart (Jul '03)
owned by Yahoo (Jun '03)
charges $38.95 pa for 1st submitted page and $24.95 for each subsequent page (Jan '03)
indexes meta tags but treats them as less important than page text (Oct '02)
rumoured to only index one page of non-paying sites (??) (Aug '02)
PositionTech offers subscribers free html validation + swopping of submitted URLs (May '02)
Inktomi has its own Map of the Web that includes some 1.7 billion documents - various databases (Mar '02)
Best of Web (BOW) - 110M docs - refresh every 9 days
GEN3 - 500M docs - refresh every 30 days
Eurocluster - 110M docs - refresh every 21 days
Asia Pacific cluster (APAC) - 55M docs - refresh every 21 days
to get into BOW need links from good sites (eg ones already in BOW) (Mar '02)
free submissions via affiliates "Add URL"s are penalised (Mar '02)
paid submissions thru PositionTech, Network Solutions, (US) & WebGravity (Europe) (Mar '02)
sites found via Inktomi spiders ranked more highly (Mar '02)
indexes a limited number of dynamically generated pages per site (Sep '01)
also offers cost (0.25c) per click pricing ("Index Connect") for listing 1,000+ pages (Sep '01)
used by HotBot, LookSmart, NBCi, Overture, iWon, MSN, AOL (articles only), Goo, iAtlas, eSpotting (Jun '01)
automatic proximity searching introduced (Apr '01)
paid inclusion sites reindexed every 2 days and visited weekly for the following year (Mar '01)
incorporates the UltraSeek team (Aug '00)
doesn't index frames, but indexes "noframes" text (Jul '00)
recommends checking partner sites every 2 weeks to make sure submitted url hasn't dropped out (Jul '00)
LOOKSMART
shares results with Inktomi (Jul '03)
owns WiseNut (Jul '03)
charges 0.15 per clickthru for commercial sites (Jun '03)
also used by MSN, Excite, AltaVista, iWon, Netscape and 370+ ISPs (Jun '02)
criticised for new business practices re introduction of ppc (May '02)
supplies results to InfoSpace's meta search at Excite, WebCrawler & Dogpile (May '02)
owns Zeal and integrates Zeal's contents into the LookSmart directory (Apr '02)
changed from paid submission to ppc for commercial submissions (Apr '02)
ppc costs - $49 setup + minimum $15 per month (Apr '02)
criticised for association with scumware company eZula (Apr '02)
provides main results for MSN (Dec '01)
2.8M sites in directory (Aug '01)
LookSmart won't accept sites containing pornography, gratuitous violence & illegal activities (Jul '01)
directory results first + option to view pages from Inktomi (Jun '01)
manual reviews (quality sites only) - no redirects/mirrors (Jun '01)
request re change to category & description actioned within a week! (May '01)
submissions must not be irrelevant to the US (Mar '01)
free submissions for registered non-profit sites (reviewed within 8 weeks) (Jun '00)
owned by CNN & Time Warner (May '00)
MSN SEARCH
results from Inktomi and LookSmart (Jul '03)
the #3 search engine by queries worldwide (Jun '03)
they currently outsource all their searches to Inktomi and LookSmart (Jun '03)
ads embedded through a partnership with Overture (Jun '03)
their informal offer to buy Google was refused - now looking to compete (Jun '03)
default MSN search on next version of MSIE? (Jun '03)
top 3 Overture links for selected keywords displayed as "sponsored sites" (Feb '02)
distinguishes between singular & plural query terms (Dec '02)
uses "*" as a truncation symbol (Oct '01)
has the largest database of any Inktomi partner (Oct '01)
owns HotMail and ICQ (Jun '01)
first results from LookSmart, then Inktomi & Direct Hit & reviewed sites, (+ AltaVista?) (Jun '01)
link to top 10 Direct Hit results (Mar '01)
6/15 duplicated entries for my classical tabs page! (Mar '01)
submissions to MSN go to LookSmart (Oct '99) & Inktomi (May '00)
NETSCAPE
secondary results come from Google (Jul '01)
updates from dmoz every 2 weeks (May '01)
ranks dmoz keyword relevancy 1) re dmoz category, and then 2) re sites within that category (Feb '01)
OVERTURE
owned by Yahoo (Jul '03)
supplies paid listings to Yahoo, MSN, HotBot, Lycos Europe (Jul '03)
position in results table depends solely on amount bid (Jul '03)
"autobidding" feature - charged 0.01 more than the next bid down (Jul '03)
minimum Overture bid USD 0.10 (Jun '03)
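The bidding rules noted above (position by bid only, autobidding at 0.01 above the next bid down, minimum bid USD 0.10) can be illustrated with a small Python sketch. The advertiser names, the function name and the treatment of the lowest bidder (assumed to pay the minimum bid) are assumptions, not Overture's actual system.

```python
from decimal import Decimal

MIN_BID = Decimal("0.10")    # minimum bid noted above (Jun '03)
INCREMENT = Decimal("0.01")  # autobidding step noted above (Jul '03)


def rank_and_charge(max_bids):
    """Order advertisers by maximum bid and work out each per-click charge.

    max_bids maps a hypothetical advertiser name to its maximum bid.
    Each advertiser is charged 0.01 more than the next bid down, capped at
    its own maximum; the lowest bidder is assumed to pay the minimum bid.
    Bids below the minimum are dropped.
    """
    eligible = {name: bid for name, bid in max_bids.items() if bid >= MIN_BID}
    ordered = sorted(eligible.items(), key=lambda item: item[1], reverse=True)
    results = []
    for position, (name, bid) in enumerate(ordered, start=1):
        if position < len(ordered):
            next_bid = ordered[position][1]
            charge = min(bid, next_bid + INCREMENT)
        else:
            charge = MIN_BID
        results.append((position, name, charge))
    return results


bids = {"alpha": Decimal("0.50"), "beta": Decimal("0.25"), "gamma": Decimal("0.05")}
listing = rank_and_charge(bids)
```

Here "alpha" takes first position and is charged 0.26 (0.01 above "beta"), "beta" takes second position at the 0.10 minimum, and "gamma" is not listed because its bid is under the minimum.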
TEOMA
paid inclusion (Jul '03)
suggests alternative spellings if search query appears misspelled (Jul '03)
supplies results to, & is owned by Ask Jeeves (Jul '03)
utilizes the Meta keyword tag (Jan '03)
500M pages indexed (Jan '03)
uses "Subject-Specific Popularity" (like PageRank, but subject clustering) (Jan '03)
WISENUT
owned by LookSmart (Jul '03)
relaunched with new web crawl (Oct '02)
1.5B pages indexed (Aug '01)
no Booleans (Aug '01)
YAHOO
indexes pdf & MS docs (Mar '04)
indexes keywords meta tag (Mar '04)
indexes the full text of web pages up to 500K (Mar '04)
still using Google for image search (but for how much longer?) (Mar '04)
now using its own search engine (not Google's) (Feb '04)
supplementary search results from Google in most countries (Jul '03)
owns Overture, AltaVista, AlltheWeb, Inktomi (Jul '03)
the #2 search engine by queries worldwide (Jun '03)
said to periodically change its way of ranking results of directory searches (Jun '02)
click thru popularity may affect ranking of directory search results (Jun '02)
$299 annually for 'Directory Listings' service for submitting businesses (Apr '02)
free submission option remains for other areas of the Yahoo.com site (Apr '02)
Yahoo editors are rumoured to use Netscape when reviewing sites (Jun '01)
owns GeoCities and eGroups (Jan '02)
90% of its websites said to be live (May '01)
gives preference to top level domains (Mar '00)
***************************************
BOOLEANS
AltaVista allows good use of Booleans (Feb '02)
AltaVista default changed from "OR" to "AND" (Feb '02)
Google uses OR - default is "AND" (Oct '01)
All The Web default is "AND" (Oct '01)
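The practical difference between an "AND" default and an "OR" default can be shown with a minimal Python sketch. The function name and the whole-word matching are assumptions for illustration; real engines tokenise and rank far more elaborately.

```python
def matches(page_text, query, default_operator="AND"):
    """Return True if the page satisfies a simple keyword query.

    default_operator says how terms typed without an explicit operator
    are combined: "AND" needs every term, "OR" needs at least one.
    """
    words = set(page_text.lower().split())
    terms = query.lower().split()
    if default_operator == "AND":
        return all(term in words for term in terms)
    return any(term in words for term in terms)


page = "classical guitar tabs and sheet music"
and_hit = matches(page, "guitar banjo", default_operator="AND")
or_hit = matches(page, "guitar banjo", default_operator="OR")
```

The same two-word query misses the page under an "AND" default but finds it under "OR", which is why a change of default noticeably narrows the results.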
CAPITALISATION
general rule: usually lower case will find anything, capitals may only find capitals
AltaVista is case sensitive when quotes are used (Jun '03)
AltaVista offers full case-sensitive searching (with advanced/exact phrase searches) (Sep '01)
HotBot and MSN Search will search on mixed-case terms (Sep '01)
the rest treat all query terms as tho they have been typed in lower case (Sep '01)
COMMENTS
indexed by some search engines
not indexed by AltaVista, Google (Jun '02)
DYNAMIC CONTENT
Google explores dynamically allocated URLs and will crawl dynamic pages (Aug '01)
Inktomi will crawl a limited number of dynamic pages (Sep '01)
AltaVista won't crawl them, but will accept them if they're manually submitted (Sep '01)
Fast does not crawl dynamic pages (Sep '01)
FRAMES
AltaVista indexes frames as separate pages, therefore pages may need robots meta tag
INCLUSION / EXCLUSION
AltaVista & Google use "+" and "-" for words/phrases in query term to be included / excluded
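A rough Python sketch of how "+" and "-" prefixes filter results is given below. Only single words are handled (not phrases), unprefixed terms are ignored, and the function name is hypothetical.

```python
def passes_filters(page_text, query):
    """Apply "+" (must be present) and "-" (must be absent) prefixes.

    Terms without a prefix are ignored; only the filters are modelled.
    """
    words = set(page_text.lower().split())
    for term in query.lower().split():
        if term.startswith("+") and term[1:] not in words:
            return False
        if term.startswith("-") and term[1:] in words:
            return False
    return True


kept = passes_filters("jaguar cars for sale", "+jaguar -cat")
dropped = passes_filters("jaguar big cat facts", "+jaguar -cat")
```

The first page is kept because it contains "jaguar" and not "cat"; the second is dropped because it contains the excluded word.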
LOAD TIME
Yahoo, LookSmart, AltaVista dislike poor site design including slow-loading pages (Jan '02)
META TAGS - KEYWORDS / DESCRIPTION
keywords meta tag now used by Inktomi, Teoma & Yahoo (Mar '04)
keywords meta tag has little effect on rankings (Oct '02)
description meta tag used by AltaVista, Teoma
commas in meta tags usually treated as spaces
don't use same word stem twice running (plurals, different tenses etc)
recognised but not given preference over titles/text in AltaVista
the precedence of sources re descriptions of pages in search engine results is as follows (Nov '01) -
- meta tags: All The Web, AltaVista, Direct Hit, Teoma
- page content: Google, (Teoma), WiseNut
- LookSmart: MSN, Overture
- dmoz: AOL, (Google)
Google and Teoma combine info from 2 sources for their descriptions
Inktomi gives precedence to results from LookSmart when supplying data
otherwise Inktomi uses meta tags rather than page content
META TAGS - ROBOTS
not honoured by Teoma? (Sep '01)
PLURALS ("swim" finds the same results as "swims")
the following consider the singular to be different from the plural -
AltaVista, AOL, dmoz, FAST, Google, Inktomi, iWon, LookSmart, Lycos, Yahoo
the following consider the singular to be the same as the plural -
Direct Hit (HotBot), MSN
therefore best use both singular and plural forms of keywords (Dec '02)
SPAM
invisible text (eg white on white) seen as spam by all (Dec '01)
also beware (eg) white text on black table background on white page
tiny text seen as spam by AltaVista, WebCrawler
redirection <META REFRESH> seen as spam by AltaVista
SPELL CHECKS
AlltheWeb, AltaVista, Google, Inktomi & Teoma suggest alternative spellings (Jul '03)
STEMMING ("swim" finds "swimming", "cat" finds "cats" etc)
Google performs stemming (Nov '03)
MSN offers a stemming option (Jan '02)
generally stemming can no longer be relied on
STOPWORDS (not indexed)
Google considers "and", "a", "an", "the" & "i" to be stop words
Inktomi considers "and", "a", "an" and "the" to be stop words
AltaVista considered "and", "a", "the" & "i" to be stop words
Direct Hit considered "and", "a" and "an" to be stop words
Fast indexes everything (Oct '01)
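The effect of the stop word lists above on a query can be sketched in Python as follows. The word lists are taken from the notes above; the function name and the simple whitespace tokenising are assumptions for illustration.

```python
# Stop words per engine, as listed in the notes above.
STOP_WORDS = {
    "Google": {"and", "a", "an", "the", "i"},
    "Inktomi": {"and", "a", "an", "the"},
    "AltaVista": {"and", "a", "the", "i"},
    "Direct Hit": {"and", "a", "an"},
    "Fast": set(),  # indexes everything
}


def indexed_terms(query, engine):
    """Return the query words that the given engine would not discard."""
    stops = STOP_WORDS[engine]
    return [word for word in query.lower().split() if word not in stops]


google_terms = indexed_terms("The cat and I", "Google")
direct_hit_terms = indexed_terms("The cat and I", "Direct Hit")
fast_terms = indexed_terms("The cat and I", "Fast")
```

For the query "The cat and I" only "cat" survives on the Google list, "the", "cat" and "i" survive on the Direct Hit list, and every word survives for Fast.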
TRUNCATION
AltaVista & MSN use "*" as a truncation symbol (Oct '01)
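Truncation with "*" amounts to prefix matching. A minimal Python sketch follows, using the 3-character minimum noted for AltaVista earlier on this page; the function name and the word list are hypothetical, and only a trailing "*" is handled.

```python
def truncation_matches(pattern, vocabulary, min_prefix=3):
    """Return the words in vocabulary that start with the truncated term.

    pattern must end in "*" and have at least min_prefix characters
    before it (3 is the minimum noted for AltaVista).
    """
    if not pattern.endswith("*"):
        raise ValueError("pattern must end with '*'")
    prefix = pattern[:-1].lower()
    if len(prefix) < min_prefix:
        raise ValueError("need at least %d characters before '*'" % min_prefix)
    return [word for word in vocabulary if word.lower().startswith(prefix)]


found = truncation_matches("sma*", ["small", "smart", "smile", "mast"])
```

The query "sma*" picks out "small" and "smart" but not "smile" or "mast", while a query such as "sm*" is rejected as too short.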
WORD FREQUENCY
one keyword/phrase each in -
- URL (domain/folder/filename - hyphens better than running words together)
- title tag
- meta description & keyword tags
- first heading tag
- first link text
- early in page text
- alt text
- comments text
- hidden tags (input... value)
LINKS
http://www.researchbuzz.com - Tara Calishain
http://www.1stSearchRanking.com - Sumantra Roy
http://searchenginewatch.com - Danny Sullivan
http://www.rankwrite.com/ - Jill Whalen & Heather Lloyd-Martin
http://www.wilsonweb.com/webmarket/searchengine.htm - Ralph Wilson
http://www.associateprograms.com/search/newsletter068.shtml - "Real Names"
http://www.northernwebs.com/set/setsimjr.html
http://searchenginewatch.internet.com/webmasters/meta.html
http://searchenginewatch.com/facts/math.html
http://searchenginewatch.com/features
http://searchenginewatch.com/facts/powersearch.html
http://searchenginewatch.com/facts/assistance.html
http://searchenginewatch.com/facts/boolean.html
http://searchenginewatch.com/resources/tutorials.html
http://searchenginewatch.internet.com/webmasters/spiderchart.html
http://searchenginewatch.com/standards/