ALT.INTERNET.SEARCH-ENGINES: CHARTER & FAQ
TABLE OF CONTENTS

I. What is alt.internet.search-engines?
   1. Officially appropriate topics.
   2. Unofficially appropriate topics.
   3. Statement on advertising.
   4. Basic behaviour in alt.internet.search-engines.
   5. Quoting techniques.

II. Frequently asked questions.
   1. What is a portal/directory?
   2. What is a search engine?
   3. What is cloaking?
   4. What is search engine optimization?
   5. Why is search engine ranking important?
   6. What keywords or phrases should I optimize my web site for?
   7. Will a search engine spider my frames page?
   8. What is "robots.txt"?
   9. How can I start my own search-engine?
   10. Virtual Hosts / individual IP addresses
   11. What is this Dmoz/Open Directory Project (ODP) everyone rants
       about?

III. Other resources on the net
   1. List of Search engines.
   2. Cloaking Tutorial + FAQ
   3. Keyword Research
   4. Meta Tags
   5. Search Engine Newsletters
   6. Search Engine Optimization Newsletters
   7. Discussion Forums
   8. General Tips + Tricks
   9. Search engine spider verification service

IV. More info on this FAQ
   1. Current version and posting-frequency.
   2. Suggestions and changes.
   3. Changes in versions
   4. Contributors

Appendix 1: The original charter.

---------------------------------------------------
- - - - - - - - - - PART I - - - - - - - - - - - -
- - - WHAT IS ALT.INTERNET.SEARCH-ENGINES? - - - -
---------------------------------------------------

The newsgroup alt.internet.search-engines was first created on
23 Feb, 1999 by Imran Ghory as "A group for discussing search
engines".

_________________________________________________________________
I.1. Officially appropriate topics

This is the definition from the original charter of the group.
The following will be on-topic:
 - Announcements relating to search engines.
 - Discussion on how to use search engines efficiently.
 - Discussion of getting URLs added to search engines.
 - Comparisons of search engines.
 - Analysing of webpages in reference to search engines.
 - Questions (and answers) on search engine use.
 - General discussion of search engines.

The following will be off-topic:
 - Adverts
 - Development of search engines. (Use comp.infosystems.search)

The following are not allowed in the group even if they are
on-topic:
 - Binaries.
 - Excessive cross posts (ECP).
 - Posts containing or in HTML.

_________________________________________________________________
I.2. Unofficially appropriate topics.

The definition "discussion of search-engines" has unofficially been
adjusted by the group and is recommended to be read as follows:

The purpose of this group is to discuss and debate all subjects
related to search engines and the technology associated with them.
This wide subject ranges from search engine optimization to 'where
is the submit button in AltaVista's new layout'.

This group is not intended for the posting of ads; please use the
newsgroups intended for them instead.

A general rule for deciding whether your post will be well regarded
is that it must at least include a question, or an answer to a
question, that is in some way connected to search engines.

Do not post messages like "fast search-engines www.something.com".
If you have found a good search engine and only want to share it
with the others (meaning not discuss it), send it to the maintainer
(see the contact information at the bottom of this page). It will
then be included in the FAQ or added to one of the websites listed
under "Other resources on the net".

_________________________________________________________________
I.3. Statement on advertising.

There is a general consensus that off-topic advertising (i.e. not
relating to the purpose of the group) should not be allowed.
1: Nobody wants to read advertisements for products in which they
   are probably not interested.
2: People PAY to use newsgroups. Your advert wastes their money
   (and that will most likely keep them from buying your stuff
   anyway).

Because of this, please understand that alt.internet.search-engines
is NOT the right place to advertise your products.

If you are publishing a new book related to search engines, have
created a site with quality content related to search engines, or
have written an article concerning search engines, you are welcome
to post an announcement about it. Just don't pose as someone who
has 'found a great new site', and make sure the information you're
offering is indeed relevant.

Also, please don't post the same announcement several times. We'll
all see it the first time and will read it if we feel like doing
so. Posting the same announcement again will just upset people who
have already read it. There are some users whose servers do not
store messages for more than a month.

If you have a site or program that you think is _VERY_ relevant to
people using the group, send it to the maintainer of this FAQ and
it will be included in the next version. This FAQ is posted 3-4
times a month to the group.

_________________________________________________________________
I.4. Basic behaviour in alt.internet.search-engines

Much of the following will be common sense to most people, but
sometimes someone forgets, and sometimes people are new to the net
and need an introduction. If you are familiar with basic
netiquette, you can skip the following.

What is netiquette:
Netiquette is a set of norms that you should follow when acting
online. It is a set of rules that you can choose not to follow. But
not following them is like farting in a restaurant or scratching
your bottom before shaking hands.
- Spam and inflammatory messages:
  Don't join a group just to post inflammatory messages - this
  upsets most system administrators and you could lose access to
  the net.

- Keeping the group clear:
  Try to keep your questions and comments relevant to the focus of
  the discussion group.

- When someone posts an off-subject note, and someone else
  criticizes that posting, you should NOT submit a gratuitous note
  saying "well, I liked it and lots of people probably did as well
  and you guys ought to lighten up and not tell us to stick to the
  subject".

You can read more about USENET rules at:
http://www.faqs.org/faqs/usenet/

_________________________________________________________________
I.5. Quoting techniques

When quoting another person, edit out whatever isn't directly
applicable to your reply. Don't let your mailing or Usenet software
automatically quote the entire body of messages you are replying to
when it's not necessary. Take the time to edit any quotations down
to the minimum necessary to provide context for your reply. Nobody
likes reading a long message in quotes for the third or fourth
time, only to find it followed by a one line response: "Yeah, me
too."

When quoting a message part by part, use [..] or <snip> or similar
tags where you have cut sections and parts of the message.

Keep your message short and free of redundancy. Imagine what
happens if you do not edit your message: say you leave 3 KB of
unnecessary text in it. That is probably nothing, you may think,
but if 1500 people download your message, it adds up to 4500 KB!
Irrelevant downloads aside, it can be rather annoying trying to
follow a badly quoted discussion.

---------------------------------------------------
- - - - - - - - - - PART II - - - - - - - - - - - -
- - - - - - FREQUENTLY ASKED QUESTIONS - - - - - -
---------------------------------------------------

At the current time, the FAQ only contains some basic info on the
different terms used in search engine discussions.

_________________________________________________________________
II.1. What is a portal/directory?

Portals/directories are "search engines" that require you to submit
your site before it is included in their database. They normally
have quite a number of editors reviewing the sites before they are
accepted. Many of these are business to business or otherwise
focussed on a certain topic. One of the most popular directories is
http://www.yahoo.com.

Many portals are linked with search engines that provide search
results if the directory does not list any sites for a specific
search term.

_________________________________________________________________
II.2. What is a search engine?

A search engine is a huge database that is constantly updated by
"spiders" - robots or automated programs that constantly crawl the
WWW, following links on home pages (web sites). The spiders capture
the text on web pages and, based on different algorithms, the
engine outputs results when people search in it. Some very popular
search engines are http://www.google.com, http://www.altavista.com
and http://www.webcrawler.com.
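
The spider behaviour described above (capture the text of a page,
follow its links) can be illustrated with a small Python sketch. It
is only an assumption-laden toy, not how any real engine works: the
class and function names and the example.com URL are made up for
illustration, no network access takes place, and the "database" is
a plain dictionary mapping each word to the pages that contain it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTextParser(HTMLParser):
    """Collects outgoing links and text words from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        # Remember every <a href="..."> as an absolute URL to visit later.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))

    def handle_data(self, data):
        # Naive tokenizer: lower-case the text and split on whitespace.
        self.words.extend(data.lower().split())


def index_page(index, url, html):
    """Add one page's words to the index and return the links found on it."""
    parser = LinkAndTextParser(url)
    parser.feed(html)
    for word in parser.words:
        index.setdefault(word, set()).add(url)
    return parser.links


# Hypothetical page content, supplied as a string instead of being fetched.
index = {}
page = ('<html><body><p>Search engines crawl pages</p>'
        '<a href="/faq.html">FAQ</a></body></html>')
links = index_page(index, "http://www.example.com/", page)
```

A real spider would fetch each URL in `links` in turn, respect
robots.txt, and rank the results; here a "search" is nothing more
than looking a word up in `index`.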
Many search engines cooperate with portals/directories to give
their users an alternative way of finding information instead of
relying on search by keyword.

_________________________________________________________________
II.3. What is cloaking?

Cloaking is a technique used by some web sites to feed different
content to search engine spiders (see above) and to human visitors.
It may be employed to improve a site's ranking, as the output to
the search engines will usually be optimized to target their
specific ranking algorithms.

Another major use of cloaking is to protect web page code from
being stolen by competitors.

Finally, cloaking may be required to work around browser
incompatibility issues, non-spiderable page code (e.g. graphics
rich sites, splash pages, Flash, Java, JavaScript, etc.), dynamic
page delivery, etc.

Please note that many search engines do _not_ approve of this
practice, while a few others encourage it. Disapproval applies
mainly when cloaking is employed in a misleading ("spammy") way,
e.g. by redirecting surfers to content they did not target when
clicking the displayed search result URL.

_________________________________________________________________
II.4. What is search engine optimization?

Search engine optimization, or search engine positioning, is the
art and science of constructing or organizing web pages in a way
that helps them achieve good rankings with the search engines.

All search engines follow their own proprietary ranking algorithms,
which are continuously tweaked and improved upon. Because these
algorithms are treated as trade secrets, the search engines will
obviously not divulge their details.

This makes professional search engine optimization very similar to
reverse engineering: some experts will run test pages and even
whole test domains for the sole purpose of determining individual
search engines' ranking behavior.
This may involve questions like which engine values meta tags,
titles, alt tags, link popularity, click-through frequency, etc.

Hence, efficient optimization can turn into a very involved affair
requiring lots of specialist knowledge, up-to-date information,
statistical analysis, etc.

The more competitive the WWW becomes, the harder it gets to achieve
decent rankings in those areas where many sites are vying for
attention.

_________________________________________________________________
II.5. Why is search engine ranking important?

Surveys and studies have shown that surfers searching the engines
for keywords or phrases will typically click through to the sites
featured highest. Page one to page three rankings account for
approximately 90% of all user traffic generated by search engines.

What this boils down to is that your web site will not generate any
traffic worth mentioning if it is featured outside (typically) the
top 30.

So if you want your site to be known and to draw lots of visitors,
a good ranking with the major search engines is crucial.

_________________________________________________________________
II.6. What keywords or phrases should I optimize my web site for?

Regardless of whether you have a commercial, non-profit or amateur
web site: picking the keywords or search phrases to optimize your
site for is crucial.

A frequent mistake among webmasters is to gauge the popularity of
keywords through their own tunnel view of what people should be
interested in.

Luckily, many search engines (major and minor) offer real time
search monitoring on special pages (so-called "voyeur" functions or
pages).

There is also an abundance of databases of real life search phrases
(both free and commercial) available on the net.

Finally, you can make use of special software which can help you
automate the process.

For a fairly extensive overview of real life keyword research
resources see "Keyword Research" in the resources section below.
_________________________________________________________________
II.7. Will a search engine spider my frames page?

It will if you link all your subpages from the text within the
noframes tag. However, a spider will not index your frameset, but
each single page. This means that users entering your site from a
search result will most likely NOT load the frameset. You can use
JavaScript to check that the frameset is loaded. However, that
presents two problems:

1. Most such scripts do not work very well.
2. The client side redirection might get your page banned from the
   search engine.

As far as SEO is concerned (page design is another matter), it is
recommended that you do not employ frames. If you choose to do so
nevertheless, it is highly recommended that you include navigation
elements within your framed pages as well, so the user can navigate
the site without the frameset.

_________________________________________________________________
II.8. What is "robots.txt"?

The Robots Exclusion Protocol is a method that allows you to tell
visiting spiders what to index and what to leave alone. You can
exclude a particular spider or all spiders (that follow the
standard) from your entire site, from particular directories, or
from particular files.

- Should I create a robots.txt file?
  Only if you want crawlers to stay away from your site (or parts
  of it such as password restricted areas, graphics directories,
  etc.)

- Can I leave the robots.txt blank?
  Yes, but that will cause some spiders to leave without indexing.

- What should my robots.txt look like?
  Check here:
  http://info.webcrawler.com/mak/projects/robots/exclusion.html
  as this page features links to relevant sites.

- Can I prevent indexing by other means than robots.txt?
  Yes, you can use:

    <META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">

  in your header. However, not all robots respect this.

_________________________________________________________________
II.9. How can I start my own search-engine?

Robots (also known as spiders, wanderers, worms, crawlers and
gatherers) follow links from one web page to another. They work
with indexing code to store data for later searching. There is a
good deal of free open source code available - you don't have to
start from scratch.

You can find a wide range of search engines in the programming
language best suited to your needs at:
http://www.searchtools.com/robots/robot-code.html

_________________________________________________________________
II.10. Virtual Hosts / individual IP addresses

It is a common problem that search engines will occasionally index
one site and redirect to another. Usually this issue relates to
problems with the HTTP/1.1 standard.

The World-Wide Web Consortium strongly recommends that web servers
use virtual hosts, so as not to waste additional IP addresses
simply for Web hosting. This means that hundreds of domains can
reside on the same IP address. The problem results from the fact
that not all search engines honor the HTTP/1.1 standard which
allows for this particular implementation, or, in rare instances,
that the web hosting services have misconfigured their servers.

Avi Rappoport has done research that shows that AltaVista, Excite,
FAST, Google, Northern Light, Go (the engine formerly known as
Infoseek), etc. do not have any problems with this at all. The only
spiders that failed to send the proper headers were MOMspider/1.10
libwww-perl/0.40 and PerlMan Surf; additionally, similar problems
with the French search engine Voila have been reported.

How can you resolve this problem? Simply put, leave your current
web hosting service if they fail to address the issue, or get an
individual IP address. Contacting the search engines directly might
also produce results.
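
To make the virtual host problem above more concrete, here is a
minimal Python sketch of the mechanism. It rests on assumptions:
the function names, the site table and the *.example host names are
hypothetical, and the lookup only imitates what a name-based
virtual host server does. The one solid fact it relies on is that
HTTP/1.1 requests must carry a "Host:" header naming the site,
while HTTP/1.0 did not require one.

```python
def build_request(host, path="/", http11=True):
    """Return the raw bytes of a minimal GET request."""
    if http11:
        # HTTP/1.1: the Host header tells the server which site is wanted.
        lines = [f"GET {path} HTTP/1.1", f"Host: {host}", "Connection: close"]
    else:
        # Old-style request as a non-compliant spider might send it: no Host.
        lines = [f"GET {path} HTTP/1.0"]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def pick_site(raw_request, sites, default_site):
    """Imitate a name-based virtual host lookup on a shared IP address."""
    for line in raw_request.decode("ascii").split("\r\n")[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return sites.get(value.strip().lower(), default_site)
    # Without a Host header the server can only guess, so it falls back
    # to its default site, which may belong to somebody else.
    return default_site


# Two hypothetical domains sharing one IP address.
sites = {"www.site-a.example": "site A", "www.site-b.example": "site B"}
good = pick_site(build_request("www.site-b.example"), sites, "site A")
bad = pick_site(build_request("www.site-b.example", http11=False), sites, "site A")
```

With the header, the request for www.site-b.example reaches site B;
without it, the same request ends up at the default site A, which
is how a spider can index one site under another site's URL.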
This problem should evaporate in 2001 at the very latest, as
individual IP addresses are getting ever more rare and the search
engines simply cannot avoid adapting to the standards of the
World-Wide Web Consortium.

_________________________________________________________________
II.11. What is this Dmoz/Open Directory Project (ODP) everyone
rants about?

Throughout last year there has been a lot of discussion about the
Open Directory Project (dmoz.org), which delivers directory data to
some of the major search engines. It is currently the largest index
on the web, with more than 2 million unique sites and almost no
dead links.

There are basically two camps participating in this discussion:
1. People defending the ODP.
2. People that hate the ODP.

There are lots of problems under discussion - the following is
intended as a short, non-partisan summary.

1. The directory is owned by AOL/Warner/Netscape, one of the major
   players in the internet market, but the directory is driven
   entirely by volunteers that do not get paid for their efforts.
   Some people consider this an "abuse" of volunteers for
   commercial purposes.

2. The Dmoz/ODP people are apparently constantly seeking new
   editors, but rejecting 90% of all applications. It is known that
   people who are very qualified for a particular category (one
   that does not have any editor yet) are being rejected with the
   same standard reply. Many people have experienced their
   applications not getting any response at all.

3. Dmoz fires many editors without giving reasons. This contention
   should be taken with a grain of salt, as it represents only one
   of two sides to this story. Editors have been known to abuse
   their privileges and have been fired for this, while others seem
   to have been fired without apparent reason.

4. ODP is quite inaccessible. They don't publish any phone numbers,
   and their address seems to be treated as a state secret and is
   virtually impossible to obtain. Some people consider this a bad
   thing, while others disagree.
   It is a common occurrence that you won't get any feedback from
   editors explaining why your submissions have been turned down
   and what you could do to rectify the situation - even if you
   have been indexed in all the other major directories.

5. The editors have incredible power. If they manage to get in
   charge of a category in which they themselves have a vested
   interest (typically a web site of their own), they can "cool"
   their site, change the descriptions of the other sites in that
   category and keep competitors from getting their pages indexed
   at all. It is an established fact that this happens, but it is
   also common knowledge that the Dmoz administration is working on
   preventing this.

Needless to say, many people nurture hard feelings towards Dmoz:
either for never getting any feedback on their site submissions
(e.g. an explanation for not being indexed) or on their
applications for editorship, or for getting the same boilerplate
reply that everyone gets.

On the other side there is the "pro-Dmoz" front, consisting either
of established editors or of people who are simply in more or less
ardent favour of the index.

---------------------------------------------------
- - - - - - - - - - PART III. - - - - - - - - - - -
- - - - - - OTHER RESOURCES ON THE NET - - - - - -
---------------------------------------------------

This section contains links to other resources on the net.

___________________________________________
III.1. Search engines.
Some popular search engines:
 - http://www.altavista.com/
 - http://www.alltheweb.com/
 - http://www.directhit.com/
 - http://www.excite.com/
 - http://search.go.com/
 - http://www.google.com/
 - http://www.goto.com/
 - http://www.hotbot.com/
 - http://www.lycos.com/
 - http://www.northernlight.com/
 - http://raging.com/

Portals and directories
 - http://www.dmoz.com/
 - http://www.looksmart.com/
 - http://www.snap.com/
 - http://www.yahoo.com/

Submission URLs
 - http://searchenginebase.com/sbsubmissions.html

You can find a comprehensive list of all major and many minor
search engines at: http://www.searchenginebase.com/

III.2. Cloaking Tutorial + FAQ
 - http://fantomaster.com/fafaqcloak1.html
 - http://www.spiderhunter.com/

III.3. Keyword Research Resources
 - http://fantomaster.com/fasmbres03.html#voyeur

III.4. Meta Tags
 - The Definitive Resource: http://vancouver-webpages.com/META/

III.5. Search Engine Newsletters
Newsletters Featuring Search Engine News (in alphabetical order)
 - Actu Moteurs (in French): http://www.abondance.com/
 - Google Friends Newsletter: http://www.google.com/
 - Pay Per Click Search Engines Update:
   http://PayPerClickSearchEngines.com
 - Search Engine Guide: http://www.searchengineguide.com/

III.6. Search Engine Optimization Newsletters
Newsletters on Search Engine Optimization (in alphabetical order)
 - fantomNews: http://fantomaster.com/fantomnews.html
 - RankWrite: http://www.rankwrite.com/
 - Search Engine News: http://www.searchengine-news.com
 - Search Engine Optimization and User Interface:
   http://www.cre8pc.com/seui.html
 - Search Engine Quarterly: http://www.searchengineworld.com/
 - Search Engine Watch: http://www.searchenginewatch.com/
 - The Spider Report: http://spider-food.net/

III.7. Discussion Forums (in alphabetical order)
 - AIM-Pro: http://www.aim-pro.com/cgi-bin/Ultimate.cgi
 - Market Position Talk: http://www.marketpositiontalk.com/forums/
 - SearchEngineBase Forum:
   http://searchenginebase.com/discussions.html
 - SearchEngine Discussion Forum:
   http://searchenginediscussion.com/cgi-bin/ubb/Ultimate.cgi
 - SearchEngineForums: http://www.searchengineforums.com/
 - SearchEngineMatrix Forum: http://www.searchenginematrix.com/
 - SearchEngineWorld Forum: http://www.webmasterworld.com/index.cgi

III.8. General Tips + Tricks (in alphabetical order)
 - http://www.aim-pro.com/
 - http://fantomaster.com/
 - http://www.searchengineworld.com/
 - http://spider-food.net/
 - http://www.spiderhunter.com/

III.9. Search engine spider verification service
 - http://spiderscouts.com/

---------------------------------------------------
- - - - - - - - - - PART IV. - - - - - - - - - - -
- - - - - - - MORE INFO ON THIS FAQ - - - - - - -
---------------------------------------------------

Current release: 1.07

_________________________________________________________________
IV.1. Current version and posting-frequency.

The current version of this document can always be found at the
following:

WWW
 - http://searchenginebase.com/aise-charter-faq.html
 - http://search.mermaidconsulting.com/altinternetsearchengines.txt
 - http://www.geocities.com/ranktips/faq.htm

USENET
 Posted three to four times per month to
 alt.internet.search-engines

_________________________________________________________________
IV.2. Suggestions and changes.

Suggestions and changes should be posted to
alt.internet.search-engines with a title of "FAQ-Suggestion" or
sent by email to the current maintainer:

 fantomasterNOSPAM@NOSPAMfantomaster.com
   (for: Frequently Asked Questions)
or:
 martinNOSPAM@NOSPAMmermaidconsulting.com
   (for: the rest)

_________________________________________________________________
IV.3. Changes in versions

1.07: Edited: I. Edited: II. Added: II.10 and II.11.
      Edited: III. Added: III.9.
      Edited: IV.
1.06: Edited III.1.
1.05: Removed: III.9. (due to update matters)
1.04: Added: III.9. (thanks Ash)
1.03: Added: II.9 (thanks Avi Rappoport)
1.02: Proofread and typos removed. (Thanks, Dirk!)
1.01: Added: II.4 -> II.6, III.2 -> III.8. Edited: I.4.

_________________________________________________________________
IV.4. Contributors

Contributors include: Imran Ghory, Rupert Bowling, Ashley Williams,
Lauri Harpf, Avi Rappoport, Uksitesubmit, Ash Williams, James Cox,
Dirk, Ralph aka fantomaster, Martin Rytter Jensen, and many others.

_________________________________________________________________
Appendix 1: The original charter.

Charter:
A group for discussing search engines.

The following will be on-topic:
  Announcements relating to search engines.
  Discussion on how to use search engines efficiently.
  Discussion of getting URLs added to search engines.
  Comparisons of search engines.
  Questions (and answers) on search engine use.
  General discussion of search engines.

The following will be off-topic:
  Adverts
  Development of search engines. (Use comp.infosystems.search)

The following are not allowed in the group even if they are
on-topic:
  Binaries.
  Excessive cross posts (ECP).
  Posts containing or in HTML.

Justification:
A DejaNews search on the term "Search Engine" turned up exactly
20461 messages for the period from 1st January 1999 to 1st February
1999, an average of 660 messages a day. These messages were spread
over several hundred newsgroups, but the largest concentration of
them was in the comp.infosystems.www.* and alt.internet.*
hierarchies.

The group comp.infosystems.search is not suitable for this purpose
as it focuses on the development and administration side of search
engines.

Proponent:
Imran Ghory
_________________________________________________________________