Q. If a search engine using a "spider" or "crawler" to index web pages finds another search engine, does the entire index of that search engine get added to its own? Also, do these "spiders" of different search engines swap information with each other, so to speak?
A. No, what you describe doesn't happen. For example, let's say AltaVista's spider comes to the Google home page. It doesn't then index the roughly 2 to 3 billion web pages that Google claims to have recorded. Legal issues aside, there's no way for AltaVista's spider to follow "links" to these pages.
Think about it. You only get information from Google (or other search engines) by entering what you are looking for into a search box. For AltaVista to get all of the pages Google knows about, it would have to input millions of search queries into the search box in order to get back listings. In addition, these listings would only summarize the pages within the Google index. They wouldn't have the source code of those pages that Google has actually recorded.
Having said this, there are some ways a spider could index another search engine's listings. For example, let's say you do a search on Google for "shoes," then you want to make sure that others can easily see the results. You could copy and paste the URL that appears in your browser after doing the search, which would look something like this:
Now, let's say you put that URL into a web page. Another spider might then find it, and if it chooses to follow it, then it would receive a "page" of results at Google for a search on "shoes."
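To make the idea concrete, here is a minimal sketch of why such a link works: the query travels inside the URL itself, so anything that fetches the link, including a spider, effectively runs the search. The host `search.example.com` and the `q` parameter are assumptions made up for illustration; each real search engine uses its own URL layout.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint and parameter name, for illustration only.
SEARCH_ENDPOINT = "https://search.example.com/search"

def build_results_url(query: str) -> str:
    """Return a link that, when fetched, runs `query` at the endpoint."""
    return SEARCH_ENDPOINT + "?" + urlencode({"q": query})

def looks_like_results_page(url: str) -> bool:
    """Guess whether a link carries a search query (a `q` parameter)."""
    return "q" in parse_qs(urlsplit(url).query)

url = build_results_url("shoes")
print(url)
print(looks_like_results_page(url))
```

A crawler that wanted to avoid indexing other engines' results could use a check along the lines of `looks_like_results_page` to skip such links, though whether any particular crawler does so is not something this sketch can tell you.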
I've seen this happen occasionally, but it's relatively rare. However, out of curiosity, I did a quick check to find examples of it happening. I found these examples by using special search commands rather than via a typical query that an ordinary searcher might do.
This search at AllTheWeb.com:
shows some examples of where AllTheWeb.com has followed links to index search results pages at Google.
And this search at Google:
shows some examples where, though Google hasn't spidered results pages at AllTheWeb, it has at least recorded links in its index that will bring up those results.
Of course, someone really intent on replicating all the pages that Google has could do it to some degree. They could generate a variety of queries to bring back all the pages that Google knows about on different web hosts, then pull back the "cached" copies that Google makes available for these pages. However, it's likely Google would detect the noticeable activity placed on their servers and block it. In addition, it would probably be easier for the company to just spider the web themselves!
Finally, none of the major web crawlers swaps information they've indexed. However, by using a "metasearch engine," you can send one query to multiple search engines and get combined results. The page below lists some metasearch engines:
Metacrawlers & Metasearch Engines
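As a rough illustration of what "combined results" can mean, the sketch below interleaves ranked lists from two engines and drops duplicate URLs. The result lists are made-up sample data, and the round-robin merge is only one simple approach that I am assuming for the example; it is not a description of how any particular metasearch engine works.

```python
from itertools import zip_longest

def combine_results(result_lists):
    """Interleave ranked URL lists round-robin, keeping each URL once."""
    combined, seen = [], set()
    # zip_longest pads the shorter lists with None, which is skipped below.
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                combined.append(url)
    return combined

# Hypothetical listings returned by two engines for the same query.
engine_a = ["http://a.example/1", "http://shared.example/", "http://a.example/2"]
engine_b = ["http://shared.example/", "http://b.example/1"]

print(combine_results([engine_a, engine_b]))
```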
Q. How often can you submit your URL to search engines before it gets to be too often? Weekly, daily, hourly? And what are the consequences if you submit too often? Does how often you submit have any influence on how high you rank in the search engine results list?
A. Submission to the crawler-based search engines really doesn't mean much to begin with. Instead, crawlers tend to follow links to understand which pages are most essential for them to include. Several do offer free "Add URL" pages, and an easy list of these is at http://searchenginewatch.com/webmasters/crawlers.html. There are no longer any particular submission "limits" to worry about, but I can't stress enough that submitting all your pages or the same page each week, day or hour is going to do little to ensure that the pages get included or rank well. Instead, most people should find that a one-time submission of their key pages is enough.
One caveat: if you were to submit many pages on a regular basis, the crawlers might decide to take a closer look at your web site, since frequent submission is often associated with search engine spammers. This doesn't mean your site will be penalized as being spam. It just means you could invite some extra attention -- and if you are indeed doing something wrong, that's attention you probably don't want.
Q. I know the difference between a directory & a search engine, but how can you tell by looking at the site? Also, is there a site that lists directories?
A. Yes, it can be confusing! One solution is to find the Add URL page for the site you are investigating. Does it ask you to provide a description of your web site? If so, then you are almost certainly dealing with a directory. Also, if you find there's a "category" structure, where sites are arranged by topics, then you are also probably dealing with a directory. Life does get difficult given that places such as Google have both "crawler" and "directory" results. For further guidance, look at http://searchenginewatch.com/webmasters/, where the numbered guide will list major directories and crawlers. For a tabular format, take a look at http://searchenginewatch.com/webmasters/results.html.
Q. Recently, I looked at the source of a web page, and found multiple lines for keywords, such as:
What I'm wondering is: what is the outcome of the example I pasted in above, with its multiple keyword lines?
A. It's hard to know exactly how the few remaining crawlers that use the meta keywords tag would deal with this, but most likely, they'd only record information in one of the tags -- probably the last one. The correct format is to have only one meta keywords tag, if you are going to use it, with all the terms you wish to include as part of that single tag.
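If you inherit a page with several meta keywords tags, something like the following sketch could fold them into the single tag described above. The sample page and its terms are invented for illustration (they are not the reader's example), and the decision to drop repeated terms is my own assumption rather than anything a crawler requires.

```python
from html import escape
from html.parser import HTMLParser

class MetaKeywordsCollector(HTMLParser):
    """Collect the content of every <meta name="keywords"> tag, in page order."""

    def __init__(self):
        super().__init__()
        self.keyword_tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "keywords":
            self.keyword_tags.append(attributes.get("content") or "")

def merge_keywords(html: str) -> str:
    """Return one meta keywords tag holding every distinct term found."""
    collector = MetaKeywordsCollector()
    collector.feed(html)
    terms, seen = [], set()
    for content in collector.keyword_tags:
        for term in content.split(","):
            term = term.strip()
            # Compare case-insensitively so "Shoes" and "shoes" count once.
            if term and term.lower() not in seen:
                seen.add(term.lower())
                terms.append(term)
    merged = escape(", ".join(terms), quote=True)
    return f'<meta name="keywords" content="{merged}">'

# Hypothetical page with two keyword lines.
page = """
<head>
<meta name="keywords" content="shoes, boots">
<meta name="keywords" content="sandals, shoes">
</head>
"""
print(merge_keywords(page))
```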
Q. Do search engines care if more content is on the homepage versus the rest of the site? For instance, in Google, would a site with 10 pages of text on the homepage rank higher than a site with 1 page of text on the homepage and 9 other pages with 1 page of text each?
A. Crawler-based search engines rank pages on a page-by-page basis, not on a "site" basis. In other words, they don't try to figure out how many pages of content you have on different topics, then perhaps reward a site with lots of content on a particular topic or "theme." Instead, each of your pages will stand alone on its particular merits. Having said this, if your site had 10 pages that were content rich versus 1 content-rich home page and 9 "text-light" pages, I'd expect you to do better with the ten content-rich pages. That's not because they'd work together as a team but rather because individually, the content-rich pages each have a better chance of doing well than text-light ones.
Q. Is there a straightforward (i.e., easy!) way to find out where a particular web site ranks in a certain search engine or directory -- and what category it's in? For example, I know that my web site is listed in Yahoo since it appears when I key the domain name in the search box, but I don't know what category it's been put into.
A. You aren't actually in Yahoo, according to the check I did. You are in Google's results. If you were in Yahoo, then when you search for your domain name, under the description of your listing would be a "More sites about" link, prefaced by a red arrow. Clicking on that would take you to where your site "lives" in the Yahoo directory. As you have no such link, you aren't in the human results of Yahoo. To learn how to check at other search engines, see http://searchenginewatch.com/webmasters/checkurl.html
Q. Our competitors have submitted our site to an internet porn ring in an attempt to get us thrown out of the search engines. Is there anything we can do?
A. In general, you really shouldn't get penalized for people linking to you. That's outside your control. As long as you aren't linking back into the porn network, that ought to be enough to isolate you from any damage.