AltaVista's Search By Language Feature

From The Search Engine Report
June 4, 1997

AltaVista introduced the ability to search in one of 25 different languages in July 1997. When a page is indexed, AltaVista uses a dictionary-based method to identify the page's dominant language. This lets a search be narrowed to pages written in French, Spanish or any of the other supported languages.
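AltaVista doesn't describe how its dictionary-based identification works internally, so the following is only a rough illustration, not its actual method. This minimal Python sketch assumes each language has a small word list. A page's dominant language is whichever list matches the most of its words. The `DICTIONARIES` table and the `dominant_language` function are hypothetical names, and the word lists are tiny stand-ins for real dictionaries.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical mini-dictionaries of very common words, for illustration only.
# A real system would use far larger word lists for all supported languages.
DICTIONARIES = {
    "English": {"the", "and", "of", "to", "is", "in", "that", "for"},
    "German": {"der", "die", "das", "und", "ist", "nicht", "mit", "ein"},
    "French": {"le", "la", "les", "et", "est", "des", "une", "pour"},
    "Spanish": {"el", "los", "las", "y", "es", "una", "para", "por"},
}


def dominant_language(text: str) -> Optional[str]:
    """Return the language whose dictionary matches the most words in text,
    or None if no word matches any dictionary."""
    words = re.findall(r"\w+", text.lower())
    scores = Counter()
    for lang, vocab in DICTIONARIES.items():
        # Count every word occurrence that appears in this language's list.
        scores[lang] = sum(1 for w in words if w in vocab)
    # On a tie, the language listed first in DICTIONARIES wins.
    lang, best = scores.most_common(1)[0]
    return lang if best > 0 else None


if __name__ == "__main__":
    print(dominant_language("Die Suchmaschine ist nicht schnell, und das ist ein Problem."))
```

Scoring on common function words ("der," "und," "the," "et") is a simple choice. Those words appear on almost every page regardless of topic, so they say more about the language than about the subject.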

The advantage of this may not be obvious at first. After all, a search on any search engine for a word like "Suchmaschinen," which is German for "search engines," is almost certainly going to find only pages written in German. Pages in English or other languages simply won't contain that term.

However, imagine that you only speak German and want information about Berlin. An ordinary search for "Berlin" will bring up pages in English, German and perhaps other languages.

By specifying that a search be done only among German language pages, you are able to narrow in on pages you can actually read.

The ability to search by language also lets you measure the preponderance of English on the web.

Using Advanced Search, enter a * in the search box and click Search. You'll be shown how many pages there are in the index. Now do the same thing, but choose a language. This time you'll be shown a count of all documents in that particular language. Divide that count by the total number of pages and multiply by 100, and you have the language's percentage share of the index.
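The arithmetic above is simple, but here it is as a short Python sketch. The function names are hypothetical. The page counts in the example are made up for illustration and are not real AltaVista figures.

```python
from typing import Dict


def language_share(language_count: int, total_count: int) -> float:
    """Return the percentage of indexed pages that are in one language."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return 100.0 * language_count / total_count


def language_shares(counts: Dict[str, int], total_count: int) -> Dict[str, float]:
    """Apply language_share to several per-language counts at once."""
    return {lang: language_share(n, total_count) for lang, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical counts: the "*" search total and two per-language counts.
    total = 1_000_000
    print(language_shares({"German": 40_000, "Finnish": 2_500}, total))
    # prints {'German': 4.0, 'Finnish': 0.25}
```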

For those who are curious, my spot checks in early August 1997 found that 78% of the web pages in AltaVista were in English, 4% in German, 1.5% in Spanish and 0.25% in Finnish.

The AltaVista language search feature is quite different from the country-specific versions that other search engines offer. Those versions usually take results from the main listings and filter them by domain.

For example, a French version of a search engine might return only matches from French domains, such as those ending in .fr.

A language-specific search on AltaVista is based entirely on page content, and no domain filtering is involved. Consequently, it can find French-language pages that a domain filter would miss.
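To make the difference concrete, here is a minimal Python sketch that contrasts the two approaches on a tiny hypothetical index. It assumes each indexed page carries a language tag assigned at indexing time. The `Page` type, the two filter functions and the example URLs are all invented for illustration.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import urlparse


@dataclass
class Page:
    url: str
    language: str  # assumed: language tag assigned when the page was indexed


def filter_by_domain(pages: List[Page], tld: str) -> List[Page]:
    """Country-style filtering: keep pages whose host name ends in the TLD."""
    suffix = "." + tld.lstrip(".")
    return [p for p in pages if (urlparse(p.url).hostname or "").endswith(suffix)]


def filter_by_language(pages: List[Page], language: str) -> List[Page]:
    """Content-based filtering: keep pages tagged with the given language."""
    return [p for p in pages if p.language == language]


# Hypothetical index entries, for illustration only.
PAGES = [
    Page("http://www.example.fr/accueil.html", "French"),
    Page("http://www.example.ca/bienvenue.html", "French"),
    Page("http://www.example.fr/english/index.html", "English"),
]

if __name__ == "__main__":
    print([p.url for p in filter_by_domain(PAGES, "fr")])
    print([p.url for p in filter_by_language(PAGES, "French")])
```

Filtering on .fr returns the English page hosted in France and misses the French page hosted in Canada. Filtering on the language tag does the reverse.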

"For example, more than half of all Web pages written in French are located outside of the .fr (France) domain," says Louis Monier, chief technical officer for AltaVista. "Conversely, over 30 percent of the pages in the .fr domain are not written in the French language. With the new multilingual search capability of AltaVista Search, users have a powerful advantage to be able to quickly and accurately find pages in their chosen language anywhere on the Web."

Languages included are Chinese, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hebrew, Hungarian, Icelandic, Italian, Japanese, Korean, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Spanish and Swedish. More are planned to be added.

More Information


AltaVista Help Page

Be sure to choose "Expand Topics" to see additional help information and examples that you might overlook, or use the link below.

AltaVista Preferences Control Panel

Choose your preferred language or languages to search within, and set your search mode, display and refine options.