A special report from the Search Engine Strategies conference in Boston, MA, March 4-6, 2003.
Macromedia Flash and other non-HTML formats can pose problems for search engines, unless you take appropriate steps to optimize the content.
"Search engines were originally built to index and serve HTML documents," said Tim Mayer, Vice President of Web Search at FAST. "Now that the web has become more diverse in content types, knowing how to treat Flash and other types of content has become more important for search engines."
"These other content types present different challenges to the search engines," Mayer continued. "For example, Flash files generally contain too little text whereas PDF documents contain too much text. The technology to include differing content types and score them appropriately will become even more important as new areas in web search become more important -- such as real time data which will provide the challenge of lacking inbound links."
Participants in this panel discussed how crawlers interact with non-HTML content, and offered a number of workarounds and optimization tips.
Search Engines and Flash
Flash is the leading vector graphics technology for creating design-focused web sites. Over 98 percent of Internet users can view Flash content with the Flash player software already installed in their browsers. Over 490 million people use the Flash player.
Gregory Markel, Founder/President of Infuse Creative, an entertainment and technology consulting company, discussed issues related to search engine visibility and Flash sites. "The good news is that FAST Search and Google can follow embedded links within the [Flash] files," he said.
FAST built its Flash indexing capabilities using Macromedia's Flash search engine software developer's kit (SDK). The SDK was designed to convert a Flash file's text and links into HTML for indexing.
"Not all search engine spiders have the ability to crawl or index Flash," Markel said. "As far as I am able to determine, Google has not included the Flash-SDK setup for indexing, like FAST. But Google can follow embedded links."
Markel warned that Macromedia's SDK solution is far from perfect. "All it does is it takes whatever [content] is there, and converts it to an HTML version. But the converted HTML doesn't include anything you actually need to do well in the search engines. No title tags, alt tags, body text, etc. SDK is a step in the right direction, but has a long way to go."
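Markel's checklist (title tags, alt tags, body text) can be turned into a quick audit of any HTML page. The following is a minimal sketch using only Python's standard library. The sample markup is hypothetical and is not actual SDK output; the class and function names are illustrative assumptions.

```python
from html.parser import HTMLParser


class SearchSignalAudit(HTMLParser):
    """Collects the signals Markel mentions: title, alt text, body text, links."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.images_missing_alt = 0
        self.text = []          # visible text found outside <title>
        self.links = []         # href values of <a> tags
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "img" and not (attrs.get("alt") or "").strip():
            self.images_missing_alt += 1
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title += text
        else:
            self.text.append(text)


def audit(html: str) -> dict:
    """Summarize which basic search signals a page provides."""
    parser = SearchSignalAudit()
    parser.feed(html)
    parser.close()
    return {
        "has_title": bool(parser.title),
        "images_missing_alt": parser.images_missing_alt,
        "word_count": sum(len(t.split()) for t in parser.text),
        "links": parser.links,
    }


# Hypothetical page: links survive, but there is no title or alt text.
sample = (
    '<html><body><img src="logo.gif">'
    '<a href="/products.html">Products</a></body></html>'
)
report = audit(sample)
print(report)
```

A page that reports no title, images without alt text, and a very low word count has the gaps Markel describes, even though its links can still be followed.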
"One of the big problems with Flash content is that it's very hard to find," stated Tim Mayer. "We have a lot of Flash content in the FAST index, though I've rarely come across a Flash file, myself, in the main search results."
One reason for the paucity of optimized Flash files is that the search engine industry hasn't adopted the SDK as a standard, explained Mayer. "The SEOs out there don't know that we're actually going to index their files," he said, "so they don't prepare them in an optimized way (for the SDK). This will change as more search engines adopt this."
Mayer recommended that webmasters restrict the text in a Flash file to what they want indexed. For example, making the "skip intro" link a graphic is better than making it a text link. "Keep what you want as indexable as text," said Mayer. "Make what you don't want as Flash graphics. We will also take out links from Flash sites and follow them."
Multimedia Files and Search Engines
Few search engines can search audio and video file formats. Currently, AltaVista, FAST Search, and Singingfish support the following multimedia formats: Windows Media (Windows Media Encoder), RealMedia, MP3, and QuickTime.
Multimedia content can provide a better user experience in some industries.
"Multimedia content enables an immersive and emotive user experience beyond text-based content," said Ken Berkun, Founder and VP of Strategy at Singingfish. "A 30 second music clip is a strong advertisement for a CD."
When you create a multimedia file, you have the opportunity to give it metadata. "Give each file a title, copyright stream, author, description, and keywords," said Berkun. "Every single one of these medias have these fields."
"You would be surprised at how little that's done," continued Berkun. "The most popular title in our database is the default - nothing. People spend thousands or even hundreds of thousands of dollars on the production and don't even take the time to add a title."
In addition to putting metadata in the multimedia file, Berkun advised placing actual content on the HTML page that contains the file. Include accurate anchor text around multimedia files. And make sure that your entire web site is spiderable.
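Berkun's list of fields lends itself to a simple pre-publication check. The sketch below assumes the metadata has already been read into a plain dictionary; actually extracting tags from Windows Media, RealMedia, MP3, or QuickTime files requires a format-specific tool that is not shown here. The field names and the sample clip are illustrative assumptions.

```python
# Fields Berkun recommends filling in for every multimedia file.
REQUIRED_FIELDS = ("title", "copyright", "author", "description", "keywords")


def missing_metadata(metadata: dict) -> list:
    """Return the recommended fields that are absent or blank."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(metadata.get(field) or "").strip()
    ]


# Hypothetical clip whose title was left at the empty default.
clip = {"title": "", "author": "Example Band", "keywords": "rock, live"}
print(missing_metadata(clip))  # ['title', 'copyright', 'description']
```

Running a check like this over every file before it is published would catch the empty default titles Berkun describes.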
PDF Documents and Search Engines
Shari Thurow, Marketing Director of Grantastic Designs, a full-service search engine marketing and design firm, addressed PDF files and the search engines. "PDF stands for portable document format, which is a universal file format that preserves fonts, colors, graphic images, and formatting of any source document," she said.
Unlike Flash files, PDF documents frequently appear within regular search results. Thurow stated that she typically finds PDF-formatted technical manuals, white papers, press kits, and spec sheets among the top 30 results in both Google and FAST Search. Because of this, Thurow commonly optimizes PDF documents for various industries.
Thurow also stressed the importance of using the Robots Exclusion Protocol on some PDF documents. "Search engines have made it clear that they do not want redundant content in their indices," she said. "Even having both a PDF and HTML version of the same content is redundant. For that reason, I place the Robots Exclusion Protocol on the PDF version. HTML format is better for the search engines, anyway, since HTML files tend to be smaller in file size."
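One way to apply the Robots Exclusion Protocol as Thurow describes is a robots.txt rule covering the PDF copies. The sketch below assumes, purely for illustration, that the PDF versions live under a /pdf/ directory on www.example.com; it uses Python's standard urllib.robotparser to confirm that the rule blocks the PDF while leaving the HTML version crawlable. The crawler name ExampleBot is also hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed site layout: duplicate PDF versions are kept under /pdf/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdf/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url: str, user_agent: str = "ExampleBot") -> bool:
    """Report whether a compliant crawler may fetch the given URL."""
    return parser.can_fetch(user_agent, url)


pdf_allowed = is_crawlable("http://www.example.com/pdf/whitepaper.pdf")
html_allowed = is_crawlable("http://www.example.com/whitepaper.html")
print(pdf_allowed, html_allowed)  # False True
```

Grouping the PDF copies in one directory keeps the rule to a single Disallow line, which avoids relying on wildcard patterns that not every crawler supports.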
PDF documents can be submitted through the normal submit URL forms and the paid inclusion programs at the major search engines.
AlltheWeb - Add URL
Sites using Flash and PDF should submit to AlltheWeb, which will index them for its own site and make them available to its search partners, including Lycos.
Google - Add URL
Google will follow links embedded in Flash content - submit these pages to have Google's crawler extract these links for indexing. Also submit PDF documents to this URL.
Singingfish - Add URL
If Singingfish has not discovered your multimedia files through the natural spidering process, use this URL to submit them. A paid inclusion program is also available.
Macromedia Flash Software Developer Kit
More information about Macromedia tools that allow search engines to index Flash content.
Learn more about Adobe PDF documents.
Multimedia Search: Singingfish
Search Engine Watch Editor Danny Sullivan takes a look at Singingfish.
Craig Fifield is Product Designer for Microsoft bCentral's Small Business Web site optimization and submission service, Submit It!