A special report from the Search Engine Strategies conference in Boston, MA, March 4-6, 2003.
Macromedia Flash and other non-HTML formats can pose problems for search engines, unless you take appropriate steps to optimize the content.
"Search engines were originally built to index and serve HTML documents," said Tim Mayer, Vice President of Web Search at FAST. "Now the web has become more diverse in content types, knowing how to treat Flash and other types of content has become more important for search engines."
"These other content types present different challenges to the search engines," Mayer continued. "For example, Flash files generally contain too little text whereas PDF documents contain too much text. The technology to include differing content types and score them appropriately will become even more important as new areas in web search become more important -- such as real time data which will provide the challenge of lacking inbound links."
Participants in this panel discussed how crawlers interact with non-HTML content, and offered a number of workarounds and optimization tips.
Search Engines and Flash
Flash is the leading vector graphics technology for creating design-focused web sites. Over 98 percent of Internet users can view Flash content with the Flash player software already installed in their browsers. Over 490 million people use the Flash player.
Gregory Markel, Founder/President of Infuse Creative, an entertainment and technology consulting company, discussed issues related to search engine visibility and Flash sites. "The good news is that FAST Search and Google can follow embedded links within the [Flash] files," he said.
FAST built its Flash indexing capabilities using Macromedia's Flash search engine software developer's kit (SDK). The SDK was designed to convert a Flash file's text and links into HTML for indexing.
"Not all search engine spiders have the ability to crawl or index Flash," he said. "As far as I am able to determine, Google has not included the Flash-SDK setup for indexing, like FAST. But Google can follow embedded links."
Markel warned that Macromedia's SDK solution is far from perfect. "All it does is it takes whatever [content] is there, and converts it to an HTML version. But the converted HTML doesn't include anything you actually need to do well in the search engines. No title tags, alt tags, body text, etc. SDK is a step in the right direction, but has a long way to go."
Markel also shared his experiences with Flash sites and the pay-per-click search engine programs at Overture and Google (AdWords).
"For a Flash site to be allowed in Overture, it must have a 'skip introduction' feature on its splash/home page," he said. "Otherwise the site will not be allowed into Overture."
"A database Flash site could contain tons of valid content," Markel further explained. "But it still appears as one page. Let's say you're trying to get a ranking for one term, e.g., 'blue suede shoes.' If an Overture editor goes to your web site and doesn't see any blue suede shoes reference on your home/splash page, even if you have references inside your site, the editor could initially disallow the site."
Overture has recently relaxed these guidelines. Flash sites might not have the same issues they had 4-5 months ago. Regardless, be prepared to go over any potential Flash issues with an Overture editor, should they arise.
Markel had similar experiences with Google AdWords.
"We set up this client with a Flash site," he said. "We kept getting 'URL not working' notices from Google, and our listings would be turned off. But there were never any problems on our end. We drew a conclusion that Google may have sent some automated bot out to make sure the site was functional, and maybe make sure that we weren't doing any tricky re-direct. The Flash site had a 'sniffer page' that was throwing that bot off! We eventually were able to solve that problem by modifying the submitted URL."
Skilled Flash webmasters typically create a single web page to detect whether or not end users have Flash media player installed on their browsers. Markel calls these "sniffer" or "detection" pages. Most of the time, the "sniffer" is going to be a site's home page.
"In a worst case scenario, there is no content on the 'sniffer' home page," said Markel. "The page only exists to determine whether end users have Flash or not (or what type of Flash and accessory plug-ins, like Shockwave) and where they should be directed. An empty, vacuous Flash detection page can be a huge problem right before the search engine spider even gets to the actual site."
Using redirects also aggravates the problem.
"If there is a real sophisticated hierarchy of redirects, such as those used to determine what version of Flash the end user has, the redirect scripts can send a red flag to certain search engine spiders," he stated. Thus, detection pages on a Flash site can be a big problem.
Markel recommends the following solutions for sites designed in Flash:
- Run a spider simulation on your Flash site. The simulator will tell you what's going wrong on your site, and what the search engine spiders will or won't see.
- Plan for search engines while building the site: modify the pages to include the traditional HTML elements that spiders love and that address your keywords. Fix your web pages as you build them, rather than afterward.
- Make sure all of your pages are linked to each other.
- Use the Macromedia SDK to batch-convert your Flash files into HTML versions. Remember, the converted pages will not be optimized. The SDK will not add the important tags needed for effective optimization.
- You can surround your Flash site with optimized framesets, use layering or Z-layer positioning, or cloak, though the major search engines frown upon some of these tactics. Some of them are also among the more expensive solutions.
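The spider-simulation step in the list above can be approximated with a few lines of Python. This is a minimal sketch, not the tool Markel used: it assumes a spider that reads only HTML (no Flash, no script execution), and the `SpiderView` class, the `simulate` function, and both sample pages are hypothetical names invented for illustration.

```python
from html.parser import HTMLParser


class SpiderView(HTMLParser):
    """Collects what a text-only crawler could read on an HTML page."""

    def __init__(self):
        super().__init__()
        self.title_words = []
        self.body_words = []
        self.links = []
        self._in_title = False
        self._skip_depth = 0  # greater than zero inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        words = data.split()
        if not words or self._skip_depth:
            return  # whitespace, or script/style code the spider ignores
        if self._in_title:
            self.title_words.extend(words)
        else:
            self.body_words.extend(words)


def simulate(html):
    """Summarize the title, visible word count, and links of a page."""
    view = SpiderView()
    view.feed(html)
    view.close()
    return {
        "title": " ".join(view.title_words),
        "words": len(view.body_words),
        "links": view.links,
    }


# Hypothetical "sniffer" page: nothing but a detection script.
SNIFFER_PAGE = """<html><head><script>
if (hasFlash) { location.href = "flash/index.html"; }
</script></head><body></body></html>"""

# Hypothetical page that wraps the Flash movie in ordinary HTML.
OPTIMIZED_PAGE = """<html><head><title>Blue Suede Shoes</title></head>
<body><h1>Blue suede shoes</h1>
<p>Hand-made blue suede shoes.</p>
<a href="catalog.html">Catalog</a>
<object data="intro.swf"></object></body></html>"""

if __name__ == "__main__":
    print(simulate(SNIFFER_PAGE))
    print(simulate(OPTIMIZED_PAGE))
```

Run against the detection-only page, the sketch reports no title, no words, and no links, which is the "empty, vacuous" result Markel describes. The second page gives the spider a title, body text, and a link to follow.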
"One of the big problems with Flash content is that it's very hard to find," stated Tim Mayer. "We have a lot of Flash content in the FAST index, though I've rarely come across a Flash file, myself, in the main search results."
One of the reasons for the paucity of optimized Flash files is that the search engine industry hasn't adopted the SDK as a standard, explained Mayer. "The SEOs out there don't know that we're actually going to index their files," he said, "so they don't prepare them in an optimized way (for the SDK). This will change as more search engines adopt this."
Mayer recommended that webmasters restrict the text to what they want indexed. For example, making the "skip intro" a graphic file is better than making it a text link. "Keep what you want as indexable as text," said Mayer. "Make what you don't want as Flash graphics. We will also take out links from Flash sites and follow them."
Multimedia Files and Search Engines
Few search engines provide search for audio and video file formats. Currently, AltaVista, FAST Search, and Singingfish support the following multimedia formats: Windows Media (Windows Media Encoder), RealMedia, MP3, and QuickTime.
Multimedia content can provide a better user experience in some industries.
"Multimedia content enables an immersive and emotive user experience beyond text-based content," said Ken Berkun, Founder and VP of Strategy at Singingfish. "A 30 second music clip is a strong advertisement for a CD."
When you create a multimedia file, you have the opportunity to give it metadata. "Give each file a title, copyright stream, author, description, and keywords," said Berkun. "Every single one of these medias have these fields."
"You would be surprised at how little that's done," continued Berkun. "The most popular title in our database is the default - nothing. People spend thousands or even hundreds of thousands of dollars on the production and don't even take the time to add a title."
In addition to putting metadata in the multimedia file, Berkun advised having actual content on the HTML page that contains the file. Include accurate anchor text around multimedia files. And make sure that your entire web site is spiderable.
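Berkun's checklist of fields lends itself to a quick pre-publication check. The sketch below is only an illustration: it assumes the metadata has already been read into a plain dictionary (how to extract it depends on the file format and tooling, which the panel did not cover), and the field names and the `missing_metadata` function are hypothetical.

```python
# Fields Berkun recommends filling in; the key names are assumptions.
REQUIRED_FIELDS = ("title", "copyright", "author", "description", "keywords")


def missing_metadata(metadata):
    """Return the recommended fields that are absent or blank."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(metadata.get(field) or "").strip()
    ]


if __name__ == "__main__":
    # Hypothetical clip with a blank title and only an author filled in.
    clip = {"title": "", "author": "Example Band"}
    print(missing_metadata(clip))
```

A blank title counts as missing here, since the default "nothing" title is exactly the case Berkun warns about.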
PDF Documents and Search Engines
Shari Thurow, Marketing Director of Grantastic Designs, a full-service search engine marketing and design firm, addressed PDF files and the search engines. "PDF stands for portable document format, which is a universal file format that preserves fonts, colors, graphic images, and formatting of any source document," she said.
Unlike Flash documents, PDF documents frequently appear within regular search results. Thurow stated that she typically finds PDF-formatted technical manuals, white papers, press kits, and spec sheets within the top 30 search results in both Google and FAST Search. Because of this, Thurow commonly optimizes PDF documents for various industries.
"The most important thing to remember about optimizing PDF documents is that they must contain keyword-rich text," emphasized Thurow, "which is quite similar to HTML-page optimization." Thurow recommends against creating PDF documents with software such as Adobe Illustrator and Photoshop, favoring text-based layout software such as Adobe InDesign or QuarkXPress. "Even Word is fine for formatting PDFs," she said.
Thurow also stressed the importance of using the Robots Exclusion Protocol on some PDF documents. "Search engines have made it clear that they do not want redundant content in their indices," she said. "Even having both a PDF and HTML version of the same content is redundant. For that reason, I place the Robots Exclusion Protocol on the PDF version. HTML format is better for the search engines, anyway, since HTML files tend to be smaller in file size."
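As a rough illustration of Thurow's approach, the sketch below uses Python's standard `urllib.robotparser` module to confirm that a robots.txt rule blocks the PDF copy of a document while leaving the HTML copy crawlable. The robots.txt contents, the example.com URLs, and the `allowed` helper are hypothetical, and the article does not say exactly how Thurow writes her exclusion rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: exclude the PDF copy, leave the HTML copy alone.
ROBOTS_TXT = """\
User-agent: *
Disallow: /whitepapers/widget-guide.pdf
"""


def allowed(robots_txt, url, agent="*"):
    """Return True if the rules in robots_txt let the agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    base = "http://www.example.com/whitepapers/widget-guide"
    print(allowed(ROBOTS_TXT, base + ".pdf"))
    print(allowed(ROBOTS_TXT, base + ".html"))
```

Checking the rule this way before publishing helps avoid the opposite mistake of accidentally excluding the HTML version as well.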
PDF documents can be submitted through the normal submit URL forms and the paid inclusion programs at the major search engines.
AlltheWeb - Add URL
Sites using Flash and PDF should submit to AlltheWeb, which will index them for its own site and make them available to its search partners, including Lycos.
Google - Add URL
Google will follow links embedded in Flash content - submit these pages to have Google's crawler extract these links for indexing. Also submit PDF documents to this URL.
Singingfish - Add URL
If Singingfish has not discovered your multimedia files through the natural spidering process, use this URL to submit them. A paid inclusion program is also available.
Macromedia Flash Software Developer Kit
More information about Macromedia tools that allow search engines to index Flash content.
Learn more about Adobe PDF documents.
Multimedia Search: Singingfish
Search Engine Watch Editor Danny Sullivan takes a look at Singingfish.
Craig Fifield is Product Designer for Microsoft bCentral's Small Business Web site optimization and submission service, Submit It!