Michael Kanellos writes about the Marvel project at IBM that’s building a search engine to retrieve audio and video material. Content will be automatically scanned, parsed and indexed for concepts. In other words, the system automatically adds descriptive metadata to each video file.

He writes that a full-fledged Marvel might be some 3-5 years away.

Subject metadata, what librarians often refer to as cataloging, is searchable in many fee-based full text databases (Factiva, LexisNexis, Dialog, etc.), in library catalogs, and in specialized open web databases. In some cases the metadata is added by a human indexer who reviews each document, and in other cases it is created automatically by scanning and parsing the text. Terms are selected from an agreed-upon list or thesaurus. Here’s the thesaurus of approved terms that’s used to index material in a well-known education database.

The New York Times Historical Database from ProQuest, which indexes every page (full text and full image) of the newspaper (ads and editorial) from Vol. 1, No. 1 through 2001, was created using a combination of these techniques. The ability to search by subject indexing, along with the option to conduct a full text search, makes for more precise searching than what you encounter when running a search on a general purpose web engine.
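To make the idea concrete, here’s a rough sketch in Python of how automatic subject indexing plus full text search can narrow results. It is only an illustration, not how ProQuest or IBM actually do it: the tiny thesaurus, the trigger words, the sample documents and the function names are all made up for this example.

```python
import re

# Hypothetical controlled vocabulary: approved subject term -> trigger words.
# A real thesaurus is far larger and is maintained by indexers.
THESAURUS = {
    "Elections": {"election", "ballot", "candidate"},
    "Baseball": {"baseball", "inning", "pitcher"},
}


def tokenize(text):
    """Lowercase the text and return the set of alphabetic words in it."""
    return set(re.findall(r"[a-z]+", text.lower()))


def assign_subjects(text):
    """Assign every approved term whose trigger words appear in the text."""
    words = tokenize(text)
    return {term for term, triggers in THESAURUS.items() if words & triggers}


def build_index(docs):
    """Store the full text words and the assigned subject terms per document."""
    return [
        {"id": doc_id, "words": tokenize(text), "subjects": assign_subjects(text)}
        for doc_id, text in docs.items()
    ]


def search(index, keyword, subject=None):
    """Full text keyword match, optionally restricted to one subject term."""
    hits = []
    for record in index:
        if keyword.lower() not in record["words"]:
            continue
        if subject is not None and subject not in record["subjects"]:
            continue
        hits.append(record["id"])
    return hits


# Made-up sample documents; "strike" means something different in each.
DOCS = {
    "a1": "The pitcher threw a strike in the ninth inning.",
    "a2": "Workers went on strike a week before the election.",
}
INDEX = build_index(DOCS)

full_text_only = search(INDEX, "strike")
with_subject = search(INDEX, "strike", subject="Baseball")
```

A plain full text search for "strike" matches both sample documents, while adding the subject term "Baseball" leaves only the first one. That is the precision gain subject indexing is meant to provide.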

Btw, ProQuest offers searchable “historical” databases from other newspapers including The Wall Street Journal and Washington Post.

Kanellos writes that for the “most part” this multimedia material, the kind IBM is targeting with Marvel, “can’t easily be retrieved today on the Net.” Yes, for the most part this is a correct statement. However, it doesn’t mean that several very interesting search tools (both free and fee) aren’t already online. Unlike Marvel, these tools do not offer subject searching, but they do allow you to search the full text from various TV and radio shows. If you’re interested in trying a few of them out, take a look at this post from about a week ago where I’ve linked to several resources.

Chris also just touched on multimedia search in this post about Comcast.

