Going Beyond FTC Paid Inclusion Disclosure Guidelines

In the second part of this series on paid inclusion, I wrote that while Yahoo currently meets US Federal Trade Commission recommendations on disclosing paid inclusion, I wanted to see them go beyond this.

To me, the FTC guidelines are dated. They were drafted in 2002, when paid inclusion was still fairly new, when people had more faith in search engine promises that rank boosting within paid inclusion wouldn't happen, and when the main concern was fixing the disclosure situation with paid placement.

Today, I'd argue there's a lot less faith that paid inclusion content won't be boosted. Because of this, I see disclosure beyond what the FTC requires as essential for any search engine that wants to use paid inclusion yet retain the trust of its searchers.

Lack Of Faith

Since paid inclusion was introduced, I've written several articles showing how paid inclusion content has managed to skyrocket to the top of the results at times on different search engines.

Explanations such as "blending problems" and "link flux" have been trotted out to account for these situations. I can believe and understand the explanations. Nevertheless, they've also been convenient problems for the paid inclusion programs to have, as they've helped generate revenue.

Toward the end of last year, a BusinessWeek investigation found that some advertisers were convinced paid inclusion helped them rank better, further hurting faith that rank boosting doesn't happen.

No doubt, some advertisers may mistakenly assume they got a rank boost. For example, a person in the BusinessWeek story did paid inclusion with LookSmart and found an immediate gain with MSN.

This wasn't because paid inclusion listings were boosted at MSN. Instead, it was because at that time, MSN placed ALL LookSmart listings (paid inclusion or not) before the crawler-based backup results from Inktomi. In other words, it wasn't paid inclusion URLs that were boosted. It was LookSmart URLs that were boosted, and plenty of these did not come from paid inclusion.

Regardless of advertiser confusion, it's impressions that count. If enough people believe paid inclusion provides a boost, then that slowly gets taken as fact. In addition, there are plenty of sophisticated advertisers who are not confused and fully believe paid inclusion helps them rank better. I've talked with some myself at our regular search conferences.

Show, Don't Tell

The solution to the faith problem is better disclosure. Let the general public easily see what paid inclusion URLs show up in the results. That will let those who care about the issue make their own assessment, rather than having to just trust the word of a search engine.

In particular, I'd like to see "inline" disclosure provided. This means disclosure that happens right within the search results. Somehow, someway, make it easy to spot paid inclusion URLs.

It needn't be blatant. A simple icon next to a paid inclusion URL would be enough to help those who care. In my keynote earlier this year at Search Engine Strategies in New York, I offered up a "blue dot" example. Here's how it might look using Yahoo results for kameleon remote that I generated last week:

[Screenshot: Yahoo results for kameleon remote, with blue dots after the second and fifth listings]

See the two blue dots after the second and fifth listings? They are flagging that these are paid inclusion URLs. How do I know? There are certain clues that can be found in the listing URLs, as explained next.

URL Deciphering At Yahoo

Let's break down the Yahoo URL code, at least to the degree we can. Look at the first site listed in the screenshot above. See the URL shown, www.remotecentral.com? Now do the search for kameleon remote yourself. You should see that site listed first or very near the top.

Hover your mouse over the URL, and if you use Internet Explorer, you'll see something completely different in the status bar at the bottom of your browser (use the menu option View, then Status Bar, if you don't have this switched on). Alternatively, right click on the URL, select Copy Shortcut, then paste into something like Notepad. You'll see the actual URL looks like this:

http://rds.yahoo.com/S=2766679/K=kameleon+remote/v=2/
SID=e/TID=DFX5_30/l=WS1/R=2/SS=16752724/H=0/*-
http://www.remotecentral.com/

Lots of things going on here, some of which are easy to guess at, some of which aren't:

  • http://rds.yahoo.com is Yahoo sending the click through its own tracking system, to monitor what people are choosing in the results.
  • S=2766679 is unknown and could be unique to me.
  • K=kameleon+remote shows the search terms that were entered.
  • v=2 is unknown to me.
  • SID=e may be for recording a session ID in some way, to track a period of time when someone has searched. I've seen it also show up as SID=w when I logged out of Yahoo, and it stayed that way when I logged back in.
  • TID=DFX5_30 is unknown but possibly tied to individual users. When I use a browser with cookies disabled, I don't see it.
  • l=WS1 indicates that this is the first link associated with the listing. Hover over any category links shown, and you'll see those are l=WS2. Any "More pages from this site" links are l=WS3. Any "Cached" links are l=WS4, and so on.
  • R=2 shows the ranking this URL had in the search results. In other words, it was ranked in spot 2.
  • SS=16752724 appears to be a unique code assigned to this URL because it also happens to appear in the human-compiled Yahoo Directory.
  • H=0 is unknown, but I also see H=1 appear for top results of popular queries such as cars (all 20 results are H=1), britney (results 1-8) or movies (results 1-13). This might indicate that these results have been hardcoded to appear in response to particular queries, in the way that MSN editors used to do.

    Why? Look at cars, and you'll see Car Talk (cars.cartalk.com) ranking well and with an H=1 code. OK, so maybe all good sites like Car Talk are simply given some type of behind the scenes ranking flag that the H=1 code indicates. Well, now search for car talk. There's the Car Talk site again, but this time with an H=0 code. That suggests the code is tied to particular queries, not to particular sites.
  • *- appears to be a marker indicating the start of the destination URL.
  • http://www.remotecentral.com is the destination URL, which matches the actual URL that Yahoo shows searchers for the listing.
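
The breakdown above can be sketched in code. This is a minimal Python sketch, assuming only the layout observed in the example: a host, then slash-separated KEY=VALUE pairs, then the *- marker and the destination URL. The function name parse_tracking_url is hypothetical, and Yahoo has not documented this format, so treat it as an illustration rather than a definitive parser.

```python
from urllib.parse import unquote_plus


def parse_tracking_url(url):
    """Split a tracking URL into (fields, destination).

    Assumes the layout seen in the example: host, then slash-separated
    KEY=VALUE pairs, then "/*-" followed by the destination URL.
    """
    prefix, marker, destination = url.partition("/*-")
    if not marker:
        # No marker found: treat the whole thing as a plain destination URL.
        return {}, url
    host, _, path = prefix.split("://", 1)[-1].partition("/")
    fields = {"host": host}
    for segment in path.split("/"):
        key, sep, value = segment.partition("=")
        if sep:
            fields[key] = value
    if "K" in fields:
        # The search terms are form-encoded, so "+" stands for a space.
        fields["K"] = unquote_plus(fields["K"])
    return fields, destination


example = (
    "http://rds.yahoo.com/S=2766679/K=kameleon+remote/v=2/"
    "SID=e/TID=DFX5_30/l=WS1/R=2/SS=16752724/H=0/*-"
    "http://www.remotecentral.com/"
)
fields, destination = parse_tracking_url(example)
print(fields["K"], fields["R"], fields["H"], destination)
```

Keeping the fields in a plain dictionary means unknown elements such as S, v or TID are preserved without the code having to claim what they mean.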

Now look at the breakdown for the fifth URL, which I've shown in the screenshot as being from paid inclusion. The significant differences are in elements 10, 12 and 13, which I discuss below:

  1. http://rds.yahoo.com
  2. S=2766679
  3. K=kameleon+remote
  4. v=2
  5. SID=e
  6. TID=DFX5_30
  7. l=WS1
  8. R=5
  9. H=0
  10. MI=ic
  11. *-
  12. http://rdre1.yahoo.com/click?u=
  13. http://www.shopping.com/xGS-Kameleon_Remote_Controls~NS-1~linkin_id-3057993&y=02060DEDFB2E2F26&i=483&c=11250&q=02^SSHPM[L7t~
    rzszpq?mzrpkz6&e=utf-8&r=4&d=wow-en-us&n=E9NK5H0CR3340HVV&
    s=886&t=&m=40C6E77A&x=01DABDA21111A128

MI Codes & Paid Inclusion

Element 10 is a key code, which seems to indicate the paid inclusion status of a URL. Yahoo's never responded to my request for clarification on this, but I have had another contact confirm what some of these mean. Here are the codes, then a breakdown follows:

  • MI=ic
  • MI=sitematch
  • MI=sm
  • MI=ss
  • MI=free
  • MI=other
  • MI=cc

MI=ic: I know this represents a paid inclusion URL, and it probably stands for Index Connect, the former Inktomi CPC-based paid inclusion program. Given this, MI=ic probably represents paid inclusion URLs in the successor to that program, the trusted feed Site Match Exchange program sold by Overture.

MI=sitematch: I can see this showing up occasionally in an analysis of recent data provided by Yahoo Watch (more about this below). SiteMatch almost certainly stands for Site Match, Yahoo's basic paid inclusion program sold through Overture that charges both a flat fee and a CPC component.

MI=sm: I've seen this in the past, but it doesn't appear in the recent Yahoo Watch data. Almost certainly, this was an older code used to flag Site Match listings.

MI=ss: This code seems to correspond to URLs that were in the former Inktomi flat fee-based Search Submit program, which would explain the SS acronym. This WebmasterWorld.com thread has some speculation about this. It appears only a few times in the Yahoo Watch data.

MI=free: This apparently stands for a URL Yahoo has gathered for free through its CAP program.

MI=other: This apparently stands for a URL that also offers an RSS feed. You can see an example if you search for new york times, or hover over the URL of any other listing that shows an "RSS: View as XML - Add to My Yahoo [Beta]" line as part of its listing.

MI=cc: I've never seen it, nor do I see it showing in the recent Yahoo Watch data, but it was spotted in the past by others as part of that previous WebmasterWorld.com thread I noted.
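
For anyone checking URLs by hand or by script, the code list can be kept as a small lookup table. The Python sketch below simply restates the inferences above; none of the meanings are confirmed by Yahoo. The names MI_CODES and describe_mi are hypothetical, and treating a missing MI code as a sign of an ordinary crawled listing is my assumption.

```python
# Meanings as inferred in the article; Yahoo has not confirmed them.
MI_CODES = {
    "ic": ("paid", "Trusted feed (probably Index Connect, now Site Match Exchange)"),
    "sitematch": ("paid", "Site Match, the basic paid inclusion program"),
    "sm": ("paid", "Older code for Site Match listings"),
    "ss": ("paid", "Former Inktomi Search Submit program"),
    "free": ("free", "Gathered for free through the CAP program"),
    "other": ("other", "Listing that also offers an RSS feed"),
    "cc": ("unknown", "Reported by others; meaning unknown"),
}


def describe_mi(code):
    """Return (status, note) for an MI code, or a fallback when none matches."""
    if code is None:
        # Assumption: no MI element suggests an ordinary crawled listing.
        return ("crawled", "No MI code present")
    return MI_CODES.get(code, ("unknown", "Unrecognized MI code"))


for code in ("ic", "sitematch", "free", None):
    print(code, "->", describe_mi(code))
```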

Yahoo Watch Data

So what's this Yahoo Watch data I'm talking about? It covers about 700 queries made on Yahoo Watch from late June 12 through early June 16. Yahoo Watch scans the results of each query and automatically tries to identify paid inclusion URLs.

Of these, the MI=ic trusted feed code appeared the most in listings, over 1,500 times. In contrast, the MI=sitematch code only appeared 51 times by my count and the MI=ss code appeared only three times.

What's this mean? Potentially, it shows that using the trusted feed program is a much better way to do well with paid inclusion than with the basic program. However, that's also a dangerous assumption.

It's a relatively small sample I've used this time from Yahoo Watch (I haven't yet explored some additional data going back months). Also, the queries at Yahoo Watch may not be representative of those on Yahoo itself. More important, it may simply be the case that there are far more URLs listed through the trusted feed program than with the basic program. That would have an obvious impact.

Here are some last stats to consider. I said about 700 queries were covered in the data. That's 70,000 listings in all, since Yahoo Watch pulls back the top 100 results for each query processed. What percentage of those were found to be in different categories?

  • 1%: Content through free inclusion program for non-profits and others
  • 2%: Content through paid inclusion programs
  • 97%: Content listed for free through having been crawled
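
The arithmetic behind these figures is easy to check. The Python sketch below uses only the numbers quoted in this article; the query count is approximate and "over 1,500" is treated as a lower bound, so the results are rough.

```python
queries = 700              # approximate, per the article
results_per_query = 100    # Yahoo Watch pulls the top 100 results per query
total = queries * results_per_query

shares = {"free inclusion": 0.01, "paid inclusion": 0.02, "crawled": 0.97}
for label, share in shares.items():
    print(f"{label}: about {round(total * share):,} of {total:,} listings")

# Cross-check against the reported MI counts ("over 1,500", 51 and 3).
paid_codes = 1500 + 51 + 3
print(f"paid codes: at least {paid_codes:,}, or {paid_codes / total:.1%} of listings")
```

The code counts come to a little over 2 percent of the listings, which is consistent with the rounded 2 percent paid inclusion share.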

Again, all the qualifications about queries at Yahoo Watch not being representative of Yahoo itself apply. Having said this, most of the listings so far are clearly coming from sites that have been crawled for free.

That 97 percent figure also sounds very similar to statements from Yahoo that "more than 99 percent" of its index contains content that comes from free crawling. A good stat, but it's not that relevant. Many pages in a search engine's index never actually make it into the top search results. The real question is what percentage of the top results that users actually see comes from crawling versus paid inclusion. The limited Yahoo Watch data shows that so far, not very much comes from paid inclusion.

Finally, you can keep an eye on the counts yourself at Yahoo Watch, to some degree. Search for anything using the search proxy, then scroll down to the 100th result from Yahoo. You'll see a big blue box with lots of numbers. You can see how many times MI=ic appeared since the counting for all searches was last reset, as well as for MI=sitematch.

Average ranks for these URLs are also calculated, though I think this would be more useful if Yahoo Watch only pulled back the default 20 results that Yahoo itself provides. That would produce an average rank for just the first page of results, which is what most people -- whether searchers or advertisers -- care about.
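
To show why the cutoff matters, here is a minimal Python sketch of an average rank calculation with an optional first-page limit. The function name average_rank and the sample positions are hypothetical, not Yahoo Watch data.

```python
def average_rank(ranks, cutoff=None):
    """Average the given result positions, ignoring any past the cutoff."""
    kept = [r for r in ranks if cutoff is None or r <= cutoff]
    if not kept:
        return None
    return sum(kept) / len(kept)


# Hypothetical positions at which paid inclusion URLs appeared.
sample_ranks = [2, 5, 18, 44, 91]
print(average_rank(sample_ranks))             # across the top 100
print(average_rank(sample_ranks, cutoff=20))  # first page only
```

With a few deep listings in the mix, the top 100 average says little about how paid inclusion URLs fare on the first page, which is the point of limiting the calculation to 20 results.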

Other Paid Inclusion Indicators

Element 12 indicates another redirection being done internally to Yahoo, this time to track the click for billing purposes. That type of redirection is a hallmark of paid inclusion URLs. The rdre portion apparently indicates that the results come from Yahoo's data center on the East Coast of the United States. Similarly, the rdrw portion indicates results served from the West Coast.

Element 13 is the destination URL. But in contrast to our first example, notice how the destination URL is completely different from what Yahoo shows the searcher. This is another hallmark of paid inclusion. It shows that the company purchasing the listing wants to track the URL on their end but wants to hide the unattractive URL from the searcher. Paid inclusion allows for this.
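
Taken together, the MI code, the billing redirect and the mismatched destination give three clues that could be checked automatically. The Python sketch below is one way to do that, under the same assumptions as the rest of this article: the code meanings are inferred, the rdre/rdrw host pattern is a guess based on the one example, and the function name paid_inclusion_clues, the displayed URL for the paid listing and the shortened paid URL are hypothetical.

```python
import re
from urllib.parse import urlsplit

PAID_MI_CODES = {"ic", "sitematch", "sm", "ss"}  # as inferred in the article


def _normalize(url):
    """Drop the scheme and trailing slash so displayed and destination URLs compare."""
    return url.split("://", 1)[-1].rstrip("/").lower()


def paid_inclusion_clues(tracking_url, displayed_url):
    """Return a list of paid inclusion clues found in a tracking URL."""
    clues = []
    prefix, marker, destination = tracking_url.partition("/*-")
    if not marker:
        return clues

    # Clue 1: an MI code associated with a paid program.
    mi = re.search(r"/MI=([^/]+)", prefix)
    if mi and mi.group(1) in PAID_MI_CODES:
        clues.append("paid MI code: " + mi.group(1))

    # Clue 2: a second internal redirect through an rdre/rdrw click tracker.
    dest_host = urlsplit(destination).hostname or ""
    if re.match(r"rdr[ew]\d*\.yahoo\.com$", dest_host):
        clues.append("billing redirect via " + dest_host)
        # The real destination follows "u=" in the redirect.
        destination = destination.partition("u=")[2]

    # Clue 3: the destination differs from the URL shown to the searcher.
    if _normalize(destination) != _normalize(displayed_url):
        clues.append("destination differs from displayed URL")
    return clues


free_url = (
    "http://rds.yahoo.com/S=2766679/K=kameleon+remote/v=2/SID=e/TID=DFX5_30/"
    "l=WS1/R=2/SS=16752724/H=0/*-http://www.remotecentral.com/"
)
# Shortened version of the paid example; the displayed URL is hypothetical.
paid_url = (
    "http://rds.yahoo.com/S=2766679/K=kameleon+remote/v=2/SID=e/TID=DFX5_30/"
    "l=WS1/R=5/H=0/MI=ic/*-http://rdre1.yahoo.com/click?u="
    "http://www.shopping.com/xGS-Kameleon_Remote_Controls~NS-1~linkin_id-3057993"
)
print(paid_inclusion_clues(free_url, "www.remotecentral.com"))
print(paid_inclusion_clues(paid_url, "www.shopping.com"))
```

Returning the individual clues rather than a yes or no answer keeps the guesswork visible, since any one of these signals could change or disappear.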

By the way, an OCS element has also been spotted occasionally in the past, as this thread from WebmasterWorld.com discusses. No one seems to have figured out what this means, however.

Problems With Deciphering

Certainly anyone could just go through and try to do a manual check using URL clues themselves. The Yahoo Watch search proxy makes this even easier, spotting suspected paid inclusion URL strings and putting big $ symbols in front of them.

There remain problems with this, however. Daniel Brandt, who runs the Yahoo Watch search proxy, and I had a number of back and forth emails earlier this year as we tried to determine what was paid based on URL strings. I've seen similar puzzlement on search forums, such as here on WebmasterWorld.com.

If it's hard enough for those who live and breathe search to make these determinations, what hope is there for the average consumer? In addition, the exact strings used that identify paid inclusion URLs are subject to change and potentially could be entirely hidden.

No one should have to decipher such things. A few blue dots or something similar would easily let someone see the impact of paid inclusion on results. If you consistently see lots of dots, you might feel paid inclusion is getting weighted too heavily. If you see only a few or some scattered ones, you might have more faith that no boosting is happening.

Yahoo Response

So how about it, Yahoo? Can we have a few blue dots?

"Obviously, we've had this debate internally, and there are pros and cons," said Tim Cadogan, vice president of search for Yahoo, when I spoke with him about the issue earlier this year.

One chief reason Yahoo gives for not yet doing inline disclosure is a fear of somehow influencing searchers against these results.

"The fact that someone paid for a value added service doesn't have anything to do with how they are ranked. It's not a meaningful part of the equation. Therefore, sharing that information with the user doesn't give a lot of insight," Cadogan said. "If we put a disclosure on each result, you can then argue about the relative ranking for eternity."

I agree entirely. There are some people who will simply view any paid inclusion URL as somehow tainted. If even one shows up, some will argue that it may have been given an unfair boost.

Unfortunately, that's also a consequence of mixing the paid inclusion URLs with the free listings. There's no way I see around it. But at least with disclosure, people can't argue that anything is being hidden or hard to discover.

Similar Yahoo Directory Labeling

Interestingly, in the Yahoo Directory, somewhat similar labeling doesn't appear to have been a problem. In 2001, Yahoo opened a Sponsored Sites program that let people jump to the top of pages showing its human-compiled directory listings. Only those who were already editorially approved to be in a particular category of the directory could ALSO do the ad program for the category page.

In addition to showing in a special sponsored area, those in the program also have the word [SPONSOR] put next to their editorial listing. There's no disclosure reason to do this, since sites are ONLY added to the editorial area if editorially approved. You can see how this happens for sites in the Jobs category at Yahoo.

At the time, I wondered if some advertisers would dislike having the sponsor tag associated with their editorial listings, for fear of making them seem somehow tainted. For its part, Yahoo editor-in-chief Srinija Srinivasan said she thought the addition would help both users and sponsors:

"I hope that being a Sponsored Site says something about your site and being a business that wants to be around that differentiates you in a good way."

Since then, I can't recall any complaints from advertisers or searchers about the listing enhancements. This suggests that perhaps similar enhancements or disclosure of paid inclusion URLs in Yahoo's web results might not cause them to suffer from a taint factor.

Too Much Info? Make It An Option!

Yahoo also argues that inline disclosure might be too much information, something especially an issue given that the company is currently beta testing some cleaner presentations of its results.

"The page is already complex, and we don't want to overload users. It's not clear that it would help the vast majority of users," Cadogan said.

A solution I suggested was to make disclosure an option. For users who care, why not let them set their preferences to have blue dots or some type of equivalent inline disclosure displayed?

To me, it's a perfect solution to balancing too much information with providing the information some may care about. It's certainly better than forcing people to do URL deciphering or to depend on third party sites to get disclosure. Cadogan said the idea was something that would be considered.
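
As a rough illustration of how little an opt-in would involve, here is a minimal Python sketch. Everything in it is hypothetical: the render_results function, the show_disclosure preference, the sample listings and the use of a bullet character in place of a blue dot icon. It is not a description of how Yahoo builds its results pages.

```python
BLUE_DOT = "\u2022"  # stand-in for the blue dot icon


def render_results(results, show_disclosure=False):
    """Render listings as text lines, marking paid inclusion URLs only on request."""
    lines = []
    for rank, (displayed_url, is_paid) in enumerate(results, start=1):
        marker = " " + BLUE_DOT if show_disclosure and is_paid else ""
        lines.append(f"{rank}. {displayed_url}{marker}")
    return lines


# Hypothetical listings as (displayed URL, paid inclusion?) pairs.
results = [
    ("www.remotecentral.com", False),
    ("www.example.com/remotes", True),
]
print("\n".join(render_results(results)))
print("\n".join(render_results(results, show_disclosure=True)))
```

With the preference off, the output is identical to today's results, so users who don't care see no extra clutter.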

It's also important to note that the disclosure could be done creatively. Yahoo's content acquisition program is designed to bring in content from all over the web that might otherwise be missed. As I've said earlier in this series, why not be proud of this, perhaps by showing a special CAP logo or a "Quality Reviewed" mark to subtly flag any CAP content? That makes it a feature for users, though I'd still argue that paid CAP content would need to be distinguished from unpaid.

Segregate Paid Inclusion?

What about the idea of completely segregating paid inclusion results from those listed for free? Yahoo sees disadvantages to this.

"Introducing another section would be quite confusing, I think," Cadogan said. "In the past, I can't think the directory and web layers were optimal. I don't think we'll want to go back to something like that."

Cadogan is referring to how Yahoo used to first show results from its human-powered directory, and only after this show results that came from crawling the web. It stopped doing this in late 2002.

I agree that adding a separate layer isn't optimal. It is certainly in Yahoo's financial interest to have paid inclusion content mixed in with the free content. However, pulling it out could also arguably hurt relevancy and in fact penalize some sites unfairly.

By the way, you might be wondering if the FTC has any new plans regarding paid inclusion disclosure given all the recent news. Last year, a News.com article found the FTC wanting the search engines to help consumers by providing as much disclosure as possible, but there was no suggestion of new guidelines.

I followed up with an FTC contact recently and was told the situation remains the same: there are no new actions or disclosure guidelines to speak of relating to paid inclusion.

Mixing Feeds Is The Future

Cadogan sees the ability to mix data from various feeds together into a single list as essential to search success. Right now, Yahoo is working to blend free feed content, paid feed content and crawler content. Additional data might also be mixed, such as local content, product listings, news headlines and so on.

The challenge in mixing multiple databases is that it's hard to easily rank everything according to the same rules. Feed content may lack link analysis data or page formatting information that search engines use to rank ordinary HTML pages. How do you come up with a system that blends two completely different data types together yet somehow ranks everything properly?

For Ask Jeeves, it was a step too far. A day after Yahoo unveiled its paid inclusion program in March, Ask Jeeves made a timely announcement that it was dropping its own feed-based paid inclusion. The main reason was that Ask Jeeves found it too difficult to mix the different data types fairly.

It's a challenge for Yahoo as well, but one the company wants to keep working on.

"We agree, feeds aren't simple. That's why we are making it our investment. We can understand why people might find it difficult. But if we don't take on that challenge, we're going to limit what search engines can do," Cadogan said.

In the next part of this series, we'll look more closely at how Yahoo's CAP program works to take in feed content from paid and non-paid partners. We'll also look at the efforts Google is making to reach the same goals but without involving payment.

Want to discuss issues of disclosure raised in this article? You can do so in this special thread in our Search Engine Watch Forums and even vote on some ideas. Stop by and visit here: How (or Should) Paid Inclusion Be Disclosed?

In the second part of this series, I also talked about ideas of helping webmasters get support for their free listings outside of offering paid inclusion. I'll be revisiting this issue more in the next installment of this series. You can suggest services you'd like to see in this forum thread: What Organic Search Support Services Would You Want?

Finally, we have a third thread inviting site owners using Yahoo's paid inclusion programs to vote on and share what impact it's had on their free listings, if any. You'll find it here: Will Yahoo Search keep its promise?