Mission Possible: Managing Millions of Keywords

Probably the most complicated aspect of operating and optimizing a large website is gaining a basic understanding of all your keywords and how they perform. This applies to both paid campaigns and organic SEO strategies, and it is easier said than done when you are managing millions of keywords.

Many questions surface immediately. For example, how do you tell the difference between ‘head’ and ‘tail’ keywords? You need to consider these two types of queries from both a performance perspective and a keyword research perspective. On an e-commerce site, you can do this by separating your page types. Most large e-commerce sites have three main page types: a category page, a refinement or attribute page, and a product page.

For simplicity’s sake, you can assume that category pages will drive roughly 80% of your head terms. The remaining 20% may relate to products listed within those pages, and a single term could relate to multiple products. Refinement or attribute pages can be considered a heavy mix of head and top-level tail terms. For example, under a category such as DVD Players, a head term would be Blu-ray or HD DVD, and the tail terms would be very specific attributes, such as a price range or color. Keywords based on a product brand or model are almost always within your tail mix.
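As a rough illustration of the page-type split described above, the mapping can be written down as a small lookup. This is only a sketch: the `PageType` names and the "head", "mixed", and "tail" bucket labels are assumptions made for the example, and a real site would need its own rules for deciding which type a page is.

```python
from enum import Enum


class PageType(Enum):
    """The three main page types of a large e-commerce site."""
    CATEGORY = "category"
    REFINEMENT = "refinement"  # refinement or attribute pages
    PRODUCT = "product"


# Hypothetical bucket labels; the rule of thumb is that category pages
# carry mostly head terms, refinement pages a mix, product pages tail terms.
BUCKET_BY_PAGE_TYPE = {
    PageType.CATEGORY: "head",
    PageType.REFINEMENT: "mixed",
    PageType.PRODUCT: "tail",
}


def keyword_bucket(page_type: PageType) -> str:
    """Return the default keyword bucket for a page type."""
    return BUCKET_BY_PAGE_TYPE[page_type]
```

A lookup like this only gives a starting label for each keyword; performance data can later move individual terms between buckets.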

Now that we have identified three basic page types and the keywords that surround them, how do you monitor them? You could attempt to do this with a spreadsheet, but if you have used Excel on a massive scale, you know that once you exceed 100,000 lines of data it becomes a nightmare to manage. Your best bet is to use a database server. It could be something as simple as Microsoft SQL Server, or something more complicated but with advanced performance and capacity, such as an Oracle server.
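To make the idea of a queryable keyword table concrete, here is a minimal sketch. It uses Python's built-in `sqlite3` module purely as a stand-in for a real database server, and the table and column names (`keywords`, `page_id`, `keyword`, `page_type`) are assumptions chosen for the example.

```python
import sqlite3


def create_keyword_store(path=":memory:"):
    """Open a database and create the (hypothetical) keywords table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS keywords (
               page_id   TEXT NOT NULL,
               keyword   TEXT NOT NULL,
               page_type TEXT NOT NULL,
               PRIMARY KEY (page_id, keyword)
           )"""
    )
    # An index on keyword keeps lookups fast as the table grows.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_keywords_keyword ON keywords (keyword)"
    )
    return conn


def add_keywords(conn, rows):
    """Insert (page_id, keyword, page_type) tuples, replacing duplicates."""
    conn.executemany(
        "INSERT OR REPLACE INTO keywords (page_id, keyword, page_type) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


def count_by_page_type(conn):
    """Return a dict of page_type -> number of keyword rows."""
    cur = conn.execute(
        "SELECT page_type, COUNT(*) FROM keywords GROUP BY page_type"
    )
    return dict(cur.fetchall())
```

The same schema would carry over to a server product; only the connection code and some SQL dialect details would change.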

Now that you have your list of top keywords in a place that you can query against, you need to populate this table with information from your traffic. You have several options for collecting this data:

  1. Add a custom pixel to all of your pages and collect data from each session.
  2. Use a standard analytics package that has the ability to export raw data in a form where you can merge it with your target keyword list.

You will need to be able to tie each page to a keyword or key phrase, so a unique identifier per page is required to join the data.
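The join on a per-page identifier might look something like the sketch below. It assumes each visit record from the pixel or analytics export is a dictionary with a `page_id` field, and that `keyword_by_page` maps each page identifier to its target keyword; both shapes are hypothetical.

```python
from collections import Counter


def visits_per_keyword(keyword_by_page, visits):
    """Count visits per target keyword by joining on the page identifier.

    keyword_by_page: dict of page_id -> target keyword (assumed shape).
    visits: iterable of dicts, each with a "page_id" key (assumed shape).
    Visits to pages without a target keyword are skipped.
    """
    counts = Counter()
    for visit in visits:
        keyword = keyword_by_page.get(visit["page_id"])
        if keyword is not None:
            counts[keyword] += 1
    return counts
```

At the scale of millions of keywords this join would normally run inside the database, but the logic is the same.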

At this point, you should be thinking about what you would like to know about your users. Most sites will do well with knowing where the user came from, for example a natural search result or a paid search listing. This is provided within your referrer data and is very easy to match. You may also want to collect browser data and an IP address so you can keep track of where most of your users are coming from. Tagging your users with a persistent cookie can also be helpful, because it makes it easier to see whether a user who first arrived through algorithmic search later returned through a bookmark or a direct type-in. This data helps you understand retention, which will be a key metric if you start A/B testing.
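Matching the traffic source could be sketched roughly as follows. The list of search engine host fragments is illustrative only, and the example assumes paid clicks are tagged on the landing URL with a `utm_medium=cpc` parameter; that tagging convention is an assumption, and your paid search setup may mark clicks differently.

```python
from urllib.parse import parse_qs, urlparse

# Illustrative host fragments only; extend for the engines you care about.
SEARCH_ENGINE_HOSTS = ("google.", "bing.", "yahoo.")


def classify_source(referrer, landing_url):
    """Label a visit as paid_search, natural_search, direct, or other."""
    landing_params = parse_qs(urlparse(landing_url).query)
    # Assumption: paid listings carry utm_medium=cpc on the landing URL.
    if landing_params.get("utm_medium") == ["cpc"]:
        return "paid_search"
    if not referrer:
        # No referrer usually means a bookmark or a direct type-in.
        return "direct"
    host = urlparse(referrer).hostname or ""
    if any(marker in host for marker in SEARCH_ENGINE_HOSTS):
        return "natural_search"
    return "other"
```

Storing this label alongside each visit makes it straightforward to compare paid and natural performance for the same keyword.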

You should always collect as much data as possible from your paid search vendors, such as the cost of each individual click and the position in which each keyword was listed. This data is invaluable for understanding whether you can bid up on certain keywords or loss leaders to drive more traffic.

The last area where you need to collect data is on the conversion of each user. You should always collect revenue data and have a pre-calculated gross margin number as well. By doing this, you can see which of your categories are the largest moneymakers and which ones are not performing well.
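A category-level rollup of revenue and gross margin could be sketched like this. The order fields (`category`, `revenue`, `margin_rate`) are hypothetical names, with `margin_rate` standing in for the pre-calculated gross margin expressed as a fraction of revenue.

```python
from collections import defaultdict


def margin_by_category(orders):
    """Total revenue and gross margin per category, best earners first.

    orders: iterable of dicts with "category", "revenue", and
    "margin_rate" keys (assumed shape; margin_rate is a fraction).
    """
    totals = defaultdict(lambda: {"revenue": 0.0, "gross_margin": 0.0})
    for order in orders:
        bucket = totals[order["category"]]
        bucket["revenue"] += order["revenue"]
        bucket["gross_margin"] += order["revenue"] * order["margin_rate"]
    # Sort so the largest moneymakers come first.
    return sorted(
        totals.items(),
        key=lambda item: item[1]["gross_margin"],
        reverse=True,
    )
```

Ranking by gross margin rather than revenue helps surface categories that sell well but contribute little profit.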

Now that you have collected data, you will need an appropriate front-end system to work with the data and make it viewable for analysis. There are many software applications for such a task, ranging from the simple yet slow Microsoft Cube server to the Mercedes of front-end packages, MicroStrategy. These end-user solutions are not inexpensive, and you should budget carefully for this investment based on how you need to grow. If you under-invest, you can waste countless hours waiting for the server to return data. These solutions can send you emails when a tracked figure, such as a metric's sigma, changes by a certain percentage. You can also configure graphs that can easily be imported into a PowerPoint presentation, which is convenient, since many large companies require decks to explain data rather than a document or a basic email message.
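The percentage-change alerts that such packages provide can be approximated with a simple comparison between two reporting periods. This is only a sketch of the idea, not how any particular product implements it; the function names and the shape of the inputs (a dictionary of metric name to value per period) are assumptions.

```python
def percent_change(previous, current):
    """Percentage change from previous to current, or None if undefined."""
    if previous == 0:
        return None
    return (current - previous) / previous * 100.0


def metrics_to_alert(previous, current, threshold_pct):
    """Return metric name -> percent change for moves at or past the threshold.

    previous, current: dicts of metric name -> value for two periods.
    A metric missing from the current period is treated as 0.
    """
    alerts = {}
    for name, old_value in previous.items():
        change = percent_change(old_value, current.get(name, 0.0))
        if change is not None and abs(change) >= threshold_pct:
            alerts[name] = change
    return alerts
```

The returned dictionary could then feed whatever notification channel you use, such as an email summary.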

In the end, managing millions of keywords requires extensive planning and organization, but it is not an impossible mission.
