Psychic Search: A quick primer on search suggestions
I recently spoke with a coworker who was skeptical that the search suggestion feature we'd implemented in our company intranet could be effective. "I could be searching for anything—it couldn't possibly know what I'm going to type in." Smiling smugly, I asked him to think of something he might want to find on the company intranet, something he thought he would really need.
Glancing around his desk, he pointed to an empty box and said, "I need to order new business cards." I asked him to start typing "business cards" into the search box one letter at a time. He typed in "b" and up popped the list, and the first search it suggested was "business cards." Still unconvinced, my coworker said, "Suppose I lost the key to my desk." He typed in "k" and the search engine immediately suggested "key request." Next he tried, "I want to know which holidays we have off this year." The letter "h" immediately brought up "holiday schedule 2010."
"Okay, what if I want to enroll in a yoga class?" I thought he had me, but the first item listed under "y" was in fact "yoga." He played it cool, but the expression on his face told me he was puzzled, taken aback, and possibly just a bit scared.
More Than Just a Magic Trick
That was a fun moment, but suggest functions shouldn't be mistaken for parlor tricks. Anyone who has looked at search logs closely has probably surmised that limited user skill is a pervasive problem. It's very common for users to submit poorly phrased queries (often just a single word, or an imprecisely phrased idea) that bring back tons of results that have nothing to do with what the user really wanted to find. It's also common for users to give up instead of trying to compose their search a second time.
Suggestions help resolve that problem. By providing users with better phrasings, they make it much easier for users to be successful on the first try. In practice, they solve the problem of turning an abstract idea into concrete words so concisely and so effectively that they are quickly becoming accepted as an essential feature of any search engine.
It's getting to the point where sites that don't have a suggest function implemented yet are starting to look a little behind the times. The good news is that suggest functions are not especially difficult to implement, and with the right planning they can be wildly successful.
How to Read a User's Mind
Predicting what a user wants to find is actually pretty easy, because probability is on your side. If you take a list of the most commonly submitted searches and chart them by their popularity, you get a long-tail shape: a tall spike of a few very popular searches on the left, trailing off into a long, low tail of rare ones.
This means that there are a very small number of search phrases that a large number of people are submitting, and there are also a very large number of search phrases that only a few people are submitting. The really important lesson here is that without knowing anything about a random user, it's possible to know something about what they're likely to search for. If they provide even just a little bit of additional information—such as a few characters in the search box—the odds narrow so dramatically that it's overwhelmingly likely that the search engine can accurately guess what they're trying to find.
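To make the narrowing odds concrete, here is a minimal sketch in Python. The log counts are invented for illustration, and `top_guess` is a hypothetical helper, not part of any particular search engine. It reports the most popular phrase matching what has been typed so far, along with that phrase's share of all matching searches.

```python
from collections import Counter

# Hypothetical search-log counts (phrase -> times submitted). Invented data.
LOG = Counter({
    "business cards": 400,
    "benefits": 250,
    "holiday schedule 2010": 200,
    "key request": 90,
    "building map": 40,
    "yoga": 15,
    "badge replacement": 5,
})


def top_guess(prefix: str, log: Counter) -> tuple[str, float]:
    """Return the most popular phrase starting with `prefix`, plus its
    share of all logged searches that start with that prefix."""
    matches = {q: n for q, n in log.items() if q.startswith(prefix)}
    total = sum(matches.values())
    if total == 0:
        return ("", 0.0)
    best = max(matches, key=matches.get)
    return (best, matches[best] / total)


for prefix in ["", "b", "bu"]:
    phrase, share = top_guess(prefix, LOG)
    print(f"{prefix!r}: {phrase} ({share:.0%})")
```

With this made-up log, the best guess is right about 40% of the time before the user types anything, about 58% of the time after "b", and about 91% of the time after "bu".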
To get this effect to really work, the function needs to return suggestions matching the character string the user has entered, sorted by popularity. This is almost always better than sorting the suggestions in some other way (e.g., alphabetically) because it stacks the deck in your favor. The original suggest function that Wikipedia implemented made the mistake of returning search strings sorted alphabetically. The result was that if the user typed in "abraham", the first suggestion it returned was Abraham "Chick" Kazen.
I have no idea who Abraham "Chick" Kazen is, and I'm betting that no one at Wikipedia knows either. Mr. Kazen was first in the list because the code orders quotation marks before letters in an alphabetical sort, even though it's extremely unlikely that a user would actually submit that search. Happily, Wikipedia has since fixed this, and now typing just two letters returns what we all instinctively feel is the right answer.
There are a few cases where you might instead order the suggestion list alphabetically—for example, with a corporate directory. But most of the time it's the wrong way to go.
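The difference between the two orderings is easy to see in a small sketch. The `suggest` function and the counts below are assumptions made up for illustration. Only the Kazen entry comes from the story above, and nothing here reflects how Wikipedia actually implements its suggestions.

```python
def suggest(prefix, counts, limit=10, by_popularity=True):
    """Return up to `limit` phrases from `counts` that start with `prefix`.

    `counts` maps each phrase to the number of times it was submitted.
    """
    prefix = prefix.lower()
    matches = [q for q in counts if q.lower().startswith(prefix)]
    if by_popularity:
        # Most-submitted first; break ties alphabetically for stable output.
        matches.sort(key=lambda q: (-counts[q], q))
    else:
        # Plain code-point order, which puts '"' ahead of every letter.
        matches.sort()
    return matches[:limit]


# Invented popularity figures, for illustration only.
COUNTS = {
    "Abraham Lincoln": 9000,
    "Abraham Maslow": 1200,
    'Abraham "Chick" Kazen': 3,
    "Abacus": 500,
}

print(suggest("abraham", COUNTS, by_popularity=False))
print(suggest("abraham", COUNTS))
```

The alphabetical call lists the Kazen entry first, because the quotation mark sorts ahead of letters. The popularity call leads with the phrase that has the highest count.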
Major Types of Suggest Functions
There are three principal ways suggest functions can work. Which one should be used depends upon the nature of the information that users are searching.
An exploratory function works best when many of the things users are trying to find have no official name. In these cases, people enter keywords that approximate the idea they have in their heads. For example, users of a college website who want to find a map of the buildings might search for:
- campus map
- building locations
- directions to buildings
- places on campus
- finding your way around
Given the enormous number of other things people could be searching for on a college website, the number of possible phrases is effectively infinite.
It's impossible to work with a list of potential searches that's infinitely long, so it has to be cut off somewhere. Fortunately, the magic of probability makes it possible to cut the list fairly short and still provide the vast majority of users with good suggestions. Even for a site that sees more than a million unique searches in a year, often just the first few thousand from the list of the most common searches will suffice. This can be small enough to store the complete list on the client side, so there's absolutely no lag as the user types in the search.
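One way to decide where to cut the list is to measure how much of the total search volume the top phrases cover. The sketch below assumes the search log has already been reduced to phrase counts. The function names and the tiny sample log are hypothetical.

```python
from collections import Counter


def coverage(counts: Counter, n: int) -> float:
    """Fraction of all submitted searches accounted for by the n most common phrases."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for _, c in counts.most_common(n)) / total


def shortest_list(counts: Counter, target: float) -> int:
    """Smallest number of top phrases whose combined share reaches `target`."""
    total = sum(counts.values())
    running = 0
    for n, (_, c) in enumerate(counts.most_common(), start=1):
        running += c
        if running >= target * total:
            return n
    return len(counts)


# Invented sample log.
log = Counter({
    "campus map": 50,
    "library hours": 30,
    "parking permit": 10,
    "dining menu": 5,
    "yoga": 5,
})

print(coverage(log, 2))          # 0.8
print(shortest_list(log, 0.85))  # 3
```

Running `shortest_list` against a real log with a target such as 0.9 gives a rough idea of whether the list is small enough to ship to the client.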
The suggestion list needs to be scrubbed to remove multiple word forms (e.g., singular or plural), misspellings, closely related phrasings, and other common problems. But the shortness of the suggestion list makes this fairly easy.
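A first pass at scrubbing can be automated. The sketch below is deliberately crude and rests on several assumptions: the `CORRECTIONS` table is a hypothetical hand-maintained list of misspellings, and plural folding just drops a trailing "s". It merges the counts of variants and keeps the most-submitted form as the one to display. Closely related phrasings would still need a human eye.

```python
from collections import Counter, defaultdict

# Hypothetical hand-maintained fixes for misspellings seen in the log.
CORRECTIONS = {"buisness": "business", "calender": "calendar"}


def tidy(query: str) -> str:
    """Lowercase, collapse whitespace, and apply known spelling corrections."""
    return " ".join(CORRECTIONS.get(w, w) for w in query.lower().split())


def fold_plurals(phrase: str) -> str:
    """Crude grouping key: drop a trailing 's' from longer words (not 'ss')."""
    def singular(word: str) -> str:
        if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
            return word[:-1]
        return word
    return " ".join(singular(w) for w in phrase.split())


def scrub(raw_counts) -> Counter:
    """Merge variants of the same phrase and total their counts."""
    groups = defaultdict(Counter)
    for query, n in raw_counts.items():
        shown = tidy(query)
        if shown:
            groups[fold_plurals(shown)][shown] += n
    merged = Counter()
    for variants in groups.values():
        display, _ = variants.most_common(1)[0]  # most-submitted form wins
        merged[display] = sum(variants.values())
    return merged


# Invented sample log.
raw = {
    "business cards": 300,
    "Business  Cards": 20,
    "buisness cards": 15,
    "business card": 40,
    "campus map": 120,
    "campus maps": 30,
}
print(scrub(raw).most_common())
```

In this sample, the four business card variants collapse into a single "business cards" entry with a count of 375, and the two campus map variants become "campus map" with 150.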
A known-item function works best when everything the user might try to find has a specific name. This is the case, for example, with websites that are principally product catalogs, such as Apple or Amazon. Other examples of known items include movie titles, airports, and major world cities. In these contexts, suggest functions can help people remember what something is called, eliminate misspellings, and help people figure out what searches will actually give them useful results.
For such known-item searches, truncated lists don't work because the absence of an item implies that it's not available. Since the list needs to be comprehensive, it can be very long indeed (just think of every product that Amazon sells), so it often can't all be stored on the client side. Instead, it needs to be retrieved from the server in real time. This may introduce some lag, but it can be made more efficient by limiting the number of strings shown at any one time. Ten has become the industry standard.
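On the server side, one simple way to keep lookups fast over a long list is to sort the names once and binary-search for the range that shares the typed prefix. The sketch below uses Python's standard `bisect` module. The `SuggestIndex` class, the catalog, and its counts are all hypothetical, and a production system might well use a trie or a search engine's built-in completion feature instead.

```python
import bisect


class SuggestIndex:
    """Prefix lookup over a comprehensive list of item names."""

    def __init__(self, counts):
        # Sort once by lowercased name so each prefix maps to one contiguous slice.
        self._items = sorted((name.lower(), name, n) for name, n in counts.items())
        self._keys = [key for key, _, _ in self._items]

    def lookup(self, prefix, limit=10):
        prefix = prefix.lower()
        if not prefix:
            return []
        lo = bisect.bisect_left(self._keys, prefix)
        # Appending the highest code point gives an upper bound for the prefix range.
        hi = bisect.bisect_left(self._keys, prefix + "\U0010ffff")
        matches = self._items[lo:hi]  # slicing copies, so sorting is safe
        matches.sort(key=lambda item: (-item[2], item[0]))
        return [name for _, name, _ in matches[:limit]]


# Invented catalog with invented popularity counts.
catalog = SuggestIndex({
    "iPhone": 1500,
    "iPad": 900,
    "iPod nano": 300,
    "iMac": 700,
    "MacBook Pro": 800,
})

print(catalog.lookup("ip"))           # ['iPhone', 'iPad', 'iPod nano']
print(catalog.lookup("ip", limit=2))  # ['iPhone', 'iPad']
```

The `limit` parameter defaults to ten, matching the cap described above, so the response stays small no matter how many items match.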
A third kind of suggest function draws on historical searches: searches that the user has submitted in the past. People are likely to search for something that they've looked for before, like a particular destination in a mapping application. A system can make itself much more personally relevant when it retains a memory of the things that a user has done before, and then makes it easier for the user to do them again.
It still makes the most sense to sort the list of historical searches first by the number of times the user has submitted them. But when two searches have been submitted the same number of times, consider breaking the tie based on recency.
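That ordering rule amounts to a two-part sort key. Here is a minimal sketch, assuming each past search is stored with a submission count and a last-used timestamp. The `PastSearch` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PastSearch:
    phrase: str
    times_submitted: int
    last_submitted: float  # e.g. a Unix timestamp; larger means more recent


def rank_history(history, prefix="", limit=10):
    """Order a user's matching past searches by frequency, then by recency."""
    prefix = prefix.lower()
    matches = [h for h in history if h.phrase.lower().startswith(prefix)]
    # Tuples compare element by element, so recency only matters on a tie.
    matches.sort(key=lambda h: (h.times_submitted, h.last_submitted), reverse=True)
    return [h.phrase for h in matches[:limit]]


# Invented history for one user of a mapping application.
history = [
    PastSearch("home", 5, 100.0),
    PastSearch("hospital", 5, 200.0),
    PastSearch("hardware store", 9, 50.0),
    PastSearch("airport", 20, 300.0),
]

print(rank_history(history, "h"))  # ['hardware store', 'hospital', 'home']
```

Here "hospital" and "home" were each submitted five times, so the more recent "hospital" comes first.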
Some clever designers have seized the best of all worlds by creating hybrid approaches that first display searches that the user has submitted in the past, followed by a list of the most popular searches submitted by other people.
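A hybrid list like this can be assembled by concatenating the two sources and skipping duplicates. The sketch below assumes both input lists are already in display order (the user's own history ranked as described earlier, and the global list ranked by popularity). The function name and sample data are made up for illustration.

```python
def hybrid_suggestions(prefix, personal, popular, limit=10):
    """Matching phrases from the user's own history first, then popular ones.

    `personal` and `popular` are lists of phrases already in display order.
    """
    prefix = prefix.lower()
    seen = set()
    result = []
    for phrase in list(personal) + list(popular):
        if len(result) >= limit:
            break
        key = phrase.lower()
        if key.startswith(prefix) and key not in seen:
            seen.add(key)  # a phrase already shown from history is not repeated
            result.append(phrase)
    return result


# Invented sample data.
personal = ["coffee near work", "airport parking"]
popular = ["airport", "Airport parking", "amazon"]

print(hybrid_suggestions("a", personal, popular))
# ['airport parking', 'airport', 'amazon']
```

Many interfaces also mark the personal entries visually so the user can tell the two sources apart, but that is a presentation choice outside this sketch.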
Designing by the Numbers
Suggest functions exploit quantitative information in a way that has only become possible through the enormous volume of usage of the modern Web. They're a creative application of data that's just sitting out there, waiting for innovative minds to find ways to make it useful. There's a real beauty and elegance to this kind of strategy, not to mention the fun of knowing what your users are going to say before they even say it.