Over on the Asilomar email list I started exploring a method for adding search to a site that has grown to need it. The point is to get a pretty good idea of what should be done to arrive at good results in a logical way, rather than just installing a search engine and improving it through trial and error. With the help of Jonothan and Iwan, here are the seeds of a method I’ve collected so far:
Let’s assume we’re working with conventional search engine technology, i.e. one that mostly ranks results by relevancy.
We have to begin either at the point where the content goes into an index or with the users’ goals, and because I’m a user-centered boy I’ll start with the users. We can learn about their goals by asking things like: Do they want precision or recall? What are their most popular search terms? How many queries do users submit in a session? Do users repeat queries over multiple sessions? How do the queries change over time? And so on.
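If the site already logs queries, a few of those questions can be answered with a short script. Here is a minimal sketch, assuming a hypothetical log of (session id, query) pairs; the record layout, the sample data, and the function names are mine, not any particular search engine's. Note that it counts queries that appear in more than one session, which is only a rough proxy for one user repeating a query (that would need a user id in the log).

```python
from collections import Counter, defaultdict

# Hypothetical log records: (session_id, query). The layout is an assumption.
log = [
    ("s1", "annual report"),
    ("s1", "annual report 2003"),
    ("s2", "contact"),
    ("s3", "annual report"),
    ("s3", "Contact"),
    ("s3", "jobs"),
]

def normalize(query):
    """Lowercase and collapse whitespace so trivially different queries match."""
    return " ".join(query.lower().split())

def summarize(records):
    """Return the most popular queries, queries per session, and repeated queries."""
    query_counts = Counter(normalize(q) for _, q in records)

    # Group the normalized queries by session.
    sessions = defaultdict(list)
    for session_id, query in records:
        sessions[session_id].append(normalize(query))

    # Count how many distinct sessions each query appears in.
    sessions_per_query = Counter()
    for queries in sessions.values():
        for query in set(queries):
            sessions_per_query[query] += 1

    return {
        "top_queries": query_counts.most_common(3),
        "queries_per_session": len(records) / len(sessions),
        "repeated_across_sessions": sorted(
            q for q, n in sessions_per_query.items() if n > 1
        ),
    }

summary = summarize(log)
print(summary)
```

Running the same summary over logs from different months would give a first look at how the queries change over time.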
Using the user goals we can construct a strawman user interface.
Next we look at the content. We can ask: What format is it in? How much is there? How will the volume change over time? For dynamic content, will the search index be rebuilt from scratch or grow cumulatively? How clean is it (how much is ROT: redundant, outdated, or trivial)? How often does it change? And so on.
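For content that lives on a file system, a rough inventory can answer the format, volume, and staleness questions. This is only a sketch under assumptions of my own: the function name is made up, file extensions stand in for formats, and "not modified in a year" is an arbitrary stand-in for "possibly outdated". Content in a CMS or database would need a different approach.

```python
import time
from collections import Counter
from pathlib import Path

def inventory(root, stale_after_days=365):
    """Count files by extension, total size, and files not modified recently.

    stale_after_days is an arbitrary threshold, not a recommendation.
    """
    by_format = Counter()
    total_bytes = 0
    stale_files = 0
    now = time.time()

    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        info = path.stat()
        # Use the lowercased extension as a crude proxy for format.
        by_format[path.suffix.lower() or "(none)"] += 1
        total_bytes += info.st_size
        if now - info.st_mtime > stale_after_days * 86400:
            stale_files += 1

    return {
        "by_format": by_format,
        "total_bytes": total_bytes,
        "stale_files": stale_files,
    }
```

Calling `inventory("path/to/site")` on a copy of the site, and again a few weeks later, gives a crude picture of how fast the content grows and changes.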
With the strawman UI and a rough idea of what the index looks like we can simulate some search tasks and the result sets. We might then consider how metadata and tweaks to the ranking algorithm could help.
At some point we have to install the search engine, index the content, and try some queries. Then we might use a systematic approach to tweaking results:
Too many results? Try cleaning or otherwise reducing the content, changing the weighting, changing the ranking algorithm, or using a more restrictive search form UI (e.g. more fields that must be selected).
Too few results? Try adding synonyms or a less restrictive search form UI. This might also be a sign that you don’t in fact need a search engine.
Is accuracy bad? Change the algorithm, change the weighting, add metadata, use best bets, improve the search form UI.
Users can’t fulfill goals even if results are good? Try improving the results UI.
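The checklist above can be written down as a small triage function to run over test queries. This is a sketch, not a tool: the thresholds, the symptom names, and the idea of judging accuracy by how many of the top results a person marked relevant are all assumptions of mine, and the remedies are simply the ones listed above.

```python
# Remedies copied from the checklist; the symptom keys are made-up labels.
REMEDIES = {
    "too_many_results": [
        "clean or reduce content", "change weighting",
        "change ranking algorithm", "more restrictive search form UI",
    ],
    "too_few_results": [
        "add synonyms", "less restrictive search form UI",
        "reconsider whether a search engine is needed",
    ],
    "poor_accuracy": [
        "change algorithm", "change weighting", "add metadata",
        "use best bets", "improve search form UI",
    ],
    "results_ui_problem": ["improve results UI"],
}

def diagnose(result_count, relevant_in_top, task_completed,
             top_n=10, max_results=200, min_results=3, min_precision=0.5):
    """Map observations from one test query to symptoms.

    relevant_in_top is how many of the first top_n results a human judged
    relevant. All thresholds are placeholder values to tune per site.
    """
    symptoms = []
    if result_count > max_results:
        symptoms.append("too_many_results")
    if result_count < min_results:
        symptoms.append("too_few_results")
    if result_count > 0:
        judged = min(top_n, result_count)
        if relevant_in_top / judged < min_precision:
            symptoms.append("poor_accuracy")
    # Results look fine but the user still failed: suspect the results UI.
    if not symptoms and not task_completed:
        symptoms.append("results_ui_problem")
    return symptoms

for symptom in diagnose(result_count=1500, relevant_in_top=2, task_completed=True):
    print(symptom, "->", REMEDIES[symptom])
```

The results UI is only blamed when nothing else looks wrong, which matches the order of the checklist: fix the result set first, then the way it is presented.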
I’m sure I’m missing stuff, but it’s a start.