Clustering Search Results
I've done some trial searching both on the Internet, using Vivisimo's Clusty search engine, and in an intranet environment, and frankly I don't get it. That said, I've heard from a small number of people who say that they use Vivisimo all the time and find it really valuable for getting the "lay of the land" on a particular topic.
A couple of statements in the interview bugged me. First, there was the obligatory Google-is-good-and-all-but-we're-better triangulation.
Google's overwhelmingly dominant search engine ranks a Web page based largely on how many other Web pages are linked to it, much as a scientist is sometimes ranked by how often his research is cited by other scientists.
Having a site with PageRank™ approaching zero, I can tell you that actual query relevance is still very much at play, or my pages wouldn't show up for any search.
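To make that point concrete, here is a minimal sketch of how a link-based score and query matching can work together. This is not Google's actual algorithm: it is a toy power-iteration version of the link-counting idea, and the four-page web, the page names, the `damping` value, and the `search` helper are all assumptions made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy power-iteration link score over a dict {page: [pages it links to]}."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page passes its score on in equal shares to the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its score evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


def search(query, texts, rank):
    """Keep only pages containing every query term, then order them by link score."""
    terms = query.lower().split()
    hits = [p for p, text in texts.items()
            if all(t in text.lower().split() for t in terms)]
    return sorted(hits, key=lambda p: rank[p], reverse=True)


# Hypothetical web: "hub" is heavily linked to, "tiny" has no inbound links at all.
links = {"hub": ["news"], "news": ["hub"], "blog": ["hub", "news"], "tiny": ["hub"]}
texts = {
    "hub": "general portal with links",
    "news": "daily search industry news",
    "blog": "notes on enterprise search",
    "tiny": "clustering search results with vivisimo",
}
rank = pagerank(links)
results = search("clustering search", texts, rank)
```

In this sketch "tiny" ends up with the lowest possible link score, yet it is the only result for "clustering search" because it is the only page that matches the query. The link score only decides the order among pages that are relevant in the first place.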
Then there's the Serendipity Demo, which usually works to get people excited about a discovery tool:
But Dr. Valdes-Perez said that by clustering Web pages into themes, Vivisimo can sometimes reveal connections that people wouldn't have seen otherwise.
To demonstrate that, he recently used the search terms "Osama bin Laden" and "Madonna" for a group in Washington D.C.
One of the themes that was generated was "niece," he said, and when he opened that folder, it revealed Web sites about a niece of the terrorist "who actually hates him but has aspirations to be a pop singer like Madonna," Dr. Valdes-Perez said.
Interesting. So bin Laden has a niece ... who has a life ... and doesn't like him. And that helps us how?
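For anyone who hasn't seen a clustering interface, the folder idea can be sketched roughly as follows. This is not Vivisimo's method, which I don't know the details of. It is a deliberately naive stand-in that files each result under any non-query word it shares with at least one other result. The `cluster_by_term` function, the stopword list, and the sample snippets are all hypothetical.

```python
from collections import defaultdict

# Small, hand-picked stopword list (an assumption, not a standard one).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def cluster_by_term(query, snippets, min_size=2):
    """Group snippet indices into folders labelled by a shared non-query term."""
    query_terms = set(query.lower().split())
    folders = defaultdict(list)
    for index, snippet in enumerate(snippets):
        words = set(snippet.lower().split())
        # Each remaining word is a candidate folder label for this snippet.
        for word in sorted(words - query_terms - STOPWORDS):
            folders[word].append(index)
    # Keep only labels shared by at least min_size snippets.
    return {label: members for label, members in folders.items()
            if len(members) >= min_size}


# Made-up result snippets for the ambiguous query "jaguar".
snippets = [
    "jaguar dealers and car prices",
    "used jaguar car listings",
    "jaguar habitat in the rainforest",
    "rainforest conservation for the jaguar",
]
folders = cluster_by_term("jaguar", snippets)
```

Here the two car results land in a "car" folder and the two animal results in a "rainforest" folder. That is the "lay of the land" use case: an ambiguous or broad query where the themes themselves are the answer, rather than any single top-ranked page.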
Part of the challenge in enterprise search is guiding people to the right tool for the question at hand. What are the types of questions that are best answered by a clustering engine (making it worth the investment)? What are the requirements for a successful implementation (size of collection, type of information, etc.)? I'm still trying to figure this one out.