July 26, 2006

Missing

Here's a search story with a happy ending.

Gladiator Search

A friend of mine, Mike Rouch, implemented an interesting search solution for RealAge.com. The RealAge library consists of articles and other reference material on a wide variety of health and medical topics that have been selected by their researchers as relevant and reliable. The search engine lets you treat individual search hits like Roman gladiators. You give the thumbs-up to results that were helpful and the thumbs-down to results that were not. Over time, the system uses your preferences to adjust your results.

[Screenshot: RealAge search]

I think this is an interesting approach, and I'm curious to see how well it works.
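The post doesn't say how the RealAge engine actually folds the votes back into its ranking. The code below is only a minimal sketch of one simple way thumbs-up/thumbs-down feedback could work: it stores each user's net votes per document and nudges the engine's base score up or down before re-sorting. The class name, the `vote_weight` of 0.5, and the document IDs are all hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


class GladiatorRanker:
    """Hypothetical sketch: re-rank search hits using one user's thumbs-up/down votes."""

    def __init__(self, vote_weight: float = 0.5):
        # Each net vote shifts a document's score by this amount (an assumed tuning knob).
        self.vote_weight = vote_weight
        # votes[user][doc_id] -> net votes (+1 per thumbs-up, -1 per thumbs-down)
        self.votes = defaultdict(lambda: defaultdict(int))

    def thumbs_up(self, user: str, doc_id: str) -> None:
        self.votes[user][doc_id] += 1

    def thumbs_down(self, user: str, doc_id: str) -> None:
        self.votes[user][doc_id] -= 1

    def rerank(self, user: str, hits: list[tuple[str, float]]) -> list[tuple[str, float]]:
        """Take (doc_id, base_score) pairs from the engine and return them re-sorted."""
        user_votes = self.votes[user]
        adjusted = [
            (doc_id, score + self.vote_weight * user_votes[doc_id])
            for doc_id, score in hits
        ]
        # Highest adjusted score first; sorted() is stable, so ties keep the engine's order.
        return sorted(adjusted, key=lambda pair: pair[1], reverse=True)


ranker = GladiatorRanker()
hits = [("diabetes-overview", 1.0), ("diet-myths", 0.9), ("sleep-tips", 0.8)]
ranker.thumbs_down("alice", "diabetes-overview")
ranker.thumbs_up("alice", "sleep-tips")
print([doc for doc, _ in ranker.rerank("alice", hits)])
# ['sleep-tips', 'diet-myths', 'diabetes-overview']
```

Keeping votes per user means one reader's thumbs-down doesn't bury a result for everyone else; a real system might also pool votes across users or decay old ones, which this sketch leaves out.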

July 24, 2006

SEO Pricing

As an enterprise search guy, I'm less interested in SEO as a marketing tool than as a way to help employees find what they need and help the company get value from its information investments. However, I found SEOmoz's advice on pricing SEO work very interesting, particularly for the helpful lists of deliverables.

Putting the price tag right out there is great too, but it leaves me wondering: who has that kind of money?!

July 20, 2006

From the willful ignorance department...

Peggy Noonan, meet Al Gore.

Noonan said this today:

During the past week's heat wave--it hit 100 degrees in New York City Monday--I got thinking, again, of how sad and frustrating it is that the world's greatest scientists cannot gather, discuss the question of global warming, pore over all the data from every angle, study meteorological patterns and temperature histories, and come to a believable conclusion on these questions: Is global warming real or not?

...

If global warming is real, and if it is new, and if it is caused not by nature and her cycles but man and his rapacity, and if it in fact endangers mankind, scientists will probably one day blame The People for doing nothing.

But I think The People will have a greater claim to blame the scientists, for refusing to be honest, for operating in cliques and holding to ideologies. For failing to be trustworthy.

Ummm, yeah, Peggy, the scientists held that meeting, and ummm, they came to a conclusion. You didn't catch what they were saying because you were too busy demonizing them, remember? From your article on the 2004 tsunami, for example:

What to say of those who've latched on to the tragedy to promote their political agendas, from the U.N. official who raced to call the U.S. "stingy," to the global-warming crowd, to administration critics who jumped at the chance to call the president insensitive because he was vacationing in Texas and didn't voice his sympathy quickly enough? Such people are slyly asserting their own, higher sensitivity and getting credit for it, which is odd because what they're actually doing is using dead people to make cheap points.

I'll bet you already knew, Peggy, that tsunamis don't actually have anything to do with climate! But that wouldn't stop you from making your own cheap points, would it? No, if The People fail to act in the face of a clear warning from climate scientists, the paid political obfuscators who duped them, like you, should bear much of the blame.

July 19, 2006

Microsoft is getting all Ballmerish about enterprise search.

When I read that Microsoft is serious about competing with Google in the enterprise search arena, I couldn't help but snicker, knowing that their enterprise search solution is SharePoint.

Well, it prompted me to do some reading, and it looks like they've been busy. This could get interesting.

SharePoint portals seem to be wildly popular for departmental collaboration, but from an intranet search engine administrator's perspective they can be a bit of a nightmare. They have some crawling gotchas that you need to be aware of, and they also seem to encourage the posting of lots of large Microsoft Office documents.

Norvig the Heretic

Peter Norvig, whose challenge to Tim Berners-Lee regarding the Semantic Web was reported by CNET News only because he works for Google, listed three important problems for the meta-utopian dream.

"We deal with millions of Web masters who can't configure a server, can't write HTML. It's hard for them to go to the next step. The second problem is competition. Some commercial providers say, 'I'm the leader. Why should I standardize?' The third problem is one of deception. We deal every day with people who try to rank higher in the results and then try to sell someone Viagra when that's not what they are looking for. With less human oversight with the Semantic Web, we are worried about it being easier to be deceptive," Norvig said.

While we're somewhat shielded from that last problem in the enterprise search world, the first two, incompetence and laziness, are enough to keep us busy. If you have multiple authors to work with, prepare to spend significant resources correcting all the different ways they devise to screw up structured metadata.

One of my favorites is the "template" method. You may never see the original document it comes from, but you'll suddenly get lots of Word documents that all have the same title (unrelated to the visible content of the document), or lots of HTML pages with the same meta tag block. You either have to fix these or exclude them from your index, or you ruin the good times for everyone.
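Before you can fix or exclude template-stamped documents, you have to find them. One cheap signal is the same title showing up on many unrelated documents. The following is a minimal sketch, assuming you can export (URL, title) pairs from your crawler or index; the function name, the `min_count` threshold of 3, and the sample URLs are hypothetical.

```python
from collections import defaultdict


def find_template_titles(docs, min_count=3):
    """Group (url, title) pairs by normalized title and return the titles
    shared by at least min_count documents, mapped to their URLs."""
    by_title = defaultdict(list)
    for url, title in docs:
        # Normalize case and collapse whitespace so trivial variants still match.
        key = " ".join(title.lower().split())
        if key:  # untitled documents are a different problem; skip them here
            by_title[key].append(url)
    return {title: urls for title, urls in by_title.items() if len(urls) >= min_count}


docs = [
    ("/hr/vacation-policy.doc", "Company Memo Template"),
    ("/it/vpn-setup.doc", "Company Memo  Template"),
    ("/finance/q2-results.doc", "company memo template"),
    ("/hr/benefits.html", "Benefits Overview"),
]
print(find_template_titles(docs))
# {'company memo template': ['/hr/vacation-policy.doc', '/it/vpn-setup.doc', '/finance/q2-results.doc']}
```

The same grouping works for the copied meta tag blocks: use a hash of the normalized block as the key instead of the title. A shared title isn't proof of a template (a legitimate series can share one), so treat the output as a review list rather than an automatic exclusion list.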

The most effective solution I've seen to this problem so far is better authoring tools, combined with as much automation (adding metadata without pestering the user for it) as possible. While entity extraction is improving and can add some value, it would sure be nice if it were implemented like spell-checking software, in cooperation with the author. So far I've only seen it used too far downstream.

I can relate to the way Norvig began his comments. "What I get a lot is: 'Why are you against the Semantic Web?' I am not against the Semantic Web."

I get the same crap about similar stuff.

Them: "We're authoring everything in such-and-such XML schema."

Me: "Great, here's the best way to transform it to standards-compliant HTML and publish it so it's accessible and useful to people and search engines."

Them: "What do you have against XML?"

Me: "garrrh!"